27 Oct 2023

Man, Monkey or Martial Artist - An introduction to using Azure AI to create a custom vision model (Part 1)

We begin a 3-part series where we’ll explore how to use low-code development platforms together to create an image classifier that can determine if an image contains a man, a woman, or a monkey wearing a business suit, a Kung Fu uniform or casual wear (e.g., jeans & sneakers).
Anthony Allen | 25 min read

In a previous post, we discussed the Microsoft Power Platform and low-code development. But how powerful are these low-code solutions? When we combine the Power Platform with Azure Cognitive Services, the citizen developer gains seemingly unprecedented AI and ML capabilities.

Here, we begin a 3-part series where we’ll explore how to use these platforms together to create an image classifier that can determine if an image contains a man, a woman, or a monkey wearing a business suit, a Kung Fu uniform or casual wear (e.g., jeans & sneakers).  For part 1, we’ll discuss some background information, how to perform the initial required setup operations in Azure, and how we’ve built the model used for our classifier. 

In part 2, we’ll show you step-by-step how to integrate your Custom Vision model into a mobile app you build from scratch using the Microsoft Power Apps low-code platform.  In part 3, we’ll discuss in detail how to refine your model and improve prediction accuracy.

First, let’s review the technology we’re going to use to accomplish this.

The technologies used

Azure Cognitive Services

Azure Cognitive Services is a suite of cloud-based artificial intelligence (AI) and machine learning services provided by Microsoft. These services are designed to enable developers to easily integrate various AI capabilities into their applications without the need for extensive expertise in AI or machine learning. Azure Cognitive Services cover a wide range of AI functionalities, including vision, speech, language, knowledge, and search.

Azure AI Vision Service

Within the Microsoft Azure ecosystem, the Azure AI Vision service offers ready-made models for typical computer vision activities. These tasks encompass providing descriptive captions and tags for images, identifying and classifying commonplace objects, recognizing landmarks, celebrities, brands, and detecting adult content. Additionally, Azure AI Vision can be employed for evaluating image characteristics like color and formats, as well as creating intelligently cropped thumbnail images.
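For code-first readers, here’s a minimal sketch of what a call to the Azure AI Vision image-analysis REST endpoint looks like using Python’s requests library. The endpoint, key, and image URL are placeholders for your own resource’s values:

    import requests

    # Placeholders: substitute your own resource endpoint and key.
    ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
    KEY = "<your-vision-key>"

    def analyze_image(image_url):
        """Ask Azure AI Vision for a caption and tags for an image URL."""
        response = requests.post(
            f"{ENDPOINT}/vision/v3.2/analyze",
            params={"visualFeatures": "Description,Tags"},
            headers={"Ocp-Apim-Subscription-Key": KEY},
            json={"url": image_url},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()

    result = analyze_image("https://example.com/monkey-in-suit.jpg")  # hypothetical URL
    print(result["description"]["captions"][0]["text"])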

Midjourney

Midjourney is an innovative generative artificial intelligence program and service developed by the independent research lab Midjourney, Inc., headquartered in San Francisco, California. This technology harnesses natural language descriptions, called "prompts," to translate text into images. The process unfolds through user interaction with the AI via a dedicated bot integrated into the popular chat platform, Discord. By issuing commands of varying descriptive complexity, users are able to use language to create intricate visual landscapes.  The bot returns four unique artistic interpretations based on the supplied text.  The user can either upscale an image for export (U1-U4) or have the bot generate another set of interpretations based on one of the previously generated images (V1-V4).

Since its unveiling in open beta on July 12, 2022, individuals from many different fields have been tapping into the capabilities of Midjourney to manifest their creative visions into visual expressions, exploring new possibilities and functionality within the realm of digital creativity.


But wait… What is the difference between object recognition and image classification?

Before we go any further, it is important to note the distinction between object recognition and image classification.  

Object recognition encompasses a broad category of computer vision tasks aimed at identifying and locating specific objects within digital images. It focuses on recognizing the presence of particular objects and providing information about their positions or bounding boxes (rectangular borders that surround the object). Common use cases include object tracking, counting objects, autonomous navigation, and augmented reality. Image classification, on the other hand, is used when you want to categorize entire images into predefined classes or labels. Image classification aims to determine what the image represents as a whole, without necessarily specifying the positions of individual objects. Object recognition can be more complex than image classification because it requires identifying multiple objects within an image, often with varying positions, sizes, and orientations.

Image classification is generally considered a relatively simpler task since it focuses on determining the most dominant or representative class for the entire image. Image classification is employed in applications such as image search, recommendation systems, content tagging, and classifying images for medical diagnosis. Which AI Vision task you choose depends on the specific requirements of the application and the nature of the visual data being analyzed.  

For this blog post, we are interested in image classification using Azure Custom Vision.

Azure Custom Vision

Azure Custom Vision is a cloud-based machine learning service provided by Microsoft Azure. It is designed to enable users to easily build, train, and deploy custom image classification models without requiring deep expertise in machine learning or computer vision. Custom Vision models can be accessed through APIs, SDKs, or a dedicated website.

However, you don’t have to be a data scientist or code-first developer to use these vision models in simple everyday applications. You have the capability to access your image classifier model on your mobile or tablet device via a quickly designed application built in Power Apps, a Microsoft Power Platform low-code application.  Integrating Azure Custom Vision into Power Apps typically involves using Power Apps' capabilities to call Azure Custom Vision's REST API. This allows you to leverage your custom image classification or object detection models created with Azure Custom Vision within your own set of Power Apps built specifically for your business. 
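Under the hood, the request Power Apps sends is an ordinary HTTP POST against the Custom Vision prediction endpoint. As a rough sketch (the endpoint, key, project GUID, and published iteration name are placeholders you’d fill in from your own resource), the same call looks like this in Python:

    import requests

    # Placeholders from your Custom Vision resource and published iteration.
    PREDICTION_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
    PREDICTION_KEY = "<your-prediction-key>"
    PROJECT_ID = "<your-project-guid>"
    PUBLISHED_NAME = "<your-published-iteration-name>"

    def classify(image_path):
        """Send a local image to a published Custom Vision classifier."""
        url = (f"{PREDICTION_ENDPOINT}/customvision/v3.0/Prediction/"
               f"{PROJECT_ID}/classify/iterations/{PUBLISHED_NAME}/image")
        with open(image_path, "rb") as f:
            response = requests.post(
                url,
                headers={"Prediction-Key": PREDICTION_KEY,
                         "Content-Type": "application/octet-stream"},
                data=f.read(),
                timeout=30,
            )
        response.raise_for_status()
        # Each prediction carries a tagName and a probability between 0 and 1.
        return response.json()["predictions"]

    for p in classify("test-images/monkey-kungfu.jpg"):  # hypothetical path
        print(f"{p['tagName']}: {p['probability']:.1%}")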

What is an image classifier?

An image classifier is a piece of software capable of identifying entities within an image.  An Azure Custom Vision image classifier is a machine learning model created and trained using Azure Custom Vision, a cloud-based service offered by Microsoft Azure. 

This custom image classifier is designed to categorize or classify images into specific predefined categories or classes. You can train custom machine learning models to recognize specific objects, patterns, or attributes unique to your domain. Azure AI Vision can classify images into predefined categories such as “people” or “animals”. Azure Custom Vision image classifiers are applicable to a wide range of real-world applications and are useful for tasks like content moderation, where you want to filter out inappropriate or sensitive content from user-generated images.

Building a Custom Vision Model

Using Azure AI Vision for image classification involves several steps, including setting up an Azure account, creating a custom vision model, training the model, and integrating it into your application. Here's a step-by-step guide on how to use Azure AI Vision for image classification:

1. If you don't already have an Azure account, you'll need to create one. Azure offers a free trial with some credits to get started; you can sign up for a free trial or a paid subscription at https://azure.microsoft.com/free/.

Microsoft Azure Free Account

2. An Azure Custom Vision resource is part of Microsoft's Azure cloud computing platform and is specifically designed for building custom machine learning models for image classification and object detection tasks. You will need to create this resource for your model to use during training and after deployment. Log in to your Azure portal: https://portal.azure.com/. Click on "Create a resource". Search for "Custom Vision".

Create a resource and then search for "custom vision"

3. Select "Custom Vision" from the list of services, then click "Create" to start creating your Custom Vision resource.

4. You’ll then receive a form to configure your vision resource.  Fill in the necessary details, including the resource group, name, region, and pricing tier for the training and prediction resources.  Settings used for the application created for this blog post appear below.

5. Choose the pricing tier that suits your needs. You can start with the free tier for small-scale projects.  After you have filled in the required fields, you can create the resource by clicking the blue “Review + create” button at the bottom of the page.  Once your resource has been created and deployed, navigate to it in the Azure portal. 

6. Now that your custom vision resource has been created, view your resource’s Keys and Endpoint page.  You will need this information later on when you connect to it via Power Apps.

Resource keys and endpoint page

Create a custom vision project

Now that you have created a resource, you can create the Custom Vision project:

1. In a separate tab, open customvision.ai and sign in with the Microsoft account associated with your Azure subscription.

Custom Vision homepage

2. Create a new Custom Vision project within the Azure Custom Vision portal.

New Project

3. Specify the name, resource group, and other project details. For our purposes, we used the General (A2) domain. This will create an empty model. (A scripted alternative appears below.)

Create New Project
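If you’d rather script project creation than click through the portal, the Custom Vision training SDK (the azure-cognitiveservices-vision-customvision Python package) can do the same thing. A minimal sketch, assuming the training key and endpoint from your resource’s Keys and Endpoint page:

    from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
    from msrest.authentication import ApiKeyCredentials

    # Training key and endpoint come from the resource's Keys and Endpoint page.
    credentials = ApiKeyCredentials(in_headers={"Training-key": "<your-training-key>"})
    trainer = CustomVisionTrainingClient(
        "https://<your-resource>.cognitiveservices.azure.com", credentials)

    # Look up the General (A2) classification domain used in this post.
    domain = next(d for d in trainer.get_domains()
                  if d.type == "Classification" and d.name == "General (A2)")

    project = trainer.create_project("man-monkey-martial-artist", domain_id=domain.id)
    print(project.id)  # keep this GUID for tagging, training, and prediction calls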

Gathering training data

Now that your empty model has been created, you need to gather a dataset of images with the categories, classes, or labels of interest.

For example, if you want to classify images of animals, you might have classes like cats, dogs, and birds.  Another common class set used when learning to use image classifiers is fruit (e.g., apples, oranges, bananas).

These items have simple, distinctive characteristics that make them relatively easy to train a model to identify, and therefore a natural choice when creating your first Custom Vision model.  However, we wanted to make this image classification experiment a little more fun and interesting.

Therefore, we decided to train the model to identify whether the central character in the image was a man, woman, or monkey wearing a business suit, a Kung Fu uniform, or Casual Wear (e.g., jeans and sneakers).  There are websites that have many data sets available for machine learning projects.

However, given the low likelihood of finding many royalty-free photos of actual monkeys in Business suits or Casual Wear, we chose to use Midjourney via its Discord bot to generate over 2,000 photorealistic images containing our different classes of interest.  This allowed us to quickly gather a wide variety of training images over whose content and style we had some degree of control.  For more information on Midjourney licensing, image ownership, and usage restrictions, please see their terms of service document.

Enhancing the accuracy of the custom vision model

There are four key factors that must be taken into account when training a classification model to improve prediction accuracy: overfitting, data balance, data quantity, and data variety.

Overfitting 

The presence of contextual information can help or hinder classification, depending on the classifier's ability to focus on relevant objects. If certain contextual items appear frequently and consistently across different images, the classifier may focus on arbitrary characteristics more than the item you are trying to classify.

For example, since monkeys and people in business suits don’t usually inhabit the same biomes, we wanted to ensure that the classifier wasn’t using arbitrary environmental characteristics that the images had in common. Images containing a tree or other vegetation could be more strongly associated with monkeys whereas desks and books could be more strongly associated with business people.  

If the model latched onto these arbitrary characteristics during training, it might wrongly predict that a character sitting on an office chair must be wearing a business suit, or that any character near vegetation or jumping around must be a monkey.  Here are some examples:

Data balance

 Imbalanced class distributions, where some classes have significantly more examples than others, can affect the classifier's ability to generalize to minority classes. In other words, we do not want to have an image sample size of 200 men but only 50 women and 75 monkeys because the model would become much better at identifying men than identifying women and monkeys. It is recommended that any one class shouldn’t have more than a 2:1 ratio to another class (more details available here).
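As a quick sanity check before uploading, you can verify the 2:1 guideline against your image counts. A tiny sketch, using the illustrative counts from the paragraph above:

    # Hypothetical image counts per class -- substitute your own.
    counts = {"Man": 200, "Woman": 50, "Monkey": 75}

    largest, smallest = max(counts.values()), min(counts.values())
    ratio = largest / smallest
    print(f"Largest-to-smallest class ratio: {ratio:.1f}:1")
    if ratio > 2:
        print("Unbalanced: add images to the minority classes or trim the majority class.")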

Data quantity

Microsoft recommends using at least 50 images per tag when starting to train your model. Having fewer images increases the likelihood of overfitting, where the classifier picks an arbitrary yet common element across images (e.g., trees in the background as opposed to the monkey in the foreground). Although your performance metrics may initially appear promising, your model might encounter challenges when faced with real-world data. We started with approximately 200 images for each two-way class interaction of (Man / Woman / Monkey) by (Business suit / Kung Fu uniform / Casual Wear).

After a second round of quality checks, these are the remaining counts:

(Table: remaining image counts per class interaction)

Data variety

You will need to include a variety of images in the training set to ensure that your model can generalize well. The accuracy of an image classifier is influenced by various characteristics of the images being classified. Some of the key characteristics that can affect the performance of an image classifier include:

  • Image Quality: Higher resolution images often contain more detail and can be easier to classify. On the other hand, distorted or grainy images may be more challenging to classify accurately. The presence of irrelevant objects, text, noise, or other artifacts in the image can confuse the classifier.
  • Lighting Conditions: Reflections and shadows can affect the appearance of objects in the image and thus impact the classifier's ability to detect objects and features. Similarly, variations in color due to lighting conditions can also affect classification accuracy. Images with poor color balance may require additional preprocessing to enhance accuracy, while overexposed or underexposed images can make it difficult for the classifier to discern relevant details.
  • Object Pose and Orientation: Images of the same object taken from different angles or orientations may result in a more robust classifier. Uniform object size or consistent scaling across images can aid classification. Applying processing techniques prior to training, such as rotation, scaling, and flipping, can help improve the model's robustness to variations in image characteristics (see the sketch after this list).
  • Occlusion: Objects partially obscured by other objects or elements in the image can be challenging to classify accurately. Cluttered or busy backgrounds can also distract from the main objects of interest.  Furthermore, the number of objects in an image and their relative proximity can influence classification accuracy.
  • Image Variability: A model trained with a diverse set of images is more likely to perform well in real-world scenarios where the input data can vary significantly. Images in the wild may have different lighting conditions, backgrounds, angles, resolutions, etc. Training on a variety of images helps the model generalize better to handle these variations because the model learns to recognize patterns that are more likely to appear across various instances of the object you're trying to detect or classify.  If you only train a model on a limited set of images that represent a small subset of the possible variations, your model may not perform well on images with characteristics not present in the training data. Image variability helps ensure that your model can handle a wide range of inputs. Having variability within the same class of objects (e.g., different breeds of monkeys, different colored clothing, diverse background environments) can make the model more robust.
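To make the pose-and-orientation point above concrete, here is a small augmentation sketch using the Pillow imaging library. Whether you need this depends on your pipeline; Custom Vision’s advanced training can apply its own augmentation, so treat this as optional preprocessing. The file paths are hypothetical:

    from pathlib import Path
    from PIL import Image

    def augment(image_path, out_dir):
        """Write rotated, flipped, and rescaled variants of one training image."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        img = Image.open(image_path)
        stem = Path(image_path).stem
        img.rotate(15, expand=True).save(out / f"{stem}_rot15.jpg")
        img.transpose(Image.FLIP_LEFT_RIGHT).save(out / f"{stem}_flip.jpg")
        half = (img.width // 2, img.height // 2)
        img.resize(half).save(out / f"{stem}_half.jpg")

    augment("training/monkey_suit_001.jpg", "training/augmented")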

Designing the Training Images with Midjourney

It's important to consider the above data variety characteristics when designing and training an image classifier, as well as when evaluating its performance on different datasets and real-world scenarios across a wide range of image variations and conditions. In an attempt to prevent overfitting and to ensure a balanced and diverse training set, we included images of Men/Women/Monkeys wearing a Business suit, a Kung Fu uniform, or Casual Wear that also varied along six distinct dimensions: body size, hair color, clothing color, action, secondary object in the scene, and location (see the prompt template below). A few notes on how we refined these dimensions along the way:

  1. For reasons not entirely evident, it was extremely difficult to generate images of monkeys outside in Casual Wear (e.g., jeans, t-shirt, sneakers) where the monkey was not also wearing a long sleeved denim jacket.  In an attempt to stop the denim jacket from being included, we added the phrase “outside on a hot day in the park” or “without a jacket” but it was not very successful. As a side note, denim shirts with rolled-up or short sleeves did show up slightly more often for men than women but not to the extent of it being a noticeable fashion trend across the other classes like it was for monkeys. Women in Casual Wear frequently had torn jeans.
  2. “Light green” worked well as a color choice for shirts, especially for the monkey class, but it did not work as well for any class wearing Kung Fu uniforms because they often looked more like a military uniform or workman’s overalls.
  3. The casual wear description was refined to “A nice patterned short-sleeved t-shirt” to reduce the number of plain white t-shirts generated.  We eventually added colors to the description and other variations to further reduce the number of casual wear images in any of the office settings where the man / woman / monkey looked like they were still wearing some version of a business suit due to the presence of buttons and a tie.
  4. Business suits generally came out dark by default, without our specifying a color.  “Traditional” was added to the color modifier for Kung Fu uniforms to try and rein in the creativity of the Midjourney bot, which would sometimes generate colorful hybrids of a military and a marching band uniform.
  5. We used “Blonde” as a hair color at first but later changed it to “Light” to give us a wider variety of colors from white to sandy blonde.
  6. Jumping, kicking, and punching are actions commonly seen in people wearing Kung Fu uniforms.  We included those actions across all classes as well as the clothing and environment dimensions so that those actions would not be associated only with Kung Fu uniforms.
  7. Since monkeys alternate between walking upright and walking on all four limbs, we included a crawling action to simulate that across all classes.
  8. “Location” and “Secondary object in the scene” were realistically linked.  While theoretically possible, we did not want to generate an image with a bedroom location containing a tree for example, or a beach location containing a copy machine.

The instructions provided to the Midjourney bot were generally constructed in the following format:

A {body size} {class: man / woman / monkey} with {hair color} hair who is wearing a {clothing color} {class: suit / uniform / casual}. The {class: man / woman / monkey} is {action in scene} [in / on / by / next to] a {secondary object in the scene} while [in / at] [a / an / the] {location}.

Some specific examples of this format appear below:

  • /imagine: Full length body view of a medium sized man with brown hair who is wearing a light blue traditional Kung Fu uniform.  The man is standing next to a copy machine while in a modern day office building. The image should have a realistic photographic quality.
medium sized man with brown hair who is wearing a light blue traditional Kung Fu uniform
  • /imagine: A full length body view of a medium sized woman with brown hair who is wearing a white short-sleeved t-shirt, blue jeans, and sneakers.  The woman is jumping in the air next to a desk while in a modern day office building. The image should have a realistic photographic quality.
medium sized woman with brown hair who is wearing a white short-sleeved t-shirt, blue jeans, and sneakers
  • /imagine: A full length body view of a tall monkey with brown hair who is wearing a brightly colored traditional Kung Fu uniform. The monkey is sitting on a bench while outside at the park. The image should have a realistic photographic quality.
A full length body view of a tall monkey with brown hair who is wearing a brightly colored traditional Kung Fu uniform.
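Rather than typing each prompt by hand, the template above can be expanded programmatically into every combination of the dimensions. A sketch of that idea; the dimension values shown are a small illustrative subset, not our full lists:

    from itertools import product

    # Illustrative subsets of each dimension -- the real lists were longer.
    characters = ["man", "woman", "monkey"]
    clothing = ["dark business suit", "light blue traditional Kung Fu uniform",
                "nice patterned short-sleeved t-shirt, blue jeans, and sneakers"]
    actions = ["standing", "jumping in the air", "crawling on the ground"]
    # Locations and secondary objects are paired so scenes stay plausible (note 8 above).
    scenes = [("a copy machine", "a modern day office building"),
              ("a tree", "the park")]

    for who, outfit, action, (thing, place) in product(characters, clothing,
                                                       actions, scenes):
        print(f"/imagine: Full length body view of a medium sized {who} with "
              f"brown hair who is wearing a {outfit}. The {who} is {action} "
              f"next to {thing} while in {place}. The image should have a "
              f"realistic photographic quality.")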

The description ”Full length body view” was appended to the front of each instruction sent to the AI bot, but occasionally it was ignored, resulting in some images being only half or three-quarter body shots.  Fortunately, Midjourney has post-generation options such as zoom out (1.5x or 2.0x) and pan (left, right, up, down) to get more of the central item into the image. However, in some instances of using those options, the bot chose not to extend the body in the image but rather strategically placed another object in front of it, such as a table, hot tub, or green bucket of ice.

The description ”The image should have a realistic photographic quality.” was also added to the end of each instruction.  However, some images were generated that looked more like hand-drawn artwork or an animated cartoon character on a realistic background.  For the sake of data variety, we included some of those images in the model, provided they passed the quality checks discussed in the next section.

Quality Checking the Training Images

Even though we had some creative control over the images, not every image generated was usable. In fact, some images had us scratching our heads. To address potential issues of Image Quality, many images were discarded due to artifacts such as:

  • characters in physically impossible body contortions or with unrealistic limb proportions
  • characters with three or more arms, legs or hands
  • characters missing full limbs or hands (for no apparent reason as if they were simply invisible)
  • characters in business suits jumping in an office accompanied by some sort of explosion
  • miscellaneous objects in the foreground (e.g., extra sneakers floating in the air, an exploding aircraft) 
  • reflections in glass or mirror containing a copy of the main character
  • images containing framed paintings on the wall of people who were clearly staring in judgment of the central character in the image
  • women in casual wear with sneakers on their hands as well as their feet
  • more than one character in the foreground or background
  • monkeys in casual wear next to an elevator where they were actually men in casual wear with a naked monkey somewhere else in the image (e.g., in a poster, sitting on the elevator controls, part of the pattern on the man’s shirt)
  • monkeys that looked too human with bare arms or legs and no hair visible anywhere other than on their face
  • monkeys wearing business suits made out of another monkey’s fur rather than a fine cloth material
  • a monkey in casual wear next to a water cooler who was actually a man in casual wear holding a water bottle in one hand and dangling the severed head of a monkey from their other hand
  • plus-sized men or short monkeys that looked suspiciously like caricatures of famous British actors

Some of the discarded images appear below:

  • Extra arm/hands subtly placed on the main character
Extra arms and hands placed on the main character
  • Monkey in Casual wear next to an elevator in an office building (it is a man in Casual Wear with a monkey cleverly integrated into the background)
Monkey in casual wear next to an elevator
  • Man / Monkey in traditional Kung Fu uniform kicking (with three legs)
Man/Monkey in Kung Fu uniform with three legs
  • Man / Woman in Business suit jumping (with a missing limb and some with explosive flatulence) near a desk in an office. 
Man/Woman in business suit jumping near a desk in an office

Example Images that Passed the Quality Checks

Here are a few exemplars used in training the classifier of a Man / Woman / Monkey in a Business suit while standing next to a water cooler, standing next to a tree, crawling on the ground, or jumping in the air:

Exemplar used in training the classifier

Here are a few exemplars used in training the classifier of a Man / Woman / Monkey in a Kung Fu uniform while standing next to a copy machine, standing next to a tree, crawling on the ground, or jumping in the air:

Exemplars used in training the classifier of a Man / Woman / Monkey in a Kung Fu uniform

Here are a few exemplars used in training the classifier of a Man / Woman / Monkey in Casual Wear while standing next to a water cooler, standing next to a tree, crawling on the ground, or jumping in the air:

Exemplars used in training the classifier of a Man / Woman / Monkey in Casual Wear

Reviewing the Model

Before training the model, we reviewed how well we addressed the key factors mentioned earlier that are important for prediction accuracy.

Overfitting and Data Variety

To account for overfitting based on the selection of clothing, we varied the background settings (e.g., in an office, next to an elevator, outside in a park) and actions (e.g., jumping, sitting, standing) of the central character (i.e., man, woman, monkey).  We recognize that this created some rather absurd images, such as a monkey wearing a Kung Fu uniform standing in an office next to a copy machine or a man wearing a business suit crawling on the ground outside next to a tree. We wanted to see how well the classifier would recognize the central item when out of context.  In addition, since monkeys are energetic, we included men and women doing a variety of actions such that the action alone would not be indicative of what the central character was or what it was wearing.

Data Quantity and Data Balance

As mentioned earlier, we trained the model with over 150 images for each of the resulting nine two-way class interactions of character and clothing, which far exceeded the recommended minimum of 50 images per class.  This kept the balance ratio between classes well under 2:1. We were also interested to see how well the classifier would do overall for the character and apparel classes.  Therefore, we created roll-up classes for “Man”, “Woman”, and “Monkey” regardless of apparel, and roll-up classes for “Business suit”, “Kung Fu uniform”, and “Casual Wear” regardless of character.

From a data balance perspective, each roll-up class contained between 450 and 500 images, so the ratio between them did not exceed 1.15:1. However, each of the roll-up classes now had a 3:1 ratio to any of its component subclasses, which would make it unbalanced from that perspective.  Essentially, this meant that the classifier could potentially be more accurate at predicting a “Woman” or “Business suit” roll-up class than at predicting the ”Man in Casual Wear” interaction class. We were curious to see if this turned out to be the case and will discuss this in further detail in part 3.

Adding Images to your Model

Above, we’ve discussed our classes in detail and how we generated images via Midjourney to train our classifier.  Next, let’s discuss how to add these images to our model:

  • Now that you have identified all the classifications / categories in your model, you can begin adding images. In your Custom Vision project, click on the “Add Images” link on the screen as seen here.
Add Image
  • Now, you can browse to the images stored on your hard drive.  It helps to keep your images organized in folders according to the tags you plan to use, especially if you plan to reuse them in other models. We set up a folder set containing images for training (seen below) and a separate set of similar folders for testing.
Searching repository
  • Once you have selected the images, you can tag them on the same screen either by using your existing tags or by creating new ones.
  • When you are done tagging, you can click on the “Upload Files” button. Depending on the number of images you are loading, it could take a few minutes to complete. (If you’d rather script the upload and tagging, see the sketch after this list.)
  • You can use the search feature to see which images are tagged with each label. This is very useful for when you want to group smaller categories (e.g., Man in Business suit, Man in Kung Fu uniform, Man in Casual Wear) into a roll-up class like “Man”.
  • If you think your image counts per tag are off, it is very easy to confirm which tags an image has been assigned.  Simply hover your mouse over the lower corner of an image to find the select box.
Check the lower corner of the image
  • Select one or more images and click on the “Tag Images” link at the top.  On this screen, you can add or remove tags as needed. For demonstrative purposes, we misassigned the “Monkey in Casual Wear” tag to this image earlier.
  • To fix this, all you have to do is click on the “X” to remove the tag and then add the correct tag to the image.
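For larger image sets, the portal upload can be scripted with the training SDK instead. A minimal sketch, assuming the trainer client and project objects from the project-creation sketch earlier, and one folder of images per tag combination on disk (the folder and tag names are illustrative):

    from pathlib import Path
    from azure.cognitiveservices.vision.customvision.training.models import (
        ImageFileCreateBatch, ImageFileCreateEntry)

    # Assumes `trainer` and `project` from the project-creation sketch above.
    def upload_folder(folder, tag_names):
        """Upload every image in a folder, applying the given tags to each."""
        existing = {t.name: t for t in trainer.get_tags(project.id)}
        tag_ids = [(existing.get(n) or trainer.create_tag(project.id, n)).id
                   for n in tag_names]
        entries = [ImageFileCreateEntry(name=p.name, contents=p.read_bytes(),
                                        tag_ids=tag_ids)
                   for p in Path(folder).glob("*.jpg")]
        # The service accepts batches of up to 64 images per call.
        for i in range(0, len(entries), 64):
            batch = ImageFileCreateBatch(images=entries[i:i + 64])
            result = trainer.create_images_from_files(project.id, batch)
            if not result.is_batch_successful:
                print("Some images failed:", [img.status for img in result.images])

    upload_folder("training/monkey_business_suit",
                  ["Monkey", "Business suit", "Monkey in Business suit"])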

Negative Tags

  • There’s one last thing to do before training your model. It is recommended to include some images that absolutely do not match any of the other categories and tag them as “Negative”.  You only need a minimum of 5 negatively tagged images to start training the model. We discovered after the first iteration of training that this number of tagged images was insufficient and impacted the usefulness of the predictions. We will come back to this topic in part 3 of this series.

Training the Model

Now you can begin training the model.

  • Click on the green “Train” button with the two gears on it at the top of the page. (A scripted equivalent appears after this list.)
Train
  • In the dialog that follows, you have the option to select either “Quick Training” or “Advanced Training” mode.  If you prefer a faster iteration process just to initially get things rolling, select “Quick Training” and then click on the blue “Train” button.
Quick Training
  • Opting for the “Advanced Training” mode can lead to enhanced performance, particularly when dealing with complex datasets and detailed classification tasks. When utilizing advanced training, you have the ability to define a specific time allocation for training (up to 96 hrs), and Custom Vision will experimentally determine the optimal training and augmentation settings. For part 1 of this series, we chose “Quick Training”.  We will compare the prediction results from “Advanced Training” to “Quick Training” in part 3.
Training Budget
  • Once you hit the “Train” button, you can monitor the progress of the current training as well as see your other training iterations on the left. You can adjust the probability threshold by using the slider located in the upper-left corner. With this threshold, you set the confidence level the classifier must reach before making a prediction. A higher probability threshold yields more precise classifications at the cost of missing some true cases.  For example, the classifier will be more likely to be correct when it predicts that there is a monkey in the image; however, any image that is not close enough to the model’s idea of a monkey will be missed, even if the image does, in fact, contain a monkey. Conversely, if the probability threshold is set too low, you will get more false positives. For example, the model will predict lots of monkeys, but in some cases the image may be of a man or of a cat.
Iteration
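For completeness, here is roughly what the same kick-off looks like in code, assuming the trainer and project objects from the earlier sketches. Calling train_project without extra arguments corresponds to the quick-training path:

    import time

    # Kick off training and poll until the iteration completes.
    iteration = trainer.train_project(project.id)
    while iteration.status != "Completed":
        print("Training status:", iteration.status)
        time.sleep(10)
        iteration = trainer.get_iteration(project.id, iteration.id)
    print("Done:", iteration.id)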

Quick Test of the Model

Once training is complete, you can evaluate the model's performance right away using the provided metrics and by testing images via URL or from your PC. If the model’s performance is not satisfactory, you can retrain it with more or higher-quality images, or increase your model training time.
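The training SDK also exposes a quick-test call, so you can script this check against images that were never uploaded for training. A sketch, with the probability threshold applied client-side (the threshold value and file path are illustrative):

    # Assumes `trainer`, `project`, and `iteration` from the earlier sketches.
    THRESHOLD = 0.50  # illustrative; tune to trade precision against recall

    with open("test-images/woman_casual_001.jpg", "rb") as f:  # hypothetical path
        result = trainer.quick_test_image(project.id, f.read(),
                                          iteration_id=iteration.id)

    for prediction in result.predictions:
        if prediction.probability >= THRESHOLD:
            print(f"{prediction.tag_name}: {prediction.probability:.1%}")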

Performance

Using the “Quick Test” link, you can individually upload an image that wasn’t part of the training set and see how well the model predicts the classification.  As you can see in the examples below, our model identifies the roll-up classes “Man”, “Woman”, “Monkey”, “Business suit”, and “Kung Fu uniform” with a higher probability than the interaction of the two classes, such as “Man wearing a Business suit” or “Monkey in Kung Fu uniform”. This may be due to the disparity in image counts, since there are three times more “Business suit” images than “Man in Business suit” images.

Model Identification

This was not the case, however, for “Man in Casual Wear”.  While the classifier correctly predicted the roll-up class “Casual Wear”, it assigned a low probability for the “Man” roll-up class as seen in the examples below:

Low probability for the “Man” roll-up class

This unexpected result could be due to the following:

  • not enough representative images of plus-sized men in Casual Wear (only 10% of men in this class)
  • not enough time spent training on the image sets available (we used “Quick Train”)
  • the unintentional introduction of shorts in the beach environment

Even though jeans were specified in the instructions to the bot, some men and women were generated wearing shorts in the beach environment. We made an effort to keep the characters as fully clothed as possible.  In fact, only 9 out of 189 women in Business suits had uncovered legs.  This was an intentional quality check to prevent the classifier from using the presence of naked calves as an indicator of “Woman”.  

This leaves us with under-representation or under-training as possible causes. We intend to generate more test and training images and then examine this in more detail in part 3.

Publishing the Model

Azure provides REST API endpoints that allow you to integrate your trained model into your applications. You can use these endpoints to send image data and get classification results in real-time. Azure also offers SDKs and client libraries in various programming languages to simplify integration. When you are satisfied with the model's performance, click on the performance link and then click on "Publish" to make it available for consumption. When you publish, you have to assign it to the prediction resource you set up at the beginning of the article.  You can also rename your model to make it more memorable for when you use it in apps.

Publish Model
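Publishing can likewise be scripted. A sketch, assuming the objects from the earlier sketches; the prediction resource ID is the full Azure resource ID of the prediction resource you created at the start (visible on its Properties page in the portal):

    # The prediction resource ID is the full ARM path of the prediction resource.
    PREDICTION_RESOURCE_ID = ("/subscriptions/<sub-id>/resourceGroups/<rg>/"
                              "providers/Microsoft.CognitiveServices/accounts/<name>")

    trainer.publish_iteration(project.id, iteration.id,
                              "man-monkey-classifier",  # memorable published name
                              PREDICTION_RESOURCE_ID)

Once published, predictions go through the REST prediction endpoint shown earlier in this post.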

Integrating the Model with Power Apps

In the next article of this series, we will show you how to integrate the Custom Vision model with a mobile device using a Power Apps Low-Code approach.  Once you go through the tutorial, your mobile classifier app will be able to do the following:

  • Take a new photo with the device’s built-in camera or use an image already in a gallery or folder 
  • Automatically submit the image to the classifier
  • Return all predictions but only display the top 5 on the screen
  • Autogenerate an image title based on the top predicted result   (e.g., Woman 99.7% Taken on: 10/24/2023 10:08 AM)
  • Allow the user to identify the actual description of the image by selecting the character class, apparel class, and environment dimension
  • Save the full set of prediction results to SharePoint for later analysis
Power Apps Low-Code approach

Stay tuned for part 2 of this series!

References

https://learn.microsoft.com/en-us/azure/ai-services/custom-vision-service/select-domain

https://learn.microsoft.com/en-us/azure/ai-services/custom-vision-service/getting-started-build-a-classifier

https://learn.microsoft.com/en-us/azure/ai-services/custom-vision-service/getting-started-improving-your-classifier

https://www.youtube.com/watch?v=OzMRNVolrKE

https://www.youtube.com/watch?v=P5yKrEfKtEI

https://www.youtube.com/watch?v=92U0uNWepDw&list=PLPoQn6QlsOwMu-XDeh3SZemuTVYt0zVzT

How can we help?

Understanding low-code development applications and their uses, and the variety of complex AI use cases, might be something you are struggling with.

Turning to technologies that you do not entirely grasp is a challenge sometimes too hard to overcome alone. The best advice on how to do so effectively is, ironically, to get some good advice. As experienced software and data experts, The Virtual Forge is here to help you understand your business problems, with up-front engagement and guidance for you as the client: what are your problems and how can we solve them?
