There is a saying that a picture is worth a thousand words. So what happens when the images are wrong, or simply not the best they could be, and fail to convey what you want?
Not too long ago, there were basically two routes: either you scraped images, trusted whatever you got, and called the result intelligent just because a machine had collected it, or you put a lot of human effort into classifying images and judging their quality by hand. Not anymore. With artificial intelligence, you can get the best of both worlds: scale and accuracy.
A client presented us with a problem: how can we distinguish and categorize images so that the right image can be picked at scale? The challenge was to solve it in a short period of time with a high level of accuracy. In other words, the client challenged us to provide a solution that met three main requirements:
- Accuracy (the algorithm has to classify images correctly 99% of the time)
- Speed at scale (~800,000 images per month)
- Cost-effectiveness in terms of cloud computing
To meet them, we designed a multistep process, built on a serverless architecture, that analyzes each image and assigns it a classification.
First, we define the desired classifications. Then, we need to obtain the images. In this example, we receive URLs where the images are stored, and the images are in .jpg or .png format. The classification categories are the following:
- Stock car images: photos of cars taken by the manufacturer.
- Spotlight car images: photos of the actual car being sold, taken in a parking lot.
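Since the input is a URL pointing to a .jpg or .png file, obtaining the image can be sketched with the Python standard library alone. This is a minimal sketch, not the article's code: the helper names and the 10-second timeout are hypothetical, and the format check looks only at the URL's file extension rather than at the file contents.

```python
import urllib.request
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats named in the article; the check uses the URL's file extension only.
SUPPORTED_EXTENSIONS = {".jpg", ".png"}


def is_supported_image_url(url: str) -> bool:
    """Return True if the URL path ends in a supported extension (case-insensitive)."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS


def fetch_image_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the still-encoded image bytes, rejecting unsupported formats first."""
    if not is_supported_image_url(url):
        raise ValueError(f"unsupported image format: {url}")
    # The timeout keeps a slow image host from consuming the function's whole time budget.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

Checking the extension before downloading avoids paying for a download that would be rejected anyway; a stricter variant could inspect the `Content-Type` header or simply rely on whether decoding succeeds.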
In this case, we want a solution that is both accurate and cost-efficient for this specific problem. We define a good level of accuracy as more than 99% of the images classified correctly, and a good level of cost as less than $1 per 1,000 images processed.
We chose Python as the implementation language because of its flexibility and the considerable number of machine learning and computer vision libraries available for it, as we will see later.
At a high level, these are the steps:
- Get the image URL
- Convert the image to an array of bytes using NumPy¹
- Resize the image (car images are huge, around 4163 x 3330 pixels, so we resized them to 640 x 480 pixels using area-based interpolation)
- Extract and process the information using OpenCV² (an open-source computer vision and machine learning library that provides a large number of functions for real-time computer vision)
- Classify the image as Stock or Spotlight. One of the main drivers of this algorithm is an image histogram, used to distinguish images with only a few colors and tones (Stock images) from images with many tones (Spotlight images).
Testing the algorithm
We used a sample of 825 images. We first classified the sample manually, which gave 175 ‘Stock’ images and 650 ‘Spotlight’ images. We then ran the classification algorithm on the same inputs and compared its output with the manual labels. After some refinement, the algorithm correctly classified 100% of the sample images. Keep in mind that these results depend heavily on the characteristics of the images and on the criteria selected for classification. If the processed images differ significantly from the sample, the results may be very different, and the classification process has to be tuned for that input.
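To make the 99% target concrete for a sample of this size, the check reduces to a few lines of arithmetic. This sketch only restates the target; the function names are hypothetical, and accuracy here is simply the fraction of images whose predicted label matches the manual one.

```python
from collections import Counter
from typing import Sequence


def accuracy(correct: int, total: int) -> float:
    """Fraction of images whose predicted label matches the manual label."""
    return correct / total


def max_errors_allowed(total: int, target_percent: int = 99) -> int:
    """Largest number of misclassifications that still meets the target accuracy."""
    # correct / total >= target/100  <=>  errors <= total * (100 - target) / 100
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return total * (100 - target_percent) // 100


def confusion_counts(manual: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Count (manual label, predicted label) pairs, e.g. ('Stock', 'Spotlight')."""
    return Counter(zip(manual, predicted))
```

For 825 images, `max_errors_allowed(825)` is 8: 817 correct images give about 99.03%, while 816 would fall just below 99%. The confusion counts show which way the errors go (Stock mistaken for Spotlight or the reverse), which is more useful than a single accuracy figure when tuning the thresholds.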
We tested the algorithm deployed as an AWS Lambda function with the minimum amount of memory (128MB). We set the execution limit to 29 s so that the function can be called through AWS API Gateway. No execution timed out during the tests; even with big images, these resources were sufficient for the function to run successfully.
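Because the function is called through API Gateway, its entry point receives the HTTP request as an event and must return a response in the format API Gateway expects. The following is a minimal sketch assuming a Lambda proxy integration and a JSON request body of the form `{"url": ...}`; `make_handler`, the `classify_url` callable it wraps, and the chosen status codes are hypothetical, not the article's code.

```python
import json
from typing import Any, Callable, Dict


def _response(status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a response in the Lambda proxy integration format."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def make_handler(classify_url: Callable[[str], str]):
    """Wrap a hypothetical URL classifier ('Stock' / 'Spotlight') as a Lambda handler."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        # The proxy integration delivers the request body as a string (or None).
        try:
            url = json.loads(event.get("body") or "{}")["url"]
        except (json.JSONDecodeError, KeyError, TypeError):
            return _response(400, {"error": "request body must be JSON with a 'url' field"})
        try:
            label = classify_url(url)
        except ValueError as exc:  # e.g. unsupported format or undecodable image
            return _response(422, {"error": str(exc)})
        return _response(200, {"url": url, "label": label})

    return handler
```

In a deployment, a module-level `lambda_handler = make_handler(...)` would be configured as the function's handler. Passing the classifier in keeps the HTTP plumbing testable without network access or OpenCV.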
Having taken into account the range of images to classify, we arrived at an algorithm that classifies an image as Spotlight or Stock given its URL. The algorithm produced good results and ran in the lowest memory tier for an AWS Lambda function (128MB), even for big images, which lets the process run cost-efficiently. As an example, allocating 128MB of memory to our Lambda function, executing it 2 million times in one month, and running it for 15 s each time (an overestimation), we get a price of around $55 USD³, roughly $0.03 per 1,000 images and well under the $1 target. This is lower than we projected and, for that volume of images, a very reasonable price.
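The estimate can be reproduced with Lambda's pricing model: compute is billed per GB-second (memory in GB times duration) plus a per-request charge. The prices below are assumptions (commonly published x86 on-demand rates in us-east-1 and the monthly free tier) and should be checked against current AWS pricing before relying on the result.

```python
# Assumed prices: x86 on-demand Lambda rates (us-east-1) and the monthly free tier.
# Verify against current AWS pricing; these are not taken from the article.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20
FREE_GB_SECONDS = 400_000
FREE_REQUESTS = 1_000_000


def monthly_lambda_cost(invocations: int, seconds_each: float,
                        memory_mb: int, free_tier: bool = True) -> float:
    """Estimated monthly USD cost: GB-seconds of compute plus request charges."""
    gb_seconds = invocations * seconds_each * (memory_mb / 1024)
    if free_tier:
        gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
        requests = max(0, invocations - FREE_REQUESTS)
    else:
        requests = invocations
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    request_cost = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute_cost + request_cost


def cost_per_thousand(total_cost: float, invocations: int) -> float:
    """Cost per 1,000 images, for comparison with the $1 target."""
    return total_cost / invocations * 1000
```

With the numbers from the text (2 million invocations, 15 s each, 128MB), this gives roughly $56 per month with the free tier applied, close to the article's figure, and about $63 without it; either way it is a few cents per 1,000 images.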
Additionally, it is worth noting that processes of this kind usually have a margin of error that can be minimized but probably not reduced to absolute zero. In our case, as long as the features of the images stay within the boundaries of our assumptions, we can achieve a success rate above 99%, which is enough for our goals at this point.