
Messing with Image Classification

There’s so much buzz about image recognition these days that I felt like joining in. Long ago, I saw an excellent project by Tom White in which he used image classification to generate abstract drawings reminiscent of an object. Basically, he made a feedback loop between an image generator and the image classifier, aiming to find a simple image that is strongly classified as, for example, a sewing machine. This idea stuck in my mind for a long time, and I wanted to give it a try.

Tools

To make life easier, I installed Anaconda. It handles Python packages and environments far more easily than fighting pip all the time for the right versions. Through that, I installed Keras with TensorFlow. Keras makes it simple to use different pre-trained classifiers, and I settled on VGG16. To handle drawing, I used OpenCV.

VGG16 was trained on ImageNet, and when I give it an image, it returns a score (from 0 to 1) for each of 1000 categories, indicating how strongly the image matches that category. My goal is to pick a category and develop an image that the classifier properly recognizes as that category.
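Keras makes the scoring step just a few lines. Here’s a minimal sketch, where the filename "drawing.png" is a hypothetical stand-in for whatever image you want to score (the commented-out lines download the pretrained weights and print the top labels):

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image

def score_file(model, path):
    """Return the 1000 VGG16 class scores for one image file."""
    img = image.load_img(path, target_size=(224, 224))  # VGG16's input size
    x = np.expand_dims(image.img_to_array(img), axis=0)  # add batch dimension
    return model.predict(preprocess_input(x))[0]         # softmax scores in [0, 1]

# model = VGG16(weights="imagenet")  # downloads pretrained weights on first use
# scores = score_file(model, "drawing.png")
# for _, label, p in decode_predictions(scores[None], top=3)[0]:
#     print(label, p)
```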

I know that image classifiers can be sensitive to texture and small-scale patterns. This is potentially a problem, since I want to draw broad, simple shapes. It may be that a lot of categories are inaccessible with simple shapes.

Random Ellipses

I first threw a bunch of random ellipses at the classifier. For example, I gave it this image with eight ellipses with random colors, positions, sizes, and tilt angles. In this case, the classifier told me that the top three labels were “face_powder” (0.104), “bib” (0.082), and “hair_slide” (0.059). Hm.

Random ellipses that look like face_powder?

Next, I need to learn what it likes to see. That is, what are the easiest categories to score well on? For that, I generated 1000 random images and checked what it thought of those. The following plot shows the distribution of the minimum, median, and maximum scores for each category, sorted by the median value.

There’s a massive variation! The median covers almost 6 orders of magnitude! The top five classes that it saw were “hair_slide”, “pinwheel”, “rubber_eraser”, “Band_Aid”, and “envelope”. The worst five were “Komodo_dragon”, “African_crocodile”, “colobus”, “trolleybus”, and “electric_locomotive”, none of which look like random ellipses.
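The bookkeeping for this survey is simple. In this sketch the classifier call is stubbed out with random softmax-like data just to show the min/median/max summary; in the real run, each row of `scores` comes from VGG16:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for model.predict over 1000 random images: each row sums to 1.
scores = rng.dirichlet(np.ones(1000), size=1000)

cat_min = scores.min(axis=0)
cat_med = np.median(scores, axis=0)
cat_max = scores.max(axis=0)

order = np.argsort(cat_med)[::-1]   # categories sorted by median score
top5, bottom5 = order[:5], order[-5:]
```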

Ok, now I know that if I start with a junky image, I can make junk that is highly recognized by the classifier, but not by humans. What if I restrict my palette more? Say, just use two colors like in the next images?

That didn’t seem to change the scores much, so we can probably restrict the problem to just two colors without giving up much.

Optimization

Now how do we generate a drawing for a specific class? Since this is an optimization problem, the first guess would be some sort of gradient descent. That would be hard here, though, since the objective (the classifier score as a function of ellipse parameters) is a ridiculously complex function. Instead, let’s avoid gradients and use a random search approach. At each iteration of the optimization, make a random perturbation (with some scale factor) of the ellipses. If the new version looks more like a toaster (and we’re aiming for a toaster), keep it and go to the next iteration. Otherwise, throw away that attempt and try again. Over time, shrink the step size to aid convergence. This lets the search try very different images at the start and then, hopefully, home in on a good solution.
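The loop above can be sketched as follows. Rendering the ellipses and scoring them with VGG16 is abstracted into an `objective` callable, and here a toy quadratic objective stands in for the classifier so the sketch runs end to end:

```python
import numpy as np

rng = np.random.default_rng(2)

def optimize(params, objective, iters=2000, step0=0.5, decay=0.999):
    """Greedy random search: keep a perturbation only if the score improves."""
    best = params.copy()
    best_score = objective(best)
    step = step0
    for _ in range(iters):
        trial = best + rng.normal(0, step, size=best.shape)  # random perturbation
        s = objective(trial)
        if s > best_score:               # keep only improvements
            best, best_score = trial, s
        step *= decay                    # shrink the step size over time
    return best, best_score

# Toy objective: score peaks at 1 at the origin, mimicking "score for the
# target class". The real objective would render the ellipses and return
# the classifier's score for, say, "toaster".
target = lambda p: float(np.exp(-np.sum(p**2)))
params, score = optimize(rng.uniform(-2, 2, size=8), target)
```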

This requires some hyperparameters, such as the number of iterations, the schedule for changing the properties of the ellipses, and the order in which to draw the ellipses. I picked some numbers I liked, with minimal justification.

What category should we aim for? It shouldn’t be one of the classes that the classifier sees in every pile of ellipses. And it shouldn’t be one of the super-rare classes either. Let’s go for a “great_white_shark”! After some optimization, we get these not-so-good images:

Hmm. I’ve heard that in some cases, these image classifiers depend strongly on the background, so I’ll allow it to choose a solid background color and try again:

Much improved! I guess the first one looks kinda sharky. The higher scores show that my optimization is working toward its explicit objective. Also, since the optimization only runs for about 15 minutes on my laptop, it could potentially do better with more time.

Better results

What about trying some other categories? Here are some of the successes, sorted by how much I appreciate them:

The killer whale is decent! I can convince myself that the second one is a plane in front of some clouds. The jack o’lantern is abstract, and could be interesting with more work. The birdhouse knows that there are holes and sticks involved, but has no idea how to arrange them. The space shuttle might be leaving a cloud? The candles look out of focus. The sunglasses are special, but maybe it has lenses and earpieces? The daisy is just wrong. The fish does not look like a fish to me.

A qualified success! With more iterations, I think some of these would be quite nice!
