Computer Vision

AI research covers several of the senses, but vision, along with natural language, is one of the most fundamental. Computer vision attempts to identify and extract symbols from raw visual data and then use those symbols to make decisions, take actions, or produce information. These symbols take many forms: labels from the set used for training, captions, text extracted from the image via OCR, colors, and so on. Not all visual data is alike, either: in general, systems that are good at processing attributes of still images are not necessarily as good at processing video, and vice versa.

Sub-domains of computer vision include scene reconstruction, motion/event detection, tracking, object recognition, and image restoration among many others.

What Can They Do?

Current computer vision APIs provide significant, impressive functionality with very little complexity on the caller's side. We can compare the raw output of different APIs for the same image using the following page:

There’s plenty of information to be obtained: tags, captions, labels, text (via OCR), flags for adult or otherwise inappropriate content, and more. Some systems also return coordinates within the image, such as bounding boxes, that allow individual elements to be separated, either automatically or with a person’s help.
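To make the idea of "separating elements" concrete, here is a minimal sketch of what client code might do with returned coordinates. It assumes a hypothetical region format (a label plus left, top, width and height in pixels) and represents the image as a plain grid of pixel values; real services name and structure these fields differently, so treat `Region`, `crop` and `split_elements` as illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Image = List[List[int]]  # row-major grid of pixel values


# HYPOTHETICAL region format: many vision APIs return a box per detected
# element, but the field names and conventions vary from service to service.
@dataclass
class Region:
    label: str
    left: int    # x of the top-left corner, in pixels (assumed convention)
    top: int     # y of the top-left corner, in pixels (assumed convention)
    width: int
    height: int


def crop(image: Image, region: Region) -> Image:
    """Return the pixels inside a region, clamped to the image bounds."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    x0 = max(0, region.left)
    y0 = max(0, region.top)
    x1 = min(cols, region.left + region.width)
    y1 = min(rows, region.top + region.height)
    return [row[x0:x1] for row in image[y0:y1]]


def split_elements(image: Image, regions: Sequence[Region]) -> Dict[str, List[Image]]:
    """Group the cropped patches by the label the service assigned them."""
    patches: Dict[str, List[Image]] = {}
    for region in regions:
        patches.setdefault(region.label, []).append(crop(image, region))
    return patches
```

Clamping to the image bounds is a defensive choice: boxes near the edge of an image can extend past it, and it is safer not to assume every service trims them.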

An Image With Text Areas Automatically Highlighted

As with other types of API, features and capabilities vary significantly from one service to another. There are no shortcuts when moving between services: every API call and response needs to be verified again.
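One way to contain that re-verification work is to translate each service's response into a single internal format and check it at that boundary. The sketch below assumes two invented response shapes, "service A" reporting scores from 0 to 1 and "service B" reporting confidences from 0 to 100; neither corresponds to a real provider, and the function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Both response shapes below are HYPOTHETICAL, invented for illustration.
# Real services document their own field names, score ranges and nesting.

Label = Tuple[str, float]  # (name, confidence in [0, 1])


def from_service_a(resp: Dict[str, Any]) -> List[Label]:
    # Assumed shape: {"labels": [{"description": str, "score": 0..1}]}
    return [(item["description"].lower(), float(item["score"]))
            for item in resp.get("labels", [])]


def from_service_b(resp: Dict[str, Any]) -> List[Label]:
    # Assumed shape: {"tags": [{"name": str, "confidence": 0..100}]}
    return [(item["name"].lower(), float(item["confidence"]) / 100.0)
            for item in resp.get("tags", [])]


def verify(labels: List[Label]) -> List[Label]:
    """Reject anything outside the common format, then sort by confidence."""
    for name, score in labels:
        if not name:
            raise ValueError("empty label name")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {name!r}: {score}")
    return sorted(labels, key=lambda pair: pair[1], reverse=True)
```

If an adapter is written against the wrong assumptions, for example forgetting to rescale a 0 to 100 confidence, `verify` fails loudly instead of letting the mismatch flow into downstream decisions.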

How Do They Perform?

Running enough tests gives us an idea of how these APIs perform:

The good news is that all of these issues can be addressed by how your code uses the underlying APIs. The systems are also improving rapidly. For example, a specific type of deep learning system called a convolutional neural network (which we'll discuss later) is enabling much higher accuracy on rotated images. Here are some tips to get the most out of the current generation of technology:
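As an illustration of compensating for rotated images in client code, the sketch below submits each 90-degree rotation of an image and keeps the most confident result. The `detect` callable stands in for a real API call and its `(label, confidence)` return shape is an assumption; note that this approach multiplies the number of calls per image by four.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]       # row-major grid of pixel values
Detection = Tuple[str, float]  # (label, confidence); format assumed


def rotate_clockwise(image: Image) -> Image:
    """Rotate a row-major pixel grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def best_orientation(image: Image,
                     detect: Callable[[Image], Detection]) -> Tuple[int, Detection]:
    """Try 0, 90, 180 and 270 degrees and keep the most confident result.

    `detect` is a HYPOTHETICAL stand-in for a call to a vision API.
    Ties keep the earliest angle tried.
    """
    best_angle, best = 0, detect(image)
    current = image
    for angle in (90, 180, 270):
        current = rotate_clockwise(current)
        result = detect(current)
        if result[1] > best[1]:
            best_angle, best = angle, result
    return best_angle, best
```

For example, with a toy detector that is only confident when the top-left pixel is set, `best_orientation([[0, 0], [0, 1]], detector)` reports that a 180-degree rotation gave the best result.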

A Word on Efficiency

When you are implementing a vision recognition system (or almost any machine learning-based software system), you need to be aware of two costs:

Keep both of these costs in mind as you are designing your system.