Computer vision sits at the crossroads of several fields, including mathematics, psychology, physics, probabilistic graphical models, machine learning, and deep learning. Put simply, it aims to give computers the ability to see the way humans do. If you look at a picture of a couple in wedding attire standing in front of a curved arch on a beach at sunset, you would immediately conclude that it's a sunset beach wedding. A computer will have a much harder time reaching the same conclusion.
One common approach is to feed the computer thousands or millions of pictures of a particular category so that it sees examples under many different conditions, e.g., different lighting, backgrounds, angles, poses, and scales. With enough data to analyze, the computer can build a profile of that category. The next time you input a picture of a dog, for example, the computer can compare it with the dog profile it has learned and identify it correctly.
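One simple way to picture a "profile" is as the average of the feature vectors of every training image in a category, with a new image assigned to whichever average it lies closest to. This is a nearest-centroid classifier, swapped in here for illustration; modern systems learn far richer representations. The two-number feature vectors below are hypothetical stand-ins for features extracted from real images.

```python
import math

def build_profiles(examples):
    """Average the feature vectors of each category into one profile (centroid)."""
    sums, counts = {}, {}
    for label, features in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(profiles, features):
    """Return the label whose profile is closest (Euclidean distance) to the input."""
    return min(profiles, key=lambda label: math.dist(profiles[label], features))

# Hypothetical 2-D feature vectors standing in for features extracted from images.
training = [
    ("dog", [0.9, 0.2]), ("dog", [0.8, 0.3]), ("dog", [0.85, 0.25]),
    ("cat", [0.2, 0.9]), ("cat", [0.3, 0.8]),
]
profiles = build_profiles(training)
print(classify(profiles, [0.75, 0.35]))  # dog
```

More training examples per category make each centroid a steadier summary of that category, which mirrors the point that more data yields a better profile.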
Here are several computer vision techniques and how they differ from one another:
Image classification
Let's say you want a computer to identify a dog in an image. In image classification, the computer compares the image against the profiles it has learned and assigns a single label, such as "dog", to the image as a whole. In the past, computers had a hard time classifying images accurately, but today millions of photos are uploaded every day, and this abundance of data is used to refine the profiles and raise accuracy.
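To make the single-label nature of classification concrete, here is a minimal sketch assuming a model has already produced one raw score per class for an image; the labels and scores are made up. A softmax turns the scores into probabilities, and the classifier reports only the top label for the whole image.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_image(scores, labels):
    """Return one (label, probability) pair describing the whole image."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

labels = ["dog", "cat", "car"]
scores = [3.2, 1.1, -0.5]  # hypothetical model outputs for one image
label, confidence = classify_image(scores, labels)
print(label, round(confidence, 2))  # dog 0.87
```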
Some people's job is to take images and label the objects in them, exactly as the computer is expected to. These labeled images are then fed to the computer to improve its accuracy in the future. This human labeling work, combined with millions of training images, has been key to making computers more accurate at recognition tasks.
Image detection
It's easy to confuse image classification and image detection (more commonly called object detection). In image classification, the computer assigns a single class (subject) to the image. Image detection, by contrast, identifies and locates several elements in an image: each one is enclosed in a bounding box and given a label. For example, suppose a company wants to count how many objects are in a warehouse. Its employees could perform the tedious task by hand, or it could use image detection to do it for them.
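A detector typically proposes several overlapping boxes for the same object, so counting requires removing duplicates first. The sketch below uses non-maximum suppression (NMS) based on intersection-over-union (IoU), a common post-processing step; the detections, labels, and the 0.5 threshold are hypothetical values, not the output of any particular model.

```python
from collections import Counter

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring box among overlapping boxes with the same label."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(det["label"] != k["label"] or iou(det["box"], k["box"]) < iou_threshold
               for k in kept):
            kept.append(det)
    return kept

# Hypothetical detector output for a small warehouse image.
detections = [
    {"label": "box", "score": 0.95, "box": (10, 10, 50, 50)},
    {"label": "box", "score": 0.80, "box": (12, 12, 52, 52)},  # duplicate of the first
    {"label": "box", "score": 0.90, "box": (100, 10, 140, 50)},
    {"label": "pallet", "score": 0.85, "box": (0, 60, 160, 90)},
]
counts = Counter(d["label"] for d in non_max_suppression(detections))
print(counts)  # Counter({'box': 2, 'pallet': 1})
```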
Object tracking
Object tracking extends object detection from static images to video. Rather than just identifying a dog on the beach, for example, you can follow how it runs around the beach from frame to frame. The applications are numerous. Law enforcement agencies can use object tracking on surveillance footage to reconstruct how suspects fled a scene, and self-driving cars can track moving objects on the road, such as cyclists and other vehicles, to maneuver safely.
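A very simple tracker links detections across frames by matching each existing track to the new box it overlaps most. The sketch below is a greedy IoU-based association, one of the simplest tracking schemes, assuming per-frame boxes are already available from a detector; the frames, the 0.3 threshold, and the policy of dropping a track as soon as it goes unmatched are illustrative choices, not a production design.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def update_tracks(tracks, detections, next_id, iou_threshold=0.3):
    """Greedily match this frame's boxes to existing tracks by IoU.

    tracks maps a track id to its last known box. Unmatched detections start
    new tracks; tracks with no match in this frame are simply dropped.
    """
    new_tracks = {}
    unmatched = list(detections)
    for track_id, last_box in tracks.items():
        if not unmatched:
            break
        best = max(unmatched, key=lambda box: iou(last_box, box))
        if iou(last_box, best) >= iou_threshold:
            new_tracks[track_id] = best
            unmatched.remove(best)
    for box in unmatched:
        new_tracks[next_id] = box
        next_id += 1
    return new_tracks, next_id

# Hypothetical boxes: a dog moving right, then a second object appearing.
frames = [
    [(0, 0, 10, 10)],
    [(2, 0, 12, 10)],
    [(4, 0, 14, 10), (50, 50, 60, 60)],
]
tracks, next_id = {}, 0
for boxes in frames:
    tracks, next_id = update_tracks(tracks, boxes, next_id)
print(tracks)  # {0: (4, 0, 14, 10), 1: (50, 50, 60, 60)}
```

The dog keeps track id 0 across all three frames, which is what lets a system describe its path rather than just its position in one frame.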
Image reconstruction
Image reconstruction refers to regenerating an image with the highest spatial resolution and accuracy possible. If you have an old photo that has faded in some areas, you could use image reconstruction to restore it. Similarly, it can be used to sharpen blurry images so that useful information can be extracted from them. Image reconstruction can also help recover lost data or fill in the part of an object that is obstructed.
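As a toy illustration of filling in missing regions, the sketch below repeatedly replaces each missing pixel with the average of its neighbours, a crude diffusion-style inpainting method chosen here for simplicity; real restoration systems are far more sophisticated. The tiny grayscale image and the use of `None` to mark missing pixels are assumptions made for the example.

```python
def inpaint(image, iterations=100):
    """Fill missing pixels (None) by repeatedly averaging their 4-neighbours.

    Known pixels are never changed; missing ones start at the mean of the
    known pixels and are smoothed toward their surroundings.
    """
    h, w = len(image), len(image[0])
    missing = [(r, c) for r in range(h) for c in range(w) if image[r][c] is None]
    known = [v for row in image for v in row if v is not None]
    start = sum(known) / len(known)
    out = [[start if v is None else float(v) for v in row] for row in image]
    for _ in range(iterations):
        for r, c in missing:
            neighbours = [out[rr][cc]
                          for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                          if 0 <= rr < h and 0 <= cc < w]
            out[r][c] = sum(neighbours) / len(neighbours)
    return out

# Hypothetical faded patch in the middle of a left-to-right gradient.
image = [
    [0, 10, 20, 30],
    [0, None, None, 30],
    [0, 10, 20, 30],
]
restored = inpaint(image)
print([round(v, 3) for v in restored[1]])  # [0.0, 10.0, 20.0, 30.0]
```

Because the damaged pixels sit inside a smooth gradient, neighbour averaging recovers them exactly; on textured regions it would only produce a blurry approximation.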
Semantic segmentation
Semantic segmentation is similar to image detection in that it identifies several objects in an image. The difference is that image detection marks each item with a box, while semantic segmentation works at the level of individual pixels: a pixel prediction model assigns a class label to every pixel, and the pixels that share a label form that object's region. This is the difference between identifying that a car is in the image and identifying exactly where the car's boundaries lie.
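The sketch below shows the box-versus-pixels difference on a tiny example. It assumes a model has already produced a score map per class (the numbers are invented), labels each pixel with its highest-scoring class, and then derives a bounding box from the resulting mask to show how much background a box includes compared with the exact pixel region.

```python
def segment(score_maps, labels):
    """Assign every pixel the label with the highest score (per-pixel argmax).

    score_maps[k][r][c] is the (hypothetical) score for labels[k] at pixel (r, c).
    """
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [[labels[max(range(len(labels)), key=lambda k: score_maps[k][r][c])]
             for c in range(w)]
            for r in range(h)]

def bounding_box(mask, label):
    """Smallest (row_min, col_min, row_max, col_max) box around all pixels of label."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == label]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

# Hypothetical per-pixel scores for an oddly shaped "car" region.
car = [
    [0.9, 0.1, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.9, 0.1],
]
background = [[1 - v for v in row] for row in car]
mask = segment([background, car], ["background", "car"])
car_pixels = sum(v == "car" for row in mask for v in row)
print(bounding_box(mask, "car"), car_pixels)  # (0, 0, 2, 2) 6
```

The box spans 9 pixels, but only 6 of them actually belong to the car; the mask records exactly which ones.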
Computer vision has a wide range of applications. Companies building self-driving cars, for instance, rely heavily on it: their vehicles must track multiple objects on the road and respond accordingly, such as stopping when a traffic light turns red or letting another car merge into their lane when its driver signals.
Organizations can also use computer vision to strengthen security. They can store their employees' biometric data, such as facial features and fingerprints, and later use it to verify each employee's identity and grant access to different locations. The system denies access to anyone whose biometric data isn't on record.
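Face-based access control is often framed as comparing a numeric "embedding" of the visitor's face against stored embeddings of enrolled employees. The sketch below assumes such embeddings already exist (in practice they would come from a face-recognition model; the three-number vectors here are invented) and uses cosine similarity with an arbitrary 0.9 threshold. Fingerprint matching works differently and is not shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def grant_access(enrolled, probe, threshold=0.9):
    """Return the best-matching enrolled person, or None to deny access."""
    best_name, best_score = None, -1.0
    for name, template in enrolled.items():
        score = cosine_similarity(template, probe)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Hypothetical stored face embeddings for enrolled employees.
enrolled = {"alice": [0.1, 0.9, 0.3], "bob": [0.8, 0.1, 0.5]}
print(grant_access(enrolled, [0.12, 0.88, 0.31]))  # alice
print(grant_access(enrolled, [-0.7, 0.2, -0.6]))   # None
```

The threshold trades convenience against security: raising it rejects more impostors but also more genuine employees photographed in poor conditions.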
While humans were once far better at identifying objects, current trends in computer vision suggest that computers may become as good as us at the task, if not better. Many industries, from services to manufacturing, may use computer vision to make their processes more efficient and accurate.