Computer Vision
TL;DR Computer vision lets machines interpret images and video so that they can make decisions about the world around them.
Computer vision is the field of AI focused on enabling computers to see, understand, and analyse visual information. It draws from imaging, physics, machine learning, and cognitive science to transform raw pixels into meaningful insights. From recognising objects to interpreting complex scenes, computer vision enables systems to navigate, inspect, diagnose, and interact with the physical world.
Computer vision lets machines analyse photos or video and infer what is happening without being told explicitly what the image contains. It is how apps recognise faces, how robots find their way around, and how cars detect lanes or pedestrians. Any time a device seems to understand what it sees, computer vision is working behind the scenes to make sense of the image.
For technical readers: Computer vision spans image processing, feature extraction, deep convolutional architectures, transformer-based vision models, 2D and 3D perception, simultaneous localisation and mapping (SLAM), multimodal fusion, and real-time inference. Key tasks include classification, detection, segmentation, tracking, depth estimation, pose estimation, and visual reasoning. Modern systems rely heavily on large-scale pretraining, synthetic data generation, differentiable rendering, and high-performance inference pipelines optimised for embedded or cloud environments.
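As a rough illustration of feature extraction, the sketch below slides a fixed 3x3 Sobel kernel over a tiny synthetic greyscale image to highlight a vertical edge. This is the same sliding-window operation (strictly, cross-correlation) that convolutional layers apply, except that they learn their kernels from data. The function names `convolve2d` and `edge_magnitude` and the 6x6 test image are assumptions made for this example, and the loop-based implementation favours clarity over speed.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation of a greyscale image with a small kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window whose top-left corner is (i, j).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel kernels respond to horizontal and vertical intensity changes.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def edge_magnitude(image):
    """Gradient magnitude: large where pixel intensity changes sharply."""
    gx = convolve2d(image, SOBEL_X)
    gy = convolve2d(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Hypothetical test image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = edge_magnitude(image)
print(edges)
```

The response is non-zero only in the output columns whose windows straddle the dark-to-bright boundary, which is the sense in which the filter turns raw pixels into an "edge" feature.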
Image processing and enhancement
Object detection, recognition, and classification
Segmentation and scene understanding
Motion analysis and tracking
3D vision, depth, and spatial reasoning
Practical applications in robotics, medicine, industry, and vehicles
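Several of the topics above, detection and tracking in particular, depend on measuring how well two bounding boxes overlap. A common measure is intersection over union (IoU). The sketch below is a minimal version that assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` tuples; the function name `iou` and the box format are choices made for this example, not a reference to any particular library.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max), an assumed format for this sketch.
    """
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so that disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0

# Hypothetical boxes: a prediction that partly overlaps the ground truth.
predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
score = iou(predicted, ground_truth)
print(score)
```

A detector's prediction is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, and trackers can use the same score to associate boxes across consecutive frames.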
ELI5 Computer vision is like giving a computer eyes and a little brain that helps it recognise things in pictures, so it knows what it is looking at.