When we stream movies online or share video clips through social media, we likely don’t think twice about the underlying technology enabling the transmission of dynamic visual content. However, video compression and encoding technologies are essential for efficient storage and transmission of digital videos.At its core, video compression aims to reduce redundancy and irrelevancy in video data to minimize its size. Since raw uncompressed video contains an immense amount of data, compression discards redundant bits and encodes only the essential data. This compressed bitstream can then be efficiently transmitted or stored.Several techniques are employed to achieve high compression ratios. Spatial redundancy within image frames can be reduced using transform coding techniques that concentrate signal information into fewer coefficients. Motion compensation methods exploit the temporal correlation between consecutive frames – pixels that do not change are not recoded.
The Magic Behind the Pixels
Every moment of video, no matter how brief, is composed of thousands, even millions, of tiny pixels. Each pixel has data—colors, brightness, and more. If we were to store video without compressing it, the file sizes would be enormous, making streaming a slow and cumbersome experience. Enter video codecs.
Psycho-visual optimizations take advantage of the human visual system’s reduced sensitivity to certain visual components. For instance, color information can be subsampled since the eye perceives luminance more sharply than color.After compressing video frames using the best available methods, these frames are assembled into a coded video sequence known as the bitstream. Additional compression occurs by assigning variable-length codes to frequently occurring symbols and patterns.To decode this compact bitstream, video codecs specify compression and decompression processes. Well-known video codecs include H.264, HEVC, VP9 and more. Codecs balance performance in terms of compression efficiency, complexity, latency and visual quality. Newer codecs like HEVC aim to improve compression rates.
Compression Strategies
The decades of research and development into video compression technologies has enabled the streaming and sharing of high definition content we enjoy today. Ongoing improvements focus on supporting emerging applications like high-resolution video, VR and AR content. Efficient video compression will remain crucial for delivering rich visual experiences into the future.
Transform Coding
- Apply transforms like DCT to concentrate signal into fewer coefficients
- Quantize coefficients to reduce precision needed to store them
- Entropy coding further compresses quantized values
Motion Compensation
- Divide frames into blocks and track motion across frames
- Only code the pixel changes rather than recode similar blocks
- Use motion vectors to represent movement
Motion Estimation
- Find best match of a block within a search region in the reference frame
- Calculate motion vectors representing spatial displacement
Intra Prediction
- For intra-frames, predict blocks using surrounding pixels within the same frame
- Subtract prediction and code residual
Inter Prediction
- For inter-frames, predict blocks using pixels from other frames
- Use motion vectors to compensate for movement
Psychovisual Optimization
- Leverage human visual system properties like texture masking
- Filter to remove high frequencies that are less perceptible
- Chroma subsampling since luminance has finer details
Rate Control
- Adjust quantization dynamically to meet target bitrate constraints
- Allocate more bits to complex regions and fewer to easy to compress regions
Entropy Coding
- Assign variable length codes based on probability of symbols
- Shorter codes for common symbols, longer codes for rare ones
These techniques reduce spatial, temporal, and statistical redundancy. Combining them allows modern video codecs to achieve high compression ratios. Ongoing research improves efficiency of each strategy.
The Concept of Saliency Mask
A saliency mask is a computer vision technique that attempts to determine the most important regions in an image from a perceptual perspective. The goal is to create a binary mask that highlights the salient or most visually attention-grabbing parts of the scene.Saliency detection models aim to predict where humans would focus when presented with a scene. This is inspired by visual attention mechanisms in the human visual system. Areas of greater visual complexity, contrast, color variation or distinctness tend to attract more attention.
Common approaches to generating saliency masks include:
- Using image features like color, luminance, textures to detect areas distinct from their surroundings. Machine learning classifiers can combine features.
- Visual attention models based on psychology and neuroscience research into how humans focus on scenes. These models try to computationally simulate human attention allocation.
- Neural networks trained to predict human eye gaze patterns on images, learning which parts people look at.
- Frequency domain analysis to identify areas of greater spatial contrast that draw attention.
Saliency masks have applications in areas like image cropping, thumbnail generation, image compression by preserving salient regions at higher quality, and general computer vision understanding of human perception. They provide supplementary information to models on the most relevant parts of the scene.Ongoing research aims to improve accuracy of models in predicting human visual attention across diverse images. Evaluations use datasets of images labeled with human gaze patterns. More robust saliency detection remains an active area in computer vision and psychology.
Here are some key optimization strategies for video compression based on use case:
Short Video
- Favor quality over latency – Use multi-pass encoding, slower presets
- Target playback platforms – Optimize for target resolutions and devices
- Rate control for size – Use constant quality over constant bitrate
- Compress audio separately – Balance high quality audio with video bitrates
Live Streaming
- Favor low latency – Use faster presets and encoding speed
- Adaptive bitrate – Output multiple quality levels for adaptive streaming
- Segment video – Short GOP for quicker starts, align keyframes
- Rate control for smoothness – Avoid variable quality between segments
Real-Time Communications
- Ultra low latency – Tuned single-pass encoding, slice parallelism
- Forward error correction – Deal with packet loss in noisy networks
- Encoder adjusted bitrate – Match available network bandwidth
- Quick GOPs – Intra refresh to limit error propagation
- Encoder hints – Provide spatial/temporal awareness to maximize quality
The constraints and priorities vary – short video favors maximum quality, live streaming smoothness and responsiveness, and real-time communications minimizing end-to-end latency above all else. Codec settings and strategies are tailored to the use case.
Why Compression is Crucial
Video codecs use advanced algorithms to shrink file sizes by identifying and eliminating redundant data. For instance, in a scene where the sky takes up half the frame and remains a consistent shade of blue, rather than storing information for every pixel, the codec will recognize the redundancy and store that section of the frame more efficiently.
The Balancing Act: Quality vs. Size
But it’s not just about making files smaller. It’s about striking the right balance between file size and video quality. Over-compression can result in loss of detail and visual artifacts. Modern codecs like H.265 and AV1 are designed to deliver high-quality videos at a fraction of the original size, but selecting the right settings is key.
The Future is Even Smaller
As we march forward into an era of 4K, 8K, and even virtual reality videos, the demands on data and storage will only grow. Thankfully, the evolution of video codecs is keeping pace. Research into more efficient algorithms and better compression techniques promises to deliver even smaller files with even bigger pictures.
In essence, the complex dance of video compression is an ongoing tango between innovation and necessity. As our world grows increasingly digital, the power of video codecs ensures that our experiences remain seamless, vivid, and deeply engaging.