# Discrete Cosine Transform (DCT)
The Discrete Cosine Transform is the mathematical heart of JPEG compression. It converts image data from the **spatial domain** (pixel values) to the **frequency domain** (patterns of variation), enabling efficient compression by concentrating image energy into fewer coefficients.
## The Key Insight
Natural images are dominated by **smooth variations** (gradual changes in brightness and color) rather than **rapid oscillations** (checkerboard patterns). The DCT separates these:
```text
Smooth gradient (low frequency): Checkerboard (high frequency):
████████████████████████████ █ █ █ █ █ █ █ █
███████████████████████████░ █ █ █ █ █ █ █ █
██████████████████████████░░ █ █ █ █ █ █ █ █
█████████████████████████░░░ █ █ █ █ █ █ █ █
████████████████████████░░░░ █ █ █ █ █ █ █ █
███████████████████████░░░░░ █ █ █ █ █ █ █ █
██████████████████████░░░░░░ █ █ █ █ █ █ █ █
█████████████████████░░░░░░░ █ █ █ █ █ █ █ █
```
Photographs have lots of the left, very little of the right. The DCT exploits this!
## Understanding Frequency
In image processing, "frequency" refers to how rapidly pixel values change:
| Zero (DC) | Constant value everywhere | Solid color fill |
| Low | Slow, smooth changes | Sky gradient |
| Medium | Moderate detail | Fabric texture |
| High | Rapid, sharp changes | Text edges, hair strands |
## The 1D DCT
Before tackling 2D images, let's understand the 1D case. Given N samples, the DCT produces N frequency coefficients.
### The Formula
For N samples x\[0\] to x\[N-1\], the DCT coefficient X\[k\] is:
```text
X[k] = α(k) × Σ(n=0 to N-1) x[n] × cos(π(2n+1)k / 2N)
where:
α(0) = 1/√N
α(k) = √(2/N) for k > 0
```
This looks complex, but the intuition is simple: **we're measuring how much each frequency pattern appears in the data**.
### Worked Example
Consider 8 pixel values: `[100, 110, 120, 130, 140, 130, 120, 110]`
This is roughly a smooth curve peaking in the middle. Let's compute the first few DCT coefficients:
**DC coefficient (k=0)**: The average, scaled.
```text
X[0] = (1/√8) × Σ x[n] × cos(0)
= (1/√8) × (100+110+120+130+140+130+120+110) × 1
= (1/2.83) × 960 = 339.4
```
**First AC coefficient (k=1)**: Measures one half-cycle oscillation.
```text
X[1] = √(2/8) × Σ x[n] × cos(π(2n+1)/16)
≈ small value (the data doesn't oscillate once)
```
**Higher coefficients**: Measure higher frequencies — mostly near zero for this smooth data.
### Basis Functions
Each DCT coefficient corresponds to a **basis function** — a specific cosine wave:
```text
k=0: ████████████████ (constant - DC)
k=1: ██████░░░░░░░░░░ (half cycle)
k=2: ████░░░░████░░░░ (full cycle)
k=3: ██░░░░██░░░░██░░ (1.5 cycles)
k=4: ██░░██░░██░░██░░ (2 cycles)
...
```
The DCT tells us: "Your signal is made of 339.4 units of DC, plus a little of k=1, etc."
## The 2D DCT
Images are 2D, so JPEG uses a 2D DCT on 8×8 blocks.
### Separability
A key property: the 2D DCT can be computed as two 1D DCTs:
1. Apply 1D DCT to each row
2. Apply 1D DCT to each column of the result
This reduces computation from O(N⁴) to O(N³).
```rust,ignore
// From src/jpeg/dct.rs
pub fn dct_2d(block: &[f32; 64]) -> [f32; 64] {
let mut temp = [0.0f32; 64];
let mut result = [0.0f32; 64];
// 1D DCT on rows
for row in 0..8 {
let row_start = row * 8;
dct_1d(
&block[row_start..row_start + 8],
&mut temp[row_start..row_start + 8],
);
}
// 1D DCT on columns
for col in 0..8 {
let mut col_in = [0.0f32; 8];
let mut col_out = [0.0f32; 8];
for row in 0..8 {
col_in[row] = temp[row * 8 + col];
}
dct_1d(&col_in, &mut col_out);
for row in 0..8 {
result[row * 8 + col] = col_out[row];
}
}
result
}
```
### The 2D Basis Functions
The 64 basis functions for an 8×8 block form a complete set of patterns:
```text
Basis functions (simplified visualization):
(0,0) DC (0,1) Horiz (0,2) Horiz ... (0,7) Horiz
░░░░░░░░ ██░░░░░░ ██ ██ ██ ██ ██ ██
░░░░░░░░ ██░░░░░░ ██ ██ ██ ██ ██ ██
░░░░░░░░ ██░░░░░░ ██ ██ ██ ██ ██ ██
░░░░░░░░ ██░░░░░░ ██ ██ ██ ██ ██ ██
constant 1 horiz 2 horiz 8 horiz
(1,0) Vert (1,1) Diag (2,2) (7,7) Checker
████████ ██░░░░░░ ██░░██░░ █ █ █ █
████████ ██░░░░░░ ░░██░░██ █ █ █ █
░░░░░░░░ ░░░░██░░ ██░░██░░ █ █ █ █
░░░░░░░░ ░░░░░░██ ░░██░░██ █ █ █ █
1 vert diagonal medium freq highest freq
```
The coefficient at position (u,v) tells us how much of that basis pattern is in the image block.
## Energy Compaction
The magic of DCT: for typical images, **most energy concentrates in the top-left corner** (low frequencies):
```text
DCT of a typical photo block:
Low freq ───────────────────▶ High freq
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ 312 │ -42 │ 18 │ -7 │ 3 │ 0 │ 0 │ 0 │ Low
│ -23 │ 14 │ -8 │ 4 │ -1 │ 0 │ 0 │ 0 │ │
│ 11 │ -6 │ 3 │ -1 │ 0 │ 0 │ 0 │ 0 │ │
│ -5 │ 3 │ -1 │ 0 │ 0 │ 0 │ 0 │ 0 │ │
│ 2 │ -1 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ │
│ -1 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ │
│ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ │
│ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ ▼
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘ High
The vast majority of coefficients are near zero!
```
This is why JPEG works: we can store mainly the top-left values and discard the rest.
## Precomputed Cosine Table
Computing cosines repeatedly is expensive. We precompute them:
```rust,ignore
// From src/jpeg/dct.rs
const COS_TABLE: [[f32; 8]; 8] = precompute_cos_table();
const fn precompute_cos_table() -> [[f32; 8]; 8] {
let mut table = [[0.0f32; 8]; 8];
let mut i = 0;
while i < 8 {
let mut j = 0;
while j < 8 {
// cos((2*i + 1) * j * PI / 16)
let angle = ((2 * i + 1) * j) as f32 * PI / 16.0;
table[i][j] = cos_approx(angle);
j += 1;
}
i += 1;
}
table
}
```
The table entry `COS_TABLE[n][k]` = cos((2n+1)kπ/16), which appears in the DCT formula.
## The 1D DCT Implementation
```rust,ignore
// From src/jpeg/dct.rs
fn dct_1d(input: &[f32], output: &mut [f32]) {
for k in 0..8 {
let mut sum = 0.0f32;
for n in 0..8 {
sum += input[n] * COS_TABLE[n][k];
}
output[k] = 0.5 * ALPHA[k] * sum;
}
}
```
Where ALPHA provides the normalization:
```rust,ignore
const ALPHA: [f32; 8] = [
0.7071067811865476, // 1/√2 for k=0
1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, // 1 for k>0
];
```
## The Inverse DCT
Decoding requires the **inverse DCT (IDCT)**. Fortunately, DCT is nearly self-inverse:
```rust,ignore
// From src/jpeg/dct.rs
fn idct_1d(input: &[f32], output: &mut [f32]) {
for n in 0..8 {
let mut sum = 0.0f32;
for k in 0..8 {
sum += ALPHA[k] * input[k] * COS_TABLE[n][k];
}
output[n] = 0.5 * sum;
}
}
```
The IDCT reconstructs spatial values from frequency coefficients.
## Level Shifting
Before DCT, JPEG subtracts 128 from pixel values (converting 0-255 to -128 to 127). This centers the data around zero, which produces a smaller DC coefficient and better numerical behavior.
```rust,ignore
// From src/jpeg/mod.rs
fn extract_block(...) {
// ...
y_block[idx] = pixel as f32 - 128.0; // Level shift
// ...
}
```
## Why DCT Instead of DFT?
The DCT is related to the DFT (Discrete Fourier Transform), but better for image compression:
| Output | Real numbers | Complex numbers |
| Energy compaction | Excellent | Good |
| Boundary behavior | Smooth (symmetric) | Wraparound |
| Computation | ~N² | ~N log N (with FFT) |
The DCT's smooth boundary handling is crucial; it implicitly mirrors the data, avoiding discontinuities at block edges.
## Fast DCT Algorithms
The naive O(N²) DCT can be accelerated using algorithms like:
- **Loeffler's algorithm**: 29 multiplications for 8-point DCT
- **AAN algorithm**: Fast integer-based DCT
- **SIMD implementations**: Process multiple values in parallel
Our implementation uses the straightforward approach for clarity, but production JPEG codecs use these optimizations.
## DCT and Compression
After DCT:
1. **DC coefficient** (position 0,0): Average value of the block
2. **Low-frequency AC**: Smooth variations (important, keep)
3. **High-frequency AC**: Fine detail (less important, can discard)
Quantization (the next step) divides by larger values for high frequencies, forcing them to zero.
## Visualizing the DCT
Consider what happens to different image patterns:
**Solid block** (all pixels = 100):
```text
Input: All 100s
DCT: Only DC is non-zero (≈283)
All AC = 0
```
**Vertical edge**:
```text
Input: Left half = 0, Right half = 200
DCT: Large DC (average ≈ 100)
Large horizontal AC components
Vertical AC ≈ 0
```
**Checkerboard**:
```text
Input: Alternating 0 and 255
DCT: DC = 127 (average)
Position (7,7) is very large
Other positions ≈ 0
```
## The DC Coefficient's Importance
The DC coefficient at position (0,0) represents the **average brightness** of the block:
```text
DC = (1/8) × Σ all pixels
For a block with pixels averaging 100:
DC ≈ 100 × scaling_factor
```
The DC coefficient typically contains **80-90% of the block's total energy**. JPEG encodes it with special care (differential encoding between blocks).
## Summary
The Discrete Cosine Transform:
- Converts spatial data to frequency components
- Concentrates energy in low frequencies
- Enables efficient lossy compression
- Is reversible (lossless) before quantization
- Uses separability for efficient 2D computation
The DCT is why JPEG can achieve 10-20x compression while maintaining acceptable visual quality — it reveals that most image information is in smooth variations, not fine detail.
## Next Steps
See [JPEG Quantization](./quantization.md) to learn how quantization discards high-frequency DCT coefficients.
---
## References
- Ahmed, N., Natarajan, T., and Rao, K.R. (1974). "Discrete Cosine Transform"
- [ITU-T T.81 - JPEG Standard, Section A.3.3](https://www.w3.org/Graphics/JPEG/itu-t81.pdf)
- Pennebaker, W.B. and Mitchell, J.L. (1993). "JPEG: Still Image Data Compression Standard"
- See implementation: `src/jpeg/dct.rs`