Inside the NVIDIA GPU
Chip by Chip

Explore every component of a modern NVIDIA GPU, from Streaming Multiprocessors to the memory hierarchy, and see what each one does and why it matters.

Data Flow

How Data Travels
Through the Hardware

PCIe 5.0 ×16: 64 GB/s
GDDR6X VRAM: ~1 TB/s
L2 Cache: ~3.5 TB/s
Shared Mem: ~19 TB/s
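
To make the gap between these stages concrete, the sketch below computes the ideal time to move one frame's worth of data at each quoted bandwidth. The 3840×2160 RGBA8 frame is a hypothetical payload chosen for illustration, the calculation ignores latency and protocol overhead, and the cache and shared-memory figures are aggregate rates (a whole frame would not actually fit in shared memory), so treat the results as ratios rather than measurements.

```python
# Bandwidth figures quoted above, in bytes per second (decimal units).
BANDWIDTH_BYTES_PER_S = {
    "PCIe 5.0 x16": 64e9,
    "GDDR6X VRAM": 1e12,
    "L2 Cache": 3.5e12,
    "Shared Mem": 19e12,
}


def transfer_time_us(num_bytes, bytes_per_second):
    """Ideal transfer time in microseconds, ignoring latency and overhead."""
    return num_bytes / bytes_per_second * 1e6


# Hypothetical payload: one 3840x2160 frame at 4 bytes per pixel (RGBA8).
frame_bytes = 3840 * 2160 * 4

times_us = {
    name: transfer_time_us(frame_bytes, bandwidth)
    for name, bandwidth in BANDWIDTH_BYTES_PER_S.items()
}

for name, t in times_us.items():
    print(f"{name:>13}: {t:8.2f} us")
```

Under these assumptions the PCIe hop is roughly 16 times slower than VRAM and roughly 300 times slower than shared memory, which is why keeping data on the GPU between passes matters.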

Streaming Multiprocessor

Inside One SM:
Every Component

An RTX 4090 has 128 SMs. Each component below plays a specific role in processing an image.

Streaming Multiprocessor (SM)

Ada Lovelace

CUDA Cores (FP32 ALUs)

128 / SM · Floating-Point Math

Each CUDA core is an FP32 Arithmetic Logic Unit (ALU) that can perform one floating-point operation per clock cycle: an addition, a multiplication, or a fused multiply-add (FMA). 128 cores × 128 SMs = 16,384 total cores in an RTX 4090. They're the workhorses, doing pixel color math, coordinate transforms, and any general computation.

Analogy

🔢 Like 128 calculators running simultaneously — each computing one pixel's color.
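
The core count above also gives a rough upper bound on arithmetic throughput. The sketch below multiplies it out, assuming a boost clock of about 2.52 GHz (a figure not stated on this page) and counting a fused multiply-add as two floating-point operations, which is the usual convention for peak-FLOPS numbers. Real workloads rarely reach this bound.

```python
SMS = 128                  # SMs in an RTX 4090, as stated above
CORES_PER_SM = 128         # FP32 CUDA cores per SM, as stated above
BOOST_CLOCK_HZ = 2.52e9    # assumption: approximate boost clock, not from this page
FLOPS_PER_FMA = 2          # one fused multiply-add counts as a multiply plus an add

total_cores = SMS * CORES_PER_SM

# Peak = every core retiring one FMA on every clock cycle.
peak_fp32_tflops = total_cores * FLOPS_PER_FMA * BOOST_CLOCK_HZ / 1e12

print(f"total CUDA cores: {total_cores}")
print(f"theoretical peak: {peak_fp32_tflops:.1f} TFLOPS (FP32)")
```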

Rendering Pipeline

Rasterization:
Triangles → Pixels

Vertex Processing

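The GPU receives 3D triangle vertices (x, y, z coordinates). The vertex shader multiplies each vertex by a 4×4 Model-View-Projection (MVP) matrix, taking it from model space to clip space. Dividing by w then yields normalized device coordinates (NDC), which the viewport transform maps to screen pixels.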

shader.glsl

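// MVP matrix transform: model space -> clip space
vec4 clip = MVP * vec4(pos, 1.0);
// Perspective divide: clip space -> normalized device coordinates (NDC)
vec3 ndc = clip.xyz / clip.w;
// ndc.x and ndc.y lie in [-1, 1] for points inside the view volume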
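
The same steps can be traced on the CPU to see where a vertex ends up. This is a minimal sketch in plain Python, not how a driver implements the stage. The helpers `mat_vec4`, `perspective`, and `to_screen` are hypothetical names, the projection uses the common OpenGL-style matrix, the model and view matrices are taken to be the identity, and the y flip in the viewport mapping assumes row 0 is the top of the image.

```python
import math


def mat_vec4(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection matrix (camera looks down -z)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def to_screen(mvp, pos, width, height):
    """Return (ndc, pixel) for a 3D position under the given MVP matrix."""
    clip = mat_vec4(mvp, [pos[0], pos[1], pos[2], 1.0])  # clip space
    ndc = [clip[i] / clip[3] for i in range(3)]          # perspective divide
    sx = (ndc[0] + 1.0) * 0.5 * width                    # [-1, 1] -> [0, width]
    sy = (1.0 - ndc[1]) * 0.5 * height                   # flip y: row 0 at the top
    return ndc, (sx, sy)


# Model and view are the identity here, so MVP is just the projection.
mvp = perspective(90.0, 16 / 9, 0.1, 100.0)
ndc, pixel = to_screen(mvp, (0.0, 0.0, -5.0), 1920, 1080)
print("ndc:", ndc)
print("pixel:", pixel)
```

A point straight ahead of the camera lands at the center of a 1920×1080 viewport, which is a quick sanity check for any MVP setup.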

GPU Mathematics

The Math Behind
Every Pixel

Vertex Stage

28 FLOPs per vertex

MVP Matrix Transform

Every 3D vertex is multiplied by a 4×4 Model-View-Projection matrix. This takes model-space coordinates (x, y, z) to clip space; the perspective divide and viewport mapping that follow produce the 2D screen coordinates. One matrix-vector multiply = 16 multiplications + 12 additions = 28 FLOPs.

Formula

⎡ m00 m01 m02 m03 ⎤   ⎡ x ⎤   ⎡ x' ⎤
⎢ m10 m11 m12 m13 ⎥ × ⎢ y ⎥ = ⎢ y' ⎥
⎢ m20 m21 m22 m23 ⎥   ⎢ z ⎥   ⎢ z' ⎥
⎣ m30 m31 m32 m33 ⎦   ⎣ 1 ⎦   ⎣ w' ⎦

GLSL Code

vec4 clip = MVP * vec4(pos, 1.0);
vec3 ndc = clip.xyz / clip.w;
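
The 28-FLOP figure can be checked by counting operations in a straightforward matrix-vector multiply. The sketch below is an illustration only: `mat_vec4_counted` is a hypothetical helper, the one-million-vertex mesh is an assumed size, and the count excludes the perspective divide on the second GLSL line. Hardware that fuses each multiply and add into a single FMA instruction issues fewer instructions for the same arithmetic.

```python
def mat_vec4_counted(m, v):
    """4x4 matrix times 4-vector, also returning multiply and add counts."""
    muls = 0
    adds = 0
    out = []
    for r in range(4):
        acc = m[r][0] * v[0]          # first product of the row: 1 multiply
        muls += 1
        for c in range(1, 4):
            acc += m[r][c] * v[c]     # each further term: 1 multiply + 1 add
            muls += 1
            adds += 1
        out.append(acc)
    return out, muls, adds


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
result, muls, adds = mat_vec4_counted(identity, [1.0, 2.0, 3.0, 1.0])
flops_per_vertex = muls + adds

vertices = 1_000_000                  # hypothetical mesh size
total_flops = flops_per_vertex * vertices

print(f"{muls} multiplications + {adds} additions = {flops_per_vertex} FLOPs per vertex")
print(f"{total_flops:,} FLOPs for {vertices:,} vertices")
```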
