GPU Architecture Explorer
Interactive Chip Anatomy

Inside the NVIDIA GPU
Chip by Chip

Explore every component of a modern NVIDIA GPU, from the Streaming Multiprocessors to the memory hierarchy, and see what each one does and why it matters.

Data Flow

How Data Travels
Through the Hardware


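Approximate peak bandwidth at each stage, from the host to the SM:
PCIe 5.0 ×16: 64 GB/s per direction
GDDR6X VRAM: ~1 TB/s
L2 Cache: ~3.5 TB/s
Shared Mem: ~19 TB/s (aggregate across all SMs)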
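To make those figures concrete, this sketch estimates how long it would take to stream a hypothetical 1 GB buffer (10^9 bytes) through each stage at the quoted rates. It is idealized: it assumes sustained peak bandwidth and ignores latency and capacity (1 GB does not actually fit in L2 or shared memory), so read the results as ratios only. The names `BANDWIDTH_GBPS` and `transfer_time_ms` are illustrative.

```python
# Idealized time to move a buffer at each level's quoted bandwidth.
# ASSUMPTION: sustained peak rate, no latency, capacity limits ignored.
BANDWIDTH_GBPS = {
    "PCIe 5.0 x16": 64.0,
    "GDDR6X VRAM": 1_000.0,
    "L2 Cache": 3_500.0,
    "Shared Mem": 19_000.0,
}


def transfer_time_ms(num_bytes: float, gb_per_s: float) -> float:
    """Milliseconds to move num_bytes at gb_per_s (1 GB = 1e9 bytes)."""
    return num_bytes / (gb_per_s * 1e9) * 1e3


if __name__ == "__main__":
    buffer_bytes = 1e9  # hypothetical 1 GB working set
    for level, bw in BANDWIDTH_GBPS.items():
        print(f"{level:>14}: {transfer_time_ms(buffer_bytes, bw):8.3f} ms")
```

The same gigabyte that takes about 15.6 ms over PCIe moves in about 1 ms from VRAM, which is why keeping data resident on the GPU matters so much.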
Streaming Multiprocessor

Inside One SM:
Every Component

An RTX 4090 has 128 SMs. Each component described below plays a role in processing your image.

Streaming Multiprocessor (SM)
Ada Lovelace

CUDA Cores (FP32 ALUs)

128 per SM · Floating-Point Math

Each CUDA core is an Arithmetic Logic Unit (ALU) that can issue one FP32 operation per clock cycle, typically a fused multiply-add (FMA), which performs a multiplication and an addition in a single step and counts as two floating-point operations. 128 cores × 128 SMs = 16,384 cores in total on an RTX 4090. They are the workhorses, handling pixel color math, coordinate transforms, and general-purpose computation.
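The per-core rate scales to the whole chip. Counting one FMA as two FLOPs (the usual convention for peak figures), this sketch multiplies the core count from the text by an assumed boost clock of about 2.52 GHz, a value not stated on this page, to get a theoretical peak. Real workloads sustain less than this.

```python
# Theoretical peak FP32 throughput, sketched for an RTX 4090.
SMS = 128                 # SM count (from the text)
FP32_CORES_PER_SM = 128   # CUDA cores per SM (from the text)
FLOPS_PER_FMA = 2         # one fused multiply-add = 1 multiply + 1 add
BOOST_CLOCK_HZ = 2.52e9   # ASSUMPTION: ~2.52 GHz boost clock, not from this page


def total_cores(sms: int, cores_per_sm: int) -> int:
    """Total FP32 cores on the chip."""
    return sms * cores_per_sm


def peak_fp32_tflops(cores: int, clock_hz: float) -> float:
    """Peak TFLOPS if every core retires one FMA on every clock cycle."""
    return cores * FLOPS_PER_FMA * clock_hz / 1e12


cores = total_cores(SMS, FP32_CORES_PER_SM)
print(cores)                                              # 16384
print(round(peak_fp32_tflops(cores, BOOST_CLOCK_HZ), 1))  # 82.6
```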

Analogy
🔢 Like 128 calculators in one SM running simultaneously, each computing one pixel's color.
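The analogy can be pushed one step further with an idealized count: if each of the 16,384 cores shaded exactly one pixel per round, how many rounds would a 3840×2160 frame need? This deliberately ignores warps, the many instructions a real pixel shader runs, and scheduling; `shading_rounds` is a hypothetical helper.

```python
TOTAL_CORES = 128 * 128  # 16,384 FP32 cores (from the text)


def shading_rounds(width: int, height: int, cores: int = TOTAL_CORES) -> int:
    """Idealized rounds to shade a frame if every core handles one pixel per round."""
    pixels = width * height
    return (pixels + cores - 1) // cores  # integer ceiling division


print(shading_rounds(3840, 2160))  # 507 rounds for a 4K UHD frame
```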
Rendering Pipeline

Rasterization:
Triangles → Pixels

Vertex Processing (VS)

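The GPU receives the 3D vertices of each triangle as (x, y, z) coordinates. The vertex shader multiplies each vertex by a 4×4 Model-View-Projection (MVP) matrix, transforming it from model coordinates into clip space. Dividing by the resulting w component (the perspective divide) gives normalized device coordinates (NDC), which the viewport transform then maps to screen pixels.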

shader.glsl
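// MVP matrix transform: model space → clip space
vec4 clip = MVP * vec4(pos, 1.0);
// Perspective divide: clip space → normalized device coordinates (NDC)
// (a real vertex shader writes gl_Position = clip; the hardware does this divide)
vec3 ndc = clip.xyz / clip.w;
// ndc.x and ndc.y lie in [-1, 1] for visible vertices;
// the viewport transform then maps them to pixel coordinates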
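The same steps (MVP multiply, perspective divide, viewport mapping) can be traced on the CPU. This Python sketch mirrors the GLSL lines; the row-major matrix layout, the OpenGL-style bottom-left viewport origin, and the toy matrix that only sets w = 2 are assumptions chosen to keep the arithmetic easy to follow. It is not a real projection matrix.

```python
from typing import List, Sequence, Tuple

Mat4 = Sequence[Sequence[float]]  # ASSUMPTION: row-major 4x4 matrix


def mat4_mul_vec4(m: Mat4, v: Sequence[float]) -> List[float]:
    """4x4 matrix times a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def vertex_to_clip(mvp: Mat4, pos: Tuple[float, float, float]) -> List[float]:
    """Equivalent of `MVP * vec4(pos, 1.0)`: model space -> clip space."""
    return mat4_mul_vec4(mvp, [pos[0], pos[1], pos[2], 1.0])


def clip_to_ndc(clip: Sequence[float]) -> Tuple[float, float, float]:
    """Perspective divide, equivalent of `clip.xyz / clip.w`."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def ndc_to_screen(ndc: Tuple[float, float, float], width: int, height: int) -> Tuple[float, float]:
    """Viewport transform, ASSUMING an OpenGL-style bottom-left origin."""
    x, y, _ = ndc
    return ((x + 1.0) * 0.5 * width, (y + 1.0) * 0.5 * height)


# Hypothetical MVP that leaves x, y, z alone and sets w = 2, so the divide is visible.
MVP = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
clip = vertex_to_clip(MVP, (1.0, -1.0, 0.5))  # [1.0, -1.0, 0.5, 2.0]
ndc = clip_to_ndc(clip)                        # (0.5, -0.5, 0.25)
print(ndc_to_screen(ndc, 1920, 1080))          # (1440.0, 270.0)
```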
GPU Mathematics

The Math Behind
Every Pixel

Vertex Stage
28 FLOPs per vertex

MVP Matrix Transform

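Every 3D vertex (x, y, z), extended with w = 1, is multiplied by a 4×4 Model-View-Projection matrix. This transforms it from model coordinates into clip space; the perspective divide and viewport transform then turn the result into 2D screen coordinates. One 4×4 matrix-vector multiply costs 16 multiplications + 12 additions = 28 FLOPs.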
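The 16 + 12 count can be checked by instrumenting a plain matrix-vector multiply. The helper `mat4_vec4_with_flop_count` is hypothetical, and the vertex count and frame rate in the last lines are made-up workload numbers used only to show the scale.

```python
def mat4_vec4_with_flop_count(m, v):
    """Multiply a 4x4 matrix by a 4-vector, counting multiplies and adds."""
    muls = adds = 0
    out = []
    for row in m:
        acc = row[0] * v[0]          # first product of the row: no add yet
        muls += 1
        for c in range(1, 4):
            acc = acc + row[c] * v[c]  # each further term: one multiply, one add
            muls += 1
            adds += 1
        out.append(acc)
    return out, muls, adds


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
out, muls, adds = mat4_vec4_with_flop_count(identity, [3.0, 4.0, 5.0, 1.0])
print(muls, adds, muls + adds)  # 16 12 28

# Hypothetical workload: 1 million vertices per frame at 60 frames per second.
vertices_per_second = 1_000_000 * 60
print(vertices_per_second * (muls + adds) / 1e9, "GFLOP/s spent on MVP transforms")
```

On the GPU, each multiply-and-add pair is typically fused into one FMA instruction, so the 28 FLOPs usually compile to 4 multiplies plus 12 FMAs.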

Formula
⎡ m00 m01 m02 m03 ⎤   ⎡ x ⎤   ⎡ x' ⎤
⎢ m10 m11 m12 m13 ⎥ × ⎢ y ⎥ = ⎢ y' ⎥
⎢ m20 m21 m22 m23 ⎥   ⎢ z ⎥   ⎢ z' ⎥
⎣ m30 m31 m32 m33 ⎦   ⎣ 1 ⎦   ⎣ w' ⎦
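Written out row by row, the product shows where the operation count comes from: each output component needs four multiplications and three additions, so the whole transform needs 4 × 4 = 16 multiplications and 4 × 3 = 12 additions.

```latex
\begin{aligned}
x' &= m_{00}x + m_{01}y + m_{02}z + m_{03}\cdot 1 \\
y' &= m_{10}x + m_{11}y + m_{12}z + m_{13}\cdot 1 \\
z' &= m_{20}x + m_{21}y + m_{22}z + m_{23}\cdot 1 \\
w' &= m_{30}x + m_{31}y + m_{32}z + m_{33}\cdot 1 \\[4pt]
\text{FLOPs} &= \underbrace{4 \times 4}_{\text{multiplications}} + \underbrace{4 \times 3}_{\text{additions}} = 16 + 12 = 28
\end{aligned}
```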
GLSL Code
vec4 clip = MVP * vec4(pos, 1.0);
vec3 ndc = clip.xyz / clip.w;