Inside the NVIDIA GPU
Chip by Chip
Explore every component of a modern NVIDIA GPU, from Streaming Multiprocessors to the memory hierarchy, and see what each one does and why it matters.
How Data Travels
Through the Hardware
Inside One SM:
Every Component
An RTX 4090 has 128 SMs, and each one contains the components below.
CUDA Cores (FP32 ALUs)
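Each CUDA core is an FP32 Arithmetic Logic Unit (ALU) that can issue one floating-point operation per clock cycle: an addition, a multiplication, or a fused multiply-add (FMA), which computes a × b + c in a single instruction and is conventionally counted as two FLOPs. With 128 cores per SM and 128 SMs, an RTX 4090 has 128 × 128 = 16,384 CUDA cores in total. They are the workhorses, handling pixel color math, coordinate transforms, and general-purpose computation.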
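The core count above translates directly into a theoretical peak throughput. The sketch below is a back-of-the-envelope calculation, not a benchmark. The SM and per-SM core counts come from the text. The ~2.52 GHz boost clock is an assumption used only for illustration. The calculation also assumes every core retires one FMA (2 FLOPs) on every cycle.

```python
# Figures from the text: 128 SMs, 128 FP32 CUDA cores per SM (RTX 4090).
NUM_SMS = 128
FP32_CORES_PER_SM = 128

# Assumption for illustration only: a ~2.52 GHz boost clock.
ASSUMED_BOOST_CLOCK_HZ = 2.52e9

# A fused multiply-add (a * b + c) is conventionally counted as 2 FLOPs.
FLOPS_PER_FMA = 2


def total_cuda_cores(num_sms: int = NUM_SMS,
                     cores_per_sm: int = FP32_CORES_PER_SM) -> int:
    """Total FP32 lanes across the chip."""
    return num_sms * cores_per_sm


def peak_fp32_tflops(clock_hz: float = ASSUMED_BOOST_CLOCK_HZ) -> float:
    """Theoretical peak: every core retires one FMA on every clock cycle."""
    return total_cuda_cores() * FLOPS_PER_FMA * clock_hz / 1e12


if __name__ == "__main__":
    print(f"CUDA cores: {total_cuda_cores():,}")          # CUDA cores: 16,384
    print(f"Peak FP32:  {peak_fp32_tflops():.1f} TFLOPS")  # Peak FP32:  82.6 TFLOPS
```

Real workloads land well below this ceiling. Memory stalls, instruction mix, and clock behavior all keep the cores from issuing an FMA every single cycle.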
Rasterization:
Triangles → Pixels
Vertex Processing
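The GPU receives each triangle as three vertices with 3D (x, y, z) positions. The vertex shader multiplies each position by a 4×4 Model-View-Projection (MVP) matrix, which transforms it from the model's local coordinates into clip space. Dividing by the w component (the perspective divide) yields normalized device coordinates (NDC). The viewport transform then maps those coordinates to pixel positions on screen.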
// MVP Matrix Transform
vec4 clip = MVP * vec4(pos, 1.0);
vec3 ndc = clip.xyz / clip.w;
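// ndc.x, ndc.y lie in [-1, 1]: normalized device coordinates
// (a real vertex shader writes clip to gl_Position; the hardware performs the divide by w)
The Math Behind
Every Pixel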
MVP Matrix Transform
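Every 3D vertex, written in homogeneous form as (x, y, z, 1), is multiplied by a 4×4 Model-View-Projection matrix. This transforms it from model coordinates into clip space. After the perspective divide and the viewport transform, it becomes a 2D screen position. Each of the four output components needs 4 multiplications and 3 additions, so one matrix-vector multiply costs 16 multiplications + 12 additions.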
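$$
\begin{bmatrix}
m_{00} & m_{01} & m_{02} & m_{03} \\
m_{10} & m_{11} & m_{12} & m_{13} \\
m_{20} & m_{21} & m_{22} & m_{23} \\
m_{30} & m_{31} & m_{32} & m_{33}
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
=
\begin{bmatrix} x' \\ y' \\ z' \\ w' \end{bmatrix}
$$
vec4 clip = MVP * vec4(pos, 1.0);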
vec3 ndc = clip.xyz / clip.w;
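The following sketch mirrors the two GLSL lines in plain Python. It performs the matrix-vector multiply while counting operations, then applies the perspective divide. A viewport transform maps the result to pixels. `TOY_PERSPECTIVE` is a hypothetical matrix chosen only so that w' = −z, as in an OpenGL-style camera looking down −z. It is not a real projection matrix. The top-left pixel origin in `ndc_to_pixel` is also an assumption.

```python
from typing import List, Tuple

Mat4 = List[List[float]]  # row-major: m[row][col], matching m00..m33 above
Vec4 = Tuple[float, float, float, float]


def mat4_mul_vec4(m: Mat4, v: Vec4) -> Tuple[Vec4, int, int]:
    """Multiply a 4x4 matrix by a homogeneous vector, counting operations."""
    muls = adds = 0
    out = []
    for row in m:
        acc = row[0] * v[0]          # first term: one multiply, no add
        muls += 1
        for col in range(1, 4):
            acc += row[col] * v[col]  # remaining terms: one multiply + one add
            muls += 1
            adds += 1
        out.append(acc)
    return (out[0], out[1], out[2], out[3]), muls, adds


def perspective_divide(clip: Vec4) -> Tuple[float, float, float]:
    """clip.xyz / clip.w -> normalized device coordinates (NDC)."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def ndc_to_pixel(ndc_x: float, ndc_y: float,
                 width: int, height: int) -> Tuple[float, float]:
    """Viewport transform: map NDC [-1, 1] to pixel coordinates.
    Assumes pixel (0, 0) is the top-left corner, so y is flipped."""
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height
    return (px, py)


# Hypothetical toy matrix: copies x, y, z and sets w' = -z.
TOY_PERSPECTIVE: Mat4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

if __name__ == "__main__":
    clip, muls, adds = mat4_mul_vec4(TOY_PERSPECTIVE, (1.0, 0.5, -2.0, 1.0))
    print(clip, muls, adds)           # (1.0, 0.5, -2.0, 2.0) 16 12
    ndc = perspective_divide(clip)
    print(ndc)                        # (0.5, 0.25, -1.0)
    print(ndc_to_pixel(ndc[0], ndc[1], 1920, 1080))  # (1440.0, 405.0)
```

The operation counter confirms the 16 multiplications + 12 additions stated above. On a GPU, each vertex's multiply runs on the CUDA cores, typically as chains of FMAs rather than separate multiplies and adds.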