
36 Trillion Calculations per Second: Inside a Modern GPU

By Saurabh · January 10, 2026

Your graphics card is basically a calculator city.

Not the cute kind. The absurd kind.

When you launch a modern game and crank everything to ultra, the GPU inside your PC is performing tens of trillions of mathematical operations every single second. At the same time, it is moving massive amounts of data back and forth, synchronizing thousands of tiny processing units, and doing all of this fast enough that your brain perceives smooth motion instead of a stuttering slideshow.

So how many calculations are we really talking about? And more importantly, what does that number actually mean?

Let's build intuition.

"Calculations per second" is a real thing

When people say a GPU can do "36 trillion calculations per second," they are usually referring to a very specific operation that GPUs are exceptionally good at: Fused Multiply Add, or FMA.

One FMA looks like this:

(A × B) + C

One multiplication followed by one addition, fused into a single instruction. Because it contains two arithmetic operations, the industry convention is to count one FMA as two floating-point operations.

If a single GPU core can execute one FMA per clock cycle, then that core is effectively performing two calculations every cycle. Multiply that by thousands of cores and billions of cycles per second, and the numbers start getting very large very quickly.

This is where the "tens of trillions per second" figures come from. Not marketing magic — just arithmetic.

A simple mental model of GPU compute

Let's strip this down to the cleanest possible model. Assume: number of CUDA cores = N, clock speed = f cycles per second, each core executes 1 FMA per cycle, and each FMA counts as 2 operations.

Ops/sec ≈ N × f × 2

Now plug in realistic numbers. N = 10,496 cores, f = 1.7 × 10⁹ Hz.

10,496 × 1.7 × 10⁹ × 2 = 35,686.4 × 10⁹ ≈ 35.7 trillion ops/sec
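The back-of-the-envelope model above is easy to check yourself. A few lines of Python reproduce the figure (the core count and clock are the illustrative values from the worked example, not a spec sheet for any particular card):

```python
# Theoretical peak throughput from the simple model: Ops/sec ≈ N × f × 2.
cores = 10_496        # number of FMA-capable cores (N)
clock_hz = 1.7e9      # clock speed in cycles per second (f)
ops_per_fma = 2       # one multiply + one add per fused instruction

peak_ops = cores * clock_hz * ops_per_fma
print(f"{peak_ops / 1e12:.1f} trillion ops/sec")  # → 35.7 trillion ops/sec
```

Real cards land somewhat below this theoretical peak, since it assumes every core issues an FMA on every cycle with no stalls.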

That is the number you keep hearing. And it is not fluff — it is literally how many tiny arithmetic units are firing every cycle.

Why your brain refuses to believe this number

Human intuition completely breaks at this scale. So let's make it human.

Imagine a person working through one long multiplication problem per second. How many such people, all calculating at once, would you need to match 36 trillion calculations per second?

Earth has roughly 8 × 10⁹ people. So:

(36 × 10¹²) ÷ (8 × 10⁹) = 4,500 Earths

That is about 4,500 Earths worth of people, all calculating in perfect synchronization, forever.

And your GPU does this quietly inside a metal box while you complain about 72 FPS.

GPUs are not "faster CPUs"

This is where confusion usually starts.

A CPU core is a generalist. It can branch, make decisions, handle complex instructions, run operating systems, manage memory, talk to networks, and juggle thousands of unrelated tasks.

A GPU core cannot do most of that. A GPU core is a specialist — a simple arithmetic unit designed to do the same operation repeatedly on huge amounts of data.

GPUs win when the problem looks like: same math, on millions of independent data points, in parallel. That is why GPUs dominate video game rendering, 3D geometry transforms, image and video processing, neural networks, matrix multiplication, and AI training and inference.

Not because GPUs are magical — but because the workload fits the hardware.

Why modern games naturally need trillions of operations

A modern game world looks continuous and cinematic, but underneath it is pure mathematics.

A 3D object is made of triangles. Triangles are made of vertices. Vertices have coordinates like (x, y, z). Each object has its own coordinate system called model space, but the game world uses a shared coordinate system called world space.

So every vertex must be transformed from model space into world space. That transformation is matrix math. Matrix math is mostly fused multiply adds.
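To make that concrete, here is a minimal sketch of one vertex going from model space to world space. The 4×4 matrix is a made-up example (a translation by (10, 0, 5)); the point is that every step of the inner loop is exactly one fused multiply-add:

```python
# Transform one vertex from model space to world space.
# The matrix is an illustrative translation by (10, 0, 5).
world_from_model = [
    [1, 0, 0, 10],
    [0, 1, 0,  0],
    [0, 0, 1,  5],
    [0, 0, 0,  1],
]
vertex = [2.0, 3.0, 4.0, 1.0]  # homogeneous coordinates (x, y, z, w)

out = [0.0, 0.0, 0.0, 0.0]
for row in range(4):
    acc = 0.0
    for col in range(4):
        # Each step is one FMA: acc = (M[row][col] × v[col]) + acc
        acc = world_from_model[row][col] * vertex[col] + acc
    out[row] = acc

print(out)  # → [12.0, 3.0, 9.0, 1.0]
```

That is 16 FMAs for a single vertex and a single transform. Multiply by millions of vertices and dozens of transforms per frame, and the totals climb fast.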

Now multiply that by thousands of objects, millions of vertices, and 60 to 120 frames per second — and this is before lighting, shadows, textures, reflections, post-processing, physics, and ray tracing even enter the picture.

At that point, trillions of operations per second stop sounding exaggerated. They become unavoidable.

How GPUs actually pull this off

The GPU breaks work into threads. Each thread handles a tiny slice of data.

Threads are grouped into warps, usually 32 threads at a time. Every thread in a warp executes the same instruction, just on different data.

So the GPU is essentially saying: "You 32 threads — do the same thing, right now, on 32 different values."

Thousands of these warps run across many streaming multiprocessors, constantly scheduled to keep hardware busy. This structure is why GPUs achieve such extreme throughput. It is not one supercomputer brain — it is an army of simple workers moving in formation.
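The warp model can be sketched in a few lines. This is only a mental picture, not how the hardware is programmed: a real GPU issues one instruction per warp per cycle in hardware, while the Python loop below just makes the "same instruction, 32 different values" idea explicit:

```python
# A toy picture of warp (SIMT) execution: one instruction applied
# across all 32 lanes of a warp in lockstep.
WARP_SIZE = 32

def warp_execute(instruction, lanes):
    # Same operation, different data — the essence of a warp.
    return [instruction(value) for value in lanes]

data = list(range(WARP_SIZE))                    # 32 different values
result = warp_execute(lambda x: x * 2 + 1, data)  # one shared instruction
print(result[:4])  # → [1, 3, 5, 7]
```

The design choice this illustrates: because every lane runs the same instruction, the GPU needs only one instruction decoder per warp instead of one per core, which is how it affords thousands of arithmetic units on a single die.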

Compute is useless without memory bandwidth

Here is the part most people miss. Doing 36 trillion operations per second is not the hardest problem. Feeding data fast enough is.

If GPU cores do not have data ready, they stall. Idle cores do zero calculations.

That is why GPUs use GDDR memory, optimized for bandwidth rather than low latency. High-end GPUs pair it with memory buses hundreds of bits wide and approach terabytes per second of memory bandwidth.
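A quick roofline-style estimate shows why bandwidth is the bottleneck. The numbers below are illustrative assumptions (36 trillion ops/sec of compute, ~1 TB/s of bandwidth, 4-byte floats), not any card's spec:

```python
# Back-of-the-envelope: can memory keep the math units fed?
peak_flops = 36e12        # assumed compute: 36 trillion ops/sec
bandwidth_bytes = 1e12    # assumed memory bandwidth: ~1 TB/s
bytes_per_float = 4       # FP32

# If every operand had to come from memory:
floats_per_sec = bandwidth_bytes / bytes_per_float
print(f"{floats_per_sec / 1e12:.2f} trillion floats/sec delivered")

# Operations the cores must perform per float loaded, just to stay busy:
print(f"{peak_flops / floats_per_sec:.0f} ops per float loaded")
```

Under these assumptions, memory can deliver only 0.25 trillion floats per second against 36 trillion operations of demand, so each loaded value must be reused on the order of a hundred times. That reuse is exactly what on-chip caches, shared memory, and tiled algorithms exist to provide.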

The GPU is not just a math engine. It is a math engine attached to a data firehose.

Why this same architecture powers AI

The twist is that everything you just read applies equally well to AI.

Neural networks are not text engines. They are matrix engines. A transformer forward pass is essentially: matrix multiplication, activation functions, more matrix multiplication, attention math, and more matrix multiplication.

That is exactly what GPUs were already designed to accelerate. Tensor cores simply pushed this further by specializing in matrix math at lower precision.
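Stripped of everything else, a neural network layer is the same pattern the rendering sections described. Here is a skeletal sketch with toy shapes and made-up weights, reduced to what the GPU actually sees: matrix multiplies and an elementwise activation:

```python
# A toy "layer": matmul → activation → matmul.
# Shapes and weights are illustrative, not from any real model.
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

x  = [[1.0, -2.0]]                 # one token, 2 features
w1 = [[1.0, 0.5], [-1.0, 2.0]]     # first weight matrix
w2 = [[2.0], [1.0]]                # second weight matrix

hidden = matmul(x, w1)             # matrix multiplication
hidden = relu(hidden)              # activation function
out = matmul(hidden, w2)           # more matrix multiplication
print(out)  # → [[6.0]]
```

Scale the matrices from 2×2 to thousands on a side, stack hundreds of layers, and the workload is still dominated by the inner `a[i][k] * b[k][j]` multiply-adds, which is why the same silicon serves both games and AI.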

So GPUs did not evolve for AI. AI evolved into a problem GPUs were already perfect at solving.

The punchline

A modern high-end GPU can execute around 36 trillion operations per second.

Not one big calculation. But trillions of tiny arithmetic operations, executed in parallel, across thousands of cores, synchronized in warps, fed by extreme memory bandwidth, all to turn numbers into pixels.

Which means the next time you see a realistic reflection in a puddle, or hair moving naturally in the wind, you are watching trillions of mathematical operations being converted into a believable world — sixty times every second.

And your brain just goes: nice graphics, bro.