Computers are faster than you think

Computers are incredibly fast, and you can almost always make your code faster. When optimizing, you should be able to keep going until you’re close to the hardware limits listed below. Analyze your code and figure out what its fundamental limits are.

Handy references

| Category | Metric | Notes |
|---|---|---|
| CPU clock | 3.8 GHz; 0.26 ns/cycle | 1 ms = 3.8M cycles |
| L1 cache | ~13 ms/GB, hot scan | 32 KB/core max |
| L2 cache | ~13 ms/GB, hot scan | 1 MB/core max |
| L3 cache | ~17 ms/GB, hot scan | 32 MB/CCD; 256 MB total |
| Main memory | ~24 ms/GB | 41 GB/s |
| CPU to GPU | ~30 ms/GB | over PCIe x16 |
| SSD | ~360 ms/GB, NVMe RAID0 | 3 GB 1080p movie: ~1.1 s |
| US HTTP request | ~90 ms warm; ~180-270 ms cold | SF to Dallas; cold adds TCP + TLS handshakes |
| GPU math | CPU ~0.2 ms; GPU ~0.05 ms (data resident on GPU) | grid[i] *= 1.01 for 1M cells |
| Postgres indexed read, local | ~2 ms | SELECT ... WHERE id = ? |
| Postgres indexed read, same datacenter | ~4 ms | App server calls DB server |
| Postgres write | ~6 ms, single durable row | INSERT one row and commit |

NB: The CPU, cache, memory, and SSD figures come from a sample I ran on a representative production machine (AMD EPYC 9354P, Micron 7450 NVMe). Actual performance will vary.

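The GPU math example is, of course, highly dependent on the math being done, but unless you’re running a full-on model, GPUs are astoundingly fast, especially with huge datasets. The GPU figure assumes the data is already resident in GPU memory; copying it over PCIe costs extra (see the CPU to GPU row).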
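To see why residency matters, here is a rough calculation using only numbers from the table. It assumes each cell is a 4-byte float32 and that 1 GB means 10^9 bytes; neither is stated in the table, so treat the result as an estimate rather than a measurement.

```python
# Does a GPU help for grid[i] *= 1.01 over 1M cells? Timings come from the table;
# 4-byte float32 cells and decimal GB (1e9 bytes) are assumptions.
CELLS = 1_000_000
BYTES_PER_CELL = 4        # assumed float32
PCIE_MS_PER_GB = 30.0     # CPU <-> GPU over PCIe x16
CPU_MS = 0.2              # CPU pass over the grid
GPU_MS = 0.05             # GPU kernel, data already resident

# Cost of copying the grid in one direction.
transfer_ms = PCIE_MS_PER_GB * CELLS * BYTES_PER_CELL / 1e9
# Copy in, compute, copy out.
gpu_round_trip_ms = 2 * transfer_ms + GPU_MS

print(f"one-way copy:     {transfer_ms:.2f} ms")
print(f"GPU incl. copies: {gpu_round_trip_ms:.2f} ms")
print(f"CPU:              {CPU_MS:.2f} ms")
```

Under these assumptions, shipping the grid to the GPU and back costs more than the whole CPU pass, so the GPU pays off when the data stays resident across many operations rather than making a round trip for each one.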

Postgres query times depend on indexes, data size, and whether any columns are TOASTed (TOASTed columns are an order of magnitude slower).
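One way to act on “figure out what the fundamental limits are” is to turn the table into a back-of-envelope calculator and compare its estimate with what your code actually does. This is a minimal sketch that hardcodes the sample machine’s numbers and assumes decimal GB; the dictionary keys and the function name are made up for illustration.

```python
# Best-case figures copied from the reference table (EPYC 9354P sample).
# Decimal GB (1e9 bytes) is an assumption; swap in numbers from your own machine.

# Streaming cost in milliseconds per GB moved.
MS_PER_GB = {
    "l1": 13.0,
    "l2": 13.0,
    "l3": 17.0,
    "ram": 24.0,
    "pcie": 30.0,  # CPU <-> GPU
    "ssd": 360.0,
}

# Fixed latency in milliseconds per operation.
MS_PER_OP = {
    "http_warm": 90.0,
    "pg_read_local": 2.0,
    "pg_read_dc": 4.0,
    "pg_write": 6.0,
}


def best_case_serial_ms(scans=(), ops=()):
    """Wall time if every step runs back to back at the table's rates.

    scans: (tier, nbytes) pairs, e.g. ("ram", 2e9)
    ops:   (kind, count) pairs, e.g. ("pg_read_dc", 3)
    """
    total = 0.0
    for tier, nbytes in scans:
        total += MS_PER_GB[tier] * nbytes / 1e9
    for kind, count in ops:
        total += MS_PER_OP[kind] * count
    return total


# Reading a 3 GB file from SSD: matches the ~1.1 s in the table.
print(best_case_serial_ms(scans=[("ssd", 3e9)]))  # 1080.0
# Scanning 2 GB in RAM plus 3 indexed reads from an app server.
print(best_case_serial_ms(scans=[("ram", 2e9)], ops=[("pg_read_dc", 3)]))  # 60.0
```

If your measured time is many times the estimate, there is probably headroom. Real code can also beat the sum when it overlaps I/O with compute, so treat the number as a yardstick rather than a hard bound.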

Always benchmark

You can’t optimize what you can’t measure. Establish a baseline before you touch any code, then re-run the benchmark each time you make a change. Always write the results down, ideally in a markdown file next to the code so they’re easy to reference.
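A minimal sketch of that workflow, assuming a Python codebase: `bench` times a callable end to end with `time.perf_counter` and reports the median of several runs, and `record` appends the result to a markdown table. The function names and the `BENCHMARKS.md` filename are hypothetical, not a prescribed tool.

```python
import statistics
import time
from datetime import date
from pathlib import Path


def bench(fn, repeats=20, warmup=3):
    """Time fn() end to end and return the median wall time in milliseconds."""
    for _ in range(warmup):  # warm caches and any lazy initialization
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def record(notes_path, label, median_ms):
    """Append one result row to a markdown table kept next to the code."""
    path = Path(notes_path)
    if not path.exists():
        path.write_text("| date | change | median ms |\n|---|---|---|\n")
    with path.open("a") as f:
        f.write(f"| {date.today().isoformat()} | {label} | {median_ms:.3f} |\n")
```

For example, call `record("BENCHMARKS.md", "baseline", bench(run_pipeline))` before any change, then again with a label describing each change, where `run_pipeline` stands in for your own end-to-end entry point. The median is used instead of the mean so a single slow run (a GC pause, a noisy neighbor) doesn’t skew the comparison.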

End-to-end benchmarks are a must; you MUST have them. Inner benchmarks, which time a single function or stage, can also be useful, especially at the beginning when you’re figuring out your initial approach, but be wary of the work just moving from one place to another rather than actually getting faster.
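A small Python sketch of that trap: `parse_lazy` returns a generator, so an inner benchmark of the parse step looks nearly free, but the `int()` conversions still run when `pipeline` consumes the results. All names here are hypothetical.

```python
import time


def parse_eager(lines):
    # Converts everything now: the cost lands inside the parse step.
    return [int(x) for x in lines]


def parse_lazy(lines):
    # Returns a generator: "parsing" returns instantly, but the conversion
    # runs later, whenever a consumer iterates over it.
    return (int(x) for x in lines)


def pipeline(parse, lines):
    # The end-to-end job: parse, then aggregate.
    return sum(parse(lines))


def time_ms(fn):
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


lines = [str(i) for i in range(500_000)]

inner_eager = time_ms(lambda: parse_eager(lines))
inner_lazy = time_ms(lambda: parse_lazy(lines))
e2e_eager = time_ms(lambda: pipeline(parse_eager, lines))
e2e_lazy = time_ms(lambda: pipeline(parse_lazy, lines))

print(f"inner:      eager {inner_eager:.1f} ms, lazy {inner_lazy:.3f} ms")
print(f"end to end: eager {e2e_eager:.1f} ms, lazy {e2e_lazy:.1f} ms")
```

On a typical run the inner numbers differ enormously while the end-to-end numbers stay close, because the work moved rather than disappeared. Exact timings depend on the machine.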