Computers are faster than you think

Computers are incredibly fast, and you can almost always make your code faster. Analyze your code and figure out what its fundamental limits are (clock cycles, memory bandwidth, disk, network), then keep optimizing until you’re actually hitting them.

Handy references

Category	Metric	Notes
CPU clock	3.8 GHz; 0.26 ns/cycle	1 ms = 3.8M cycles
L1 cache	~13 ms/GB, hot scan	32 KB/core max
L2 cache	~13 ms/GB, hot scan	1 MB/core max
L3 cache	~17 ms/GB, hot scan	32 MB/slice; 256 MB total
Main memory	~24 ms/GB	41 GB/s
CPU to GPU	~30 ms/GB	PCIe x16
SSD	~360 ms/GB, NVMe RAID0	3 GB 1080p movie: ~1.1 s
US HTTP request	~90 ms warm; ~180-270 ms cold	SF to Dallas; cold adds TCP + TLS
GPU math	CPU ~0.2 ms; GPU ~0.05 ms, resident	`grid[i] *= 1.01` for 1M cells
Postgres indexed read, local	~2 ms	`SELECT ... WHERE id = ?`
Postgres indexed read, same datacenter	~4 ms	App server calls DB server
Postgres write	~6 ms, single durable row	`INSERT` one row and commit

NB: The CPU, cache, memory, and SSD numbers come from a sample I ran on a representative prod machine (EPYC 9354P, Micron 7450 NVMe). Actual performance will vary.

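The GPU math example is, of course, highly dependent on the math being done, but unless you’re running a full-on model, GPUs are astoundingly fast, especially with huge datasets. The GPU figure assumes the data is already resident on the GPU; if it isn’t, add the CPU-to-GPU transfer cost.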

Postgres query times depend on indices, data size, and whether anything is TOASTed (TOASTed columns are an order of magnitude slower).
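To turn the table into a target, it can help to compute a rough floor for a given workload. The sketch below does only that arithmetic: the per-GB costs are copied from the table above, while the helper names (`floor_ms`, `cycles_in`) and the 0.5 GB working set are made up for illustration. Treat the output as an order-of-magnitude estimate, not a promise.

```python
# Per-GB costs copied from the reference table above (milliseconds per GB).
MS_PER_GB = {
    "l1": 13.0,
    "l2": 13.0,
    "l3": 17.0,
    "main_memory": 24.0,
    "cpu_to_gpu": 30.0,
    "ssd": 360.0,
}

CLOCK_GHZ = 3.8  # cycles per nanosecond, from the CPU clock row


def floor_ms(size_gb: float, tier: str) -> float:
    """Rough lower bound, in ms, for one pass over size_gb at the given tier."""
    return size_gb * MS_PER_GB[tier]


def cycles_in(ms: float) -> float:
    """Number of CPU cycles that fit in `ms` milliseconds at CLOCK_GHZ."""
    return ms * 1e-3 * CLOCK_GHZ * 1e9


if __name__ == "__main__":
    # The table's own example: a 3 GB movie read from SSD.
    print(f"3 GB from SSD:   ~{floor_ms(3, 'ssd'):.0f} ms")
    # A hypothetical 0.5 GB working set scanned once from main memory.
    print(f"0.5 GB from RAM: ~{floor_ms(0.5, 'main_memory'):.0f} ms")
    print(f"cycles in 1 ms:  {cycles_in(1):,.0f}")
```

If the measured time is many multiples of the floor, there is probably still room to optimize; if it is close, the remaining wins have to come from doing less work rather than doing it faster.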

Always benchmark

You can’t optimize what you can’t measure. Establish a baseline before you touch any code, then re-run the benchmark each time you make a change. Always write the results down, ideally in a markdown file next to the code so that they’re easy to reference.
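One way to make the baseline-then-rerun habit cheap is a tiny harness that times the operation and appends a row to a markdown file. This is a minimal sketch, not a full benchmarking tool: `workload` is a hypothetical stand-in for the real operation, and `BENCHMARKS.md` is an assumed file name.

```python
import statistics
import time
from datetime import datetime, timezone
from pathlib import Path


def bench(fn, repeats: int = 5) -> float:
    """Run fn() `repeats` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def record(notes: Path, label: str, ms: float) -> None:
    """Append one result row to a markdown table, creating the file if needed."""
    if not notes.exists():
        notes.write_text("| when (UTC) | change | median ms |\n|---|---|---|\n")
    when = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with notes.open("a") as f:
        f.write(f"| {when} | {label} | {ms:.3f} |\n")


def workload() -> int:
    # Hypothetical stand-in for the real end-to-end operation.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    # First run records the baseline; later runs get a label describing the change.
    record(Path("BENCHMARKS.md"), "baseline", bench(workload))
```

The median is used here because a single slow run (a cold cache, a noisy neighbor) would skew a mean; that is a design choice, and the minimum is another reasonable option.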

End-to-end benchmarks are a must. You MUST have end-to-end benchmarks. Inner benchmarks can also be useful, especially at the beginning when you’re figuring out your initial approach, but be wary of changes that just move the work from one place to another rather than actually making anything faster.
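A simple way to catch work that has merely moved is to report the inner timings next to the end-to-end total in the same run. The sketch below assumes a made-up three-phase `handle_request`; the phase names and the `timed` helper are illustrative, not from any particular library.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock milliseconds per named phase.
timings = {}


@contextmanager
def timed(name: str):
    """Add the time spent inside the block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timings[name] = timings.get(name, 0.0) + elapsed_ms


def handle_request(n: int) -> int:
    # Hypothetical three-phase pipeline standing in for real work.
    with timed("total"):
        with timed("load"):
            data = list(range(n))
        with timed("transform"):
            data = [x * 2 for x in data]
        with timed("write"):
            result = sum(data)
    return result


if __name__ == "__main__":
    handle_request(1_000_000)
    for name, ms in timings.items():
        print(f"{name:>10}: {ms:8.2f} ms")
```

If a change makes `transform` faster but `total` stays flat, the cost has most likely shifted into another phase, and the end-to-end number is the one to trust.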