Computer Latency at a Human Scale

By David Jeppesen

How fast are computer systems really? Those of us who work in technology can blithely rattle off the clock speeds of processors, but what do those numbers mean? For example, each core of a server based on the Intel Xeon processor E5 v4 family might perform 2.4 billion clock ticks—and might execute many times that number of instructions—in the second or so that it takes to read and register this fact.

But these are numbers so gargantuan as to be completely outside of human experience. Let’s rescale things a bit to get a visceral sense of these machines we use every day.

In his book Systems Performance: Enterprise and the Cloud, performance architect and author Brendan Gregg scaled the relative time that different computer operations took by making a single CPU cycle equivalent to one second.[1] The book is a few years old, so here is a table with a couple of updates based on the clock speed of the Intel Xeon processor E5 and with some additional enterprise storage technologies thrown in:[2]

| System Event | Actual Latency | Scaled Latency |
| --- | --- | --- |
| One CPU cycle | 0.4 ns | 1 s |
| Level 1 cache access | 0.9 ns | 2 s |
| Level 2 cache access | 2.8 ns | 7 s |
| Level 3 cache access | 28 ns | 1 min |
| Main memory access (DDR DIMM) | ~100 ns | 4 min |
| Intel Optane memory access | <10 μs | 7 hrs |
| NVMe SSD I/O | ~25 μs | 17 hrs |
| SSD I/O | 50–150 μs | 1.5–4 days |
| Rotational disk I/O | 1–10 ms | 1–9 months |
| Internet call: San Francisco to New York City | 65 ms[3] | 5 years |
| Internet call: San Francisco to Hong Kong | 141 ms[3] | 11 years |
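The rescaling behind the table is simple division: each actual latency is divided by the length of one CPU cycle, and the result is read as seconds. Here is a minimal Python sketch of that arithmetic. The function names, the 30-day month, and the rounding are my own assumptions for illustration, not anything taken from Gregg's book, so the output will differ slightly from the rounded values in the table.

```python
# Rescale real latencies so that one CPU cycle lasts one second.
CYCLE_NS = 0.4  # length of one CPU cycle in nanoseconds, from the table above

# (label, seconds per unit), largest first.
# Assumption: a month is approximated as 30 days and a year as 365 days.
UNITS = [
    ("years", 365 * 24 * 3600),
    ("months", 30 * 24 * 3600),
    ("days", 24 * 3600),
    ("hrs", 3600),
    ("min", 60),
]

# Actual latencies in nanoseconds, taken from the table above.
# For the "<10 μs" entry, the bound itself is used.
EVENTS_NS = {
    "One CPU cycle": 0.4,
    "Level 1 cache access": 0.9,
    "Level 2 cache access": 2.8,
    "Level 3 cache access": 28,
    "Main memory access (DDR DIMM)": 100,
    "Intel Optane memory access": 10_000,
    "NVMe SSD I/O": 25_000,
    "Internet call: San Francisco to New York City": 65_000_000,
    "Internet call: San Francisco to Hong Kong": 141_000_000,
}


def scaled_seconds(actual_ns, cycle_ns=CYCLE_NS):
    """Return the human-scale duration, in seconds, of a latency given in ns."""
    return actual_ns / cycle_ns


def humanize(seconds):
    """Format a duration in seconds using the largest unit that fits."""
    for label, size in UNITS:
        if seconds >= size:
            return f"{seconds / size:.1f} {label}"
    return f"{seconds:.0f} s"


if __name__ == "__main__":
    for name, ns in EVENTS_NS.items():
        print(f"{name}: {humanize(scaled_seconds(ns))}")
```

Changing `CYCLE_NS` to match a different clock speed rescales every row at once, which is essentially how the table was updated from the original.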

As with any performance figures, there is room to quibble over the specifics; such numbers are highly sensitive to test configurations and methodologies. The important thing is that they are accurate to within an order of magnitude, which viscerally highlights some important realities about system latency. (Note: for latency figures that span a range of values, like SSD and HDD I/O, I graphed the mean value below.)

[Figure: LatencyBlog (graph of the latency figures from the table above)]

As we move farther from the CPU, latency rises, but it doesn’t do so smoothly. Moving from memory to storage (even fast storage like Intel Optane or SSDs with NVMe) is a huge performance hit. The move from solid-state storage to spinning-disk storage is likewise huge, as is moving from disks of any kind to Internet calls. This is to be expected, but the scale of these increases in latency is all the clearer when laid out in approachable terms.

This brings up a more important point. As you build systems or applications, what are you trying to optimize? We all know that application calls to storage are slower than working in memory. But if, for example, your application uses microservices—if you are essentially turning program function calls into network calls—the latency difference between memory and storage pales in comparison to the latency that network and Internet calls will introduce.[4] It’s the same story if your app depends on cloud-based data. If your application depends on network or Internet calls that take 10 or 20 times longer than even the slowest calls to storage, optimizing your code to work in memory (or at least to minimize storage access) could be wasted effort. You might be better served figuring out how to bring new services to market faster instead.
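To put rough numbers on that, here is a small Python sketch of a latency budget for one request. The workload is hypothetical (one cross-country Internet call plus ten SSD reads, all performed sequentially), and the function names are made up for illustration; only the two latency figures come from the table above.

```python
# Latency figures from the table above, converted to milliseconds.
NETWORK_CALL_MS = 65.0  # Internet call, San Francisco to New York City
SSD_READ_MS = 0.150     # upper end of the SSD I/O range (150 μs)


def request_latency_ms(network_calls, ssd_reads):
    """Total latency of a hypothetical request whose calls run one after another."""
    return network_calls * NETWORK_CALL_MS + ssd_reads * SSD_READ_MS


def savings_fraction(before_ms, after_ms):
    """Fraction of the original latency removed by an optimization."""
    return (before_ms - after_ms) / before_ms


# Hypothetical workload: 1 network call and 10 SSD reads per request.
baseline = request_latency_ms(network_calls=1, ssd_reads=10)
# Same request after eliminating every storage access (the best possible case).
no_storage = request_latency_ms(network_calls=1, ssd_reads=0)

print(f"baseline:      {baseline:.1f} ms")
print(f"no storage:    {no_storage:.1f} ms")
print(f"latency saved: {savings_fraction(baseline, no_storage):.1%}")
```

Under these assumptions, removing storage access entirely trims only a few percent off the request, because the single network call dominates the total. A different mix of calls would shift the balance, so it is worth plugging in your own counts.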

As IT researcher Gene Kim put it in his novel The Phoenix Project, “improvements made anywhere besides the bottleneck are an illusion.”[5] Because of the scale of modern computing, we often only have an intellectual understanding of where our real system bottlenecks lie. By looking at computer latency at the human scale, we can begin to recognize our bottlenecks more viscerally and thus make faster, better decisions and avoid illusions.

For more insights like these, be sure to follow us on Twitter (@ProwessConsult) or check out some of our other blog posts.

[1] Gregg, Brendan. “Systems Performance: Enterprise and the Cloud.” March 2015. www.brendangregg.com/sysperfbook.html. A CPU cycle refers to a single tick of a processor’s internal clock. It is during the ticks of this clock that processors work their way through the pipeline of instructions awaiting computation.

[2] Some modifications in this table are based on: Intel. “Memory Performance in a Nutshell.” June 2016. https://software.intel.com/en-us/articles/memory-performance-in-a-nutshell.

[3] AT&T. “Network Latency.” May 2017. http://ipnetwork.bgtmo.ip.att.net/pws/network_delay.html.

[4] Hat tip to Nick Humrich’s blog post “Yes, Python is Slow, and I Don’t Care” for this particular example.

[5] Kim, Gene, Kevin Behr, and George Spafford. The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win. Chapter 7.
