Why building your own Deep Learning computer is 10x cheaper than AWS

By Jeff Chen

Gorgeous interiors of your Deep Learning Computer
If you’ve used, or are considering, AWS/Azure/GCloud for Machine Learning, you know how crazy expensive GPU time is. And turning machines on and off is a major disruption to your workflow. There’s a better way. Just build your own Deep Learning Computer. It’s 10x cheaper and also easier to use. Let’s take a closer look below.
This is part 1 of 3 in the Deep Learning Computer Series. Part 2 is ‘How to build one’ and Part 3 is ‘How to benchmark performance’. Follow me to get the new articles. Leave questions and thoughts in comments below!

The machine I built costs $3k and has the parts shown below. There’s one 1080 Ti GPU to start (you can just as easily use the new 2080 Ti for Machine Learning at $500 more — just be careful to get one with a blower fan design), a 12 Core CPU, 64GB RAM, and 1TB M.2 SSD. You can add three more GPUs easily for a total of four.

$3K of computer parts before tax. You’ll be able to drop the price to about $2k by using cheaper components, which is covered in the next post.

Assuming your 1 GPU machine depreciates to $0 in 3 years (very conservative), the chart below shows that within the first year of use it's up to 10x cheaper, including electricity costs. Amazon discounts pricing for multi-year contracts, so the advantage there is 4–6x. If you are shelling out tens of thousands of dollars for a multi-year contract, you should seriously consider building instead at 4–6x less money. The math gets even more favorable for the 4 GPU version, at 21x cheaper within 1 year!

Cost comparisons for building your own computer versus renting from AWS. 1 GPU builds are 4–10x cheaper and 4 GPU builds are 9–21x cheaper, depending on how long you use the computer. AWS pricing includes discounts for 1-year and 3-year leases (35% and 60% off). Power is assumed at $0.20/kWh, with the 1 GPU machine drawing 1 kW and the 4 GPU machine drawing 2 kW. Depreciation is conservatively estimated as linear, with full depreciation in 3 years. Additional GPUs at $700 each, before tax.
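The chart's ratios can be reproduced in a few lines. This is a rough sketch using the article's own assumptions ($3/hour on-demand, roughly 700 GPU-hours/month as a ~$2,100 monthly bill implies, 1 kW draw at $0.20/kWh, linear 3-year depreciation); the function names and structure are mine.

```python
# Rough rent-vs-own cost model for a 1 GPU machine, using the
# article's assumptions (hypothetical framing, not an exact quote).
HOURS = 700  # GPU-hours/month, roughly what a ~$2,100 AWS bill implies

def own_cost(months, build=3000.0, kw=1.0, kwh=0.20, life=36):
    """Linear depreciation of the build over `life` months, plus electricity."""
    depreciation = build * months / life
    electricity = months * HOURS * kw * kwh
    return depreciation + electricity

def rent_cost(months, hourly=3.0, discount=0.0):
    """AWS cost for the same hours, optionally with a reserved-lease discount."""
    return months * HOURS * hourly * (1 - discount)

# 1 year on-demand: roughly the 10x in the chart.
print(round(rent_cost(12) / own_cost(12), 1))                   # ~9.4
# 3 years with a 60% reserved discount: roughly the 4x in the chart.
print(round(rent_cost(36, discount=0.60) / own_cost(36), 1))    # ~3.8
```

Note the model charges the build only for depreciation, not the full purchase price; that matches how the chart treats ownership over time.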

There are some drawbacks: download speeds to your machine are slower because it's not on a datacenter backbone, you'll need a static IP (or dynamic DNS) to access it away from your house, and you may want to refresh the GPUs in a couple of years. But the cost savings are so ridiculous it's still worth it.

If you’re thinking of using the 2080 Ti for your Deep Learning Computer, it’s $500 more and still 4–9x cheaper for a 1 GPU machine.

The reason for this dramatic cost discrepancy is that Amazon Web Services EC2 (or Google Cloud or Microsoft Azure) is expensive for GPUs at $3 / hour or about $2100 / month. At Stanford, I used it for my Semantic Segmentation project and my bill was $1,000. I’ve also tried Google Cloud for a project and my bill was $1,800. This is with me carefully monitoring usage and turning off machines when not in use — major pain in the butt!

Even when you shut your machine down, you still pay $0.10 per GB per month for the machine's storage, so I got charged a couple hundred dollars a month just to keep my data around.

For the 1 GPU $3k machine you build (drawing about 1 kW), you will break even in just 2 months if you are using it regularly. And you still own the computer, which has barely depreciated after 2 months, so building should be a no-brainer. Again, the math gets more favorable for the 4 GPU version (drawing about 2 kW): you'll break even in less than 1 month. (Assumes power costs $0.20/kWh.)
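The break-even point follows directly from those numbers; here is a minimal check (variable names are mine, and the 700 hours/month figure is my reading of the ~$2,100 monthly bill at $3/hour):

```python
# Months until cumulative AWS rental matches build cost plus electricity.
build = 3000.0
aws_monthly = 2100.0               # ~$3/hr for most of the month
power_monthly = 700 * 1.0 * 0.20   # 700 h * 1 kW * $0.20/kWh = $140
breakeven_months = build / (aws_monthly - power_monthly)
print(round(breakeven_months, 1))  # ~1.5, i.e. under 2 months
```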

Your $700 Nvidia 1080 Ti performs at 90% speed compared to the cloud Nvidia V100 GPU (which uses next gen Volta tech). This is because Cloud GPUs suffer from slow IO between the instance and the GPU, so even though the V100 may be 1.5–2x faster in theory, IO slows it down in practice. Since you’re using a M.2 SSD, IO is blazing fast on your own computer.

You get more memory with the V100, 16GB vs. 11GB, but if you just make your batch sizes a little smaller and your models more efficient, you’ll do fine with 11GB.

Compared with renting a last-generation Nvidia K80 online (cheaper, at $1/hour), your 1080 Ti blows it out of the water, performing up to 4x faster in training speed. I validated that it's 3–4x faster in my own benchmark (I will show you how to benchmark in a subsequent post). The K80 has 12GB of memory per GPU, a tiny advantage over your 1080 Ti's 11GB.

There's a reason why datacenter GPUs are expensive: providers are not using the GeForce 1080 Ti. Nvidia contractually prohibits the use of GeForce and Titan cards in datacenters, so Amazon and other providers have to use the $8,500 datacenter version of the GPU and charge a lot to rent it out. This is customer segmentation at its finest, folks!

You also need to decide whether to buy a pre-built computer or build your own. Though it's utterly unimaginable to me that an enthusiast would choose to buy instead of build, you'll be happy to know that building is also 40–50% cheaper. Pre-builts cost at least $5k; here are some buying options: this one and that one.

You don't need to buy one. The hard part about building is finding the right parts for machine learning and making sure they all work together, which I've done for you! Physically assembling the computer is not hard: a first-timer can do it in less than 6 hours, a pro in less than 1 hour.

When new-generation hardware comes out each year, prices of last-generation hardware drop stepwise. For example, when AMD released the Threadripper 2 CPUs, the price of the 1920X processor was slashed from $800 to $400. You can take immediate advantage of these drops and keep $$$ in your pocket.

I looked at some of the off-the-shelf builds, and some cannot scale to 4 GPUs or are not optimized for performance. Examples of issues: a CPU without 36+ PCIe lanes, a motherboard that cannot physically fit 4 GPUs, a power supply under 1400W, a CPU with fewer than 8 cores. I will discuss the nuances of part picking in the next post.

You can also make sure the design aesthetic is awesome (I personally find some of the common computer cases hideously ugly), the noise profile is low (some gold-rated power supplies are very loud), and the parts make sense for Machine Learning (a SATA3 SSD reads at 600MB/s, while an M.2 PCIe SSD is a whopping 5x faster at 3.4GB/s).
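To make the SSD numbers concrete, here is a back-of-the-envelope read-time comparison; the 100 GB dataset size is hypothetical, chosen only to illustrate the gap at the quoted sequential rates.

```python
# Time to stream a training set off disk once, at the quoted rates.
dataset_mb = 100 * 1000  # hypothetical 100 GB dataset
for name, mb_per_s in (("SATA3 SSD", 600), ("M.2 PCIe SSD", 3400)):
    minutes = dataset_mb / mb_per_s / 60
    print(f"{name}: {minutes:.1f} min per full read")
# SATA3 takes ~2.8 min vs ~0.5 min for M.2 PCIe
```

In practice random reads and data-loader overhead will narrow or widen this gap, but for sequential streaming the ratio holds.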

In the next post I will discuss how to pick components, avoid common pitfalls, and build your machine. If you want to get a head start, you can use my public parts list with pricing and get going.

Why is expandability important in a Deep Learning Computer?
If you don’t know how much GPU power you’ll need, the best idea is to build a computer with 1 GPU and add more GPUs as you go along.

Why 4 GPUs?
You want to add as many GPUs as you can to amortize the cost of the rest of the system. I was only able to find motherboard/CPU combos that support 4 GPUs at reasonable cost.

Will you help me build one?
Happy to help with questions via comments or email. I also run www.HomebrewAIClub.com; some of our members may be interested in helping.

How can I make my computer even cheaper?
Buy parts off eBay (you can get a 1080 Ti Founders Edition for $600 and a 1920X CPU for $400 as of 09/2018).

How does my computer compare to Nvidia’s $49,000 Personal AI Supercomputer?
Nvidia's Personal AI Supercomputer uses 4 Tesla V100 GPUs, a 20-core CPU, and 128GB of RAM. I don't have one, so I don't know for sure, but the latest benchmarks show a 25–80% speed improvement. Nvidia's own benchmark quotes 4x faster, but you can bet it leans on the V100's unique advantages, such as half-precision, which won't fully materialize in practice. Remember, your machine only costs $4.5k with 4 GPUs, so laugh your way to the bank.
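One way to sanity-check that conclusion is price per unit of speed. This back-of-the-envelope framing is my own, plugging in the article's $4.5k and $49k prices and its 25–80% observed speedup range:

```python
# Price-to-performance of the $49k V100 box vs. the $4.5k DIY build,
# using the article's 25-80% observed speedup range.
diy_price, nvidia_price = 4500, 49000
for speedup in (1.25, 1.80):
    # effective dollars per DIY-machine-equivalent of training speed
    ratio = (nvidia_price / speedup) / diy_price
    print(f"{speedup:.2f}x faster -> still ~{ratio:.1f}x more $ per unit of speed")
```

Even at the top of the speedup range, the V100 box costs roughly 6x more per unit of training throughput.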

I got a lot of help from other articles while researching the build, if you’re interested in reading further I’ve listed them here: Michael Reibel Boesen’s post, Gokkulnath T S’s post, Yusaku Sako’s post, Tim Dettmer’s blog, Vincent Chu’s post, Puget System’s PCIe 16x vs. 8x post, QuantStart’s rent vs. buy analysis, Tom’s Hardware’s article.

Thank you to my friends Evan Darke, Nick Guo, James Zhang, Khayla Sill, and Imogen Grönninger for reading drafts of this.