There seems to be a general consensus in the world of startups that you should build an MVP (minimum viable product) without caring too much about technical scalability. I’ve heard many times that the only priority is to just get the product out there. As long as your business model would work at scale, you’re good. You shouldn’t waste time and money on making a technically scalable product. All you worry about is testing your assumptions, validating the market and gaining traction. Scalability is a concern for later. Unfortunately, this somewhat blind belief has led to some terrible failures. And Pokémon GO reminded us of it.
One person who won’t make this mistake again is Jonathan Zarra, the creator of GoChat for Pokémon GO. The guy who reached 1 million users in 5 days by making a chat app for Pokémon GO fans. Last week, as you can read in the article, he was talking to VCs to see how he could grow and monetize his app. Right after, GoChat went down. A lot of users lost and a lot of money spent. A real shame for a genius move.
The article states that Zarra had a hard time paying for the servers needed to host 1M active users. He never expected to get this many users. He built this app as an MVP, planning to care about scalability later. He built it to fail. Zarra hired a contractor on Upwork to fix a lot of performance issues. The contractor stated that the server costs were around $4,000. Since my calendar says it’s 2016, I assume he isn’t talking about $4,000 of hardware, but $4,000 in monthly or yearly virtual server and traffic costs.
I’ve been designing and building web platforms for hundreds of millions of active users for most of my career. I can say $4,000 is a totally unnecessary amount of money for 1M users in a chat app. Even for an MVP. It means the app’s server tech was designed poorly. It’s not easy to build a cost-efficient, scalable system for millions of monthly users. But it’s also not terribly complicated to have some sort of setup that can handle at least a decent number of users on some cheap servers in the cloud. You just have to take it into account when building the MVP, by making the right choices.
Similarly to GoChat, I also launched a Pokémon GO fan app last week, called GoSnaps. GoSnaps is an app to share Pokémon GO screenshots and images on a map. The Instagram/Snapchat for Pokémon GO. GoSnaps grew to 60k users its first day, 160k users on its second day and 500k unique users after 5 days (which is now). It has 150–200k uploaded snaps now. It has around 1000 concurrent users at any given time. I built image recognition software to automatically check if an uploaded image is Pokémon GO-related, and resizing tools for uploaded images. We run this whole setup on one medium Google Cloud server of $100/month, plus (cheap) Google Cloud Storage for the storage of images. Yes, $100. And it performs well.
Let’s compare GoChat and GoSnaps. Both apps probably fire a lot of requests per second to fetch chats/images within a certain area of the map. This is a geospatial lookup in the database (or search engine), either by a polygon of latitude/longitude locations or by a specific point. We use a polygon and we fire this request every time someone moves the map. These types of queries are heavy operations on a database, especially in combination with sorting or filtering. We get this type of search request hundreds of times per second. GoChat probably did too.
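To make the polygon lookup a bit more concrete, here is a minimal sketch that turns the visible map bounds into a GeoJSON polygon and wraps it in a MongoDB $geoWithin filter. The field name location and both helper names are assumptions for illustration, not the actual GoSnaps code, and the sketch ignores map views that cross the 180° meridian.

```javascript
// Hypothetical helper: convert the visible map bounds into a closed GeoJSON ring.
// GeoJSON uses [longitude, latitude] order, and the first and last points must match.
function boundsToPolygon({ west, south, east, north }) {
  return {
    type: 'Polygon',
    coordinates: [[
      [west, south],
      [east, south],
      [east, north],
      [west, north],
      [west, south], // close the ring
    ]],
  };
}

// Build a MongoDB filter for documents whose `location` field (an assumed name)
// lies inside the polygon. A 2dsphere index on that field speeds this up.
function snapsInBoundsFilter(bounds) {
  return { location: { $geoWithin: { $geometry: boundsToPolygon(bounds) } } };
}
```

With the official NodeJS driver, a filter like this could be passed to something like db.collection('snaps').find(snapsInBoundsFilter(bounds)) every time the map moves, which is exactly why the cost of this query matters so much.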
Unique to GoChat is that it had to fetch and post a lot of chat messages every second. The article about GoChat talks about 600 requests per second for the whole app. Those 600 requests are a combination of map requests and chat messages. These chat messages are small and could/should be done over a simple socket connection, but happen often and have to be distributed to other chatters. This is manageable with the right setup, but disastrous with a poor, MVP-like setup.
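As a rough sketch of what distributing small messages to other chatters can look like, the code below keeps chatters in in-memory rooms and fans each message out to everyone else in the same room. I have no idea how GoChat actually did this. The ChatHub class, the roomFor grid helper and the use of plain callbacks instead of real sockets are all assumptions for illustration.

```javascript
// Minimal in-memory hub: chatters join a "room" (for example a map grid cell)
// and every message is fanned out to the other members of that room.
class ChatHub {
  constructor() {
    this.rooms = new Map(); // roomId -> Set of send callbacks
  }

  // `send` stands in for a real socket's send method (an assumption here).
  join(roomId, send) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Set());
    this.rooms.get(roomId).add(send);
    // Return a function that removes this chatter again.
    return () => this.leave(roomId, send);
  }

  leave(roomId, send) {
    const members = this.rooms.get(roomId);
    if (!members) return;
    members.delete(send);
    if (members.size === 0) this.rooms.delete(roomId);
  }

  // Deliver a message to everyone in the room except the sender.
  // Returns the number of recipients.
  broadcast(roomId, sender, message) {
    const members = this.rooms.get(roomId);
    if (!members) return 0;
    let delivered = 0;
    for (const send of members) {
      if (send === sender) continue;
      send(message);
      delivered += 1;
    }
    return delivered;
  }
}

// Hypothetical: bucket coordinates into 0.01 degree grid cells to pick a room.
function roomFor(lat, lng) {
  return `${Math.floor(lat * 100)}:${Math.floor(lng * 100)}`;
}
```

The point of the sketch is that a chat message never needs to touch the database on its way to nearby users. It only has to reach the handful of open connections in the same room.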
GoSnaps, on the other hand, has a lot of images being fetched and ‘liked’ every second. The snaps pile up on the server, since old snaps stay relevant. Old chats do not. Since the actual image files are stored in the Google Cloud Storage, the amount of requested image files is not a concern for me as a developer. Google Cloud handles this and I trust Google. But the requested snaps on the map are my concern. GoSnaps has image recognition software that looks for patterns on all uploads to see if an image is Pokémon-related or not. It also resizes the images and sends them to Cloud Storage. These are all heavy operations in terms of CPU and bandwidth. Way heavier than distributing some small chat messages, but less frequent.
My conclusion is that both apps are very similar in terms of scalability complexity. GoChat handles more small messages while GoSnaps handles larger images and heavier server operations. Designing an architecture for each of these apps requires a slightly different approach, but the two are similarly complex.
GoSnaps is built as an MVP, not as a professional business product. It was built entirely in 24 hours. I took a NodeJS boilerplate project for hackathons and used a MongoDB database without any form of caching. No Redis, no Varnish, no fancy Nginx settings, nothing. The actual iOS app was built in native Objective-C code, with some borrowed Apple Maps-related code from Unboxd, our main app. So how did I make it scalable? By not being lazy.
Let’s say I had treated an MVP as solely a race against the clock to build a functional app as quickly as possible, regardless of technical backend quality. Where would I have put my images? In the database: MongoDB. It would require no configuration and almost no code. Easy. MVP. How would I have queried the snaps within a certain area that got the most likes? By just running a plain MongoDB query on the entire pile of uploaded snaps. Just one database query on one database collection. MVP. All of this would have destroyed my app and the app’s future.
Look at the query I would have had to run to get these snaps: “find all snaps within location polygon [A, B, C, D], excluding snaps marked as abuse, excluding snaps that are still being processed, ordered by number of likes, ordered by valid Pokémon GO snaps first and then ordered by newest first”. This works great on a small dataset. Great. MVP. But this would have been totally disastrous under any type of serious load. Even if I had simplified the above query to only include three conditions/sorting operations, it would have been disastrous. Why? Because this is not how a database is supposed to be used. A database should query only on one index at a time, which is impossible with these geospatial queries. You’ll get away with it if you don’t have a lot of users, but you’ll go down once you get successful. Like GoChat.
What did I do instead? After applying the CPU-expensive image recognition and doing resizing, the resized images are uploaded to Google Cloud Storage. This way the server and database don’t get hit for requesting images. The database should worry about data, not images. This saves many servers by itself. On the database side, I separate the snaps into a few different collections: all snaps, most liked snaps, newest snaps, newest valid snaps and so forth. Whenever a snap gets added, liked or marked as abuse, the code checks if it (still) belongs to one of those collections and acts accordingly. This way the code can query from prepared collections instead of running complicated queries on one huge pile of mess. It’s simply separating data logically into some simple buckets. Nothing complicated. But it allows me to query solely on the geospatial coordinates with one sorting operation, instead of a complex query as described above. In simple terms: it makes it straightforward to select data.
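The bucket idea can be sketched in a few lines. This is an in-memory stand-in, not the real GoSnaps code: the bucket names, the like threshold and the snap fields (id, likes, valid, abuse) are assumptions for illustration. In MongoDB each bucket would be its own collection that you can query by location alone.

```javascript
// In-memory stand-in for the prepared collections described above.
// Bucket names and the like threshold are assumptions for illustration.
const MOST_LIKED_THRESHOLD = 5;

// One rule per bucket: does this snap (still) belong in it?
const bucketRules = {
  all: (snap) => !snap.abuse,
  mostLiked: (snap) => !snap.abuse && snap.likes >= MOST_LIKED_THRESHOLD,
  newestValid: (snap) => !snap.abuse && snap.valid,
};

class SnapBuckets {
  constructor() {
    this.buckets = {};
    for (const name of Object.keys(bucketRules)) this.buckets[name] = new Map();
  }

  // Re-check every bucket rule for one snap and add or remove it accordingly.
  sync(snap) {
    for (const [name, belongs] of Object.entries(bucketRules)) {
      if (belongs(snap)) this.buckets[name].set(snap.id, snap);
      else this.buckets[name].delete(snap.id);
    }
  }

  add(snap) { this.sync(snap); }
  like(snap) { snap.likes += 1; this.sync(snap); }
  markAbuse(snap) { snap.abuse = true; this.sync(snap); }

  // Reading is now a lookup in one prepared bucket instead of a compound query.
  ids(name) { return [...this.buckets[name].keys()]; }
}
```

All the filtering work happens once, at write time, when a snap is added, liked or flagged. Reads happen far more often than writes, so they get to stay dumb and cheap.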
How much extra time did I spend on all of this? Maybe 2 to 3 hours. Why did I do this in the first place? Because that’s just the way I set things up. I assume my apps will be successful. There’s no point in building an app assuming it won’t be successful. I would not be able to sleep if my app gained traction and then died due to bad tech. I bake minimum viable scalability principles into my app. It’s the difference between happiness and total panic. It’s what I think should be part of an app MVP.
If I had built GoSnaps with a slower programming language or with a big framework, I would have needed more servers. If I had used something like PHP with Symfony, or Python with Django, or Ruby on Rails, I would now be spending my days fixing slow parts of the app, or adding servers. Trust me, I’ve done it many times before. These languages and frameworks are great in many scenarios, but not for an MVP with a low server budget. This is primarily due to the many layers of code that are usually used for mapping database records to logic, plus unnecessary framework code. It simply hits the CPU too hard. Let me give you an example of how much this actually matters.
As said, GoSnaps uses NodeJS as the backend language/platform, which is generally fast and efficient. I use Mongoose as an ORM to make working with MongoDB straightforward for me as a programmer. I’m not a Mongoose expert by any means and I know the library by itself has a huge codebase. Therefore Mongoose was a red flag. But yeah, MVP. At one point last weekend, our server’s 4 NodeJS processes were running at 90% CPU each, which is unacceptable to me for 800–1000 concurrent users. I realized that it had to be Mongoose doing things with my fetched data. Apparently I simply had to enable Mongoose’s “lean()” function to get plain JavaScript objects instead of magical Mongoose objects. After that change, the NodeJS processes dropped to around 5–10% CPU usage. It reduced the load by roughly 90%. Simply knowing what your code actually does is very important. Imagine having a really heavy library, like Symfony with Doctrine. It would have required a couple of servers with many CPU cores just to execute the code alone, even though the database is supposed to be the bottleneck, not the code.
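For those who haven’t used it, this is roughly what the change looks like. It is a minimal sketch and not the real GoSnaps code: SnapModel is assumed to be a Mongoose model passed in by the caller, and the function name, sort field and limit are made up for illustration. The only part that matters is the lean() call at the end of the chain.

```javascript
// `SnapModel` is assumed to be a Mongoose model; the sort field is hypothetical.
// lean() tells Mongoose to skip building full Mongoose documents and to
// return plain JavaScript objects instead.
function leanSnapQuery(SnapModel, filter, limit = 50) {
  return SnapModel.find(filter)
    .sort({ likes: -1 })
    .limit(limit)
    .lean();
}
```

Mongoose queries are thenables, so a caller could write something like const snaps = await leanSnapQuery(Snap, filter) and send the result straight to the client. The trade-off is that lean results have no Mongoose document methods such as save(), getters or virtuals, which is fine for read-only API responses.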
Choosing a lean and fast language is important for scalability, unless you have a lot of money for servers. Choosing a language with a lot of useful available libraries is even more important, since you want to build your MVP quickly. NodeJS, Scala and Go are good languages that cover both of these requirements. They provide a lot of good tools with a lot of good performance. A language like PHP or Java by itself is not necessarily slow, but is usually used together with large frameworks and codebases that make the application heavy. These languages are great for clean object oriented development and well-tested code, but not for quick and cheap scalability. I don’t want to start a big programming language argument, so let me just state that this is subjective and incomplete. I personally love Erlang and would never use it for an MVP, so all your arguments are invalid.
A few years ago, I co-founded Cloud Games, an HTML5 games publisher. When we started, we were a B2C gaming website focused on the MENA region. We spent a lot of effort on gaining users and reached 1M monthly active users (MAU) after a few months. At the time, I used PHP, Symfony2, Doctrine and MongoDB in a pretty simple and lean setup. I used to work for Spil Games with 200 million MAU, which used PHP at the time and then moved to Erlang. After Cloud Games reached approximately 100,000 MAU, we started to see real server pain with Doctrine and MongoDB due to the huge overhead of these PHP libraries. I did set up MongoDB, indexes and queries the right way, but the servers were having a hard time processing all the code. And yes, I used PHP’s APC cache and so forth.
We needed this cheap setup, since we were a self-funded, early-stage startup. Cloud Games is now doing well and is still based on a cost-efficient NodeJS architecture. We might not have managed to be successful with a more costly tech setup, given that we’ve been through some really tough times as a startup. Designing a low-cost, scalable architecture has been essential for success.
If there’s an opportunity for your app to grow exponentially due to hype or possible media coverage, make sure to consider scalability as part of your MVP. The principles of minimum viable products and scalable tech can coexist. There’s nothing sadder than building a successful app and seeing it fail because of technical issues. Pokémon GO itself has had a lot of issues, but is so unique and hyped that it didn’t matter. Small startups don’t have this luxury. Timing is everything. One million GoChat users and half a million GoSnaps users probably agree with me.
I’ve slightly edited the article, because GoChat is still alive in the Google Play store. The Google Play page says it’s “back 100%” with “over 2 million users”. An iOS version is supposed to return soon.