If NoSQL is the king, MongoDB is surely its crown jewel. With over 15 million downloads and counting, MongoDB is the most popular NoSQL database today, empowering users to query, manipulate and find interesting insights from their data.
Alex Giamas is a Senior Software Engineer at the Department for International Trade, UK. Having worked as a consultant for various startups, he is an experienced professional in systems engineering, as well as NoSQL and Big Data technologies. Alex holds an M. Sc., from Carnegie Mellon University in Information Networking and has attended professional courses in Stanford University. He is a MongoDB-certified developer and a Cloudera-certified developer for Apache Hadoop & Data Science essentials. Alex has worked with a wide array of NoSQL and Big Data technologies, and built scalable and highly available distributed software systems in C++, Java, Ruby and Python.
In this insightful interview with MongoDB expert Alex Giamas, we talk about all things related to MongoDB – from why NoSQL databases gained popularity to how MongoDB is making developers’ and data scientists’ work easier and faster. Alex also talks about his book Mastering MongoDB 3.x, and how it can equip you with the tools to become a MongoDB expert!
- NoSQL databases have grown in popularity over the last decade because they allow users to query their data without having to learn and master SQL.
- MongoDB has grown from being just a JSON data store to become the most popular NoSQL database solution with efficient data manipulation and administration capabilities.
- The sharding and aggregation framework, coupled with document validations, fine-grained locking, a mature ecosystem of tools and a vibrant community of users are some of the key reasons why MongoDB is the go-to database for many.
- Database schema design, data modeling, backup and security are some of the common challenges faced by database administrators today.
- Mastering MongoDB 3.x focuses on these common pain points of the database administrators and shows them how to build robust, scalable database solutions with ease.
NoSQL databases seem to have taken the world by storm, and many people now choose various NoSQL database solutions over relational databases. What do you think is the reason for this rise in popularity?
That’s an excellent question. There are several factors contributing to the rise in popularity for NoSQL databases. Relational databases have served us for 30 years. At some point we realised that the one size fits all model is no longer applicable. While “software is eating the world” as Marc Andreessen has famously written, the diversity and breadth of use cases we use software for has brought an unprecedented specialisation in the level of solutions to our problems. Graph databases, column-based databases and of course document-oriented databases like MongoDB are in essence specialised solutions to particular database problems. If our problem fits the document-oriented use case, it makes more sense to use the right tool for the problem (e.g. MongoDB) than a generic one-size-fits-all RDBMS.
MongoDB is one of the most popular NoSQL databases out there today, and finds application in web development as well as Big Data processing. How does MongoDB aid in effective analytics?
In the past few years we have seen the explosive growth of generated data. 80% of the world’s data has been generated in the past 3 years and this will continue to happen even more in the near future with the rise of IoT. This data needs to be stored and most importantly analysed to derive insights and actions.
The answer to this problem has been to separate the transactional loads from the analytical loads into OLTP and OLAP databases respectively. Hadoop ecosystem has several frameworks that can store and analyse data. The problem with Hadoop data warehouses/data lakes however is threefold. You need experts to analyse data, they are expensive and it’s difficult to get quickly the answers to your questions. MongoDB bridges this gap by offering efficient analytics capabilities. MongoDB can help developers and technical people get quick insights from data that can help define the direction of research for the data scientists working on the data lake. By utilising tools like the new charts or the BI connector, data warehousing and MongoDB are converging.
MongoDB does not aim to substitute Hadoop-based systems but rather complement them and decrease the time to market for data-driven solutions.
You have been using MongoDB since 2009, way back when it was in its 1.x version. How has the database has evolved over the years?
When I started using MongoDB, it was not much more than a JSON data store. It’s amazing how far MongoDB has come in these 9 years in every aspect. Every piece of software has to evolve and adapt to the always changing environment. MongoDB started off as the JSON data store that is easy to setup and use while being blazingly fast with some caveats.
The turning point for MongoDB early in its evolution was introducing sharding. Challenging as it may be to choose the right shard key, being able to horizontally scale using commodity hardware is the feature that has been appreciated the most by developers and architects throughout all these years.
The introduction of aggregation framework was another turning point for MongoDB since it allowed developers to build data pipelines using MongoDB data, reducing time to market. Geospatial related features were there from an early point in time and actually one of MongoDB’s earliest and most visible customers, FourSquare was a vivid user of geospatial features in MongoDB.
Overall, with time MongoDB has matured and is now a robust database for a wide set of use cases. Document validations, fine grained locking, a mature ecosystem of tools around it and a vibrant community means that no matter the language, state of development, startup or corporate environment, MongoDB can be evaluated as the database choice.
There have been of course features and directions that didn’t end up as well as we were originally hoping for. A striking example is the MongoDB MapReduce framework which never lived up to the expectations of developers using MapReduce via Hadoop and has gradually been superseded by the more advanced and more developer-friendly Aggregation framework.
What do you think are the most striking features of MongoDB? How does it help you in your day to day activities as a Senior Software Engineer?
In my day to day development tasks I almost always use the Aggregation framework. It helps to quickly prototype a pipeline that can transform my data to a format that I can then collaborate with the data scientists to derive useful insights in a fraction of the time needed by traditional tools.
Day to day or sprint to the next sprint – what you want from any technology is to be reliable and not get in your way but rather help you achieve the business goals. With MongoDB we can easily store data in JSON format, process it, analyse it and pass it on to different frontend or backend systems without much hassle.
What are the different challenges that MongoDB developers and architects usually face while working with MongoDB? How does your book ‘Mastering MongoDB 3.x‘ help, in this regard?
The major challenge developers and architects face when choosing to work with MongoDB is the database design. Irrespective of whether we come from an RDBMS or a NoSQL background, designing the database such that it can solve our current and future problems is a difficult task. Having been there and struggled with it in the past, I have put emphasis on how someone coming from a relational background can model different relationships in MongoDB. I have also included easy to understand and follow checklists around different aspects of MongoDB.
Backup and security is another challenge that users often face. Backups are many times ignored until it’s too late. In my book I identify all available options and the tradeoffs they come with, including cloud-based options. Security on the other hand is becoming an ever increasing concern for computing systems with data leaks and security breaches happening more often. I have put an emphasis on security both in the relevant chapters and also across most chapters by highlighting common security pitfalls and promoting secure practices wherever possible.
MongoDB has commanded a significant market share in the NoSQL databases domain for quite some time now, highlighting its usefulness and viability in the community. That said, what are the 3 areas where MongoDB can get better, in order to stay ahead of its competition?
MongoDB has conquered the NoSQL space in terms of popularity. The real question is how/if NoSQL can increase its market share in the overall database market.
- The most important area of improvement is interoperability. What developers get with popular RDBMS is not only the database engine itself, but also easy ways to integrate it with different systems, from programming frameworks to Big Data and analytics systems. MongoDB could invest heavier in building these libraries that can make a developer’s life easier.
- Real-time analytics is another area with huge potential in the near future. With IoT rapidly increasing the data volume, data analysts need to be able to quickly derive insights from data. MongoDB can introduce features to address this problem.
- Finally, MongoDB could improve by becoming more tunable in terms of the performance/consistency tradeoff. It’s probably a bit too much to ask from a NoSQL database to support transactions as this is not what it was designed to be from the very beginning, but it would greatly increase the breadth of use cases if we could sparingly link up different documents and treat them as one, even with severed performance degradation.
Artificial Intelligence and Machine Learning are finding useful applications in every possible domain today. Although it’s a database, do you foresee MongoDB going the Oracle way and incorporating features to make it AI-compatible?
Throughout the past few years, algorithms, processing power and the sheer amount of data that we have available have brought a renewed trust in AI. It is true that we use ML algorithms in almost every problem domain, which is why every vendor is trying to make the developer’s life easier by making their products more AI-friendly. It’s only natural for MongoDB to do the same. I believe that not only MongoDB but every database vendor will have to gradually focus more on how to serve AI effectively, and this will become a key part of their strategy going ahead.
Please tell us something more about your book ‘Mastering MongoDB 3.x‘. What are the 3 key takeaways for the readers? Are there any prerequisites to get the most out of the book?
First of all, I would like to say that as a “Mastering” level book we assume that readers have some basic understanding of both MongoDB and programming in general. That being said, I encourage readers to start reading the book and try to pick up the missing parts along way. It’s better to challenge yourself than the other way around. As for the most important takeaways, in no specific order of importance:
- Know your problem. It’s important to understand and analyse as much as possible the problem that you are trying to solve. This will dictate everything, from data structures, indexing, to database design decisions, to technology choices. On the other hand, if the problem is not well defined then this may be the chance to shine for MongoDB as a database choice as we can store data with minimal hassle.
- Be ready to scale ahead of time. Whether that is replication or sharding, make sure that you have investigated and identified the correct design and implementation steps so that you can scale when needed. Trying to add an extra shard when load has already peaked in the existing shards is neither fun, nor easy to do.
- Use aggregation. Being able to transform data in MongoDB before extracting it for processing in an external database is a really important feature and should be used whenever possible, instead of querying large datasets and transforming their data in our application server.
Finally, what advice would you give to beginners who would like to be an expert in using MongoDB? How would the learning path to mastering MongoDB look like? What are the key things to focus on in order to master data analytics using MongoDB?
To become an expert in MongoDB, one should start by understanding its history and roots. They should understand and master schema design and data modelling. After mastering data modelling, the next step would be to master querying – both CRUD and more advanced concepts. Understanding the aggregation framework and how or when to index would be the next step.
With this foundation, one can then move on to cross-cutting concerns like monitoring, backup and security, understanding the different storage engines that MongoDB supports and how to use MongoDB with Big Data.
All this knowledge should then provide a strong foundation to move on to the scaling aspects like replication and sharding, with the goal of providing effective fault tolerance and high availability systems. Mastering MongoDB 3.x explains these topics in this order with the intention of getting from beginner to expert in a structured and easy to follow and understand way.