Coming in MongoDB 4.2

Over the coming weeks we’ll be introducing you to some of the features that will be making it into MongoDB 4.2. For many of you, this will be all the information you need to be ready for when 4.2 goes generally available (GA). This week, we’re looking at Wildcard Indexing.

Wildcard Indexing

One common source of work for a MongoDB admin or developer is working out which fields are, or will be, accessed by their applications and then creating indexes that match that workload. Sometimes that's impossible because the fields in question are part of an unstructured dataset, one where there are different fields in different hierarchies in each document, so there's no way to predict what the index should be.

This is where 4.2's new Wildcard Indexes come in. Think of a Wildcard Index as a filter that automatically matches any field, sub-document or array in a collection and then indexes anything that matches. Only the matches to that filter are added to a sparse index. The filtering is flexible enough to include or exclude specific fields from the index. You can, for example, skip indexing a field that holds a very large string value to lighten the index.
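As a rough illustration of the include/exclude filtering described above, the sketch below builds a wildcard index over a whole collection while leaving one field out, using the wildcardProjection option of createIndex. That option is only accepted when the index key is { "$**": 1 }. The field name attributes.notes and the helper function name are hypothetical, chosen just for this example.

```javascript
// Sketch: create a wildcard index over every field while skipping one
// bulky field. "attributes.notes" is a hypothetical field name.
function createFilteredWildcardIndex(collection) {
  return collection.createIndex(
    { "$**": 1 },                                      // index every field...
    { wildcardProjection: { "attributes.notes": 0 } }  // ...except this one
  );
}

// In the mongo shell: createFilteredWildcardIndex(db.example)
```

Setting a field to 1 instead of 0 in wildcardProjection flips the behaviour, so that only the listed fields are indexed.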

When To Wildcard

Now, before you go Wildcard Indexing your entire database, bear in mind that it is not a replacement for workload-specific indexing on fields. But where you have a polymorphic pattern within your data, that pattern can make for a simpler schema design, and Wildcard Indexes can make searching the polymorphic data faster.

Examples of this type of schema can be found in product catalog, e-commerce, social data and IoT applications. A product catalog will have records with a number of predictable fields for general product data and then product-type specific sub-documents. An e-commerce retailer will have one set of fields for their furnishings, another for clothing and yet another for electrical goods. By applying Wildcard Indexes to the polymorphic parts of the records, the developer can benefit from being able to perform efficient, index-supported queries on those fields.

Another place where polymorphic documents can be common is content management systems (CMS). Each record in a CMS will represent a different type of content. Each one of these will have different needs as to what data is associated with the content, from sources and attributions on text to copyright owners and EXIF data on images. Again, the polymorphic pattern helps us manage this data and Wildcard Indexes will allow us to do index-supported queries.

Even when you’re not handling polymorphic document structures, there are other situations where Wildcard Indexes can come into effective play. Where you have a data warehouse or other pool of data for analytics, creating a Wildcard Index on that data can allow a wider range of ad-hoc queries that benefit from having an index to speed them up.

Covered queries are supported by Wildcard Indexes. If a query's results can be obtained from the data in the index, then MongoDB will use that data, saving a round trip to retrieve the document. Collation is also supported, allowing language-specific rules to be applied to string comparisons within the index.
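To make the covered query case concrete, here is a minimal sketch. It assumes a wildcard index already exists on the attributes field, and it follows the restrictions as we understand them from the MongoDB documentation: the query filters on exactly one indexed field, the projection returns only that field and explicitly excludes _id, and the field never holds an array. The helper function name is hypothetical.

```javascript
// Sketch: a query shaped so that a wildcard index can cover it.
// The filter and the projection name the same single field, and _id is
// explicitly excluded, so no document needs to be fetched.
function coveredColorQuery(collection) {
  return collection.find(
    { "attributes.color": "red" },      // filter on one indexed field
    { _id: 0, "attributes.color": 1 }   // return only that field
  );
}

// In the mongo shell: coveredColorQuery(db.example).explain()
```

Running explain() on the returned cursor is the way to confirm whether the planner actually avoided fetching documents on your own data.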

Unlike other non-tabular databases, MongoDB's Wildcard Indexes are updated synchronously and atomically with changes to the data. That means it avoids the "eventually" indexed scenario offered by many of those other databases and never returns outdated or stale data from the index.

Wildcard Indexes In Practice

Let's begin with some sample data, in this case some book information. The developers started with good intentions - there are some common attributes like color and size, but it wasn't long before arbitrary attributes were being added like "inside bookmarks"…

{ "type":"book", "title":"The Red Book", "attributes": { "color":"red", "size":"large", "inside": { "bookmark":1, "postitnote":2 }, "outside": { "dustcover": "worn" } } }
{ "type":"book", "title":"The Blue Book", "attributes": { "color":"blue", "size":"small", "inside": { "map":1 }, "outside": { "librarystamp": "Local Library" } } }
{ "type":"book", "title":"The Green Book", "attributes": { "color":"green", "size":"small", "inside": { "map":1, "bookmark":2 }, "outside": { "librarystamp": "Faraway Library", "dustcover": "good" } } }

As you can see, we have an attributes field which contains a variable selection of other fields and values. Say we want to find the books which have a value of 2 in the bookmark field. We might use a query like:

db.example.find({ "attributes.inside.bookmark": 2 })

And if we ask MongoDB to explain how it's going to handle that by using explain() we see that it is going to do a collection scan (COLLSCAN), working through every document looking for the field.

> db.example.find({ "attributes.inside.bookmark": 2 }).explain()
{ "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.example", "indexFilterSet" : false, "parsedQuery" : { "attributes.inside.bookmark" : { "$eq" : 2 } }, "queryHash" : "F33E15E9", "planCacheKey" : "F33E15E9", "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "attributes.inside.bookmark" : { "$eq" : 2 } }, "direction" : "forward" }, "rejectedPlans" : [ ] }, "ok" : 1 }

Now let's create a wildcard index. The command looks like this:

> db.example.createIndex({ "attributes.$**": 1 });
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }

The important part is the $** in the index specification. It says, simply, from this field and all its subdocuments, create a wildcard index. As we only need to put a wildcard index on the attributes field, we precede the $** with "attributes." to limit the wildcard index's scope. We haven't specified anything else. There are no hints about the fields and values below attributes. Now if we repeat our query:

> db.example.find({ "attributes.inside.bookmark": 2 }).explain()
{ "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.example", "indexFilterSet" : false, "parsedQuery" : { "attributes.inside.bookmark" : { "$eq" : 2 } }, "queryHash" : "F33E15E9", "planCacheKey" : "92EE47A6", "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "$_path" : 1, "attributes.inside.bookmark" : 1 }, "indexName" : "attributes.$**_1", "isMultiKey" : false, "multiKeyPaths" : { "$_path" : [ ], "attributes.inside.bookmark" : [ ] }, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 2, "direction" : "forward", "indexBounds" : { "$_path" : [ "[\"attributes.inside.bookmark\", \"attributes.inside.bookmark\"]" ], "attributes.inside.bookmark" : [ "[2.0, 2.0]" ] } } }, "rejectedPlans" : [ ] }, "ok" : 1 }

The COLLSCAN is gone, replaced by an index scan (IXSCAN) on our new wildcard index. Internally, for a wildcard index each field that exists within our attributes field has been indexed as a path and its value, similar to a compound index, and there's an entry in the index for every field in the hierarchy. Where the field value is a subdocument, the indexing descends into the subdocument and repeats the process.
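The path-and-value idea described above can be modelled in a few lines of plain JavaScript. This is only a conceptual sketch and not how MongoDB implements wildcard indexes internally: the function name wildcardEntries is made up for illustration, arrays are treated as ordinary values rather than traversed, and only fields holding non-document values produce an entry.

```javascript
// Conceptual model only: this is NOT MongoDB's implementation.
// It flattens everything under `rootField` into { path, value } pairs,
// descending into subdocuments as it goes.
function wildcardEntries(doc, rootField) {
  const entries = [];

  function descend(value, path) {
    const isSubdocument =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isSubdocument) {
      // Subdocument: repeat the process for each field inside it.
      for (const [key, child] of Object.entries(value)) {
        descend(child, path + "." + key);
      }
    } else {
      // Anything else: record the full dotted path with its value.
      entries.push({ path: path, value: value });
    }
  }

  if (doc[rootField] !== undefined) {
    descend(doc[rootField], rootField);
  }
  return entries;
}

// The third sample document from earlier in this post.
const greenBook = {
  type: "book",
  title: "The Green Book",
  attributes: {
    color: "green",
    size: "small",
    inside: { map: 1, bookmark: 2 },
    outside: { librarystamp: "Faraway Library", dustcover: "good" }
  }
};

const entries = wildcardEntries(greenBook, "attributes");

// A query on "attributes.inside.bookmark" equal to 2 becomes a lookup
// for one path and one value among the flattened entries.
const matches = entries.filter(
  (e) => e.path === "attributes.inside.bookmark" && e.value === 2
);
```

In this model the query touches only the flattened entries and never walks the document itself, which mirrors the pair of bounds on $_path and on the field value in the explain() output.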

As the explain() results above show, that makes it simply a matter of scanning the index for the matching path. With a wildcard index, there's no creating multiple indexes and trying to find an exact fit of indexes to match your users' access patterns or variable document structure.

Wrapping Up

Before the availability of Wildcard Indexes, users would either restructure their data to make it somewhat simpler to index, create many indexes at a cost to performance, or add an external search engine to their MongoDB deployments to get this kind of flexibility. Now you can simplify your platform architecture by consolidating all query types against MongoDB 4.2.

You can try wildcard indexes out now by downloading the 4.2 release candidate. Read the guide below first before installing it.

If you are eager and want to explore the very latest in MongoDB technology — and you have a safe testing area away from production — then we’d like to introduce you to MongoDB development releases and release candidates.

A quick guide to how MongoDB is built

One thing we’ve always done at MongoDB is build our next releases out in the open, and the process for MongoDB 4.2 is no different. You may wonder where the alpha, beta or other Greek-lettered variant of 4.2 is, and right now there isn’t one. That’s because we initially build the next version of MongoDB with an odd-numbered minor version number.

So, when we released MongoDB 4.0 we also created MongoDB 4.1 as the development version. Both are being worked on, one with an eye to stability and reliability, one with an eye on building new features. When 4.1 is ready, it becomes a release candidate for 4.2, and after that has been tested in the community, that becomes a stable 4.2 release. Work will then begin on version 4.3. As we write this, we have reached 4.2 release candidate status.

There’s an abundance of new features coming in the next release of MongoDB and they are appearing in the development build as each feature is ready. Your essential reference will be the release notes, which you’ll find on the documentation site under its own version tab. In the current case, that’s the Release Notes for MongoDB 4.2 (Release Candidate). This is a living document and regularly modified - always check it.

Downloading the development build

You can find the development build in the download center. Just select the latest development release in the Version drop down menu and continue as usual, selecting OS, package format and clicking on download.

Giving feedback and reporting issues

If you want to give general feedback on MongoDB 4.2 RC0 and later, head over to the MongoDB User Google Group. For bug reporting, follow these instructions on how to submit a bug to the MongoDB JIRA.