
Mongo and Case-Insensitive Queries
I recently had to analyze an incident where our Mongo servers became completely overloaded and simply could not process requests in a timely-manner.
I began my analysis by downloading the logs from mongodb. Since we use Mongo hosted, I had to get the logs from Atlas: View and Download MongoDB Logs — MongoDB Atlas. I decided to get an entire 24 hours worth of logs for our 8 server mongo cluster, which turned out to be a huge amount of data. Once unzipped, I had nearly 3GB of log data to sift through. Fortunately, I was made aware of mtools, which really helped find turn the raw data into something useful. This command would rather quickly parse through the log file and create a table of the queries along with some useful statistics
mloginfo mongo.log --queries > queries.log
The output would look something like this:
QUERIES
namespace operation pattern count min (ms) max (ms) 95%-ile (ms) sum (ms) mean (ms) allowDiskUse
e3c3fbe7-04d0-430a-9b65-0c527e2240d4-ls.students find {"$or": [{"email": 1}, {"userName": 1}]} 94004 136 148299 77347.7 1158787373 12327.0 None
e3c3fbe7-04d0-430a-9b65-0c527e2240d4-ls.clients find {"loginName": 1} 15967 200 231710 75783.8 214490232 13433.3 None
The namespace is comprised of the database and the collection. The pattern describes in general terms what the query that is being made looks like. Familiarity with your code base will help you locate the places where you make these queries. One thing that is not immediately obvious with these queries are what values we are trying to match against. Examining the code will help answer that. Finally we have the number of times the query was made under count
and then some statistics of how long these queries took. As you can see, during our outage, some of these queries were taking an unacceptable amount of time, which some taking nearly 4 minuts to complete.
Also, it is not obvious whether or not the fields being searched are indexed. Hopefully they are, but do some research to see if you are querying against indexed fields. I used Compass to check the collections and fields and found that we did have these fields indexed, so I next turned to looking at the queries themselves and I found the following usage:
{ "loginName": { "$regex": paginationParams.searchQuery, "$options": "i" } }
The use of regexes caught my eye, so I went to do some research on how a regex query behaved with regards to the index and found this following tidbit.
If the sole reason for using a regex is case insensitivity, use a case-insensitive index instead, as those are faster.
source: Performance Best Practices: Indexing | MongoDB
Case-insensitive queries are best accomplished in Mongo with collation indexes, which gives us the case-insensitive index we need to optimize this use case:
Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.
source: Indexes — MongoDB Manual
You can use a regex query in mongo and still take advantage of the index, but you must pin the regex to the beginning of the string like so and it works better if it not a case-insensitive query:
{ "loginName": { "$regex": /^test/ } }
The case-insensitive indexes are still the preferred solution for case-insensitive searches, but it is good to know that a regex query can be acceptable query from a performance standpoint, because it does give the ability to do partial matches, so if the database collection contained:
{
loginName: "testUser"
}
then this query would find it:
{ "loginName": { "$regex": /^test/ } }
In a small index, a case-insensitive regex might still do an index scan, but this behavior becomes unreliable on larger collection. But I digress. If your queries are being constructed simply to take advantage of case-insensitivity, then a collation index is best.
To construct a collation index with Mongoose, you would make the following call on your database schema:
clientSchema.index({
"loginName": 1
},
{
name: "loginName_collation_1",
collation: {
locale: "en_US",
strength: 1
}
});
source: Mongoose v5.11.17: Schemas
This creates a collation index for loginName
with a locale of en_US
and strength of 1, which is the
Primary level of comparison. Collation performs comparisons of the base characters only, ignoring other differences such as diacritics and case.
See Collation — MongoDB Manual for other strength levels.
With this index in place, you can then make queries against the collection:
db.getCollection('clients').find({ loginName: test })
.collation({ locale: "en_US", strength: 1 })
Appending collation
to query informs the database that you would like to use the collation index that was configured with those parameters with this call. You have to explicitly call the collation you want to use as Mongo allows you to construct different collation indexes for different locales and with different strengths depending on your needs and use-cases. If you didn't specify the collation
then Mongo would either use a regular index, if available, or the query wouldn't use an index at all.
Give this a try if you need to make case-insensitive queries on your data. I think you'll find that using collation indexes will result in much more reliable and performant queries, especially on larger collections. Just be aware of the drawbacks. Moving from a regex queries to one that specifies a collation will mean that you lose any kind partial matching, but if partial matching is a requirement for you, then you probably need a full-blown text search index like Lucene, but that is a topic for another article.
—
You've found yourself on the site of Travis Ennis, a software engineer who lives in Indiana. If you'd like, you can contact me.