Monthly Archives: February 2016

Extending MongoDB findOne() functionality to findAnother()

Given a new MongoDB instance to work with, its often difficult to understand the structure since, unlike SQL, the database is inherently schema-less. A popular initial tool is Variety.js, which finds all possible fields and tells you the types, occurrences and percentages of them in the db.

However, it is often useful to see the contents of the fields as well. For this, you can use findOne(), which grabs what seems to be the last inserted record from the collection and pretty-prints it to the terminal. This is great, but sometimes you want to see more than one record to get a better feel for field contents. In this post I’ll show you how I extended Mongo to do this.

(Assuming you’ve already installed mongo and git – I use  brew install mongo from the homebrew package manager and git came with XCode on my Mac)

First off, let’s get a copy of mongo’s code:

This will have created a folder in your Documents, so let’s inspect it:

There’s quite a lot of files there, it will take a while to find where the findOne() function is located. Github’s repository search returns far too many results so let’s use a unix file search instead. MongoDB is written in JavaScript (and compiled) so we need to look for a function definition like findOne = function( or function findOne( . Try this search:

The two flags here are -l  (lowercase L) which shows filenames, and R  which searches all subdirectories. The fullstop at the end means to search the current directory. You can see it found a javascript file there, “collection.js”. If you open it up in a text editor, the findOne() function is listed:

This code can also be found here on the mongodb github page: https://github.com/mongodb/mongo/blob/master/src/mongo/shell/collection.js#L207

Our function is going to extend findOne instead to find another record. This can be implemented by doing a “find” using exactly the same code, but then skipping a random number of records ahead. The skip amount has to be less than the number of records listed which unfortunately means we have to run the query twice. First to count the number of results, and second to actually skip some amount.

Copy the findOne function and rename it to findAnother, with these lines at the top instead:

  1. Gets a count of the records the query returns (stored in total_records)
  2. Generates a random number in a range from 1 to the count (stored in randomNumber)
  3. Queries again using that number as a skip (stored in cursor)

Generating random numbers in a range is a little obscure in JavaScript but I found a helpful tip on StackOverflow to do it: http://stackoverflow.com/a/7228322 in one line. I’m used to Python’s ultra simple random.randrange().

Let’s test this out first. You’ll notice that all code can be returned in mongo’s javascript shell:

You can actually replace this code live in the terminal, though it won’t be saved once you close it. Try first replacing findOne with a Hello World:

You can test out our new function first in the terminal by copying everything after the first equals. Open a mongoDB shell to your favourite db and type db.my_collection.findOne =  then paste in the function. Try calling it and it should return different results each time.

Let’s patch mongodb now with our function. We have to compile mongo from the source we just downloaded.

  1. Save the collection.js file you just edited with the new findAnother() function
  2. Make sure you have a C++ compiler installed. I’m using gcc, which was installed using  brew install gcc
  3. Check how many cores your processor has. According to Google, my 2.7 GHz Intel i5 has 4 cores.
  4. In the same terminal window (still in the mongo folder), type: scons -j4 mongo  and press enter. The 4  here is the number of cores I have, and seriously speeds things up. scons  is a program that handles compilation, and putting mongo  at the end of this specifies that we only want to patch mongo’s client.

You’ll see a huge amount of green text as scons checks everything, and then:

Its compiled! We now have to replace the existing mongo application with our modified version. Let’s do a unix find:

We want to replace the mongo application in /usr/local/Cellar/ (where brew installed it to). Let’s back it up then copy across:

Now, open up a MongoDB shell to your favourite DB:

 Caveats:

  • This could be really slow since it has to run the query twice. However, typically I don’t add super taxing queries to findOne().
  • This function may be replaced if you reinstall/upgrade mongo with brew – they release new versions fairly frequently.

Improvements

  • A function that acts like a python generator, keeping state and cycling forward until it reaches the end of the record set. This would fix any slowness above, but be less random.