
Extending MongoDB findOne() functionality to findAnother()

Given a new MongoDB instance to work with, it's often difficult to understand the structure since, unlike SQL, the database is inherently schema-less. A popular initial tool is Variety.js, which finds all possible fields and tells you their types, occurrences and percentages in the db.

However, it is often useful to see the contents of the fields as well. For this, you can use findOne(), which grabs what seems to be the last inserted record from the collection and pretty-prints it to the terminal. This is great, but sometimes you want to see more than one record to get a better feel for field contents. In this post I’ll show you how I extended Mongo to do this.

(Assuming you’ve already installed mongo and git – I used  brew install mongo  from the Homebrew package manager, and git came with Xcode on my Mac.)

First off, let’s get a copy of mongo’s code:
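Something like this, run from wherever you keep your code (I ran it from my Documents folder):

  cd ~/Documents
  git clone https://github.com/mongodb/mongo.git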

This will have created a folder in your Documents, so let’s inspect it:
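  cd mongo
  ls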

There are quite a lot of files there, and it would take a while to find where the findOne() function is located. GitHub’s repository search returns far too many results, so let’s use a unix file search instead. The mongo shell’s helpers are written in JavaScript (and compiled into the binary), so we need to look for a function definition like findOne = function( or function findOne( . Try this search:
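For instance, searching for the first of those two patterns:

  grep -lR "findOne = function" .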

The two flags here are -l  (lowercase L), which shows only the filenames of matching files, and -R , which searches all subdirectories. The fullstop at the end means search the current directory. You can see it found a JavaScript file there, “collection.js”. If you open it up in a text editor, the findOne() function is listed:

This code can also be found here on the mongodb github page: https://github.com/mongodb/mongo/blob/master/src/mongo/shell/collection.js#L207

Our new function is going to extend findOne() to find another record instead. This can be implemented by running a “find” with exactly the same code, but then skipping ahead a random number of records. The skip amount has to be less than the number of records returned, which unfortunately means we have to run the query twice: first to count the number of results, and second to actually skip some amount.

Copy the findOne function and rename it to findAnother, with these lines at the top instead:
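As a sketch of the idea (using the shell’s public cursor API here rather than copying findOne()’s internal plumbing, and skipping somewhere between 0 and the count minus one so we never run off the end), the new function boils down to:

  DBCollection.prototype.findAnother = function (query, fields) {
      // 1. count how many records the query matches
      var total_records = this.find(query, fields).count();
      // 2. pick a random number of records to skip
      var randomNumber = Math.floor(Math.random() * total_records);
      // 3. run the query again, skipping that many records
      var cursor = this.find(query, fields).skip(randomNumber).limit(1);
      return cursor.hasNext() ? cursor.next() : null;
  };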

  1. Gets a count of the records the query returns (stored in total_records)
  2. Generates a random number in a range from 1 to the count (stored in randomNumber)
  3. Queries again using that number as a skip (stored in cursor)

Generating random numbers in a range is a little obscure in JavaScript, but I found a helpful StackOverflow tip that does it in one line: http://stackoverflow.com/a/7228322. I’m used to Python’s ultra-simple random.randrange().

Let’s test this out first. You’ll notice that all of this code can be viewed from mongo’s JavaScript shell:
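For example, typing a function’s name without the parentheses prints its source (my_collection being any collection in your db):

  db.my_collection.findOne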

You can actually replace this code live in the terminal, though it won’t be saved once you close it. Try first replacing findOne with a Hello World:
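For example (again, my_collection is just a stand-in):

  db.my_collection.findOne = function () { return "Hello World"; };
  db.my_collection.findOne();   // returns "Hello World" until you close the shell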

You can test out our new function in the terminal by copying everything after the first equals sign. Open a MongoDB shell to your favourite db, type db.my_collection.findOne =  and then paste in the function. Try calling it and it should return different results each time.

Let’s patch mongodb now with our function. We have to compile mongo from the source we just downloaded.

  1. Save the collection.js file you just edited with the new findAnother() function
  2. Make sure you have a C++ compiler installed. I’m using gcc, which was installed using  brew install gcc
  3. Check how many cores your processor has. According to Google, my 2.7 GHz Intel i5 has 4 cores.
  4. In the same terminal window (still in the mongo folder), type  scons -j4 mongo  and press enter. The 4  here is the number of cores I have, which seriously speeds things up. scons  is the program that handles compilation, and putting mongo  at the end specifies that we only want to build mongo’s shell client.

You’ll see a huge amount of green text as scons checks everything, and then:

It’s compiled! We now have to replace the existing mongo application with our modified version. Let’s do a unix find:
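For example (limiting the search to where brew installs things):

  find /usr/local -name mongo -type f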

We want to replace the mongo application in /usr/local/Cellar/ (where brew installed it to). Let’s back it up then copy across:
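Something along these lines, with <version> standing in for whatever mongodb version brew installed, and ./mongo being the freshly built binary in the source folder:

  cp /usr/local/Cellar/mongodb/<version>/bin/mongo /usr/local/Cellar/mongodb/<version>/bin/mongo.bak
  cp ./mongo /usr/local/Cellar/mongodb/<version>/bin/mongo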

Now, open up a MongoDB shell to your favourite DB:
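Then call the new function (my_collection here is just a stand-in for one of your collections):

  db.my_collection.findAnother()

Run it a few times and you should get a different record back each time.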

 Caveats:

  • This could be really slow since it has to run the query twice. However, typically I don’t add super taxing queries to findOne().
  • This function may be replaced if you reinstall/upgrade mongo with brew – they release new versions fairly frequently.

Improvements

  • A function that acts like a python generator, keeping state and cycling forward until it reaches the end of the record set. This would fix any slowness above, but be less random.

Converting an HTML table to an Excel download HTTP Response: A hack for slow OLAP DB connections

Overview

Zenko (“Good fox” in Japanese) is a reporting system (see code on Github here) I’ve created over the last couple of weeks at Mozilla. Basically my non-technical coworkers were getting so frustrated by Tableau (“what the heck is the difference between INNER JOIN and OUTER JOIN?”) that I decided to create a simple dashboard interface for them.

It’s a simple Bootstrap front-end to a database containing campaign stats for sponsored tiles. You can drill down to each tile or client/partner and pivot by things like locale, country and date.

Zenko’s stack (high to low)

A new feature

When loading one of the analysis pages, a table is shown. My coworker wanted to be able to download the data to Excel. I came up with 4 possible ways to implement this:

  1. Simply rerun the query, format the results as a csv on the backend, save it and window.open() the file location.
  2. Automatically save the data from each analysis server request and periodically clear old files.
  3. Use a javascript library like ExcelBuilder
  4. Send the data back to the server, format it, and then back to the client via an iframe

Which is the best solution?

  1. This is problematic because our sticking point is the query speed. The Redshift database is an OLAP column-oriented database, and append-only. This means that it is insanely fast to add data to, but quite slow (often 6+ seconds) to query. Yes, it is dealing with billions of rows, so that’s excusable, but it’s not so great in terms of user experience: the user doesn’t want to wait another 6 seconds for the analysis to rerun when they have the data already.
  2. This sounds like it could just end up storing a lot of data on the client, but it could work quite well. In terms of security though, I’m not sure that the data should be lingering on the user’s PC unrequested.
  3. This didn’t work out so well – in Firefox, the file is incorrectly named. In the future, I’d like to name the files according to the parameters of the analysis e.g. <client>-<date>-<country>.xls
  4. This is the weirdest solution, but it works! Flask is running locally so it is actually very fast. There are no huge jQuery/JavaScript complications with file permissions, and the fact that you can manipulate the data easily on the server is nice too.

Solution 4

The process is as follows when the “Download for Excel” button is clicked:

  1. Reference the HTML table using JavaScript and convert it to an array of arrays
  2. Append an iframe to the DOM
  3. Append a form with a POST action and hidden field to the iframe
  4. Insert the table contents into the hidden field’s value
  5. Submit the form
  6. Let Flask receive the POST request and format the information as a CSV
  7. Return an HTTP response with a file attachment containing the CSV

Let’s implement it

There were various ways to do this in jQuery with an iterable’s .each() , but I ran into complications, and simply referencing cells using .children was much easier.
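Here’s a sketch of the client-side half. The table id (“results”) and the hidden field name (“table_data”) are placeholders of mine, and I use JSON.stringify to keep the sketch short, whereas the real code stringifies the array with a plain delimiter (see “Future features” below), but the iframe-and-form shape is the same:

  function downloadForExcel() {
      // 1. convert the HTML table to an array of arrays of cell text
      var rows = document.getElementById("results").rows;
      var data = [];
      for (var i = 0; i < rows.length; i++) {
          var cells = rows[i].children;   // the <th>/<td> elements
          var rowData = [];
          for (var j = 0; j < cells.length; j++) {
              rowData.push(cells[j].textContent);
          }
          data.push(rowData);
      }

      // 2. append a hidden iframe to the DOM
      var iframe = document.createElement("iframe");
      iframe.style.display = "none";
      document.body.appendChild(iframe);

      // 3. append a form with a POST action and a hidden field to the iframe
      var doc = iframe.contentDocument;
      var form = doc.createElement("form");
      form.method = "POST";
      form.action = "/download_excel";
      var field = doc.createElement("input");
      field.type = "hidden";
      field.name = "table_data";

      // 4. insert the table contents into the hidden field's value
      field.value = JSON.stringify(data);
      form.appendChild(field);
      doc.body.appendChild(form);

      // 5. submit the form; Flask takes over from here
      form.submit();
  }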

The (locally running) Flask app will then receive a POST request at /download_excel . Let’s set up the route:
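A sketch of that route, under the same assumptions as the snippet above (a JSON payload in a “table_data” field; the data.csv filename matches what’s mentioned under “Future features”):

  import csv
  import io
  import json

  from flask import Flask, request, Response

  app = Flask(__name__)

  @app.route("/download_excel", methods=["POST"])
  def download_excel():
      # "table_data" is the placeholder field name from the client-side sketch
      rows = json.loads(request.form["table_data"])
      output = io.StringIO()
      csv.writer(output).writerows(rows)
      # returning the CSV as an attachment makes the browser pop a download dialog
      return Response(
          output.getvalue(),
          mimetype="text/csv",
          headers={"Content-Disposition": "attachment; filename=data.csv"},
      )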

Now, when the user clicks the button:

Download Link for Excel

They instantly get:

Download popup

Sorry, I can’t show what it looks like in Excel because the data isn’t public at the moment. All code is however available here on github!

One bizarre thing, however, is that the form doesn’t appear in the inspector (in either Chrome or Firefox):

Invisible in the inspector

Though, you can access it with some fairly lengthy getters:
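For example, something like this reaches it from the console (the exact indices depend on how many iframes and forms are already on the page):

  document.getElementsByTagName("iframe")[0].contentDocument.getElementsByTagName("form")[0]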


Future features 

  • The files could be named something more intuitive than data.csv  – perhaps a combination of various things seen in the URL’s query string
  • Accommodate a table wider than 6 columns. This could be done easily by stringifying the array using a different delimiter such as a “###”.
  • Create an .xls file rather than a CSV, if there is any advantage

Using Web Workers in Firefox Extensions

Web Workers allow you to run code in the background in browsers such as Firefox. This is how to build one into a Firefox Extension, which is slightly different from creating one as normal on a page. The documentation for doing this is basically non-existent, so hopefully you’ll find this useful.

Please make sure you have a development environment set up similar to the one described here in my previous post.

How do workers work?

  • Workers in /data/ are not directly connected to scripts in /lib/
  • However, they can communicate by sending messages to each other
  • These messages are text only, so could contain serialized JSON, but nothing else
  • You’ll notice below that we are basically just slinging messages between two scripts

The code for the worker

Navigate to the /data/ directory and create a file called hello_world.js

Now paste the following in there (new users of vim, press i  to start typing and Esc  followed by :wq  to save):
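  // hello_world.js: runs inside the worker
  onmessage = function (event) {
      // event.data is whatever text main.js sent us
      postMessage("Hello " + event.data);
  };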

This says that whenever the worker receives a message from the client, then send a message back with the word “Hello” prepended.

One note here: in workers, you can’t use the useful  console.log("message")  function; use  dump("message")  instead.

Let’s call the worker from the main code

Let’s navigate back to the /lib/  folder and edit the main.js  file, which is the first thing that runs in the extension.

Paste in the following code:
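Something along these lines (the exact logging line may differ, but the plain Worker constructor and the “Matthew” test message are the important parts):

  // main.js: first attempt
  var worker = new Worker("hello_world.js");

  // print whatever the worker sends back
  worker.onmessage = function (event) {
      console.log(event.data);
  };

  // send the worker a message to respond to
  worker.postMessage("Matthew");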

And run  cfx run . You’ll notice a messy error:

Aha! The key line here is:  ReferenceError: Worker is not defined . This is because Firefox Extensions use something called a ChromeWorker instead. We need to import this in main.js by pasting this at the top:
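  // this {notation} is explained in the FAQ at the bottom
  var { ChromeWorker } = require("chrome");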

and changing the line that references the hello_world.js file to call a ChromeWorker instead:
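  var worker = new ChromeWorker("hello_world.js");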

Ok let’s try running it again! Try  cfx run . Wtf another error?!

The key line here is:  Malformed script URI: hello_world.js . This cryptic error is because firefox can’t yet access anything in the /data/  folder. We have to use another part of the SDK to enable access to it.

Open main.js  and put this at the top:
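  // the SDK's self module gives us access to files in /data/
  var self = require("sdk/self");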

Now we can use the function self.data.url() . When you put a filename as the first argument, it will return a string like  resource://jid1-zmowxggdley0aa-at-jetpack/test/data/whatever_file.js which properly refers to it in the case of extensions. Modify the worker import line as follows:
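  var worker = new ChromeWorker(self.data.url("hello_world.js"));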

Now let’s run the extension again using cfx run :

Yay it works! The Worker returned the message “Hello Matthew”.

FAQ

  • What does this {notation}  mean?

It is shorthand for:
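  var ChromeWorker = require("chrome").ChromeWorker;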

Basically this means that  require("chrome") returns an Object, and we just need the value that is referenced by the key “ChromeWorker”. This is a very succinct way of extracting things from JavaScript Objects that will come in handy in the future.

  • Why is Worker now called ChromeWorker? Are we doing something with Google Chrome?

This is a naming coincidence and has nothing to do with Chrome as in the browser. “Chrome” in this case refers to Firefox’s privileged add-on internals.