Monday, June 6, 2011

MongoDB Performance: Group vs. Find vs. MapReduce

Lately, I've been playing a lot with MongoDB's group command. It's a powerful command that lets you write some really interesting queries quickly. Here's an example that calculates tag counts for a blog site:
app.get('/find/other', function(req, res) {
  // reduce is called for each document and accumulates per-tag counts in prev
  var reduce = function(obj, prev) {
    obj.tags.forEach(function(tag) {
      prev.tagCounts[tag] = prev.tagCounts[tag] ? prev.tagCounts[tag] + 1 : 1;
    });
  };
  db.reviews.group({}, {}, {tagCounts: {}}, reduce, function(err, result) {
    // get the tags
    var tags = result[0].tagCounts,
        sorted = [],
        scale = 0,
        tag;
    // collect the tags into an array
    for (tag in tags) {
      sorted.push({
        tag: tag,
        count: tags[tag]
      });
    }
    // sort the tags by count, ascending (the comparator must return a number, not a boolean)
    sorted.sort(function(a, b) {
      return a.count - b.count;
    });
    // figure out the scale (the largest count) and apply it
    scale = sorted[sorted.length - 1].count;
    sorted.forEach(function(tag) {
      tag.num = (tag.count / scale).toFixed(1);
    });
    // randomize the display order
    sorted.sort(function() {
      return 0.5 - Math.random();
    });
    // render tags page
    res.render('find/tags.html', {
      tags: sorted
    });
  });
});


In my tests, group also proved to be about 4x faster than a similar MapReduce. However, it comes with a severe cost: it blocks all reads from the collection while it runs. This is a huge problem and basically makes it worthless for serious queries on a database with, say, hundreds of thousands of users, like the one I work with in my day job. From what I can tell, find, distinct, and mapReduce don't block, so some combination of those should provide a non-blocking alternative.
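As a rough sketch of the find-based route, the counting that the group reduce does on the server can instead be done in application code over documents that a find has already returned. The countTags helper and the sample documents below are hypothetical and are not part of the original code. The sketch assumes each document may carry a tags array.

```javascript
// Count tag occurrences across documents already fetched by a find.
// `docs` is assumed to be an array of objects with an optional `tags` array.
function countTags(docs) {
  var counts = {};
  docs.forEach(function(doc) {
    (doc.tags || []).forEach(function(tag) {
      // hasOwnProperty guards against tags named like built-in properties
      if (Object.prototype.hasOwnProperty.call(counts, tag)) {
        counts[tag] += 1;
      } else {
        counts[tag] = 1;
      }
    });
  });
  return counts;
}

// Hypothetical sample documents standing in for find results.
var sampleReviews = [
  { title: 'a', tags: ['mongodb', 'node'] },
  { title: 'b', tags: ['mongodb'] },
  { title: 'c' }
];

console.log(countTags(sampleReviews)); // { mongodb: 2, node: 1 }
```

The trade-off is that every matching document has to travel to the application before it can be counted, so this is only a starting point for comparison, not a measured result.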

Here's a simple MapReduce example:
// groups users by type
var map = function() {
  emit(this.type, 1);
};
var reduce = function(key, values) {
  var result = 0;
  values.forEach(function(value) {
    result += value;
  });
  return result;
};
var options = {out: {inline: 1}};
db.users.mapReduce(map, reduce, options);
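To make the flow of the map and reduce functions above easier to follow, here is a small plain-JavaScript imitation that runs the same two functions over an in-memory array. This is not how MongoDB implements mapReduce. The runMapReduce helper and the sample users are hypothetical, and the real server may call reduce several times per key or skip it for keys with a single value.

```javascript
var emit; // assigned by runMapReduce; on the server MongoDB supplies emit itself

// Hypothetical in-memory stand-in for a map/reduce pass.
function runMapReduce(docs, map, reduce) {
  var groups = Object.create(null); // key -> array of emitted values
  emit = function(key, value) {
    (groups[key] = groups[key] || []).push(value);
  };
  // map runs once per document, with the document bound to `this`
  docs.forEach(function(doc) {
    map.call(doc);
  });
  // reduce collapses each key's emitted values into a single value
  return Object.keys(groups).map(function(key) {
    return { _id: key, value: reduce(key, groups[key]) };
  });
}

// Same map and reduce as in the example above.
var map = function() {
  emit(this.type, 1);
};
var reduce = function(key, values) {
  var result = 0;
  values.forEach(function(value) {
    result += value;
  });
  return result;
};

// Hypothetical sample data.
var sampleUsers = [{ type: 'admin' }, { type: 'member' }, { type: 'member' }];
var results = runMapReduce(sampleUsers, map, reduce);
console.log(results); // [ { _id: 'admin', value: 1 }, { _id: 'member', value: 2 } ]
```

Each result pairs a key with its reduced value, which is the same shape as the documents an inline mapReduce returns.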


One more note: I recently read MongoDB: The Definitive Guide, which has a lot of examples and clarifications that are not readily available in the online documentation. I highly suggest checking it out!

1 comment:

  1. Worth noting that group blocking appears to be fixed as of 1.9.2 - https://jira.mongodb.org/browse/SERVER-1395
