Wednesday, May 30, 2012

myVidster Embraces NoSQL

If you are a hardcore web techie you have seen the word NoSQL toss around.  In a nutshell its an alternative database management system.  Most websites (including myVidster) are build with mySQL, a traditional relational database management system (RDBMS).  For the most part mySQL is one of the best database solutions out there and will meet most (if not all) of your data needs.

As myVidster user base grows so does its need for more and more data.  When your data gets into millions of stored rows and you are trying to insert 3000+ rows of data every 2 minutes, mySQL faults begin to show.

In my experience mySQL does everything well expect for inserting large amounts of data in a given batch job and your queries can take a performance hit when you are have to sort large amounts of user data.  Most developers will agree with my issue with inserts, but some will argue that proper use of indexes will solve your performance issues when sorting.  I beg to differ.

Lets say you want to sort all user notification by date and return the last 5.  You will put an index on both the user id and the post date.  If the users has 5000 notifications, mySQL will run a filesort on the post date index (all 5000).  Disk I/O starts to become an issue when you are auto refreshing user's notification every minute.  Now if mySQL provided sorted indexes, then fetching the last 5 rows by date would not require a filesort, but sadly mySQL does not provide this feature.

I knew I have reach the limits of what mySQL can do so I decided to look into MongoDB. MongoDB is a NoSQL document oriented storage database.  During my testing MongoDB could insert documents (there like rows in a table) at a rate of 1000+/sec.  That solved my first problem, what about querying large data sets?  MongoDB provides sorted indexes and as long as you define the sort order of your indexes correctly (ascending or descending) MongoDB will not need to perform an nscanned (its like mySQL filesort) on the user's data.  I am also using MongoDB for backing up and achieving data as well.

Am I planning on replacing mySQL with MongoDB?  For now my answer is no, MongoDB does not support table linking and the code rewrite would be massive and the performance gain might not be worth it.  I do see myself using MongoDB for new features that I have in the pipeline.

Well, I think that is enough geek talk for now, take care and keep collecting!