Tips for Scaling a Web App
As many of you know, I’ve been working on things over at iBegin for the past 6 months. One of the things we did was a complete rewrite of our platform, which includes a local business listings directory. While doing this, my goal was to make it as scalable as possible while keeping the caching as simple as possible. I wanted to give everyone a brief rundown of our philosophy and how we’ve done that.
http://www.ibegin.com/directory/us/new-york/new-york/acura-of-manhattan-662-11th-ave/
Obviously, we have a lot of information in the URL. The typical schema here is that a business has a foreign key to a city, which has a foreign key to a state, which has a foreign key to a country. Ouch! Resolving a URL like this against that schema would mean chaining joins across all of those tables. To avoid the relational problems this schema would create, we store the country, state, and city all as foreign key references on each listing. Going further, we also store the slug for each one.
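To make the denormalized layout concrete, here is a minimal, framework-free sketch. The `Listing` class, its field names, and the ID values are assumptions made for illustration, not iBegin's actual model; only the URL shape comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical fields: references to all three levels live on the listing...
    country_id: int
    state_id: int
    city_id: int
    # ...along with a copy of each slug, so building a URL needs no lookups.
    country_slug: str
    state_slug: str
    city_slug: str
    slug: str

    def get_absolute_url(self) -> str:
        # Every URL segment is read straight off the listing itself.
        return (
            f"/directory/{self.country_slug}/{self.state_slug}/"
            f"{self.city_slug}/{self.slug}/"
        )


listing = Listing(
    country_id=1,  # illustrative IDs only
    state_id=1,
    city_id=1,
    country_slug="us",
    state_slug="new-york",
    city_slug="new-york",
    slug="acura-of-manhattan-662-11th-ave",
)
print(listing.get_absolute_url())
```

The trade-off is the usual one with denormalization: reads get cheaper, but every write has to keep the copied values in sync.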
We hook the save method to ensure that if a listing’s city is changed, the related country and state fields are updated to match. Please note that has_changed() is not part of the Django core.
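The idea behind that save hook can be sketched in plain Python. Everything here is an assumption for illustration: the `City` and `Listing` classes, their fields, and in particular this `has_changed()`, which is a hypothetical stand-in that only compares the current city with the one seen at the last save. In a Django project the same logic would sit in the model's overridden `save()` before calling the parent implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    # Hypothetical city record that already knows its parent state and country.
    id: int
    slug: str
    state_id: int
    state_slug: str
    country_id: int
    country_slug: str


class Listing:
    def __init__(self, name, city):
        self.name = name
        self.city = city
        self._saved_city_id = None  # city id as of the last save()
        self.state_id = self.country_id = None
        self.city_slug = self.state_slug = self.country_slug = None

    def has_changed(self, field):
        # Hypothetical helper (not a Django API); it only tracks "city".
        if field != "city":
            raise ValueError("only the 'city' field is tracked in this sketch")
        return self.city.id != self._saved_city_id

    def save(self):
        if self.has_changed("city"):
            # Re-copy the denormalized values from the newly assigned city.
            self.state_id = self.city.state_id
            self.country_id = self.city.country_id
            self.city_slug = self.city.slug
            self.state_slug = self.city.state_slug
            self.country_slug = self.city.country_slug
        self._saved_city_id = self.city.id
        # A real model would write the row to the database here.


new_york = City(1, "new-york", 10, "new-york", 100, "us")
toronto = City(2, "toronto", 20, "ontario", 200, "ca")

listing = Listing("Acura of Manhattan", new_york)
listing.save()
first_state_slug = listing.state_slug

listing.city = toronto
stale_state_slug = listing.state_slug  # not yet synced: save() has not run
listing.save()
```

Until `save()` runs again, the copied fields still describe the old city, which is why the sync has to live in the save path rather than being left to callers.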
In this same example, a typical Django application might have been built with separate single-column indexes on country, state, and city, as they are all foreign key references and that is the standard in Django. Indexes are one of the first things you should look at when optimizing your database. In our situation, we could look up a business listing by country, by state, or by city, but in practice it’s always the country, the country and state, or the country, state, and city, so we can optimize our index here:
INDEX (`country`, `state`, `city`)
Since a composite index is matched left to right, this one index handles all three of the above queries. On our dataset of 12 million records, it takes 1 or 2 milliseconds to return a typical result set.
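The left-to-right behavior is easy to check for yourself. The sketch below uses SQLite from Python's standard library purely as a stand-in (the index definition above is MySQL syntax, and planner details differ between engines); the table layout, column names, and index name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listing ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "country_id INTEGER, state_id INTEGER, city_id INTEGER)"
)
# One composite index in place of three single-column ones.
conn.execute(
    "CREATE INDEX idx_location ON listing (country_id, state_id, city_id)"
)


def plan(where):
    """Return SQLite's query plan text for a lookup with this WHERE clause."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM listing WHERE {where}"
    ).fetchall()
    return " ".join(row[3] for row in rows)  # column 3 is the plan detail


# The three lookups the article describes: each is a leftmost prefix.
prefix_plans = [
    plan("country_id = 1"),
    plan("country_id = 1 AND state_id = 2"),
    plan("country_id = 1 AND state_id = 2 AND city_id = 3"),
]
# Skipping the leading columns means the index cannot be used for the search.
city_only_plan = plan("city_id = 3")

for text in prefix_plans + [city_only_plan]:
    print(text)
```

The first three plans should mention `idx_location`, while the city-only lookup falls back to scanning the table. That is fine for an access pattern like ours, where the city is never queried without its country and state.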
These are just a few of the tricks we use to optimize things at iBegin, but they are some of the most critical. We also use composite primary keys to handle a semi-shared dataset (we have around 13 million businesses listed), a lot of save triggers like the one described above, and many summary tables. However, we have not modified Django’s core for any performance optimizations, and we are able to serve requests in 10-20ms without a problem.