Whether you’re doing master / slave, or partitioning data, when your product gets large enough you’ll need the ability to route data to various nodes in your database. Django (as of 1.2) out of the box provides a pretty cool solution called a Database Router. Here at DISQUS we have a large set of data, and one this of course brings the need to implement some of these fairly standard solutions.
The first solution that many companies will choose is a master / slave setup. This is the most common of all database scaling techniques and is very easy to setup in modern RDBMS solutions. In Django, this also comes very easy with a few lines of code:
1234567
classMasterSlaveRouter(object):"Sends reads to 'slave' and writes to 'default'."defdb_for_write(self,model,**hints):return'default'defdb_for_read(self,model,**hints):return'slave'
Now while this won’t scale very far (if you’re not using a proxy or bouncer, this is a single slave), it also brings a lot of other problems with it. The dreaded replication lag will hit you no matter your size (ever notice Facebook not being in “sync”), and can be fairly difficult to work around. Not going to dive into details here, but there are many ways to lessen visibility of this delay by using caching as well as doing some of your reads off your master nodes.
The other solution I want to talk about is partitioning. We’re going to specifically talk about vertical partitioning, or the act of separating data by purpose. This is another very easy to implement solution which just requires you to move tables to other servers. Again, in Django this is very easy to implement with routers:
1234567
classPartitionByApp(object):"Send reads to an app-specific alias, and writes to the 'default'."defdb_for_write(self,model,**hints):return'default'defdb_for_read(self,model,**hints):returnmodel._meta.app_label
We’re currently working on splitting of a fairly large set of data over here, so we whipped up a little bit more flexible solution using routers. Our needs were simple: assign an app (or a model) to a separate database cluster. Here’s what we came up with:
fromdjango.confimportsettingsclassPrimaryRouter(object):_lookup_cache={}default_read=Nonedefault_write='default'defget_db_config(self,model):"Returns the database configuration for `model`"ifmodelnotinself._lookup_cache:conf=settings.DATABASE_CONFIG['routing']app_label=model._meta.app_labelmodule_name=model._meta.module_namemodule_label='%s.%s'%(app_label,module_name)ifmodule_labelinconf:result=conf[module_label]elifapp_labelinconf:result=conf[app_label]else:result={}self._lookup_cache[model]=resultreturnself._lookup_cache[model]defdb_for_read(self,model,**hints):db_config=self.get_db_config(model)returndb_config.get('read',db_config.get('write',self.default_read))defdb_for_write(self,model,**hints):db_config=self.get_db_config(model)returndb_config.get('write',self.default_write)defallow_relation(self,obj1,obj2,**hints):# Only allow relations if the models are on the same databasedb_config_1=self.get_db_config(obj1)db_config_2=self.get_db_config(obj2)returndb_config_1.get('write')==db_config_2.get('write')defallow_syncdb(self,db,model):db_config=self.get_db_config(model)allowed=db_config.get('syncdb')# defaults to both read and write serversifallowedisNone:allowed=filter(None,[self.db_for_read(model),self.db_for_write(model)])ifallowed:# FIX: TEST_MIRROR passes the mirrored alias, and not the originatingforkinallowed:ifdb==k:returnTrueifdb==settings.DATABASES[k].get('TEST_MIRROR')ork:returnTruereturnFalse
To use this, we simply define a key called routing in our DATABASE_CONFIG.
1234567891011121314151617181920
# Note: this isn't how we partition our models, its just an exampleDATABASE_CONFIG={'routing':{# defaults for all models in forums'forums':{'write':'default','read':'default.slave',},# override for forums.Forum'forums.forum':{'write':'cluster2','read':'cluster2.slave',},# override for forums.Post'forums.post':{'write':'default','read':'default.slave',},},}
A future post will cover how we’ve started moving to a dictConfigurator to make inheritance in many of our settings much easier.