David Cramer's Blog

Python, Django, and Scale.

What Powers Curse

Since we’re having some issues at Curse (hardware now :P) and because our technologies page doesn’t explain everything, here are more technical details.

Software:


  • Apache2 + mod_python for the web servers

  • lighttpd + fcgi (mostly for static files; we had SEVERE issues using fcgi for the entire Django site, weakref-related, and never found a solution)

  • memcached – every web server runs a copy, and I believe we have 2GB allocated per server

  • Squid + lighttpd for managing requests between anonymous and logged in users

  • Sphinx – our full-text search; the code may not reflect the current version (I’ll update it soon), but check out djangosnippets

  • MySQL – until someone can prove us wrong with facts saying PostgreSQL can scale better


Hardware:

  • 5 web servers (8 yesterday): 4-8GB of memory, 2x Intel Core Duos, memcached + apache2

  • 2 static servers (one runs a small adserver script, the other runs our Sphinx search daemon): lighttpd + fcgi

  • 2 media (download) servers (running small Python apps to generate images and similar): lighttpd + fcgi

  • 2 cache servers: squid + lighttpd

  • 2 SQL servers (one is inactive but replicated): 2x Intel Core Duos, MySQL, 6GB of memory (being upgraded :P)

  • 1 dev/deployment server: the stats aren’t anything out of the ordinary – it runs various daemons as well as powering cursebeta.com


What happens under the hood:

  • The cache servers run both lighttpd and squid, forwarding requests from logged-in users past the squid caches.

  • The squid caches then round-robin across the web servers (which has caused extra stress because the hardware doesn’t match; we’re working on changing this now).

  • The web servers rely heavily on memcached for key components. We cache every django.contrib.contenttypes request, every django.contrib.sites request, etc.

  • We have modified various components of Django, with patches similar to what’s linked above, and with changes to select_related and some smaller systems (nothing major).

  • Most accessors to common are cached – the method we use is messy so I’ll post more when we change it.

  • Anonymous sessions are disabled – we’re still trying to find a way to enable them without destroying the website.

  • We use a lot of custom middleware, including a custom internationalization backend that handles our URL schemas and keeps translations in the database (vs compiled files – yes, we know it’s slower)
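
To make the memcached point above a bit more concrete, here is a minimal sketch of the "check the cache first, hit the database only on a miss" idea behind caching lookups like the sites and contenttypes ones. This is not the code Curse runs: `InMemoryCache` is a stand-in for a memcached client, and `cached_lookup`, `load_site_from_db`, `get_site`, the `site:<id>` key format, and the 300-second timeout are all hypothetical names and values picked for illustration.

```python
import time


class InMemoryCache:
    """Stand-in for a memcached client (assumption): only get/set with a TTL."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout=None):
        expires_at = None if timeout is None else time.monotonic() + timeout
        self._store[key] = (value, expires_at)


def cached_lookup(cache, key, loader, timeout=300):
    """Return the cached value for key, calling loader() only on a miss."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, timeout)
    return value


# Records every simulated database query so the saving is visible.
db_hits = []


def load_site_from_db(site_id):
    """Hypothetical loader standing in for a real database query."""
    db_hits.append(site_id)
    return {"id": site_id, "domain": "example.com"}


def get_site(cache, site_id):
    """Look up a site record through the cache."""
    key = "site:%d" % site_id
    return cached_lookup(cache, key, lambda: load_site_from_db(site_id))
```

Calling `get_site(cache, 1)` twice with the same `InMemoryCache` only runs the loader once; the second call is served from the cache. One limitation of this sketch is that a loader returning `None` would never be cached, so a real version would need a sentinel for "cached, but empty".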
