High-Performance Communities: Hardware
This article is the first in a series for community owners who have outgrown a VPS or a large shared hosting environment and need their own server to handle the load without running into bottlenecks - advice from someone who has been there. Since both of my major communities are adult, I created this site in part as a decidedly non-adult repository for what I have learned.
As for my own experience: I run two of the most active communities on the Internet - Elliquiy Adult Roleplaying Forums and the Blue Moon Sexual Role Playing Forum. The splash page of each forum does not involve anything racy, but the images and text inside might not be so gentle on the pure of mind - be forewarned.
As you take up more and more of your host's resources on a machine - whether high-end VPS or high-end shared - you consume more of its I/O bandwidth. On a good host, that shared pool is typically larger than what your first dedicated server would offer - but you are at the mercy of everyone else on the machine, and you will notice, repeatedly. When you or your host get fed up with this, it's time to move to a dedicated machine.
This is a typical 15-minute snapshot from iotop on Elliquiy's server. Around a hundred people and bots hit the site in any given minute, with around thirty of them hitting it every three seconds in the case of the AJAX chat. Averaging over a 'long' window somewhat obscures what is going on: while the typical throughput is ~30 KB/sec read from disk and ~350 KB/sec written, the writes occur in bursts of about 3 MB every ten seconds or so during normal loads, from the binlog thread (the top MySQL entry) and the individual threads making the actual commits to the database (the multitude of MySQL threads below). CPU rarely tops out - most lag occurs because of some form of locking, or simple congestion from too many required writes.
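A snapshot like this can be captured with iotop in batch mode (something like `iotop -boPa -d 5 -n 180` for a 15-minute accumulated run), or the raw throughput can be derived directly from /proc/diskstats. A sketch of the arithmetic - the sample lines below are fabricated to mirror the numbers quoted above, and the field positions follow the standard diskstats layout:

```shell
# Two fabricated /proc/diskstats samples for sda, taken 10 seconds apart.
# Field 6 is sectors read, field 10 is sectors written (512 bytes each).
s1="8 0 sda 12000 300 500000 8000 34000 900 1000000 60000 0 20000 70000"
s2="8 0 sda 12050 302 500600 8100 34900 910 1007000 61000 0 20100 71000"
interval=10   # seconds between the two samples

# Delta in sectors, times 512 bytes, divided into KB/s over the interval.
read_kbs=$(awk -v a="$s1" -v b="$s2" -v t=$interval \
  'BEGIN { split(a,x); split(b,y); print (y[6]-x[6])*512/1024/t }')
write_kbs=$(awk -v a="$s1" -v b="$s2" -v t=$interval \
  'BEGIN { split(a,x); split(b,y); print (y[10]-x[10])*512/1024/t }')

echo "read ${read_kbs} KB/s, write ${write_kbs} KB/s"
# -> read 30 KB/s, write 350 KB/s
```

The same delta-over-interval calculation is all iotop is doing per process; reading diskstats yourself is just handy when you want the whole-device number in a cron job or graph.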
As of this writing, the server is a simple Core 2 Duo machine with four gigs of memory and a pair of 7200 RPM drives on a hardware RAID 1 controller. A few of my peers instead split their disks between binlogs, database, and everything else - one of those performance, security, price, pick-two issues. If I lose a disk, it's irritating, but the array keeps running; my performance is lower for it. If they lose a disk, it's potentially catastrophic.
Regardless, even with this scaled-down hardware, I have few issues supporting three hundred members, thousands of guests, and fifty people in the AJAX shoutbox at the same time, on a total forum size of about three million posts and two million private messages. For me, posts are the primary consumer of RAM, but this is not always the case on other forums. Most communities with three million posts and 3,000+ posts per day will have far more active users, putting that much more strain on the server - although the AJAX chat is a hog all on its own.
This general setup - 4 GB of RAM, a dual-core processor, two disks in RAID 1 - is fairly easy to get inexpensively as a 1U machine. My host sells them on the cheap in their overstock section - you need to ask for the RAID specifically, and possibly bring the other specs up to par, but it's well worth it in my opinion.
Eventually - hopefully - you are going to stretch the limits of this first machine. The specific needs will vary depending on the type of software you use and what your community is oriented around.
There are certainly some easy gains to be made with multiple servers. DNS, mail, backups, static web files, and even Sphinx (for searching) can all be split off for varying boosts to performance, depending on the nature of your community. None of this touches the issues surrounding scaling up dynamic content, however, which is a much harder problem.
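Static files are often the easiest of these wins. A minimal sketch, assuming nginx on the second box and a hypothetical static.example.com subdomain pointed at it:

```nginx
# nginx on the second box: nothing but static assets, cached hard.
server {
    listen 80;
    server_name static.example.com;   # hypothetical subdomain for this box
    root /srv/static;
    expires 30d;                      # let browsers cache aggressively
    access_log off;                   # cut disk writes on this box too
}
```

The main server then references static.example.com in its templates, and every image, stylesheet, and attachment request stops touching the machine your database lives on.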
Think twice before splitting the stack serving your dynamic content rather than getting a larger machine. Once split, the layers communicate via TCP sockets rather than Unix domain sockets, and if your community script is divided across multiple machines, you can no longer use a unified cache like APC or XCache in the case of PHP - both of these will outperform memcached on a per-machine basis. Even then, splitting your script processing across multiple machines only reduces CPU load - it does nothing for your poor database's voracious appetite for disk I/O.
To increase your database server's throughput, you have several options:
- RAID: I never buy a server without RAID 1, personally. A RAID 10 is one option to double write rates, however. Striping more than two disks is asking for some sort of failure to occur during a rebuild, and keep in mind that live rebuilds are extremely slow when there is only one mirror - increasing your risk of catastrophe.
- RAID Controller: A controller with a large battery- or supercapacitor-backed cache can use write-back caching, which can drastically increase performance, since many writes come from the same users modifying the same data.
- High-Performance Disks: 10k and 15k RPM drives, as well as high-performance SSDs supporting TRIM (with an operating system that supports it).
- Multiple Arrays: In particular, assuming you use the binlog, place the binlog on one array, the database on another, and your mostly-static files on a third.
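The multiple-arrays option above comes down to a few lines in the MySQL configuration. A sketch - the mount points are hypothetical placeholders for wherever the separate arrays are mounted:

```ini
# my.cnf sketch: sequential binlog appends get their own spindles,
# so they never contend with the random I/O of the data files.
[mysqld]
log-bin = /binlogs/mysql-bin   # array one: binlog only
datadir = /data/mysql          # array two: database files
```

MySQL needs to be stopped and the existing files moved to the new locations before these settings take effect.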
Taking these steps will carry you a long way in terms of raw capability - in my case, toward supporting thousands of concurrent posters rather than just hundreds. Of course, this is a moving limit - what I am doing with one machine now was barely conceivable a decade ago. Non-volatile storage technology has a lot of room for improvement, after all.
Moving beyond this point - actually moving beyond the simple, single relational database setup - is beyond the scope of this series, but I imagine I will revisit that later after I cross that boundary myself.