There are about a gazillion FAQs and HOWTOs out there that talk about XFS configuration, RAID IO alignment, and mount point options. I wanted to try to put some of that information together in a condensed and simplified format that will work for the majority of use cases. This is not meant to cover every single tuning option, but rather to cover the important bases in a simple and easy to understand way.
Let’s say you have a server with a standard hardware RAID setup running conventional HDDs.
RAID setup
For the sake of simplicity you create one single RAID logical volume that covers all your available drives. This is the easiest setup to configure and maintain and is the best choice for operability in the majority of normal configurations. Are there ways to squeeze more performance out of a server by dividing the logical volumes? Perhaps, but it requires a lot of fiddling and custom tuning to accomplish.
There are plenty of other posts out there that discuss RAID minutiae. Make sure you cover the following:
- RAID type (usually 5 or 1+0)
- RAID stripe size
- BBU enabled with Write-back cache only
- No read cache or read-ahead
- No drive write cache enabled
Partitioning
You want to run only MySQL on this box, and you want to ensure your MySQL datadir is separated from the OS in case you ever want to upgrade the OS, but otherwise keep it simple. My suggestion? Plan on allocating partitions roughly as follows, based on your available drive space and keeping in mind future growth.
- 8-16G for Swap
- 10-20G for the OS (/)
- Possibly 10G+ for /tmp (note you could also point mysql’s tmpdir elsewhere)
- Everything else for MySQL (/mnt/data or similar): symlink /var/lib/mysql into here when you set up MySQL
Are there alternatives? Yes. Can you have separate partitions for InnoDB log volumes, etc.? Sure. Is it worth doing much more than this most of the time? I’d argue not until you’re sure you are I/O bound and need to squeeze every last ounce of performance from the box. Fiddling with how to allocate drives and drive space from partition to partition is a lot of operational work which should be spent only when needed.
Aligning the Partitions
# fdisk -ul

Disk /dev/sda: 438.5 GB, 438489317376 bytes
255 heads, 63 sectors/track, 53309 cylinders, total 856424448 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00051fe9

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048     7813119     3905536   82  Linux swap / Solaris
Partition 1 does not end on cylinder boundary.
/dev/sda2   *     7813120    27344895     9765888   83  Linux
/dev/sda3        27344896   856422399   414538752   83  Linux
- Start with your RAID stripe size. Let’s use 64k which is a common default. In this case 64K = 2^16 = 65536 bytes.
- Get your sector size from fdisk. In this case 512 bytes.
- Calculate how many sectors fit in a RAID stripe. 65536 / 512 = 128 sectors per stripe.
- Get start boundary of our mysql partition from fdisk: 27344896.
- See if the Start boundary for our mysql partition falls on a stripe boundary by dividing the start sector of the partition by the sectors per stripe: 27344896 / 128 = 213632. This is a whole number, so we are good. If it had a remainder, then our partition would not start on a RAID stripe boundary.
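The arithmetic in the steps above can be sketched as a small shell function. This is only an illustration: the name stripe_aligned is hypothetical, and the values passed in are the ones read from the fdisk output and the assumed 64k stripe size.

```shell
#!/bin/sh
# Hypothetical helper: report whether a partition's start sector falls on a
# RAID stripe boundary.
#   $1 = partition start sector (from fdisk -ul)
#   $2 = sector size in bytes
#   $3 = RAID stripe size in bytes
stripe_aligned() {
    start_sector=$1
    sector_bytes=$2
    stripe_bytes=$3
    sectors_per_stripe=$((stripe_bytes / sector_bytes))
    remainder=$((start_sector % sectors_per_stripe))
    if [ "$remainder" -eq 0 ]; then
        echo "aligned: stripe $((start_sector / sectors_per_stripe))"
    else
        echo "misaligned: $remainder sectors past a stripe boundary"
    fi
}

# /dev/sda3 from the fdisk output above, 512-byte sectors, 64k stripe
stripe_aligned 27344896 512 65536
```

For the partition above this prints "aligned: stripe 213632". A start sector of 63, by contrast, would be reported as misaligned.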
Create the Filesystem
XFS requires a little massaging (or a lot). For a standard server, it’s fairly simple. We need to know two things:
- RAID stripe size
- Number of unique, utilized (data-bearing) disks in the RAID. This depends on the RAID type:
- RAID 1+0 is a set of mirrored drives, so the number here is num drives / 2.
- RAID 5 is striped drives plus one full drive of parity, so the number here is num drives - 1.
# mkfs.xfs -d su=64k,sw=4 /dev/sda3
meta-data=/dev/sda3              isize=256    agcount=4, agsize=25908656 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=103634624, imaxpct=25
         =                       sunit=16     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=50608, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
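As a rough sketch of where su and sw come from, and how they relate to the sunit and swidth values that mkfs.xfs reports, the calculation might look like this. The function names data_disks and stripe_blocks are hypothetical, and the 8-drive RAID 1+0 array is an assumption chosen to match sw=4 (a 5-drive RAID 5 would give the same number).

```shell
#!/bin/sh
# Hypothetical helper: number of data-bearing disks for a RAID level.
#   $1 = RAID level (10 or 5), $2 = total number of drives
data_disks() {
    level=$1
    drives=$2
    case "$level" in
        10) echo $((drives / 2)) ;;   # mirrored pairs: half the drives hold unique data
        5)  echo $((drives - 1)) ;;   # one drive's worth of capacity goes to parity
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}

# Hypothetical helper: express su/sw in filesystem blocks, the unit used in
# the mkfs.xfs output.
#   $1 = stripe unit in KB, $2 = sw, $3 = block size in bytes (default 4096)
stripe_blocks() {
    su_kb=$1
    sw=$2
    block_bytes=${3:-4096}
    sunit=$((su_kb * 1024 / block_bytes))
    echo "sunit=$sunit swidth=$((sunit * sw))"
}

# Assumed example array: 8 drives in RAID 1+0 with a 64k stripe
sw=$(data_disks 10 8)
echo "mkfs.xfs -d su=64k,sw=$sw /dev/sda3"   # only prints the command
stripe_blocks 64 "$sw"
```

With these inputs the second line of output is "sunit=16 swidth=64", which matches the data section of the mkfs.xfs output: a 64k stripe unit is 16 blocks of 4096 bytes, and four data disks make the full stripe 64 blocks wide.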
The XFS FAQ is a good place to check out for more details.
Mount the filesystem
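Again, there are many options to use here, but let’s use some simple ones. Keep in mind that nobarrier turns off write barriers, which is only safe when the RAID controller’s write-back cache is battery-backed, as in the RAID checklist above: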
/var/lib/mysql xfs nobarrier,noatime,nodiratime
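One way to confirm the options took effect after mounting is to look the mount point up in /proc/mounts. The sketch below assumes the /var/lib/mysql mount point from the line above; mount_options and has_option are hypothetical helper names, and exactly which options the kernel lists can vary between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: print the option list for a mount point.
#   $1 = mount point, $2 = file in /proc/mounts format (default /proc/mounts)
mount_options() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# Hypothetical helper: succeed if option $1 appears in the comma-separated list $2.
has_option() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the three options used in this post (skipped where there is no /proc/mounts)
if [ -r /proc/mounts ]; then
    opts=$(mount_options /var/lib/mysql)
    for want in nobarrier noatime nodiratime; do
        if has_option "$want" "$opts"; then
            echo "$want: set"
        else
            echo "$want: missing"
        fi
    done
fi
```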
Setting the IO scheduler
This is a commonly missed step in getting IO set up properly. The best choices here are ‘deadline’ and ‘noop’. Deadline is an active scheduler, and noop simply means IO will be handled without rescheduling. Which is best is workload dependent, but in the simple case you would be well-served by either. Two steps here:
echo noop > /sys/block/sda/queue/scheduler # update the scheduler in realtime
And to make it permanent, add ‘elevator=<your choice>’ in your grub.conf at the end of the kernel line:
kernel /boot/vmlinuz-2.6.18-53.el5 ro root=LABEL=/ noapic acpi=off rhgb quiet notsc elevator=noop
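To check which scheduler is actually in effect, you can read the same sysfs file back: the kernel lists the available schedulers on one line and marks the active one in square brackets. A minimal sketch, assuming the device is sda as above (active_scheduler is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: print the active IO scheduler, i.e. the name inside
# square brackets in a line such as "noop anticipatory deadline [cfq]".
#   $1 = scheduler file (default: /sys/block/sda/queue/scheduler)
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "${1:-/sys/block/sda/queue/scheduler}"
}

# Only report when the sysfs file exists on this machine
if [ -r /sys/block/sda/queue/scheduler ]; then
    echo "sda scheduler: $(active_scheduler)"
fi
```

Running this once after the echo and again after a reboot shows whether both the runtime change and the grub.conf setting took hold.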
This is a complicated topic, and I’ve tried to temper the complexity with what will provide the most benefit. What has made the most improvement for you that could be added without much complexity?