Dec
21
2010
--

MySQL 5.5.8 and Percona Server on Fast Flash card (Virident tachIOn)

This is to follow up on my previous post and show the results for MySQL 5.5.8 and Percona Server on the fastest hardware I have in our lab: a Cisco UCS C250 server with 384GB of RAM, powered by a Virident tachIOn 400GB SLC card.

To see different I/O patterns, I used different innodb_buffer_pool_size settings: 13G, 52G, an 144G on a tpcc-mysql workload with 1000W (around 100GB of data). This combination of buffer pool sizes gives us different data/memory ratios (for 13G – an I/O intensive workload, for 52G – half of the data fits into the buffer pool, for 144G – the data all fits into memory). For the cases when the data fits into memory, it is especially important to have big transactional log files, as in these cases the main I/O pressure comes from checkpoint activity, and the smaller the log size, the more I/O per second InnoDB needs to perform.

So let me point out the optimizations I used for Percona Server:

  • innodb_log_file_size=4G (innodb_log_files_in_group=2)
  • innodb_flush_neighbor_pages=0
  • innodb_adaptive_checkpoint=keep_average
  • innodb_read_ahead=none

For MySQL 5.5.8, I used:

  • innodb_log_file_size=2000M (innodb_log_files_in_group=2), as the maximal available setting
  • innodb_buffer_pool_instances=8 (for a 13GB buffer pool); 16 (for 52 and 144GB buffer pools), as it is seems in this configuration this setting provides the best throughput
  • innodb_io_capacity=20000; a difference from the FusionIO case, it gives better results for MySQL 5.5.8.

For both servers I used:

  • innodb_flush_log_at_trx_commit=2
  • ibdata1 and innodb_log_files located on separate RAID10 partitions, InnoDB datafiles on the Virident tachIOn 400G card

The raw results, config, and script are in our Benchmarks Wiki.
Here are the graphs:

13G innodb_buffer_pool_size:

In this case, both servers show a straight line, and it seems having 8 innodb_buffer_pool_instances was helpful.

52G innodb_buffer_pool_size:

144G innodb_buffer_pool_size:

The final graph shows the difference between different settings of innodb_io_capacity for MySQL 5.5.8.

Small innodb_io_capacity values are really bad, while 20000 allows us to get a more stable line.

In summary, if we take the average NOTPM for the final 30 minutes of the runs (to avoid the warmup stage), we get the following results:

  • 13GB: MySQL 5.5.8 – 23,513 NOTPM, Percona Server – 30,436 NOTPM, advantage: 1.29x
  • 52GB: MySQL 5.5.8 – 71,774 NOTPM, Percona Server – 88,792 NOTPM, advantage: 1.23x
  • 144GB: MySQL 5.5.8 – 78,091 NOTPM, Percona Server – 109,631 NOTPM, advantage: 1.4x

This is actually the first case where I’ve seen NOTPM greater than 100,000 for a tpcc-mysql workload with 1000W.

The main factors that allow us to get a 1.4x improvement in Percona Server are:

  • Big log files. Total size of logs are: innodb_log_file_size=8G
  • Disabling flushing of neighborhood pages: innodb_flush_neighbor_pages=0
  • New adaptive checkpointing algorithm innodb_adaptive_checkpoint=keep_average
  • Disabled read-ahead logic: innodb_read_ahead=none
  • Buffer pool scalability fixes (different from innodb_buffer_pool_instances)

We recognize that hardware like the Cisco UCS C250 and the Virident tachIOn card may not be for the mass market yet, but
it is a good choice for if you are looking for high MySQL performance, and we tune Percona Server to get the most from such hardware. Actually, from my benchmarks, I see that the Virident card is not fully loaded, and we may benefit from running two separate instances of MySQL on a single card. This is a topic for another round.

(Edited by: Fred Linhoss)


Entry posted by Vadim |
10 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Dec
13
2010
--

Write performance on Virident tachIOn card

This is crosspost from http://www.ssdperformanceblog.com/.

Disclaimer: The benchmarks were done as part of our consulting practice, but this post is totally independent and fully reflects our opinion.

One of the biggest problems with solid state drives is that write performance may drop significantly with decreasing free space. I wrote about this before (http://www.ssdperformanceblog.com/2010/07/free-space-and-write-performance/), using a
FusionIO 320GB Duo card as the example. In that case, when space utilization increased from 100GB to 200GB, the write performance
dropped 2.6 times.

In this regard, Virident claims that tachIOn cards provide “Sustained, predictable random IOPS – Best in the Industry”. Virident generously provided me model 400GB, and I ran the benchmark using the
same methodology as in my experiment with FusionIO, which was stripped for performance. Also using my script, Virident made runs on tachIOn 200GB and 800GB model cards and shared numbers with me ( to be clear I can certify only numbers for 400GB card, but I do not have reasons to question the numbers for 200GB and 800GB, as they corresponding to my results).

The benchmarks was done on Cisco UCS C250 box and raw results are on Benchmarks Wiki



Visually, the drop is not as drastic as it was in the case using FusionIO, but let’s get some numbers.
I am going to take the performance numbers at the points where the available space of the card is 1/3, 1/2, and 2/3 filled, as well as at the point where the card is full. Then I will compute the ratio of each of those IOS numbers to the IOS at the 1/3 point.

**For the 400GB tachIOn card:**

size Throughput, MiB/sec ratio
130 959.17
200 849.58 1.13
260 685.18 1.40
360 417.33 2.29

That is, at the 2/3 point, the 400GB card is slower by 29% than at the 1/3 point, and at full capacity it is slower by 57%.

Observations from looking at the graph:

* You can also see the card never goes below 400MB/sec, even when working at full capacity. This characteristic (i.e., high throughput at full capacity) is very important to know if you are looking to use an SSD card as a cache layer (say, with FlashCache), as, usually for caching, you will try to fill all available space.
* The ratio between the 1/3 capacity point and full capacity point is much smaller compared with FusionIO Duo (without additional spare capacity reservation).
* Also, looking at the graphs for Virident and comparing with the graphs for FusionIO, one might be tempted to say that Virident just has a lot of space reserved internally which is not exposed to the end user, and this is what they use to guarantee a high level of performance. I checked with Virident and they tell me that this is not the case. Actually from diagnostic info on Wiki page you can see: tachIOn0: 0×8000000000 bytes (512 GB), which I assume total installed memory. Regardless, it is not a point to worry about. For pricing, Virident defines GB as the capacity available for end users. So, a competitive $/GB level is maintained, and it does not matter how much space is reserved internally.

Now it would be interesting to compare results with results for FusionIO. As cards have different capacity I made graph which shows
throughput vs percentage of used capacity for both cards, FusionIO 320GB Duo SLC and Virident 400GB SLC

Util % Duo 320GB tachIOn 400GB advantage percent
20% 1,095 990 90%
30% 1,006 980 97%
40% 825 964 117%
50% 613 872 142%
60% 397 783 197%
70% 308 669 217%
80% 237 611 258%
90% 117 502 429%
100% 99 417 421%

In conclusion:
* On single Virident card I see random write throughput close or over 1GB/sec in with low space usage and it is comparable with throughput I’ve got on stripped FusionIO card. I assume Virident maintain good level of parallelism internally.
* Virident card maintains very good throughput level in close to full capacity mode, and that means you do not need to worry ( or worry less) about space reservation or formatting card with less space.


Entry posted by Vadim |
7 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Jun
14
2010
--

Virident tachIOn: New player on Flash PCI-E cards market

(Note: The review was done as part of our consulting practice, but is totally independent and fully reflects our opinion)

In my talk on MySQL Conference and Expo 2010 “An Overview of Flash Storage for Databases” I mentioned that most likely there are other players coming soon. I actually was not aware about any real names at that time, it was just a guess, as PCI-E market is really attractive so FusionIO can’t stay alone for long time. So I am not surprised to see new card provided by Virident and I was lucky enough to test a pre-production sample Virident tachIOn 400GB SLC card.

I think it will be fair to say that Virident targets where right now FusionIO has a monopoly, and it will finally bring some competition to the market, which I believe is good for the end users. I am looking forward to price competition ( not having real numbers I can guess that vendors still put high margin in the price) as well as high performance in general and stable performance under high load in particular, and also competition in capacity and data reliability areas.

Priceline for Virident tachIOn cards already shows the price competition: oriented price for tachIOn 400GB is 13,600$ (that is 34$/GB) , and entry-base card is 200GB with price 6,800$ (there also is 300GB card in product line). Price for FusionIO 160GB SLC ( from dell.com, price on 14-Jun-2010 ) is 6,308.99$ ( that is 39.5$/GB)

Couple words about product, I know that Virident engineering team was concentrating on getting stable write performance in long running
write activities and in cases when space utilization is close to 100%. As you may know (check my presentation) SSD design requires background
“garbage collector” activity, which requires space to operate and Virident card already has enough space reservation to get stable write performance even when the disk is almost full.

As for reliability, I think, the design of the card is quite neat. The card by itself contains bunch of replaceable flash modules, and each individual module can be changed in case of failure. Also internally modules are joined in RAID (it is fully transparent for end user).

All this guarantees good level of confidence in data reliability: if a single module fails, the internal RAID will allow to continue operations, and after the replacement of module – it will be rebuilt. It still leaves the controller on card as single point of failure, but in this case all flash modules can be safely relocated to the new card with working controller. (Note: It was not tested by Percona engineers, but taken from vendor’s specification)

As for power failures – flash modules also come with capacitors which guarantees data delivery to final media even if power is lost on the main host. (Note: It was not tested by Percona engineers, but taken from vendor’s specification)

Now to most interesting part – performance numbers. I took sysbench fileio benchmark with 16KB blocksize to see what maximal performance we can expect.

Server specification is:

  • Supermicro X8DTH series motherboard
  • 2 x Xeon E5520 (2.27GHz) processors w/HT enabled (16 cores)
  • 64GB of ECC/Registered DDR3 DRAM
  • Centos 5.3 2-6.18.164 Kernel
  • Filesystem is XFS formatted with mkfs.xfs -s size=4096 option ( size=4096, sector size, is very important to have aligned IO requests) and mounted with nobarrier option
  • Benchmark: sysbench fileio on 100GB file, 16KB blocksize

The raw results are available on Wiki

And the graphs for random read, writes and sequential writes:

I think very interesting to see distribution of 95% response time results ( 0 time is obviously the problem in sysbench, which has no enough time resolution for such very fast operations)

As you can see we can get about 400MB/sec random write bandwidth with 8-16 threads and
with response time below 3.1ms (for 8 threads) and 3.8ms (16 threads) in 95% of cases.

As some issue here, I should mention, that despite the good response time results,
the maximal response time in some cases can jump to 300 ms per request, and I was told
it corresponds to garbage collector activity and will be fixed in the production release of driver.

I think it would be fair to get comparison with FusionIO card, especially for write pressure case
As you may know FusionIO recommends to have space reservation to get sustainable write performance
(Tuning Techniques for Writes).

I took FusionIO ioDrive 160GB SLC card, and tested fully formatted card (filesize 145GB), card formatted with 25% space reservation (file size 110GB), and Virident card 390GB filesize. It also allows us to see if Virident tachIOn card can sustain write in fully utilized card.

As disclaimer I want to mention that Virident tachIOn card was fine tuned by Virident engineers, while FusionIO card was tuned only by me and I may not have all knowledge needed for FusionIO tuning.

First graph is random reads, so see compare read performance

As you see in 1 and 4 threads FusionIO is better, while with more threads Virident card scales better

And now random writes:

You can see that FusionIO definitely needs space reservation to provide high write bandwidth, and it comes with
cost hit ( 25% space reservation -> 25% increase $/GB).

In conclusion I can highlight:

  • I am impressed with architecture design with replaceable individual flash modules, I think it establishes new high-end standard for flash devices
  • With single card you can get over 1GB/sec bandwidth in random reads (16-64 working threads), and it is the maximal results what I’ve seen so far ( again for single card)
  • Random write bandwidth exceeds 400MB/sec (8-16 working threads)
  • Random read/write mix results are also impressive, and it can be quite important in workloads like FlashCache, where card have both concurrent read and write pressure
  • Quite stable sequential writes performance (important in question for log related activity in MySQL)

I am looking forward to present results in sysbench oltp, tpcc workload, and also in FlashCahce mode.


Entry posted by Vadim |
15 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com