Jul 14, 2015
--

MongoDB benchmark: sysbench-mongodb IO-bound workload comparison

In this post I’ll share the results of a sysbench-mongodb benchmark I performed on my server. I compared MMAP, WiredTiger, RocksDB and TokuMXse (based on MongoDB 3.0), as well as TokuMX (based on MongoDB 2.4), under an IO-intensive workload.

The full results are available here, and below I’ll just share the summary chart:

MongoDB benchmarks: sysbench-mongodb IO-bound workload

I would like to highlight that this benchmark was designed to emulate a heavy IO load on a (relatively) slow IO subsystem. This use case, I believe, is totally valid and represents frequently used “cloud” setups with limited memory and slow IO.

The WiredTiger engine, being B-Tree based, is expected to perform worse than RocksDB and Toku Fractal Trees, which are designed to handle IO-intensive workloads. My assumption is that WiredTiger will perform better (or even outperform the others) on CPU-intensive, in-memory workloads (see, for example, Mark Callaghan’s results). WiredTiger is also expected to perform better with faster storage.


Mar 15, 2015
--

Deep dive into MySQL’s innochecksum tool

One of our Percona Support customers recently reported that Percona XtraBackup failed with a page corruption error on an InnoDB table. The customer thought it was a problem or a bug in the Percona XtraBackup tool. After investigation we found that an InnoDB page was indeed corrupted and that Percona XtraBackup caught the error as expected, which is why the backup job failed.

I thought this would be an interesting topic and worthy of a blog post. In this article I will describe the innochecksum tool, when and how to use it, and what the possible fixes are if an InnoDB table suffers from page corruption.

The innochecksum tool is an offline tool that prints checksums for InnoDB files. It reads an InnoDB tablespace file, calculates the checksum for each page and reports mismatches, if any. A checksum mismatch is an indication of corrupt pages. Because it is an offline tool, innochecksum can’t be used on a tablespace file that a MySQL server is currently using, so you need to shut down the server before running it. If you try to run innochecksum against a running MySQL server, there is a chance that it will crash or report a bad checksum for a good page, producing false positives: when a tablespace file is open by mysqld, some pages may be dirty and not yet checksummed by the InnoDB storage engine itself.

The point: don’t run innochecksum against a running server.
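
As a minimal sketch of that workflow (the shutdown command and the placeholder paths are assumptions; adjust them to your environment):

$ mysqladmin -u root -p shutdown
$ innochecksum /path/to/datadir/database_name/table_name.ibd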

InnoDB corruption can be caused by many factors (e.g. power loss, faulty hardware, bugs). While reading pages from a tablespace on disk, the InnoDB storage engine validates the calculated checksum against the checksum stored in each page. If InnoDB finds a page checksum mismatch, it will force the MySQL server down.

Let me show you a page corruption error identified by Percona XtraBackup during a backup run, which caused the backup to fail.

[01] xtrabackup: Database page corruption detected at page 25413, retrying...
[01] xtrabackup: Database page corruption detected at page 25413, retrying...
[01] xtrabackup: Database page corruption detected at page 25413, retrying...

First, we need to identify whether the tablespace for that particular table is really corrupted. I do that with the help of the innochecksum utility as shown below. As I mentioned earlier, make sure to shut down MySQL before using the innochecksum tool.

$ innochecksum -p 25413 /path/to/datadir/database_name/table_name.ibd

I passed the -p (page) flag so that innochecksum only checks the specific page that was reported corrupt by Percona XtraBackup. Without any options, the innochecksum tool checks the entire tablespace for corruption, which requires additional server downtime. The innochecksum tool also supports the -d (debug) option to print the checksum for each page and the -v (verbose) parameter to print a progress indicator. You can find more details in the manual. If the tool reports page corruption, then the table is really corrupted, as below.

page 25413 invalid (fails log sequence number check)
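
As an aside, the -d and -v flags mentioned above can be combined for a full pass over the file; a quick illustration, reusing the placeholder path from the earlier command (this scans the whole file, so expect it to take longer):

$ innochecksum -v -d /path/to/datadir/database_name/table_name.ibd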

In order to fix this issue, the first thing you should try is to mysqldump the corrupted table, as sketched below.
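
A minimal dump-and-reload sketch, using hypothetical database, table and file names that mirror the placeholders above:

$ mysqldump database_name table_name > /tmp/table_name.sql
$ mysql database_name < /tmp/table_name.sql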

If mysqldump succeeds, the problem is confined to the secondary indexes of that tablespace: mysqldump only dumps the table rows, and indexes are recreated after all rows are inserted when the dump is reloaded. I would suggest the following options to fix the corruption (a short sketch of the first two follows the list).

— Execute OPTIMIZE TABLE on that table, which rebuilds the indexes. The table will be locked during the operation prior to MySQL 5.6.17; since MySQL 5.6.17, OPTIMIZE TABLE is an online operation.
— Rebuild the table with the pt-online-schema-change tool from Percona Toolkit. This gives the same result as OPTIMIZE TABLE in a non-blocking way, as pt-online-schema-change is an online schema change tool.
— Drop all secondary indexes and then recreate them. The table will be locked for writes only during that operation. Again, you can use the pt-online-schema-change tool for this purpose without sacrificing read/write access to the table while dropping and recreating the indexes.
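
A short sketch of the first two options, with hypothetical database and table names; the pt-online-schema-change invocation uses the documented "null alter" rebuild pattern:

$ mysql -e "OPTIMIZE TABLE database_name.table_name;"
$ pt-online-schema-change --alter "ENGINE=InnoDB" D=database_name,t=table_name --execute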

Finally, I would suggest re-running the innochecksum tool to verify the table’s integrity again, as this will make sure there is no more page corruption. In this case we found that the table was indeed corrupted; fixing the corruption by dumping and reloading the table solved the problem, and Percona XtraBackup ran fine during the next run.

It is possible that mysqldump crashes a MySQL server on a corrupted table. Fortunately, Percona Server provides the innodb_corrupt_table_action variable, which you can enable. The variable is dynamic, which means enabling it doesn’t require a MySQL server restart. Prior to Percona Server 5.6, innodb_corrupt_table_action was known as innodb_pass_corrupt_table. Once you enable this option, you can try mysqldump again. If you are using Oracle MySQL, then I would suggest trying innodb_force_recovery in case mysqldump fails to dump the table contents.
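
A hedged sketch of both approaches; the 'warn' value is my assumption based on the Percona Server 5.6 documentation, and forced recovery requires a restart and should be removed once the dump is done:

# Percona Server: tolerate the corrupt table, then retry mysqldump
$ mysql -e "SET GLOBAL innodb_corrupt_table_action = 'warn';"
# Oracle MySQL: restart with forced recovery, then retry mysqldump
$ mysqld_safe --innodb-force-recovery=1 &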

As a side note, if your backup completes without any errors when performing a backup with Percona XtraBackup, it means your InnoDB tables don’t have any page checksum mismatches or corruption. Percona XtraBackup validates page checksums and, in case of errors, it logs the error and exits, as I mentioned above.
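
For completeness, a backup run like the following sketch (hypothetical credentials and target directory) doubles as a page-checksum scan of your InnoDB data:

$ innobackupex --user=backup --password=secret /data/backups/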

There is also a modified version of innochecksum made available by Facebook’s Mark Callaghan, which can be found in this bug report and provides extra stats on tablespace undo blocks. There is another tool, made by Google’s Jeremy Cole, known as the InnoDB Ruby Tool, for examining the internals of InnoDB.

LIMITATIONS:

  • innochecksum is an offline InnoDB checksum tool. This means you must stop the MySQL server; otherwise it produces an “Unable to lock file” error (as of MySQL 5.7).
  • Old versions of innochecksum only support files up to 2GB in size. Since MySQL 5.7, innochecksum supports files larger than 2GB.
  • The Percona Server variable innodb_corrupt_table_action is supported only for tables that are in their own tablespace (i.e. innodb_file_per_table).
  • If you are using compressed tables (ROW_FORMAT=COMPRESSED), then you must use innochecksum from MySQL 5.7.2 or later, as earlier versions of innochecksum don’t support compressed tables. Check this bug for details.

New Features for the innochecksum tool from MySQL 5.7:

  • As I mentioned above, since MySQL 5.7 innochecksum supports file sizes greater than 2GB.
  • Since MySQL 5.7, you can log the output with the --log option.
  • The --page-type-summary option was added for page type summaries.
  • MySQL 5.7 also includes another nice option, --page-type-dump, which dumps the details of each page to standard output (STDOUT) or standard error (STDERR).
  • Since MySQL 5.7, innochecksum can be executed on multiple user-defined system tablespace files.
  • Since MySQL 5.7, innochecksum can be executed on multiple system tablespace files.

You can read more about this in the MySQL 5.7 manual page for innochecksum.
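
A brief sketch of the 5.7 options mentioned above, with a hypothetical datadir path:

$ innochecksum --page-type-summary /var/lib/mysql/database_name/table_name.ibd
$ innochecksum --log=/tmp/innochecksum.log /var/lib/mysql/database_name/table_name.ibd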

Conclusion:
In this post, we identified InnoDB page corruption using the logs generated by Percona XtraBackup and fixed it by using the mysqldump tool. Unfortunately, Percona XtraBackup will not always fail in the same way when it finds corruption, so in some cases it’s not easy to tell whether Percona XtraBackup has failed due to a bad checksum or a bug of its own. But in most cases, page corruption is the culprit if Percona XtraBackup fails to complete.

To summarize, I would say that Percona XtraBackup is a good way of verifying whether or not InnoDB pages are corrupted – and you can also verify the same thing via the mysqldump utility.


Mar 27, 2014
--

A conversation with 5 Facebook MySQL gurus

Facebook, the undisputed king of online social networks, has 1.23 billion monthly active users collectively contributing to an ocean of data-intensive tasks – making the company one of the world’s top MySQL users.

A small army of Facebook MySQL experts will be converging on Santa Clara, Calif. next week, where several of them are leading sessions at the Percona Live MySQL Conference and Expo. I had the chance to chat virtually with four of them about their sessions: Steaphan Greene, Evan Elias, Shlomo Priymak and Yoshinori Matsunobu. Mark Callaghan, who spoke at Percona Live last year, also joined our conversation, which included a discussion of Facebook’s use of MySQL and other open source technologies.


Tom: What’s Facebook’s view of Open Source?

Steaphan: Facebook was built on open source software, and we still invest heavily in open source today. We understand the power these communities have to drive innovation – they allow us to focus on new challenges, as opposed to reinventing the wheel over and over again. And contributing as much as possible back to open projects is in everyone’s best interest.


Tom: Why MySQL? Wouldn’t NoSQL databases, for example, be better suited for the massive workloads seen at Facebook?

Mark: MySQL is great for many of our important workloads. We make it even better with our expertise in MySQL operations and engineering, and by working with the community and learning from their experience.

Yoshinori: I have not been able to find a transactional NoSQL database better than InnoDB. And it’s easy to understand how MySQL Replication works, which makes it much easier to fix problems in production.


 Tom: How does Facebook make MySQL scale?

Steaphan: Sharding, automation, monitoring, and heavy investment in operations and performance engineering.


 Tom: What other things help Facebook run smoothly?

Steaphan: Our completely open culture, and the freedom all engineers here have to try any idea they have.


Tom: What are the top scaling challenges Facebook faces in 2014 – and beyond?

Mark: Our biggest challenge is to make things better (performance, efficiency, availability) in the future at the rate we made things better in the past.

Yoshinori: Availability has improved a lot so far for us. Come to my session at Percona Live to hear about that. For me personally, efficiency is the biggest challenge for 2014 and 2015. This includes reducing space and optimizing for newer-generation hardware.


Tom: Facebook deployed MySQL 5.6 last year – including on critical environments – long before many other large organizations. What prompted such a move so soon? And were there any major concerns?

Steaphan: The same thing that prompts most efforts on the Facebook infra team: We will consider any technology that will help us improve performance, efficiency, or reliability, and we’re willing to accept the risk that sometimes comes along with adopting things like 5.6 very early on. But that’s only half the story here. The other half is that Facebook encourages its engineers to go after big bets like these — in this case it was just one engineer who made this happen. And we had the MySQL engineering talent we needed to work with the Oracle team to get 5.6 ready for production at our scale.

Yoshinori: At Facebook, we have three MySQL teams: Operations, Performance and Engineering. Facebook is one of the very few MySQL users that has internal MySQL developers. We all worked hard to adapt 5.6 to our scale and ensure that it would be production-ready. We found some issues after the production deployment, but in many cases we could fix the problem and deploy a new MySQL binary within one or two days. When deploying to production, we expected to encounter MySQL 5.6-specific issues, which is typical when releasing new software. We were simply confident that we could fix issues immediately.

Our 5.6 deployment was not done all at once. At the first rollout, we disabled most major 5.6 features, such as GTID and binlog checksums. We gradually enabled those features in production.


Tom: Were there any significant issues in that move to MySQL 5.6? Any lessons learned or best practices you’d like to share?

Yoshinori: A performance regression in CPU-intensive replication was a main blocker for some of our applications. I wrote a blog post about this last year. We have several design plans to fix the problem on the MySQL side. One of the most effective plans is grouping multiple transactions into one, since the most expensive part is writing to the InnoDB system table at transaction commit. This optimization could be done when writing the binary log, or by the SQL thread, but it may take a longer time to test and deploy in production. For existing applications, we grouped multiple transactions on the application side to mitigate the problem.


Tom: Performance monitoring is usually challenging at any organization. How do you do that at Facebook, which has tens of thousands of MySQL instances?

Yoshinori: Top-N monitoring is very important for managing a huge number of instances. Average statistics (for example, average innodb_rows_read across all instances) are not always useful, since ~1% of problematic instances won’t noticeably change the average numbers. p99 gives better indicators, but in our environment we typically have fewer than 0.1% of instances causing problems, in which case p99 is not helpful either. We have several graphical and command-line tools to efficiently list the top-N badly behaving hosts. After listing the bad instances, the way to investigate the root cause is pretty straightforward, like what MySQL consultants usually do. Server failure is something we expect and plan for at Facebook. For example, a typical MySQL DBA at a small company may never encounter a master instance failure during their employment, because recent mysqld and hardware are stable enough. At Facebook, master failure is a norm and something the system can accommodate.


Tom: Evan, you and Yoshinori will present on Global Transaction ID (GTID) at Facebook. GTID is very tricky to deploy to an existing large-scale environment – how, and why, did you decide on adopting it?

Evan: Our primary motivations for adopting GTID all relate to either failover or binlog backups. When a master fails, getting replicas in sync with GTID is substantially simpler, faster, and less error-prone than previous methods of diffing binlogs. For backups, GTID is a cornerstone in building cross-datacenter point-in-time recovery, without needing redundant binlog streams from every region.

The “how” question is a bit more involved, and we’ll be covering this in detail during the session. The GTID project was a joint effort between three of Facebook’s MySQL teams. Santosh added new functionality to the MySQL server to make online rollout possible, and Yoshinori improved MHA to seamlessly support GTID-based failover. I added GTID support to all of our other in-house automation, and also scripted the rollout procedure across our many thousands of replica sets. A lot of validation logic and monitoring functionality was involved to ensure the safety of the rollout.


Tom: Shlomo, your session is titled “Under the Hood – MySQL Pool Scanner (MPS).” As you point out in your talk, Facebook has one of the largest MySQL database clusters in the world, comprising thousands of servers across multiple data centers. You must have an army of DBAs – or is there some secret you’d like to share? 

Shlomo: We do have an army, yes — it’s an “army of one.” We have one person on call on the MySQL Operations team at a given time, and they don’t even need to do all that much most days. We built “robots” to do our day to day jobs. The largest and most complex robot we have is MPS, an automated system to do most of the work a DBA might otherwise spend time on, such as replacement of broken or overflowing servers. Among other things, MPS also allows a human to initiate complex bulk operations with a few keystrokes, and it will follow up and complete the operations over the course of days or weeks.

I’ll be describing some of the complex MySQL automation systems we have at Facebook, and how they fit together during my talk.


Tom: Shlomo, what does a typical day look like for you there at Facebook?

Shlomo: The team’s work mostly focuses on maintaining those robots I’ve mentioned, as well as developing new ways to improve the reliability of our databases for Facebook’s users. This year the team also spent a lot of time making sure the new MySQL features such as GTIDs and semi-sync are deeply integrated into our automation. Every day, we work hard to make ourselves obsolete, but we haven’t gotten there just yet!

On a typical day, I probably spend much of the time coding, mainly in Python. I also spend a significant amount of time working on capacity-related projects, such as thinking of ways to optimize the way we distribute the data across our fleet of servers.
Even after 2.5 years at Facebook, I am still in awe of the number of servers we manage. The typical small-scale maintenance operation at Facebook probably involves more servers than all the companies I’ve previously worked for had, combined. It really is pretty amazing!


Tom:  What are you looking forward to the most at this year’s conference?

Evan: There are plenty of fascinating sessions this year. Just to mention a handful: Jeremy Cole and Davi Arnaut’s session on innodb_ruby, since it’s a very unique way to interactively learn about InnoDB’s internals. Baron Schwartz’s session on using Go with MySQL, as VividCortex is blazing the trail here. Peter Boros and Kenny Gryp’s talk on scalability and benchmarking, which I’m hoping will include recent developments of Percona Playback. Tom Christ’s session on my former project Jetpants, to see how it has evolved over the past year at Tumblr. And several talks by Oracle engineers about upcoming functionality in MySQL 5.7.

Steaphan: In addition to the conference sessions, I look forward to the birds of a feather session with the MySQL team.  Last year, it proved to be a valuable opportunity to engage with those upstream developers who make the changes we care about, and I expect the same this year.


Tom: If you could talk to a DBA or developer on the fence about attending this year’s conference, what would be your top 3-5 reasons for making it over to Santa Clara for this event?

Evan: I’m based in NYC, so I’m traveling a bit further than many of my colleagues, but I can still confidently say that Percona Live is well worth the trip. The MySQL ecosystem is very healthy and constantly evolving, and the conference is the best place to learn about ongoing developments across a wide spectrum of companies and contributors. It’s also a perfect opportunity to personally connect with all of the amazing engineers, DBAs, users, and vendors that make MySQL so unique and compelling.


 The Percona Live MySQL Conference and Expo 2014 runs April 1-4 in Santa Clara, Calif. Use the code “SeeMeSpeak” when registering and save 10 percent. The inaugural Open Source Appreciation Day is on March 31 – this full-day event is free but because space is limited I suggest registering now to reserve your spot.


Mar 27, 2013
--

Why MySQL Performance at Low Concurrency is Important

A few weeks ago I wrote about “MySQL Performance at High Concurrency” and why it is important, which was followed up by Vadim’s post on ThreadPool in Percona Server providing some great illustration of the topic. This time I want to address the opposite question: why MySQL performance at low concurrency is important for you.

I decided to write about this topic because a number of recent blog posts and articles look at MySQL performance starting from a certain concurrency as the low point. For example, the MySQL 5.6 DBA and Developer Guide has a number of graphs starting with 32 connections at the low point. Mark Callaghan also starts with a concurrency of 8 in some of the results he publishes. All of this may make it look as though single-thread performance is no longer a concern. I believe it still is!

First, performance at a concurrency of 1 serves as an important baseline that offers the best response times (the MySQL server does not use multiple threads to execute a query in parallel at this point). Response times at various concurrency levels can be compared with this baseline to see how they are affected and how the system scales as concurrency increases.

Second, there are many cases in which single-thread execution is exactly what is going to happen: many batch jobs are written to be single-threaded. MySQL replication is single-threaded too; MySQL 5.6 brings some parallel replication abilities, but they apply only to some use cases. Many database maintenance operations, such as ALTER TABLE, run in a single thread too.

In fact, chances are your system actually runs at low concurrency most of the time. The majority of systems I see in the wild have Threads_running averaging 5 or less, at least while the system is performing well, which points to rather low concurrency in normal system operation. Much higher concurrency is in most cases seen when the system is overloaded and suffering.
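
To see where your own system sits, a quick way to sample that status variable (run it several times during peak load):

$ mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_running';"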

So what is the problem with looking at performance at just high-concurrency values? This might mislead you on how your production will be impacted. Think about, for example, a system that is slower at low concurrency but a lot faster at high concurrency (which is the typical situation). If you just look at performance at high concurrency you might not understand why the system is showing worse response times when you are actually running it in real life with low concurrency.
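
As a sketch of how such a comparison could be run across concurrency levels starting at 1 (assuming modern sysbench syntax, which postdates this post, and an already prepared sbtest database):

$ for t in 1 2 4 8 16 32 64; do
    sysbench oltp_read_write --mysql-db=sbtest --threads=$t --time=60 run
  done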

When it comes to benchmarks, I would love to see results published starting from a concurrency of 1 if possible, so we can see the full picture. Mark Callaghan, for example, published an example where 8 sysbench tables are used, which means the lowest concurrency you can run with to compare apples to apples is 8; running a comparable benchmark with a concurrency of 1 and the same tool is not possible.

Want to talk more about performance? Come to Percona Live MySQL Conference and Expo April 22-25 in Santa Clara, California, where “performance” is going to be a big topic :)


Jan 12, 2011
--

Percona Live Keynote Speaker: Mark Callaghan

Mark Callaghan has graciously agreed to be the closing keynote speaker for Percona Live: San Francisco!

Mark is best known for his work behind MySQL @ Facebook, where he and his team maintain one of the largest MySQL installations around.  They also contribute back to the community with a publicly available branch of enhancements, improved diagnostic tools, and bug reports which help make MySQL better.

Mark’s keynote will be on “High-value Transaction Processing”.  I assure you, this is a presentation not to be missed.
