Webinar 6/27: MySQL Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance


Please join Percona's Principal Support Escalation Specialist Sveta Smirnova as she presents Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance on Wednesday, June 27th at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).


During the MySQL Troubleshooting webinar series, I covered many monitoring and logging tools such as:

  • General, slow, audit, binary, error log files
  • Performance Schema
  • Information Schema
  • System variables
  • Linux utilities
  • InnoDB monitors
  • PMM

However, I did not spend much time on the impact these instruments have on overall MySQL performance. And they do have an impact.

And this is the conflict many people face: MySQL Server users try these monitoring instruments, see that they slow down their installations, and turn them off. This is unfortunate. If the instrument that could help you resolve a problem is OFF, you won't have the information you need to understand when, how and why the issue occurred. In the best case, you'll re-enable the instrumentation and wait for the problem to strike again. In the worst case, you'll try various fixes without any real knowledge of whether they solve the problem.

This is why it is important to understand the impact monitoring tools have on your database, and therefore how to minimize it.

Understanding and controlling the impact of MySQL monitoring tools

In this webinar, I cover why certain monitoring tools affect performance, and how to minimize the impact without turning the instrument off. You will learn how to monitor safely and effectively.

Register Now


Sveta Smirnova

Principal Support Escalation Specialist

Sveta joined Percona in 2015. Her main professional interests are problem solving, working with tricky issues and bugs, finding patterns that can quickly solve typical issues, and teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona, Sveta worked as a Support Engineer in the MySQL Bugs Analysis Support Group at MySQL AB, then Sun, then Oracle. She is the author of the book "MySQL Troubleshooting" and of the JSON UDF functions for MySQL.

The post Webinar 6/27: MySQL Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance appeared first on Percona Database Performance Blog.


Webinar Thursday June 22, 2017: Deploying MySQL in Production


Join Percona's Senior Operations Engineer, Daniel Kowalewski, as he presents Deploying MySQL in Production on Thursday, June 22, 2017 at 11:00 am PDT (UTC-7) / 2:00 pm EDT (UTC-4).

MySQL is famous for being something you can install and get going with in less than five minutes for development. But normally you want to run MySQL in production, and at scale. This requires some planning and knowledge. So why not learn the best practices around installation, configuration, deployment and backup?

This webinar is a soup-to-nuts talk that will have you going from zero to hero in no time. It includes discussion of the best practices for installation, configuration, taking backups, monitoring, etc.

Register for the webinar here.

Daniel Kowalewski, Senior Technical Operations Engineer

Daniel has been designing and deploying solutions around MySQL for over ten years. He lives for those magic moments where response time drops by 90%, and loves adding more “nines” to everything.


Using MySQL triggers and views in Amazon RDS

I recently had an opportunity to migrate a customer from a physical server into Amazon's RDS environment. In this particular case the customer's platform makes extensive use of MySQL triggers and views. I came across two significant issues that prevented me from following Amazon's documentation, which basically states "use mysqldump" but doesn't call out a specific method of dealing with MySQL triggers and views.

Amazon Relational Database Service (Amazon RDS) is a great platform if you're looking for completely hands-off management of your MySQL environment, but it comes at a cost in flexibility: you don't have the SUPER privilege, and this brings up additional challenges.

  1. You need to ensure you set log_bin_trust_function_creators=1 ( by default this is off, 0).
  2. You need to clean up your mysqldump syntax.

#1 is easy: you simply make a configuration change within the Amazon RDS GUI on the node's Parameter Group to set log_bin_trust_function_creators=1, and then restart your Amazon RDS node. The restart is required since, without the SUPER privilege, you lose the ability to change DYNAMIC variables on the fly.
#2 is a little more complex. If you go with vanilla mysqldump (from, say, a 5.5 mysqldump binary) on a schema that has triggers and views, you will see error 1227, something like this:

ERROR 1227 (42000) at line 27311: Access denied; you need (at least one of) the SUPER privilege(s) for this operation

You’re seeing this message because MySQL in Amazon RDS doesn’t provide the SUPER privilege, and thus you cannot set up a trigger or view to run as a different user — only a user with SUPER can do that.

mysqldump will generate syntax for a trigger like this:

/*!50003 CREATE*/ /*!50017 DEFINER=`root`@`%`*/ /*!50003 TRIGGER `after_insert_lead` AFTER INSERT ON `leads` FOR EACH ROW BEGIN
UPDATE analytics.mapping SET id_lead = NEW.id_lead WHERE mc_email = NEW.email;
END */;;

and for a view like this:

/*!50013 DEFINER=`web`@`%` SQL SECURITY DEFINER */
/*!50001 VIEW `admin_user_view` AS SELECT ...

The problem is in the “DEFINER” lines.

Here’s one method that worked for me:

  1. Identify all the DEFINER lines in your schema. I found it helpful to dump with --no-data and then weed through that to get a unique list of the DEFINER lines.
  2. Create a sed expression for each unique DEFINER line (see my example in a moment).
  3. Include these sed expressions in your dump/load script.

Here's what my sed matches looked like (note that the forward slashes and asterisks must be escaped, as shown, for the expressions to work):

-e 's/\/\*!50017 DEFINER=`root`@`localhost`\*\///'
-e 's/\/\*!50017 DEFINER=`root`@`%`\*\///'
-e 's/\/\*!50017 DEFINER=`web`@`%`\*\///'
-e 's/\/\*!50017 DEFINER=`cron`@`%`\*\///'
-e 's/\/\*!50013 DEFINER=`cron`@`%` SQL SECURITY DEFINER \*\///'
-e 's/\/\*!50013 DEFINER=`root`@`localhost` SQL SECURITY DEFINER \*\///'
-e 's/\/\*!50013 DEFINER=`root`@`%` SQL SECURITY DEFINER \*\///'
-e 's/\/\*!50013 DEFINER=`web`@`%` SQL SECURITY DEFINER \*\///'

A big caveat: this method is akin to a brute force method of getting your data into Amazon RDS — you’ve lost the elegance & security of running your triggers and views as separate defined users within the database — they are all now going to run as the user you loaded them in as. If this is a show-stopper for you, contact Percona and I’d be happy to take on your case and develop a more comprehensive solution.  :)

Now all that’s left is to integrate this into your dump flow.  Something like this should work:

| sed
-e ... lots of lines
| mysql
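Putting it together, the whole flow might look like the sketch below. The hostnames and schema name are placeholders, and I've collapsed the per-user sed expressions into one generic pattern per DEFINER comment form (the original approach used one literal expression per user@host pair):

```shell
# Dump from the source server, strip DEFINER clauses, load straight into RDS.
# The [^`]* patterns match any quoted user or host; adjust if your DEFINERs
# use a different quoting style.
mysqldump --triggers --routines -h old-host mydb \
  | sed -e 's/\/\*!50017 DEFINER=`[^`]*`@`[^`]*`\*\///' \
        -e 's/\/\*!50013 DEFINER=`[^`]*`@`[^`]*` SQL SECURITY DEFINER \*\///' \
  | mysql -h my-rds-endpoint mydb
```

Because the DEFINER comment is removed entirely, each trigger and view is created as the loading user, which is exactly the caveat described below.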

I hope this helps someone!

The post Using MySQL triggers and views in Amazon RDS appeared first on MySQL Performance Blog.


Here’s my favorite secret MySQL 5.6 feature. What’s yours?

MySQL 5.6 has a great many new features, including, but certainly not limited to, a number of performance improvements. However, besides the widely talked-about features such as InnoDB support for full text search, optimizer and performance schema improvements, and GTID, there are also a few tiny improvements that nobody cared to mention.

One such feature is… …well, let me show you.

In the past almost-7-years that I’ve been with Percona, roughly 97.35% of my reports would include the following or similar instructions in them:

… in order to change the size of InnoDB transaction log files, you have to follow this procedure. Beware that if you don’t follow these instructions step-by-step, your server may not restart or rather it will start with InnoDB storage engine disabled and an error in the logs. So, here’s what you want to do:
1. change (or add) the following variable in my.cnf: innodb_log_file_size = XXXM
2. stop MySQL server
3. make sure you see a successful shutdown of InnoDB in the MySQL error log
4. move away old log files (at this point I have to double check where they are) by running the following command:
mv /var/lib/mysql/ib_logfile* /tmp/
5. start MySQL server – it should take a bit longer to start because it is going to be creating new transaction log files
6. When you’re positive that MySQL server has started successfully, you can remove the old log files:
rm /tmp/ib_logfile*
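The my.cnf change in step 1 is just the following; the size shown here is only illustrative, not a recommendation:

```ini
[mysqld]
# new transaction log size; pick a value based on your write workload
innodb_log_file_size = 512M
```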

Occasionally, if I was not careful enough in picking the warning words, that would backfire with extra questions: how safe is this, do they really need to do it, and so on. And yet in most cases increasing the size of the transaction log is the best thing you can do for your InnoDB-backed MySQL server.

The Secret Feature

From now on, systems running MySQL 5.6 will make my life (and yours too) so much easier as the instructions can now be reduced to:

Change the innodb_log_file_size (which I will simply include together with other recommended changes) and restart MySQL server.

"Automatic InnoDB transaction log file size change" is what I would call it. Want to know how MySQL responds to this? Here's how:

2013-08-20 13:57:37 5048 [Warning] InnoDB: Resizing redo log from 2*3072 to 2*32768 pages, LSN=1626007
2013-08-20 13:57:37 5048 [Warning] InnoDB: Starting to delete and rewrite log files.
2013-08-20 13:57:37 5048 [Note] InnoDB: Setting log file ./ib_logfile101 size to 512 MB
InnoDB: Progress in MB: 100 200 300 400 500
2013-08-20 13:57:39 5048 [Note] InnoDB: Setting log file ./ib_logfile1 size to 512 MB
InnoDB: Progress in MB: 100 200 300 400 500
2013-08-20 13:57:41 5048 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2013-08-20 13:57:41 5048 [Warning] InnoDB: New log files created, LSN=1626007

And it works both ways! Gotta love this simplicity. Tell me, what is your favorite MySQL 5.6 feature? Don’t worry, it doesn’t have to be secret. Unless you dare, of course.

The post Here’s my favorite secret MySQL 5.6 feature. What’s yours? appeared first on MySQL Performance Blog.


MySQL Partitioning – can save you or kill you

I have wanted for a while to write about using MySQL partitioning for performance optimization, and I just got a relevant customer case to illustrate it. First, you need to understand how partitions work internally. At the low level, each partition is a separate table. This means that when you do a lookup by the partitioning key, you touch only one (or some) of the partitions; lookups by other keys, however, have to probe all partitions and hence can be a lot slower. The gain for updates typically comes from having a smaller BTREE on the active partition(s), which gives a much better fit in memory. Having potentially fewer levels in the BTREE is a less significant factor.

So let's look at an example:


CREATE TABLE `tbl` (
  `id` bigint(20) UNSIGNED AUTO_INCREMENT NOT NULL,
  `uu` varchar(255) DEFAULT NULL,
  `data` bigint(20) UNSIGNED DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `uu` (`uu`)
) ENGINE=InnoDB

The access pattern to this table is lookups by "uu", which holds UUID values, plus a number of deletes by "id" and a bunch of inserts. The deletes are mainly clustered around the most recent id values.
The table (and index) is much larger than the buffer pool.

The first problem was replication lag, mainly due to modifying the uu index. This is because UUID() spreads value prefixes very well, effectively giving almost uniform access to the whole BTREE. To solve this problem, partitioning was a good choice – PARTITION BY HASH (id div 10000000) PARTITIONS 32 – which splits the data into 32 partitions, placing sequential ranges of 10M values in the same partition. This is very handy if access is concentrated on values which have been added to the table recently.
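Concretely, for the `tbl` definition above, the statement would be something like the following sketch:

```sql
ALTER TABLE tbl
  PARTITION BY HASH (id DIV 10000000)  -- sequential 10M ranges of id hash together
  PARTITIONS 32;
```

Since `id` is the primary key, the partitioning expression satisfies MySQL's rule that every unique key must include the partitioning columns.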

Using this trick, replication could be sped up about 10 times, as the couple of partitions which were actively used could fit in the buffer pool completely, so replication became CPU bound (single thread) instead of IO bound.

You could celebrate, but hey… you need to check the impact on the master too. The master, in its turn, was getting a lot of lookups by the uu value, which is not part of the partitioning key, and hence we're looking at 32 logical lookups, one per partition. True, only one of the partitions contains the value, but many of them require physical IO to descend to the leaf level and verify the value does not exist there. This reduced performance for random selects by UUID from 400 to 20 per second (from a single thread).

Decreasing the number of partitions made replication less efficient, but increased the number of selects the table could deliver. There turned out to be a reasonable middle ground which allowed replication to perform better than before, while selects were still served at the rate the system needed.

What is the takeaway? When you're creating partitions, think clearly about what you're trying to achieve. Partitioning is not some magic feature which just makes everything a lot faster. I've seen people apply partitioning to basically all of their tables without much thought, and believe me, the results were not pretty.

Entry posted by Peter Zaitsev


How well does your table fit in the InnoDB buffer pool?

Understanding how well your tables and indexes fit in the buffer pool is often very helpful for understanding why some queries are IO bound and others are not – it may be because the tables and indexes they access are not in cache, for example having been washed away by other queries. MySQL Server does not provide any information of this kind; Percona Server, however, adds a number of tables to the Information Schema which make this information available. It is just a few queries away:


mysql> SELECT `schema` AS table_schema, innodb_sys_tables.name AS table_name, innodb_sys_indexes.name AS index_name, cnt, dirty, hashed, round(cnt*100/index_size,2) fit_pct FROM (SELECT index_id, count(*) cnt, sum(dirty=1) dirty, sum(hashed=1) hashed FROM innodb_buffer_pool_pages_index GROUP BY index_id) bp JOIN innodb_sys_indexes ON id=index_id JOIN innodb_sys_tables ON table_id=innodb_sys_tables.id JOIN innodb_index_stats ON innodb_index_stats.table_name=innodb_sys_tables.name AND innodb_sys_indexes.name=innodb_index_stats.index_name AND innodb_index_stats.table_schema=innodb_sys_tables.schema ORDER BY cnt DESC LIMIT 20;
+--------------+--------------+--------------+------+-------+--------+---------+
| table_schema | table_name   | index_name   | cnt  | dirty | hashed | fit_pct |
+--------------+--------------+--------------+------+-------+--------+---------+
| test         | a            | c            | 7976 |     0 |      0 |   13.73 |
| test         | a            | PRIMARY      |   59 |     0 |      0 |    0.08 |
| sbtest       | sbtest#P#p1  | PRIMARY      |   22 |     0 |      0 |   22.68 |
| sbtest       | sbtest#P#p0  | PRIMARY      |   22 |     0 |      0 |   22.68 |
| sbtest       | sbtest#P#p2  | PRIMARY      |   21 |     0 |      0 |   21.65 |
| sbtest       | sbtest#P#p3  | PRIMARY      |   18 |     0 |      0 |   18.56 |
| sbtest       | sbtest#P#p3  | k            |    4 |     0 |      0 |  100.00 |
| sbtest       | sbtest#P#p2  | k            |    4 |     0 |      0 |  100.00 |
| sbtest       | sbtest#P#p1  | k            |    4 |     0 |      0 |  100.00 |
| sbtest       | sbtest#P#p0  | k            |    4 |     0 |      0 |  100.00 |
| stats        | TABLES       | PRIMARY      |    2 |     0 |      0 |   66.67 |
| stats        | TABLES       | TABLE_SCHEMA |    1 |     0 |      0 |  100.00 |
| percona      | transactions | PRIMARY      |    1 |     0 |      0 |  100.00 |
+--------------+--------------+--------------+------+-------+--------+---------+
13 rows in set (0.04 sec)

This query shows how many pages are in the buffer pool for a given table (cnt), how many of them are dirty (dirty), and what percentage of the index fits in memory (fit_pct).
For illustration purposes I've created one table with partitions, to show that you get the real "physical" table name, which identifies the table down to the partition. This is very helpful for analyzing access to partitions – you can actually check whether your "hot" partitions really end up in the cache and your "cold" ones stay out of it, or whether something is pushing them out of the cache.

You can use this feature to tune the buffer pool eviction strategy, for example playing with innodb_old_blocks_pct and innodb_old_blocks_time while actually observing the data stored in the buffer pool, rather than relying on indirect measures.
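For example, you might adjust the midpoint-insertion settings on the fly and watch how the buffer pool contents respond; the values here are only a starting point for experimentation, not recommendations:

```sql
SET GLOBAL innodb_old_blocks_pct  = 20;    -- share of the LRU list reserved for "old" blocks
SET GLOBAL innodb_old_blocks_time = 1000;  -- ms a page must stay in the old sublist before promotion
```

Rerunning the buffer pool query above after each change shows directly whether scans are still washing out your hot indexes.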

I often check these stats during warmup to see what really gets warmed up first, as well as how the buffer pool is affected by batch jobs, ALTER TABLE, OPTIMIZE TABLE, etc. The lasting impact these may have on system performance is often caused by their impact on the buffer pool, which may take hours to recover.

This tool can also be helpful for capacity planning and performance management. In many cases you will learn that you need a certain buffer pool fit for your tables and indexes to get reasonable performance. You could try to compute it instead, but that can be pretty hard, as there are a lot of variables involved, including page fill factors.

Entry posted by Peter Zaitsev


Thinking about running OPTIMIZE on your InnoDB table? Stop!

InnoDB/XtraDB tables do benefit from being reorganized often. You get the data physically laid out in primary key order, as well as better-filled primary key and index pages, and so less space used. It is just that OPTIMIZE TABLE might not be the best way to do it.

If you're running the InnoDB plugin, or Percona Server with XtraDB, you get the benefit of a great new feature: the ability to build indexes by sort instead of by insertion. This process can be a lot faster, especially for large indexes which would otherwise get inserts in very random order, such as indexes on a UUID column or similar. It also produces a much better fill factor. The problem is… OPTIMIZE TABLE for InnoDB tables does not take advantage of it, for whatever reason.

Let's take a look at a little benchmark I did by running OPTIMIZE a second time on a table which is some 10 times larger than the amount of memory I allocated for the buffer pool:


CREATE TABLE `a` (
  `id` int NOT NULL AUTO_INCREMENT,  /* column definition reconstructed; the original listing was truncated here */
  `c` char(64) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `c` (`c`)
) ENGINE=InnoDB

mysql> SELECT * FROM a ORDER BY id LIMIT 10;
+----+------------------------------------------+
| id | c                                        |
+----+------------------------------------------+
|  1 | 813cf02d7d65de2639014dd1fb574d4c481ecac7 |
|  2 | 62960f5d5d50651e5a5983dacaedfa9a73a9ee87 |
|  3 | cea33998792ffe28b16b9272b950102a9633439f |
|  4 | 8346a7afa0a0791693338d96a07a944874340a1c |
|  5 | b00faaa432f507a0d16d2940ca8ec36699f141c8 |
|  6 | 8e00926cf6c9b13dc8e0664a744b7116c5c61036 |
|  7 | f151fe34b66fd4d28521d5e7ccb68b0d5d81f21b |
|  8 | 7fceb5afa200a27b81cab45f94903ce04d6f24db |
|  9 | 0397562dc35b5242842d68de424aa9f0b409d60f |
| 10 | af8efbaef7010a1a3bfdff6609e5c233c897e1d5 |
+----+------------------------------------------+
10 rows in set (0.04 sec)

# These are just random SHA(1) hashes

mysql> OPTIMIZE TABLE a;
+--------+----------+----------+-------------------------------------------------------------------+
| Table  | Op       | Msg_type | Msg_text                                                          |
+--------+----------+----------+-------------------------------------------------------------------+
| test.a | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.a | optimize | status   | OK                                                                |
+--------+----------+----------+-------------------------------------------------------------------+
2 rows in set (3 hours 3 min 35.15 sec)

mysql> ALTER TABLE a DROP KEY c;
Query OK, 0 rows affected (0.46 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> OPTIMIZE TABLE a;
+--------+----------+----------+-------------------------------------------------------------------+
| Table  | Op       | Msg_type | Msg_text                                                          |
+--------+----------+----------+-------------------------------------------------------------------+
| test.a | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.a | optimize | status   | OK                                                                |
+--------+----------+----------+-------------------------------------------------------------------+
2 rows in set (4 min 5.52 sec)

mysql> ALTER TABLE a ADD KEY(c);
Query OK, 0 rows affected (5 min 51.83 sec)
Records: 0  Duplicates: 0  Warnings: 0

That's right! Optimizing the table straight away takes over 3 hours, while dropping the secondary index, optimizing the table, and adding the index back takes about 10 minutes – close to a 20x speed difference – and yields a more compact index in the end.

So if you're considering running OPTIMIZE on your tables, consider using this trick instead. It is especially handy on a slave, where it is OK for the table to be exposed without indexes for some time.
Note that nothing stops you from using LOCK TABLES on an InnoDB table to ensure there isn't a ton of queries reading the table while it has no indexes, bringing the box down.

You can also use this trick for an ALTER TABLE which requires a table rebuild. Dropping all indexes, doing the ALTER, and then adding them back can be a lot faster than a straight ALTER TABLE.
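As a sketch, the drop/rebuild/re-add sequence for a hypothetical table `t` with secondary keys `idx_a` and `idx_b` (names and columns are illustrative) would be:

```sql
ALTER TABLE t DROP KEY idx_a, DROP KEY idx_b;        -- leave only the PRIMARY KEY
ALTER TABLE t ENGINE=InnoDB;                         -- forces the table rebuild
ALTER TABLE t ADD KEY idx_a (a), ADD KEY idx_b (b);  -- rebuilt by sort on XtraDB/InnoDB plugin
```

The middle statement does the same recreate that OPTIMIZE would, but with only the clustered index to maintain; the final statement then gets the fast index creation path.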

P.S. I do not know why this was not done when support for creating indexes by sorting was implemented. It looks very strange to me to have this feature implemented while the majority of high-level commands and tools (like mysqldump) do not take advantage of it and still use the old, slow method of building indexes by insertion.

Entry posted by Peter Zaitsev


Getting History of Table Sizes in MySQL

One data point which is very helpful, but which surprisingly few people have, is the history of their table sizes. Projecting data growth is a very important component of capacity planning, and simply watching the growth of space used on a partition is not very helpful.

Now, as MySQL 5.0+ has the Information Schema, collecting and keeping this data is very easy:


-- NOTE: the original listing was truncated; the size columns and key are reconstructed
USE stats;
CREATE TABLE `tables` (
  `DAY` date NOT NULL,
  `TABLE_SCHEMA` varchar(64) NOT NULL DEFAULT '',
  `TABLE_NAME` varchar(64) NOT NULL DEFAULT '',
  `ENGINE` varchar(64) DEFAULT NULL,
  `TABLE_ROWS` bigint unsigned DEFAULT NULL,
  `DATA_LENGTH` bigint unsigned DEFAULT NULL,
  `INDEX_LENGTH` bigint unsigned DEFAULT NULL,
  PRIMARY KEY (`DAY`, `TABLE_SCHEMA`, `TABLE_NAME`)
);

And use this query to populate it:
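The populate query itself was lost in formatting; assuming the stats.tables definition records engine, row count and data/index sizes, it would look something like this sketch:

```sql
INSERT INTO stats.tables
SELECT CURDATE(), TABLE_SCHEMA, TABLE_NAME, ENGINE,
       TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA NOT IN ('information_schema', 'mysql');
```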



I put it into cron to run nightly:
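The cron entry was also lost in formatting; a nightly crontab line might look like the following (the schedule, credentials file and schema filter are placeholders):

```
30 2 * * * mysql --defaults-file=/etc/mysql/stats.cnf -e "INSERT INTO stats.tables SELECT CURDATE(), TABLE_SCHEMA, TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA NOT IN ('information_schema','mysql')"
```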



Though if you're looking to keep it completely inside MySQL, you can create an appropriate event in MySQL 5.1+.

Unless you have millions of tables, this is something you can set up and forget about; but a year later, when someone asks you about the growth rate of an individual table, you will have it handy.

If you have a large number of MySQL servers, especially in a sharded environment, it makes sense to modify the script so that you store the data on one central server. This allows you to run a lot of nice queries to see how evenly data is distributed among shards, and so on. In a replicated environment you should put this cron on the master only. If you use statement-based replication, the insert will run on each slave too, automatically picking up sizes even for tables that differ between master and slave.

Entry posted by Peter Zaitsev


Replication of MEMORY (HEAP) Tables

Some applications need to store transient data which is frequently regenerated, and MEMORY tables look like a very good match for this sort of task. Unfortunately this will bite you when you add replication to your environment, as MEMORY tables do not play well with replication.

The reason is very simple: both STATEMENT and ROW replication record the changes to the data in the binary logs, which requires the data to be the same on master and slave. When you restart the slave, you lose the contents of your MEMORY tables and replication breaks. STATEMENT replication will often continue to run, with the contents of the table simply diverging, as there is little checking of whether statements produce the same results on the slave. ROW replication will complain about the row not existing for UPDATE or DELETE operations.

So what can you do?

Use an InnoDB table instead. InnoDB is quite fast when the data fits in memory, so for most applications this performance will be enough, and it saves you from all the complexity of the workarounds below.

Do not replicate MEMORY tables. If you do not really need the MEMORY table on the slaves, you can skip replicating it by specifying replicate-ignore-table=db.memory_table. Note that you should not be using STATEMENT-level replication with INSERT … SELECT into this MEMORY table for this to work. Be careful using the data on the slave in this case, as the table will be empty. Another nice trick sometimes is to have the slave generate its own copy of the table, for example by running the same cron jobs the master runs to refresh the table periodically.

Restart slaves carefully. I would not use this as a long-term solution, as there will always be cases when the slave does not restart normally: power goes down, MySQL crashes, etc. If, however, you are using MEMORY tables in replication and just want to do a restart without breaking replication, you can do the following:
Add skip-slave-start to your my.cnf; run STOP SLAVE; dump all your MEMORY tables using mysqldump; restart MySQL as planned; load the dumped tables; run START SLAVE; remove skip-slave-start from the config file. Be careful using this with master-master or chain/tree replication: in those cases you will need to disable binary logging while loading the data from mysqldump, as you may not want these changes to be replicated.
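The restart sequence above, as a sketch; table and file names are placeholders, and the dump/restart steps run outside the client:

```sql
-- in my.cnf: add skip-slave-start, then:
STOP SLAVE;
-- from the shell: mysqldump db memory_table > /tmp/memory_table.sql
-- restart mysqld as planned, then reload the table and resume:
SOURCE /tmp/memory_table.sql;
START SLAVE;
-- finally, remove skip-slave-start from my.cnf again
```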

What could be done better?

MySQL could have features to make this more convenient. It would be great to have a MEMORY table option which saved the table to an on-disk file on shutdown and loaded it back on startup. Of course you would lose the data on an unclean start, but it would still be handy in a lot of cases.

We could also have an option similar to slave-skip-errors, but specified on a per-table basis. This would allow you to simply ignore all replication errors for a MEMORY table, which would make things more robust if the table is regenerated periodically. It could be helpful in many other cases too.

Entry posted by peter


The story of one MySQL Upgrade

I recently worked on upgrading MySQL from one of the very early MySQL 5.0 versions to Percona Server 5.1. This was a classic upgrade scenario that can cause surprises: a master and a few slaves needed to be upgraded; it was a shared database used by tons of applications written by many people over a more than 5-year timeframe; and it did not have any extensive test suite we could use for validation. As you might guess, in such cases some of the original authors have moved on and nobody is exactly sure what each application does or does not do with the database. The database is production critical, with a serious role in a serious company, so we couldn't just do a reckless upgrade.

First we needed to do a sanity check on the existing replication setup. As we would be checking replication consistency down the road, we needed to make sure replication was in sync to begin with, to avoid false positives; mk-table-checksum is the tool for that. It turned out replication indeed had an issue replicating triggers. The problem should be fixed by the upgrade, so we just had to keep this in mind.

We moved the database to MySQL 5.1. As the database size is relatively small, we did a mysqldump and load, which is the safest way, considering we're talking about 4 years' worth of changes between versions. We also ran mysql_fix_privilege_tables to ensure all new privileges were added, which is something I frequently see forgotten.

The next step was to set up MySQL 5.0 to 5.1 replication to see if it ran properly. It turned out it did not, because of an old bug which I have also seen cause upgrade problems in a number of other environments: INSERT ON DUPLICATE KEY UPDATE had an unfair share of replication issues in MySQL 5.0. There are a number of ways the problem can be solved, but first we decided to see how widespread it was. We let the slave replicate with slave-skip-errors=1105 to see if any other problems were spotted, and in the meanwhile we went over the binary logs for the last month to see how frequently this functionality was used. Happily, there were only a few INSERT ON DUPLICATE KEY UPDATE query instances, and only one of them was into a table with an AUTO_INCREMENT column (and so affected by this bug). It was easy enough to change the single application not to use INSERT ON DUPLICATE KEY UPDATE in this instance, so it was done.

So replication was running properly, but did the data match? (This would also catch data improperly loaded with mysqldump, if there was any.) We stopped the 5.0 and 5.1 slaves at the same position and used mk-table-checksum to verify the data was in sync. mk-table-checksum can use replication to check consistency, but comparing two servers directly is faster, and we had spare capacity we could use. First we ran the check using the default CHECKSUM TABLE algorithm. We got a number of tables reporting wrong checksums, while running SELECT INTO OUTFILE and diffing the files reported no differences. It turns out there have been some subtle changes to CHECKSUM TABLE over the years which can make it report different checksums in some cases. Rerunning the check using the BIT_XOR algorithm eliminated those false positives. One table still differed, though. We used mk-table-sync --print as a diff tool for MySQL to see what was different in that table. It turned out one of the float columns stored "-0" in MySQL 5.0, but it was displayed as "0" when the data was loaded into Percona Server 5.1. This was not an issue for the application and could be ignored.

So at this point we were sure the write traffic replicated properly to the new setup. It was time to check how read traffic behaved. We stopped both slaves at the same position again and used tcpdump and mk-query-digest to get sample read traffic from both master and slave. The --sample=50 (or similar) option is important, to check only a limited number of samples for each query type – otherwise it can take a lot of time. Running mk-upgrade with these queries showed some result differences, which turned out to be false positives too, thanks to the table checksum mk-upgrade uses by default to compare result sets. --compare-results-method rows helped to remove them, and we were down to only query time differences. In most cases the query time differences were not significant, or Percona Server 5.1 did better, but there were a couple of queries where the optimizer plan changed to a significantly worse one, and they were flagged to be fixed.

At this point we were confident enough that the slaves could handle the traffic, and we could put them in production. Before upgrading the master, however, we had to think about a rollback plan in case something went wrong and we needed to go back to MySQL 5.0 on the master. To test this we set up replication from Percona Server 5.1 back to MySQL 5.0 and performed the same checks again – happily, replication worked and there was no drift. This allowed us to simply hook up the old MySQL 5.0 master and all its slaves as slaves off the new master and keep them for some time, making rollback to the old setup trivial. This was the best choice, as the upgrade also involved a new operating system and hardware, and any of these could be a potential cause for rollback.

A MySQL upgrade, in my opinion, is the kind of process where hiring an external consultant makes especially good sense. A team, even if it includes a skilled MySQL DBA, typically does not go through major version upgrades more often than every 3-5 years, so unless a lot of applications are being upgraded by the same team it is hard to accumulate experience. Also, the problems you encounter during an upgrade are very different depending on the versions involved – the upgrade from MySQL 4.1 to 5.0 had a lot of different issues than the upgrade from MySQL 5.0 to 5.1.

Also, Maatkit is awesome, though I believe you knew that already.

Entry posted by peter
