Multi-Threaded Slave Statistics

Multi-Threaded Slave Statistics

Multi-Threaded Slave StatisticsIn this blog post, I’ll talk about multi-threaded slave statistics printed in MySQL error log file.

MySQL version 5.6 and later allows you to execute replicated events using parallel threads. This feature is called Multi-Threaded Slave (MTS), and to enable it you need to modify the


 variable to a value greater than 1.

Recently, a few customers asked about the meaning of some new statistics printed in their error log files when they enable MTS. These error messages look similar to the example stated below:

[Note] Multi-threaded slave statistics for channel '': seconds elapsed = 123; events assigned = 57345; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 0 waited (count) when Workers occupied = 0 waited when Workers occupied = 0

The MySQL reference manual doesn’t show information about these statistics. I’ve filled a bug report asking Oracle to add information about these statistics in the MySQL documentation. I reported this bug as #85747.

Before they update the documentation, we can use the MySQL code to get insight as to the statistics meaning. We can also determine how often these statistics are printed in the error log file. Looking into the rpl_slave.cc file, we find that when you enable MTS – and log-warnings variable is greater than 1 (log-error-verbosity greater than 2 for MySQL 5.7) – the time to print these statistics in MySQL error log is 120 seconds. It is determined by a hard-coded constant number. The code below shows this:

  Statistics go to the error log every # of seconds when --log-warnings > 1
const long mts_online_stat_period= 60 * 2;

Does this mean that every 120 seconds MTS prints statistics to your MySQL error log (if enabled)? The answer is no. MTS prints statistics in the mentioned period depending on the level of activity of your slave. The following line in MySQL code verifies the level of the slave’s activity to print the statistics:

if (rli->is_parallel_exec() && rli->mts_events_assigned % 1024 == 1)

From the above code, you need MTS enabled and the modulo operation between the 


 variable and 1024 equal to 1 in order to print the statistics. The 


 variable stores the number of events assigned to the parallel queue. If you’re replicating a low level of events, or not replicating at all, MySQL won’t print the statistics in the error log. On the other hand, if you’re replicating a high number of events all the time, and the


 variable increased its value until the remainder from the division between this variable and 1024 is 1, MySQL prints MTS statistics in the error log almost every 120 seconds.

You can find the explanation these statistics below (collected from information in the source code):

  1. Worker queues filled over overrun level: MTS tends to load balance events between all parallel workers, and the  

     variable determines the number of workers. This statistic shows the level of saturation that workers are suffering. If a parallel worker queue is close to full, this counter is incremented and the worker replication event is delayed in order to avoid reaching worker queue limits.

  2. Waited due to a Worker queue full: This statistic is incremented when the coordinator thread must wait because of the worker queue gets overfull.
  3. Waited due to the total size: This statistic shows the number of times that the coordinator thread slept due to reaching the limit of the memory available in the worker queue to hold unapplied events. If this statistic keeps increasing when printed in your MySQL error log, you should resize the

     variable to a higher value to avoid the coordinator thread waiting time.

  4. Waited at clock conflicts: In the case of possible dependencies between transactions, this statistic shows the wait time corresponding to logical timestamp conflict detection and resolution.
  5. Waited (count) when used occupied: A counter of how many times the coordinator saw Workers filled up with “enough” with assignments. The enough definition depends on the scheduler type (per-database or clock-based).
  6. Waited when workers occupied: These are statistics to compute coordinator thread waiting time for any worker available, and applies solely to the Commit-clock scheduler.

Multi-threaded slave is an exciting feature that allows you to replicate events faster, and keep in sync with master instances. By changing the log-warnings variable to a value greater than 1, you can get information from the slave error log file about how multi-threaded performance.


Q&A: Multi-threaded Replication in MySQL 5.6 and MySQL 5.7

My webinar “Multi-threaded Replication in MySQL 5.6 and 5.7″ on February 25 generated several excellent questions following the presentation (available here for playback along with the slides). I didn’t have time to answer many of the questions during the session and so in this post I answer all of them. Thanks to everyone who attended!

Q&A: Multi-threaded Replication in MySQL 5.6 and MySQL 5.7Q: What do you expect from MTS with logical clock? Do you think performance would be good as with per database?
A: MTS with 5.6 is not usable if you have a single database. I do not have numbers, but this is quite frequent. With 5.7 everyone should be able to benefit from multi-threaded replication.

Q: When MySQL 5.6 was released, performance of MTS was lower, than in 5.5, for example. Is this addressed now?
A: I am not sure which specific issue or bug you are referring, but if your data is spread across several databases

Q: How does Percona XtraBackup work with MTS? What are the changes in mysqldump?
A: As long as you are using GTIDs, you can safely take a backup from a slave using multi-threaded replication: with XtraBackup, add the --slave-info option as usual when taking a backup from a slave and with mysqldump, use --master-data instead of --dump-slave.

Q: For checkpoint position, what if MTS thread apply Insert before creating table where it inserting data. How MTR checkpoint will keep track of these transactions applying by different thread on slave?
A: The worker threads track all execution gaps to make sure that out-of-order execution is safe and to be able to replay all events without forgetting any of them. So it is not possible that a worker thread will insert data in a table that has not been created yet.

Q: Can you use MTS with all binlog_format options?
A: Yes

Q: Is there any way to have the threads work so that no database contention happens?
A: The short answer is no: the goal of the worker threads is to execute the incoming transactions as fast as possible. If that results in database contention, you should probably decrease the number of worker threads.

Q: Why doesn’t multi-threaded replication perform well on a single DB?
A: With 5.6, parallelization is based on isolating the transactions going to each database. If you only have a single DB, no parallelization is possible. You should look at 5.7 and the logical clock algorithm.

Q: Are there any implications with regards to GTIDs and Multi-Threaded replication when running a Master-to-Master setup?
A: I cannot think of any, however I am not sure master-master replication is still very relevant when using GTIDs.

Q: Is there any inconvenience with memory or cache when using more workers than the number of databases?
A: If the number of workers is just a bit higher than the number of databases (like 5 workers for 3 databases), there should not be any issue. However with ridiculously high numbers (500 workers for 2 databases), there might be performance degradation. I have not tested such cases, so I cannot give a good answer there. However the idea is that the number of workers should be close to the number of databases and should exceed the number of cores on the server.

Q: Is there multi-threaded replication in MySQL 5.7?
A: Yes, multi-threaded replication is available in MySQL 5.7 and offers improvements compared to MySQL 5.6 (mainly the parallelization with logical clock).

Q: Have you used DIM_STAT to created load and measure SLAVE Lag? Any interesting take-a-ways from that effort?
A: I used sysbench to generate load and Seconds_Behind_Master from SHOW SLAVE STATUS to measure slave lag. That mainly shows that if your workload is a good fit for MTS (multiple databases and load shared evenly across ), performance benefits can be significant.

Q: Does multi-threaded replication also work with Percona XtraDB Cluster/Percona Server?
A: Percona Server 5.6 is based on MySQL 5.6, so you can use multi-threaded replication exactly as you would use it on MySQL 5.6.

On Percona XtraDB Cluster, it is a bit different: replication inside the cluster uses Galera replication, which has nothing to do with MySQL replication. Note that Galera has offered parallel replication from the beginning (parallel applying of the replicated writesets to be accurate). However if you are using asynchronous replicas, these replicas can benefit from multi-threaded replication if they are running on MySQL/Percona Server 5.6.

Q: What happens to cross db transactions? Do they not replicate?
A: These transactions will replicate, but they will have to wait until all preceding transactions have been executed. Stated differently, cross db transactions introduce serialization, so you should avoid them as much as possible if you want to benefit from parallel applying.

To be accurate, if you have db1, db2 and db3 and if you execute a transaction involving db1 and db2, transactions on db3 can still be applied in parallel. So if you have many databases, cross db transactions may not be that bad.

Q: When using MTS without GTIDs, is “Seconds_Behind_Master” from SHOW SLAVE STATUS valid?
A: Seconds_Behind_Master is based on Exec_Master_Log_Pos. And with MTS, Exec_Master_Log_Pos is not reliable as it indicates the position of the latest checkpoint and not the position of the latest executed transaction. However in practice, checkpoints will happen at least every 300ms by default, so Seconds_Behind_Master is still a good indication of the replication lag. Of course you should keep in mind the usual limitations, such as with multi-tiered replication (if the setup is A->B->C, C will report its lag against B, not against A) or when there is a replication error (then Seconds_Behind_Master is NULL).

Q: How can all the servers be realistically restarted at the same time? There could be a few sec intervals if you have multiple servers [That was when I explained how to enable GTID replication].
A: With MySQL 5.6, the requirements are pretty strict when it comes to enabling GTIDs: all servers must be restarted at the same point in time in the replication stream. As you mention, it is difficult if you have several servers, so the only viable solution is: stop the writes on the master, wait until replication has caught up on all slaves, stop all servers, change the configuration, restart all servers.

What it means is that there is a time range when all servers are down. This is a showstopper for many people, and that’s why Percona Server 5.6 now includes a patch from Facebook that allows an online migration to GTIDs and that’s why MySQL 5.7.6 also offers this option.

* * *

Thanks for all of the great questions – and I hope to see you next month at OpenStack Live and the Percona Live MySQL Conference and Expo 2015 (April 13-16) – both at the Santa Clara conference center in sunny Silicon Valley. Get more info here.

The post Q&A: Multi-threaded Replication in MySQL 5.6 and MySQL 5.7 appeared first on MySQL Performance Blog.


More on MySQL 5.6 multi-threaded replication and GTIDs (and Feb. 25 webinar)

In a previous post, titled “Multi-threaded replication with MySQL 5.6: Use GTIDs,” I explained that using GTID replication is almost a requirement when using MySQL 5.6 MTS. Let’s see now how to perform the day-to-day operations when MTS and GTIDs are both enabled. (I’ll also be presenting a related webinar next week titled “Multi-threaded Replication in MySQL 5.6 and 5.7″).

Seeing the execution gaps

If you have a look at SHOW SLAVE STATUS while the slave is running, you may not be expecting such an output:

Executed_Gtid_Set: 1381aa44-9a60-11e4-b6d8-94dbc999324d:1-2520:2522:2524:2526-2528:2531-2533:2536-2538:2540-2541:2544:2546-2547:2550-2551:2555:2565-2566:2569:2575-2577:2579-2581:2584-2586:2588:2590-2591:2595-2597:2599:2602:2604-2605:2607-2610:2613:2615-2620:2622-2624:2626-2627:2629:2631:2634:2636-2639:2641-2642:2644:2646-2647:2649:2651-2653:2657-2658:2661-2662:2666-2672:2676-2678:2680:2683-2684:2686-2693:2695:2701:2704:2706-2707:2709:2711:2713-2714:2717:2720-2722:2729-2730:2735:2744:2746:2749:2751-2752:2762:2764-2765:2768-2769:2771:2774:2776:2780-2782:2784:2786-2787:2789:2791:2793:2800:2803:2805-2807:2809:2811-2814:2816-2817:2819-2820:2822-2826:2828-2834:2837-2840:2842:2844-2845:2847:2850-2851:2853:2855:2857-2859:2861-2863:2865-2868:2870-2871:2873-2874:2878:2880-2884:2886-2888:2891:2893:2895-2896:2899:2903:2906-2907:2910:2912:2915-2918:2921-2923:2925-2926:2930:2932:2934:2936:2939-2940:2943-2944:2946:2948-2957:2966:2969-2970:2974:2976:2979-2980:2982:2985:2987:2989:2994:2996-2997:2999:3001:3003:3006:3008:3011-3013

Ouch! What does that insane list of GTIDs mean?

It is actually easy to understand as long as you know that the GTID of all executed transaction is tracked in Executed_Gtid_Set and that execution gaps are allowed with MTS.

Then 1-2520:2522:2524 simply means that transactions #1 to #2520 have been executed, as well as transactions #2522 and #2524, but not #2521 and #2523.

You can also see that a gap at a specific position will not last for long. If you run SHOW SLAVE STATUS an instant later, you will see:

Executed_Gtid_Set: 1381aa44-9a60-11e4-b6d8-94dbc999324d:1-4095:4098:4100:4103-4105[...]

This time the first execution gap is for transaction #4096.

If you stop the writes on the master, all gaps will be filled once replication has caught up and you will simply see:

Executed_Gtid_Set: 1381aa44-9a60-11e4-b6d8-94dbc999324d:1-20599

Dealing with replication errors

Say replication has stopped with an error and you want to skip the offending event. The only option with GTID replication is to inject an empty transaction, which in turn means you must know the GTID of the transaction you want to skip.

Let’s look at SHOW SLAVE STATUS:

Executed_Gtid_Set: 1381aa44-9a60-11e4-b6d8-94dbc999324d:1-1052769:1052805-1052806:1052808:1052810-1052811:1052814:1052816:1052819:1052823:1052825:1052827-1052828:1052830-1052831:1052835:1052838:1052840:1052842:1052844-1052846:1052848-1052849:1052851-1052852:1052855-1052857:1052859:1052862-1052863:1052867-1052868:1052870:1052873-1052875:1052878-1052879:1052882-1052883:1052885:1052887:1052890-1052892:1052896:1052901:1052905:1052908-1052909:1052911:1052915:1052917-1052918:1052922-1052923:1052927-1052929:1052931-1052933:1052937-1052938:1052940:1052943:1052946:1052948-1052949:1052953:1052955-1052956:1052958:1052962-1052964:1052967-1052969:1052972:1052975-1052977:1052979:1052981-1052983:1052985:1052987:1052989:1052991:1052993-1052995:1052999:1053001:1053003:1053005-1053016:1053018:1053020:1053024-1053026:1053029:1053032-1053034:1053037-1053038:1053040:1053043:1053045-1053046

So which transaction should we skip: probably 1381aa44-9a60-11e4-b6d8-94dbc999324d:1052770, right? This is the first transaction that could not be executed.

This is confirmed by looking at the Last_SQL_Error field:

Last_SQL_Error: Worker 0 failed executing transaction '1381aa44-9a60-11e4-b6d8-94dbc999324d:1052770' [...]

Once you know the GTID to skip, it is easy to restart replication (and fix the inconsistency later):

mysql> SET gtid_next='1381aa44-9a60-11e4-b6d8-94dbc999324d:1052770';
mysql> SET gtid_next='AUTOMATIC';

Taking backups

When using GTID replication, taking a backup from a multi-threaded slave is not difficult at all.

With Percona XtraBackup, simply add the --slave-info option as usual and you will get the list of executed GTIDs in the xtrabackup_slave_info file:

$ less xtrabackup_slave_info
SET GLOBAL gtid_purged='1381aa44-9a60-11e4-b6d8-94dbc999324d:1-1095246:1095248-1095249:1095253-1095254:1095257-1095266:1095270-1095271:1095273:1095275:1095284-1095285:1095296:1095305:1095311-1095312:1095315-1095316:1095318:1095321:1095323-1095324:1095326-1095332:1095335:1095337-1095338:1095348-1095351:1095353-1095354:1095356:1095358-1095359:1095363-1095364:1095366:1095368-1095369:1095372:1095375-1095376:1095378:1095380-1095385:1095388:1095390-1095392:1095394-1095411:1095419:1095424-1095425:1095428:1095430-1095433:1095437-1095439:1095442:1095444-1095445:1095447:1095449:1095461-1095464:1095468:1095473:1095482-1095484:1095488-1095490:1095494:1095496-1095497:1095499-1095500:1095502:1095505:1095507:1095509:1095511-1095516:1095521:1095524-1095525:1095527:1095530-1095531:1095534-1095552:1095554:1095557:1095559-1095560:1095563:1095566:1095569:1095572-1095573:1095582:1095594-1095595:1095597:1095601-1095605:1095607-1095610:1095612-1095614:1095618:1095621-1095623:1095625-1095628:1095630:1095632:1095634:1095636:1095639-1095642:1095645:1095649:1095659:1095661:1095664-1095665:1095667-1095669:1095671:1095674';

Then starting replication on a new instance bootstrapped from this backup is easy:

mysql> SET GLOBAL gtid_purged='...';

With mysqldump, simply discard the --dump-slave option as the list of executed GTIDs will be automatically added at the top of the dump:

-- MySQL dump 10.13  Distrib 5.6.22, for linux-glibc2.5 (x86_64)
-- GTID state at the beginning of the backup
SET @@GLOBAL.GTID_PURGED='1381aa44-9a60-11e4-b6d8-94dbc999324d:1-1095246:1095248-1095249:1095253-1095254:1095257-1095266:1095270-1095271:1095273:1095275:1095284-1095285:1095296:1095305:1095311-1095312:1095315-1095316:1095318:1095321:1095323-1095324:1095326-1095332:1095335:1095337-1095338:1095348-1095351:1095353-1095354:1095356:1095358-1095359:1095363-1095364:1095366:1095368-1095369:1095372:1095375-1095376:1095378:1095380-1095385:1095388:1095390-1095392:1095394-1095411:1095419:1095424-1095425:1095428:1095430-1095433:1095437-1095439:1095442:1095444-1095445:1095447:1095449:1095461-1095464:1095468:1095473:1095482-1095484:1095488-1095490:1095494:1095496-1095497:1095499-1095500:1095502:1095505:1095507:1095509:1095511-1095516:1095521:1095524-1095525:1095527:1095530-1095531:1095534-1095552:1095554:1095557:1095559-1095560:1095563:1095566:1095569:1095572-1095573:1095582:1095594-1095595:1095597:1095601-1095605:1095607-1095610:1095612-1095614:1095618:1095621-1095623:1095625-1095628:1095630:1095632:1095634:1095636:1095639-1095642:1095645:1095649:1095659:1095661:1095664-1095665:1095667-1095669:1095671:1095674';

And then replication can be started like stated before.


Seeing the execution gaps in the output of SHOW SLAVE STATUS can be disturbing at first sight, and of course you may have to change a few habits, but overall there is no specific issue when using GTIDs and MTS.

Next week I will be presenting a free webinar on multi-threaded replication (Wednesday February 25th at 10 a.m. PST). If you are interested in learning more on the topic, feel free to register. It will also be recorded – you’ll be able to use that same link to watch the presentation and download my slides.

The post More on MySQL 5.6 multi-threaded replication and GTIDs (and Feb. 25 webinar) appeared first on MySQL Performance Blog.


Multi-threaded replication with MySQL 5.6: Use GTIDs!

MySQL 5.6 allows you to execute replicated events in parallel as long as data is split across several databases. This feature is named “Multi-Threaded Slave” (MTS) and it is easy to enable by setting slave_parallel_workers to a > 1 value. However if you decide to use MTS without GTIDs, you may run into annoying issues. Let’s look at two of them.

Skipping replication errors

When replication stops with an error, a frequent approach is to “ignore now and fix later.” This means you will run SET GLOBAL sql_slave_skip_counter=1 to be able to restart replication as quickly as possible and later use pt-table-checksum/pt-table-sync to resync data on the slave.

Then the day when I hit:

mysql> show slave status;
Last_SQL_Error: Worker 0 failed executing transaction '' at master log mysql-bin.000017, end_log_pos 1216451; Error 'Duplicate entry '1001' for key 'PRIMARY'' on query. Default database: 'db1'. Query: 'INSERT INTO sbtest1 (id, k, c, pad) VALUES (0, 5320, '49123511666-22272014664-85739796464-62261637750-57593947547-00947134109-73607171516-11063345053-55659776318-82888369235', '11400300639-05875856680-20973514928-29618434959-69429576205')'
Exec_Master_Log_Pos: 1005432

I tried to use the trick:

mysql> set global sql_slave_skip_counter=1;
mysql> start slave;


mysql> show slave status;
Last_SQL_Error: Worker 0 failed executing transaction '' at master log mysql-bin.000017, end_log_pos 1216451; Error 'Duplicate entry '1001' for key 'PRIMARY'' on query. Default database: 'db1'. Query: 'INSERT INTO sbtest1 (id, k, c, pad) VALUES (0, 5320, '49123511666-22272014664-85739796464-62261637750-57593947547-00947134109-73607171516-11063345053-55659776318-82888369235', '11400300639-05875856680-20973514928-29618434959-69429576205')'
Exec_Master_Log_Pos: 1005882

Note that the position reported with Exec_Master_Log_Pos has moved forward, but I still have my duplicate key error. What’s wrong?

The issue is that the positions reported by SHOW SLAVE STATUS are misleading when using MTS. Quoting the documentation about Exec_Master_Log_Pos:

When using a multi-threaded slave (by setting slave_parallel_workers to a nonzero value in MySQL 5.6.3 and later), the value in this column actually represents a “low-water” mark, before which no uncommitted transactions remain. Because the current implementation allows execution of transactions on different databases in a different order on the slave than on the master, this is not necessarily the position of the most recently executed transaction.

So the solution to my problem is first to make sure that there is no execution gap, and only then to skip the offending event. There is a specific statement for the first part:

mysql> start slave until sql_after_mts_gaps;

And now I can finally skip the error and restart replication:

mysql> set global sql_slave_skip_counter=1;
mysql> start slave;
mysql> show slave statusG
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

The last thing to do is of course to resync the slave.


If you cannot trust the output of SHOW SLAVE STATUS to get the current binlog position, it means that taking a backup from a slave with parallel replication is tricky.

For instance, if you run mysqldump --dump-slave=2 to get the binlog position of the master, mysqldump will first run STOP SLAVE and then SHOW SLAVE STATUS. Is stopping the slave sufficient to avoid execution gaps? Actually, no.

The only option then seems to be: run STOP SLAVE followed by START SLAVE UNTIL SQL_AFTER_MTS_GAPS, followed by mysqldump while replication is stopped. Not very handy!

GTIDs to the rescue!

The solution for both issues is to use GTIDs.

They help when you want to skip an event because when using GTIDs, you must explicitly specify the transaction you will be skipping. It doesn’t matter whether there are execution holes.

They also help for backups because mysqldump takes the position from gtid_executed which is updated at each transaction commit (XtraBackup does that too).


If your application uses several databases and if you’re fighting with replication lag, MTS can be a great feature for you. But although GTIDs are not technically necessary, you’ll be exposed to tricky situations if you don’t use them.

Is everything rosy when using both GTIDs and MTS? Not exactly… But that will be the topic for a separate post!

By the way, if you are in the Brussels area this weekend, come see me and other great speakers at the MySQL and friends devroom at FOSDEM!

The post Multi-threaded replication with MySQL 5.6: Use GTIDs! appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com