Jun
19
2017
--

Upcoming HA Webinar Wed 6/21: Percona XtraDB Cluster, Galera Cluster, MySQL Group Replication


Join Percona’s MySQL Practice Manager Kenny Gryp and QA Engineer Ramesh Sivaraman as they present a high availability webinar on Percona XtraDB Cluster, Galera Cluster and MySQL Group Replication on Wednesday, June 21, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

What are the implementation differences between Percona XtraDB Cluster 5.7, Galera Cluster 5.7 and MySQL Group Replication?

  • How do they work?
  • How do they behave differently?
  • Do these methods have any major issues?

This webinar will describe the differences and shed some light on how QA is done for each of the different technologies.

Register for the webinar here.

Ramesh Sivaraman, QA Engineer

Ramesh joined the Percona QA Team in March 2014. He has almost six years of experience in database administration and, before joining Percona, provided MySQL database support to various service- and product-based internet companies. Ramesh’s professional interests include writing shell/Perl scripts to automate routine tasks, and new technology. Ramesh lives in Kerala, the southern part of India, close to his family.

Kenny Gryp, MySQL Practice Manager

Kenny is currently MySQL Practice Manager at Percona.

Feb
08
2017
--

MySQL super_read_only Bugs


In this blog post we describe an issue with MySQL 5.7’s super_read_only feature when used alongside GTID in chained slave instances.

Background

MySQL 5.7.5 and onward introduced the gtid_executed table in the mysql system database to store every GTID. This allows slave instances to use the GTID feature regardless of whether the binlog option is set or not. Here is an example of the rows in the gtid_executed table:

mysql> SELECT * FROM mysql.gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid                          | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| 00005730-1111-1111-1111-111111111111 |              1 |            1 |
| 00005730-1111-1111-1111-111111111111 |              2 |            2 |
| 00005730-1111-1111-1111-111111111111 |              3 |            3 |
| 00005730-1111-1111-1111-111111111111 |              4 |            4 |
| 00005730-1111-1111-1111-111111111111 |              5 |            5 |
| 00005730-1111-1111-1111-111111111111 |              6 |            6 |
| 00005730-1111-1111-1111-111111111111 |              7 |            7 |
| 00005730-1111-1111-1111-111111111111 |              8 |            8 |
| 00005730-1111-1111-1111-111111111111 |              9 |            9 |
| 00005730-1111-1111-1111-111111111111 |             10 |           10 |
...

To save space, this table needs to be compressed periodically by replacing the individual GTID rows with a single row that represents the whole interval of identifiers. For example, the above GTIDs can be represented with the following row:

mysql> SELECT * FROM mysql.gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid                          | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| 00005730-1111-1111-1111-111111111111 |              1 |           10 |
...

On the other hand, we have the super_read_only feature. If this option is set to ON, MySQL won’t allow any updates – even from users that have SUPER privileges. It was first implemented in WebScaleSQL and later ported to Percona Server 5.6. The MySQL mainstream code implemented a similar feature in version 5.7.8.
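To see the behavior in isolation, here is a minimal sketch (the table name test.t1 is hypothetical; the error text is the one the server actually returns, as shown later in this post):

mysql> SET GLOBAL super_read_only = ON;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO test.t1 VALUES (1);
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement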

The Issue [1]

MySQL’s super_read_only feature won’t allow the compression of the mysql.gtid_executed table. If a high number of transactions run on the master instance, it causes the gtid_executed table to grow to a considerable size. Let’s see an example.

I’m going to use MySQL Sandbox to quickly set up a master/slave configuration, and sysbench to simulate a high number of transactions on the master instance.

First, set up replication using GTID:

make_replication_sandbox --sandbox_base_port=5730 /opt/mysql/5.7.17 --how_many_nodes=1 --gtid

Next, set up the variables for a chained slave instance:

echo "super_read_only=ON" >> node1/my.sandbox.cnf
echo "log_slave_updates=ON" >> node1/my.sandbox.cnf
node1/restart

Now, generate a high number of transactions:

sysbench --test=oltp.lua --mysql-socket=/tmp/mysql_sandbox5730.sock --report-interval=1 --oltp-tables-count=100000 --oltp-table-size=100 --max-time=1800 --oltp-read-only=off --max-requests=0 --num-threads=8 --rand-type=uniform --db-driver=mysql --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test prepare

After running sysbench for a while, we check that the number of rows in the gtid_executed table is growing quickly:

slave1 [localhost] {msandbox} ((none)) > select count(*) from mysql.gtid_executed ;
+----------+
| count(*) |
+----------+
|   300038 |
+----------+
1 row in set (0.00 sec)

By reviewing SHOW ENGINE INNODB STATUS, we can find a compression thread running and trying to compress the gtid_executed table.

---TRANSACTION 4192571, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
9 lock struct(s), heap size 1136, 1533 row lock(s), undo log entries 1525
MySQL thread id 4, OS thread handle 139671027824384, query id 0 Compressing gtid_executed table

This thread runs and takes ages to complete (or may never complete). It has been reported as bug #84332.

The Issue [2]

What happens if you have to stop MySQL while the thread compressing the gtid_executed table is running? In this special case, if you run the flush-logs command before or at the same time as mysqladmin shutdown, MySQL will actually stop accepting connections (all new connections hang waiting for the server) and will start to wait for the thread compressing the gtid_executed table to complete its work. Below is an example.

First, execute the flush logs command and obtain ERROR 1290:

$ mysql -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox -e "flush logs ;"
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

We then try to shut down the instance, but it hangs:

$ mysqladmin -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox shutdown
^CWarning;  Aborted waiting on pid file: 'mysql_sandbox5731.pid' after 175 seconds

This bug has been reported and verified as #84597.

The Workaround

If you already have an established connection to your database with SUPER privileges, you can disable the super_read_only feature dynamically. Once that is done, the pending thread compressing the gtid_executed table completes its work and the shutdown finishes successfully. Below is an example.

We check rows in the gtid_executed table:

$ mysql -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox -e "select count(*) from mysql.gtid_executed ;"
+----------+
| count(*) |
+----------+
|   300038 |
+----------+

We disable the super_read_only feature on an already established connection:

mysql> SET GLOBAL super_read_only = OFF;

We check the rows in the gtid_executed table again, verifying that the compression thread ran successfully.

$ mysql -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox -e "select count(*) from mysql.gtid_executed ;"
+----------+
| count(*) |
+----------+
|        1 |
+----------+

Now we can shut down the instance without issues:

$ mysqladmin -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox shutdown

You can disable the super_read_only feature before you shut down the instance so that the gtid_executed table gets compressed. If you run into the bug above and don’t have any established connections to your database, the only way to shut down the server is by issuing kill -9 on the mysqld process.
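As a sketch of that forced stop (the pid file name is taken from the sandbox output above; adjust the path to your data directory):

$ kill -9 $(cat mysql_sandbox5731.pid)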

Summary

As shown in this blog post, some of the mechanics of MySQL 5.7’s super_read_only feature are not working as expected. This can prevent some administrative operations, like shutdown, from happening.

If you are using the super_read_only feature on MySQL 5.7.17 or older, or on Percona Server 5.7.16 or older (which ports the mainstream implementation – unlike Percona Server 5.6, which ported WebScaleSQL’s super_read_only implementation), don’t use FLUSH LOGS.

Mar
17
2016
--

MySQL replication primer with pt-table-checksum / pt-table-sync, part 2


This is the second and last tutorial blog post on how to use the pt-table-checksum / pt-table-sync tools for MySQL replication.

In the first post, I showed you how to use the pt-table-checksum / pt-table-sync tools in a typical MySQL replication setup. In this post, we’ll discuss MySQL replication for more advanced topologies. I will show you how to use these tools in a chained master/slave replication setup in general, and in Percona XtraDB Cluster specifically. (Here is another good post about how to use the pt-table-checksum tool in Percona XtraDB Cluster.)

Let me first show you an example where I tried to run pt-table-checksum / pt-table-sync on a chained master/slave setup. In this example, the setup is Master -> Slave1 -> Slave2, where the master has its binary log format set to STATEMENT, while the chained Slave1 has its binary log format set to ROW.

Master:
mysql> SHOW GLOBAL VARIABLES LIKE 'binlog_format';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| binlog_format | STATEMENT |
+---------------+-----------+
Slave1:
mysql> SHOW VARIABLES LIKE 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
[root@master ~]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
Replica slave1 has binlog_format ROW which could cause pt-table-checksum to break replication.  Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.  If you understand the risks, specify --no-check-binlog-format to disable this check.

As soon as I tried to execute pt-table-checksum on this replication topology, it failed because the binary logs use different formats. The pt-table-checksum documentation mentions this issue in its Limitations section.

The problem here is that Slave1 receives changes from its master in STATEMENT format (pt-table-checksum forces STATEMENT binlog_format in its session). Slave1 executes the checksum queries, and all seems good up to that point. But Slave1 then also writes the changes to its own binary log to replicate them to Slave2. pt-table-checksum only works properly when STATEMENT-based format is used to generate the checksums: if it’s ROW-based, the row changes to the checksums table are simply replicated from the chained master and no checksumming happens. This is what happens with Slave2, which is why pt-table-checksum doesn’t perform its checksum there.

To remedy this problem, we either need to skip this check using --no-check-binlog-format, or change the binary log format to STATEMENT on the chained master (Slave1). Both fixes are sketched below.
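These sketches reuse the credentials from the example above; adjust them to your setup:

# Option 1: skip the binlog format check (read the LIMITATIONS section first)
[root@master ~]# pt-table-checksum --no-check-binlog-format --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
# Option 2: switch the chained master (Slave1) to STATEMENT format; set it in my.cnf
# as well, since the slave SQL thread picks up the global value when it (re)starts
mysql> SET GLOBAL binlog_format = 'STATEMENT';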

In our second example, I’m going to run pt-table-checksum on a three-node Percona XtraDB Cluster. Data inconsistencies can occur in Galera Cluster because of human error or bugs. For this purpose, I inserted a few rows on Node1; the Node1/Node2 test.dummy table data looks like the following:

Node1 & Node2:
mysql> SELECT * FROM test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)

Whereas Node3 is missing the last two records, as shown below (I intentionally deleted the last couple of rows, toggling wsrep_on OFF/ON around the deletes):

Node3:
mysql> SELECT * FROM dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
+----+------+
3 rows in set (0.00 sec)

I have filled the DSN table with the checksum user credentials:

mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.10,u=checksum_user,p=checksum_password,P=3306');
mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.14,u=checksum_user,p=checksum_password,P=3306');
mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.169,u=checksum_user,p=checksum_password,P=3306');

pt-table-checksum on Percona XtraDB Cluster works with a single cluster, where all nodes are cluster nodes and not regular replicas. The example above is a good one – Node1, Node2 and Node3 all belong to a single cluster. Cluster-to-cluster setups aren’t supported. (Refer to the documentation for more details.)

Let’s try to run pt-table-checksum from the authoritative node, Node1:

root@node1:# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
08-21T08:27:12      0      1        5       1       0   0.045 test.dummy
root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
Differences on node3
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1

pt-table-checksum prints messages during the checksum process saying it can’t check the replica lag. It’s obvious why: SHOW SLAVE STATUS is not supported on cluster nodes as it would be in a traditional master -> slave(s) replication setup. Further, pt-table-checksum found the difference on the test.dummy table. pt-table-checksum also supports --recursion-method=cluster, which auto-discovers cluster nodes by querying the wsrep_incoming_addresses status variable. This recursion method works equally well in Percona XtraDB Cluster, and you may use --recursion-method=cluster in place of --recursion-method=dsn. Let’s find the differences again using the cluster recursion method:

root@node1:~# pt-table-checksum --empty-replicate-table --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=cluster
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-24T02:19:48      0      0        3       1       0   0.017 percona.dsns
02-24T02:19:48      0      1        5       1       0   0.017 test.dummy
root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=cluster
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
Differences on node3
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1

The differences are found on Node3. Now the next big challenge is to synchronize Node3 with the other nodes in the cluster.

pt-table-sync is the tool to synchronize MySQL table data differences, but unfortunately the pt-table-sync tool is not supported on Galera Cluster. I tried a few hacks with pt-table-sync in Percona XtraDB Cluster, but no luck (as shown below):

Sync test.dummy on node1 to node3:
root@node1:~# pt-table-sync --dry-run h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10
# NOTE: --dry-run does not show if data needs to be synced because it
#       does not access, compare or sync data.  --dry-run only shows
#       the work that would be done.
# Syncing D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user in dry-run mode, without accessing or comparing data
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 Chunk     09:11:31 09:11:31 0    test.dummy
root@node1:~# pt-table-sync --print h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10
INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:48372 user:root host:node1*/;
INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:48372 user:root host:node1*/;

I tried to run pt-table-sync from Node1 to sync the differences on Node3. I first used --dry-run to make sure everything was good. Later, I passed the --print option to review the queries before actually syncing. In XtraDB Cluster, pt-table-sync prints INSERT/UPDATE/DELETE queries instead of REPLACE queries. Executing pt-table-sync with the --execute option will break the cluster, as those changes will be replicated to all the cluster nodes. Since Node1 and Node2 already have those records in their table, the cluster will break with a “Duplicate Key” error. To work around this problem, use the pt-table-sync --replace option (as shown below):

root@node1:/var/lib/mysql# pt-table-sync --print --replace h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:942 user:root host:node1*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:942 user:root host:node1*/;

When I run pt-table-sync with --execute and --replace, it breaks with deadlock errors (see here: https://bugs.launchpad.net/percona-toolkit/+bug/1250450). Let’s try to run those REPLACE statements from one of the authoritative nodes (i.e., Node1):

mysql> SHOW BINLOG EVENTS IN 'node4-bin.000002';
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                          |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| node4-bin.000002 |   4 | Format_desc |         1 |         120 | Server ver: 5.6.24-72.2-56-log, Binlog ver: 4 |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
1 row in set (0.03 sec)
mysql> source replace.sql;
Query OK, 1 row affected (0.02 sec)
Query OK, 1 row affected (0.00 sec)
mysql> SHOW BINLOG EVENTS IN 'node4-bin.000002';
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                          |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| node4-bin.000002 |   4 | Format_desc |         1 |         120 | Server ver: 5.6.24-72.2-56-log, Binlog ver: 4 |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
1 row in set (0.00 sec)

As you can see, the pt-table-sync tool prints REPLACE INTO statements, which seems correct: those would be no-op changes for nodes that already have that data, and replication would sync the unsynchronized nodes. But here is the problem: Galera Cluster works through the ROW binary log format, and the REPLACE statements never reach the unsynchronized node. If they are executed from a node that is already synchronized (Node1 and Node2 in this case), no ROW image changes, so no binary log event is recorded for the REPLACE statements.

So how do we synchronize those changes in Galera Cluster? I’ve found a couple of workarounds to synchronize nodes in Galera Cluster!

Approach # 1:
Generate a .sql script via the pt-table-sync tool with the --print option, and review the changes that need to occur on the unsynchronized node. Once you are satisfied, execute this script directly on the problematic node with binary logging disabled for that particular session (put SET SQL_LOG_BIN=0 at the top of the script and SET SQL_LOG_BIN=1 on its last line). This ensures those changes don’t replicate to the other nodes of the cluster.
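As a sketch of that flow (the script name is an assumption; 10.0.3.10 is Node3 in these examples):

root@node1:~# pt-table-sync --print h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10 > sync_node3.sql
root@node1:~# sed -i '1i SET SQL_LOG_BIN=0;' sync_node3.sql
root@node1:~# echo 'SET SQL_LOG_BIN=1;' >> sync_node3.sql
# review sync_node3.sql, then execute it on Node3 only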

NOTE: This could be risky if you forget to disable binary logging (i.e., SET SQL_LOG_BIN=0) for the synchronization session, because, as previously explained, the changes would replicate to all nodes and all the remaining nodes in the cluster would go down (as those rows already exist on the other nodes). So be careful with this method!

Also, it’s important to mention that manually syncing with an SQL file may not be an atomic operation: between the time it takes to generate the SQL file and the time you execute it, there is a chance that the rows in question could change. To deal with this, use LOCK TABLE tablename WRITE, followed by UNLOCK TABLES to release the lock after the syncing process. In fact, the most secure method is to stop writes to the table that needs to be synchronized on all the nodes before running pt-table-sync, and resume writes after the sync operation for the target table completes. One other important point: if there is a vast number of table rows to sync via this method, it makes sense to enable wsrep_desync (SET GLOBAL wsrep_desync=ON) until the node gets synchronized, and then disable wsrep_desync again, because during the synchronization process this node may lag behind and flow control will no longer take care of the desynced node.
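A minimal sketch of that desync bracket around the manual sync session shown below:

mysql> SET GLOBAL wsrep_desync = ON;
-- run the SET SQL_LOG_BIN=0 / LOCK TABLE / INSERT ... / UNLOCK TABLES sequence here
mysql> SET GLOBAL wsrep_desync = OFF;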

mysql> SHOW STATUS LIKE 'wsrep_cluster_size%';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
mysql> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)
mysql> LOCK TABLE dummy WRITE;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.14,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3800 user:root host:node3*/;
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=10.0.3.14,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3800 user:root host:node3*/;
Query OK, 1 row affected (0.00 sec)
mysql> select * from test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)
mysql> SHOW STATUS LIKE 'wsrep_cluster_size%';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.01 sec)
mysql> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)
mysql> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)

Approach # 2:
Another approach is to rebuild the database table in question. I got this idea from Jervin’s impressive post. You need to use the pt-online-schema-change tool with a no-op ALTER, and run it from one of the authoritative, synchronized nodes of the cluster, as below:

root@node2:~# pt-online-schema-change --execute --alter 'ENGINE=INNODB' D=test,t=dummy
No slaves found.  See --recursion-method if host node4 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `test`.`dummy`...
Creating new table...
Created new table test._dummy_new OK.
Altering new table...
Altered `test`.`_dummy_new` OK.
2016-02-11T01:23:22 Creating triggers...
2016-02-11T01:23:22 Created triggers OK.
2016-02-11T01:23:22 Copying approximately 5 rows...
2016-02-11T01:23:22 Copied rows OK.
2016-02-11T01:23:22 Swapping tables...
2016-02-11T01:23:22 Swapped original and new tables OK.
2016-02-11T01:23:22 Dropping old table...
2016-02-11T01:23:22 Dropped old table `test`.`_dummy_old` OK.
2016-02-11T01:23:22 Dropping triggers...
2016-02-11T01:23:22 Dropped triggers OK.
Successfully altered `test`.`dummy`.

Once the table is rebuilt, the ROW images are copied to the node needing synchronization. It might be a good idea to use the --dry-run option before making the actual changes with --execute.
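For example, a prior sanity check might look like this (the same no-op ALTER as above):

root@node2:~# pt-online-schema-change --dry-run --alter 'ENGINE=INNODB' D=test,t=dummy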

You’ll need to decide which of these approaches to choose: while the second approach seems a much safer option, it may take some time to complete. If the database table is gigabytes or terabytes in size, and the differences in rows are few compared to the actual table size, then the first approach could be faster (but riskier).

In the next example, I have added an async node as a replica of Node1, where Node1 acts as the master server for the async slave. The setup looks like this:

node1 <-> node2 <-> node3
 |
 |
 +-> replica (async)

pt-table-checksum can only detect differences on a replica from its master node (i.e., Node1 here). In this case, the tool needs to be executed from the master node (Node1). If you run pt-table-checksum from Node2 or Node3, it won’t detect diffs on the replica, because its master node is Node1. You can find details about this in the documentation.

Let’s see all this in the examples below. In this test, I’m going to run the pt-table-checksum tool from Node1 (the async slave’s master) to see if it can find the differences on the async replica node.

root@node1:~# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-03T03:02:48      0      0        3       1       0   0.022 percona.dsns
09-03T03:02:48      0      1        5       1       0   0.041 test.dummy
root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password
Differences on async1
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1

The pt-table-checksum tool can find the differences on the async node’s test.dummy table. Let’s try to sync those diffs with the help of the pt-table-sync tool:

root@async1:~# pt-table-sync --dry-run --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy
# NOTE: --dry-run does not show if data needs to be synced because it
#       does not access, compare or sync data.  --dry-run only shows
#       the work that would be done.
# Syncing D=test,h=localhost,p=...,t=dummy,u=checksum_user in dry-run mode, without accessing or comparing data
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 Chunk     08:38:55 08:38:55 0    test.dummy
root@async1:~# pt-table-sync --print --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,P=3306,h=10.0.3.14,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:49118 user:root host:async1*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:D=test,P=3306,h=10.0.3.14,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:D=test,h=localhost,p=...,t=dummy,u=checksum_user lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:49118 user:root host:async1*/;
root@async1:~# pt-table-sync --execute --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy
Deadlock found when trying to get lock; try restarting transaction at line 6115 while doing test.dummy on localhost

As you can see, I executed the pt-table-sync tool from the async node to find the exact differences, and it printed the two rows that were missing from the dummy table on the async replica. Further, when executed with the --execute option, the tool fails with a “Deadlock” error; in reality, there is no deadlock to be found in the InnoDB status. In order to sync the slave with its master, you can take one of the approaches described above: either execute the .sql script on the slave server directly, or rebuild the table on the master Galera node so that it is rebuilt across the slaves.

Let me show you one final example, where I tried to run pt-table-checksum from Node3 to find the differences on the async slave whose master host is Node1:

root@node3:~# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password
Diffs cannot be detected because no slaves were found.  Please read the --recursion-method documentation for information.
node3 is a cluster node but no other nodes or regular replicas were found.  Use --recursion-method=dsn to specify the other nodes in the cluster.

Here, Node3 refused to find differences because there is no direct replica attached to it. The documentation mentions this behavior: in a cluster environment, differences can only be detected from the master node directly connected to the async slave (as is the case here with Node1 and the async replica). In order to run a checksum from Node3 and check for differences between cluster nodes, use one of the --recursion-method options shown in the previous examples.
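For instance, this sketch reuses the cluster recursion method from earlier to discover the other cluster nodes from Node3:

root@node3:~# pt-table-checksum --replicate=percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=cluster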

Conclusion:
Galera Cluster doesn’t support pt-table-sync in general, but here we’ve found a couple of workarounds to solve this problem:

  • You can sync the changes manually by redirecting the pt-table-sync --print output to a file to generate a .sql template, reviewing the changes carefully, and executing the file on the nodes with the missing data. Again, this approach is fast but possibly dangerous.

  • Another option is the no-op ALTER method, using pt-online-schema-change to rebuild the table. It’s the recommended method, but it could be slower.

Finally, after synchronizing the data changes it is a good idea to re-run the checksum tool to verify any remaining discrepancies.

Aug
18
2015
--

Featured Talk: The Future of Replication is Today: New Features in Practice

In past years, MySQL 5.6, MySQL 5.7 and MariaDB 10 have all been successful at implementing new features. For many DBAs, the “old way” of replicating data is comfortable, so taking action to implement these new features seems like a momentous leap rather than a simple step. But perhaps it isn’t that complicated…

Giuseppe Maxia, a Quality Assurance Architect at VMware and loyal member of the Percona Live Conference Committee, will be presenting “The Future of Replication is Today: New Features in Practice” at the Percona Live Data Performance Conference this September in Amsterdam.
Percona’s Community Manager, Tom Diederich, had an opportunity to catch up with Giuseppe last week and get an in-depth look at some of the items Giuseppe will be covering in his talk, in addition to getting his take on some of the hot sessions to hit while at the conference. This is how it went:

(Hint: Read to the end to find a special discount code) 

 

Tom: Your talk is titled, “The Future of Replication is today: new features in practice.” What are the top 3 areas in which replication options have improved in MySQL 5.6, MySQL 5.7, and MariaDB 10?
Giuseppe: Replication has been stagnant for over 10 years. Before MySQL 5.6, the only important change in the technology was the introduction of row-based replication in 2008. After that, we had to wait until 2013 to see global transaction identifiers in MySQL 5.6, followed by the same feature, with a different implementation, in 2014 with MariaDB 10. GTID has been complemented, in both flavors, with crash-safe replication tables, a feature that guarantees a reliable resumption of replication after a server failure. There is also the parallel applier, a minor feature that has been implemented in both MySQL 5.6 and MariaDB and improved in the latest versions, although it seems to lack proper support for monitoring. The last feature introduced in MySQL 5.6 and MariaDB 10 is multi-source replication, i.e., the ability to replicate from multiple masters to a single slave. In both editions, the implementation is quite simple, and not so different from what DBAs are used to doing for regular replication.
Tom: For DBAs, how difficult will it be to make the change from the “old way” of replicating data — to stop using the same comfortable features that have been around for several years — and put into practice some of the latest features?
Giuseppe: The adoption of new features can be deceptively simple. For example, GTID in MariaDB comes out of the box and its adoption could be as easy as running a backup followed by a restore, but it can produce unpleasant results if you try to combine this feature with multi-source replication without planning ahead. That said, the transition could be simpler than its counterpart in MySQL.
MySQL 5.6 and 5.7 require some reconfiguration to run GTID, and users can face unpleasant failures due to the complexity of the rules applying to this feature. They will need to read the manual thoroughly and test the deployment extensively before trusting an upgrade in production.
For multi-source replication, the difficulties are, in my experience, hidden in the users’ expectations. When speaking about multi-source (or multi-master, as it is commonly referred to), many users have the mistaken expectation that they can easily insert anything into multiple masters as if they were doing it on a single server. However, the nature of asynchronous replication and the current implementation of multi-source topologies do not handle conflicts, and this fact will probably surprise and anger the early adopters.
Tom: What is still missing in replication technology? How can MySQL improve?
Giuseppe: There are two areas where the current implementation is lacking. The first one is monitoring data: while new features have been added to replication, not enough effort has been made to cover the monitoring needs. The current way of monitoring replication is hard-wired around the original replication feature, and little has been done to give users a deeper view of what is going on. With the latest releases at our disposal, we can run parallel replication using multiple masters, and yet we have very little visibility into what goes on inside the dozens of threads that the new features can unchain inside a single slave. It’s like driving an F1 racing car with the dashboard of a Ford Model T. MySQL 5.7 has moved a few steps in that direction, with the new replication tables in performance_schema, but it is still a drop in the ocean compared to what we need.
The second area where replication is still too tied to its past is heterogeneous replication. While relational databases still dominate the front end of the web economy, the back end is largely run by different structures, such as Hadoop, MongoDB and Cassandra. Moving data back and forth between relational storage and its growing siblings has become an urgent need. There have been a few sparks of change in this direction, but nothing that qualifies as a promising change yet.
Tom: Which other session(s) are you most looking forward to besides your own?
Giuseppe: I am always interested in the sessions that explain and discuss new features. I am most interested in the talks by Oracle engineers, who have been piling up many features in recent years, and I am sure they have something more up their sleeve that will appear at the conference. I also eagerly attend sessions about complementary tools, which are usually highly educational and often give me more ideas.

Want to read more on the topic? Visit Giuseppe’s blog: MySQL Replication Monitoring 101

The Percona Live Data Performance Conference is the premier event for the rich and diverse MySQL, NoSQL and data in the cloud ecosystems in Europe. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and IoT (Internet of Things) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

This year’s conference will feature one day of tutorials and two days of keynote talks and breakout sessions related to MySQL, NoSQL and Data in the Cloud. Attendees will get briefed on the hottest topics, learn about building and maintaining high-performing deployments and hear from top industry leaders.

The Percona Live Europe Data Performance Conference will be September 21-23 at the Mövenpick Hotel Amsterdam City Centre.

Register using code “FeaturedTalk” and save 20 euros off of registration!

Hope to see you in Amsterdam!


Aug
11
2015
--

MySQL replication primer with pt-table-checksum and pt-table-sync

MySQL replication is a process that allows you to easily maintain multiple copies of MySQL data by having them copied automatically from a master to a slave database.

It’s essential to make sure the slave servers have the same set of data as the master, so that data is consistent within the replication stream. MySQL slave server data can drift from the master for many reasons, e.g., replication errors, accidental direct updates on the slave, etc.

Here at Percona Support we highly recommend that our customers periodically run the pt-table-checksum tool to verify data consistency within their replication streams, specifically after fixing replication errors on slave servers, to ensure the slave has identical data to its master. You don’t want to be in a situation where you need to fail over to a slave server and find different data on it.

In this post, I will examine the usage of the pt-table-checksum and pt-table-sync tools from Percona Toolkit on different replication topologies. We often receive queries from customers about how to run these tools, and I hope this post will help.

Percona Toolkit is a free collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex to perform manually.

One of those tools is pt-table-checksum, which works by dividing table rows into chunks. The size of a chunk changes dynamically during the operation to avoid overloading the server. pt-table-checksum has many safeguards, including this variation in chunk size, to make sure queries run in a desired amount of time.

pt-table-checksum verifies chunk size by running an EXPLAIN query on each chunk. It also monitors the slave servers continuously to make sure the replicas don’t fall too far behind; if they do, the tool pauses itself to allow the slaves to catch up. Along with that, there are many other safeguards built in, and you can find all the details in the documentation.
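For instance, you can bound both the per-chunk runtime and the tolerated replica lag explicitly. This is only a sketch with arbitrarily chosen values (all three are documented pt-table-checksum options, and the credentials match the examples below):

[root@master]# pt-table-checksum --chunk-time=0.5 --max-lag=5 --check-interval=2 --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password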

In my first example, I am going to run pt-table-checksum against a pair of replication servers, i.e., a master having only one slave in the replication topology. We will run the pt-table-checksum tool on the master server to verify data integrity on the slave, and if pt-table-checksum finds differences, we will sync those changes to the slave server via the pt-table-sync tool.

I have created a dummy table under the test database and inserted 10 records on the master server, as below:

mysql-master> create table dummy (id int(11) not null auto_increment primary key, name char(5)) engine=innodb;
Query OK, 0 rows affected (0.08 sec)
mysql-master> insert into dummy VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e'), (6,'f'), (7,'g'), (8,'h'), (9,'i'), (10,'j');
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0
mysql-master> select * from dummy;
+------+------+
| id   | name |
+------+------+
|    1 | a    |
|    2 | b    |
|    3 | c    |
|    4 | d    |
|    5 | e    |
|    6 | f    |
|    7 | g    |
|    8 | h    |
|    9 | i    |
|   10 | j    |
+------+------+
10 rows in set (0.00 sec)

Then I intentionally deleted a few records from the slave server to make it inconsistent with the master for the purpose of this post.

mysql-slave> delete from dummy where id>5;
Query OK, 5 rows affected (0.03 sec)
mysql-slave> select * from dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)

Now, in this case, the master server has 10 records in the dummy table while the slave server has only 5, missing the records with id>5. We will run pt-table-checksum at this point on the master server to see whether the tool catches those differences.

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
07-11T18:30:13      0      1       10       1       0   1.044 test.dummy

This needs to be executed on the master. The user and password you specify will be used to connect not only to the master but to the slaves as well. You need the following privileges for the pt-table-checksum MySQL user:

mysql-master> GRANT REPLICATION SLAVE, PROCESS, SUPER, SELECT ON *.* TO `checksum_user`@'%' IDENTIFIED BY 'checksum_password';
mysql-master> GRANT ALL PRIVILEGES ON percona.* TO `checksum_user`@'%';

Earlier, in the pt-table-checksum command, I used the --replicate option, which writes the checksum results to the specified table, percona.checksums. Next I passed the --ignore-databases option, which accepts a comma-separated list of databases to ignore. Moreover, the --create-replicate-table and --empty-replicate-table options are enabled by default; you can specify them explicitly if you want to use a database.table other than percona.checksums.
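As a sketch, writing the checksum results to a custom table (the name reporting.checksums is hypothetical) would look like:

[root@master]# pt-table-checksum --replicate=reporting.checksums --create-replicate-table --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password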

pt-table-checksum reported 1 DIFF, which is the number of chunks that differ from the master on one or more slaves. You can find details about the output columns (e.g., TS, ERRORS and so on) in the pt-table-checksum documentation. After that, I ran the next command to identify which table has differences on the slave.

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
Differences on slave
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -5 1

In this command I used the --replicate-check-only option, which doesn’t checksum any tables. It only checks the replicas for differences found by previous checksumming, prints the tables with those differences, and exits.

You may also log in to the slave and execute the query below to find out which tables have inconsistencies.

mysql-slave> SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE (
master_cnt <> this_cnt
OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;

pt-table-checksum identified that the test.dummy table differs on the slave. Now we are going to use the pt-table-sync tool to synchronize the table data between the MySQL servers.

[root@slave]# pt-table-sync --print --replicate=percona.checksums --sync-to-master h=localhost,u=checksum_user,p=checksum_password
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('6', 'f') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('7', 'g') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('8', 'h') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('9', 'i') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('10', 'j') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;

I ran the pt-table-sync tool from the opposite host this time, i.e., from the slave, as I used the --sync-to-master option, which treats the DSN as a slave and syncs it to its master. Again, pt-table-sync will use the MySQL username and password you specify to connect to the slave as well as to its master. The --replicate option here examines the specified table to find the data differences, and --print just prints the SQL (REPLACE queries) without actually executing it.

You may audit the queries before executing them to sync data between master and slave; you can see that it printed only the records missing on the slave. Once you are happy with the results, you can substitute --print with --execute to do the actual synchronization.

As a reminder, these queries are always executed on the master, as this is the only safe way to make the changes on the slave. On the master they are no-op changes, since those records already exist there, but they then flow to the slave via the replication stream and sync it with the master.

If you find lots of differences on your slave server, it may lag while synchronizing those changes. As I mentioned earlier, you can use the --print option to go through the queries that will be executed to sync the slave with the master. I found this post useful if you see a huge difference in a table between master/slave(s).

Note: you may use the --dry-run option initially, which only analyzes and prints information about the sync algorithm and then exits. It shows verbose output but doesn’t make any changes. The --dry-run parameter basically instructs pt-table-sync not to actually do the sync, but just to perform some checks.
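For example, a dry run of the sync above would be:

[root@slave]# pt-table-sync --dry-run --replicate=percona.checksums --sync-to-master h=localhost,u=checksum_user,p=checksum_password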

Let me present another replication topology, where the master has two slaves: slave2 runs on the non-default port 3307, while the master and slave1 run on port 3306. Further, slave2 is out of sync with the master, and I will show you how to sync slave2, which runs on port 3307, with the master.

mysql-master> SELECT * FROM dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)
mysql-slave1> SELECT * FROM test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)
mysql-slave2> SELECT * FROM test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
+----+------+

Let’s run the pt-table-checksum tool on the master database server.

[root@master]# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=192.168.0.130,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
07-23T13:57:39      0      0        2       1       0   0.310 percona.dsns
07-23T13:57:39      0      1        5       1       0   0.036 test.dummy

This time I used the --recursion-method parameter, which selects the method used to find slaves in the replication stream. It’s pretty useful when your servers run on a non-standard port, i.e., other than 3306. I created the dsns table under the percona database with the following entries. You can find the dsns table structure in the documentation.
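If you still need to create it, here is a minimal definition consistent with the columns shown below (a sketch; check the pt-table-checksum documentation for the canonical DDL):

mysql> CREATE TABLE percona.dsns (
    id int(11) NOT NULL AUTO_INCREMENT,
    parent_id int(11) DEFAULT NULL,
    dsn varchar(255) NOT NULL,
    PRIMARY KEY (id)
    );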

mysql> SELECT * FROM dsns;
+----+-----------+------------------------------------------------------------+
| id | parent_id | dsn                                                        |
+----+-----------+------------------------------------------------------------+
|  1 |         1 | h=192.168.0.134,u=checksum_user,p=checksum_password,P=3306 |
|  2 |         2 | h=192.168.0.132,u=checksum_user,p=checksum_password,P=3307 |
+----+-----------+------------------------------------------------------------+

Next, I ran the pt-table-checksum command below to identify which slave server has differences on the test.dummy table.

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=192.168.0.130,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
Differences on slave2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1

This shows that slave2 has different data in the test.dummy table compared to the master. Now let’s run the pt-table-sync tool to sync those differences and make slave2 identical to the master.

[root@slave2] ./pt-table-sync --print --replicate=percona.checksums --sync-to-master h=192.168.0.132,u=checksum_user,p=checksum_password
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum dst_db:test dst_tbl:dummy dst_dsn:h=192.168.0.132,p=...,u=checksum lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1514 user:root host:slave2*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum dst_db:test dst_tbl:dummy dst_dsn:h=192.168.0.132,p=...,u=checksum lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1514 user:root host:slave2*/;

It shows 2 rows are different on slave2. Substituting --print with --execute synchronized the differences on slave2, and re-running the pt-table-checksum tool showed no more differences.

Conclusion:
pt-table-checksum and pt-table-sync are the finest tools from Percona Toolkit for validating data between master and slave(s). With the help of these tools you can easily identify data drift and fix it. I covered a couple of replication topologies above, showing how to check replication consistency and how to fix it in case of data drift. You may script the pt-table-checksum / pt-table-sync steps and cron the checksum script to periodically check data consistency within the replication stream.

This procedure is only safe for a single-level master-slave(s) hierarchy. I will discuss the procedure for other topologies in future posts, i.e., more complex scenarios on how to use these tools in chained replication (master -> slave1 -> slave2) and in a Percona XtraDB Cluster setup.


Mar
09
2015
--

5 free handy tools for monitoring and managing MySQL replication

MySQL Replication is very simple to set up. In this post I’ll discuss its importance and five handy tools for monitoring and managing MySQL replication.

What is MySQL replication? It’s the process of copying data (as real-time events) from a master instance to a slave instance, maintaining a redundant, consistent copy of the data on a different machine. This enables a distributed database system that shares the same level of information.

In MySQL, replication works based on the three threads described below. (A short sketch of how to observe them follows the list.)

1) I/O thread on the slave server: to start receiving replication events, an I/O thread starts on the slave server and connects to the master server.

2) Master connection handler thread: acting as a connection handler, the master starts a thread whenever a replication slave connects to it. The master server sends events from its binary log file to the slave I/O thread, notifying the slave about newly written events in its binary log. The slave I/O thread records them in the slave’s relay log file.

3) Slave SQL thread: when it starts, it immediately reads events from the relay log and applies them to the slave DB. Once it finishes processing a relay log file, it deletes the processed one, provided the I/O thread is already writing events to a newer relay log file. If the I/O thread is still writing to the same relay log file the SQL thread is reading, the SQL thread pauses until more events are available in the relay log.
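You can watch these threads from the SQL layer. As a quick sketch, the State column of SHOW PROCESSLIST shows the usual idle states of healthy replication threads (the quoted states are the standard MySQL ones; the exact wording varies slightly between versions):

mysql-slave> SHOW PROCESSLIST\G
-- I/O thread state: "Waiting for master to send event"
-- SQL thread state: "Slave has read all relay log; waiting for the slave I/O thread to update it"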

MySQL replication (slave) redundant instances are an excellent method of improving data performance and stability. Replication ensures the availability of another copy of the database whenever issues arise with the master server. Another advantage is offloading report queries (SELECTs) to a slave server, which is a common practice to reduce the workload of the master DB server, as multiple servers are then able to respond to queries. A third advantage is scheduling backups from the slave server, etc.

All the benefits discussed above are smart and effective only if replication is up and running and the data is in sync with the master.

Let’s look at a set of very useful tools from Percona Toolkit that help you monitor and manage MySQL replication (slaves).

1) pt-heartbeat: this tool measures/monitors replication lag on a MySQL slave in real time. It is important to have a replication monitoring system to confirm that replication is up and running and that the lag is current.

In the typical way of monitoring, we use SHOW SLAVE STATUS to find information like Slave_IO_Running: Yes, Slave_SQL_Running: Yes and Seconds_Behind_Master: 0. But this is not reliable, as Seconds_Behind_Master only shows the difference between the last timestamp read in the binlogs and the current time. Many things, like a slow network, long-running queries, blocking operations, or second-level slaves (Master > slave1 > slave2), can produce an irrelevant value for the variable.

So I recommend using pt-heartbeat, which depends only on a heartbeat record being replicated to the slave, instead of the unreliable method above. pt-heartbeat inserts/updates a row on the master, and the time delay is calculated from when the data was inserted to when it became available to read on the slave. It works at any depth in the replication hierarchy; for example, it will reliably report how far a slave lags its original master (its master’s master).

Example :

On Master:
[root@Tst1Master ~]#pt-heartbeat --daemonize -D test --update -h<IP address> --create-table
On Slave:
[root@Tst1Slave ~]#pt-heartbeat -D test --monitor --master-server-id 1
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]

It is common to schedule backups from the slave to avoid putting additional load on the master server. In that case it is important to confirm the slave is current with the master, so that the backup contains recent data. Here is a simple script you can use to verify the replication status periodically (via cron), and to check the status just before a scheduled backup.

#!/bin/bash
# Replication delay thresholds (seconds):
#     < 300       - [Good]
#     300 to 599  - [Warning]
#     >= 600      - [Critical]
MAIL_FROM="root@`hostname`"
MAIL_TO="mailid@mail.com"
Warningthreshold=300
Criticalthreshold=600
# Pass the argument "backup" (i.e. "test.sh backup") to denote the call is from the backup script.
backup=$1
# Current delay in whole seconds, as reported by pt-heartbeat --check.
CMD=$(/root/bin/pt-heartbeat -D test --master-server-id 1 --check | cut -d. -f1)
if [ -z "$CMD" ]
then
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Error] Replication status check failed, need to investigate."
elif [ $CMD -lt $Warningthreshold ]
then
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Good] current delay: "$CMD;
elif [ $CMD -lt $Criticalthreshold ]
then
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Warning] current delay: "$CMD;
else
MESSAGE=`date +'%m:%d:%Y %H:%M:%S'`" [Critical] current delay: $CMD Check the replication"
fi
# No argument supplied: mail only when the delay crosses the warning threshold.
if [ -z "$1" ] && [ "${CMD:-0}" -gt $Warningthreshold ]
then
(echo "Subject: Replication status on `hostname`";
echo "Replication status : "
echo $MESSAGE
)  | /usr/sbin/sendmail -O NoRecipientAction=add-to -f${MAIL_FROM} ${MAIL_TO}
elif [ $# -eq 1 ]
then
(echo "Subject: Replication status check prior to backup on `hostname`";
echo "Replication status prior to backup:"
echo $MESSAGE
)  | /usr/sbin/sendmail -O NoRecipientAction=add-to -f${MAIL_FROM} ${MAIL_TO}
fi
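As a usage sketch, you could call the script every five minutes from cron, assuming it is saved as /root/bin/replcheck.sh (a hypothetical path):

# Check replication every 5 minutes; the script mails only on Warning/Critical
*/5 * * * * /root/bin/replcheck.sh
# From the backup job, pass the "backup" argument just before the backup starts:
# /root/bin/replcheck.sh backup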

2) pt-slave-find: Finds and prints the replication hierarchy of the slaves, showing you the topology of your MySQL replication instances.

Example :

[root@Tst1Master ~]# ./pt-slave-find --host=192.168.56.10
192.168.56.10
Version         5.6.22-72.0-log
Server ID       1
Uptime          42:09 (started 2015-03-03T01:40:42)
Replication     Is not a slave, has 1 slaves connected, is not read_only
Filters
Binary logging  STATEMENT
Slave status
Slave mode      STRICT
Auto-increment  increment 1, offset 1
InnoDB version  5.6.22-72.0
+- 192.168.56.11
   Version         5.6.22-72.0
   Server ID       2
   Uptime          41:48 (started 2015-03-03T01:41:03)
   Replication     Is a slave, has 0 slaves connected, is not read_only
   Filters
   Binary logging  STATEMENT
   Slave status    0 seconds behind, running, no errors
   Slave mode      STRICT
   Auto-increment  increment 1, offset 1
   InnoDB version  5.6.22-72.0

[root@Tst1Master ~]# ./pt-slave-find --host=192.168.56.10 --report-format=hostname
192.168.56.10
+- 192.168.56.11

3) pt-slave-restart: Watches MySQL replication slaves for errors and tries to restart replication.

The tool is very useful for skipping statements that cause errors so that replication can continue. Use it carefully: skipping statements can leave the slave with inconsistent data. Whenever you use the tool, I recommend confirming the consistency of data between master and slave with the help of pt-table-checksum.

Example : Restart the slave for error-numbers=1062 (Duplicate entry ‘1’ for key ‘PRIMARY’)

#pt-slave-restart --socket=/var/lib/mysql/custom-feeds/mysql.sock --ask-pass --error-numbers=1062
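pt-slave-restart can also be daemonized so it keeps watching in the background; a minimal sketch (the log path is just an example):

#pt-slave-restart --socket=/var/lib/mysql/custom-feeds/mysql.sock --daemonize --log=/var/log/pt-slave-restart.log --error-numbers=1062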

4) pt-table-checksum: Performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master.

Example :

[root@Tst1Master ~]# ./pt-table-checksum -dD
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-03T02:34:44      0      1        2       1       0   0.011 d.t

Note: It is important to run pt-table-checksum regularly, regardless of whether you have ever skipped an event with pt-slave-restart, to make sure the slave holds data identical to the master.
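To see which tables actually differ after a run, you can query the checksum results on the slave; this sketch assumes the default --replicate table, percona.checksums:

mysql> SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
    -> FROM percona.checksums
    -> WHERE (
    ->   master_cnt <> this_cnt
    ->   OR master_crc <> this_crc
    ->   OR ISNULL(master_crc) <> ISNULL(this_crc))
    -> GROUP BY db, tbl;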

5) pt-table-sync: Syncs a slave with its master (synchronizes data efficiently between MySQL tables).

Example :

[root@Tst1Slave ~]# ./pt-table-sync -dD --print  --sync-to-master 192.168.56.11
REPLACE INTO `d`.`t`(`id`, `data`) VALUES ('1', 'Test1') /*percona-toolkit src_db:d src_tbl:t src_dsn:P=3306,h=192.168.56.10 dst_db:d dst_tbl:t dst_dsn:h=192.168.56.11 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:6435 user:root host:Tst1Slave.mysql*/;
REPLACE INTO `d`.`t`(`id`, `data`) VALUES ('2', 'Test2') /*percona-toolkit src_db:d src_tbl:t src_dsn:P=3306,h=192.168.56.10 dst_db:d dst_tbl:t dst_dsn:h=192.168.56.11 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:6435 user:root host:Tst1Slave.mysql*/;
[root@Tst1Slave ~]#

[root@Tst1Slave ~]# ./pt-table-sync -dD  --verbose  --execute  --sync-to-master 192.168.56.11
# Syncing h=192.168.56.11
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       2      0      0 Chunk     03:38:09 03:38:09 2    d.t

We have successfully synced the tables, so let us run the checksum again and confirm the table is in sync.

[root@Tst1Master ~]# ./pt-table-checksum -dD
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-03T03:03:40      0      0        2       1       0   0.111 d.t

The above are simple examples; based on your needs, you can choose the tools and options, or modify the scripts. I also recommend going through the documentation for more details on each tool.


Oct
07
2014
--

MySQL Replication: 'Got fatal error 1236' causes and cures

MySQL replication is a core process for maintaining multiple copies of data, and replication is a very important aspect of database administration. In order to synchronize data between master and slaves you need to make sure that data transfers smoothly, and to do so you need to act promptly on replication errors so that data synchronization can continue. Here on the Percona Support team, we often help customers with broken-replication issues. In this post I'll highlight the most critical replication error, code 1236, along with its causes and cures. MySQL replication error "Got fatal error 1236" can be triggered by multiple causes, and I will try to cover all of them.

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'binlog.000201' at 5480571

This is a typical error on the slave server. It reflects a problem with the max_allowed_packet size: max_allowed_packet limits the size of a single packet, such as one binary log event (e.g. one SQL statement) sent from the master to the slave. This error usually occurs when master and slave have different max_allowed_packet sizes (i.e. the master's max_allowed_packet is greater than the slave's). When the MySQL master server tries to send a packet bigger than the slave's limit, the slave server fails to accept it, hence the error. In order to alleviate this issue please make sure to have the same value for max_allowed_packet on both slave and master. You can read more about max_allowed_packet here.

This error usually occurs when updating a huge number of rows on the master in a way that does not fit within the slave's max_allowed_packet size, because the slave's limit is lower than the master's. This often happens with "LOAD DATA INFILE" or "INSERT .. SELECT" queries. In my experience, it can also be caused by application logic that generates a huge INSERT with junk data. Take into account that MySQL 5.6.6 and later introduced a new variable, slave_max_allowed_packet, which controls the maximum packet size for the replication threads. It overrides the max_allowed_packet variable on the slave and its default value is 1 GB. In this post, "max_allowed_packet and binary log corruption in MySQL," my colleague Miguel Angel Nieto explains this error in detail.
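As a minimal sketch of verifying and aligning the setting on both servers (the 64 MB value is just an example):

-- Run on both master and slave and compare the values:
mysql> SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';

-- Raise it where it is smaller, e.g. to 64 MB, and persist it in my.cnf ([mysqld] section):
mysql> SET GLOBAL max_allowed_packet = 67108864;

-- New sessions pick up the new value; restart the slave threads so replication uses it:
mysql> STOP SLAVE; START SLAVE;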

Got fatal error 1236 from master when reading data from binary log: ‘Could not find first log file name in binary log index file’

This error occurs when a binary log that the slave requires for replication no longer exists on the master database server. In one typical scenario, the slave server was stopped for some reason for a few hours or days, and when you resume replication it fails with the above error.

When you investigate, you will find that the binary logs the slave server needs to pull in order to synchronize data no longer exist on the master server. Possible reasons for this: the master server expired the binary logs via the system variable expire_logs_days, someone manually deleted binary logs from the master via the PURGE BINARY LOGS command or via 'rm -f', or perhaps a cronjob archives older binary logs to reclaim disk space, etc. So, make sure the required binary logs always exist on the master server; you can adjust your procedures to keep the binary logs the slave server requires by monitoring the "Relay_Master_Log_File" field of SHOW SLAVE STATUS output. Moreover, if you have set expire_logs_days in my.cnf, old binlogs expire automatically and are removed: when MySQL opens a new binlog file, it checks the older binlogs and purges any that are older than the value of expire_logs_days (in days). Percona Server added a feature to expire logs based on the total number of binlog files rather than their age, so in that configuration a spike of traffic could cause binlogs to disappear sooner than you expect. For more information check Restricting the number of binlog files.
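A quick check to see whether the binary log the slave needs still exists on the master (a minimal sketch):

-- On the slave: which master binlog file is replication reading?
mysql> SHOW SLAVE STATUS\G
-- Note the Master_Log_File / Relay_Master_Log_File values

-- On the master: is that file still listed?
mysql> SHOW BINARY LOGS;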

In order to resolve this problem, the only clean solution I can think of is to re-create the slave server from a master server backup or from another slave in the replication topology.

– Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000525' at 175770780, the last event read from '/data/mysql/repl/mysql-bin.000525' at 175770780, the last byte read from '/data/mysql/repl/mysql-bin.000525' at 175771648.'

Usually, this is caused by sync_binlog != 1 on the master server, which means binary log events may not be synchronized to disk. There might be a committed SQL statement or row change (depending on your replication format) on the master that did not make it to the slave because the event was truncated. The solution is to move the slave thread to the next available binary log and initialize it at the first available position, as below:

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000526', MASTER_LOG_POS=4;

– [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position; the first event 'mysql-bin.010711' at 55212580, the last event read from '/var/lib/mysql/log/mysql-bin.000711' at 4, the last byte read from '/var/lib/mysql/log/mysql-bin.010711' at 4.', Error_code: 1236

Here the master server most likely crashed or rebooted, and hence the binary log events were not synchronized to disk. This usually happens when sync_binlog != 1 on the master. You can investigate by inspecting the binary log contents, as below:

$ mysqlbinlog --base64-output=decode-rows --verbose --verbose --start-position=55212580 mysql-bin.010711

You will find that the requested position is at or past the end of the binary log file. This issue can usually be fixed by moving the slave to the next binary log. In this case it would be:

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000712', MASTER_LOG_POS=4;

This will resume replication.

To avoid corrupted binlogs on the master, enabling sync_binlog=1 on the master helps in most cases. sync_binlog=1 synchronizes the binary log to disk after every commit; it makes MySQL perform an fsync on the binary log in addition to the fsync performed by InnoDB. As a reminder, it has a cost: the write to the binary log is synchronized to disk after every commit. On the other hand, the overhead of sync_binlog=1 can be minimal or negligible if the disk subsystem is SSD along with a battery-backed cache (BBU). You can read more about this here in the manual.

sync_binlog is a dynamic option that you can enable on the fly. Here’s how:

mysql-master> SET GLOBAL sync_binlog=1;

To make the change persistent across reboots, you can add this parameter to my.cnf, for example:
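[mysqld]
sync_binlog = 1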

As a side note, along with fixing replication, it is always a good idea to make sure your replica is in sync with the master and to validate the data between master and slaves. Fortunately, Percona Toolkit has tools for this purpose: pt-table-checksum and pt-table-sync. Before checking for replication consistency, be sure the replication environment is healthy; then, later, sync any differences.


Jul
25
2014
--

Monitoring MySQL flow control in Percona XtraDB Cluster 5.6

Monitoring flow control in a Galera cluster is very important. If you do not, you will not understand why writes may sometimes be stalled. Percona XtraDB Cluster 5.6 provides 2 status variables for such monitoring: wsrep_flow_control_paused and wsrep_flow_control_paused_ns. Which one should you use?

What is flow control?

Flow control does not exist in regular MySQL replication; it is specific to Galera replication. It is simply the mechanism nodes use when they are not able to keep up with the write load: to keep replication synchronous, the node that is starting to lag instructs the other nodes to pause writes for some time, so it does not fall too far behind.

If you are not familiar with this notion, you should read this blogpost.

Triggering flow control and graphing it

For this test, we’ll use a 3-node Percona XtraDB Cluster 5.6 cluster. On node 3, we will adjust gcs.fc_limit so that flow control is triggered very quickly and then we will lock the node:

pxc3> set global wsrep_provider_options="gcs.fc_limit=1";
pxc3> flush tables with read lock;

Now we will use sysbench to insert rows on node 1:

$ sysbench --test=oltp --oltp-table-size=50000 --mysql-user=root --mysql-socket=/tmp/pxc1.sock prepare

Because of flow control, writes will be stalled and sysbench will hang. So after some time, we will release the lock on node 3:

pxc3> unlock tables;

During the whole process, wsrep_flow_control_paused and wsrep_flow_control_paused_ns are recorded every second with mysqladmin ext -i1, as sketched below. We can then build a graph of the evolution of both variables (shown after the sketch):
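A minimal capture sketch (adjust credentials as needed; the grep pattern matches both counters):

$ mysqladmin -u root -p ext -i1 | grep wsrep_flow_control_paused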

[Graph: evolution of wsrep_flow_control_paused and wsrep_flow_control_paused_ns on node pxc3]

While both graphs clearly show when flow control was triggered, it is much easier to tell when flow control stopped from wsrep_flow_control_paused_ns. The difference would be even more obvious if there had been several timeframes during which flow control was in effect.

Conclusion

Monitoring a server is obviously necessary if you want to be able to catch issues. But you need to look at the right metrics. So don't be scared if you see that wsrep_flow_control_paused is not 0: it simply means that flow control has been triggered at some point since the server started up. If you want to know what is happening right now, prefer wsrep_flow_control_paused_ns.


Mar
19
2014
--

How to log slow queries on Slave in MySQL 5.0 with pt-query-digest

Working as a Percona Support engineer, I see lots of issues related to MySQL replication every day. One very common issue is slave lag. There are many reasons for slave lag, and a common one is that queries take more time on the slave than on the master. How do you check for and log those long-running queries? MySQL 5.1 introduced the log-slow-slave-statements variable, which you can enable on the slave to log slow queries. But what if you want to log slow queries on a slave in an earlier version like MySQL 5.0? There is a good solution/workaround: pt-query-digest. How? Let's take a look….

If you want to log all queries that are running on the slave (including those run by the SQL thread), you can use pt-query-digest with the --processlist and --print (in pt-query-digest 2.1.9) or --output (in pt-query-digest 2.2.7) options and log all queries to a specific file. I have tested it in my local environment and it works.

You can start pt-query-digest on the slave like below:

nil@Dell:~$ /percona-toolkit-2.1.9/bin/pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --print --no-report
OR
nil@Dell:-$ /percona-toolkit-2.2.7/bin/pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --no-report --output=slowlog

Run some long-running queries on the master:

nil@Dell:~$ mysql -umsandbox -p --socket=/tmp/mysql_sandbox34497.sock
Enter password:
mysql> use percona
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> delete from test limit 5000000;
Query OK, 5000000 rows affected (1 min 54.33 sec)
mysql> delete from test limit 5000000;
Query OK, 5000000 rows affected (1 min 56.42 sec)

mysql>

and you'll see the output on the slave like this:

nil@Dell:~/Downloads/percona-toolkit-2.1.9/bin$ ./pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --print --no-report
# Time: 2014-03-18T12:10:57
# User@Host: system user[system user] @ []
# Query_time: 114.000000 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
use percona;
delete from test limit 5000000;
nil@Dell:~/Downloads/percona-toolkit-2.2.7/bin$ pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --no-report --output=slowlog
# Time: 2014-03-18T12:21:05
# User@Host: system user[system user] @ []
# Query_time: 117.000000 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
use percona;
delete from test limit 5000000;

You can also run pt-query-digest in the background, like a daemon, sending the output to a specific file such as slow.log for later review:

i.e. /percona-toolkit-2.1.9/bin/pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --print --no-report > slow.log 2>&1

OR

i.e. /percona-toolkit-2.2.7/bin/pt-query-digest --processlist u=msandbox,p=msandbox,S=/tmp/mysql_sandbox34498.sock --no-report --output=slowlog > slow.log 2>&1

Here, the default output will look just like the slow query log. If we have master-master replication, where every master is also a slave, and we want to log only the statements executed by the SQL thread, the --filter option can be used like this:

pt-query-digest --filter '$event->{user} eq "system user"' --no-report --output=slowlog

Since pt-query-digest --processlist polls 10 times per second (see the --interval option), it's not reliable for collecting complete query logs, because quick queries could fall between the polling intervals. In any case, it won't measure query time with precision better than 1/10th of a second. But if the goal is to identify very long-running queries, it should be adequate.


Feb
18
2013
--

Percona MySQL University comes to Toronto on March 22

Percona CEO Peter Zaitsev leads a track at the inaugural Percona MySQL University event in Raleigh, N.C. on Jan. 29, 2013.

Following our events in Raleigh, Montevideo, and Buenos Aires, Percona MySQL University comes to Toronto on March 22nd.

This is going to be our densest event yet, absolutely packed with information. Even though we have just one track, we have 12 talks and 11 speakers. We had a unique opportunity this time: Percona's Consulting, Support, RemoteDBA, and Training teams are holding an internal meeting at the start of the week, so many speakers were available.

MySQL high availability, replication, and clustering in particular are getting a lot of coverage.

Special thanks to FreshBooks, who kindly agreed to host this event at their office.

The event is FREE but space is limited, so reserve your spot by registering now!

More info
What is Percona MySQL University? It’s a series of one-day, free events designed to educate and inform developers and system architects on the latest and greatest MySQL products, services and technologies. The practical knowledge you’ll receive will help you be more successful in tackling your own MySQL challenges. Percona MySQL University is an opportunity to learn from, and connect with, some of the world’s top experts in MySQL performance and scaling. It’s also a fantastic networking opportunity among your MySQL peers.

