Feb
26
2015
--

Worrying about the ‘InnoDB: detected cycle in LRU for buffer pool (…)’ message?

If you use Percona Server 5.5 and you have configured it to use multiple buffer pool instances then sooner or later you’ll see the following lines in the server’s error log and chances are you’ll be worried about them:

InnoDB: detected cycle in LRU for buffer pool 5, skipping to next buffer pool.
InnoDB: detected cycle in LRU for buffer pool 3, skipping to next buffer pool.
InnoDB: detected cycle in LRU for buffer pool 7, skipping to next buffer pool.

Worry not as this is mostly harmless. It’s becoming a February tradition for me (Fernando) to face a question about this subject (ok, it’s maybe a coincidence) and this time I’ve teamed up with my dear colleague and software engineer George Lorch to provide you the most complete blog post ever published on this topic (with a belated thank you! to Ernie Souhrada, with whom I also discussed this same matter one year ago).

InnoDB internals: what is “LRU” ?

There’s a short and to-the-point section of the MySQL manual that explains in a clear way what the InnoDB buffer pool is, how it operates and why it plays such an important role in MySQL performance. If you’re interested in understanding InnoDB internals then that page is a good start. In this section we’ll restrict ourselves to explaining what the “LRU” that shows up in our subject message is, so we’ll only slightly dig into InnoDB internals, enough to provide some context. Here’s a quick introduction to the buffer pool, quoting from the above manual page:

InnoDB maintains a storage area called the buffer pool for caching data and indexes in memory. (…) Ideally, you set the size of the buffer pool to as large a value as practical, leaving enough memory for other processes on the server to run without excessive paging. The larger the buffer pool, the more InnoDB acts like an in-memory database, reading data from disk once and then accessing the data from memory during subsequent reads.

In practice, however, we can rarely fit our whole dataset inside the InnoDB buffer pool so there must be a process to manage this limited pool of memory pages:

InnoDB manages the pool as a list, using a variation of the least recently used (LRU) algorithm. When room is needed to add a new block to the pool, InnoDB evicts the least recently used block and adds the new block to the middle of the list.

There you go: InnoDB employs a variation of the Least Recently Used algorithm called the midpoint insertion strategy to manage the pages within the buffer pool. We should mention it does make exceptions, such as during a full table scan, when it knows the loaded pages might end up being read only a single time.
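
As a side note, the position of that midpoint and how long a page has to stay in the “old” part of the list before being promoted to the “new” one are exposed through the innodb_old_blocks_pct and innodb_old_blocks_time variables; a quick way to check what your server is running with (the values are simply whatever your version and configuration provide):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_old_blocks%';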

Dumping and reloading the buffer pool

Before we can get to the main point of this article let’s first examine why you would want to dump the buffer pool to disk, which is at the core of the matter here: that’s when those warning messages we’re discussing may appear.

When you start a MySQL server the buffer pool is empty by default. Performance is at its worst at this point because no data can be found in memory, so in practice each request for data results in an I/O operation to retrieve it from disk and bring it to memory. With time the buffer pool gets filled and performance improves – more and more data can now be found in memory. With yet more time we reach a peak performance state: the buffer pool is not only full but filled with the most popular data. The time between the start of the server and reaching this optimum state of the buffer pool is called server warm up. How long it takes depends mostly on two things: the size of the buffer pool and the level of activity of the server – the less busy it is the fewer requests it will get and thus the more time is needed until the popular data is fully loaded.
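
A rough way of following the warm up progress is to watch the buffer pool status counters: Innodb_buffer_pool_read_requests counts logical read requests while Innodb_buffer_pool_reads counts those that could not be served from memory and had to go to disk, so the ratio between the two should keep improving as the pool warms:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';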

Now, there could be a shortcut: what if we save the buffer pool to a disk file before we stop MySQL? We could later use it to reload the buffer pool to an optimum state when we restart the server, thus decreasing the warm up period dramatically.

Percona was a pioneer in this field compared to other MySQL distributions and implemented this functionality in Percona Server 5.5. Later on, MySQL 5.6 was released with a similar functionality which also allowed preloading the buffer pool for a faster warm up. Percona Server 5.6 incorporates this upstream feature, effectively replacing its own implementation. However, while in Percona Server 5.5 we can dump the buffer pool periodically, in MySQL and Percona Server 5.6 it is only dumped at shutdown or on request.
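
For reference, here’s a minimal sketch of how each implementation is enabled (option names taken from the respective manuals; double-check them against the exact version you run). In MySQL and Percona Server 5.6 you would put the following in my.cnf to dump at shutdown and reload at startup:

innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup = ON

and you can request an on-demand dump at any time with:

mysql> SET GLOBAL innodb_buffer_pool_dump_now = ON;

In Percona Server 5.5 the equivalent feature is driven by a single variable whose value, in seconds, also defines the periodic dump interval mentioned above – for example, to dump every 500 seconds and restore at startup:

innodb_buffer_pool_restore_at_startup = 500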

“Detected cycle in LRU”

In the section above we introduced a functionality that allows dumping a fingerprint of the buffer pool to disk so we can later reload it at server restart (note that even though the buffer pool might be very large the fingerprint will be small enough to make this practical). What we didn’t mention is that this is most useful outside of maintenance time and planned shutdowns – that is, when the server crashes. When a crash happens it’s all the more important to bring the server back to a warmed up state soon, so it can resume serving data fast enough. And given that we cannot predict a crash, the only way we can arrange to have the latest buffer pool on disk is by flushing it often.

While the buffer pool is divided into pages for efficiency of high-volume read operations, it is implemented as a linked list of pages for efficiency of cache management. During the process of dumping the buffer pool to disk a mutex is acquired on the LRU list. However, this mutex is not held for the duration of the process – it is periodically released to prevent stalling the system. The problem is: in between the release of the mutex and the moment it is acquired again, the list may get reshuffled. Since the dump keeps a pointer to its position across the mutex boundary, the dump can get put into some artificial cycling.

Let’s consider a linked list:

A > B > C > D > E

where each letter corresponds to a memory page. Now let’s say the initial dump was partially taken and covered the first three pages, “A > B > C”, placing a pointer on “C” before releasing the mutex. Once the mutex is reacquired the list has been reshuffled: “A > C > B > D > E”. The resulting junction of the partial list we have already copied and the reshuffled list now includes a loop, which would result in a cycle: “(A > B > C) > B > D > E”. When the dumping process detects a cycle in the LRU list it stops copying from that buffer pool, throws in a warning message, and moves on to the next buffer pool instance – otherwise it would keep dumping in an infinite loop.

How harmless are those messages ?

It is fairly harmless except for the fact that you will only have a partial LRU list dump for that buffer pool instance – that is, until the next dump occurs. If the server crashes or is shut down before the next dump takes place the existing one won’t be totally up to date for the server warm up to complete – it will still be used and will still provide a partially filled, somewhat “warm” buffer pool, just not as optimal as it could have been if the last dump had been taken fully.

The post Worrying about the ‘InnoDB: detected cycle in LRU for buffer pool (…)’ message? appeared first on MySQL Performance Blog.

Feb
20
2015
--

How to setup a PXC cluster with GTIDs (and have async slaves replicating from it!)

This past week was marked by a series of personal findings related to the use of Global Transaction IDs (GTIDs) on Galera-based clusters such as Percona XtraDB Cluster (PXC). The main one is the fact that transactions touching MyISAM tables (and FLUSH PRIVILEGES!) issued on a given node of the cluster are recorded in a GTID set bearing the node’s server_uuid as “source id” and added to the binary log (if the node has binlog enabled), thus being replicated to any async replicas connected to it. However, they won’t be replicated across the cluster (all of this is by design, as long as wsrep_replicate_myisam is disabled, which it is by default).

My colleague Stéphane covered this story in one of his recent blog posts titled, “Percona XtraDB Cluster 5.6: a tale of 2 GTIDs,” explaining how those local (in reference to the node) transactions are saved in a different GTID set to the cluster’s main one and the impact this may cause when re-pointing an async slave to replicate from a different node.

GTIDs are a feature introduced in MySQL 5.6 that made replication management much easier, and considering there’s a series of advantages in having an async replica attached to a PXC cluster, why hasn’t this popped up earlier for either of us? I guess there aren’t so many people using GTIDs with Galera-based clusters around yet, so here’s a post to show you how to do it.

Initializing a PXC cluster configured with GTIDs

My testing environment for a 3-node cluster is composed of node1 (192.168.70.2), node2 (.3) and node3 (.4). All of them have the same PXC binaries installed:

$ rpm -qa |grep -i percona-xtradb-cluster
Percona-XtraDB-Cluster-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-galera-3-3.8-1.3390.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-client-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.21-25.8.938.el6.x86_64

and are configured with an almost identical /etc/my.cnf (apart from wsrep_node_address and the prompt line):

[mysql]
prompt='mysql {node1} > '
[mysqld]
datadir = /var/lib/mysql
log_warnings=2
server_id=1
log_bin=percona-bin
log_slave_updates
binlog_format = ROW
enforce_gtid_consistency=1
gtid_mode=on
wsrep_cluster_name = my-three-node-cluster
wsrep_cluster_address = gcomm://192.168.70.2,192.168.70.3,192.168.70.4
wsrep_node_address = 192.168.70.2
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sst:secret
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2

server_id, log_bin, log_slave_updates and binlog_format are not needed for the cluster to operate but they are required to set up regular replication later on, so I’ve added those to leave the nodes ready to operate as masters.

We start with an empty, nonexistent datadir on all nodes, so I’ll use the mysql_install_db script to create a base datadir with all that is needed for MySQL to work on node1, which will be the reference node of the cluster:

[node1]$  mysql_install_db --user=mysql

We’re now ready to bootstrap the cluster from this reference node:

[node1]$ service mysql bootstrap-pxc

With that, we have an operational reference node:

mysql [node1] > select @@server_uuid;
+--------------------------------------+
| @@server_uuid                        |
+--------------------------------------+
| 834bca7f-b45e-11e4-a7b5-0800272e951c |
+--------------------------------------+
1 row in set (0.00 sec)


Note from the my.cnf settings above that I’ve chosen xtrabackup-v2 as the State Snapshot Transfer (SST) method, which requires authentication (wsrep_sst_auth). For this reason, if we now try to start MySQL on node2 it will fail, with its error log showing:

2015-02-14 16:58:26 24149 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --addre
ss '192.168.70.3' --auth 'sst:secret' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '24149' --binlog 'percona-bin' : 1 (Operation not permitted)
2015-02-14 16:58:26 24149 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-02-14 16:58:26 24149 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2015-02-14 16:58:26 24149 [ERROR] Aborting


The problem here is that the SST method requires MySQL authentication: even though the credentials are shared in the wsrep_sst_auth variable of each node’s my.cnf configuration file, this only tells xtrabackup-v2 which ones to use – it doesn’t actually configure MySQL with them. That’s a step left for us:

mysql [node1] > GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sst'@'localhost' IDENTIFIED BY 'secret';
Query OK, 0 rows affected (0.02 sec)


And this constitutes our very first committed transaction, which goes into the cluster’s GTID set:

mysql [node1] > select @@global.gtid_executed;
+----------------------------------------+
| @@global.gtid_executed                 |
+----------------------------------------+
| 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1 |
+----------------------------------------+
1 row in set (0.00 sec)


This GRANT only needs to be issued once, in the reference node. Now you start MySQL on node2 and node3: they will use XtraBackup to make a backup of node1’s dataset, which will contain this GRANT, and restore it in their own datadir:

[node2]$ service mysql start


and:

[node3]$ service mysql start

OK, that’s done. But how do I attach an async replica to the cluster?

Just to make it clear, you cannot attach an async replica to the cluster: you need to choose a node that is a member of the cluster and make it the master for the async replica. Considering all nodes should have the same data replicated it should be easy to change the async replica’s configuration and make it replicate from any other node of the cluster – and it is, though you may need to do some adjustments (more on this later).

The initial procedure is exactly the same one used with regular replication: you start by taking a backup of the master and restoring it on the replica. We’ll use XtraBackup again to perform the backup and we’ll start by having node2 as the master of our async replica (192.168.70.7). We could stream the backup from node2 directly to the async replica and later “prepare” it (by applying the logs, which needs to be done using the same version of Percona XtraBackup that you’ve used to take the backup), but to make things simple we’ll first take the backup on node2:

[node2]$ innobackupex /tmp
(...)
innobackupex: Using server version 5.6.21-70.1-56-log
innobackupex: Created backup directory /tmp/2015-02-14_17-53-22
(...)
150214 17:53:26 innobackupex: completed OK!

then “prepare” it:

[node2]$ innobackupex --apply-log /tmp/2015-02-14_17-53-22
(...)
150214 17:56:10 innobackupex: Starting the apply-log operation
(...)
150214 17:56:14 innobackupex: completed OK!

and from our async replica we’ll copy the backup from node2 using rsync over ssh (you could use scp or any other method to copy the files from one server to the other):

[replica]$ rsync -av -e 'ssh -l root' 192.168.70.3:/tmp/2015-02-14_17-53-22/ /var/lib/mysql

We now need to change the ownership of those files to the ‘mysql‘ user:

[replica]$ chown mysql:mysql -R /var/lib/mysql

and take note of the “replication coordinates” – but related to GTID. We have those in the xtrabackup_info file:

[replica]$ cat /var/lib/mysql/xtrabackup_info
(...)
binlog_pos = GTID of the last change '7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1'
(...)

In our case we took the backup very early in this database’s lifecycle (as can be seen by the GTID number in there) but in practice that’s seldom the case. We can now start MySQL:

[replica]$ service mysql start

Before we can point our async replica to replicate from node2 we need to create a replication user with the right set of privileges there:

mysql [node2] > GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO repl@'192.168.70.%' IDENTIFIED BY 'p4ssword';
Query OK, 0 rows affected (0.06 sec)

Now we can use CHANGE MASTER TO on our async replica to point it to node2 using the above credentials:

mysql [replica] > CHANGE MASTER TO MASTER_HOST='192.168.70.3', MASTER_USER='repl', MASTER_PASSWORD='p4ssword', MASTER_AUTO_POSITION = 1;
Query OK, 0 rows affected, 2 warnings (0.05 sec)

If we START SLAVE now we’ll run into error 1236: we first need to set the replication coordinates from when the backup was taken. With GTID replication this is done in a different manner: instead of providing replication coordinates in the CHANGE MASTER TO command (where we’ve used MASTER_AUTO_POSITION=1 instead) we do it by redefining the gtid_purged global variable with the GTID sets we got from the xtrabackup_info file (in this example there’s only one set):

mysql [replica] > SET @@global.gtid_purged='7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1';
Query OK, 0 rows affected (0.02 sec)

We can now START SLAVE and check its status:

mysql [replica] > START SLAVE; SHOW SLAVE STATUS\G
Query OK, 0 rows affected (0.00 sec)
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.70.3
Master_User: repl
(...)
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
(...)
Master_Server_Id: 2
Master_UUID: 8a157d9c-b465-11e4-aafa-0800272e951c
(...)
Retrieved_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:2
Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-2
Auto_Position: 1
1 row in set (0.00 sec)

Note that Executed_Gtid_Set above is showing a second transaction in the cluster’s GTID set: this is related to the GRANT statement we issued on node2 to set up the replication account and it means it is now configured on all nodes of the cluster (and on this async replica as well). With that in place we can easily point our async replica to replicate from a different node, such as node3:

mysql [replica] > STOP SLAVE; CHANGE MASTER TO MASTER_HOST='192.168.70.4'; START SLAVE;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)

Let’s verify this is indeed the case:

mysql [replica] > SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.70.4
Master_User: repl
(...)
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
(...)
Master_Server_Id: 3
Master_UUID: aa9acb85-b465-11e4-ab09-0800272e951c
(...)
Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-2
Auto_Position: 1

Nice! What about the caveats you were talking about in the other blog post?

The problem arises when you start to have other GTID sets in the gtid_executed variable of the nodes. This happens when a DML transaction involving a MyISAM table is issued on a node while wsrep_replicate_myisam is disabled: instead of that transaction going to the cluster’s main GTID set it will go to a new one, bearing the node’s server_uuid as source id. In fact, that’s the behavior you find in regular MySQL replication configured with GTIDs. Here’s an example:

mysql [node1] > CREATE TABLE test.fernando1 (id int) ENGINE=MYISAM;
Query OK, 0 rows affected (0.03 sec)

Creating a MyISAM table per se is not an issue, because this is a DDL, so the transaction will go to the cluster’s GTID set and be replicated to the other nodes:

mysql [node1] > select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3
1 row in set (0.00 sec)

But doing an INSERT in such a table is a DML so the transaction will go to a different GTID set:

mysql [node1] > select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
834bca7f-b45e-11e4-a7b5-0800272e951c:1
1 row in set (0.00 sec)

Now if we move our async replica to node1 it might just work:

mysql [replica] > STOP SLAVE; CHANGE MASTER TO MASTER_HOST='192.168.70.2'; START SLAVE;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.02 sec)
Query OK, 0 rows affected (0.01 sec)
mysql [replica] > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.70.2
(...)
Retrieved_Gtid_Set: 834bca7f-b45e-11e4-a7b5-0800272e951c:1
Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
834bca7f-b45e-11e4-a7b5-0800272e951c:1
Auto_Position: 1
1 row in set (0.00 sec)

and that is because transaction ‘834bca7f-b45e-11e4-a7b5-0800272e951c:1‘ could still be found in the binary logs of node1. Let’s do something slightly different now, but on node2:

mysql [node2] > INSERT INTO test.fernando1 values (2);
Query OK, 1 row affected (0.01 sec)
mysql [node2] > select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
8a157d9c-b465-11e4-aafa-0800272e951c:1
1 row in set (0.00 sec)
mysql [node2] > SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: percona-bin.000008
Position: 923
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
8a157d9c-b465-11e4-aafa-0800272e951c:1
1 row in set (0.00 sec)
mysql [node2] > FLUSH LOGS;
Query OK, 0 rows affected (0.01 sec)
mysql [node2] > SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: percona-bin.000009
Position: 231
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
8a157d9c-b465-11e4-aafa-0800272e951c:1
1 row in set (0.00 sec)
mysql [node2] > PURGE BINARY LOGS TO 'percona-bin.000009';
Query OK, 0 rows affected (0.01 sec)

Transaction ‘8a157d9c-b465-11e4-aafa-0800272e951c:1‘, which contains the INSERT statement we issued from node2, was recorded into binary log file percona-bin.000008, but the subsequent PURGE BINARY LOGS TO command deleted that file. Now, if we point our async replica back to node2 we’ll run into a problem:

mysql [replica] > STOP SLAVE; CHANGE MASTER TO MASTER_HOST='192.168.70.3'; START SLAVE;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
mysql [replica] > show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.70.3
(...)
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
(...)
Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
834bca7f-b45e-11e4-a7b5-0800272e951c:1
Auto_Position: 1
1 row in set (0.00 sec)

The master can no longer provide transaction ‘8a157d9c-b465-11e4-aafa-0800272e951c:1‘ to the replica as requested, thus breaking replication.

Ouch! Is there a fix for this?

Well, kind of. You can trick MySQL into believing it has processed this transaction by injecting an empty transaction bearing the same GTID as the missing one:

mysql [replica] > SELECT GTID_SUBTRACT('7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,8a157d9c-b465-11e4-aafa-0800272e951c:1', '7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,834bca7f-b45e-11e4-a7b5-0800272e951c:1')\G
*************************** 1. row ***************************
GTID_SUBTRACT('7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,8a157d9c-b465-11e4-aafa-0800272e951c:1', '7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,834bca7f-b45e-11e4-a7b5-0800272e951c:1'): 8a157d9c-b465-11e4-aafa-0800272e951c:1
1 row in set (0.00 sec)
mysql [replica] > SET GTID_NEXT='8a157d9c-b465-11e4-aafa-0800272e951c:1';
Query OK, 0 rows affected (0.00 sec)
mysql [replica] > BEGIN;COMMIT;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql [replica] > SET gtid_next = 'AUTOMATIC';
Query OK, 0 rows affected (0.00 sec)
mysql [replica] > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.70.3
(...)
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
(...)
Executed_Gtid_Set: 7cf02e4b-4ba1-ee1b-4599-0c821ea92393:1-3,
834bca7f-b45e-11e4-a7b5-0800272e951c:1,
8a157d9c-b465-11e4-aafa-0800272e951c:1
Auto_Position: 1
1 row in set (0.00 sec)

The problem is that you’ll end up with an inconsistent dataset – injecting an empty transaction is the same as using SQL_SLAVE_SKIP_COUNTER on regular MySQL replication.
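
If you do resort to this trick it’s wise to verify afterwards how far the replica has actually diverged. A sketch of one way of doing it with pt-table-checksum, run from the node the replica is attached to (host and credentials below are hypothetical, and you should review the tool’s documentation and its caveats with Galera and row-based replication before using it on your own servers):

[node2]$ pt-table-checksum --replicate=percona.checksums h=localhost,u=checksum,p=checksumpwd

Any table reported with differences can then be examined and fixed (with pt-table-sync, or by rebuilding the replica).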

Take-home lesson

It is not complicated to configure a PXC cluster to use GTIDs; in fact you may prefer to do so once you get used to it. And it makes it all much easier to have an asynchronous slave replicating from one of the nodes, even moving it around to replicate from a different node. It should all go well while the nodes record their transactions in the cluster’s main GTID set, which is shared by all nodes: this implies all of them will have the same transactions in their binlogs, as designed. The problem appears when we start seeing a deviation from this pattern, with a few nodes recording local transactions in their “own” GTID set. This won’t bother the cluster operation per se, as these other GTID sets are simply ignored and won’t be replicated to other nodes, but they may complicate things for attached async replicas if you need to point them to a different node.

In such environments, strive to maintain a unified GTID set around the cluster’s main one, and if you find out that one of the nodes has started adding transactions to a different set, investigate it. You shouldn’t be using MyISAM tables on an XtraDB Cluster as those aren’t officially supported, but if you must use them you should probably consider enabling wsrep_replicate_myisam (see the sketch below). Operations on mysql system tables should be done through the use of DDLs instead of DMLs (like GRANT instead of INSERT) and, above all, keep distance from this bug.
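
For completeness, here’s what enabling it could look like – a sketch assuming PXC 5.6, where wsrep_replicate_myisam is a dynamic variable (it is still considered experimental, so test it before relying on it in production):

mysql [node1] > SET GLOBAL wsrep_replicate_myisam = ON;

or, to make it permanent, in the [mysqld] section of each node’s my.cnf:

wsrep_replicate_myisam = ON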

The post How to setup a PXC cluster with GTIDs (and have async slaves replicating from it!) appeared first on MySQL Performance Blog.

Dec
08
2014
--

What happens when your application cannot open yet another connection to MySQL

Have you ever experienced a situation where one moment you can connect to the MySQL database and the next moment you cannot, only to be able to connect again a second later? As you may know, one cannot open an unlimited number of connections with MySQL. There’s a practical limit and more often than not it is imposed by the underlying operating system. If you’re getting:

ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.10' (99)

…there’s a good chance you’re hitting such a limit. What might be misleading in the information above is who (which side) is preventing the connection from being established.

Understanding the problem at hand

Whenever a client uses the network to connect to a service running on a given port of a server this connection is established through the creation of a socket:

A socket is a tuple of 4 terms: the source and destination IPs and ports.

The destination port is obviously the one where the service is running on the server. For instance usually port 22 for SSH and port 3306 for MySQL. The source port is an arbitrary local network port on the client side, which should show in tools like netstat and ss:

Note: I’ve used netstat and ss alternately in this post but if you read the man page for netstat these days you’ll see a note that says: “This program is obsolete. Replacement for netstat is ss”. I was advised to give preference to ss over netstat on a busy server: besides being faster and providing a richer set of information, ss puts less stress on the server as it talks directly to the kernel while what netstat does is scan /proc.

$ ss -na dport = :3306
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.0.100:48681 192.168.0.10:3306

The example above shows that server 192.168.0.100 has a TCP connection established with MySQL (3306) running on server 192.168.0.10 through local port 48681. The range of local ports that can be used for TCP and UDP traffic is found in /proc/sys/net/ipv4/ip_local_port_range:

$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768 61000

Those values from above are the default ones found in many Linux distributions: 32768 denotes the first local port that can be used and 61000 indicates the last one, for a total of 28233 available ports. It may look like a lot but it truly depends on the nature of the local applications connecting to services in other servers through the network.

When it comes to connecting to database servers, many applications choose to open a new connection for a single request only, closing it right after the request is processed. Even though the connection is closed by the client (application), the local port it was using is not immediately released by the OS to be reused by another connection: it will sit in a TIME_WAIT state for (usually) 60 seconds – this value cannot be easily changed as it is hard coded in the kernel:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                  * state, about 60 seconds    */
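
If you want to see how many connections are sitting in that state towards a given service at any moment, you can filter the output of ss, for example:

$ ss -na dport = :3306 | grep -c TIME-WAIT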

And contrary to what you may have heard or read, tuning /proc/sys/net/ipv4/tcp_fin_timeout is of no use here as it rules a different type of timeout and has no impact on releasing connections hanging in TIME_WAIT state. To better understand the role played by such a state I suggest you read Vincent Bernat’s post, from which I reproduce:

There are two purposes for the TIME-WAIT state:

The most known one is to prevent delayed segments from one connection being accepted by a later connection relying on the same quadruplet (source address, source port, destination address, destination port) (…)

The other purpose is to ensure the remote end has closed the connection. When the last ACK is lost, the remote end stays in the LAST-ACK state. Without the TIME-WAIT state, a connection could be reopened while the remote end still thinks the previous connection is valid. (…)

The problem with this situation is that if you keep accumulating connections in TIME_WAIT state you’ll quickly saturate the available local ports. And if all ports are taken then any attempt at a new connection will result in an error similar to the one above.

Reproducing the problem

It’s easy to verify this scenario: it suffices to decrease the local IP port range on the computer that’ll be starting the connections (usually the application server) to, for example, only half a dozen ports:

$ echo 61001 61006 > /proc/sys/net/ipv4/ip_local_port_range

Then we proceed with opening 6 connections from the application server (192.168.0.100) to the database server (192.168.0.10):

$ ss -na dport = :3306
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.0.100:61005 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61003 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61001 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61004 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61006 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61002 192.168.0.10:3306

Now, when we try to open a seventh connection to the database we’ll hit the error stated at the beginning of this post. What the error actually means is:

$ perror 99
OS error code 99: Cannot assign requested address

In fact, it should be complemented with “… because there are no available local network ports left“.

And even if we close all 6 MySQL connections right away they’ll all move from ESTABLISHED to TIME_WAIT state and we’ll still need to wait for them to expire until we can open a new connection:

$ ss -na dport = :3306
State Recv-Q Send-Q Local Address:Port Peer Address:Port
TIME-WAIT 0 0 192.168.0.100:61005 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61003 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61001 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61004 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61006 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61002 192.168.0.10:3306

That’s where scalability problems happen if your application server keeps opening more connections than it can have old ones released in time. For example, considering the default port range from 32768 to 61000 and a TIME_WAIT of 60 seconds, in theory we can only open and close around 470 new network connections each second ((61000 – 32768 + 1)/60 = 470.55) before we saturate the available local network ports; that’s not much …

Possible Solutions

This is by no means an exhaustive list but here’s a few possible approaches to consider. If you have something to add about those or happen to known about any other please let me know by leaving a comment below.

Increasing port range

If the server initiating the connections is operating with the default port range you can start by increasing it somewhat. The first 1023 ports are said to be privileged, meaning only root can start an application listening on one of these initial ports. At the other extreme, the highest port you can have assigned is 65535 (2^16-1). In practice, then, you can increase the port range to the maximum of 1024-65535, which would provide 64512 ports, allowing in theory around 1075 short-lived connections to be opened and closed per second:

$ sysctl -w net.ipv4.ip_local_port_range="1024 65535"

To make this change permanent and survive a server reboot you need to add the following line to /etc/sysctl.conf:

net.ipv4.ip_local_port_range=1024 65535
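
You can then have it loaded right away, without waiting for a reboot, with:

$ sysctl -p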

You should however pay attention when stretching these values to the limits.

Adding extra IP addresses and listening to multiple ports

Something that wasn’t clear to me at first is that the port range limitation applies per quadruplet (<source address>:<source port>, <destination address>:<destination port>). As such, if you have the port range set from 61001 to 61006 on the client you should be able to open no more than 6 MySQL connections to the same destination <address>:<port> pair – but at the same time also up to 6 SSH connections, 6 NC connections, etc:

tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61005 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61003 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61005 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61006 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61001 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61003 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61006 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61001 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.11:3306 ESTABLISHED   <-- here!
tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 ESTABLISHED

Note in the list above there’s more than one connection established from the same source <address>:<port> pair, though bound to different destination <address>:<port> pairs. There’s even a 7th connection to MySQL, though to one running on a different server (192.168.0.11:3306).

So, another way of increasing the amount of concurrent connections is by adding an additional IP address to either the outgoing side (client) or the incoming side (server) and making part of the connections happen through it. Likewise, even though you cannot make MySQL listen on multiple ports simultaneously, you can configure your firewall to accept connections on other ports and redirect them to port 3306 (or any other you’re using). In the example below I configure iptables to redirect all connections destined to port 80 to port 3306:

$ iptables -A PREROUTING -t nat -p tcp --dport 80 -j REDIRECT --to-port 3306
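
Clients then only need to target the alternate port; with the mysql command-line tool that would look something like this (the user name here is just an example):

$ mysql -h 192.168.0.10 -P 80 -u appuser -p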

Modifying the connection behavior of the application(s)

Expanding the number of possible quadruplets will certainly help in increasing the amount of possible concurrent connections but it won’t scale indefinitely. In the long term you may need to review your application’s connection behavior and set it up (or re-engineer it) in such a way as to avoid opening and closing many connections over the network too often – if that’s the problem after all. You may resort to some sort of connection pooling instead, but be sure to evaluate it well first.

Tweaking TCP parameter settings

Even though you cannot easily decrease the timeout for the TIME_WAIT state there are at least 3 different TCP parameters you can use to “bypass” this limitation. You should explore these options with caution though, as these settings could affect the reliability of TCP connections.

tcp_tw_reuse

tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6)
              Allow to reuse TIME_WAIT sockets for new connections when it
              is safe from protocol viewpoint.  It should not be changed
              without advice/request of technical experts.

It is possible to force the kernel to reuse connections hanging in TIME_WAIT state by setting /proc/sys/net/ipv4/tcp_tw_reuse to 1. What happens in practice is that you’ll keep seeing the closed connections hanging in TIME_WAIT until either they expire or a new connection is requested. In the latter case, the connection will be “revived”.
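
Enabling it follows the usual sysctl procedure (add the equivalent line to /etc/sysctl.conf if you want it to survive a reboot):

$ sysctl -w net.ipv4.tcp_tw_reuse=1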

Here’s an example from the previous scenario where I had limited the port range to only 6 ports. I enabled tcp_tw_reuse and opened 6 connections with MySQL, closing five of them soon afterwards:

$ netstat -tn|grep 3306
tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 TIME_WAIT

The TIME_WAIT was still in countdown mode (both ss and netstat have the option -o/--timers to display those) when I opened a new connection:

$ netstat -ton|grep 3306
tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 TIME_WAIT timewait (35.07/0/0)
tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 TIME_WAIT timewait (34.52/0/0)
tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED keepalive (3586.59/0/0)
tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 TIME_WAIT timewait (33.91/0/0)
tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 TIME_WAIT timewait (35.65/0/0)
tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 ESTABLISHED keepalive (7196.66/0/0)

Note the new connection was established in place of the one that had been hanging in TIME_WAIT state the longest (using local port 61003).

tcp_tw_recycle

tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4)
              Enable fast recycling of TIME_WAIT sockets.  Enabling this
              option is not recommended since this causes problems when
              working with NAT (Network Address Translation).

When you enable /proc/sys/net/ipv4/tcp_tw_recycle closed connections will not show under TIME_WAIT anymore – they disappear from netstat altogether. But as soon as you open a new connection (within the 60-second mark) it will recycle one of those. Still, everyone writing about this alternative seems to advise against its use. The bottom line is: it’s preferable to reuse a connection than to recycle it.

tcp_max_tw_buckets

tcp_max_tw_buckets (integer; default: see below; since Linux 2.4)
              The maximum number of sockets in TIME_WAIT state allowed in
              the system.  This limit exists only to prevent simple denial-
              of-service attacks.  The default value of NR_FILE*2 is
              adjusted depending on the memory in the system.  If this
              number is exceeded, the socket is closed and a warning is
              printed.

This parameter rules how many connections can remain in TIME_WAIT state concurrently: the kernel will simply kill connections hanging in such state above that number. For example, continuing with the previous scenario where I had configured the client server with a port range composed of only 6 ports, if I set /proc/sys/net/ipv4/tcp_max_tw_buckets to 5, then open 6 concurrent connections with MySQL and immediately close all 6, I’ll find only 5 of them hanging in TIME_WAIT state – as with tcp_tw_recycle, one of them will simply disappear from netstat. This situation allows me to immediately open a new connection without needing to wait for a minute. However, I won’t be able to open a second one until one of the other 5 connections in TIME_WAIT expires and frees the local port it was using. The secret here, then, is to find a compromise between the number of available network ports and the number of connections we allow to remain in TIME_WAIT state. The default value of this setting is 65536, which means by default the system allows all possible connections to go through the TIME_WAIT state when closed.
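
For the record, this is how I set it for the test just described – the value of 5 only makes sense with my artificially small port range and is not something to carry over to production:

$ sysctl -w net.ipv4.tcp_max_tw_buckets=5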

If “LAMP” server, use local socket

Finally, it’s important to mention that if your application is hosted on the same server as the database you may (should?) open connections using MySQL’s socket file directly, without going over the network, and thus avoid this port range/TIME_WAIT problem altogether. Interestingly, you can accomplish this in one of two ways: specifying the socket with the --socket option or using --host=localhost. As MySQL’s manual mentions:

On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file.
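
In other words, either of the following should bypass TCP entirely on such a server (the socket path is whatever your [mysqld] section defines; /var/lib/mysql/mysql.sock below is just an example, as is the user name):

$ mysql --socket=/var/lib/mysql/mysql.sock -u appuser -p
$ mysql -h localhost -u appuser -p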

Clarification

I prefer to overstate this problem in favor of avoiding any confusion: the error described in this post and all the situations surrounding it are to be found in the computer that starts the connections (usually the application server), and not in the server hosting the services. Whenever a client closes the connection, while it shows as hanging in TIME_WAIT state on the client there won’t be any remaining trace of the connection showing in netstat on the server side. However, if for some reason it is the server that closes the connection first, “it gets the TIME-WAIT state while the client will consider the corresponding quadruplet free and hence may reuse it for a new connection.”

The post What happens when your application cannot open yet another connection to MySQL appeared first on MySQL Performance Blog.

Oct
21
2014
--

Percona XtraDB Cluster: How to run a 2-node cluster on a single server

I reckon there’s little sense in running 2 or more Percona XtraDB Cluster (PXC) nodes on a single physical server other than for educational and testing purposes – but doing so is still useful in those cases. The most popular way of achieving this seems to be with server virtualization, such as making use of Vagrant boxes. But in the same way you can have multiple instances of MySQL running in parallel at the OS level in the form of concurrent mysqld processes, so too can you have multiple Percona XtraDB Cluster nodes. And the way to achieve this is precisely the same: using dedicated datadirs and different ports for each node.

 

Which ports?

Four TCP ports are used by Percona XtraDB Cluster:
  • the regular MySQL port (default 3306)
  • port for group (Galera) communication (default 4567)
  • port for State Transfer (default 4444)
  • port for Incremental State Transfer (default is: port for group communication (4567) + 1 = 4568)
Of course, when you have multiple instances on the same server default values won’t work for all of them, so we need to define new ports for the additional instances and make sure the local firewall is open to them, if there is one active (iptables, selinux, …).
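
If iptables is active, something along these lines should open the way for the second instance I configure further below – the ports listed are based on the choices made in this post (3307 for MySQL, 5020/5021 for group communication and IST, 4444 for SST) and will differ if you pick others:

$ iptables -I INPUT -p tcp -m multiport --dports 3307,4444,5020,5021 -j ACCEPT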


Installing Percona XtraDB Cluster, configuring and starting the first node

My test server was a fresh CentOS 6.5 configured with the Percona yum repository, from which I installed the latest Percona XtraDB Cluster (5.6.20-25.7.888.el6); note that you’ll need the EPEL repository as well to install socat, which is a dependency (see this bug). To avoid confusion, I’ve prevented the mysql service from starting automatically:

chkconfig --level 3 mysql off
chkconfig --del mysql


I could have installed PXC from the tarball but I decided to do it from the repositories to have all dependencies covered by yum. This is what my initial /etc/my.cnf looked like (note the use of default values):

[mysqld]
datadir = /var/lib/mysql
port=3306
socket=/var/lib/mysql/mysql-node1.sock
pid-file=/var/lib/mysql/mysql-node1.pid
log-error=/var/lib/mysql/mysql-node1.err
binlog_format=ROW
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_cluster_name = singlebox
wsrep_node_name = node1
wsrep_cluster_address=gcomm://

I started by manually bootstrapping the cluster with this single node using the command:

$ mysqld_safe --defaults-file=/etc/my.cnf --wsrep-new-cluster


You should then be able to access this node through the local socket:

$ mysql -S /var/lib/mysql/mysql-node1.sock

 

Configuring and starting the second node

Then I created a similar configuration file for the second instance, which I named /etc/my2.cnf, with the following modifications:

[mysqld]
datadir = /var/lib/mysql2
port=3307
socket=/var/lib/mysql2/mysql-node2.sock
pid-file=/var/lib/mysql2/mysql-node2.pid
log-error=/var/lib/mysql2/mysql-node2.err
binlog_format=ROW
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_cluster_name = singlebox
wsrep_node_name = node2
wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020
wsrep_provider_options = "base_port=5020;"


Note the use of base_port: by having it defined, port 5020 is used for group communication and 5021 (the one above it) is reserved for IST (it’s the same as using gmcast.listen_addr=tcp://127.0.0.1:5020, just simpler).

You need to create and set up the right permissions on the datadir of this second instance, otherwise MySQL won’t be able to create some files (like .pid and .err), though you don’t need to run the mysql_install_db script:

$ chown -R mysql:mysql /var/lib/mysql2

You can then start this second instance with the following command:

$ mysqld_safe --defaults-file=/etc/my2.cnf

While it starts, watch the log to observe how this second node starts, communicates with the primary node and joins the cluster. On a different terminal from the one you’ve started the instance on, execute:

$ tail -f /var/lib/mysql2/mysql-node2.err

Remember that at any time you can use mysqladmin to stop the nodes, you only need to provide the right socket as an argument, as follows:

$ mysqladmin -S /var/lib/mysql/mysql-node1.sock shutdown

Finally, once you have the whole cluster up you should edit the my.cnf of the first node with a complete wsrep_cluster_address, as shown in /etc/my2.cnf above.

 

Using mysqld_multi

My last blog post was on running multiple instances of MySQL with mysqld_multi. It applies here as well; the only exception is that you need to make sure to use “wsrep_cluster_address=gcomm://” in the first node whenever you bootstrap the cluster – and pay attention to start it before the other nodes.

The only advantage I see in using mysqld_multi is facilitating the management (start/stop) of the nodes and concentrating all configuration in a single my.cnf file. In any case, you shouldn’t be running a PXC cluster on a single box for any purpose other than educational.

 

Adding a second Percona XtraDB Cluster node to a production server

What if you have a production cluster composed of multiple physical servers and you want to add a second node to one of them? It works the same way – you’ll just need to use the server’s IP address when configuring it instead of the loopback network interface. Here’s an example of a PXC cluster composed initially of three nodes: 192.168.70.1, 192.168.70.2, and 192.168.70.3. I’ve added a 4th node running on the server that is already hosting the 3rd – the wsrep_cluster_address line looks as follows after the changes:

wsrep_cluster_address = gcomm://192.168.70.1,192.168.70.2,192.168.70.3:4567,192.168.70.3:5020

 

Additional resources

We have a documentation page on “How to setup 3 node cluster on single box” that contains more details of what I’ve covered above with a slightly different approach.

 

The post Percona XtraDB Cluster: How to run a 2-node cluster on a single server appeared first on MySQL Performance Blog.

Aug
29
2014
--

Galera data on Percona Cloud Tools (and other MySQL monitoring tools)

I was talking with a Percona Support customer earlier this week who was looking for Galera data on Percona Cloud Tools. (Percona Cloud Tools, now in free beta, is a hosted service providing access to query performance insights for all MySQL users.)

The customer mentioned they were already keeping track of some Galera stats on Cacti, and given they were inclined to use Percona Cloud Tools more and more, they wanted to know if it already supported Percona XtraDB Cluster. My answer was: “No, not yet: you can install agents on each node (the regular way on the first node, then manually on the other nodes… and when prompted say “No” to creating the MySQL user and provide the one you’re using already) and monitor them as autonomous MySQL servers – but the concept of cluster and especially the “Galera bits” have yet to be implemented there.”

Except I was wrong.

By “concept of cluster” I mean supporting the notion of group instances, which should allow a single cluster-wide view for metrics and query reports, such as the slow queries (which are recorded locally on the node where the query was run and thus observed in a detached way). This still needs to be implemented indeed, but it’s on the roadmap.

The “Galera bits” I mentioned above are the various “wsrep_” status variables. In fact, those refer to the Write Set REPlication patches (based on the wsrep API), a set of hooks applied to the InnoDB/XtraDB storage engine and other components of MySQL that modify the way replication works (to put it in a very simplified way), which in turn are used by the Galera library to provide a “generic Synchronous Multi-Master replication plugin for transactional applications.” A patched version of Percona Server together with the Galera library compose the base of Percona XtraDB Cluster.

As I found out only now, Percona Cloud Tools does collect data from the various wsrep_ variables and it is available for use – it’s just not shown by default. A user only needs to add a dashboard/chart manually on PCT to see these metrics:

[Screenshot: adding a chart with the wsrep_ metrics in Percona Cloud Tools]

Now, I need to call that customer …

Monitoring the cluster

Since I’m touching on this topic I thought it would be useful to include some additional information on monitoring a Galera cluster (Percona XtraDB Cluster in particular), even though most of what I mention below has already been published in different posts here on the MySQL Performance Blog. There’s a succinct documentation page bearing the same title as this section that indicates the main wsrep_ variables you should monitor to check the health status of the cluster and how well it’s coping with load over time (performance). Remember you can get a grasp of the full set of variables at any time by issuing the following command from one (or each one) of the nodes:

mysql> SHOW GLOBAL STATUS LIKE "wsrep_%";

And for a broader and real time view of the wsrep_ status variables you can use Jay Janssen’s myq_gadgets toolkit, which he modified a couple of years ago to account for Galera.

There’s also a specific Galera template available in our Percona Monitoring Plugins (PMP) package that you can use in your Cacti server. That would cover the “how well the cluster copes with load over time” part, providing historical graphing. And while there isn’t a Galera-specific plugin for Nagios in PMP, Jay explains in another post how you can customize pmp-check-mysql-status to “check any status variable you like,” describing his approach to keeping a cluster’s “health status” in check by setting alerts on specific stats, on a per node basis.

VividCortex recently added a set of templates for Galera in their product and you can also rely on Severalnines’ ClusterControl monitoring features to get that “global view” of your cluster that Percona Cloud Tools doesn’t yet provide. Even though ClusterControl is a complete solution for cluster deployment and management, focusing on the automation of the whole process, the monitoring part alone is particularly interesting as it encompasses cluster-wide information in a customized way, including the “Galera bits”. You may want to give it a try as the monitoring features are available in the Community version of the product (and if you’re a Percona customer with a Support contract covering Percona XtraDB Cluster, then you’re entitled to get support for it from us).

One thing I should note that differentiates these monitoring solutions is that while you can install Cacti, Nagios and ClusterControl as servers in your own infrastructure, both Percona Cloud Tools and VividCortex are hosted, cloud-based services. Having said that, neither one nor the other uploads sensitive data to the cloud and both provide options for query obfuscation.

Summary

Contrary to what I believed, Percona Cloud Tools does provide support for the “Galera bits” (the wsrep_ status variables), even though it has yet to implement support for the notion of group instances, which will allow for a cluster-wide view of metrics and query reports. You can also rely on the Galera template for Cacti provided by Percona Monitoring Plugins for historical graphing and some clever use of Nagios’ pmp-check-mysql-status for customized cluster alerts. VividCortex and ClusterControl also provide monitoring for Galera.

Percona Cloud Tools, now in free beta, is a hosted service providing access to query performance insights for all MySQL users. After a brief setup, unlock new information about your database and how to improve your applications. Sign up to request access to the beta today.

The post Galera data on Percona Cloud Tools (and other MySQL monitoring tools) appeared first on MySQL Performance Blog.

Aug
26
2014
--

mysqld_multi: How to run multiple instances of MySQL

The need to have multiple instances of MySQL (the well-known mysqld process) running in the same server concurrently in a transparent way, instead of having them executed in separate containers/virtual machines, is not very common. Yet from time to time the Percona Support team receives a request from a customer to assist in the configuration of such an environment. MySQL provides a tool to facilitate the execution of multiple instances called mysqld_multi:

“mysqld_multi is designed to manage several mysqld processes that listen for connections on different Unix socket files and TCP/IP ports. It can start or stop servers, or report their current status.”

For tests and development purposes, MySQL Sandbox might be more practical and I personally prefer to use it for my own tests. Both tools work around launching and managing multiple mysqld processes but Sandbox has, as the name suggests, a “sandbox” approach, making it easy to both create and dispose of a new instance (including all data inside it). It is more usual to see mysqld_multi being used in production servers: it’s provided with the server package and uses the same single configuration file that people are used to looking for when setting up MySQL. So, how does it work? How do we configure and manage the instances? And, just as importantly, how do we back up all the instances we create?

Understanding the concept of groups in my.cnf

You may have noticed already that MySQL’s main configuration file (or “option file“), my.cnf, is arranged under what are called group structures: sections defining configuration options specific to a given program or purpose. Usually the program itself gives its name to the group, which appears enclosed in brackets. Here’s a basic my.cnf showing three such groups:

[client]
port		= 3306
socket		= /var/run/mysqld/mysqld.sock
user            = john
password        = p455w0rd
[mysqld]
user		= mysql
pid-file	= /var/run/mysqld/mysqld.pid
socket		= /var/run/mysqld/mysqld.sock
port		= 3306
datadir		= /var/lib/mysql
[xtrabackup]
target_dir = /backups/mysql/

The options defined in the [client] group above are used by the mysql command-line tool. As such, if you don’t specify any other options when executing mysql it will attempt to connect to the local MySQL server through the socket in /var/run/mysqld/mysqld.sock and using the credentials stated in that group. Similarly, mysqld will look for the options defined under its section at startup, and the same happens with Percona XtraBackup when you run a backup with that tool. However, the operating parameters defined by the above groups may also be stated as command-line options during the execution of the program, in which case they replace the ones defined in my.cnf.

Getting started with multiple instances

To have multiple instances of MySQL running we must replace the [mysqld] group in the my.cnf configuration file by as many [mysqldN] groups as we want instances running, with “N” being a positive integer, also called the option group number. This number is used by mysqld_multi to identify each instance, so it must be unique across the server. Apart from the distinct group name, the same options that are valid for [mysqld] apply to [mysqldN] groups, the difference being that while stating them is optional for [mysqld] (it’s possible to start MySQL with an empty my.cnf as default values are used if not explicitly provided) some of them (like socket, port, pid-file, and datadir) are mandatory when defining multiple instances – so they don’t step on each other’s toes. Here’s a simple modified my.cnf showing the original [mysqld] group plus two other instances:

[mysqld]
user		= mysql
pid-file	= /var/run/mysqld/mysqld.pid
socket		= /var/run/mysqld/mysqld.sock
port		= 3306
datadir		= /var/lib/mysql
[mysqld1]
user		= mysql
pid-file	= /var/run/mysqld/mysqld1.pid
socket		= /var/run/mysqld/mysqld1.sock
port		= 3307
datadir		= /data/mysql/mysql1
[mysqld7]
user		= mysql
pid-file	= /var/run/mysqld/mysqld7.pid
socket		= /var/run/mysqld/mysqld7.sock
port		= 3308
datadir		= /data/mysql/mysql7

Besides using different pid files, ports and sockets for the new instances I’ve also defined a different datadir for each – it’s very important that the instances do not share the same datadir. Chances are you’re importing the data from a backup but if that’s not the case you can simply use mysql_install_db to create each additional datadir (but make sure the parent directory exists and that the mysql user has write access on it):

mysql_install_db --user=mysql --datadir=/data/mysql/mysql7

Note that if /data/mysql/mysql7 doesn’t exist and you start this instance anyway then mysqld_multi will call mysql_install_db itself to have the datadir created and the system tables installed inside it. As an alternative to restoring a backup or creating a new datadir, you can make a physical copy of the existing one from the main instance – just make sure to stop it first with a clean shutdown, so any pending changes are flushed to disk first.

Now, you may have noted I wrote above that you need to replace your original MySQL instance group ([mysqld]) with one carrying an option group number ([mysqldN]). That’s not entirely true, as they can co-exist in harmony. However, the usual start/stop script used to manage MySQL won’t work with the additional instances, nor does mysqld_multi really manage [mysqld]. The simple solution here is to have the group [mysqld] renamed with a suffix integer, say [mysqld0] (you don’t need to make any changes to its current options though), and let mysqld_multi manage all instances.

Two commands you might find useful when configuring multiple instances are:

$ mysqld_multi --example

…which provides an example of a my.cnf file configured with multiple instances and showing the use of different options, and:

$ my_print_defaults --defaults-file=/etc/my.cnf mysqld7

…which shows how a given group (“mysqld7” in the example above) was defined within my.cnf.
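For the [mysqld7] group defined earlier, the output would look roughly like this – one option per line, already in command-line form:

--user=mysql
--pid-file=/var/run/mysqld/mysqld7.pid
--socket=/var/run/mysqld/mysqld7.sock
--port=3308
--datadir=/data/mysql/mysql7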

Managing multiple instances

mysqld_multi allows you to start, stop, reload (which is effectively a restart) and report the current status of a given instance, all instances or a subset of them. The most important observation here is that the “stop” action is managed through mysqladmin – and internally that happens on an individual basis, with one “mysqladmin … shutdown” call per instance, even if you have mysqld_multi stop all of them. For this to work properly you need to set up a MySQL account with the SHUTDOWN privilege and defined with the same user name and password in all instances. Yes, it will work out of the box if you run mysqld_multi as root in a freshly installed server where the root user can access MySQL passwordless in all instances. But as the manual suggests, it’s better to have a specific account created for this purpose:

mysql> GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost' IDENTIFIED BY 'multipass';
mysql> FLUSH PRIVILEGES;

If you plan on replicating the datadir of the main server across your other instances you can have that account created before you make copies of it; otherwise you just need to connect to each instance and create a similar account (remember, the privileged account is only needed by mysqld_multi to stop the instances, not to start them). There’s a special group that can be used in my.cnf to define options for mysqld_multi, which should be used to store these credentials. You might also indicate in there the path for the mysqladmin and mysqld (or mysqld_safe) binaries to use, though you might have a specific mysqld binary defined for each instance inside its respective group. Here’s one example:

[mysqld_multi]
mysqld     = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin
user       = multi_admin
password   = multipass

You can use mysqld_multi to start, stop, restart or report the status of a particular instance, all instances or a subset of them. Here are a few examples that speak for themselves:

$ mysqld_multi report
Reporting MySQL (Percona Server) servers
MySQL (Percona Server) from group: mysqld0 is not running
MySQL (Percona Server) from group: mysqld1 is not running
MySQL (Percona Server) from group: mysqld7 is not running
$ mysqld_multi start
$ mysqld_multi report
Reporting MySQL (Percona Server) servers
MySQL (Percona Server) from group: mysqld0 is running
MySQL (Percona Server) from group: mysqld1 is running
MySQL (Percona Server) from group: mysqld7 is running
$ mysqld_multi stop 7,0
$ mysqld_multi report 7
Reporting MySQL (Percona Server) servers
MySQL (Percona Server) from group: mysqld7 is not running
$ mysqld_multi report
Reporting MySQL (Percona Server) servers
MySQL (Percona Server) from group: mysqld0 is not running
MySQL (Percona Server) from group: mysqld1 is running
MySQL (Percona Server) from group: mysqld7 is not running

Managing the MySQL daemon

What is missing here is an init script to automate the start/stop of all instances upon server initialization/shutdown; now that we use mysqld_multi to control the instances, the usual /etc/init.d/mysql won’t work anymore. But a similar startup script (though much simpler and less robust) relying on mysqld_multi is provided alongside MySQL/Percona Server, which can be found in /usr/share/<mysql|percona-server>/mysqld_multi.server. You can simply copy it over as /etc/init.d/mysql, effectively replacing the original script while maintaining its name. Please note: you may need to edit it first and modify the first two lines defining “basedir” and “bindir”, as this script was not designed to work out good values for these variables by itself, which the original single-instance /etc/init.d/mysql does. Considering you probably have mysqld_multi installed in /usr/bin, setting these variables as follows is enough:

basedir=/usr
bindir=/usr/bin
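
Putting it together, the replacement could look something like this (the exact source path depends on the package you have installed, and the service registration command differs between RHEL- and Debian-based systems):

$ cp /usr/share/mysql/mysqld_multi.server /etc/init.d/mysql
$ chmod +x /etc/init.d/mysql
$ chkconfig --add mysql       # RHEL/CentOS
$ update-rc.d mysql defaults  # Debian/Ubuntu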

Configuring an instance with a different version of MySQL

If you’re planning to have multiple instances of MySQL running concurrently chances are you want to use a mix of different versions for each of them, such as during a development cycle to test an application’s compatibility. This is a common use for mysqld_multi, and simple enough to achieve. To showcase its use I downloaded the latest version of MySQL 5.6 available and extracted the TAR file in /opt:

$ tar -zxvf mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz -C /opt

Then I made a cold copy of the datadir from one of the existing instances to /data/mysql/mysql5620:

$ mysqld_multi stop 1
$ cp -r /data/mysql/mysql1 /data/mysql/mysql5620
$ chown mysql:mysql -R /data/mysql/mysql5620

and added a new group to my.cnf as follows:

[mysqld5620]
user            = mysql
pid-file        = /var/run/mysqld/mysqld5620.pid
socket          = /var/run/mysqld/mysqld5620.sock
port            = 3309
datadir         = /data/mysql/mysql5620
basedir         = /opt/mysql-5.6.20-linux-glibc2.5-x86_64
mysqld          = /opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe

Note the use of basedir, pointing to the path where the binaries for MySQL 5.6.20 were extracted, as well as a specific mysqld to be used with this instance. If you have made a copy of the datadir from an instance running a previous version of MySQL/Percona Server you will need to follow the same approach used when upgrading and run mysql_upgrade.
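
A minimal sketch of that last step, using the binaries from the new basedir so the matching version performs the upgrade (the credentials are placeholders):

$ /opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysql_upgrade --socket=/var/run/mysqld/mysqld5620.sock -uroot -p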

* I did try to use the latest experimental release of MySQL 5.7 (mysql-5.7.4-m14-linux-glibc2.5-x86_64.tar.gz) but it crashed with:

*** glibc detected *** bin/mysqld: double free or corruption (!prev): 0x0000000003627650 ***

Using the conventional tools to start and stop an instance

Even though mysqld_multi makes things easier to control in general let’s not forget it is a wrapper; you can still rely (though not always, as shown below) on the conventional tools directly to start and stop an instance: mysqld* and mysqladmin. Just make sure to use the parameter --defaults-group-suffix to identify which instance you want to start:

$ mysqld --defaults-group-suffix=5620

and --socket to indicate the one you want to stop:

$ mysqladmin -S /var/run/mysqld/mysqld5620.sock shutdown

* However, mysqld won’t work to start an instance if you have redefined the option ‘mysqld’ on the configuration group, as I did for [mysqld5620] above, stating:

[ERROR] mysqld: unknown variable 'mysqld=/opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe'

I’ve tested using “ledir” to indicate the path to the directory containing the binaries for MySQL 5.6.20 instead of “mysqld” but it also failed with a similar error. If nothing else, that shows you need to stick with mysqld_multi when starting instances in a mixed-version environment.

Backups

The backup of multiple instances must be done on an individual basis, like you would if each instance were located on a different server. You just need to provide the appropriate parameters to identify the instance you’re targeting. For example, we can simply use socket with mysqldump when running it locally:

$ mysqldump --socket=/var/run/mysqld/mysqld7.sock --all-databases > mysqld7.sql
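
If your instances follow the socket naming convention used in the my.cnf above, a small shell loop over the option group numbers is enough to dump them all (the group numbers and output directory here are placeholders):

$ for N in 1 7; do mysqldump --socket=/var/run/mysqld/mysqld${N}.sock --all-databases > /backups/mysqld${N}.sql; done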

In Percona XtraBackup there’s an option named --defaults-group that should be used in environments running multiple instances to indicate which one you want to back up:

$ innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld7 --socket=/var/run/mysqld/mysqld7.sock /root/Backup/

Yes, you also need to provide a path to the socket (when running the command locally), even though that information is already available in “--defaults-group=mysqld7”; as it turns out, only the Percona XtraBackup tool (which is called by innobackupex during the backup process) makes use of the information available in the option group. You may need to provide credentials as well (“--user” and “--password”), and don’t forget you’ll need to prepare the backup afterwards. The option “--defaults-group” is not available in all versions of Percona XtraBackup so make sure to use the latest one.
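
The prepare step is just another innobackupex call against the timestamped directory the backup was written to (the directory name below is only an example):

$ innobackupex --apply-log /root/Backup/2014-08-26_10-30-00/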

Summary

Running multiple instances of MySQL concurrently in the same server transparently and without any containerization or virtualization layer is possible with both mysqld_multi and MySQL Sandbox. We have been using the latter at Percona Support to quickly spin up new disposable instances (though you might just as easily keep them running indefinitely). In this post though I’ve looked at mysqld_multi, which is provided with MySQL server and remains the official solution for providing an environment with multiple instances.

The key aspect when configuring multiple instances in my.cnf is the notion of option groups, as you replace a single [mysqld] section with as many [mysqldN] sections as you want instances running. It’s important though to pay attention to certain details when defining the options for each one of these groups, especially when mixing instances from different MySQL/Percona Server versions. Unlike MySQL Sandbox, where each instance relies on its own configuration file, you should be careful each time you edit the shared my.cnf file, as a syntax error when configuring a single option group will prevent all instances from starting upon the server’s (re)initialization.

I hope to have covered the major points about mysqld_multi here but feel free to leave us a note below if you have something else to add or any comment to contribute.

The post mysqld_multi: How to run multiple instances of MySQL appeared first on MySQL Performance Blog.

Aug
21
2014
--

A closer look at the MySQL ibdata1 disk space issue and big tables

A recurring and very common customer issue seen here at the Percona Support team involves how to make the ibdata1 file “shrink” within MySQL. I can only imagine there’s a degree of regret by some of the InnoDB architects over their design decisions regarding disk-space management by the shared tablespace* because this has been a big frustration for many MySQL users over the years.

There’s a very old bug (“InnoDB ibdata1 never shrinks after data is removed,” Sept. 8 2003) documenting user dissatisfaction. Shortly before that issue celebrated its 10th anniversary, James Day, MySQL senior principal support engineer at Oracle, posted a comment explaining why things haven’t changed and he also offered possible alternative solutions. Maybe we’ll see it fixed in a future release of MySQL. We can only be sure that any new storage engine aiming to warrant the sympathy of MySQL users can’t make that same mistake again.

One general misunderstanding that exacerbates the problem is the belief that if we enable innodb_file_per_table then all InnoDB tables will live in their own tablespace (a “private” .ibd file), even though the manual is clear about the role this variable plays. The truth is that when you enable innodb_file_per_table it will only immediately affect how new InnoDB tables are created – it won’t magically export tables currently living in the shared tablespace into their own separate .ibd files. That can be manually accomplished at any time afterwards by running ALTER TABLE or OPTIMIZE TABLE on the target table. The “gotcha” here is that by doing so you’ll actually be using additional disk space – as much as the table size – for the newly created .ibd file. ibdata1 won’t automatically shrink: the space previously used by that table will be marked as being “free” (for internal InnoDB use) but won’t be returned to the file system.
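
As a quick illustration (the schema and table names are made up), that manual export of a table into its own .ibd file amounts to:

mysql> SET GLOBAL innodb_file_per_table=1;
mysql> ALTER TABLE mydb.mytable ENGINE=InnoDB;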

Note: Throughout this post I make a common reference between ibdata1 and the shared (also called system) tablespace. In fact, the latter can be composed of a list of file definitions (ibdata1; ibdata2; …). Each file can have a fixed size or be specified with the autoextend parameter and have a cap limiting how big it can grow (well, actually only the last file defined in that list can). The default setting for the variable ruling how the shared tablespace is defined has it living in a file located in the datadir, named ibdata1, and configured with autoextend, hence the popular reference to a “growing ibdata1”.
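
For reference, here is what such a multi-file definition could look like in my.cnf – the sizes are arbitrary, and only the last file may carry the autoextend and max attributes:

[mysqld]
innodb_data_file_path = ibdata1:1G;ibdata2:512M:autoextend:max:8G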

A big table scenario

The shared tablespace contains more than data and indexes from InnoDB tables created inside it, such as the rollback segment for running transactions (a.k.a. “undo logs”). My colleague Miguel Angel Nieto wrote a blog post last year that explains in detail what is stored inside ibdata files, why they can grow bigger than the sum of data from the tables it hosts and won’t “shrink,” and explains the only real way to reclaim unused disk space to the file system.

But there’s one scenario where the ibdata1 file grows in a “legitimate” way: When it’s storing a big table. If there’s a big table living inside the shared tablespace accounting for most of its size then there’s no “shrinking” it. We can argue this would run counter to best practices (or not) but let’s remember that innodb_file_per_table used to be disabled by default. The first releases of MySQL 5.5 had it enabled by default but then the later ones had it disabled. We find the exact opposite happening with MySQL 5.6 – it is enabled in the later versions.

If a developer who is not a database specialist, much less a MySQL expert, creates a successful product around a single table and was “unlucky” enough to be using a MySQL release that had innodb_file_per_table disabled by default, he or she might soon find a terabyte table living inside ibdata1 (BTW, I recommend Bill Karwin’s excellent “How to Avoid Common (but Deadly) MySQL Development Mistakes” recorded webinar and slides for all developers using MySQL).

What to do then ?

We recently helped a customer with such a problem. They found themselves running out of disk space with a steadily growing ibdata1 file containing data from a 1.5 Terabyte table. The replicas weren’t able to keep up either, and binary logs were piling up, thus contributing to more disk space use. They had access to a network-mounted partition on their main master server but the storage was quite slow. We started looking for possible solutions to their case and the team came up with the following list of “solutions”:

1) Add one or more disks to the system. Luckily, they had the datadir lying inside an LVM partition using the XFS filesystem, so the prospect of simply increasing its size by adding more disk space was a good one to contemplate.

2) Re-arranging files in different partitions. The binary logs were stored inside the datadir and used a considerable amount of disk space, so moving them to another partition would free a few hundred gigabytes right away. The complicated part here was that the only available alternative storage was the slow network one, and that would make the replicas lag yet more. Later we contemplated adding a second file (ibdata2) to the shared tablespace, located in a different partition but, again, the only one with space available was the slow network mount.

3) Archive data. Delete a large amount of old data from the table (possibly using pt-archiver – see the sketch after this list) and then do a mysqldump and restore of the table with less data; or simply leave it as is – even though the freed space won’t be reclaimed by the filesystem, it will be made available internally for InnoDB to re-use. Be aware that deleting data is a slow process; if you have a lot of old unused data then it might be more efficient to dump the data you want to preserve, truncate the table, and then import the data back into it.

4) Convert the table to MyISAM. This could be seen as a controversial solution: it certainly shouldn’t be taken lightly, as a change of storage engine is a big deal. A given application that works with InnoDB might simply not work with MyISAM. While we do not normally recommend MyISAM, it does take less space to store data than InnoDB. Be aware that MyISAM is not a transactional storage engine, and if you crash, you will have to perform a lengthy repair process on the table. Also keep in mind that after converting the table to MyISAM you would have to either do a dump & restore of the full database to have ibdata1 shrink OR make sure you have all InnoDB tables converted to MyISAM, stop MySQL, remove all ib* files, and start MySQL again to have it recreate ibdata1 with its default size.

5) Try compressing the table. Rebuild the InnoDB table with file-per-table enabled and ROW_FORMAT=Compressed (also illustrated below). If replicas are already lagging behind, replication may not be able to keep up when compressed tables are used. Also, if you have a very high write volume, it probably won’t perform well enough. If that is the case, this solution is not a viable one. However, if you would consider switching to the new TokuDB storage engine you will get much better compression than InnoDB overall, and also better performance for write-intensive workloads.
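
Here is a rough sketch of what items 3 and 5 could look like in practice – the schema, table and column names as well as the retention window and block size are made up, and you would want to test these on a replica first:

$ pt-archiver --source h=localhost,D=app,t=big_table --where "created_at < NOW() - INTERVAL 1 YEAR" --purge --limit 1000 --commit-each
mysql> SET GLOBAL innodb_file_per_table=1;
mysql> SET GLOBAL innodb_file_format=Barracuda;
mysql> ALTER TABLE app.big_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;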

Of course, of all the outlined solutions the first two look to be by far the least invasive ones. The problem is that the server doesn’t always have free slots waiting for the addition of an extra disk, plus the arrangement around RAID setups brings extra constraints. Moving the binary logs to a different partition could be handy but it’s not always practical or possible at all, as is the case with making more space available by deleting/archiving data and extending the shared tablespace by adding a new ibdata2 file residing in a different partition. Finally, the other solutions involving converting the table to a different storage engine and compressing it were not an option in the case of this particular customer because they required the use of additional disk space in the process, which they didn’t have.

What the customer ended up doing was dumping the whole dataset and importing it into another server, compressing the table in the process, and then re-importing it back into the main server. It took time to complete and it wasn’t an option they were keen to try at first. But sometimes there’s just no easy way out of a problem like this. They made changes on the replicas to allow them to keep up with replication even when using compressed tables and the problem was solved for now, even though they’re aware the disk space they recovered won’t be enough in the long term.

Conclusion

The most important take-away here is to never get yourself into the situation of getting too close to running out of disk space. The best solutions available to minimize disk space used by InnoDB tables inside the shared tablespace usually require the (temporary, if you intend to do a dump & restore afterwards) use of yet more disk space. And the bigger the table the longer dump & restore and table conversions will take, so if you’re running out of time it only makes things more complicated.

This case made me curious to explore some of the other options mentioned above a little further. In a followup post I’ll share some of my findings.

The post A closer look at the MySQL ibdata1 disk space issue and big tables appeared first on MySQL Performance Blog.

Jun
26
2014
--

Percona Server with TokuDB (beta): Installation, configuration

My previous post was an introduction to the TokuDB storage engine and aimed at explaining the basics of its design and how it differentiates from InnoDB/XtraDB. This post is all about motivating you to give it a try and have a look for yourself. Percona Server is not officially supporting TokuDB as of today, though the guys in the development team are working hard on this and the first GA release of Percona Server with TokuDB is looming on the horizon. However, there’s a beta version available now. For the installation tests in this post I’ve used the latest version of Percona Server alongside the accompanying TokuDB complement, which was published last week.

Installing Percona Server with TokuDB on a sandbox

One of the tools Percona Support Engineers really love is Giuseppe Maxia’s MySQL Sandbox. It allows us to set up a sandbox running a MySQL instance of our choice and makes running multiple ones for testing purposes very easy. Whenever a customer reaches us with a problem happening on a particular version of MySQL or Percona Server that we can reproduce, we quickly spin up a new sandbox and test it ourselves, so it’s very handy. I’ll use one here to explore this beta version of Percona Server with TokuDB but if you prefer you can install it the regular way using a package from our apt experimental or yum testing repositories.

We start by downloading the tarballs from here: TokuDB’s plugin has been packaged in its own tarball, so there are two to download. Once you get them let’s decompress both and create a unified working directory, compressing it again to create a single tarball we’ll use as source to create our sandbox:

[nando@test01 ~]# tar zxvf Percona-Server-5.6.17-rel66.0-608.Linux.x86_64.tar.gz
[nando@test01 ~]# tar zxvf Percona-Server-5.6.17-rel66.0-608.TokuDB.Linux.x86_64.tar.gz
[nando@test01 ~]# tar cfa Percona-Server-5.6.17-rel66.0-608.Linux.x86_64.tar.gz Percona-Server-5.6.17-rel66.0-608.Linux.x86_64/

Before going ahead, verify whether you have transparent huge pages enabled, as TokuDB won’t run if they are. See this documentation page for an explanation of what this is and how to disable it on Ubuntu. In my CentOS test server it was defined in a slightly different place and I’ve done the following to temporarily disable it:

[nando@test01]# echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
[nando@test01]# echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

We’re now ready to create our sandbox. The following command should be enough (I’ve chosen to run Percona Server on port 5617, you can use any other available one):

[nando@test01 ~]# make_sandbox Percona-Server-5.6.17-rel66.0-608.Linux.x86_64.tar.gz -- --sandbox_directory=tokudb --sandbox_port=5617

If the creation process goes well you will see something like the following at the end:

.... sandbox server started
Your sandbox server was installed in $HOME/sandboxes/tokudb

You should now be able to access the MySQL console on the sandbox with the default credentials; if you cannot, check the error log in $HOME/sandboxes/tokudb/data/msandbox.err:

[nando@test01 ~]# mysql --socket=/tmp/mysql_sandbox5617.sock -umsandbox -pmsandbox

Alternatively, you can make use of the “use” script located inside the sandbox directory, which employs the same credentials (configured in the client section of the configuration file my.sandbox.cnf):

[nando@test01 ~]# cd sandboxes/tokudb/
[nando@test01 tokudb]# ./use

First thing to check is if TokuDB is being listed as an available storage engine:

mysql> SHOW ENGINES;
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                    | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| (...)              | (...)   | (...)                                                                      | (...)        | (...)| (...)      |
| TokuDB             | YES     | Tokutek TokuDB Storage Engine with Fractal Tree(tm) Technology             | YES          | YES  | YES        |
| InnoDB             | DEFAULT | Percona-XtraDB, Supports transactions, row-level locking, and foreign keys | YES          | YES  | YES        |
| (...)              | (...)   | (...)                                                                      | NO           | (...)| (...)      |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+

If that’s not the case, you may need to load the plugins manually – I had to do so in my sandbox; you may not need if you’re installing it from a package in a fresh setup:

mysql> INSTALL PLUGIN tokudb SONAME 'ha_tokudb.so';

TokuDB should now figure in the list of supported ENGINES but you still need to activate the related plugins:

mysql> INSTALL PLUGIN tokudb_file_map SONAME 'ha_tokudb.so';
mysql> INSTALL PLUGIN tokudb_fractal_tree_info SONAME 'ha_tokudb.so';
mysql> INSTALL PLUGIN tokudb_fractal_tree_block_map SONAME 'ha_tokudb.so';
mysql> INSTALL PLUGIN tokudb_trx SONAME 'ha_tokudb.so';
mysql> INSTALL PLUGIN tokudb_locks SONAME 'ha_tokudb.so';
mysql> INSTALL PLUGIN tokudb_lock_waits SONAME 'ha_tokudb.so';

Please note the INSTALL PLUGIN action results in permanent changes and thus is required only once. No modifications to MySQL’s configuration file are required to have those plugins load in subsequent server restarts.

Now you should see not only the main TokuDB plugin but also the add-ons to the INFORMATION SCHEMA:

mysql> SHOW PLUGINS;
+-------------------------------+----------+--------------------+--------------+---------+
| Name                          | Status   | Type               | Library      | License |
+-------------------------------+----------+--------------------+--------------+---------+
| (...)                         | (...)    | (...)              | (...)        | (...)   |
| TokuDB                        | ACTIVE   | STORAGE ENGINE     | ha_tokudb.so | GPL     |
| TokuDB_trx                    | ACTIVE   | INFORMATION SCHEMA | ha_tokudb.so | GPL     |
| TokuDB_locks                  | ACTIVE   | INFORMATION SCHEMA | ha_tokudb.so | GPL     |
| TokuDB_lock_waits             | ACTIVE   | INFORMATION SCHEMA | ha_tokudb.so | GPL     |
| TokuDB_file_map               | ACTIVE   | INFORMATION SCHEMA | ha_tokudb.so | GPL     |
| TokuDB_fractal_tree_info      | ACTIVE   | INFORMATION SCHEMA | ha_tokudb.so | GPL     |
| TokuDB_fractal_tree_block_map | ACTIVE   | INFORMATION SCHEMA | ha_tokudb.so | GPL     |
+-------------------------------+----------+--------------------+--------------+---------+

We are now ready to create our first TokuDB table – the only thing different here is specifying TokuDB as the storage engine to use:

mysql> CREATE TABLE test.Numbers (id INT PRIMARY KEY, number VARCHAR(20)) ENGINE=TokuDB;
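
A quick way to double-check that the table was indeed created with the TokuDB engine:

mysql> SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA='test' AND TABLE_NAME='Numbers';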

Note some unfamiliar files lying in the datadir; the details surrounding those are certainly good material for future posts:

[nando@test01]# ls ~/sandboxes/tokudb/data
auto.cnf                   _test_Numbers_main_3_2_19.tokudb
ibdata1                    _test_Numbers_status_3_1_19.tokudb
ib_logfile0                tokudb.directory
ib_logfile1                tokudb.environment
log000000000000.tokulog25  __tokudb_lock_dont_delete_me_data
msandbox.err               __tokudb_lock_dont_delete_me_environment
mysql                      __tokudb_lock_dont_delete_me_logs
mysql_sandbox5617.pid      __tokudb_lock_dont_delete_me_recovery
performance_schema         __tokudb_lock_dont_delete_me_temp
tc.log                     tokudb.rollback
test

Configuration: what’s really important

As noted by Vadim long ago, “Tuning of TokuDB is much easier than InnoDB, there’re only a few parameters to change, and actually out-of-box things running pretty well“:

mysql> show variables like 'tokudb_%';
+---------------------------------+------------------+
| Variable_name                   | Value            |
+---------------------------------+------------------+
| tokudb_alter_print_error        | OFF              |
| tokudb_analyze_time             | 5                |
| tokudb_block_size               | 4194304          |
| tokudb_cache_size               | 522651648        |
| tokudb_check_jemalloc           | 1                |
| tokudb_checkpoint_lock          | OFF              |
| tokudb_checkpoint_on_flush_logs | OFF              |
| tokudb_checkpointing_period     | 60               |
| tokudb_cleaner_iterations       | 5                |
| tokudb_cleaner_period           | 1                |
| tokudb_commit_sync              | ON               |
| tokudb_create_index_online      | ON               |
| tokudb_data_dir                 |                  |
| tokudb_debug                    | 0                |
| tokudb_directio                 | OFF              |
| tokudb_disable_hot_alter        | OFF              |
| tokudb_disable_prefetching      | OFF              |
| tokudb_disable_slow_alter       | OFF              |
| tokudb_disable_slow_update      | OFF              |
| tokudb_disable_slow_upsert      | OFF              |
| tokudb_empty_scan               | rl               |
| tokudb_fs_reserve_percent       | 5                |
| tokudb_fsync_log_period         | 0                |
| tokudb_hide_default_row_format  | ON               |
| tokudb_init_flags               | 11403457         |
| tokudb_killed_time              | 4000             |
| tokudb_last_lock_timeout        |                  |
| tokudb_load_save_space          | ON               |
| tokudb_loader_memory_size       | 100000000        |
| tokudb_lock_timeout             | 4000             |
| tokudb_lock_timeout_debug       | 1                |
| tokudb_log_dir                  |                  |
| tokudb_max_lock_memory          | 65331456         |
| tokudb_pk_insert_mode           | 1                |
| tokudb_prelock_empty            | ON               |
| tokudb_read_block_size          | 65536            |
| tokudb_read_buf_size            | 131072           |
| tokudb_read_status_frequency    | 10000            |
| tokudb_row_format               | tokudb_zlib      |
| tokudb_tmp_dir                  |                  |
| tokudb_version                  | tokudb-7.1.7-rc7 |
| tokudb_write_status_frequency   | 1000             |
+---------------------------------+------------------+
42 rows in set (0.00 sec)

The most important of the tokudb_ variables is arguably tokudb_cache_size. The test server where I ran those tests (test01) has a little less than 1GB of memory and, as you can see above, TokuDB is “reserving” half (50%) of it for itself. That’s the default behavior but, of course, you can change it. And you must do so if you are also going to have InnoDB tables on your server – you should not overcommit memory between the InnoDB and TokuDB engines. Shlomi Noach wrote a good post explaining the main TokuDB-specific variables and what they do. It’s definitely worth a read.
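
For instance, on a box like test01 with roughly 1GB of RAM hosting both engines, you might split the memory explicitly along these lines (the figures are only an illustration, not a recommendation):

[mysqld]
innodb_buffer_pool_size = 256M
tokudb_cache_size       = 256M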

I hope you have fun testing Percona Server with TokuDB! If you run into any problems worth reporting, please let us know.

The post Percona Server with TokuDB (beta): Installation, configuration appeared first on MySQL Performance Blog.
