Jul 10, 2015

How to create a rock-solid MySQL database backup & recovery strategy

Have you ever wondered what could happen if your MySQL database goes down?

Although it’s evident such a crash will cause downtime – and surely some business impact in terms of revenue – can you do something to reduce this impact?

The simple answer is “yes” – by doing regular backups, of course – but are you 100% sure that your current backup strategy will really come through when an outage occurs? And how much precious time will pass (and how much revenue will be lost) before you get your business back online?

I usually think of backups as the step that comes in after high availability (HA) fails. Let’s say we’re running master-to-master replication and something kills the database in a way HA can’t recover from – say the UPS fails and both servers are completely out. You can’t fail over; you have to restore data. Backups are a key piece of business continuity. Also factor in the frequent need to restore data that’s been altered by mistake: a DELETE with no WHERE clause, or a DROP TABLE run in production instead of dev. These are the situations where backups are invaluable.

Let’s take some time to discuss the possible backup strategies for MySQL – how to make backups efficiently – and also examine the different tools that are available. We’ll cover these topics and more during my July 15 webinar, “Creating a Best-in-Class Backup and Recovery System for Your MySQL Environment,” starting at 10 a.m. Pacific time.

On a related note, did you know that most online backups are possible with mysqldump, and that you can save some space on backups by using simple Linux tools? I’ll also cover this, so be sure to join me next Wednesday. Oh, and it’s a free webinar, too!

Stay tuned!


Dec 08, 2014

What happens when your application cannot open yet another connection to MySQL

Have you ever experienced a situation where one moment you can connect to the MySQL database and the next moment you cannot, only to be able to connect again a second later? As you may know, one cannot open an infinite number of connections to MySQL. There’s a practical limit, and more often than not it is imposed by the underlying operating system. If you’re getting:

ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.10' (99)

…there’s a good chance you’re hitting such a limit. What might be misleading in the message above is which side is preventing the connection from being established.

Understanding the problem at hand

Whenever a client uses the network to connect to a service running on a given port of a server, the connection is established through the creation of a socket:

A socket is identified by a tuple of 4 elements: the source and destination IP addresses and ports.

The destination port is obviously the one the service is listening on at the server side: usually port 22 for SSH and port 3306 for MySQL, for instance. The source port is an arbitrary local network port on the client side, which shows up in tools like netstat and ss:

Note: I’ve used netstat and ss alternately in this post, but if you read the man page for netstat these days you’ll see a note that says: “This program is obsolete. Replacement for netstat is ss”. I was advised to give preference to ss over netstat on a busy server: besides being faster and providing a richer set of information, ss puts less stress on the server because it talks directly to the kernel, while netstat scans /proc.

$ ss -na dport = :3306
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.0.100:48681 192.168.0.10:3306

The example above shows that server 192.168.0.100 has a TCP connection established with MySQL (3306) running on server 192.168.0.10 through local port 48681. The range of local ports that can be used for TCP and UDP traffic is found in /proc/sys/net/ipv4/ip_local_port_range:

$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768 61000

The values above are the defaults found in many Linux distributions: 32768 denotes the first local port that can be used and 61000 the last one, for a total of 28233 available ports. It may look like a lot, but whether it is enough truly depends on how the local applications connect to services on other servers over the network.

When it comes to connecting to database servers, many applications choose to open a new connection for a single request only, closing it right after the request is processed. Even though the connection is closed by the client (application), the local port it was using is not immediately released by the OS to be reused by another connection: it will sit in the TIME_WAIT state for (usually) 60 seconds – a value that cannot be easily changed, as it is hard coded in the kernel:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                  * state, about 60 seconds    */

And contrary to what you may have heard or read, tuning /proc/sys/net/ipv4/tcp_fin_timeout is of no use here, as it governs a different type of timeout and has no impact on releasing connections hanging in the TIME_WAIT state. To better understand the role played by this state I suggest you read Vincent Bernat’s post, from which I reproduce:

There are two purposes for the TIME-WAIT state:

The most known one is to prevent delayed segments from one connection being accepted by a later connection relying on the same quadruplet (source address, source port, destination address, destination port) (…)

The other purpose is to ensure the remote end has closed the connection. When the last ACK is lost, the remote end stays in the LAST-ACK state. Without the TIME-WAIT state, a connection could be reopened while the remote end still thinks the previous connection is valid. (…)

The problem with this situation is that if you keep accumulating connections in the TIME_WAIT state you’ll quickly saturate the available local ports. And if all ports are taken, any attempt to open a new connection will result in an error similar to the one shown above.

Reproducing the problem

It’s easy to reproduce this scenario: it suffices to decrease the local port range on the computer that’ll be starting the connections (usually the application server) to, for example, only half a dozen ports:

$ echo 61001 61006 > /proc/sys/net/ipv4/ip_local_port_range

Then we proceed with opening 6 connections from the application server (192.168.0.100) to the database server (192.168.0.10):

$ ss -na dport = :3306
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.0.100:61005 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61003 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61001 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61004 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61006 192.168.0.10:3306
ESTAB 0 0 192.168.0.100:61002 192.168.0.10:3306

Now, when we try to open a seventh connection to the database, we’ll hit the error stated at the beginning of this post. What the error actually means is:

$ perror 99
OS error code 99: Cannot assign requested address

In fact, it should be complemented with “… because there are no local network ports left”.

And even if we close all 6 MySQL connections right away they’ll all move from ESTABLISHED to TIME_WAIT state and we’ll still need to wait for them to expire until we can open a new connection:

$ ss -na dport = :3306
State Recv-Q Send-Q Local Address:Port Peer Address:Port
TIME-WAIT 0 0 192.168.0.100:61005 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61003 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61001 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61004 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61006 192.168.0.10:3306
TIME-WAIT 0 0 192.168.0.100:61002 192.168.0.10:3306

That’s where scalability problems appear: if your application server keeps opening new connections faster than old ones are released, it will saturate the available ports. For example, with the default port range of 32768 to 61000 and a TIME_WAIT of 60 seconds, in theory we can only open and close around 470 network connections per second ((61000 – 32768 + 1)/60 = 470.55) before we saturate the available local ports; that’s not much…
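If your client application were written in Go, for instance, the anti-pattern looks roughly like the sketch below: one connection opened and closed per iteration, each closed socket parked in TIME_WAIT. This is an illustration only – the DSN, credentials and iteration count are made up – but it will reproduce error 99 once the local ports run out:

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL driver for database/sql
)

func main() {
	for i := 0; i < 30000; i++ {
		// sql.Open only prepares the pool; the TCP connection is made by
		// Ping() and torn down again by Close() at the end of the iteration.
		db, err := sql.Open("mysql", "app:secret@tcp(192.168.0.10:3306)/test")
		if err != nil {
			log.Fatal(err)
		}
		if err := db.Ping(); err != nil {
			// Once local ports are exhausted this fails with
			// "cannot assign requested address" (OS error code 99).
			log.Fatalf("connection %d failed: %v", i, err)
		}
		db.Close() // the closed connection now sits in TIME_WAIT for ~60 seconds
	}
}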

Possible Solutions

This is by no means an exhaustive list, but here are a few possible approaches to consider. If you have something to add about these, or happen to know about any others, please let me know by leaving a comment below.

Increasing port range

If the server initiating the connections is operating with the default port range, you can start by increasing it somewhat. The first 1023 ports are privileged, meaning only root can start an application listening on one of these initial ports. At the other extreme, the highest port you can have assigned is 65535 (2^16-1). In practice, then, you can increase the port range to the maximum of 1024-65535, which would provide 64512 ports, allowing in theory around 1075 brief connections to be opened and closed per second:

$ sysctl -w net.ipv4.ip_local_port_range="1024 65535"

To make this change permanent and survive a server reboot you need to add the following line to /etc/sysctl.conf:

net.ipv4.ip_local_port_range=1024 65535

You should, however, pay attention when stretching these values to their limits.

Adding extra IP addresses and listening to multiple ports

Something that wasn’t clear to me at first is that the port range limitation applies per quadruplet (<source address>:<source port>, <destination address>:<destination port>). As such, if you have the port range set from 61001 to 61006 on the client, you should be able to open no more than 6 MySQL connections to the same destination <address>:<port> pair – but at the same time also 6 SSH connections, 6 nc connections, and so on:

tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61005 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61003 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61005 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61006 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61001 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61003 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61006 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61001 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.11:3306 ESTABLISHED   <-- here!
tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.10:22 ESTABLISHED
tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 ESTABLISHED

Note in the list above there’s more than one connection established from the same source <address>:<port> pair, though bound to different destination <address>:<port> pairs. There’s even a 7th connection to MySQL, though to one running on a different server (192.168.0.11:3306).

So, another way of increasing the number of concurrent connections is to add an additional IP address to either the outgoing side (client) or the incoming side (server) and make part of the connections go through it. Likewise, even though you cannot make MySQL listen on multiple ports simultaneously, you can configure your firewall to accept connections on other ports and redirect them to port 3306 (or whichever one you’re using). In the example below I configure iptables to redirect all connections destined to port 80 to port 3306:

$ iptables -A PREROUTING -t nat -p tcp --dport 80 -j REDIRECT --to-port 3306

Modifying the connection behavior of the application(s)

Expanding the number of possible quadruplets will certainly help increase the number of possible concurrent connections, but it won’t scale indefinitely. In the long term you may need to review your application’s connection behavior and configure or re-engineer it so that it avoids opening and closing many connections over the network too often – if that is indeed the problem. You may resort to some form of connection pooling instead, but be sure to evaluate it well first.
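Just as an illustration of what pooling can look like, here is a minimal Go sketch using database/sql and Go-MySQL-Driver (the DSN and pool sizes are assumptions, not recommendations): a capped pool keeps a handful of connections established and reuses them, so the client no longer burns one local port per request.

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "app:secret@tcp(192.168.0.10:3306)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	db.SetMaxOpenConns(50) // hard cap on concurrent connections to MySQL
	db.SetMaxIdleConns(50) // keep finished connections idle instead of closing them

	// Every request in the application shares the pool above; connections are
	// handed back after each query rather than closed, so no TIME_WAIT pile-up.
	var now string
	if err := db.QueryRow("SELECT NOW()").Scan(&now); err != nil {
		log.Fatal(err)
	}
	log.Println("server time:", now)
}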

Tweaking TCP parameter settings

Even though you cannot easily decrease the timeout for the TIME_WAIT state, there are at least 3 different TCP parameters you can use to “bypass” this limitation. You should explore these options with caution, though, as these settings could affect the reliability of TCP connections.

tcp_tw_reuse

tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6)
              Allow to reuse TIME_WAIT sockets for new connections when it
              is safe from protocol viewpoint.  It should not be changed
              without advice/request of technical experts.

It is possible to force the kernel to reuse a connection hanging in the TIME_WAIT state by setting /proc/sys/net/ipv4/tcp_tw_reuse to 1. What happens in practice is that you’ll keep seeing the closed connections hanging in TIME_WAIT until either they expire or a new connection is requested. In the latter case, the connection will be “revived”.

Here’s an example from the previous scenario where I had limited the port range to only 6 ports. I enabled tcp_tw_reuse and opened 6 connections with MySQL, closing five of them soon afterwards:

$ netstat -tn|grep 3306
tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 TIME_WAIT

The TIME_WAIT timers were still counting down (both ss and netstat have the -o/--timers option to display them) when I opened a new connection:

$ netstat -ton|grep 3306
tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 TIME_WAIT timewait (35.07/0/0)
tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 TIME_WAIT timewait (34.52/0/0)
tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED keepalive (3586.59/0/0)
tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 TIME_WAIT timewait (33.91/0/0)
tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 TIME_WAIT timewait (35.65/0/0)
tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 ESTABLISHED keepalive (7196.66/0/0)

Note the new connection was established in place of the one that had been hanging in the TIME_WAIT state the longest (the one using local port 61003).

tcp_tw_recycle

tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4)
              Enable fast recycling of TIME_WAIT sockets.  Enabling this
              option is not recommended since this causes problems when
              working with NAT (Network Address Translation).

When you enable /proc/sys/net/ipv4/tcp_tw_recycle, closed connections will not show under TIME_WAIT anymore – they disappear from netstat altogether. But as soon as you open a new connection (within the 60-second mark) it will recycle one of those. However, everyone writing about this alternative seems to advise against its use. The bottom line is: it’s preferable to reuse a connection than to recycle it.

tcp_max_tw_buckets

tcp_max_tw_buckets (integer; default: see below; since Linux 2.4)
              The maximum number of sockets in TIME_WAIT state allowed in
              the system.  This limit exists only to prevent simple denial-
              of-service attacks.  The default value of NR_FILE*2 is
              adjusted depending on the memory in the system.  If this
              number is exceeded, the socket is closed and a warning is
              printed.

This parameter governs how many connections can remain in the TIME_WAIT state concurrently: the kernel will simply kill connections hanging in that state beyond this number. For example, continuing with the previous scenario where the client server was configured with a port range of only 6 ports, if I set /proc/sys/net/ipv4/tcp_max_tw_buckets to 5, open 6 concurrent connections with MySQL and then immediately close all 6, I’ll find only 5 of them hanging in the TIME_WAIT state – as with tcp_tw_recycle, one of them simply disappears from netstat. This allows me to immediately open a new connection without waiting a minute. However, I won’t be able to open a second one until one of the other 5 connections in TIME_WAIT expires and frees the local port it was using. The trick here, then, is to find a compromise between the number of available network ports and the number of connections we allow to remain in the TIME_WAIT state. The default value of this setting is 65536, which means that by default the system allows all possible connections to go through the TIME_WAIT state when closed.

If it’s a “LAMP” server, use the local socket

Finally, it’s important to mention that if your application is hosted on the same server as the database, you may (and probably should) open connections using MySQL’s socket file directly, without going over the network, and thus avoid this port range/TIME_WAIT problem altogether. Interestingly, you can accomplish this in one of two ways: specifying the socket with the --socket option or using --host=localhost. As MySQL’s manual mentions:

On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file.
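The same applies to applications: if yours happens to be written in Go, for example, a sketch of pointing Go-MySQL-Driver at the local socket instead of TCP could look like the following (the socket path and credentials are assumptions; check your my.cnf for the real path):

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Over TCP this would be:  "app:secret@tcp(192.168.0.10:3306)/test"
	// Over the Unix socket no network ports are involved, so TIME_WAIT never enters the picture.
	db, err := sql.Open("mysql", "app:secret@unix(/var/run/mysqld/mysqld.sock)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	log.Println("connected through the local socket")
}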

Clarification

At the risk of overstating this point, to avoid any confusion: the error described in this post, and all the situations surrounding it, occur on the computer that initiates the connections (usually the application server), not on the server hosting the services. Whenever a client closes a connection, while it shows as hanging in the TIME_WAIT state on the client, there won’t be any remaining trace of the connection in netstat on the server side. However, if for some reason it is the server that “closes the connection first, it gets the TIME-WAIT state while the client will consider the corresponding quadruplet free and hence may reuse it for a new connection”.


May 14, 2014

Tips on benchmarking Go + MySQL

We have just released, as open source, our new percona-agent (https://github.com/percona/percona-agent), the agent that works with Percona Cloud Tools. This agent is written in Go.

I will give a webinar titled “Monitoring All MySQL Metrics with Percona Cloud Tools” on June 25 that will cover the new features in percona-agent and Percona Cloud Tools, where I will also explain how it works. You are welcome to register now and join me.

There will be more posts about percona-agent, but in the meantime I want to dedicate this one to Go, Go with MySQL and some performance topics.

I have had an interest in the Go programming language for a long time, but in its initial versions I did not quite like the performance of the goroutine scheduler. See my report from more than two years ago on “runtime: reduce scheduling contention for large $GOMAXPROCS”.

Supposedly this performance issue was fixed in Go 1.1, so this is a good time to revisit my benchmark experiment.

A simple run of prime or Fibonacci number calculations in N threads is quite boring, so I am going to run queries against Percona Server instead. Of course this adds some complication, as there are more moving parts (i.e. the Go scheduler, the Go SQL driver, MySQL itself), but it just makes the experiment more interesting.

The source code of my benchmark is available as Go-pk-bench. This is probably not the best example of how to code in Go, but that was not the point of this exercise. This post is really about some tips to take into account when writing an application in Go that uses a MySQL (Percona Server) database.

So, first, we will need a MySQL driver for Go. The one I used two years ago (https://github.com/Philio/GoMySQL) is quite outdated. It seems the most popular choice today is Go-MySQL-Driver, and this is the one we use for internal development. This driver is based on the standard Go “database/sql” package, which provides a standard Go way to deal with SQL-like databases. “database/sql” seems to work out OK, though with some design decisions that are questionable for my taste. So using “database/sql” and Go-MySQL-Driver you will need to deal with some quirks, like an almost unmanageable connection pool.

The first thing you should take into account is a proper setting of runtime.GOMAXPROCS().

If you do not do that, the Go scheduler will use the default, which is 1, and your binary will use one and only one CPU (so much for a modern concurrent language).

The call runtime.GOMAXPROCS(runtime.NumCPU()) tells the runtime to use all available CPUs. Always remember to use this if you care about multi-threaded performance.
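For instance, a minimal sketch of where that call belongs (a hypothetical program, not part of the benchmark code):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	runtime.GOMAXPROCS(runtime.NumCPU()) // use every available CPU core (the default was 1 at the time of writing)
	fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0)) // passing 0 just reports the current value
}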

The next problem I faced in the benchmark is that when I ran queries in a loop, i.e. repeating them as fast as possible…

rows, err := db.Query("select k from sbtest"+strconv.Itoa(tab+1)+" where id = "+strconv.Itoa(i))

… very soon we ran out of TCP ports. Apparently “database/sql” with Go-MySQL-Driver and its smart connection pool creates a NEW CONNECTION for each query. I can’t explain why this happens, but using the following statement:

db.SetMaxIdleConns(10000)

helps (I hope somebody with “database/sql” knowledge will explain what it is doing).
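Putting these pieces together, a rough sketch of what such a benchmark loop can look like follows below – this is an illustration only, not the actual Go-pk-bench code, and the DSN and iteration counts are made up:

package main

import (
	"database/sql"
	"log"
	"strconv"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "sbtest:sbtest@tcp(127.0.0.1:3306)/sbtest")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.SetMaxIdleConns(10000) // keep connections idle in the pool instead of closing them after each query

	for i := 1; i <= 100000; i++ {
		tab := i % 64 // spread the lookups over the 64 sbtest tables
		var k int
		err := db.QueryRow("select k from sbtest" + strconv.Itoa(tab+1) +
			" where id = " + strconv.Itoa(i)).Scan(&k)
		if err != nil {
			log.Fatal(err)
		}
	}
}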

So after these adjustments we can now run the benchmark, which, as you can see from the query, is quite simple – primary key lookups against Percona Server, which we know scales perfectly in this scenario (I used sysbench to create 64 tables with 1 million rows each; all of this fits into memory). I am going to run this benchmark with 1, 2, 4, 8, 16, 24, 32, 48 and 64 user threads.

Below you can see graphs for MySQL Throughput and CPU Usage (both graphs are built using the new metrics graphing in Percona Cloud Tools).

MySQL Throughput (user threads are increasing from 1 to 64) [graph: mysql-go]

CPU Usage (user threads are increasing from 1 to 64) [graph: Cpu-go]

I would say the results scale quite nicely – at least it is much better than it was two years ago. It is interesting to compare with something, so here is a graph from an identical run, but this time using sysbench + Lua as the main workload driver.

MySQL Throughput (sysbench, user threads are increasing from 1 to 64) [graph: mysql-sysbench]

CPU Usage (sysbench, user threads are increasing from 1 to 64) [graph: cpu-sysbench]

From the graphs (this is what I like them for) we can clearly see the increase in user CPU utilization (and we are actually able to drive the CPUs to 100% combined user+system usage), and it clearly corresponds to the increased throughput.

And if you are a fan of raw numbers:

MySQL Throughput, q/s (more is better)
Threads  | Go-MySQL | sysbench
1	|  13,189	|  16,765
2	|  26,837	|  33,534
4	|  52,629	|  65,943
8	|  95,553	| 116,953
16	| 146,979	| 182,781
24	| 169,739	| 231,895
32	| 181,334	| 245,939
48	| 198,238	| 250,497
64	| 207,732	| 251,972

(Anyone familiar with the Universal Scalability Law can extrapolate a prediction up to 1000 threads; I leave it as homework.)

So, in conclusion, I can say that Go+MySQL is able to show decent results, but it is still not as effective as plain raw C (sysbench), as it seems to spend some extra CPU time in system calls.

If you want to try these new graphs in Percona Cloud Tools and see how it works with your system – join the free beta!

