Dec 19, 2014
--

Store UUID in an optimized way

A few years ago Peter Zaitsev, in a post titled “To UUID or not to UUID” (http://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/), wrote: “There is timestamp based part in UUID which has similar properties to auto_increment and which could be used to have values generated at same point in time physically local in BTREE index.”

For this post I’ve rearranged the timestamp part of the UUID (Universally Unique Identifier) and run some benchmarks.

Many people store UUIDs as CHAR(36) and use them as the row identity value (PRIMARY KEY) because a UUID is unique across every table, every database and every server, and allows easy merging of records from different databases. But using a UUID as the PRIMARY KEY causes the problems described below.

Problems with UUID

  • UUID has 36 characters which makes it bulky.
  • InnoDB stores data in PRIMARY KEY order and all the secondary keys also contain the PRIMARY KEY. So having a UUID as the PRIMARY KEY makes the indexes bigger, and they may not fit in memory.
  • Inserts are random and the data is scattered.

Despite these problems, people still prefer UUIDs because they are unique across every table and can be generated anywhere. In this post, I will explain how to store UUIDs in an efficient way by rearranging the timestamp part of the UUID.

Structure of UUID

MySQL uses UUID version 1, which is a 128-bit number represented by a utf8 string of five hexadecimal numbers:

  • The first three numbers are generated from a timestamp.
  • The fourth number preserves temporal uniqueness in case the timestamp value loses monotonicity (for example, due to daylight saving time).
  • The fifth number is an IEEE 802 node number that provides spatial uniqueness. A random number is substituted if the latter is not available (for example, because the host computer has no Ethernet card, or we do not know how to find the hardware address of an interface on the operating system). In this case, spatial uniqueness cannot be guaranteed; nevertheless, a collision should have very low probability.
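A quick way to see this five-part structure is Python’s standard uuid module, applied to the example UUID discussed in this post (an illustrative sketch; MySQL’s UUID() produces the same layout):

```python
import uuid

# The example time-based (version 1) UUID used in this post.
u = uuid.UUID('58e0a7d7-eebc-11d8-9669-0800200c9a66')

# The five dash-separated groups map onto these fields:
assert u.time_low == 0x58e0a7d7     # first group: low 32 bits of the timestamp
assert u.time_mid == 0xeebc         # second group: middle 16 bits of the timestamp
assert u.time_hi_version == 0x11d8  # third group: version nibble (1) + high 12 bits
assert u.version == 1               # time-based UUIDs are version 1
assert u.node == 0x0800200c9a66     # fifth group: IEEE 802 node number
```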

The timestamp is mapped as follows: when the timestamp has the 60-bit hexadecimal value 1d8eebc58e0a7d7, the corresponding parts of the UUID are set to 58e0a7d7-eebc-11d8-9669-0800200c9a66. The 1 in front of the most significant digits of the timestamp (in 11d8) indicates the UUID version; for time-based UUIDs this is 1.

The fourth and fifth parts would be mostly constant if the UUIDs are generated from a single server, and the first three numbers are based on the timestamp, so they will be monotonically increasing. Let’s rearrange the sequence to make the UUID closer to sequential; this makes inserts and recent-data lookups faster. The dashes (‘-’) serve no purpose, so let’s remove them:
58e0a7d7-eebc-11d8-9669-0800200c9a66 => 11d8eebc58e0a7d796690800200c9a66
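The rearrangement can be sketched as a tiny helper (illustrative Python, mirroring the ordered_uuid() MySQL function shown in the conclusions):

```python
def ordered_uuid_hex(u):
    """Rearrange a time-based UUID string so the high timestamp bits come first.

    Input:  '58e0a7d7-eebc-11d8-9669-0800200c9a66'
    Output: '11d8eebc58e0a7d796690800200c9a66'
    """
    time_low, time_mid, time_hi, clock_seq, node = u.split('-')
    # Put the slow-moving timestamp bits first so values sort chronologically,
    # and drop the dashes so the result packs into BINARY(16) via UNHEX().
    return time_hi + time_mid + time_low + clock_seq + node
```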

Benchmarking

I created three tables:

  • events_uuid – UUID binary(16) PRIMARY KEY
  • events_int – An additional BIGINT auto-increment column used as the primary key, plus an index on the UUID column
  • events_uuid_ordered – Rearranged UUID binary(16) as PRIMARY KEY

I created three stored procedures, each of which inserts 25K random rows at a time into its respective table. Three more stored procedures call the random-insert procedures in a loop and also record the time taken to insert 25K rows, as well as the data and index size after each loop. In total I inserted 25M records.

    • Data Size (horizontal axis: number of inserts x 25,000; vertical axis: data size in MB; chart: http://www.percona.com/blog/wp-content/uploads/2015/12/11.png). The data size for the UUID table is larger than for the other two tables.
    • Index Size (horizontal axis: number of inserts x 25,000; vertical axis: index size in MB; chart: http://www.percona.com/blog/wp-content/uploads/2015/12/21.png)
    • Total Size (horizontal axis: number of inserts x 25,000; vertical axis: total size in MB; chart: http://www.percona.com/blog/wp-content/uploads/2015/12/31.png)
    • Time Taken (horizontal axis: number of inserts x 25,000; vertical axis: time taken in seconds; chart: http://www.percona.com/blog/wp-content/uploads/2015/12/41.png)

For the table with UUID as PRIMARY KEY, you can notice that as the table grows, the time taken to insert rows increases almost linearly, whereas for the other tables it stays almost constant.

The UUID table is almost 50% bigger than the ordered-UUID table and 30% bigger than the table with BIGINT as PRIMARY KEY. Comparing the ordered-UUID table with the BIGINT table, the insert times and sizes are almost the same, though they may vary slightly depending on the index structure.

root@localhost:~# ls -lhtr /media/data/test/ | grep ibd
-rw-rw---- 1 mysql mysql  13G Jul 24 15:53 events_uuid_ordered.ibd
-rw-rw---- 1 mysql mysql  20G Jul 25 02:27 events_uuid.ibd
-rw-rw---- 1 mysql mysql  15G Jul 25 07:59 events_int.ibd

Table Structure

#1 events_int
CREATE TABLE `events_int` ( 
`count` bigint(20) NOT NULL AUTO_INCREMENT, 
`id` binary(16) NOT NULL, 
`unit_id` binary(16) DEFAULT NULL, 
`event` int(11) DEFAULT NULL, 
`ref_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, 
`campaign_id` binary(16) DEFAULT '', 
`unique_id` binary(16) DEFAULT NULL, 
`user_agent` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL, 
`city` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL, 
`country` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL, 
`demand_partner_id` binary(16) DEFAULT NULL, 
`publisher_id` binary(16) DEFAULT NULL, 
`site_id` binary(16) DEFAULT NULL, 
`page_id` binary(16) DEFAULT NULL, 
`action_at` datetime DEFAULT NULL, 
`impression` smallint(6) DEFAULT NULL, 
`click` smallint(6) DEFAULT NULL, 
`sold_impression` smallint(6) DEFAULT NULL, 
`price` decimal(15,7) DEFAULT '0.0000000', 
`actioned_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', 
`unique_ads` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, 
`notification_url` text COLLATE utf8_unicode_ci, 
PRIMARY KEY (`count`), 
KEY `id` (`id`), 
KEY `index_events_on_actioned_at` (`actioned_at`), 
KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
#2 events_uuid
CREATE TABLE `events_uuid` ( 
`id` binary(16) NOT NULL, 
`unit_id` binary(16) DEFAULT NULL,
~
~
PRIMARY KEY (`id`), 
KEY `index_events_on_actioned_at` (`actioned_at`), 
KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
#3 events_uuid_ordered
CREATE TABLE `events_uuid_ordered` (  
`id` binary(16) NOT NULL,  
`unit_id` binary(16) DEFAULT NULL,
~
~
PRIMARY KEY (`id`),  
KEY `index_events_on_actioned_at` (`actioned_at`),  
KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Conclusions

 

    • Create a function to rearrange the UUID fields and use it:
DELIMITER //
CREATE DEFINER=`root`@`localhost` FUNCTION `ordered_uuid`(uuid BINARY(36))
RETURNS binary(16) DETERMINISTIC
RETURN UNHEX(CONCAT(SUBSTR(uuid, 15, 4),SUBSTR(uuid, 10, 4),SUBSTR(uuid, 1, 8),SUBSTR(uuid, 20, 4),SUBSTR(uuid, 25)));
//
DELIMITER ;

Inserts

INSERT INTO events_uuid_ordered VALUES (ordered_uuid(uuid()),'1','M',....);

Selects

SELECT HEX(uuid),is_active,... FROM events_uuid_ordered ;

    • Define UUID columns as binary(16), as binary does not have any character set

 

References

      • http://dev.mysql.com/doc/refman/5.1/en/miscellaneous-functions.html#function_uuid
      • http://www.famkruithof.net/guid-uuid-timebased.html
      • http://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/
      • http://blog.codinghorror.com/primary-keys-ids-versus-guids/

 

The post Store UUID in an optimized way (http://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/) appeared first on MySQL Performance Blog (http://www.percona.com/blog).

Dec 18, 2014
--

Mobile Commerce Consolidation As Mozido Takes Majority Stake In CorFire

Just days after Dutch digital commerce provider Adyen announced a $250 million round of funding, another heavily capitalised player in the space has made an acquisition to up its game. Mozido, a mobile commerce specialist that raised $185 million in October 2014 after raising $103 million in May, has taken a majority stake in CorFire, another mobile commerce startup that focuses on… Read More

Dec 18, 2014
--

Making HAProxy 1.5 replication lag aware in MySQL

HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post (http://www.percona.com/blog/2012/06/20/percona-xtradb-cluster-reference-architecture-with-haproxy/), explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slaves setup to spread the read load across multiple slaves. However, with MySQL replication another factor comes into play: replication lag. In this case the approach mentioned for Percona XtraDB Cluster does not work that well, as the check we presented only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.

Agent checks in HAProxy

HAProxy 1.5 allows us to run an agent check (http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#5.2-agent-check), which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be ‘up’ or ‘down’, but also a weight.

What is an agent? It is simply a program that can be accessed from a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:

  • Mark the server as down in HAProxy if replication is not working
  • Set the weight to 100% if the replication lag is < 10s
  • Set the weight to 50% if the replication lag is >= 10s and < 60s
  • Set the weight to 5% in all other situations
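The decision logic above is just a few threshold comparisons. As an illustrative sketch (the actual agent below is a PHP script), in Python:

```python
def lag_to_weight(lag):
    """Map replication lag (in seconds, or None when replication is broken)
    to an HAProxy agent-check response, following the rules above."""
    if lag is None:
        return "down"        # replication not working
    if lag < 10:
        return "up 100%"
    if lag < 60:
        return "up 50%"
    return "up 5%"
```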

We can use a script like this:

$ less agent.php
<?php
$port = $argv[1];         // port the agent listens on
$mysql_port = $argv[2];   // port of the monitored MySQL instance
$user = 'haproxy';
$password = 'haproxy_pwd';
$mysql = 'mysql';
$query = 'SHOW SLAVE STATUS';
function set_weight($lag){
	if (!is_numeric($lag)){
		return "down";
	}
	elseif ($lag < 10){
		return "up 100%";
	}
	elseif ($lag >= 10 && $lag < 60){
		return "up 50%";
	}
	else
		return "up 5%";
}
set_time_limit(0);
$socket = stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr);
if (!$socket) {
	echo "$errstr ($errno)\n";
} else {
	while ($conn = stream_socket_accept($socket,9999999999999)) {
		// Seconds_Behind_Master is printed as "NULL" when replication is broken
		$cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee \"$query\" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '";
		exec("$cmd",$lag);
		$weight = set_weight($lag[0]);
		unset($lag);
		fputs ($conn, $weight);
		fclose ($conn);
	}
	fclose($socket);
}
?>

If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:

$ php agent.php 6789 3306

You will also need a dedicated MySQL user:

mysql> GRANT REPLICATION CLIENT ON *.* TO 'haproxy'@'127.0.0.1' IDENTIFIED BY 'haproxy_pwd';

When the agent is started, you can check that it is working properly:

# telnet 127.0.0.1 6789
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
up 100%
Connection closed by foreign host.

Assuming it is run locally on the app server, that two replicas are available (192.168.10.2 and 192.168.10.3), and that the application sends all reads to port 3307, you would define a frontend and a backend in your HAProxy configuration like this:

frontend read_only-front
bind *:3307
mode tcp
option tcplog
log global
default_backend read_only-back
backend read_only-back
mode tcp
balance leastconn
server slave1 192.168.10.2 weight 100 check agent-check agent-port 6789 inter 1000  rise 1  fall 1 on-marked-down shutdown-sessions
server slave2 192.168.10.3 weight 100 check agent-check agent-port 6789 inter 1000  rise 1  fall 1 on-marked-down shutdown-sessions

Demo

Now that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.

No lag

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# HAProxy
$ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,200

Slave1 lagging

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 25
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,50
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,150

Slave2 down

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: NULL
# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,DOWN (agent),100
read_only-back,BACKEND,UP,100

Conclusion

Agent checks are a nice addition in HAProxy 1.5. The setup presented above is a bit simplistic, though: for instance, if HAProxy fails to connect to the agent, it will not mark the corresponding server as down. It is therefore recommended to keep a regular health check along with the agent check.

Astute readers will also notice that in this configuration, if replication is broken on all nodes, HAProxy will stop sending reads. This may not be the best solution. Possible options are stopping the agent and marking the servers as UP using the stats socket, or adding the master as a backup server.

And as a final note, you can edit the code of the agent so that replication lag is measured with Percona Toolkit’s pt-heartbeat (http://www.percona.com/doc/percona-toolkit/2.2/pt-heartbeat.html) instead of Seconds_Behind_Master.

The post Making HAProxy 1.5 replication lag aware in MySQL (http://www.percona.com/blog/2014/12/18/making-haproxy-1-5-replication-lag-aware-in-mysql/) appeared first on MySQL Performance Blog (http://www.percona.com/blog).

Dec 18, 2014
--

Skytap Raises $35M For Its Cloud-Based Enterprise Development And Testing Service

Skytap is a service that aims to help dev and test teams in the enterprise work more efficiently (and test their code more often). The company today announced that it has raised a $35 million funding round led by Insight Venture Partners, with participation from all of the company’s previous investors, including OpenView Venture Partners, Ignition Partners, Madrona Venture Group,… Read More

Dec 17, 2014
--

Windows 10 Hits 1.5M Testers, With 450K Using The OS “Day To Day”

Microsoft’s Windows 10 has around 450,000 users on a daily basis, according to the software company. A new blog post from the Windows team highlighted the statistic, also noting that it has collected 1.5 million testers for its preview operating system. That number was 1 million on October 13. So, the pace of new signups to test Windows 10 has declined — it attracted its… Read More

Dec 17, 2014
--

Sinch Goes Global With SMS API For Developers

Sinch, the newly spun out Rebtel company, has launched its SMS API globally, offering the ability for developers to add SMS integration to their app or website in 105 different countries across the globe. The product was initially developed under the Rebtel umbrella before Sinch took $12 million in funding to spin out the business this May. Since then, the company has test launched the SMS… Read More

Dec 17, 2014
--

InnoDB’s multi-versioning handling can be Achilles’ heel

I believe the InnoDB storage engine architecture is great for a lot of online workloads; however, there are no silver bullets in technology, and all design choices have their trade-offs. In this blog post I’m going to talk about one important InnoDB limitation that you should consider.

InnoDB is a multiversion concurrency control (MVCC) storage engine, which means many versions of a single row can exist at the same time. In fact, there can be a huge number of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of the currently running SELECT query.

In most cases this is not a big deal – if you have many short transactions you will have only a few row versions to deal with. If you just use the system for reporting queries and do not modify data aggressively at the same time, you also will not have many row versions. However, if you mix heavy updates with slow reporting queries running at the same time, you can get into a lot of trouble.

Consider for example an application with a hot row (something like an actively updated counter) that gets 1,000 updates per second, together with a heavy batch job that takes 1,000 seconds to run. In such a case we will have 1M row versions to deal with.

Let’s now talk about how those old row versions are stored in InnoDB: they are stored in the undo space as, essentially, a linked list where each row version points to the previous row version, together with transaction visibility information that helps decide which version will be visible to a given query. Such a design favors short new queries, which will typically need to see one of the newer rows and so do not have to go far down this linked list. This is not the case for reporting queries, which might need to read a rather old row version corresponding to the time when the query was started, or for logical backups that use consistent reads (think mysqldump or mydumper), which often need to access such very old row versions.
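As a toy model (not InnoDB’s actual data structures), the cost of reading an old row version can be sketched in a few lines: each version points at its predecessor, and a reader walks back until it reaches a version its read view is allowed to see.

```python
class RowVersion:
    """One entry in a toy undo-style version chain."""
    def __init__(self, value, trx_id, prev=None):
        self.value = value    # row contents at this version
        self.trx_id = trx_id  # id of the transaction that created it
        self.prev = prev      # older version, or None

def visible_version(newest, up_to_trx_id):
    """Walk back from the newest version until one created by a transaction
    the reader can see. Returns (version, steps): the older the read view,
    the longer the walk -- and each step may be a disk read."""
    steps, v = 0, newest
    while v is not None and v.trx_id > up_to_trx_id:
        v, steps = v.prev, steps + 1
    return v, steps

# A hot row updated 1,000 times after a reporting query took its read view:
head = None
for trx in range(1, 1001):
    head = RowVersion(value=trx, trx_id=trx, prev=head)
```

A reader whose read view predates those updates pays one pointer chase per intervening version; at 1,000 updates per second over a 1,000-second batch job, that is the million-version chain from the counter example above.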

So going through the linked list of versions is expensive, but how expensive can it get? A lot depends on whether the undo space fits in memory, in which case the list can be traversed efficiently, or does not, in which case you might be looking at massive disk IO. Keep in mind the undo space is not clustered by PRIMARY KEY, as normal data in InnoDB tables is, so if you’re updating multiple rows at the same time (the typical case) the row-version chains will be stored across many pages, often as little as one row version per page, requiring either massive IO or a large number of undo-space pages to be present in the InnoDB buffer pool.

Where it can get even worse is index scans. This is because indexes in InnoDB contain entries for all row versions corresponding to a key value, current and past. For example, the index entries for KEY=5 will contain pointers to all rows that either have the value 5 now or had the value 5 at some time in the past and have not been purged yet. Where it can really bite is this: InnoDB needs to know which of the values stored for the key are visible to the current transaction, and that might mean going through all the long version chains for each of the keys.

This is all theory, so let’s simulate such a workload and see how bad things can really get in practice.

I created a 1 billion row “sysbench” table, which takes some 270GB of space, and I will use a small buffer pool: 6GB. I will run sysbench with 64 threads and a pareto distribution (hot rows), while concurrently running a full table scan query: select avg(k) from sbtest1. Here is the exact sysbench run, done after the prepare step:

sysbench --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=1000000000 --mysql-user root --mysql-password=password  --test /usr/share/doc/sysbench/tests/db/oltp.lua run

Here is the EXPLAIN for the “reporting” query, which you would expect to be a rather efficient index scan. With just 4 bytes per value, 1 billion values would be just 4GB (more in practice because of InnoDB overhead): not a big deal for modern systems:

mysql> explain select avg(k) from sbtest1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 953860873
        Extra: Using index
1 row in set (0.00 sec)

Two days have passed and the “reporting” query is still running… furthermore, the foreground workload has started to look absolutely bizarre:

[207850s] threads: 64, tps: 0.20, reads: 7.40, writes: 0.80, response time: 222481.28ms (95%), errors: 0.70, reconnects:  0.00
[207860s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207870s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207880s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207890s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207900s] threads: 64, tps: 2.70, reads: 47.60, writes: 11.60, response time: 268815.49ms (95%), errors: 0.00, reconnects:  0.00
[207910s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207920s] threads: 64, tps: 2.30, reads: 31.60, writes: 9.50, response time: 294954.28ms (95%), errors: 0.00, reconnects:  0.00
[207930s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[207940s] threads: 64, tps: 2.90, reads: 42.00, writes: 12.20, response time: 309332.04ms (95%), errors: 0.00, reconnects:  0.00
[207950s] threads: 64, tps: 0.20, reads: 4.60, writes: 1.00, response time: 318922.41ms (95%), errors: 0.40, reconnects:  0.00
[207960s] threads: 64, tps: 0.20, reads: 1.90, writes: 0.50, response time: 335170.09ms (95%), errors: 0.00, reconnects:  0.00
[207970s] threads: 64, tps: 0.60, reads: 13.20, writes: 2.60, response time: 292842.88ms (95%), errors: 0.60, reconnects:  0.00
[207980s] threads: 64, tps: 2.60, reads: 37.60, writes: 10.20, response time: 351613.43ms (95%), errors: 0.00, reconnects:  0.00
[207990s] threads: 64, tps: 5.60, reads: 78.70, writes: 22.10, response time: 186407.21ms (95%), errors: 0.00, reconnects:  0.00
[208000s] threads: 64, tps: 8.10, reads: 120.20, writes: 32.60, response time: 99179.05ms (95%), errors: 0.00, reconnects:  0.00
[208010s] threads: 64, tps: 19.50, reads: 280.50, writes: 78.90, response time: 27559.69ms (95%), errors: 0.00, reconnects:  0.00
[208020s] threads: 64, tps: 50.70, reads: 691.28, writes: 200.70, response time: 5214.43ms (95%), errors: 0.00, reconnects:  0.00
[208030s] threads: 64, tps: 77.40, reads: 1099.72, writes: 311.31, response time: 2600.66ms (95%), errors: 0.00, reconnects:  0.00
[208040s] threads: 64, tps: 328.20, reads: 4595.40, writes: 1313.40, response time: 911.36ms (95%), errors: 0.00, reconnects:  0.00
[208050s] threads: 64, tps: 538.20, reads: 7531.90, writes: 2152.10, response time: 484.46ms (95%), errors: 0.00, reconnects:  0.00
[208060s] threads: 64, tps: 350.70, reads: 4913.45, writes: 1404.09, response time: 619.42ms (95%), errors: 0.00, reconnects:  0.00
[208070s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208080s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208090s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208100s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208110s] threads: 64, tps: 1.60, reads: 24.20, writes: 6.80, response time: 42385.40ms (95%), errors: 0.10, reconnects:  0.00
[208120s] threads: 64, tps: 0.80, reads: 28.20, writes: 3.40, response time: 51381.54ms (95%), errors: 2.80, reconnects:  0.00
[208130s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208140s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208150s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.00
[208160s] threads: 64, tps: 0.60, reads: 14.20, writes: 2.40, response time: 93248.04ms (95%), errors: 0.80, reconnects:  0.00

As you can see, we have long stretches of time when no queries complete at all, followed by spikes of higher performance. This is how it looks on a graph:

[Graph of sysbench throughput over time: http://www.percona.com/blog/wp-content/uploads/2014/12/img_548215986329c.png]

Corresponding CPU usage:

[Graph of corresponding CPU usage: http://www.percona.com/blog/wp-content/uploads/2014/12/img_548215de923ab.png]

This shows that we are not only observing something we would expect given InnoDB’s design, but there also seems to be some severe starvation happening in this case, which we can confirm:

Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790809552640 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790804735744 has waited at row0sel.cc line 3506 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790810756864 has waited at row0sel.cc line 4125 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790811158272 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode  wait exclusive

Waiting more than 3 minutes for a given buffer pool block to become available is a big issue: this lock should never be held for more than a few microseconds.

SHOW PROCESSLIST confirms that even the most basic selects by primary key can get stalled for a long time:

|  5499 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5500 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5501 | root          | localhost | sbtest             | Query   |    185 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5502 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5503 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 ORDER BY c                                |         0 |             0 |
|  5504 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5505 | root          | localhost | sbtest             | Query   |     14 | updating     | UPDATE sbtest1 SET k=k+1 WHERE id=1                                                                  |         0 |             0 |
|  5506 | root          | localhost | sbtest             | Query   |    236 | updating     | DELETE FROM sbtest1 WHERE id=1                                                                       |         0 |             0 |
|  5507 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5508 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99                                                    |         0 |             0 |
|  5509 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1                                                                     |         0 |             0 |
|  5510 | root          | localhost | sbtest             | Query   |     14 | updating     | UPDATE sbtest1 SET c='80873149502-45132831358-41942500598-17481512835-07042367094-39557981480-593123 |         0 |             0 |
|  5511 | root          | localhost | sbtest             | Query   |    236 | updating     | UPDATE sbtest1 SET k=k+1 WHERE id=18                                                                 |         0 |             1 |
|  5512 | root          | localhost | sbtest             | Query   |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=7                                                                     |         0 |             0 |
|  6009 | root          | localhost | sbtest             | Query   | 195527 | Sending data | select avg(k) from sbtest1                                                                           |         0 |             0 |

How do I know this is an UNDO space related issue in this case? Because UNDO_LOG pages end up taking the majority of the buffer pool:

mysql> select page_type,count(*) from INNODB_BUFFER_PAGE group by page_type;
+-------------------+----------+
| page_type         | count(*) |
+-------------------+----------+
| EXTENT_DESCRIPTOR |        1 |
| FILE_SPACE_HEADER |        1 |
| IBUF_BITMAP       |      559 |
| IBUF_INDEX        |      855 |
| INDEX             |     2186 |
| INODE             |        1 |
| SYSTEM            |      128 |
| UNDO_LOG          |   382969 |
| UNKNOWN           |     6508 |
+-------------------+----------+
9 rows in set (1.04 sec)
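A complementary signal (an extra diagnostic, not shown in the original output) is the purge lag: the "History list length" counts undo log records that the purge threads have not yet processed, and a steadily growing value means old row versions are piling up. Assuming MySQL 5.6 or later with the default counters enabled, it can also be read from INNODB_METRICS:

mysql> SELECT NAME, COUNT
    -> FROM information_schema.INNODB_METRICS
    -> WHERE NAME = 'trx_rseg_history_len';

The same number appears as "History list length" in the TRANSACTIONS section of SHOW ENGINE INNODB STATUS, which makes it easy to graph over time alongside the buffer pool page-type breakdown above.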

And it does so in a rather erratic way: when almost no work is being done, the UNDO_LOG portion of the buffer pool grows very rapidly, while when we're getting some work done the INDEX pages take a lot more space. My read is that as the index scan proceeds it touches some hot rows and some not-so-hot ones; the latter contain fewer row versions and so put less pressure on the undo space.

Take Away: Now you might argue that this workload and situation are rather artificial and narrow. They well might be. My main point is this: if you're looking at just part of your workload, such as your main short application queries, and not taking reporting or backups into account "because their performance is not important," you might be in for a big surprise. Those background activities might be taking much more than you would expect, and in addition they might have a much more severe impact on your main application workload, as in the case above.

P.S: I've done more experiments to validate how bad the problem really is, and I can repeat it rather easily even without putting the system into overdrive. Even if I run sysbench injecting just 25% of the transactions the system is capable of handling at peak, the "select avg(k) from sbtest1" query on a 1-billion-row table never completes, as it looks like new entries are injected into the index faster than InnoDB can examine which of them are visible.
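One quick way to see which long-running transaction is holding the old read view (an extra diagnostic I'm adding here, not part of the original experiment) is to list the active InnoDB transactions ordered by age; the oldest entry is the one blocking purge:

mysql> SELECT trx_id, trx_started, trx_mysql_thread_id,
    ->        LEFT(trx_query, 40) AS query
    -> FROM information_schema.INNODB_TRX
    -> ORDER BY trx_started;

Killing that thread (via its trx_mysql_thread_id) lets purge catch up, at the cost of abandoning the scan.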

The post InnoDB's multi-versioning handling can be Achilles' heel (http://www.percona.com/blog/2014/12/17/innodbs-multi-versioning-handling-can-be-achilles-heel/) appeared first on MySQL Performance Blog (http://www.percona.com/blog).

Dec
17
2014
--

Salesforce Will Integrate With External File Storage Providers, With Microsoft Up First

Salesforce, a SaaS CRM provider, announced a new product this morning that will connect external file storage solutions to its cloud-based products. In English, that means that if you store your digital goods on other services, but use them inside of Salesforce’s various tools, you’ll soon have a far simpler way to do so. The product, called ‘Files Connect,’ is… Read More

Dec
17
2014
--

RapidMiner Adds Streams To Bring Real Time To Its Big Data Processing

RapidMiner announced their new Streams service today to enable customers to capture streams of data and process them in real time. RapidMiner president Michele Chambers explained this could be particularly useful when it comes to capturing and processing Internet of Things or industrial sensor data and getting back answers in seconds. The new Streams product pulls the data from your source… Read More

Dec
16
2014
--

Aiming To Be A ‘B2B Google’ For Business Intel, Mattermark Raises $6.5M Led By Foundry Group

Mattermark, the business intelligence site that pivoted out of YC-backed curation site Referly, is today announcing that it has raised a round of $6.5 million, funding that the startup will use to build out its ambition to be a “B2B Google.” Read More