Dec
19
2014
--

Store UUID in an optimized way

A few years ago Peter Zaitsev, in a post titled “To UUID or not to UUID,” wrote: “There is timestamp based part in UUID which has similar properties to auto_increment and which could be used to have values generated at same point in time physically local in BTREE index.”

For this post I’ve rearranged the timestamp part of the UUID (Universally Unique Identifier) and run some benchmarks.

Many people store UUIDs as char(36) and use them as the row identity value (PRIMARY KEY), because a UUID is unique across every table, every database and every server, and it allows easy merging of records from different databases. But using it as a PRIMARY KEY causes the problems described below.

Problems with UUID

  • A UUID is 36 characters long, which makes it bulky.
  • InnoDB stores data in PRIMARY KEY order, and every secondary key also contains the PRIMARY KEY. So a UUID PRIMARY KEY makes every index bigger, and the indexes may no longer fit in memory.
  • Inserts happen at random positions in the index, so the data is scattered.

Despite these problems, people still prefer UUIDs because they are UNIQUE across every table and can be generated anywhere. In this blog post, I will explain how to store UUIDs in an efficient way by rearranging the timestamp part of the UUID.

Structure of UUID

MySQL uses UUID version 1, which is a 128-bit number represented as a utf8 string of five hexadecimal numbers separated by hyphens:

  • The first three numbers are generated from a timestamp.
  • The fourth number preserves temporal uniqueness in case the timestamp value loses monotonicity (for example, due to daylight saving time).
  • The fifth number is an IEEE 802 node number that provides spatial uniqueness. A random number is substituted if the latter is not available (for example, because the host computer has no Ethernet card, or it is not known how to find the hardware address of an interface on the host operating system). In this case, spatial uniqueness cannot be guaranteed; nevertheless, a collision has very low probability.
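
For example, calling UUID() returns a value like the one used throughout this post (your output will differ); the groups map to the parts described above:

SELECT UUID();
-- e.g. '58e0a7d7-eebc-11d8-9669-0800200c9a66'
--   58e0a7d7     = timestamp, low field
--   eebc         = timestamp, middle field
--   11d8         = version (1) plus timestamp, high field
--   9669         = clock sequence
--   0800200c9a66 = node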

The timestamp is mapped as follows:
When the timestamp has the (60-bit) hexadecimal value 1d8eebc58e0a7d7, the following parts of the UUID are set: 58e0a7d7-eebc-11d8-9669-0800200c9a66. The 1 before the most significant digits of the timestamp (in 11d8) indicates the UUID version; for time-based UUIDs this is 1.

The fourth and fifth parts are mostly constant when the UUIDs are generated on a single server. The first three numbers are based on the timestamp, so they will be monotonically increasing. Let’s rearrange the sequence so the UUID is closer to sequential; this makes inserts, and look-ups of recent data, faster. The dashes (‘-’) carry no information, so let’s remove them:
58e0a7d7-eebc-11d8-9669-0800200c9a66 => 11d8eebc58e0a7d796690800200c9a66

Benchmarking

I created three tables:

  • events_uuid – UUID binary(16) as PRIMARY KEY
  • events_int – an additional BIGINT auto-increment column as PRIMARY KEY, with a secondary index on the UUID column
  • events_uuid_ordered – rearranged UUID binary(16) as PRIMARY KEY

I created three stored procedures, each of which inserts 25K random rows at a time into its respective table. Three more stored procedures call the insert procedures in a loop and record the time taken to insert each 25K batch, as well as the data and index sizes after each loop. In total, I inserted 25M records into each table.
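
As a rough sketch of their general shape (not the exact procedures used for the benchmark, and filling only a few of the columns), the insert routine for the ordered-UUID table could look like this, using the ordered_uuid() function shown in the conclusions below:

DELIMITER //
CREATE PROCEDURE insert_uuid_ordered_batch(IN batch_size INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < batch_size DO
        INSERT INTO events_uuid_ordered (id, unit_id, event, actioned_at)
        VALUES (ordered_uuid(uuid()), ordered_uuid(uuid()),
                FLOOR(RAND() * 10), NOW());
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;

-- one 25K batch:
CALL insert_uuid_ordered_batch(25000);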

    • Data Size (chart): horizontal axis – number of inserts x 25,000; vertical axis – data size in MB. The data size for the UUID table is larger than for the other two tables.
    • Index Size (chart): horizontal axis – number of inserts x 25,000; vertical axis – index size in MB.
    • Total Size (chart): horizontal axis – number of inserts x 25,000; vertical axis – total size in MB.
    • Time Taken (chart): horizontal axis – number of inserts x 25,000; vertical axis – time taken in seconds.

For the table with the UUID as PRIMARY KEY, you can see that as the table grows, the time taken to insert rows increases almost linearly, whereas for the other two tables it stays almost constant.

The UUID table is almost 50% bigger than the ordered-UUID table and 30% bigger than the table with a BIGINT PRIMARY KEY. Comparing the ordered-UUID table and the BIGINT table, the insert times and sizes are almost the same, though they may vary slightly depending on the index structure.

root@localhost:~# ls -lhtr /media/data/test/ | grep ibd
-rw-rw---- 1 mysql mysql  13G Jul 24 15:53 events_uuid_ordered.ibd
-rw-rw---- 1 mysql mysql  20G Jul 25 02:27 events_uuid.ibd
-rw-rw---- 1 mysql mysql  15G Jul 25 07:59 events_int.ibd

Table Structure

#1 events_int
CREATE TABLE `events_int` ( 
`count` bigint(20) NOT NULL AUTO_INCREMENT, 
`id` binary(16) NOT NULL, 
`unit_id` binary(16) DEFAULT NULL, 
`event` int(11) DEFAULT NULL, 
`ref_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, 
`campaign_id` binary(16) DEFAULT '', 
`unique_id` binary(16) DEFAULT NULL, 
`user_agent` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL, 
`city` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL, 
`country` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL, 
`demand_partner_id` binary(16) DEFAULT NULL, 
`publisher_id` binary(16) DEFAULT NULL, 
`site_id` binary(16) DEFAULT NULL, 
`page_id` binary(16) DEFAULT NULL, 
`action_at` datetime DEFAULT NULL, 
`impression` smallint(6) DEFAULT NULL, 
`click` smallint(6) DEFAULT NULL, 
`sold_impression` smallint(6) DEFAULT NULL, 
`price` decimal(15,7) DEFAULT '0.0000000', 
`actioned_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', 
`unique_ads` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, 
`notification_url` text COLLATE utf8_unicode_ci, 
PRIMARY KEY (`count`), 
KEY `id` (`id`), 
KEY `index_events_on_actioned_at` (`actioned_at`), 
KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
#2 events_uuid
CREATE TABLE `events_uuid` ( 
`id` binary(16) NOT NULL, 
`unit_id` binary(16) DEFAULT NULL,
~
~
PRIMARY KEY (`id`), 
KEY `index_events_on_actioned_at` (`actioned_at`), 
KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
#3 events_uuid_ordered
CREATE TABLE `events_uuid_ordered` (  
`id` binary(16) NOT NULL,  
`unit_id` binary(16) DEFAULT NULL,
~
~
PRIMARY KEY (`id`),  
KEY `index_events_on_actioned_at` (`actioned_at`),  
KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Conclusions

    • Create a function to rearrange the UUID fields, and use it for inserts and selects:
DELIMITER //
CREATE DEFINER=`root`@`localhost` FUNCTION `ordered_uuid`(uuid BINARY(36))
RETURNS binary(16) DETERMINISTIC
RETURN UNHEX(CONCAT(SUBSTR(uuid, 15, 4),  -- time, high field and version
                    SUBSTR(uuid, 10, 4),  -- time, middle field
                    SUBSTR(uuid, 1, 8),   -- time, low field
                    SUBSTR(uuid, 20, 4),  -- clock sequence
                    SUBSTR(uuid, 25)));   -- node
//
DELIMITER ;
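
A quick sanity check with the example UUID from earlier shows the expected rearrangement:

SELECT HEX(ordered_uuid('58e0a7d7-eebc-11d8-9669-0800200c9a66'));
-- => 11D8EEBC58E0A7D796690800200C9A66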

Inserts

INSERT INTO events_uuid_ordered VALUES (ordered_uuid(uuid()),'1','M',....);

Selects

SELECT HEX(id), is_active, ... FROM events_uuid_ordered;
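
If you ever need a stored value back in its standard 36-character form, the rearrangement can be reversed in SQL (a sketch based on the column layout above):

SELECT LOWER(CONCAT_WS('-',
    SUBSTR(HEX(id), 9, 8),   -- time, low field
    SUBSTR(HEX(id), 5, 4),   -- time, middle field
    SUBSTR(HEX(id), 1, 4),   -- version and time, high field
    SUBSTR(HEX(id), 17, 4),  -- clock sequence
    SUBSTR(HEX(id), 21)      -- node
)) AS uuid FROM events_uuid_ordered;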

    • Define UUID columns as binary(16), since the binary type does not carry any character set

 


Sep
09
2014
--

Multi-Valued INSERTs, AUTO_INCREMENT & Percona XtraDB Cluster

A common migration path from standalone MySQL/Percona Server to a Percona XtraDB Cluster (PXC) environment involves some measure of time where one node in the new cluster has been configured as a slave of the production master that the cluster is slated to replace. In this way, the new cluster acts as a slave of the production environment – traditional replication takes care of getting the data into the cluster, and then Galera replication handles the intra-cluster traffic. This often works without issue, although there is one case that I’ve encountered recently where special care must be taken to properly configure the stream to ensure that replication does not break. If you use multi-valued inserts with auto-increment columns, then this post is for you.

For purposes of our discussion, assume that we have a basic 3-node PXC cluster that we’ve set up using the PXC Reference Architecture document, and that we’re replicating from an asynchronous master (call it “server A”) into one of the PXC nodes. Without loss of generality, we’ll pick PXC03. Also, for purposes of our discussion, we’ll be working with the following table definition:

serverA> show create table auto_inc_test;
CREATE TABLE `auto_inc_test` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `stuff` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
serverA> SELECT * FROM auto_inc_test;
Empty set (0.00 sec)

If we insert rows into this table one at a time, we have no issues.

serverA> INSERT INTO auto_inc_test(stuff) VALUES ('first row');
serverA> INSERT INTO auto_inc_test(stuff) VALUES ('second row');
serverA> INSERT INTO auto_inc_test(stuff) VALUES ('third row');
serverA> SELECT * FROM auto_inc_test;
+---+------------+
| i | stuff      |
+---+------------+
| 1 | first row  |
| 2 | second row |
| 3 | third row  |
+---+------------+
PXC03> SELECT * FROM auto_inc_test;
+---+------------+
| i | stuff      |
+---+------------+
| 1 | first row  |
| 2 | second row |
| 3 | third row  |
+---+------------+

But if we start doing multi-valued inserts, we can run into a problem.

serverA> INSERT INTO auto_inc_test(stuff) VALUES('first row'),('second row'),('third row');
serverA> INSERT INTO auto_inc_test(stuff) VALUES('fourth row'),('fifth row');
serverA> SELECT * FROM auto_inc_test;
+---+------------+
| i | stuff      |
+---+------------+
| 1 | first row  |
| 2 | second row |
| 3 | third row  |
| 4 | fourth row |
| 5 | fifth row  |
+---+------------+
PXC03> SELECT * FROM auto_inc_test;
+---+------------+
| i | stuff      |
+---+------------+
| 1 | first row  |
| 2 | second row |
| 5 | third row  |
+---+------------+
PXC03> SHOW SLAVE STATUS;
... output elided ...
Last_SQL_Errno: 1062
Last_SQL_Error: Error 'Duplicate entry '5' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO auto_inc_test (stuff) VALUES ('fourth row'),('fifth row')'
... output elided ...

Uh oh. Replication is broken and our data is now inconsistent. So why does this happen and how can we prevent it?

binlog_format

The astute observer will note that I have not yet said anything about the binary log format on the master. If the binary log format on the master is already set to ROW, then the above error will not occur. RBR will properly replicate multi-valued INSERTs to the PXC cluster without issue, and the data will be consistent. Problem solved. However, there may be reasons that the master is not using or cannot use RBR, such as disk space or IOPS limitations, and thus it’s running in MIXED or STATEMENT mode. In that case, we need to look elsewhere…
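
If switching is an option, the format can be checked and changed on the master without a restart (a sketch; note that sessions opened before the change keep their old format, and the setting should also go in my.cnf to survive a restart):

SHOW GLOBAL VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';
-- and in /etc/my.cnf, under [mysqld]:
--   binlog_format = ROW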

wsrep_auto_increment_control

When set to ON (the default), this variable has the effect of automatically specifying values for auto_increment_increment and auto_increment_offset based on the cluster size. The idea behind it is to help prevent auto-increment value conflicts when writing to multiple nodes. However, what it also means is that in a multi-node cluster, the auto-increment values generated by any given node will never be consecutive and the “next” auto-increment value on the slave cluster node will always be higher than what the master believes it should be. For example:

serverA> INSERT INTO auto_inc_test (stuff) VALUES ('first row'),('second row'),('third row');
serverA> SHOW CREATE TABLE auto_inc_test;
CREATE TABLE `auto_inc_test` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `stuff` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
PXC03> SHOW CREATE TABLE auto_inc_test;
CREATE TABLE `auto_inc_test` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `stuff` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1

Hmm, that’s not good. This is the auto-increment control at work: in a three-node cluster it sets auto_increment_increment=3 on each node (with a node-specific auto_increment_offset), and a statement-based multi-row INSERT pins only its first auto-increment value through the binary log. The remaining rows are generated with the node’s local increment and offset, which is how PXC03 ended up with rows 1, 2 and 5 and an AUTO_INCREMENT counter of 6 instead of 4.

The above scenario can be avoided by setting wsrep_auto_increment_control to OFF on the node that’s acting as a slave while the cluster is still receiving asynchronous replication traffic. This can be configured in /etc/my.cnf or set dynamically from the MySQL command line with:

SET GLOBAL wsrep_auto_increment_control='OFF';
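
The equivalent persistent setting in /etc/my.cnf looks like this:

[mysqld]
wsrep_auto_increment_control = OFF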

In the testing that I’ve done, this appears to be sufficient to deal with the problem, even if the original master has non-standard values specified for the auto_increment_* variables (e.g., because it’s part of a master-master pair where the odd ID numbers are generated by one server and the even ID numbers from the other one).

In fact, if the cluster is always going to be used in single-writer mode, there’s a compelling argument to be made for setting this variable to OFF on all of the cluster nodes even when going into production: it will prevent the cluster from burning through N auto-increment IDs every time a single row is inserted.

Fixing it when it’s broken

There’s no secret magic here. Percona Toolkit’s pt-table-checksum and pt-table-sync can be used to check and repair the data divergence between the master and the slave cluster node, just as if PXC were not involved, although you may find that it’s just faster to rebuild the cluster, depending upon how many diffs are found. Sometimes this issue happens right away when the master-to-cluster replication is started, and fixing it involves just one or two rows; other times I have seen it not occur for days with lots of rows out of sync once it finally breaks.
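
For example (host names are placeholders, and the full option set is documented with each tool), the check-then-repair flow looks roughly like:

pt-table-checksum --replicate=percona.checksums h=serverA
pt-table-sync --replicate=percona.checksums h=serverA --print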

The tl;dr version

In sum, if you’re doing traditional replication into a PXC cluster (or any other flavor of MySQL/MariaDB + Galera), you may have issues with multi-valued INSERT statements, and this can be prevented with a configuration change on either side of the replication stream:

  • On the master, set binlog_format=ROW
  • On the PXC slave node, set wsrep_auto_increment_control=OFF


Oct
26
2010
--

Sharing an auto_increment value across multiple MySQL tables (revisited)

A couple of weeks ago I blogged about Sharing an auto_increment value across multiple MySQL tables. In the comments, a few people wrote in to suggest alternative ways of implementing this.  I just got around to benchmarking those alternatives today across two large EC2 machines:


(Measured in transactions/second – higher is better)

What is the conclusion? With the exception of my original option 2, they all perform fairly similarly. The Flickr and option 1 tests perform marginally better. Test “arjen2” is option 2 but with a MyISAM table; it suffers a little because EC2 latency can be a bit high and there is one additional round trip. Test “arjen2005” is not too dissimilar from the Flickr solution, but uses a MySQL stored function.
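
Since “arjen2005” is described only as a stored-function variant, here is a sketch (an assumption of its general shape, not the exact code tested) of a sequence implemented as a stored function over a one-row counter table:

CREATE TABLE seq_counter (id BIGINT NOT NULL) ENGINE=innodb;
INSERT INTO seq_counter VALUES (0);

DELIMITER //
CREATE FUNCTION next_seq() RETURNS BIGINT
MODIFIES SQL DATA
BEGIN
    -- LAST_INSERT_ID(expr) stores expr for this connection, so the new
    -- value can be returned without a second read of the table
    UPDATE seq_counter SET id = LAST_INSERT_ID(id + 1);
    RETURN LAST_INSERT_ID();
END //
DELIMITER ;

-- usage: INSERT INTO some_table VALUES (next_seq(), ...);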


Entry posted by Morgan Tocker

Oct
04
2010
--

Sharing an auto_increment value across multiple MySQL tables

The title is SEO bait – you can’t do it. We’ve seen a few recurring patterns trying to achieve something similar, and I thought I would share with you my favorite two:

Option #1: Use a table to insert into, and grab the insert_id:

CREATE TABLE option1 (id int not null primary key auto_increment) engine=innodb;

# each insert does one operation to get the value:
INSERT INTO option1 VALUES (NULL);
# $connection->insert_id();

Option #2: Use a table with just one row:

CREATE TABLE option2 (id int not null primary key) engine=innodb;
INSERT INTO option2 VALUES (1); # start from 1

# each insert does two operations to get the value:
UPDATE option2 SET id=@id:=id+1;
SELECT @id;

So which is better? I don’t think it’s easy to tell at first glance, since option 2 does look more elegant – but if the next value is fetched as part of a transaction, I can see the potential for many other transactions to back up waiting on a lock (more on that in a second).
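
To make the locking concern concrete, consider two sessions using option 2 inside transactions (an illustrative sketch):

-- session 1:
BEGIN;
UPDATE option2 SET id=@id:=id+1;  -- takes a row lock on the single counter row
-- ... other work in the same transaction ...

-- session 2, issued before session 1 commits:
BEGIN;
UPDATE option2 SET id=@id:=id+1;  -- blocks until session 1 commits or rolls back

-- session 1:
COMMIT;                           -- only now can session 2 proceed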

To start with a naive test, I booted two EC2 small instances in the same availability zone. Ping times are ~0.5ms between nodes, and --skip-name-resolve is enabled on the server. There is some skew from the machines being virtualized. My simulation (one transactional iteration is sketched in code after the lists below) is:

  1. Create a new sequence value
  2. Insert it into another table

The testing options are:

  • Option 1 – auto_commit each statement
  • Option 2 – auto_commit each statement
  • Option 1 – nest each loop in a transaction
  • Option 2 – nest each loop in a transaction
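
As promised above, one iteration of the “option 1, in a transaction” loop looks roughly like this (the destination table my_data is a hypothetical stand-in for the real one):

# hypothetical destination table:
CREATE TABLE my_data (id int not null primary key, padding varchar(20)) engine=innodb;

BEGIN;
INSERT INTO option1 VALUES (NULL);  -- generate the next sequence value
INSERT INTO my_data VALUES (LAST_INSERT_ID(), 'some row data');
COMMIT;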

The raw results are:

option1, use transactions:    19 seconds for 10,000 iterations
option1, ignore transactions: 13 seconds for 10,000 iterations
option2, use transactions:    27 seconds for 10,000 iterations
option2, ignore transactions: 22 seconds for 10,000 iterations

Alright – option 1 seems quicker. The problem is that, to mimic most applications, we can’t really tell until a little concurrency is applied. Using only the “transactional” test at a few more concurrency levels:


(results are transactions/second – more is better).

Option #1 starts to flatten out after 4 threads – and this is probably just because the machine I am running it against has 1 CPU. Option #2 stays flat the whole time, and while it is running, most of the threads are perpetually in a state of ‘Searching rows for update’ – which I suspect is better described as waiting on a lock.

Option #2 will likely scale better with auto_commit, since locks are held for the duration of a transaction, but this is not always possible if you have already started modifying data before you need the auto_increment number and you do not want to commit yet.

Entry posted by Morgan Tocker
