Q&A: High availability when using MySQL in the cloud

Last week I hosted a webinar on using MySQL in the cloud for High Availability (HA) alongside 451 Research analyst Jason Stamper. You can watch the recording and also download the slides (free) here. Just click the “Register” button at the end of that page.

We had many excellent questions and didn’t have time to answer all of them in the allotted time, so I’m posting the remaining ones here along with the answers. Feel free to ask follow-up questions in the comments below.

Q: Can the TokuDB engine be used in a PXC environment?

A: No, TokuDB cannot currently be used in a PXC environment. The only supported engine in Percona XtraDB Cluster 5.6 is InnoDB.

Q: With Galera replication (PXC), is balancing the load on each node?

A: No, you need to implement your own load balancing and HA layer between your clients and the Percona XtraDB Cluster server.  Examples mentioned in the webinar include HAProxy and F5 BigIP.

Q: What’s the best version of Percona XtraDB Cluster regarding InnoDB performance?

A: In general for best performance you should be using the latest release of Percona XtraDB Cluster 5.6, which is currently 5.6.24, released on June 3rd, 2015.

Q: Can I redirect my writes in Percona XtraDB Cluster to multiple nodes using HAProxy? While testing with SysBench I can see that writes only go to the first node in PXC, while reads do go to multiple nodes.

A: Yes, you can configure HAProxy to distribute both reads and writes across all of your nodes in a Percona XtraDB Cluster environment. Perhaps SysBench opened only one database connection for all writes, so HAProxy kept them confined to one host. You may want to experiment with parallel_prepare.lua.

Q: What’s the optimal HA for small datasets (db is less than 10gb)?

A: The optimal HA deployment for small datasets would be dependent on your level of recovery required (tolerance for loss of transactions) and time that you can be in an unavailable state (seconds, minutes, hours?).  Unfortunately there isn’t a single answer to your question, however, if you are interested in further discussion on this point Percona would be happy to coordinate a time to speak.  Please feel free to contact me directly and we can continue the conversation at michael.coburn@percona.com.

Q: Is there a concept of local master vs. remote master with PXC?

A: No, there is no concept of local vs. remote master.  All nodes in a Percona XtraDB Cluster can now be classified as Master, regardless of their proximity to the clients.

Q: Are there any concerns when considering AWS RDS or AURORA DB for MySQL HA in the Cloud?

A: Regarding AWS RDS, yes, this is a good option for MySQL HA in the Cloud.  I unfortunately haven’t worked with Aurora DB that much yet, so I don’t have an opinion on its suitability for HA in the Cloud.

Q: We tried out PXC awhile back and it used to lock everything whenever any ddl was done. Has that changed?

A: We would have to look at the specifics of your environment, however, there have been numerous improvements in the 1½ years of development since Percona XtraDB Cluster went Generally Available (GA) on January 30th, 2014 in version 5.6.15.

Q: Is using the arbitrator a must?

A: No. The arbitrator role, provided by the garbd daemon, is generally only used when you operate a minimal environment of two nodes that contain the data and need a third node for quorum but don’t want to store the data a third time.
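
To make the quorum argument concrete, here is a minimal sketch of the majority rule. It assumes every cluster member (data node or garbd) carries one equal vote and that the node is lost abruptly (crash or network partition); the function name is hypothetical, and Galera's real quorum calculation has more inputs than this.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """A partition keeps quorum only with a strict majority of the votes."""
    return 2 * reachable_votes > total_votes


# Two data nodes, no arbitrator: losing one node leaves 1 of 2 votes.
two_nodes = has_quorum(reachable_votes=1, total_votes=2)

# Two data nodes plus garbd: losing one data node leaves 2 of 3 votes.
with_garbd = has_quorum(reachable_votes=2, total_votes=3)

print(two_nodes, with_garbd)  # False True
```

Under these assumptions the surviving node of a two-node cluster cannot tell a dead peer from a network split, which is why a third vote is needed even if it stores no data.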

Q: Can we do a cluster across different zones?

A: Yes, you can. However, be aware that the latency incurred for all cluster certification operations will be impacted by the round trip time between nodes.

Q: Does PXC also support the MyISAM database?

A: No, Percona XtraDB Cluster does not support any storage engine other than InnoDB as of PXC 5.6.

Q: How do load balancers affect the throughput in a Galera-based setup given that the write would be limited by the slowest node?

A: Load balancers will introduce some measure of additional latency in the form of CPU time in the load balancer layer as it evaluates its own ruleset, and also in network time due to additional hop via load balancer.  Otherwise there should be no perceptible difference in the write throughput of a Percona XtraDB Cluster with and without a load balancer as it relates to the “slowest node” factor.

Q: Have you used MaxScale yet? If so, what are your thoughts?

A: Unfortunately I haven’t used MaxScale. However, Yves Trudeau, Percona Principal Architect, has recently written about MaxScale in this blog post.

Q: How do you configure timeout and maintain persistent connection to HAProxy?

A: I would encourage you to refer to the HAProxy Documentation.

The post Q&A: High availability when using MySQL in the cloud appeared first on MySQL Performance Blog.


High availability using MySQL in the cloud

Next Wednesday (June 10) I’ll be co-presenting a webinar on using MySQL in the cloud for High Availability (HA). Joining me will be 451 Research analyst Jason Stamper and together we’ll talk about the realities of HA using MySQL in the cloud and how vendors are responding to changing application requirements with new developments that can enhance your deployment.

We’ll also present a comparison of available solutions along with key best practices you can follow for successfully attaining HA in the cloud with MySQL. The webinar is scheduled for June 10 at 10 a.m. Pacific. Register here.

Together we’ll cover:

  • What do HA MySQL deployments in the cloud look like today?
  • What are the developing requirements of applications based on future growth and scalability needs?
  • How are key vendors responding to these needs with new features and solution offerings, including those from OpenStack, Amazon, and others?
  • A high level, more technical comparison of the solutions
  • Keys to a successful HA MySQL deployment, including scaling from a single-node application to a cluster of MySQL instances

At the end of this webinar, you will have a good understanding of the options available for High Availability using MySQL in the cloud and how your current HA MySQL deployment in the cloud compares. You’ll also learn the tradeoffs you face depending on your HA solution and be able to identify which vendors and technologies are best suited for your needs.

This webinar, as usual, is free. Register now to reserve your spot and I hope to see you next Wednesday!



Illustrating Primary Key models in InnoDB and their impact on disk usage

On a recent engagement I worked with a customer who makes extensive use of UUID() values for their Primary Key and stores it as char(36), and their row count on this example table has grown to over 1 billion rows.

The table is INSERT-only (no UPDATEs or DELETEs), and the bulk of their retrievals are PK lookups. Lookups by PK were performing acceptably, but they were concerned about the table’s space usage as we were approaching 1TB (running with innodb_file_per_table=1 and Percona Server 5.5).

This schema model presents an increasing burden for backups since they use Percona XtraBackup, and so the question was asked: does their choice of an effectively random Primary Key based on UUID() impact their on-disk storage, and to what extent? And as a neat trick I show towards the end of this post how you can calculate the rate of fragmentation in your table on a regular basis if you’re so inclined. So read on!

For background, the more common approach for a Primary Key in InnoDB is one that uses an integer AUTO_INCREMENT value. One of the benefits of a PK AUTO_INCREMENT is that it allows InnoDB to add new entries at the end of the table, and keeps existing pages in the BTREE index from having to be split.  More on this splitting in a moment.  Note also that this blog post isn’t intended to promote one type of model over another; my goal is really to illustrate the impact your choice of PK will have on the data on disk.

A Primary Key serves multiple purposes in InnoDB:

  • Ensures uniqueness between rows
  • InnoDB saves row data on disk clustered by the Primary Key
  • Depending on the type used and INSERT / UPDATE pattern used, either provides for an unfragmented or severely fragmented Primary Key

I wanted to profile three different Primary Key types:

  1. integer AUTO_INCREMENT – this key will consume 4 bytes
  2. binary(16) using Ordered UUID() – as per Karthik’s UUID()-optimised blog post
  3. char(36) using UUID() – what my customer was using
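
For readers unfamiliar with the second option, here is a rough Python sketch of the reordering idea as I understand it from that post: move the timestamp fields of a version-1 UUID to the front, most significant first, so that later UUIDs sort after earlier ones. The Python function ordered_uuid() is a stand-in written for illustration and is not the stored function used in the test scripts.

```python
import uuid


def ordered_uuid(u: uuid.UUID) -> bytes:
    """Return 16 bytes with the version-1 timestamp fields moved to the front."""
    h = u.hex  # time_low(8) time_mid(4) time_hi_and_version(4) clock_seq(4) node(12)
    reordered = h[12:16] + h[8:12] + h[0:8] + h[16:]
    return bytes.fromhex(reordered)


# Example value in the textual format that MySQL's UUID() produces.
example = uuid.UUID("6ccd780c-baba-1026-9564-5b8c656024db")
print(ordered_uuid(example).hex())  # 1026baba6ccd780c95645b8c656024db
```

The result fits in binary(16), which is less than half the width of the char(36) text form.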

I then used the powerful tool innodb_space’s function space-lsn-age-illustrate  (from Jeremy Cole’s innodb_ruby project) to plot the LSN (InnoDB’s Log Sequence Number, an always-incrementing value) pages from each table that uses the different Primary Keys via ASCII colour (so hot, right?  Thanks Jeremy!!).  For reference, the legend indicates that the darker the colour, the “older” the page’s updated version is (the LSN), while as you move across the colour spectrum to pink you’re looking at the most recently written LSN values.  What I’m trying to illustrate is that when you use AUTO_INCREMENT or UUID() that has been modified to insert in an ascending order, you get virtually no page splits, and thus consume the minimal amount of database pages.  On the left side you’re looking at the page IDs for this table, and the lower the number of pages consumed, the more efficiently packed the table’s data is within those pages.

This is an example of INSERT-only based on a Primary Key of AUTO_INCREMENT.  Notice how the darker colours are heavy at the earliest pages and lighter as we get to writing out the higher number pages.  Further this table finishes writing somewhere around 700 pages consumed.


Primary Key integer AUTO_INCREMENT

As we look at the optimised-UUID() INSERT pattern we see  that it too has a very evenly distributed pattern with oldest pages at the start (lowest page IDs) and newest written pages at the end of the table.  More pages are consumed however because the Primary Key is wider, so we end somewhere around 1,100 pages consumed.


Ordered UUID()-based Primary Key

Finally we arrive at the UUID() INSERT pattern, and as we expected, the fragmentation is extreme and has caused many page splits — this is the behaviour in InnoDB when a record needs to be written into an existing page (since it falls between two existing values) and InnoDB realises that if this additional value is written that the capacity of the page will be overcommitted, so it then “splits” the page into two pages and writes them both out.  The rash of pink in the image below shows us that UUID() causes significant fragmentation because it is causing pages to be split all throughout the table.  This is deemed “expensive” since the ibd file now is more than 2x greater than the UUID()-optimised method, and about 3x greater than a Primary Key with AUTO_INCREMENT.


UUID() Primary Key
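
To show why insert order matters, here is a toy model of leaf pages. It is a sketch under loud assumptions, not InnoDB's actual algorithm: each page holds a hypothetical 100 keys, an overflow at the very end of the index starts a fresh page, and any other overflow splits the page in half.

```python
import bisect
import random


def fill_factor(keys, capacity=100):
    """Insert keys in the given order and return the average page fill (0..1)."""
    pages = [[]]              # each page is a sorted list of keys
    lows = [float("-inf")]    # smallest key each page is responsible for
    for key in keys:
        idx = bisect.bisect_right(lows, key) - 1
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) > capacity:
            if idx == len(pages) - 1 and page[-1] == key:
                # Append pattern: leave the full page alone, start a new one.
                new_page = [page.pop()]
            else:
                # Insert in the middle: split the page into two half-full pages.
                mid = len(page) // 2
                new_page = page[mid:]
                del page[mid:]
            pages.insert(idx + 1, new_page)
            lows.insert(idx + 1, new_page[0])
    return sum(len(p) for p in pages) / (len(pages) * capacity)


rng = random.Random(42)
sequential_fill = fill_factor(range(20_000))
random_fill = fill_factor(rng.sample(range(1_000_000_000), 20_000))
print(f"sequential: {sequential_fill:.0%}, random: {random_fill:.0%}")
```

In this model the ascending keys leave every page full, while the random keys leave pages noticeably emptier because every split creates two half-empty pages that fill at their own pace.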

Based on this investigation we determined that the true size of the 1 billion row table was about half the size as reported by Linux when examining the .ibd file.  We happened to have an opportunity to dump and load the table (mysqldump | mysql) and found that on restore the table consumed 450GB of disk — so our estimate was pretty good!

I also wanted to highlight that you can determine for yourself the statistics for data / pages split. As you can see below, the first two PK distributions are very tight, with pages packed up to 90%, however the UUID model leaves you with just slightly higher than 50%.  You can run this against your prepared backups if you use Percona XtraBackup since at least version 2.1 by using the --stats option.

[root@mysql]# xtrabackup --stats --datadir=/data/backups/mysql --target-dir=/data/backups/mysql | grep -A5 test | grep -A5 PRIMARY
  table: test/t1, index: PRIMARY, space id: 13, root page: 3, zip size: 0
  estimated statistics in dictionary:
    key vals: 8, leaf pages: 99, size pages: 161
  real statistics:
     level 1 pages: pages=1, data=1287 bytes, data/pages=7%
        leaf pages: recs=60881, pages=99, data=1461144 bytes, data/pages=90%
  table: test/t2_uuid_ordered, index: PRIMARY, space id: 14, root page: 3, zip size: 0
  estimated statistics in dictionary:
    key vals: 8, leaf pages: 147, size pages: 161
  real statistics:
     level 1 pages: pages=1, data=3675 bytes, data/pages=22%
        leaf pages: recs=60882, pages=147, data=2191752 bytes, data/pages=91%
  table: test/t3_uuid, index: PRIMARY, space id: 15, root page: 3, zip size: 0
  estimated statistics in dictionary:
    key vals: 8, leaf pages: 399, size pages: 483
  real statistics:
     level 2 pages: pages=1, data=92 bytes, data/pages=0%
     level 1 pages: pages=2, data=18354 bytes, data/pages=56%
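
The data/pages percentages can be reproduced by hand from the byte and page counts. This sketch assumes the default 16KB InnoDB page size (the tables here are uncompressed) and truncates to a whole percent, which appears to match the figures shown; the function name is my own.

```python
INNODB_PAGE_SIZE = 16 * 1024  # assumption: default 16KB pages


def fill_percent(data_bytes: int, pages: int, page_size: int = INNODB_PAGE_SIZE) -> int:
    """Share of page space occupied by record data, truncated to a whole percent."""
    return data_bytes * 100 // (pages * page_size)


# (data bytes, leaf pages) taken from the leaf pages lines in the output above
leaf_stats = {
    "t1": (1461144, 99),
    "t2_uuid_ordered": (2191752, 147),
}
for table, (data_bytes, pages) in leaf_stats.items():
    print(table, fill_percent(data_bytes, pages))
# t1 90
# t2_uuid_ordered 91
```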

Below are the table definitions along with the scripts I used to generate the data for this post.

mysql> show create table t1\G
 *************************** 1. row ***************************
 Table: t1
 Create Table: CREATE TABLE `t1` (
 `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `c2` char(1) NOT NULL DEFAULT 'a',
 PRIMARY KEY (`c1`),
 KEY `c2` (`c2`)
 ) ENGINE=InnoDB
 1 row in set (0.00 sec)
mysql> show create table t2_uuid_ordered\G
 *************************** 1. row ***************************
 Table: t2_uuid_ordered
 Create Table: CREATE TABLE `t2_uuid_ordered` (
 `pk` binary(16) NOT NULL,
 `c2` char(1) NOT NULL DEFAULT 'a',
 PRIMARY KEY (`pk`),
 KEY `c2` (`c2`)
 ) ENGINE=InnoDB
 1 row in set (0.00 sec)
mysql> show create table t3_uuid\G
 *************************** 1. row ***************************
 Table: t3_uuid
 Create Table: CREATE TABLE `t3_uuid` (
 `pk` char(36) NOT NULL,
 `c2` char(1) NOT NULL DEFAULT 'a',
 PRIMARY KEY (`pk`),
 KEY `c2` (`c2`)
 ) ENGINE=InnoDB
 1 row in set (0.00 sec)

[root@mysql]# cat make_rows.sh
while [ 1 ] ; do
 mysql -D test -e "insert into t1 (c2) values ('d')" ;
 mysql -D test -e "insert into t2_uuid_ordered (pk, c2) values (ordered_uuid(uuid()), 'a')" ;
 mysql -D test -e "insert into t3_uuid (pk, c2) values (uuid(), 'a')" ;
done

[root@mysql]# cat space_lsn_age_illustrate.sh
cd /var/lib/mysql
 innodb_space -f test/t1.ibd space-lsn-age-illustrate
 echo "T2 - UUID() ORDERED"
 innodb_space -f test/t2_uuid_ordered.ibd space-lsn-age-illustrate
 echo "T3 - UUID()"
 innodb_space -f test/t3_uuid.ibd space-lsn-age-illustrate

I hope that this post helps you to better understand the impact of random vs ordered Primary Key selection! Please share with me your thoughts on this post in the comments, thanks for reading!

Note to those attentive readers seeking more information: I plan to write a follow-up post that deals with these same models but from a performance perspective. In this post I tried to be as specific as possible with regards to the disk consumption and fragmentation concerns. I feel it addresses the first part and alludes to this mysterious “fragmentation” beast, but only teases at what that could mean for query response time… Just sit tight, I’m hopeful to get a tag-along to this one post-PLMCE!

By the way, come see me speak at the Percona Live MySQL Conference and Expo in Santa Clara, CA the week of April 13th – I’ll be delivering 5 talks and moderating one Keynote Panel. I hope to see you there!  If you are at PLMCE, attend one of my talks or stop me in the hallway and say “Hi Michael, I read your post, now where’s my beer?” – and I’ll buy you a cold one of your choice :)



Using Percona Cloud Tools to solve real-world MySQL problems

For months when speaking with customers I have been positioning Percona Cloud Tools (PCT) as a valuable tool for the DBA/Developer/SysAdmin but only recently have I truly been able to harness the data and make a technical recommendation to a customer that I feel would have been very difficult to accomplish otherwise.

Let me provide some background: I was tasked with performing a Performance Audit for one of our customers (Performance Audits are extremely popular as they allow you to have a MySQL Expert confirm or reveal challenges within your MySQL environment and make your database run faster!) and as part of our conversation we discussed and agreed to install Percona Cloud Tools. We let the site run for a few days, and then I started my audit. What I noticed was that at regular intervals there was often a CPU spike, along with a corresponding drop in Queries Per Second (QPS), but that lasted only for a few seconds. We decided that further investigation was warranted as the customer was concerned the spikes impacted their users’ experience with the application.

Here are the tasks that Percona Cloud Tools made easy while I worked to identify the source of the CPU spike and QPS drop:

  1. Per-second granularity data capture of PCT allowed me to identify how significant the spike and QPS actually were – if I was looking at the 1 minute or higher average values (such as Cacti would provide) I probably wouldn’t have been able to detect the spike or stall as clearly in the first place, it would have been lost in the average. In the case of PCT the current graphs group at the 1 minute range but you have the ability to view the min and max values during this 1 minute range since they are the true highest and lowest observed 1s intervals during the 1 minute group.
  2. Ability for all graphs to maintain the same resolution time allowed me to zero-in on the problematic time period and then quickly look across all graphs for corresponding deflections. This analysis led me to discover a significant spike in InnoDB disk reads.
  3. Ability to use the Query Analytics functionality to zoom in again on the problematic query. Adjusting Query Analytics to an appropriate time period narrowed down the range of unique queries that could be considered the cause. This task in my opinion is the best part of using PCT.
  4. Query Analytics allowed me to view the Rows Examined in Total for each query based on just this shortened interval. I then tagged those that had higher than 10k Rows Examined (arbitrary but most queries for this customer seemed to fall below this) so that I could then review in real-time with the customer before making a decision on what to do next. We can only view this sort of information by leveraging the slow query log – this data is not available via Performance_Schema or via network sniffing.
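
The first point in the list above is easy to demonstrate with made-up numbers. In this sketch the per-second QPS values are hypothetical: one minute of steady traffic with a three-second stall.

```python
from statistics import mean

# Hypothetical per-second QPS samples for one minute: steady traffic with a 3-second stall.
qps = [1000] * 57 + [50] * 3

print(mean(qps))            # 952.5
print(min(qps), max(qps))   # 50 1000
```

The one-minute average drops by less than 5%, which is easy to dismiss as noise, while the per-minute minimum exposes the stall immediately.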

Once we were able to identify the problematic queries then the rest was routine query optimization – 10 minutes work using Percona Cloud Tools for what might have been an hour using traditional methods!

For those of you wondering how else this can be done: assuming you detected the CPU spike / QPS drop (perhaps you are using Graphite or another tool that can deliver per-second resolution), you’d also need to be capturing the slow query log at a good enough resolution (I prefer long_query_time=0 to just get it all), and then be adept at using pt-query-digest with the --since and --until options to narrow down your range of queries.  The significant drawback to this approach is that each time you want to tweak your time range you probably need to stream through a fairly large slow log file again, which is both CPU and disk intensive and can take some time (probably minutes, maybe hours) depending on the size of your log file.  Certainly a workable approach, but nowhere near as quick as reloading a page in your browser :)
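
If you script the manual approach, it helps to keep the time window in one place so that tweaking it is a one-line change. This is only a sketch: the log path and timestamps are hypothetical, and the helper simply assembles the pt-query-digest arguments rather than running anything.

```python
import shlex


def digest_command(slow_log: str, since: str, until: str) -> list:
    """Build a pt-query-digest invocation limited to one time window."""
    return ["pt-query-digest", "--since", since, "--until", until, slow_log]


# Hypothetical log path and one-minute window around the observed spike.
cmd = digest_command(
    "/var/lib/mysql/slow.log",
    since="2014-08-04 10:15:00",
    until="2014-08-04 10:16:00",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) on a host where pt-query-digest is installed.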

So what are you waiting for? Start using Percona Cloud Tools today, it’s free! Register for the free beta here.



Using sysbench 0.5 for performing MySQL benchmarks

Given the recent excitement & interest around OpenStack I wanted to make sure I was ready to conduct appropriate evaluations of system performance.  I generally turn to sysbench since it comes with a variety of different tests (accessed via the --test= option), including:

  • fileio – File I/O test
  • cpu – CPU performance test
  • memory – Memory functions speed test
  • threads – Threads subsystem performance test
  • mutex – Mutex performance test

As you can see, sysbench lets you stress many of the fundamental components of your hardware and infrastructure, such as your disk subsystem, along with your CPUs and memory. An additional option exists that is designed to perform synthetic stress testing of MySQL, and I was surprised when I didn’t see it in the above list on version 0.5, as it used to show up as “oltp – OLTP test”. What happened to --test=oltp?

This list is from the latest release of sysbench, which is 0.5. You’re only going to be on this version if you build it yourself or if you use the package provided courtesy of Frederic Descamps (thanks lefred!).  If you’re using the version from EPEL, Ubuntu 14.04, or Debian 7 you’re still using version 0.4.12 (check with sysbench --version).  One thing you’ll notice is that the test type of OLTP doesn’t show up anymore.  What gives?  I was scratching my head until I asked on Percona IRC and found out that in 0.5 the standard OLTP test type was replaced with a different syntax: instead of passing parameters to sysbench, you reference scripts written in lua.  The advantage here is that you now have an interface for writing your own specific load tests (provided you know lua, but it isn’t hard).  For those of you looking to run the pre-canned load tests, they still exist, but you have to have them as part of the RPM install or otherwise copied to your system.

Fortunately if you use the package provided by lefred you’ll find these lua scripts here (this is using Amazon ami as of August 4th, 2014):

[root@pxc-control ~]# ls -l /usr/share/doc/sysbench/tests/db/
total 44
-rw-r--r-- 1 root root 3585 Sep 7 2012 common.lua
-rw-r--r-- 1 root root 340 Sep 7 2012 delete.lua
-rw-r--r-- 1 root root 830 Sep 7 2012 insert.lua
-rw-r--r-- 1 root root 2925 Sep 7 2012 oltp.lua
-rw-r--r-- 1 root root 342 Sep 7 2012 oltp_simple.lua
-rw-r--r-- 1 root root 425 Sep 7 2012 parallel_prepare.lua
-rw-r--r-- 1 root root 343 Sep 7 2012 select.lua
-rw-r--r-- 1 root root 3964 Sep 7 2012 select_random_points.lua
-rw-r--r-- 1 root root 4066 Sep 7 2012 select_random_ranges.lua
-rw-r--r-- 1 root root 343 Sep 7 2012 update_index.lua
-rw-r--r-- 1 root root 552 Sep 7 2012 update_non_index.lua

So the trick (if you want to call it that) is that instead of passing a single word to the --test directive, you pass the full path to the lua script.

This is the old way (sysbench 0.4.12 from EPEL repo):

--test=oltp --oltp-test-mode=complex

This is the new way (sysbench 0.5):

--test=/usr/share/doc/sysbench/tests/db/oltp.lua

Here is an example of a test I’m running through haproxy against a 3-node PXC cluster doing the INSERT-only test type so you can see the full syntax I pass to sysbench:

[root@pxc-control ~]# cat sys_haproxy.sh

And here’s what the insert.lua script looks like:

[root@pxc-control ~]# cat /usr/share/doc/sysbench/tests/db/insert.lua
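pathtest = string.match(test, "(.*/)") or ""
dofile(pathtest .. "common.lua")

function thread_init(thread_id)
   set_vars()
end

-- Each event inserts one row into a randomly chosen sbtest table
function event(thread_id)
   local table_name
   local i
   local c_val
   local k_val
   local pad_val
   table_name = "sbtest".. sb_rand_uniform(1, oltp_tables_count)
   -- With auto-increment ids, 0 lets MySQL assign the next value
   if (oltp_auto_inc) then
      i = 0
   else
      i = sb_rand_uniq(1, oltp_table_size)
   end
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])
   rs = db_query("INSERT INTO " .. table_name ..  " (id, k, c, pad) VALUES " .. string.format("(%d, %d, '%s', '%s')",i, k_val, c_val, pad_val))
end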

The thing that I like most about sysbench 0.5 (beyond the lua interface, of course!) is that it now comes with a --report-interval option (which I generally set to 1) so that you get output while the script is running. No more waiting until the end of the test to get feedback! Here’s a sample of sysbench 0.5 in action running the INSERT test through a local haproxy instance and writing to three nodes in a PXC cluster such as OpenStack Trove might do:

[root@pxc-control ~]# ./sys_haproxy.sh run
sysbench 0.5:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 3
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
Threads started!
[   1s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1099.28, response time: 9.86ms (95%)
[   2s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 973.02, response time: 10.77ms (95%)
[   3s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1181.01, response time: 6.23ms (95%)
[   4s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1103.00, response time: 6.77ms (95%)
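
Because the interval lines have a regular shape, they are easy to post-process. The following is a sketch that assumes the exact line layout shown in the sample above; the regular expression and the function name are my own and would need adjusting for other sysbench versions.

```python
import re

REPORT_LINE = re.compile(
    r"\[\s*(?P<second>\d+)s\] threads: (?P<threads>\d+), tps: (?P<tps>[\d.]+), "
    r"reads/s: (?P<reads>[\d.]+), writes/s: (?P<writes>[\d.]+), "
    r"response time: (?P<latency_ms>[\d.]+)ms \((?P<percentile>\d+)%\)"
)


def parse_report(text):
    """Return one dict of floats per interval line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = REPORT_LINE.search(line)
        if match:
            rows.append({key: float(value) for key, value in match.groupdict().items()})
    return rows


sample = """\
Threads started!
[   1s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1099.28, response time: 9.86ms (95%)
[   2s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 973.02, response time: 10.77ms (95%)
[   3s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1181.01, response time: 6.23ms (95%)
[   4s] threads: 3, tps: 0.00, reads/s: 0.00, writes/s: 1103.00, response time: 6.77ms (95%)
"""
rows = parse_report(sample)
avg_writes = sum(row["writes"] for row in rows) / len(rows)
print(len(rows), avg_writes)
```

From here the rows can be fed into whatever graphing or alerting you already use.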

I would also like to call your attention to a blog post by Nilnandan Joshi from Percona’s Support team where he describes a method to build sysbench 0.5 on Debian 7.  Thanks Nil for pointing this out!

I hope that helps others out there who upgrade to sysbench 0.5 and then have questions about where --test=oltp went to. I’d love to hear your own sysbench use cases, and whether anyone else is publishing lua scripts for their own load testing!



Using MySQL triggers and views in Amazon RDS

I recently had an opportunity to migrate a customer from a physical server into Amazon’s RDS environment. In this particular case the customer’s platform makes extensive use of MySQL triggers and views.  I came across two significant issues that prevented me from following Amazon’s documentation, which basically states “use mysqldump” but doesn’t call out a specific method of dealing with MySQL triggers and views.

Amazon Relational Database Service (Amazon RDS) is a great platform if you’re looking for complete hands-off management of your MySQL environment, but comes at a cost in the area of flexibility, i.e. you don’t have SUPER privilege and this brings up additional challenges.

  1. You need to ensure you set log_bin_trust_function_creators=1 (by default this is off, 0).
  2. You need to clean up your mysqldump syntax.

#1 is easy: you simply make a configuration change within the Amazon RDS GUI on the node’s Parameter Group to set log_bin_trust_function_creators=1, and then restart your Amazon RDS node.  The restart is required since without the SUPER privilege you lose access to changing DYNAMIC variables on the fly.
#2 is a little more complex.  If you go with vanilla mysqldump (from say a 5.5 mysqldump binary) on a schema that has triggers and views, you will see error 1227, something like this:

ERROR 1227 (42000) at line 27311: Access denied; you need (at least one of) the SUPER privilege(s) for this operation

You’re seeing this message because MySQL in Amazon RDS doesn’t provide the SUPER privilege, and thus you cannot set up a trigger or view to run as a different user — only a user with SUPER can do that.

mysqldump will generate syntax for a trigger like this:

/*!50003 CREATE*/ /*!50017 DEFINER=`root`@`%`*/ /*!50003 TRIGGER `after_insert_lead` AFTER INSERT ON `leads` FOR EACH ROW BEGIN
UPDATE analytics.mapping SET id_lead = NEW.id_lead WHERE mc_email = NEW.email;
END */;;

and for a view like this:

/*!50013 DEFINER=`web`@`%` SQL SECURITY DEFINER */
/*!50001 VIEW `admin_user_view` AS SELECT ...

The problem is in the “DEFINER” lines.

Here’s one method that worked for me:

  1. Identify all the DEFINER lines in your schema. I found it helpful to run a --no-data dump and then weed through that to get a unique list of the DEFINER lines
  2. Create a sed line for each unique DEFINER line (see my example in a moment)
  3. Include this sed line in your dump/load script

Here’s what my sed matches looked like:

-e 's/\/\*!50017 DEFINER=`root`@`localhost`\*\///'
-e 's/\/\*!50017 DEFINER=`root`@`%`\*\///'
-e 's/\/\*!50017 DEFINER=`web`@`%`\*\///'
-e 's/\/\*!50017 DEFINER=`cron`@`%`\*\///'
-e 's/\/\*!50013 DEFINER=`cron`@`%` SQL SECURITY DEFINER \*\///'
-e 's/\/\*!50013 DEFINER=`root`@`localhost` SQL SECURITY DEFINER \*\///'
-e 's/\/\*!50013 DEFINER=`root`@`%` SQL SECURITY DEFINER \*\///'
-e 's/\/\*!50013 DEFINER=`web`@`%` SQL SECURITY DEFINER \*\///'

Note: the forward slashes and asterisks in each pattern must be escaped with backslashes, as shown, because sed would otherwise read them as delimiters and repetition operators.

A big caveat: this method is akin to a brute force method of getting your data into Amazon RDS — you’ve lost the elegance & security of running your triggers and views as separate defined users within the database — they are all now going to run as the user you loaded them in as. If this is a show-stopper for you, contact Percona and I’d be happy to take on your case and develop a more comprehensive solution.  :)

Now all that’s left is to integrate this into your dump flow.  Something like this should work:

| sed
-e ... lots of lines
| mysql
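
If maintaining one sed expression per definer gets tedious, a single pattern can cover all of them. Here is a minimal sketch in Python, assuming the dump uses the two versioned-comment forms shown earlier (50017 for triggers, 50013 for views); the function name is hypothetical and the same caveat about losing per-user definers applies.

```python
import re

# Matches /*!50017 DEFINER=`user`@`host`*/ and
#         /*!50013 DEFINER=`user`@`host` SQL SECURITY DEFINER */
DEFINER_CLAUSE = re.compile(
    r"/\*!500(?:13|17) DEFINER=`[^`]*`@`[^`]*`(?: SQL SECURITY DEFINER)?\s*\*/"
)


def strip_definers(dump_sql: str) -> str:
    """Remove DEFINER clauses so objects are created as the loading user."""
    return DEFINER_CLAUSE.sub("", dump_sql)


trigger = "/*!50003 CREATE*/ /*!50017 DEFINER=`root`@`%`*/ /*!50003 TRIGGER `after_insert_lead` AFTER INSERT ON `leads`"
view = "/*!50013 DEFINER=`web`@`%` SQL SECURITY DEFINER */"

print(strip_definers(trigger))
print(repr(strip_definers(view)))
```

For a large dump you would apply strip_definers() line by line while streaming from mysqldump to mysql rather than loading the whole file into memory.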

I hope this helps someone!



Implementing SchemaSpy in your MySQL environment

Lately I have been working with a set of customers on a longer term basis which has given me time to explore new tools using their environments.  One tool that I am finding very helpful is called SchemaSpy.

SchemaSpy is a Java-based tool (requires Java 5 or higher) that analyzes the metadata of a schema in a database and generates a visual representation of it in a browser-displayable format. It lets you click through the hierarchy of database tables via child and parent table relationships as represented by both HTML links and entity-relationship diagrams. It’s also designed to help resolve the obtuse errors that a database sometimes gives related to failures due to constraints.

One of the things that I like about  ERD tools is that I can quickly generate a graphic representation of the database to see which tables are referenced the most, and which are candidates for dropping (why keep old data if you don’t need to?).  I get up to speed a lot faster and can contribute to the Development process a lot more effectively when I know the relationships between tables versus observing queries only.  An additional benefit is that SchemaSpy is command-line driven and builds the html after each run, so I find it convenient to set it up on crontab so that schema changes are automatically picked up.  Nice, eh?

Usage of this tool is very straightforward and it does a lot of the work for you.  Basically if your database has Foreign Keys then you’re laughing, if not don’t despair, you just have a bit more work ahead of you.  Most of the time SchemaSpy “does the right thing” but sometimes you need to give it a little help in the form of metadata files.

Installation of SchemaSpy and Dependencies

  • Download the latest copy of SchemaSpy jar file (5.0.0 at time of writing)
  • Make sure you have latest copy of Java JRE for your platform
  • You will need a Java driver for your database — I’m using the MySQL Connector/J
  • Install the Graphviz package

Creating a mysql.properties file

While not explicitly required, I prefer to create this once so that the  command line is shorter and neater.  In my case here is the properties file in use for my Percona Server 5.6.10 sandbox:


Example Schema

CREATE TABLE `parent` (
 `parent_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 PRIMARY KEY (`parent_id`)
) ENGINE=InnoDB;
CREATE TABLE `child_A` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `parent_id` int(10) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `child_B` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `parent_id` int(10) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `parent_id` (`parent_id`),
 CONSTRAINT `child_B_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`parent_id`)
) ENGINE=InnoDB;

Notice that in the case of child_A there is an implicit FK relationship to parent via parent_id, however in child_B it is explicit.  This has an impact on how SchemaSpy identifies the relationship, and whether you need to provide a hint or not.

Running SchemaSpy (with Foreign Keys)

When you run SchemaSpy without a metadata file, you will see that the relationship between parent and child_B is properly displayed:


Table child_A can be made to display if you tag the Implied relationships box, but this generally only works if you have used the exact same name (in this case, parent_id) in the child and parent tables.  Your environment may be like a lot of shops that we see, where the parent table’s key is named id and the child tables reference it as parent_id. This might make sense to Developers, but it doesn’t help SchemaSpy at all.  This is where metadata files come into play.
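
To see why the naming matters, here is a toy version of a name-matching heuristic. It is an illustration only and not SchemaSpy's actual implementation: a non-key column is linked to another table when exactly one other table uses the same name for its primary key.

```python
from collections import defaultdict


def implied_relationships(tables):
    """tables maps table name -> {"columns": [...], "primary_key": column name}."""
    owners = defaultdict(list)
    for name, meta in tables.items():
        owners[meta["primary_key"]].append(name)

    links = []
    for name, meta in tables.items():
        for column in meta["columns"]:
            if column == meta["primary_key"]:
                continue
            candidates = [t for t in owners.get(column, []) if t != name]
            if len(candidates) == 1:
                links.append((name, column, candidates[0]))
    return links


same_names = {
    "parent": {"columns": ["parent_id"], "primary_key": "parent_id"},
    "child_A": {"columns": ["id", "parent_id"], "primary_key": "id"},
    "child_B": {"columns": ["id", "parent_id"], "primary_key": "id"},
}
# Same schema, but the parent's key is called plain "id".
different_names = dict(same_names, parent={"columns": ["id"], "primary_key": "id"})

print(implied_relationships(same_names))
print(implied_relationships(different_names))
```

With matching names both child tables are linked to parent; once the parent key is renamed to id, nothing can be inferred from names alone and the relationship has to be declared by hand.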


Creating a Metadata File

The format of the metadata file is XML-based.  I have included the metadata file I used to properly link child_A to parent table:

<?xml version="1.0" encoding="UTF-8"?>
<schemaMeta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <comments>Main Production Database. Percona, Michael Coburn</comments>
  <tables>
    <table name="parent" comments="Parent table">
      <column name="parent_id" primaryKey="true"/>
    </table>
    <table name="child_A" comments="Non-FK relationship, implicit relationship to parent">
      <column name="id" primaryKey="true"/>
      <column name="parent_id">
        <foreignKey table="parent" column="parent_id"/>
      </column>
    </table>
  </tables>
</schemaMeta>

If you’re looking for other examples you can find a more comprehensive example here.

Running SchemaSpy

The syntax below is specific to MySQL, but note that SchemaSpy works for pretty much all the popular RDBMS out there.

java -jar schemaSpy_5.0.0.jar -t mysql -u msandbox_ro -p msandbox -meta schemaspy.meta.xml -o /var/www/schemaspy/

The -o directive tells SchemaSpy where to write the output. Make sure this is a directory that can then be accessed by a web server.  Also note that since I created the mysql.properties file in the same directory as the jar file, I don’t need to pass any host:port or schema name information.

Viewing SchemaSpy output

A completed Relationships view of these three tables should look like this:


Final Thoughts

While I find the Relationships tab the most useful component of SchemaSpy, I don’t want to leave you with the impression that this is the only component of the tool.  There are additional tabs:

  • Tables – Names, number of children, parents, count of columns, row counts, and comments — a great way for a high level overview of the table sizes and an easy way to use the search feature of your browser to zero in on a particular table
  • Constraints – lists the explicit Foreign Key constraints in the database (this does not include constraints identified via metadata files!)
  • Anomalies – Identifies possible relationships between columns/tables based on names, tables without indexes, columns flagged ‘nullable’ and ‘must be unique’ (woops!), single column tables, incrementing column names in tables, and tables with the string NULL instead of the actual SQL NULL value. This is basically a quick sanity check of your schema for any significant errors or items requiring review.
  • Columns – a listing of all columns in the schema, really handy to sort by name to see if you have any implicit constraints that you might have missed and can then write into your metadata file.
  • Donate – This is free software and John Currier asks for donations so he can justify the time spent maintaining SchemaSpy to his wife :)

Finally don’t forget to automate SchemaSpy via crontab once you’re done.

What are some ERD tools you use and how do they compare with SchemaSpy?  Feel free to answer via the comments.  Thanks for reading!

