Aug
21
2015
--

Find unused indexes on MongoDB and TokuMX

Finding and removing unused indexes is a pretty common technique to improve the overall performance of relational databases. Fewer indexes mean faster inserts and updates, and also less disk space used. The usual way to do it is to log all queries' execution plans and then get a list of the indexes that are never used. The same theory applies to MongoDB and TokuMX, so in this blog post I'm going to explain how to find them.

Profiling in MongoDB

To understand what profiling is, think of MySQL's slow query log: it is basically the same idea. It can be enabled with the following command:

db.setProfilingLevel(level, slowms)

There are three different levels:

0: No profiling enabled.
1: Only those queries slower than “slowms” are profiled.
2: All queries are profiled, similar to long_query_time=0.

Once profiling is enabled you can use db.system.profile.find().pretty() to read the collected data. You would need to scan through all the profiles and find the indexes that are never used. To make things easier there is a JavaScript program, described below, that finds the unused indexes after reading all the profile information. Unfortunately, it only works with MongoDB 2.x.
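If you just want a quick manual check first, you can query the profile collection directly from the shell. A minimal sketch, assuming a MongoDB 2.x shell (where an explain() that reports a BasicCursor means no index was used):

db.setProfilingLevel(2)   // profile every operation
// show the slowest profiled operations first
db.system.profile.find().sort({ millis : -1 }).limit(5).pretty()
// re-run a suspect query with explain() and look at the cursor type
db.col.find({ b : 1 }).explain()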

The JavaScript is hosted in this GitHub project: https://github.com/wfreeman/indexalizer You just need to start the mongo shell with indexStats.js and run the db.indexStats() command. This is a sample output:

scanning profile {ns:"test.col"} with 2 records... this could take a while.
{
	"query" : {
		"b" : 1
	},
	"count" : 1,
	"index" : "",
	"cursor" : "BtreeCursor b_1",
	"millis" : 0,
	"nscanned" : 1,
	"n" : 1,
	"scanAndOrder" : false
}
{
	"query" : {
		"b" : 2
	},
	"count" : 1,
	"index" : "",
	"cursor" : "BtreeCursor b_1",
	"millis" : 0,
	"nscanned" : 1,
	"n" : 1,
	"scanAndOrder" : false
}
checking for unused indexes in: col
this index is not being used:
"_id_"
this index is not being used:
"a_1"


So “a_1” is not used and could be dropped. We can ignore “_id_” because that one is needed :)
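Once you are confident an index is really unused, dropping it is a single command (shown here for the example index above):

db.col.dropIndex("a_1")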

There is a problem with profiling: it affects performance, so you should run it for only a few hours and usually during off-peak time. That means there is a chance that not all of your application's queries will be executed during that window, so their indexes could wrongly appear unused. What alternative does TokuMX provide?

Finding unused indexes in TokuMX

Good news for all of us: TokuMX doesn't require you to enable profiling. Index usage statistics are updated as part of every query execution, and you can access them with a simple db.collection.stats() command. Let me show you an example:

> db.col.stats()
[...]
{
"name" : "a_1",
"count" : 5,
"size" : 140,
"avgObjSize" : 28,
"storageSize" : 16896,
"pageSize" : 4194304,
"readPageSize" : 65536,
"fanout" : 16,
"compression" : "zlib",
"queries" : 0,
"nscanned" : 0,
"nscannedObjects" : 0,
"inserts" : 0,
"deletes" : 0
},
{
"name" : "b_1",
"count" : 5,
"size" : 140,
"avgObjSize" : 28,
"storageSize" : 16896,
"pageSize" : 4194304,
"readPageSize" : 65536,
"fanout" : 16,
"compression" : "zlib",
"queries" : 2,
"nscanned" : 2,
"nscannedObjects" : 2,
"inserts" : 0,
"deletes" : 0
}
],
"ok" : 1
}


There are our statistics, with no profiling enabled. "queries" is the number of times an index has been used by a query execution: b_1 has been used twice while a_1 has never been used. You can use this small piece of JavaScript I've written to scan all collections inside the current database:

db.forEachCollectionName(function (cname) {
	output = db.runCommand({collstats : cname });
	print("Checking " + output.ns + "...")
	output.indexDetails.forEach(function(findUnused) { if (findUnused.queries == 0) { print( "Unused index: " + findUnused.name ); }})
});


An example using the same data:

> db.forEachCollectionName(function (cname) {
... output = db.runCommand({collstats : cname });
... print("Checking " + output.ns + "...")
... output.indexDetails.forEach(function(findUnused) { if (findUnused.queries == 0) { print( "Unused index: " + findUnused.name ); }})
...
... });
Checking test.system.indexes...
Checking test.col...
Unused index: a_1


Conclusion

Finding unused indexes is a regular task that every DBA should do. In MongoDB you have to use profiling, while in TokuMX nothing needs to be enabled because usage statistics are gathered by default, without impacting service performance.

The post Find unused indexes on MongoDB and TokuMX appeared first on Percona Data Performance Blog.

Jun
17
2015
--

Getting EXPLAIN information from already running queries in MySQL 5.7

When a new version of MySQL is about to be released we read a lot of blog posts about the performance and scalability improvements. That's good, but sometimes we miss the small features that can help us a lot in our day-to-day tasks. One good example is the blog post Aurimas wrote about a small new feature in MySQL 5.6 that I didn't know about until I read it: the automatic InnoDB transaction log file size change. How cool is that?

I plan to write a series of blog posts that will show some of those small new features in MySQL 5.7 that are going to be really useful. I’m going to start with EXPLAIN FOR CONNECTION.

This feature allows us to run an EXPLAIN for an already running statement. Let's say you find a query that has been running for a long time and you want to check why that could be happening. In 5.7 you can just ask MySQL to EXPLAIN the query that a particular connection is running and get its execution path. You can use it if the query is a SELECT, DELETE, INSERT, REPLACE or UPDATE; it won't work if the query is a prepared statement, though.

Let me show you an example of how it works.

We have a long-running join:

mysql [localhost] {msandbox} ((none)) > show processlist\G
*************************** 1. row ***************************
     Id: 9
   User: msandbox
   Host: localhost
     db: employees
Command: Query
   Time: 49
  State: Sending data
   Info: select count(*) from employees, salaries where employees.emp_no = salaries.emp_no

Let’s see the execution plan for the query:

mysql [localhost] {msandbox} ((none)) > explain for connection 9\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299540
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2803840
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)

The join between those tables is not using any index at all, so there is some room for improvement here :)
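The execution plan shows no usable index on emp_no on either table in this sandbox, so a reasonable first step would be to add one to the larger table (the index name below is just illustrative):

mysql> ALTER TABLE employees.salaries ADD INDEX idx_emp_no (emp_no);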

Conclusion

You can use this feature to see why a query has been running for too long and, based on that information, decide how to fix it and how to proceed. This is going to be a very useful feature for DBAs who want to diagnose performance problems and slow queries.

The post Getting EXPLAIN information from already running queries in MySQL 5.7 appeared first on MySQL Performance Blog.

Apr
15
2015
--

Checking table definition consistency with mysqldiff

Data inconsistencies in replication environments are pretty common. There are lots of posts that explain how to fix those using pt-table-checksum and pt-table-sync. Usually we only care about the data, but from time to time we receive this question in support:

How can I check the table definition consistency between servers?

Replication also allows us to have different table definitions between master and slaves. For example, in some cases you need extra indexes on the slaves for querying purposes that are not really needed on the master. In other cases those differences are just a mistake that needs to be fixed.

mysqldiff, included in Oracle's MySQL Utilities, can help us find those differences and get the information we need to fix them. In this post I'm going to show you how to use it with an example.

Find table definition inconsistencies

mysqldiff allows us to find those inconsistencies by checking the differences between tables on the same server (in different databases) or on different servers (also possibly in different databases). In this example I'm going to search for differences in table definitions between two different servers, server1 and server2.

The command line is pretty simple. This one compares the tables of the "test" database on both servers:

mysqldiff --server1=user@host1 --server2=user@host2 test:test

If the database name is different:

mysqldiff --server1=user@host1 --server2=user@host2 testdb:anotherdb

If the table name is different:

mysqldiff --server1=user@host1 --server2=user@host2 testdb.table1:anotherdb.anothertable

Now I want to check the table definition consistency between two servers. The database’s name is “employees”:

# mysqldiff --force --server1=root:msandbox@127.0.0.1:21489 --server2=root:msandbox@127.0.0.1:21490 employees:employees
# WARNING: Using a password on the command line interface can be insecure.
# server1 on 127.0.0.1: ... connected.
# server2 on 127.0.0.1: ... connected.
# Comparing `employees` to `employees`                             [PASS]
# Comparing `employees`.`departments` to `employees`.`departments`   [FAIL]
# Object definitions differ. (--changes-for=server1)
#
--- `employees`.`departments`
+++ `employees`.`departments`
@@ -1,6 +1,6 @@
 CREATE TABLE `departments` (
   `dept_no` char(4) NOT NULL,
-  `dept_name` varchar(40) NOT NULL,
+  `dept_name` varchar(256) DEFAULT NULL,
   PRIMARY KEY (`dept_no`),
   UNIQUE KEY `dept_name` (`dept_name`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
# Comparing `employees`.`dept_emp` to `employees`.`dept_emp`       [PASS]
# Comparing `employees`.`dept_manager` to `employees`.`dept_manager`   [PASS]
# Comparing `employees`.`employees` to `employees`.`employees`     [FAIL]
# Object definitions differ. (--changes-for=server1)
#
--- `employees`.`employees`
+++ `employees`.`employees`
@@ -5,5 +5,6 @@
   `last_name` varchar(16) NOT NULL,
   `gender` enum('M','F') NOT NULL,
   `hire_date` date NOT NULL,
-  PRIMARY KEY (`emp_no`)
+  PRIMARY KEY (`emp_no`),
+  KEY `last_name` (`last_name`,`first_name`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
# Comparing `employees`.`salaries` to `employees`.`salaries`       [PASS]
# Comparing `employees`.`titles` to `employees`.`titles`           [PASS]
Compare failed. One or more differences found.

There are at least two differences, one in the departments table and another in the employees table. The output is similar to diff's. By default the tool stops after finding the first difference, so we use --force to tell it to continue checking all the tables.

It shows us that dept_name on departments is varchar(40) NOT NULL on server1 and varchar(256) DEFAULT NULL on server2. The employees table has a KEY (last_name, first_name) on server2 that is not present on server1. Why is it taking server2 as the reference? Because of this line:

# Object definitions differ. (--changes-for=server1)

So, the changes shown in the diff are for server1. If you want server2 to be the one that gets changed, with server1 used as the reference, then --changes-for=server2 is needed.
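For example, reusing the sandbox connection parameters from above:

# mysqldiff --force --changes-for=server2 --server1=root:msandbox@127.0.0.1:21489 --server2=root:msandbox@127.0.0.1:21490 employees:employees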

In some cases the diff output is not really useful. What we actually need is the SQL to apply the changes on the server. We just need to add --difftype=sql to the command line:

# mysqldiff --force --difftype=sql --server1=root:msandbox@127.0.0.1:21489 --server2=root:msandbox@127.0.0.1:21490 employees:employees
[...]
# Comparing `employees`.`departments` to `employees`.`departments`   [FAIL]
# Transformation for --changes-for=server1:
ALTER TABLE `employees`.`departments`
  DROP INDEX dept_name,
  ADD UNIQUE INDEX dept_name (dept_name),
  CHANGE COLUMN dept_name dept_name varchar(256) NULL;
[...]
# Comparing `employees`.`employees` to `employees`.`employees`     [FAIL]
# Transformation for --changes-for=server1:
#
ALTER TABLE `employees`.`employees`
  DROP PRIMARY KEY,
  ADD PRIMARY KEY(`emp_no`),
  ADD INDEX last_name (last_name,first_name);

As we can see, the tool is not perfect. There are two problems here:

1. On the departments table it drops a UNIQUE key that is present on both servers, only to add it again: a waste of time and resources.

2. On the employees table it drops and recreates the PRIMARY KEY, again something that is not needed at all.

I have created a bug report, but this also teaches us a good lesson: don't just copy and paste commands without double-checking them first.
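For reference, a hand-trimmed version of those transformations, keeping only the changes the diff actually calls for (making server1 match server2):

ALTER TABLE `employees`.`departments`
  CHANGE COLUMN dept_name dept_name varchar(256) NULL;
ALTER TABLE `employees`.`employees`
  ADD INDEX last_name (last_name, first_name);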

What does mysqldiff run under the hood?

Mostly queries on INFORMATION_SCHEMA. These are the ones used to check inconsistencies on departments:

SHOW CREATE TABLE `departments`;
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, AUTO_INCREMENT, AVG_ROW_LENGTH, CHECKSUM, TABLE_COLLATION, TABLE_COMMENT, ROW_FORMAT, CREATE_OPTIONS
  FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE,
         COLUMN_DEFAULT, EXTRA, COLUMN_COMMENT, COLUMN_KEY
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT PARTITION_NAME, SUBPARTITION_NAME, PARTITION_ORDINAL_POSITION,
         SUBPARTITION_ORDINAL_POSITION, PARTITION_METHOD, SUBPARTITION_METHOD,
         PARTITION_EXPRESSION, SUBPARTITION_EXPRESSION, PARTITION_DESCRIPTION
  FROM INFORMATION_SCHEMA.PARTITIONS
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT CONSTRAINT_NAME, COLUMN_NAME, REFERENCED_TABLE_SCHEMA,
         REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
  FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments' AND
        REFERENCED_TABLE_SCHEMA IS NOT NULL;
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, AUTO_INCREMENT, AVG_ROW_LENGTH, CHECKSUM, TABLE_COLLATION, TABLE_COMMENT, ROW_FORMAT, CREATE_OPTIONS
  FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE,
         COLUMN_DEFAULT, EXTRA, COLUMN_COMMENT, COLUMN_KEY
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT PARTITION_NAME, SUBPARTITION_NAME, PARTITION_ORDINAL_POSITION,
         SUBPARTITION_ORDINAL_POSITION, PARTITION_METHOD, SUBPARTITION_METHOD,
         PARTITION_EXPRESSION, SUBPARTITION_EXPRESSION, PARTITION_DESCRIPTION
  FROM INFORMATION_SCHEMA.PARTITIONS
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';

As a summary, it checks partitions, row format, collation, constraints and so on. The set of queries appears twice above because it is run once against each server.

Conclusion

There are different tools for different purposes. We can check the data consistency with pt-table-checksum/pt-table-sync, and also the table definitions with mysqldiff.

The post Checking table definition consistency with mysqldiff appeared first on MySQL Performance Blog.

Jan
06
2015
--

Getting mutex information from MySQL’s performance_schema

We have been using the SHOW ENGINE INNODB MUTEX command for years. It shows us mutex and rw-lock information that can be useful during service troubleshooting in case of performance problems. As Morgan Tocker announced in his blog post, the command will be removed in MySQL 5.7 and we will have to use performance_schema to get that info.

The MySQL documentation also says that most of the command's output has been removed in 5.6 and that we can find similar info in performance_schema, but it doesn't show any examples of how to use performance_schema or which query we need to use from now on. It is also important to mention that 5.6 doesn't show any warning about the feature being deprecated.

This is a short blog post to show how to configure performance_schema and get the info we need. Hopefully it will end up in the official documentation in some form.

The instruments we need are not enabled by default. They live under wait/synch/mutex/%, so the configuration line we need is:

performance-schema-instrument='wait/synch/mutex/innodb/%=ON'
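If you prefer not to restart the server, the same instruments can usually be switched on at runtime, although mutexes instantiated before the change may remain uninstrumented:

mysql> UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'wait/synch/mutex/innodb/%';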

Then let's compare the results on an idle Percona Server 5.6. First, the output of SHOW ENGINE INNODB MUTEX:

mysql> show engine innodb mutex;
+--------+------------------------------+------------+
| Type   | Name                         | Status     |
+--------+------------------------------+------------+
| InnoDB | &buf_pool->flush_state_mutex | os_waits=1 |
| InnoDB | &log_sys->checkpoint_lock    | os_waits=2 |
+--------+------------------------------+------------+

Now the results from the query that gets us the mutex information from performance_schema:

mysql> SELECT EVENT_NAME, SUM_TIMER_WAIT/1000000000 WAIT_MS, COUNT_STAR FROM performance_schema.events_waits_summary_global_by_event_name
WHERE SUM_TIMER_WAIT > 0
AND EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'
ORDER BY SUM_TIMER_WAIT DESC, COUNT_STAR DESC;
+----------------------------------------------------+---------+------------+
| EVENT_NAME                                         | WAIT_MS | COUNT_STAR |
+----------------------------------------------------+---------+------------+
| wait/synch/mutex/innodb/log_sys_mutex              | 11.1054 |      28279 |
| wait/synch/mutex/innodb/buf_pool_flush_state_mutex |  9.7611 |      94095 |
| wait/synch/mutex/innodb/os_mutex                   |  5.3339 |      58515 |
| wait/synch/mutex/innodb/dict_sys_mutex             |  2.4108 |       4033 |
| wait/synch/mutex/innodb/flush_list_mutex           |  2.3688 |       8036 |
| wait/synch/mutex/innodb/lock_wait_mutex            |  2.2412 |       4016 |
| wait/synch/mutex/innodb/buf_pool_LRU_list_mutex    |  2.1912 |       4182 |
| wait/synch/mutex/innodb/fil_system_mutex           |  0.9789 |       5060 |
| wait/synch/mutex/innodb/mutex_list_mutex           |  0.1723 |       8523 |
| wait/synch/mutex/innodb/rw_lock_list_mutex         |  0.1706 |       8245 |
| wait/synch/mutex/innodb/srv_innodb_monitor_mutex   |  0.0102 |         65 |
| wait/synch/mutex/innodb/recv_sys_mutex             |  0.0050 |        146 |
| wait/synch/mutex/innodb/buf_pool_free_list_mutex   |  0.0048 |        165 |
| wait/synch/mutex/innodb/trx_mutex                  |  0.0020 |        105 |
| wait/synch/mutex/innodb/srv_sys_mutex              |  0.0012 |         11 |
| wait/synch/mutex/innodb/trx_sys_mutex              |  0.0010 |         29 |
| wait/synch/mutex/innodb/lock_mutex                 |  0.0008 |         26 |
| wait/synch/mutex/innodb/innobase_share_mutex       |  0.0004 |          5 |
| wait/synch/mutex/innodb/buf_dblwr_mutex            |  0.0003 |          4 |
| wait/synch/mutex/innodb/file_format_max_mutex      |  0.0003 |          6 |
| wait/synch/mutex/innodb/rseg_mutex                 |  0.0002 |          7 |
| wait/synch/mutex/innodb/recv_writer_mutex          |  0.0001 |          1 |
| wait/synch/mutex/innodb/ut_list_mutex              |  0.0001 |          1 |
| wait/synch/mutex/innodb/ibuf_mutex                 |  0.0001 |          2 |
| wait/synch/mutex/innodb/log_flush_order_mutex      |  0.0000 |          1 |
+----------------------------------------------------+---------+------------+

The difference is clear. We get much more information from Performance Schema. In my personal opinion, despite the extra resources needed by Performance Schema, the change is for the better.

The post Getting mutex information from MySQL’s performance_schema appeared first on MySQL Performance Blog.

Nov
20
2014
--

Sys Schema for MySQL 5.6 and MySQL 5.7

Performance Schema (P_S) has been available since MySQL 5.5, more than 4 years ago. It is still difficult to see production servers with P_S enabled, even with MySQL 5.6, where the feature is enabled by default. There have been several complaints, such as the overhead (a fix is a work in progress) and the ease of use: 52 tables to query and 31 configuration variables is enough to scare people.

There is a solution for the second problem, usability. Its name is "sys schema". It is a collection of views, functions and procedures that make P_S more user-friendly.

Installation

If you are a MySQL Workbench user the installation is pretty easy, because sys schema is already included. Click on "Performance – Performance Reports" and there you will find the "Install Helper" button that will install sys schema.

sys schema mysql workbench

If you don't use MySQL Workbench you need to download sys_56.sql or sys_57.sql (depending on whether you use 5.6 or 5.7) from the GitHub repository. Then just import the SQL file as usual:

mysql -u root -p < ./sys_56.sql

Usage

After the import, you will have a new "sys" schema with some very descriptive table names. Let's see an example. Do you want to know which tables are using most of the InnoDB buffer pool memory? Easy:

mysql> select * from sys.innodb_buffer_stats_by_table;
+---------------+--------------------+-----------+-----------+-------+--------------+-----------+-------------+
| object_schema | object_name        | allocated | data      | pages | pages_hashed | pages_old | rows_cached |
+---------------+--------------------+-----------+-----------+-------+--------------+-----------+-------------+
| test          | t                  | 63.61 MiB | 58.06 MiB |  4071 |         4071 |      4071 |     2101222 |
| InnoDB System | SYS_FOREIGN        | 32.00 KiB | 0 bytes   |     2 |            2 |         2 |           0 |
| InnoDB System | SYS_TABLES         | 32.00 KiB | 1.11 KiB  |     2 |            2 |         2 |          10 |
| InnoDB System | SYS_COLUMNS        | 16.00 KiB | 4.68 KiB  |     1 |            1 |         1 |          71 |
| InnoDB System | SYS_DATAFILES      | 16.00 KiB | 324 bytes |     1 |            1 |         1 |           6 |
| InnoDB System | SYS_FIELDS         | 16.00 KiB | 722 bytes |     1 |            1 |         1 |          17 |
| InnoDB System | SYS_INDEXES        | 16.00 KiB | 836 bytes |     1 |            1 |         1 |          12 |
| InnoDB System | SYS_TABLESPACES    | 16.00 KiB | 318 bytes |     1 |            1 |         1 |           6 |
| mysql         | innodb_index_stats | 16.00 KiB | 274 bytes |     1 |            1 |         1 |           3 |
| mysql         | innodb_table_stats | 16.00 KiB | 53 bytes  |     1 |            1 |         1 |           1 |
+---------------+--------------------+-----------+-----------+-------+--------------+-----------+-------------+

Pretty easy and useful, right? You can also find out which database is using the most buffer pool memory by querying innodb_buffer_stats_by_schema.
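For example, to get the same buffer pool breakdown aggregated per schema:

mysql> select * from sys.innodb_buffer_stats_by_schema;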

For each view there is a similar one whose name starts with x$. For example, you have user_summary_by_file_io and x$user_summary_by_file_io. The difference is that the first one shows human-readable values while the second one shows the raw values. Let's see an example:

mysql> select * from sys.user_summary_by_file_io;
+------------+-------+------------+
| user       | ios   | io_latency |
+------------+-------+------------+
| root       | 19514 | 2.87 s     |
| background |  5916 | 1.91 s     |
+------------+-------+------------+
2 rows in set (0.00 sec)
mysql> select * from sys.x$user_summary_by_file_io;
+------------+-------+---------------+
| user       | ios   | io_latency    |
+------------+-------+---------------+
| root       | 19514 | 2871847094292 |
| background |  5916 | 1905079715132 |
+------------+-------+---------------+

For humans, at least for me, it is easier to read seconds rather than picoseconds :)

There are multiple tables with very descriptive names.

– io_by_thread_by_latency
– schema_unused_indexes
– statements_with_temp_tables
– statements_with_errors_or_warnings
– user_summary_by_statement_type
– waits_by_host_by_latency

There are a lot more, and they are explained with examples in the project's README file.

Configuration

On the MySQL side nothing special is needed. Just enable performance_schema:

performance_schema="on"

sys schema also provides some procedures to enable/disable some features. For example:

– ps_setup_enable_background_threads
– ps_setup_enable_consumers
– ps_setup_enable_instrument
and so on…

We also have the same procedures with "disable". After you have made changes you can save them by calling ps_setup_save(), and reload them later with ps_setup_reload_saved(). If you want to reset the configuration to its default values, just call ps_setup_reset_to_default().

For example, we can check that some consumers are disabled. It is possible to enable them and save the configuration:

mysql> CALL sys.ps_setup_show_disabled_consumers();
+--------------------------------+
| disabled_consumers             |
+--------------------------------+
| events_stages_current          |
| events_stages_history          |
| events_stages_history_long     |
| events_statements_history      |
| events_statements_history_long |
| events_waits_current           |
| events_waits_history           |
| events_waits_history_long      |
+--------------------------------+
mysql> CALL sys.ps_setup_enable_consumers('events');
+---------------------+
| summary             |
+---------------------+
| Enabled 8 consumers |
+---------------------+
mysql> CALL sys.ps_setup_show_disabled_consumers();
Empty set (0.00 sec)
mysql> CALL sys.ps_setup_save(TRUE);

Conclusion

Performance Schema is very useful when we want to know what is happening inside the database. It is getting more features with each new GA release and will probably be the single point of information in the near future. Now, thanks to sys schema, it is also easy to use.

The post Sys Schema for MySQL 5.6 and MySQL 5.7 appeared first on MySQL Performance Blog.

May
14
2014
--

max_allowed_packet and binary log corruption in MySQL

The combination of the max_allowed_packet variable and replication in MySQL is a common source of headaches. In a nutshell, max_allowed_packet is the maximum size of a MySQL network protocol packet that the server can create or read. It has a default value of 1MB (<= 5.6.5) or 4MB (>= 5.6.6) and a maximum size of 1GB. This adds some constraints to our replication environment:

  • The master server shouldn’t write events to the binary log larger than max_allowed_packet
  • All the slaves in the replication chain should have the same max_allowed_packet as the master server

Sometimes, even following those two basic rules, we can have problems.

For example, there are situations (also called bugs) where the master writes more data than the max_allowed_packet limit, causing the slaves to stop working. In order to fix this, Oracle created a new variable called slave_max_allowed_packet. This new configuration variable, available from 5.1.64, 5.5.26 and 5.6.6, overrides the max_allowed_packet value for slave threads. Therefore, regardless of the max_allowed_packet value, the slave threads get a 1GB limit, the default value of slave_max_allowed_packet. A nice trick that works as expected.

Sometimes even with that workaround we can get the max_allowed_packet error on the slave servers. That means there is a packet larger than 1GB, something that shouldn't happen in a normal situation. Why? It is usually caused by binary log corruption. Let's look at the following example:

The slave stops working with the following message:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master'

The important part is “got fatal error 1236 from master”. The master cannot read the event it wrote to the binary log seconds ago. To check the problem we can:

  • Use mysqlbinlog to read the binary log from the position where it failed, with --start-position.
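The command would look something like this (the binary log file name and position below are just placeholders):

$ mysqlbinlog --start-position=398528 mysql-bin.000001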

This is an example taken from our Percona Forums:

#121003 5:22:26 server id 1 end_log_pos 398528
# Unknown event
# at 398528
#960218 6:48:44 server id 1813111337 end_log_pos 1835008
# Unknown event
ERROR: Error in Log_event::read_log_event(): 'Event too big', data_len: 1953066613, event_type: 8
DELIMITER ;
# End of log file

Check the size of the event, 1953066613 bytes, or the "Unknown event" messages: something is clearly wrong there. Another usual thing to check is the server id, which sometimes doesn't correspond to the real value. In this example the person who posted the binary log output confirmed that the server id was wrong.

  • Check the master's error log:
[ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1953066613, event_type: 8

Again, the event is bigger than expected. There is no way the master and slave can read/write it, so the solution is to skip that event on the slave and rotate the logs on the master. Then use pt-table-checksum to check data consistency.
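A sketch of that recovery, assuming the master rotates to mysql-bin.000002 after FLUSH LOGS (the file name and position here are illustrative):

master > FLUSH LOGS;
slave > STOP SLAVE;
slave > CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=4;
slave > START SLAVE;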

MySQL 5.6 includes replication checksums to catch problems with log corruption early. You can read more about them in Stephan's blog post.

Conclusion

Errors on slave servers about max_allowed_packet can be caused by very different reasons. Although binary log corruption is not a common one, it is something worth checking when you have run out of ideas.

The post max_allowed_packet and binary log corruption in MySQL appeared first on MySQL Performance Blog.

Apr
01
2014
--

How to add an existing Percona XtraDB Cluster to Percona ClusterControl

In my last blog post I explained how to use Percona ClusterControl to create a new Percona XtraDB Cluster from scratch. That's a good option when you want to create a testing environment in just a few mouse clicks. This time I'm going to show you how to add your existing cluster to Percona ClusterControl so you can manage and monitor it from the web interface.

The environment will be pretty similar: we will have the UI server, CMON and three XtraDB Cluster nodes. The cluster should already be running, and Percona ClusterControl should already be installed.

Adding an existing Cluster

The ClusterControl web interface is empty; there are no clusters on it yet. To add an existing one we need to click on "Add existing Galera Cluster".

Add Existing Cluster
A new form is shown, pretty similar to the one we saw last time when creating a new cluster. We can divide the form into two parts. First we need to give information about our cluster: the Linux distribution and version, the IPs of the PXC nodes and the MySQL root password. Pretty easy:

Form
In the second part we have the SSH configuration. There is one prerequisite: the UI server should be able to connect to all servers using an SSH key. Therefore, our first step is to create an SSH key pair on the UI server and copy the public key to all the other servers.
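A minimal sketch of that step (the host names are placeholders for your own nodes):

$ ssh-keygen -t rsa
$ ssh-copy-id root@pxc-node1
$ ssh-copy-id root@pxc-node2
$ ssh-copy-id root@pxc-node3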

It is also necessary to add the private key to the web interface. You can do it using the form shown after clicking on "Add Key Pair":

Add key pair

Once the key is added, we can verify the access:

Access Check

As we can see here, everything works as expected and all servers are reachable by SSH. The parameter "Create shared SSH key" also needs to be enabled. That option makes ClusterControl create a new SSH key pair on the CMON node so that it can also connect to the PXC nodes with passwordless SSH.

Now everything is prepared. We can proceed with the deployment. Just click on “Add cluster” and the installation process will start. While the installation is in progress you will see this notification:

Deployment notification

Clicking on it we can see the progress of the deployment:

Progress

After a few minutes our PXC cluster is shown in the Percona ClusterControl UI:

Cluster

Now we can monitor it, get alerts, clone it, run backups and do everything else from the web interface. You can also add multiple clusters and create new ones.

The post How to add an existing Percona XtraDB Cluster to Percona ClusterControl appeared first on MySQL Performance Blog.

Jan
02
2014
--

How to recover table structure from .frm files with MySQL Utilities

Table structures are stored in .frm files and in the InnoDB Data Dictionary. Sometimes, usually in data recovery issues, we need to recover those structures to be able to find the lost data or just to recreate the tables.

There are different ways to do it and we've already written about them in this blog. For example, we can use the data recovery tools to recover table structures from the InnoDB Dictionary, or from the .frm files using a MySQL server. This blog post is an update to that last approach: I will show you how to easily recover the structure from a .frm file, in some cases even without using a MySQL server. This makes the process faster and easy to script.

MySQL Utilities and mysqlfrm

MySQL Utilities is a set of scripts released by Oracle that helps us do the usual DBA tasks in an easier way. It is written in Python and its only dependency is the Python Connector. From the long list of utilities we are going to use mysqlfrm, the tool that will help us recover the structure.

As usual, an example is worth a thousand words. Let's recover some table structures:

This is the table we have:

CREATE TABLE `new_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) DEFAULT NULL,
  `age` tinyint(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `name_idx` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

So, let’s try to recover that info from the .frm file and let’s see what we can get:

$ mysqlfrm --diagnostic /usr/local/mysql/data/test/new_table.frm
# WARNING: Cannot generate character set or collation names without the --server option.
[...]
CREATE TABLE `test`.`new_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) DEFAULT NULL,
  `age` tinyint(4) NOT NULL,
PRIMARY KEY `PRIMARY` (`id`),
KEY `name_idx` (`name`)
) ENGINE=InnoDB;

Pretty good result :)

It is important to mention that this tool has two different ways to do the recovery.

  • The first one spawns a new MySQL instance and runs the structure recovery there, pretty similar to the approach PeterZ explained in his blog post. You need to use --server or --basedir along with --port. The spawned instance is shut down after the recovery is complete.
  • The second one, used with --diagnostic, reads the .frm file byte by byte to recover as much information as possible, without requiring a MySQL instance at all. Therefore, this method can be used on damaged .frm files that even MySQL itself can't read.

As we can see in the warning of the previous example, not all information can be recovered with the second method. For example, the character set and collation can't be recovered without the --server option (first method). Let's see how to use a spawned server to recover the .frm info:

$ mysqlfrm --server=root@127.0.0.1 --port 3307 ./new_table.frm
# Source on 127.0.0.1: ... connected.
# Starting the spawned server on port 3307 ... done.
# Reading .frm files
#
# Reading the new_table.frm file.
#
# CREATE statement for ./new_table.frm:
#
CREATE TABLE `new_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) DEFAULT NULL,
  `age` tinyint(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `name_idx` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The tool connects to the MySQL server, gets all the info it needs (basedir and so on) and spawns a new instance on port 3307. It then uses that new instance to recover the info from the .frm file. Fast and easy :)

It is worth mentioning that not all the information we might need is stored in the .frm files. There are some things we won't be able to recover, for example foreign key constraints and auto-increment sequences.

Conclusion

MySQL Utilities is a very useful set of tools. In this particular case mysqlfrm can be used to recover a large list of table structures from their .frm files, making the process fast and easy to script.

The post How to recover table structure from .frm files with MySQL Utilities appeared first on MySQL Performance Blog.

Oct
01
2013
--

MySQL 5.7 multi-source replication

Recently Oracle announced several new features for the latest available development version of MySQL, which is 5.7.2 at the time of writing this article. Most of them are performance and replication related, and they show us how incredible the new release will be.

In this post I'm going to try to explain, in a few easy steps, how the new multi-source replication works and how we can configure it for our own tests. It is important to mention that this is a development release, so it is not production ready. Therefore this post is intended for people who want to test the new feature and see how it works with their application, always in a staging environment.

What is multi-source replication?

First, we need to be clear that multi-master and multi-source replication are not the same. Multi-master replication is the usual circular replication, where you can write on any server and the data gets replicated to all the others.

Multi master replication
Multi-source is different. MySQL replication had a limitation, fixed in this new release: one slave could have only one master. That is a limiting factor when designing our replication environment. There were some "hacks" to make it work, but now there is an official way. So, in a nutshell, multi-source means that a slave can have more than one master. Now replication environments like this one are possible:

Multi source replication
This will help us create some replication hierarchies that were impossible in the past. For example, you can have a slave in your office replicating the data from all servers that you have in the offices spread around the world.

How does it work?

Now we have the concept of communication channels. Each communication channel is a connection from the slave to a master to get that master's binary log events. That means we will have one IO_THREAD per communication channel. We need to run a separate "CHANGE MASTER" command for each master, with the "FOR CHANNEL" argument that gives a name to the channel:

CHANGE MASTER TO MASTER_HOST='something', MASTER_USER=... FOR CHANNEL="name_of_channel";

Pretty easy. There is a single prerequisite: the slave has to be configured first with the crash-safe feature of MySQL 5.6. That means the info usually stored in master.info and relay-log.info should be kept in tables. Let's start with the configuration.

Show me an example!

First you need to download the labs version of MySQL from this link.

We have a sandbox environment with two masters and one slave. I won't go over the details of how to configure the server_id, binary logs or replication users; I assume they are already well configured. If you need a howto, you can follow this one.

First, we have to enable the crash safe feature on the slave:

master_info_repository=TABLE
relay_log_info_repository=TABLE

After a restart of the slave we can start creating the channels with the names "master1" and "master2":

slave > change master to master_host="127.0.0.1", master_port=12047, master_user="msandbox",master_password="msandbox" for channel="master1";
slave > change master to master_host="127.0.0.1", master_port=12048, master_user="msandbox",master_password="msandbox" for channel="master2";

To start the slave processes you need to specify which channel you are referring to:

slave > start slave for channel="master1";
slave > start slave for channel="master2";

Now, we want to check the status of the slave:

slave > show slave status\G
Empty set (0.00 sec)

Oh, it is empty. We have to specify again which channel we want to check:

slave > SHOW SLAVE STATUS FOR CHANNEL="master1"\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: msandbox
                  Master_Port: 12047
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 232
               Relay_Log_File: squeeze-relay-bin-master1.000003
                Relay_Log_Pos: 395
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]

and we can also check that the IO and SQL threads of both channels are running:

slave > SHOW PROCESSLIST;
+----+-------------+-----------------------------------------------------------------------------+
| Id | User        | State                                                                       |
+----+-------------+-----------------------------------------------------------------------------+
|  2 | system user | Waiting for master to send event                                            |
|  3 | system user | Slave has read all relay log; waiting for the slave I/O thread to update it |
|  4 | system user | Waiting for master to send event                                            |
|  5 | system user | Slave has read all relay log; waiting for the slave I/O thread to update it |
+----+-------------+-----------------------------------------------------------------------------+

Let’s test it:

master1 > create database master1;
master2 > create database master2;
slave > show databases like 'master%';
+--------------------+
| Database (master%) |
+--------------------+
| master1            |
| master2            |
+--------------------+

It works, it's that easy!

Conclusion

The new multi-source feature allows us to create replication environments that were impossible in the past without complex "hacks". Of course, your application should be designed and developed with this new architecture in mind. As with multi-master, multi-source needs special care to avoid ending up with your data messed up.

MySQL replication is getting better with every release, giving us more configuration, performance and design possibilities. And all those new features can be combined. Your replication environment can be even better if you mix some of the new (and old) replication features. For example, you can configure GTIDs or enable the multi-threaded slave, per schema or intra-database.
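For example, a my.cnf fragment along these lines would enable both; the variable names are as in MySQL 5.6, so check the labs release notes before combining them with multi-source:

gtid_mode=ON
enforce_gtid_consistency=ON
log_slave_updates=ON
slave_parallel_workers=4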

The post MySQL 5.7 multi-source replication appeared first on MySQL Performance Blog.
