Troubleshooting hardware resource usage webinar: Q & A

InnoDB locks and transaction isolation

Troubleshooting hardware resourceIn this blog, I provide answers to the Q & A for the Troubleshooting hardware resource usage webinar.

First, I want to thank everybody who attended the May 26 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: How did you find the memory IO LEAK?

A: Do you mean the replication bug I was talking about in the webinar? I wrote about this bug here. See also comments in the bug report itself.

Q: Do you have common formulas you use to tune MySQL?

A: There are best practices: relate thread concurrency to number of CPU cores you have, set InnoDB buffer pool size large enough so it can contain all your working dataset (which is not always possible), and do not set the Query Cache size larger than 512MB (or even better, turn it off) to avoid issues with global lock set when it needs to be de-fragmented. I prefer not to call them “formulas,” because all options need to be adjusted to match the workload. If this weren’t the case, MySQL Server would have an automatic configuration. There is also a separate webinar on configuration (Troubleshooting configuration issues) where I discuss these practices.

Q: Slide 11: is this real time? Can we get this info for a job that has already finished?

A: Yes, this is real time. No, it is not possible to get this info for a thread that does not exist.

Q: Slide 11: what do negative numbers mean?

A: Numbers are taken from the 


 field for table


 in Performance Schema. These values, in turn, are calculated as (memory allocated by thread) – (memory freed by thread). Negative numbers here mean either a memory leak or incorrect calculation of memory used by the thread. I reported this behavior in the MySQL Bugs database. Please subscribe to the bug report and wait to see how InnoDB and Performance Schema developers answer.

Q: Are TokuDB memory usage stats recorded in the 


  table also?  Do we have to set something to enable this collection? I ran the query, but it shows 0 for everything.

A: TokuDB currently does not support Performance Schema, thus its memory statistics are not instrumented. See the user manual on how memory instrumentation works.

Q: With disk what we will check for disk I/O?

A: I quite don’t understand the question. Are you asking on which disk we should check IO statistics? For datadir and other disks, look at the locations where MySQL stores data and log files (if you set custom locations).

Q: How can we put CPU in parallel to process multiple client requests? Put multiple requests In memory by parallel way. By defining transaction. Or there any query hints?

A: We cannot directly put CPU in parallel, but we can affect it indirectly by tuning InnoDB threads-related options (

innodb_threads_concurrency, innodb_read_io_threads, innodb_write_io_threads

) and using the thread pool.

Q: Is there any information the Performance Schema that is not found in the SYS schema?

A: Yes. For example, sys schema does not have a view for statistics about prepared statements, while Performance Schema does, because sys schema takes its statement statistics from digest tables (which make no sense for prepared statements).

Q: What is your favorite tool to investigate a memory issue with a task/job that has already finished?

A: I don’t know that there is such a tool suitable for use in production. In a test environment, you can use valgrind or similar tools. You can also make core dumps of the mysqld process and investigate them after the issue is gone.


Introduction to Troubleshooting Performance – Troubleshooting Slow Queries webinar: Q & A

Troubleshooting Slow Queries

Troubleshooting Slow QueriesIn this blog, I will provide answers to the Q & A for the Troubleshooting Slow Queries webinar.

First, I want to thank you for attending the April 28 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: I’ve heard that is a bad idea to use

select *

; what do you recommend?

A: When I used


 in my slides, I wanted to underline the idea that sometimes you need to select all columns from the table. There is nothing bad about it if you need them.


 is bad when you need only a few columns from the table. In this case, you retrieve more data than needed, which affects performance. Another issue that  


 can cause is if you hard-code the statement into your application, then change table definition; the application could start retrieving columns in wrong order and output (e.g., email instead of billing address). Or even worse, it will try to access a non-existent index in the result set array. The best practice is to explicitly enumerate all columns that your application needs.

Q: I heard that using 


 length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as indexing, what happens?

A: I assume you are asking about the ability to create an index with lengths smaller than the column length? They work as follows:

Assume you have a 


  field which contains these user questions:

  1. I’ve heard that is a bad idea to use select * what do you recommend?
  2. I heard that using 

     length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as indexing, what happens?

  3. ….

Since this is a 


  field you cannot create and index on it without specifying its length, so you need to make the index as minimal as possible to uniquely identify questions. If you create an index with length 10 it will contain:

  1. I’ve heard
  2. I heard th

You will index only those parts of questions that are not very distinct from each other, and do not contain useful information about what the question is. You can create index of length 255:

  1. I’ve heard that is a bad idea to use select * what do you recommend?
  2. I heard that using index_field length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as index

In this case, the index includes the whole first question and almost all the second question. This makes the index too large and requires us to use more disk space (which causes more IO). Also, information from the second question is probably too much.

If make index of length 75, we will have:

  1. I’ve heard that is a bad idea to use select * what do you recommend?
  2. I heard that using index_field length will affect the indexing principle du

This is more than enough for the first question and gives a good idea of what is in the second question. It also potentially will have enough unique entries to make its cardinality look more like the cardinality of real data distribution.

To conclude: choosing the correct index length is something that requires practice and analysis of your actual data. Try to make them as short as possible, but long enough so that the number of unique entries in the index will be similar to a number of unique entries in the table.

Q: Which view can we query to see stats?

A: Do you mean index statistics?

SHOW INDEX FROM table_name

 will do it.

Q: We have an InnoDB table with 47 fields (mostly text); some are ft-indexed. I tried to do an alter table, and it ran for 24 hours. What is the best way to run an alter table to add one extra field? The table has 1.9 M rows and 47 columns with many indexes.

A: Adding a column requires a table copy. Therefore, the speed of this operation depends on the table size and speed of your disk. If you are using version 5.6 and later, adding a column would not block parallel queries (and therefore is not a big deal). If you are using an older version, you can always use the pt-online-schema-change utility from Percona Toolkit. However, it will run even more slowly than the regular


. Unfortunately, you cannot speed up the execution of


 much. The only thing that you can do is to use a faster disk (with options, tuned to explore speed of the disk).

However, if you do not want to have this increased IO affect the production server, you can alter the table on the separate instance, then copy tablespace to production and then apply all changes to the original table from the binary logs. The steps will be something like:

  1. Ensure you use option

      and the big table has individual tablespace

  2. Ensure that binary log is enabled
  3. Start a new server (you can also use an existent stand-by slave).
  4. Disable writes to the table
  5. Record the binary log position
  6. Copy the tablespace to the new server as described here.
  7. Enable writes on the production server
  8. Run

     on the new server you created in step 2 (it will still take 24 hours)

  9. Stop writes to the table on the production server
  10. Copy the tablespace, altered in step 7
  11. Apply all writes to this table, which are in the binary logs after position, recorded in step 4.
  12. Enable writes to the table

This scenario will take even more time overall, but will have minimal impact on the production server

Q: If there is a compound index like index1(emp_id,date), will the following query be able to use index? “select * from table1 where emp_id = 10”

A: Yes. At least it should.

Q: Are 




 in extended info for explain not good?

A: Regarding


: it depends. For example, you will always have the word


” for tables which perform 


 and cannot use an index for


. This is not always bad. For example, in this query:

mysql> explain select emp_no, first_name from employees where emp_no <20000 order by first_nameG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 18722
     filtered: 100.00
        Extra: Using where; Using filesort
1 row in set, 1 warning (0,01 sec)

the primary key used to resolve rows and


 were necessary and not avoidable. You can read about different


 algorithms here.


Using temporary

: this means what during query execution temporary table will be created. This is can be not good, especially if the temporary table is large and cannot fit into memory. In this case, it would be written to disk and slow down operations. But, again, sometimes creating temporary tables in not avoidable, for example, if you have both




 clauses which list columns differently as stated in the user manual.

Q: Is


 length more of a good thing for query execution?



 field is not


 for all queries that use and index, and just shows the length of the key part used. It is not good or bad, it is just for information. You can use this information, for example, to identify which part of combined index is used to resolve the query.

Q: Does an alter query go for an optimizer check?

A: No. You can check it either by enabling optimizer trace, running


 and find what trace is empty. Or by enabling the debug option and searching the resulting trace for



Q: A query involves four columns that are all individually covered by an index. The optimizer didn’t merge indexes because of cost, and even didn’t choose the composite index I created.

A: This depends on the table definition and query you used. I cannot provide a more detailed answer based only on this information.

Q cont.: Finally, only certain composite indexes were suitable, the column order in the complex index mattered a lot. Why couldn’t the optimizer merge the four individual single column indexes, and why did the order of the columns in the composite index matter?

A: Regarding why the optimizer could not merge four indexes, I need to see how the table is defined and which data is in these indexed columns. Regarding why the order of the columns in the composite index matters, it depends on the query. Why the optimizer can use an index, say, on

(col1, col2)

 where the conditions

col1=X AND col2=Y


col2=Y AND col2=X

 for the case when you use


, the order is important. For example, for the condition

col1=X OR col2=Y

, where the part


 is always executed and the part


  is executed only when


 is false. The same logic applies to queries like

SELECT col1 WHERE col2=Y ORDER BY col3

. See the user manual for details.

Q: When I try to obtain the optimizer trace on the console, the result is cut off. Even if I redirect the output to a file, how to overcome that?

A: Which version of MySQL Server do you use? The 


 column is defined as

longtext NOT NULL

, and should not cause such issues. If it does with a newer version, report a bug at http://bugs.mysql.com/.

Q: Are there any free graphical visualizers for either EXPLAIN or the optimizer TRACE output?

A: There is graphical visualizer for


 in MySQL Workbench. But it works with online data only: you cannot run it on


 output, saved into a file. I don’t know about any visualizer for the optimizer


 output. However, since it is


 you can simply save it to file and open in web browser. It will allow a better view than if opened in simple text editor.

Q: When do you use force index instead of

use index


A: According to user manual “

USE INDEX (index_list)

 hint tells MySQL to use only one of the named indexes to find rows in the table” and “


  hint acts like

USE INDEX (index_list)

, with the addition that a table scan is assumed to be very expensive . . . a table scan is used only if there is no way to use one of the named indexes to find rows in the table.” This means that when you use


, you are giving a hint for the optimizer to prefer a particular index to others, but not enforcing index usage if the optimizer prefers a table scan, while


 requires using the index. I myself use only




  index hints.

Q: Very informative session. I missed the beginning part. Are you going to distribute the recoded session later?

A: Yes. As usual slides and recording available here.


Database Performance Webinar: Tired of MySQL Making You Wait?


database performance

Too often developers and DBAs struggle to pinpoint the root cause of MySQL database performance issues, and then spend too much time in trying to fix them. Wouldn’t it be great to bypass wasted guesswork and get right to the issue?

In our upcoming webinar Tired of MySQL Making You Wait? we’re going to help you discover how to significantly increase the performance of your applications and reduce database response time.

In this webinar, Principal Architect Alexander Rubin and Database Evangelist Janis Griffin will provide the key steps needed to identify, prioritize, and improve query performance.

They will discuss the following topics:

  • Wait time analytics using Performance / Information schemas
  • Monitoring for performance using DPA
  • Explaining plan operations focusing on temporary tables and filesort
  • Using indexes to optimize your queries
  • Using loose and tight index scans in MySQL


Thursday, January 7, 2016 10:00am Pacific Standard Time (UTC – 8)


Alexander RubinPrincipal Consultant, Percona

Janis GriffinDatabase Evangelist, SolarWinds

Register now!

Percona is the only company that delivers enterprise-class software, support, consulting and managed services solutions for both MySQL and MongoDB® across traditional and cloud-based platforms that maximize application performance while streamlining database efficiencies.

Percona’s industry-recognized performance experts can maximize your database, server and application performance, lower infrastructure costs, and provide capacity and scalability planning for future growth.


Webinar: Introduction to MySQL SYS Schema follow up questions

Thanks to all who attended my webinar Introduction to MySQL SYS Schema. This blog is for me to address the extra questions I didn’t have time to answer on the stream.

Can i have the performance_schema enabled in 5.6 and then install the sys schema? Or they are one and the same?

You need to have enabled the performance_schema in order to use it through the sys schema. They are different entities. In general, performance_schema collects and stores the data, and sys schema reads and presents the data.

The installation of sys schema on primary database will be replicated to the slaves?

By default, no. If you wish that the Sys Schema replicates to the slaves, you can modify the before_setup.sql (https://github.com/MarkLeith/mysql-sys/blob/master/before_setup.sql#L18) to skip the “SET sql_log_bin = 0;

Can MySQL save the slow running query in any table?

Yes it does: https://dev.mysql.com/doc/refman/5.6/en/log-destinations.html

How to see the query execution date & time from events_statements_current/history views in performance_schema?

You can check the performance_schema.events_statements_summary_by_digest table, that have the fields FIRST_SEEN and LAST_SEEN which are both a datetime values.

When the Sys Schema views show certain stats for the queries, is there a execution time range for queries under evaluation or is it like all the queries executed until date?

It’s all the queries executed until date, except when using some of the stored procedures that receive as a parameter a run time value, like “diagnostics” or “ps_trace_statement_digest”

I want to write the automated script to rebuild table or index. How to determine which table(s) or index(es) need to be rebuilt because of high fragmentation ratio?

For this you need to use something completely different. To know the fragmentation inside an InnoDB table i’ll recommend you to use XtraBackup with the –stats parameter https://www.percona.com/doc/percona-xtrabackup/2.2/xtrabackup_bin/analyzing_table_statistics.html

Downside to using? Overhead?

The overhead is the one that comes with using Performance Schema. I like the perspective of this presentation (https://www.percona.com/live/mysql-conference-2014/sessions/performanceschema-worth-overhead): Overhead is dynamic. Do  not  rely  on  other  people’s  benchmarks. Benchmark your application and find out what your overhead is.

What is the performance cost with regards to memory and io when using sys schema? Are there any tweaks or server variables with help the sys schema performing better?

Use only the instrumentation needed. This blog post have extensive info about the topic http://marcalff.blogspot.com.co/2011/06/performance-schema-overhead-tuning.html

For replicate how does sys schema record data?

It doesn’t until MySQL 5.7.2 where the Performance Schema provides tables that expose replication information: https://dev.mysql.com/doc/refman/5.7/en/performance-schema-replication-tables.html and talking about the sys schema, currently the only place where you can find info about replication is in the “diagnostics” procedure, but as you can imagine, it only get’s data if the MySQL version is 5.7.2 or higher.

Is sys schema built into any o the Percona releases?

At the moment, no.

Is it possible to use SYS schema in Galera 3 nodes cluster?

Yes, since the only requirement is to have performance_schema, which is also available on PXC / Galera Cluster

Can you create trending off information pulled from the Sys Schema? Full table scans over time, latency over time, that kind of thing?

Yes, you can use procedures like, for example, “diagnostics

How do I reset the performance data to start collecting from scratch?

By calling the ps_truncate_all_tables procedure. Truncate a performance_schema table equals to “clear collected events”. TRUNCATE TABLE can also be used with summary tables, but the effect is to reset the summary columns to 0 or NULL, not to remove rows.

Can we install SYS schema before 5.6?

You can use the ps_helper on 5.5 https://github.com/MarkLeith/dbahelper

Does sys support performance_schema from 5.0?

Unfortunately, MySQL 5.0 doesn’t have performance_schema. P_S is available since MySQL 5.5.3

If you install the sys schema on one node of a Galera cluster will all the nodes get the Sys schema? Also, is the Sys schema cluster aware or does it only track the local node?

For PXC 5.6 with Galera 3, the answer is: yes, it will be replicated to all the nodes. And the performance schema will always only collect data of the local node.

The post Webinar: Introduction to MySQL SYS Schema follow up questions appeared first on MySQL Performance Blog.


Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: Q&A

Thank you for attending my July 22 webinar titled “Advanced Query Tuning in MySQL 5.6 and 5.7” (my slides and a replay available here). As promised here is the list of questions and my answers (thank you for your great questions).

Q: Here is the explain example:

mysql> explain extended select id, site_id from test_index_id where site_id=1
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_index_id
         type: ref
possible_keys: key_site_id
          key: key_site_id
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index

why is site_id a covered index for the query, given the fact that a) we are selecting “id”, b) key_site_id only contains site_id?

As the table is InnoDB, all secondary keys will always contain primary key (“id”); in this case the secondary index will contain all needed information to satisfy the above query and key_site_id will be “covered index”

Q: Applications change over time. Do you suggest doing a periodic analysis of indexes that are being used and drop the ones that are not? If yes, any suggestions as to tackle that?

Yes, that is a good idea. Usually it can be done easily with Percona toolkit or Performance_schema in MySQL 5.6

  1. Enable slow query log and log every query, then use Pt-index-usage tool
  2. Or use the following query (as suggested by FromDual blog post):
SELECT object_schema, object_name, index_name
  FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE index_name IS NOT NULL
   AND count_star = 0
 ORDER BY object_schema, object_name;

Q: Does the duplicate index is found on 5.6/5.7 will that causes an performance impact to the db while querying?

Duplicate keys can have negative impact on selects:

  1. MySQL can get confused and choose a wrong index
  2. Total index size can grow, which can cause MySQL to run out of RAM

Q: What is the suggested method to measure performance on queries (other than the slow query log) so as to know where to create indexes?

Slow query log is most common method. In MySQL 5.6 you can also use Performance Schema and use events_statements_summary_by_digest table.

Q: I’m not sure if this was covered in the webinar but… are there any best-practices for fulltext indexes?

That was not covered in this webinar, however, I’ve done a number of presentations regarding Full Text Indexes. For example: Creating Geo Enabled Applications with MySQL 5.6

Q: What would be the limit on index size or number of indexes you can defined per table?

There are no limits on Index size on disk, however, it will be good (performance wise) to have active indexes fit in RAM.

In InnoDB there are a number of index limitations, i.e. a table can contain a maximum of 64 secondary indexes.

Q:  If a table has two columns you would like to sum, can you have that sum indexed as a calculated index? To add to that, can that calculated index have “case when”?

Just to clarify, this is only a feature of MySQL 5.7 (not released yet).

Yes, it is documented now:

CREATE TABLE triangle (
  sidea DOUBLE,
  sideb DOUBLE,
  sidec DOUBLE AS (SQRT(sidea * sidea + sideb * sideb))

Q: I have noticed that you created indexes on columns like DayOfTheWeek with very low cardinality. Shouldn’t that be a bad practice normally?

Yes, you are right! Unless, you are doing queries like “select count(*) from … where DayOfTheWeek = 7” those indexes may not be very useful.

Q: I saw an article that if you don’t specify a primary key upfront mysql / innodb creates one in the background (hidden). Is it different from a primary key itself, if most of the where fields that are used not in the primary / semi primary key? And is there a way to identify the tables with the hidden primary key indexes?

The “hidden” primary key will be 6 bytes, which will also be appended (duplicated) to all secondary keys. You can create an INT primary key auto_increment, which will be smaller (if you do not plan to store more than 4 billion rows). In addition, you will not be able to use the hidden primary key in your queries.

The following query (against information_schema) can be used to find all tables without declared primary key (with “hidden” primary key):

SELECT tables.table_schema, tables.table_name, tables.table_rows
FROM information_schema.tables
  SELECT table_schema, table_name
  FROM information_schema.statistics
  GROUP BY table_schema, table_name, index_name
      CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 END
    ) = COUNT(*)
) puks
ON tables.table_schema = puks.table_schema AND tables.table_name = puks.table_name
WHERE puks.table_name IS NULL
AND tables.table_type = 'BASE TABLE' AND engine='InnoDB'

You may also use mysql.innodb_index_stats table to find rows with the hidden primary key:


mysql> select * from mysql.innodb_index_stats;
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_diff_pfx01 | 96         | 1           | DB_ROW_ID                         |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_leaf_pages | 1          | NULL        | Number of leaf pages in the index |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | size         | 1          | NULL        | Number of pages in the index      |

Q: You are using the alter table to create index, but how does mysql sort the data for creating the index? isn’t it uses temp table for that?

That is a very good question: the behavior of the “alter table … add index” has changed over time. As documented in Overview of Online DDL:

Historically, many DDL operations on InnoDB tables were expensive. Many ALTER TABLE operations worked by creating a new, empty table defined with the requested table options and indexes, then copying the existing rows to the new table one-by-one, updating the indexes as the rows were inserted. After all rows from the original table were copied, the old table was dropped and the copy was renamed with the name of the original table.

MySQL 5.5, and MySQL 5.1 with the InnoDB Plugin, optimized CREATE INDEX and DROP INDEX to avoid the table-copying behavior. That feature was known as Fast Index Creation

When MySQL uses “Fast Index Creation” operation it will create a set of temporary files in MySQL’s tmpdir:

To add a secondary index to an existing table, InnoDB scans the table, and sorts the rows using memory buffers and temporary files in order by the values of the secondary index key columns. The B-tree is then built in key-value order, which is more efficient than inserting rows into an index in random order.

Q: How good is InnoDB deadlocks on 5.7 comparing to 5.6 version. Is that based on parameters setup?

InnoDB deadlocks discussion is outside of the scope of this presentation. Valerii Kravchuk and Nilnandan Joshi did an excellent talk at Percona Live 2015 (slides available): Understanding Innodb Locks and Deadlocks

Q: What is the performance impact of generating a virtual column for a table having 66 Million records and generating the index. And how would you go about it? Do you have any suggestions on how to re organize indexes on the physical disk?

As MySQL 5.7 is not released yet, behavior of the virtual columns may change.  The main question here is: will it be online operations to a) add a virtual column (as this is only metadata change it should be very light operation anyway). b) add index on that virtual column. In the labs released it was not online, however this can change.

Thank you again for attending.

The post Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: Q&A appeared first on Percona Data Performance Blog.


The Q&A: Creating best-in-class backup solutions for your MySQL environment

Percona MySQL and MongoDB WebinarsThank you for attending my July 15 webinar, “Creating Best in Class Backup solutions for your MySQL environment.” Due to the amount of content we discussed and some minor technical difficulties faced near the end of webinar we have decided to cover the final two slides of the presentation along with the questions asked by attendees during the webinar via this blog post.

The slides are available for download. And you can watch the webinar in it’s entirety here.

The final two slides were about our tips for having a good backup and recovery strategy. Lets see the bullet points along with what would have been their explanation during the webinar :

  • Use the three types of backups
    • Binary for full restores, new slaves
      • Binary backups are easy to restore, plus takes the least amount of time to restore. The mean time to recover is mostly bound by the time to transfer backup to the appropriate target server,
    • Logical for partial restores
      • Logical backups, especially when done table-wise come in handy when you’re wanting to restore one or few smaller tables,
    • Binlog for point in time recovery
      • Very often the need is to have Point In Time Recovery, with a Full backup of any type (Logical or Binary) its half the story, we still need the DML statements processed on the server in order to bring it to the latest state, thats where Binary logs (Binlog) backups come into picture.
  • Store on more than one server and off-site
    •  Store your backups on more than one location, what if the backup server goes down ? Considering offsite storages like Amazon S3 and Glacier with weekly or monthly backups retention can be cheaper options.
  • Test your backups!!!!
    • Testing your backups is very important, its always great to know backups are recoverable and not corrupted. Spin off an EC2 instance if you want, copy and restore the backup there, roll-forward a days worth of binlogs just to be sure.
  • Document restore procedures, script them and test them!!!
    • Also when you test your backups, make sure to document the steps to restore the backup to avoid last minute hassle over which commands to use.
  • If taking from a slave run pt-table-checksum
    • Backups are mostly taken from slaves, as such make sure to checksum them regularly, you dont wanna backup inconsistent data. 
  • Configuration files, scripts
    • Data is not the only thing you should be backing up, backup your config files, scripts and user access at a secure location.
  • Do you need to backup everything all days?
    • For very large instances doing a logical backup is a toughie, in such cases evaluate your backup needs, do you want to backup all the tables ? Most of the time smaller tables are the more important ones, and needs partial restore, backup only those.
  • Hardlinking backups can save lot of disk space in some circumstances
    • There are schemas which contains only a few high activity tables, rest of them are probably updated once a week or are updated by an archiver job that runs montly, make sure to hardlink the files with the previous backup, it can save good amount of space in such scenarios.
  • Monitor your Backups
    • Lastly, monitor your backups. You do not want to realize that you’re backup had been failing the whole time. Even a simple email notification from your backup scripts can help reduce the chance of failure.

Now lets try to answer some of the questions asked during the webinar :

Q : –use-memory=2G, is that pretty standard, if we have more more, should we have a higher value?
Usually we would evaluate the value based on size of xtrabackup_logfile (amount of transactions to apply). If you have more free memory feel free to provide it to –use-memory, you dont want to let the memory be a bottleneck in the restore process.

Q : which is the best backup option for a 8Tb DB?
Usually it would depend on what type of data would you have and business requirements for the backups. For eg: a full xtrabackup and later incrementals on the weekdays would be a good idea. Time required for backups play an important role here, backing up to a slow NAS share can be time consuming, and it will make xtrabackup record lot of transactions which will further increase your restore time. Also look into backing up very important medium-small size tables via logical backups.

Q : I’m not sure if this will be covered, but if you have a 3 master-master-master cluster using haproxy, is it recommended to run the backup from the haproxy server or directly on a specific server? Would it be wise to have a 4th server which would be part of the cluster, but not read from to perform the backups?
I am assuming this a Galera cluster setup, in which case you can do backups locally on any of the node by using tools like percona xtrabackup, however the best solution would be spinning off a slave from one of the nodes and running backups there.

Q : With Mudumper, can we strem the data over SSH or netcat to another server? Or would one have to use something like NFS? I’ve used mysqldump and piped it over netcat before.. curious if we can do that with Mydumper ?
Mydumper is similar in nature with other mysql client tools. They can be run remotely (–host option). Which means you can run mydumper from another server to backup from the master or slave. Mydumper can be piped for sure too.

Q : Is Mydumper still maintained. It hasn’t had a release since March of last year?
Indeed, Max Bubenick from Percona is currently maintaining the project. Actually he has added new features to the tool which  makes it more comprehensive and feature rich. He is planning the next release soon, stay tuned for the blog post.

Q : Is MyDumper an opensource ? prepare and restore are same ?
Absolutely. Right now we need to download the source and compile, however very soon we will have packages built for it too. Prepare and Restore are common terminologies used in the backup lingo, in the webinar, Restore means copying back the backup files from its storage location to the destination location, whereas Prepare means applying the transactions to the backup and making it ready to restore.

Q : Is binlog mirroring needed on Galera (PXC)?
It is good idea to keep binlog mirroring. Even though the IST and SST will do its job to join the node, the binlogs could play a role in case you wanted to rollforward a particular schema on a slave or QA instance.

Q : As we know that Percona XtraBackup takes Full & Incremental as well. Like that Does MyDumper helps in taking the incremental backup.?
At this moment we do not have the ability to take Incremental backups with mydumper or with any other logical backup tool. However, Weekly full backups (logical) and daily binlog backups can serve as the same strategy with other Incremental backup solutions, plus they are easy to restore :)

Q : Is it possible to encrypt the output file ? What will be Best methodology to back up data with the database size of 7 to 8 Gb and increses 25 % each day ? what is difference between innobackupex and mydumper ?
Indeed its possible to encrypt the backup files, as a matter of fact, we encrypt backup files with GPG keys before uploading to offsite storage. The best method to backup a 7 to 8G instance would be implementing all 3 types of backup we discussed in the webinar, your scenarios require planning for the future, so its always best to have different solutions available as the data grows. Innobackupex is part of the Percona-Xtrabackup toolkit and is a script which does binary backups of databases, MyDumper on the other hand is a logical backup tool which creates backups as text files.

Q : How can I optimize a MySQL dump of a large database? The main bottleneck while taking MySQL dump backup of a large database is if any table is found to be corrupted then it never goes beyond by skipping this corrupted tables temporary. Can we take database backup of large database without using locking mechanism i.e. Does someone know how to make the backup without locking the tables ? Is there any tools which would faster in restoration and backup technique or how come we use MySQL dump to optimize this kind of issue in future during crash recovery.
Mysqldump is logical backup tool, and as such it executes full table scans to backup the tables and write them down in the output file, hence its very difficult to improve performance of mysqldump (query-wise). Assuming that you’re referring the corruption to MyISAM tables, it is highly recommended you repair them before backing up, also to make sure mysqldump doesnt fail due to error on such a corrupt table try using –force option to mysqldump. If you’re using MyISAM tables first recommendation would be to switch to Innodb, with most of the tables innodb locking can be greatly reduced, actually till a point where the locking is negligible, look into –single-transaction. Faster backup recovery can be achieved with binary backups, look into using Percona Xtrabackup tool, we have comprehensive documentation to get you started.

Hope this was a good webinar and we have answered most of your questions. Stay tuned for more such webinars from Percona.

The post The Q&A: Creating best-in-class backup solutions for your MySQL environment appeared first on Percona Data Performance Blog.


How to create a rock-solid MySQL database backup & recovery strategy

Percona MySQL and MongoDB WebinarsHave you ever wondered what could happen if your MySQL database goes down?

Although it’s evident such a crash will cause downtime – and surely some business impact in terms of revenue – can you do something to reduce this impact?

The simple answer is “yes” by doing regular backups (of course) but are you 100% sure that your current backup strategy will really come through when an outage occurs? And how much precious time will pass (and how much revenue will be lost) before you get your business back online?

I usually think of backups as the step after HA fails. Let’s say we’re in M<>M replication and something occurs that kills the db but the HA can’t save the day. Let’s pretend that the UPS fails and those servers are completely out. You can’t failover; you have to restore data. Backups are a key piece of “Business Continuity.” Also factor in the frequent need to restore data that’s been altered by mistake. No ‘WHERE’ clause or DROP TABLE in prod instead of DEV. These instances are where backups are invaluable.

Let’s take some time and discuss the possible backup strategies with MySQL…  how to make backups efficiently and also examine the different tools that are available. We’ll cover these topics and more during my July 15  webinar: “Creating a Best-in-Class Backup and Recovery System for Your MySQL Environment” starting at 10 a.m. Pacific time.

On a related note, did you know that most online backups are possible with mysqldump and you can save some space on backups by using simple Linux tools? I’ll also cover this so be sure to join me next Wednesday. Oh, and it’s a free webinar, too!

Stay tuned!

The post How to create a rock-solid MySQL database backup & recovery strategy appeared first on MySQL Performance Blog.


Practical MySQL Performance Optimization (July 2 webinar)

Percona MySQL webinarsApplications often become impacted by MySQL performance. Optimization is the obvious solution but where to start? Join me on July 2 at 11 a.m. Pacific and I’ll show you how to optimize MySQL in a practical way – spending time on what gives you the best return. Many of you have heard me talk on this topic before and this updated presentation will get you up to speed on the latest areas to consider when improving MySQL performance.

I’ll be pointing out the most important things you need to get right for each of the following:

  • Hardware
  • MySQL Configuration
  • Schema and Queries
  • Application Architecture

I’ll also share some tools and techniques to focus on the queries most important for your application. At the end of this webinar, aptly titled “Practical MySQL Performance Optimization,” you will know how to optimize MySQL performance in a practical way to ensure our system is “good enough” in terms of responsiveness, efficiency and scalability.

I hope to see you there! Register now to reserve your spot.

The post Practical MySQL Performance Optimization (July 2 webinar) appeared first on MySQL Performance Blog.


Q&A: High availability when using MySQL in the cloud

Percona MySQL webinar followup: Q&ALast week I hosted a webinar on using MySQL in the cloud for High Availability (HA) alongside 451 Research analyst Jason Stamper. You can watch the recording and also download the slides (free) here. Just click the “Register” button at the end of that page.

We had several excellent questions and we didn’t have time to get to several of them in the allotted time. I’m posting them here along with the answers. Feel free to ask follow-up questions in the comments below.

Q: Can the TokuDB engine be used in a PXC environment?

A: No, TokuDB cannot currently be used in a PXC environment, the only supported engine in Percona XtraDB Cluster 5.6 is InnoDB.

Q: With Galera replication (PXC), is balancing the load on each node?

A: No, you need to implement your own load balancing and HA layer between your clients and the Percona XtraDB Cluster server.  Examples mentioned in the webinar include HAProxy and F5 BigIP.

Q: What’s the best version of Percona XtraDB Cluster regarding InnoDB performance?

A: In general for best performance you should be using the latest release of Percona XtraDB Cluster 5.6, which is currently 5.6.24, released on June 3rd, 2015.

Q: Can I redirect my writes in Percona XtraDB Cluster to multiple nodes using the HAProxy? While trying with SysBench I can see write-only goes to first nodes in PXC while reads does goes to multiple nodes.

A: Yes you can configure HAProxy to distribute both reads and writes across all of your nodes in a Percona XtraDB Cluster environment. Perhaps SysBench created only one database connection for all writes, and so haproxy kept those confined to only one host. You may want to experiment with parallel_prepare.lua.

Q: What’s the optimal HA for small datasets (db is less than 10gb)?

A: The optimal HA deployment for small datasets would be dependent on your level of recovery required (tolerance for loss of transactions) and time that you can be in an unavailable state (seconds, minutes, hours?).  Unfortunately there isn’t a single answer to your question, however, if you are interested in further discussion on this point Percona would be happy to coordinate a time to speak.  Please feel free to contact me directly and we can continue the conversation at michael.coburn@percona.com.

 Q: Is there a concept of local master vs. remote master with PXC?

A: No there is no concept of local vs remote master.  All nodes in a Percona XtraDB Cluster can now be classified as Master, regardless of their proximity to the clients.

Q: Are there any concerns when considering AWS RDS or AURORA DB for MySQL HA in the Cloud?

A: Regarding AWS RDS, yes this a good option for MySQL HA in the Cloud.  I unfortunately haven’t worked with Aurora DB that much yet so I don’t have an opinion on it’s suitability for HA in the Cloud.

Q: We tried out PXC awhile back and it used to lock everything whenever any ddl was done. Has that changed?

A: We would have to look at the specifics of your environment, however, there have been numerous improvements in the 1½ years of development since Percona XtraDB Cluster went Generally Available (GA) on January 30th, 2014 in version 5.6.15.

Q: Is using the arbitrator a must?

A: No the arbitrator role via the garbd daemon is generally only used when operating in a minimal environment of two nodes that contain the data and you need a third node for quorum but don’t want to store the data a third time.

Q: Can we do a cluster across different zones?

A: Yes you can. However be aware that the latency incurred for all cluster certification operations will be impacted by the round trip time between nodes.

Q: Does PXC also support the MyISAM database?

A: No, Percona XtraDB Cluster does not support any storage engine other than InnoDB as of PXC 5.6.

Q: How do load balancers affect the throughput in a Galera-based setup given that the write would be limited by the slowest node?

A: Load balancers will introduce some measure of additional latency in the form of CPU time in the load balancer layer as it evaluates its own ruleset, and also in network time due to additional hop via load balancer.  Otherwise there should be no perceptible difference in the write throughput of a Percona XtraDB Cluster with and without a load balancer as it relates to the “slowest node” factor.

Q: Have you used MaxScale yet? If so, what are your thoughts?

A: Unfortunately I haven’t used MaxScale however Yves Trudeau, Percona Principal Architect, has recently written about MaxScale in this blog post.

Q: How do you configure timeout and maintain persistent connection to HAProxy?

A: I would encourage you to refer to the HAProxy Documentation.

The post Q&A: High availability when using MySQL in the cloud appeared first on MySQL Performance Blog.


Considering Sharding with MySQL? Join my April 22 webinar. Questions welcome!

Sharding with MySQLMySQL sharding is one of the most used and surely the most abused MySQL scaling technology. My April 2 Dzone article, “To Shard, or Not to Shard,” proved there is indeed quite an interest in this topic.

As such, I’m hosting a live webinar tomorrow (April 22) that will shed light on questions about sharding with MySQL. It’s titled: To Shard or Not to Shard That is the Question!

I’ll be answering questions such as:

  • Is sharding right for your application or should you use other scaling technologies?
  • If you’re sharding, what things do you need to consider and which questions do you need to have answered?
  • What kind of specific technologies can assist you with sharding?

I hope you can make it for this April 22 webinar. It starts at 10 a.m. Pacific time. Please register now and bring your questions, as sharing them with me and the other attendees is half of the fun of live webinars. :)

Or if you prefer, share your questions about sharding with MySQL in the comments section below, and I’ll do my best to answer them. I’ll be writing a followup post that will include all questions and my answers soon. A recording of this webinar along with my slides will also be available here afterwards.

The post Considering Sharding with MySQL? Join my April 22 webinar. Questions welcome! appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com