Oct
16
2020
--

5 Things DBAs Should Know Before Deploying MongoDB


MongoDB is one of the most popular databases and one of the easiest NoSQL databases for a DBA to set up. Oftentimes, relational DBAs will inherit MongoDB databases without knowing all there is to know about MongoDB. I encourage you to check out our Percona blogs, as we have lots of great information for those both new to and experienced with MongoDB. Don’t let the ease of installing MongoDB fool you; there are things you need to consider before deploying it. Here are five things DBAs should know before deploying MongoDB in production.

1) Enable Authentication and Authorization

Security is of utmost importance to your database. Gone are the days when security was disabled by default for MongoDB, but it’s still easy to start MongoDB without security. Without security, and with your database bound to a public IP, anyone can connect to your database and steal your data. By adding a few important MongoDB security options to your configuration file, you can ensure that your data is protected. You can also configure MongoDB to use native LDAP for authentication. Setting up authentication and authorization is one of the simplest ways to ensure that your MongoDB database is secure. The most important configuration option is turning on authorization, which enables users and roles and requires you to authenticate and hold the proper roles to access your data.

security:
  authorization: enabled
  keyFile: /path/to/your.keyfile
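Once authorization is enabled, you will need at least one administrative user to manage the rest. A minimal sketch created from the mongo shell (the user name, password, and role list here are placeholders, not a recommendation):

use admin
db.createUser({
  user: "dbAdmin",                  // placeholder name
  pwd:  "useAStrongSecretHere",     // placeholder password
  roles: [
    { role: "userAdminAnyDatabase", db: "admin" },
    { role: "clusterAdmin",         db: "admin" }
  ]
})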

 

2) Deploy Replica Sets at a Minimum

A great feature of MongoDB is how easy it is to set up replication. Not only is it easy, but it also gives you built-in high availability. When your primary crashes or goes down for maintenance, MongoDB automatically fails over to one of your secondaries and your application is back online without any intervention. Replica sets also allow you to offload reads, as long as your application can tolerate eventual consistency, and to offload your backups to a secondary. Another important point about replica sets is that only one member can accept client writes at a time; if you want multiple nodes to accept writes, you should look into sharded clusters. There are some tradeoffs to be aware of as well: when reading from a secondary, depending on your read preference, you may read stale data or data that hasn’t been acknowledged by all members of the replica set.
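As a rough sketch of what that looks like (the replica set name and host names below are hypothetical), each member runs with the same replSetName and the set is initiated once from any member:

# mongod.conf on every member
replication:
  replSetName: rs0

# then, from the mongo shell on one member:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1.example.com:27017" },
    { _id: 1, host: "mongo2.example.com:27017" },
    { _id: 2, host: "mongo3.example.com:27017" }
  ]
})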

3) Backups

Backups are just as important with MongoDB as they are with any other database. There are tools like mongodump, mongoexport, Percona Backup for MongoDB, and Ops Manager (Enterprise Edition only) that support point-in-time recovery, oplog backups, hot backups, and full and incremental backups. As mentioned, backups can be run from any node in your replica set; the best practice is to run them from a secondary node so you don’t put unnecessary pressure on your primary. In addition to the above methods, you can also take snapshots of your data, as long as you pause writes to the node you’re snapshotting before freezing the file system, to ensure a consistent snapshot of your MongoDB database.
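As an illustration (host names, credentials, and paths are placeholders), a logical backup taken from a secondary, with the oplog included for point-in-time consistency, might look like this:

mongodump \
  --host "rs0/mongo1.example.com:27017,mongo2.example.com:27017" \
  --readPreference secondary \
  --oplog --gzip \
  --username backupUser --authenticationDatabase admin \
  --out /backups/$(date +%F)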

4) Indexes Are Just as Important

Indexes are just as important for MongoDB as they are for relational databases. Just as with relational databases, indexes speed up your queries by reducing the amount of data the query has to examine and return. Indexes also help your working set fit into memory more easily. Pay attention to your query patterns and look for full collection scans in your log files to find queries that could benefit from an index. In addition, make sure you follow the ESR rule when creating your indexes and examining your query patterns. Just like in relational databases, indexes aren’t free: MongoDB has to update indexes on every insert, update, or delete, so make sure your indexes are being used and aren’t unnecessarily slowing down your writes.
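For example, the ESR (Equality, Sort, Range) rule says compound index keys should list equality fields first, then sort fields, then range fields. A sketch against a hypothetical orders collection queried by customer, sorted by date, and filtered by amount:

// equality on customerId, sort on orderDate, range on amount
db.orders.createIndex({ customerId: 1, orderDate: 1, amount: 1 })

// confirm the index is used and no COLLSCAN stage appears
db.orders.find({ customerId: 42, amount: { $gt: 100 } })
         .sort({ orderDate: 1 })
         .explain("executionStats")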

5) Whenever Possible, Working Set < RAM

As with any database, fitting your data into RAM allows for faster reads than reading from disk, and MongoDB is no different. Knowing how much data MongoDB has to read in for your queries can help you determine how much RAM you should allocate to your database. For example, if your queries require a working set of 50 GB and you only have 32 GB of RAM allocated to the WiredTiger cache, MongoDB will constantly read more of the working set in from disk and page other data out to make room for it; this leads to slower query performance and a system that is constantly using all of the memory available to its cache. Conversely, if you have a 50 GB working set and 100 GB of RAM for your WiredTiger cache, the working set fits completely into memory, and as long as other data doesn’t push it out of the cache, MongoDB should serve reads much faster because everything is in memory.

Avoiding reads from disk is not always possible or practical. To assess how many reads you’re doing from disk, you can use commands like db.serverStatus() or measure it with tools like Percona Monitoring and Management. Be sure to use the MongoDB-recommended XFS filesystem and, whenever possible, solid-state drives (SSDs) to speed up disk performance. Because of typical database workloads, you should also provision enough IOPS for your database servers to avoid disk bottlenecks as much as possible. Systems with multiple cores also tend to work better, as they allow faster checkpointing for the WiredTiger storage engine, but you will still be limited by how fast your disks can go.
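For a quick look at the WiredTiger cache from the mongo shell (a sketch; the statistic names below are the ones recent MongoDB versions expose in db.serverStatus()):

var cache = db.serverStatus().wiredTiger.cache
cache["maximum bytes configured"]       // configured cache size
cache["bytes currently in the cache"]   // how full the cache is
cache["pages read into cache"]          // grows quickly when the working set doesn't fit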

Takeaways:

While MongoDB is easy to get started with and has a low barrier to entry, just like any other database there are key things that you, as a DBA, should consider before deploying it. We’ve covered enabling authentication and authorization to ensure you have a secure deployment, deploying replica sets to ensure high availability, backups to keep your data safe, the importance of indexes to your query performance, and having sufficient memory to cover your queries (and what to do if you don’t). We hope this gives you a better idea of how to deploy MongoDB and how to support it. Thanks for reading!

Aug
13
2019
--

SET PERSIST in MySQL: A Small Thing for Setting System Variable Values


Setting correct system variable values is an essential step in getting the correct server behavior for your workload.
In MySQL, we have many system variables that can be changed at runtime, and most of them can be set at the session or global level.

In the past, to change the value of a system variable at the global level, users needed SUPER privileges. Once a system variable is modified globally, the server changes that behavior for new sessions and, obviously, at the global scope.

For instance, one of the most commonly adjusted variables is probably max_connections.

If you have max_connections=100 in your my.cnf, or as the default value, and during the day you as the DBA notice that it is not enough, it is easy to allow more connections on the fly with the command:

SET GLOBAL MAX_CONNECTIONS=500;

This will do the work.

But here is the issue. We have changed a GLOBAL value that applies to the whole server, but this change is ephemeral: if the server restarts, the setting is lost. In the past, I have seen countless servers with different configurations between my.cnf and the current server settings. To prevent this, or at least keep it under control, good DBAs developed scripts to check if and where differences exist and to fix them. The main issue is that very often we forget to update the configuration file while making the change, or we skip it on purpose to do some “fine-tuning first” and then forget afterward.

What’s new in MySQL 8 about that?

Well, we have a couple of small changes. First of all, the privileges: as of MySQL 8, users can have either SYSTEM_VARIABLES_ADMIN or SUPER in order to modify GLOBAL system variables. Then there is the ability to have GLOBAL changes to a variable PERSIST on disk, and, finally, to know who made the change and when.
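For example (the account name is hypothetical), you can now grant only the dynamic privilege instead of full SUPER:

CREATE USER 'tuning_dba'@'localhost' IDENTIFIED BY 'a_strong_password';
GRANT SYSTEM_VARIABLES_ADMIN ON *.* TO 'tuning_dba'@'localhost';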

The new option for the SET command is PERSIST.

So, if I have:

(root@localhost) [(none)]>show global variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 1500  |
+-----------------+-------+

and I want to change the value of max_connections and be sure the value is reloaded at restart, I will do this:

(root@localhost) [(none)]>set PERSIST max_connections=150;

(root@localhost) [(none)]>show global variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 150   |
+-----------------+-------+

With the usage of PERSIST, MySQL will write the information related to:

– key (variable name)
– value
– timestamp (including microseconds)
– user
– host

The information is written to a new file in the data directory, mysqld-auto.cnf. The file is in JSON format and will contain the following:

{ "Version" : 1 , "mysql_server" : {
 "max_connections" : { 
"Value" : "150" , "Metadata" : {
 "Timestamp" : 1565363808318848 , "User" : "root" , "Host" : "localhost" 
} } } }

The information is also in Performance Schema:

select a.VARIABLE_NAME,b.VARIABLE_value ,SET_TIME,SET_USER,SET_HOST  
    from performance_schema.variables_info a 
        join performance_schema.global_variables b 
        on a.VARIABLE_NAME=b.VARIABLE_NAME  
    where b.VARIABLE_NAME like 'max_connections'\G

*************************** 1. row ***************************
 VARIABLE_NAME: max_connections
VARIABLE_value: 150
      SET_TIME: 2019-08-09 11:16:48.318989
      SET_USER: root
      SET_HOST: localhost

As you see, it reports who did the change, from where, when, and the value. Unfortunately, there is no history here, but this can be easily implemented.
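A minimal sketch of such a history (the dba schema, table, and event names are hypothetical, and the Event Scheduler must be enabled) could simply snapshot performance_schema.variables_info on a schedule:

CREATE TABLE dba.variables_info_history (
  variable_name  VARCHAR(64)   NOT NULL,
  variable_value VARCHAR(1024),
  set_time       TIMESTAMP(6)  NOT NULL,
  set_user       VARCHAR(32),
  set_host       VARCHAR(255),
  PRIMARY KEY (variable_name, set_time)
);

CREATE EVENT dba.capture_variable_changes
ON SCHEDULE EVERY 5 MINUTE DO
  INSERT IGNORE INTO dba.variables_info_history
    (variable_name, variable_value, set_time, set_user, set_host)
  SELECT a.VARIABLE_NAME, b.VARIABLE_VALUE, a.SET_TIME, a.SET_USER, a.SET_HOST
    FROM performance_schema.variables_info a
    JOIN performance_schema.global_variables b USING (VARIABLE_NAME)
   WHERE a.SET_TIME IS NOT NULL;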

To clear the PERSIST settings, run RESET PERSIST, and all the persisted settings will be removed.

If you have:

{ "Version" : 1 , "mysql_server" : {
  "max_connections" : { "Value" : "151" , "Metadata" : { "Timestamp" : 1565367524946371 , "User" : "root" , "Host" : "localhost" } } , 
  "wait_timeout" : { "Value" : "151" , "Metadata" : { "Timestamp" : 1565368154036275 , "User" : "root" , "Host" : "localhost" } } 
} }

RESET PERSIST will do:

{ "Version" : 1 , "mysql_server" : {  }

That removes ALL THE SETTINGS, not just one.

Anyhow, why is this a good thing to have?

First, because we have no excuse now when we change a variable: we have all the tools needed to make sure it will still be applied at startup, if that is the intention of the change.

Second, it is good because storing the information in a file, and not only exposing it through Performance Schema, allows us to include that information in any automation tool we have, whether we decide to take action on it or just keep track of it, for example comparing it with my.cnf and fixing discrepancies automatically, even while the service is down or when cloning. On this, let me say that while you CAN change the value in the mysqld-auto.cnf file and have the server use it as the valid value at restart, this is not recommended; instead, please fix my.cnf and remove the information related to PERSIST.
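For instance, a small sketch of extracting the persisted settings for comparison, assuming the version 1 file format shown above, that jq is installed, and that the data directory is /var/lib/mysql:

$ jq -r '.mysql_server | to_entries[] | "\(.key) = \(.value.Value)"' /var/lib/mysql/mysqld-auto.cnf
max_connections = 151
wait_timeout = 151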

Touching that file is also dangerous, because if you do something silly like removing a double quote, or otherwise break the JSON format, the server will not start, and there will be NO error in the log.

{ "Version" : 1 , "mysql_server" : { "wait_timeout": { "Value : "150" , "Metadata" : { "Timestamp" : 1565455891278414, "User" : "root" , "Host" : "localhost" } } } }
                                                           ^^^ missing double quote

tusa@tusa-dev:/opt/mysql_templates/mysql-8.0.17futex$ ps aux|grep 8113
tusa      8119  0.0  0.0  14224   896 pts/1    S+   12:54   0:00 grep --color=auto 8113
[1]+  Exit 1                  bin/mysqld --defaults-file=./my.cnf

I have opened a bug for this (https://bugs.mysql.com/bug.php?id=96501).

A short deep dive into the code (you can skip it if you don’t care)

The new feature is handled in the files <source>/sql/persisted_variable.(h/cc). The new structure dealing with the PERSIST actions is:

struct st_persist_var {
  std::string key;
  std::string value;
  ulonglong timestamp;
  std::string user;
  std::string host;
  bool is_null;
  st_persist_var();
  st_persist_var(THD *thd);
  st_persist_var(const std::string key, const std::string value,
                 const ulonglong timestamp, const std::string user,
                 const std::string host, const bool is_null);
};

And the main steps are in the constructors of st_persist_var. It should be noted that when creating the timestamp, the code generates a value that is NOT fully compatible with the MySQL function FROM_UNIXTIME.

The code assigning the timestamp value also includes the microseconds from the timeval (tv) structure:

st_persist_var::st_persist_var(THD *thd) {
  timeval tv = thd->query_start_timeval_trunc(DATETIME_MAX_DECIMALS);
  timestamp = tv.tv_sec * 1000000ULL + tv.tv_usec;
  user = thd->security_context()->user().str;
  host = thd->security_context()->host().str;
  is_null = false;
}

Where:

tv.tv_sec  = 1565267482
tv.tv_usec = 692276

will generate:

timestamp = 1565267482692276

This TIMESTAMP is not valid in MySQL and cannot be read by the time functions, while the part related to tv.tv_sec = 1565267482 works perfectly.

(root@localhost) [(none)]>select FROM_UNIXTIME(1565267482);
+---------------------------+
| FROM_UNIXTIME(1565267482) |
+---------------------------+
| 2019-08-08 08:31:22       |
+---------------------------+

(root@localhost) [(none)]>select FROM_UNIXTIME(1565267482692276);
+---------------------------------+
| FROM_UNIXTIME(1565267482692276) |
+---------------------------------+
| NULL                            |
+---------------------------------+

This is because a timestamp with microseconds is formatted differently in MySQL:

PERSIST_code = 1565267482692276
MySQL        = 1565267482.692276

If I run: select FROM_UNIXTIME(1565267482.692276);

I get the right result:

+----------------------------------+
| FROM_UNIXTIME(1565267482.692276) |
+----------------------------------+
| 2019-08-08 08:31:22.692276       |
+----------------------------------+

And of course, I can use the trick:

select FROM_UNIXTIME(1565267482692276/1000000);
+-----------------------------------------+
| FROM_UNIXTIME(1565267482692276/1000000) |
+-----------------------------------------+
| 2019-08-08 08:31:22.6922                |
+-----------------------------------------+

Well, that’s all for the behind-the-scenes info. Keep it in mind if you want to deal with the value coming from the JSON file.

SET PERSIST Conclusion

Sometimes the small things can be better than the HUGE shiny things. Many times I have seen DBAs in trouble because they did not have this simple feature in MySQL, and many a MySQL server fails to start or doesn’t behave as expected after a restart because of it. Given that, I welcome SET PERSIST, and I am sure that the people who manage thousands of servers, with different workloads and automation in place, will see this as a good thing as well.

References:

https://dev.mysql.com/doc/refman/8.0/en/persisted-system-variables.html
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_timestamp
https://lefred.be/content/where-does-my-mysql-configuration-variable-value-come-from/
https://lefred.be/content/what-configuration-settings-did-i-change-on-my-mysql-server/

Jan
22
2018
--

Webinar Wednesday, January 24, 2018: Differences between MariaDB and MySQL


Join Percona’s Chief Evangelist, Colin Charles, as he presents Differences Between MariaDB and MySQL on Wednesday, January 24, 2018, at 7:00 am PDT / 10:00 am EDT (UTC-7).

Tags: MariaDB, MySQL, Percona Server for MySQL, DBA, SysAdmin, DevOps
Experience Level: Novice

MariaDB and MySQL. Are they syntactically similar? Where do these two query languages differ? Why would I use one over the other?

MariaDB is on the path of gradually diverging from MySQL. One obvious example is the internal data dictionary currently under development for MySQL 8.0. This is a major change to the way metadata is stored and used within the server. MariaDB doesn’t have an equivalent feature. Implementing this feature could mark the end of datafile-level compatibility between MySQL and MariaDB.

There are also non-technical differences between MySQL and MariaDB, including:

  • Licensing: MySQL offers their code as open-source under the GPL, and provides the option of non-GPL commercial distribution in the form of MySQL Enterprise. MariaDB can only use the GPL because they derive their work from the MySQL source code under the terms of that license.
  • Support services: Oracle provides technical support, training, certification and consulting for MySQL, while MariaDB has their own support services. Some people will prefer working with smaller companies, as traditionally it affords them more leverage as a customer.
  • Community contributions: MariaDB touts the fact that they accept more community contributions than Oracle. Part of the reason for this disparity is that developers like to contribute features, bug fixes and other code without a lot of paperwork overhead (and they complain about the Oracle Contributor Agreement). However, MariaDB has its own MariaDB Contributor Agreement — which more or less serves the same purpose.

Colin will take a look at some of the differences between MariaDB and MySQL and help answer some of the common questions our Database Performance Experts get about the two databases.

Register for the webinar now.

Colin Charles, Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team for MariaDB Server in 2009, has worked on MySQL since 2005, and has been a MySQL user since 2000. Before joining MySQL, Colin worked actively on the Fedora and OpenOffice.org projects. He’s well-known within many open source communities and speaks on the conference circuit.

Jan
16
2018
--

Webinar January 18, 2018: MySQL Troubleshooting and Performance Optimization with Percona Monitoring and Management (PMM) Part 2


Join Percona’s Product Manager Michael Coburn as he presents MySQL Troubleshooting and Performance Optimization with Percona Monitoring and Management (PMM) Part 2 on Thursday, January 18, 2018, at 11:00 am PST / 2:00 pm EST (UTC-8).

Tags: Percona Monitoring and Management, PMM, Monitoring, MySQL, Performance, Optimization, DBA, SysAdmin, DevOps
Experience Level: Expert

Optimizing MySQL performance and troubleshooting MySQL problems are two of the most critical and challenging tasks for MySQL DBAs. The databases powering your applications need to handle heavy traffic loads while remaining responsive and stable, so that you can deliver an excellent user experience. Furthermore, DBAs are also expected to find cost-efficient ways of solving these issues.

In this webinar — the second part of a two-part series — Michael discusses how you can optimize and troubleshoot MySQL performance and demonstrates how Percona Monitoring and Management (PMM) enables you to solve these challenges using free and open source software. We will look at specific, common MySQL problems and review the essential components in PMM that allow you to diagnose and resolve them.

By the end of this webinar, you will have a better understanding of how you can troubleshoot MySQL problems in your database.

Register for the webinar now.

Michael Coburn, Product Manager

Michael joined Percona as a Consultant in 2012 and progressed through various roles, including Managing Consultant, Principal Architect, Technical Account Manager, and Technical Support Engineer. He now leads product management for Percona Monitoring and Management.

Sep
14
2016
--

MySQL Default Configuration Changes between 5.6 and 5.7


In this blog post, we’ll discuss the MySQL default configuration changes between 5.6 and 5.7.

MySQL 5.7 has added a variety of new features that might excite you. However, there are also changes in the current variables that you might have overlooked. MySQL 5.7 updated nearly 40 of the defaults from 5.6. Some of the changes could severely impact your server performance, while others might go unnoticed. I’m going to go over each of the changes and what they mean.

The change that can have the largest impact on your server is likely sync_binlog. My colleague, Roel Van de Paar, wrote about this impact in depth in another blog post, so I won’t go into much detail. sync_binlog controls how MySQL flushes the binlog to disk. The new value of 1 forces MySQL to write every transaction to disk prior to committing. Previously, MySQL did not force flushing of the binlog and trusted the OS to decide when to flush it.

(https://www.percona.com/blog/2016/06/03/binary-logs-make-mysql-5-7-slower-than-5-6/)

Variables 5.6.29 5.7.11
sync_binlog 0 1

 

The performance schema variables stand out as unusual, as many have a default of -1. MySQL uses this notation to call out variables that are automatically adjusted. The only performance schema variable change that doesn’t adjust itself is performance_schema_max_file_classes, the number of file instruments used for the performance schema. It’s unlikely you will ever need to alter it.

Variables 5.6.29 5.7.11
performance_schema_accounts_size 100 -1
performance_schema_hosts_size 100 -1
performance_schema_max_cond_instances 3504 -1
performance_schema_max_file_classes 50 80
performance_schema_max_file_instances 7693 -1
performance_schema_max_mutex_instances 15906 -1
performance_schema_max_rwlock_instances 9102 -1
performance_schema_max_socket_instances 322 -1
performance_schema_max_statement_classes 168 -1
performance_schema_max_table_handles 4000 -1
performance_schema_max_table_instances 12500 -1
performance_schema_max_thread_instances 402 -1
performance_schema_setup_actors_size 100 -1
performance_schema_setup_objects_size 100 -1
performance_schema_users_size 100 -1

 

The optimizer_switch and sql_mode variables have a variety of options that can each be enabled and cause a slightly different action to occur. MySQL 5.7 enables new flags for both variables, increasing their sensitivity and security. These additions make the optimizer more efficient in determining how to correctly interpret your queries.

Three flags have been added to the optimizer_switch defaults in MySQL 5.7, with the intent of increasing the optimizer’s efficiency: duplicateweedout=on, condition_fanout_filter=on, and derived_merge=on.

duplicateweedout is part of the optimizer’s semi-join materialization strategy, condition_fanout_filter controls the use of condition filtering, and derived_merge controls the merging of derived tables and views into the outer query block.

https://dev.mysql.com/doc/refman/5.7/en/switchable-optimizations.html

http://www.chriscalender.com/tag/condition_fanout_filter/

The additions to sql_mode do not affect performance directly; however, they will improve the way you write queries (which can increase performance). One notable change is that all fields in a SELECT … GROUP BY statement must either be aggregated using a function like SUM(), or appear in the GROUP BY clause. MySQL will no longer assume how they should be grouped, and will raise an error if a field is missing.
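For example (using a hypothetical orders table), a query that 5.6 accepted silently now fails under ONLY_FULL_GROUP_BY unless the non-aggregated column is grouped or wrapped in something like ANY_VALUE():

-- Rejected in 5.7: customer_name is neither aggregated nor in the GROUP BY
SELECT customer_id, customer_name, SUM(amount)
  FROM orders
 GROUP BY customer_id;

-- Accepted: every selected column is either grouped or aggregated
SELECT customer_id, ANY_VALUE(customer_name), SUM(amount)
  FROM orders
 GROUP BY customer_id;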

STRICT_TRANS_TABLES has a different effect depending on whether it is used with a transactional table. Statements are rolled back on transactional tables if there is an invalid or missing value in a data change statement. For tables that do not use a transactional engine, MySQL’s behavior depends on the row in which the invalid data occurs. If it is the first row, then the behavior matches that of a transactional engine. If not, the invalid value is converted to the closest valid value, or to the default value for the column; a warning is generated, but the data is still inserted.

optimizer_switch
  5.6.29: index_merge=on, index_merge_union=on, index_merge_sort_union=on,
          index_merge_intersection=on, engine_condition_pushdown=on,
          index_condition_pushdown=on, mrr=on, mrr_cost_based=on,
          block_nested_loop=on, batched_key_access=off, materialization=on,
          semijoin=on, loosescan=on, firstmatch=on,
          subquery_materialization_cost_based=on, use_index_extensions=on
  5.7.11: the same list plus duplicateweedout=on, condition_fanout_filter=on,
          derived_merge=on

sql_mode
  5.6.29: NO_ENGINE_SUBSTITUTION
  5.7.11: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE,
          NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER,
          NO_ENGINE_SUBSTITUTION

 

There have been a couple of variable changes surrounding the binlog. MySQL 5.7 updated binlog_error_action so that if there is an error while writing to the binlog, the server aborts. These kinds of incidents are rare, but they have a big impact on your application and replication when they occur, as the server will not perform any further transactions until the problem is corrected.

The binlog default format was changed to ROW instead of the previously used STATEMENT format. The statement format writes less data to the logs; however, there are many statements that cannot be replicated correctly, including “update … order by rand()”. These non-deterministic statements could produce different result sets on the master and the slave. The ROW format writes more data to the binlog, but the information is more accurate and ensures correct replication.

MySQL has begun to focus on replication using GTIDs instead of the traditional binlog position. When MySQL is started, or restarted, it must generate a list of the previously used GTIDs. If binlog_gtid_simple_recovery is OFF, or FALSE, the server starts with the newest binlog and iterates backwards through the binlog files searching for a previous_gtids_log_event. With it set to ON, or TRUE, the server only reviews the newest and oldest binlog files and computes the used GTIDs.

binlog_gtid_simple_recovery makes it much faster to identify the binlogs, especially if there are a large number of binary logs without GTID events. However, in specific cases it could cause gtid_executed and gtid_purged to be populated incorrectly. This should only happen when the newest binary log was generated by MySQL 5.7.5 or older, or if a SET GTID_PURGED statement was run on MySQL earlier than version 5.7.7.

Another replication-related variable updated in 5.7 is slave_net_timeout, which has been lowered to only 60 seconds. Previously, the replication thread would not consider its connection to the master broken until the problem had existed for at least an hour. This change informs you much sooner if there is a connectivity problem, and ensures replication does not fall significantly behind before informing you of an issue.

Variables 5.6.29 5.7.11
binlog_error_action IGNORE_ERROR ABORT_SERVER
binlog_format STATEMENT ROW
binlog_gtid_simple_recovery OFF ON
slave_net_timeout 3600 60

 

InnoDB buffer pool changes impact how long starting and stopping the server takes. innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup are used together to save you from having to “warm up” the server. As the names suggest, they cause a buffer pool dump at shutdown and a load at startup. Even though you might have a buffer pool of hundreds of gigabytes, you will not need to reserve the same amount of space on disk, as the data written is much smaller: the only things written to disk are the information necessary to locate the actual data, the tablespace and page IDs.

Variables 5.6.29 5.7.11
innodb_buffer_pool_dump_at_shutdown OFF ON
innodb_buffer_pool_load_at_startup OFF ON
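If you want to tune this behavior, the relevant my.cnf entries look like the following sketch; innodb_buffer_pool_dump_pct (added in 5.7) controls what percentage of the most recently used pages gets dumped, and 25 is the 5.7 default:

[mysqld]
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup  = ON
innodb_buffer_pool_dump_pct         = 25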

 

MySQL has now made some of the options implemented in InnoDB during 5.6 and earlier into its defaults. InnoDB’s checksum algorithm was updated from innodb to crc32, allowing you to benefit from the hardware acceleration recent Intel CPUs have available.

The Barracuda file format has been available since 5.5, but had many improvements in 5.6. It is now the default in 5.7. The Barracuda format allows you to use the compressed and dynamic row formats. My colleague Alexey has written about the utilization of the compressed format and the results he saw when optimizing a server: https://www.percona.com/blog/2008/04/23/real-life-use-case-for-barracuda-innodb-file-format/

innodb_large_prefix defaults to “on”, and when combined with the Barracuda file format it allows index key prefixes of up to 3072 bytes. This allows larger text fields to benefit from an index. If it is set to “off”, or the row format is not dynamic or compressed, any index prefix larger than 767 bytes is silently truncated. MySQL also introduced larger InnoDB page sizes (32k and 64k) in 5.7.6.
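As a hedged illustration (the table and column names are made up), a utf8mb4 index wider than 767 bytes only works with innodb_large_prefix=ON and a DYNAMIC or COMPRESSED row format:

CREATE TABLE articles (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,
  slug VARCHAR(500) NOT NULL,   -- 500 chars * 4 bytes (utf8mb4) = 2000-byte key
  KEY idx_slug (slug)
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC DEFAULT CHARSET=utf8mb4;

-- With innodb_large_prefix=OFF the key part would be limited to 767 bytes
-- (silently truncated, or rejected under innodb_strict_mode).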

MySQL 5.7 increased the innodb_log_buffer_size value as well. InnoDB uses the log buffer to buffer transaction changes before they are written to the redo log files on disk. The increased size allows the log to be flushed to disk less often, reducing IO, and allows larger transactions to fit in the log buffer without having to write to disk before committing.

InnoDB’s purge operations were moved to a background thread back in MySQL 5.5 in order to reduce thread contention. MySQL 5.7 increases the default to four purge threads, but this can be changed to anywhere from 1 to 32 threads.

MySQL 5.7 now enables innodb_strict_mode by default, turning some of the warnings into errors. Syntax errors in CREATE TABLE, ALTER TABLE, CREATE INDEX, and OPTIMIZE TABLE statements generate errors and force the user to correct them before running. It also enables a record size check, ensuring that INSERT or UPDATE statements will not fail due to the record being too large for the selected page size.

Variables 5.6.29 5.7.11
innodb_checksum_algorithm innodb crc32
innodb_file_format Antelope Barracuda
innodb_file_format_max Antelope Barracuda
innodb_large_prefix OFF ON
innodb_log_buffer_size 8388608 16777216
innodb_purge_threads 1 4
innodb_strict_mode OFF ON

 

MySQL has increased the number of times the optimizer dives into the index when evaluating equality ranges. If the optimizer needs to dive into the index more than eq_range_index_dive_limit times (defaulted to 200 in MySQL 5.7, up from 10 in 5.6), it falls back to the existing index statistics. You can adjust this limit from 0, eliminating index dives, up to 4294967295. This can have a significant impact on query performance, since the table statistics are based on the cardinality of a random sample; relying on them can cause the optimizer to estimate a much larger set of rows to review than it would with the index dives, changing the method the optimizer chooses to execute the query.

MySQL 5.7 deprecated log_warnings. The new preference is to utilize log_error_verbosity. By default this is set to 3, which logs errors, warnings, and notes to the error log. You can lower it to 1 (log errors only) or 2 (log errors and warnings). When consulting the error log, verbosity is often a good thing; however, it increases the IO and disk space needed for the error log.

Variables 5.6.29 5.7.11
eq_range_index_dive_limit 10 200
log_warnings 1 2

 

There are many changes to the defaults in 5.7, but many of these options have existed for a long time and should be familiar to users. Many people were already using these settings, and making them the defaults is a good way to push MySQL forward. Remember, however, that you can still edit these variables and configure them to ensure that your server works its best for your data.

May
18
2016
--

Webinar Thursday May 19, 2016: MongoDB administration for MySQL DBA


Please join Alexander Rubin, Percona Principal Consultant, for his webinar MongoDB administration for MySQL DBA on Thursday, May 19 at 10 am PDT (UTC-7).

If you are a MySQL DBA and want to learn MongoDB quickly, this webinar is for you. MySQL and MongoDB share similar concepts, so it will not be hard to get up to speed with MongoDB.

In this talk I will explain the following MongoDB administration concepts:
  • Day to day operations for MongoDB
  • Storage engines and differences with MySQL storage engines
  • Databases, collections and documents
  • Replication in MongoDB and the difference with MySQL replication
  • Sharding in MongoDB
  • Backups in MongoDB

In the webinar, each slide will show a MySQL concept or operation (on the left) and the corresponding MongoDB one (on the right).

Register here.

Alexander Rubin, Principal Consultant

Alexander joined Percona in 2013. He has worked with MySQL since 2000 as a DBA and application developer. Before joining Percona, he was a MySQL principal consultant for over seven years (starting with MySQL AB in 2006, then Sun Microsystems and then Oracle). He helped many customers design large, scalable and highly available MySQL systems and optimize MySQL performance. Alexander also helped customers design Big Data stores with Apache Hadoop and related technologies.

Oct
16
2015
--

MySQL 101 Videos – A Must Have Starter Kit for a MySQL DBA

The MySQL 101 track was introduced at the 2015 Percona Live MySQL Conference & Expo in Santa Clara, California. This track was tailored specifically for software developers and other specialists with technical backgrounds wanting to acquire the basic skills needed for a MySQL DBA role. The Percona Live Conference Committee saw an increase in demand for people with MySQL DBA skills to help with the day-to-day management of MySQL databases. In response to this demand, a MySQL crash course for developers, systems administrators and other technical resources was added to the official Percona Live conference program. This 2-day MySQL 101 track, led by Percona MySQL technical experts, took attendees through the fundamentals of MySQL DBA tools and techniques.

The goal of this educational course was to send attendees back ready to quickly and correctly take care of the day-to-day and long-term management of their MySQL environment.

“You send us developers and admins, and we’ll send you back MySQL DBAs!”
Completion of this course enabled the attendees to combine their daily non-DBA work with practical performance optimization techniques to resolve problems in environments not managed by a dedicated MySQL DBA.

Due to the overall success of the MySQL 101 track and the continued demand for this type of education, Percona has packaged these courses together in the official “Percona Live MySQL 101 Series,” now available for purchase.

You can buy all 24 sessions together at a bundled rate or purchase each video individually based on interest and needs. Don’t miss your opportunity to learn from some of the top MySQL experts.
Complete set of 24 Videos = $599.99 USD
Individual Videos = $29.99 USD


Dec
23
2014
--

File carving methods for the MySQL DBA

This is a long overdue blog post from London’s 44con Cyber Security conference back in September. A lot of old memories were brought to the front as it were; the one I’m going to cover in this blog post is: file carving.

So what is file carving? Despite the terminology, it’s not going to be a full roast dinner; unless you have an appetite for data, which, as you’re here, I’m assuming you have.

The TL;DR of “what is file carving” is taking a target blob of data (often a multi-GB or multi-TB file) and reducing it into targeted pieces of data. This could be, for instance, grabbing all the JPEG images in a packet capture or a mysqldump, or pulling a single table or schema out of a huge mysqldump taken with --all-databases (if you’re not using mydumper, you really should; it avoids issues like this!), aka “sorting the wheat from the chaff”.

For example, at the time of writing this post I am looking to extract a single schema out of one such mysqldump --all-databases file of around 2GB (2GB of course isn’t large, but it’s large enough to give a practical example; the method for larger files is of course the same). So where to start?

You’ll need the following tools installed:

  1. xxd (you can substitute od, hexer, or any other hex editing/viewing tool you are comfortable with for xxd; just make sure it can handle very large files)
  2. grep

Let’s carve out the mysql schema

dbusby@kali:~$ xxd yourdumpfile.sql | grep 'mysql' -B5 | grep 'ASE' -A2 -B2
00003c0: 6e74 2044 6174 6162 6173 653a 2060 6d79  nt Database: `my
00003d0: 7371 6c60 0a2d 2d0a 0a43 5245 4154 4520  sql`.--..CREATE 
00003e0: 4441 5441 4241 5345 202f 2a21 3332 3331  DATABASE /*!3231
00003f0: 3220 4946 204e 4f54 2045 5849 5354 532a  2 IF NOT EXISTS*
0000400: 2f20 606d 7973 716c 6020 2f2a 2134 3031  / `mysql` /*!401

Wonderful, so we have a hex representation of the SQL dumpfile. Why on earth do we want the hex? We need to define our offsets. In short, our offsets are the positions of the start and end of the chunk we intend to carve from the file.

From the above, our start offset is 00003d9, at the start of CREATE DATABASE. For those unfamiliar with hexdump outputs, I recommend looking at the tool hexer, a vi-like tool: press v to enter visual selection mode, select a few characters, and you’ll note something like “visual selection:  0x000003d9 – …”.

You can of course work out the range visually from the above: 00003d0 is the start of the line, each alphanumeric pair is a single byte, and the byte offset notation is hexadecimal (0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f).

00003d0: 7371 6c60 0a2d 2d0a 0a43 5245 4154 4520  sql`.--..CREATE 

00003d0 == s, 00003d1 == q, 00003d2 == l, and so on. We can easily verify this using xxd:

dbusby@kali:~$ xxd -s 0x3d9 yourdumpfile.sql | head -n3
00003d9: 4352 4541 5445 2044 4154 4142 4153 4520  CREATE DATABASE 
00003e9: 2f2a 2133 3233 3132 2049 4620 4e4f 5420  /*!32312 IF NOT 
00003f9: 4558 4953 5453 2a2f 2060 6d79 7371 6c60  EXISTS*/ `mysql`

Right, so now we need the end offset. As above, we establish a search pattern; as the schema data we’re carving sits in the midst of a larger file, we can look for the start of the dump of the next schema.


dbusby@kali:~$ xxd -s 0x3d9 yourdumpfile.sql | grep '--' -A5 | grep C -A2 -B2 | less
...
0083b19: 2043 7572 7265 6e74 2044 6174 6162 6173   Current Databas
0083b29: 653a 2060 7065 7263 6f6e 6160 0a2d 2d0a  e: `nextschema`.--.
...

I’ve piped into less here as there were many matches to the grep patterns.

From the above we can see a potential offset of 0x83b19; however, we want to “backtrack” a few bytes to before the start of the -- comment.


dbusby@kali:~$ xxd -s 0x83b14 yourdumpfile.sql | head -n1
0083b14: 2d2d 0a2d 2d20 4375 7272 656e 7420 4461 --.-- Current Da

Excellent, we have our offsets: starting at 0x3d9 and ending at 0x83b14. We now need to convert base16 (hexadecimal) into base10; fortunately, we can do this very easily using the bc utility, however we will need to fully expand and uppercase our offsets.


dbusby@kali:~$ echo 'ibase=16;00003D9' | bc
985
dbusby@kali:~$ echo 'ibase=16;0083B14' | bc
539412
dbusby@kali:~$ echo '539412-985' | bc
538427
dbusby@kali:~$ dd if=yourdumpfile.sql of=mysql.sql skip=985 bs=1 count=538427
538427+0 records in
538427+0 records out
538427 bytes (538 kB) copied, 1.08998 s, 494 kB/s

Let’s discuss this a little. What we have done here is convert our start offset to a base10 count of bytes to skip when using dd (skip=985). We then convert the end offset to its base10 byte position, and subtracting the start offset’s base10 value gives us the size of the chunk we are carving (count=538427).

We now put this into a dd command line, and voila! we have a mysql.sql file which contains only the mysqldump data.
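As a side note, on systems with GNU coreutils you can do the same carve without dd’s bs=1 overhead; this is just a sketch using the offsets computed above (tail -c +N starts output at byte N, so skipping 985 bytes means starting at byte 986):

tail -c +986 yourdumpfile.sql | head -c 538427 > mysql.sql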

I hope this post helps somewhat to demystify file carving; the above techniques can be applied to any form of file carving need and are not limited to MySQL files.

