Using Volume Snapshot/Clone in Kubernetes

One of the most exciting storage-related features in Kubernetes is volume snapshot and clone. It allows you to take a snapshot of a data volume and later clone it into a new volume, which opens a variety of possibilities like instant backups or testing upgrades. This feature also brings Kubernetes deployments closer to cloud providers, which let you take volume snapshots with one click.

Word of caution: for databases, it still might be required to apply fsfreeze and FLUSH TABLES WITH READ LOCK, or a similar mechanism, to get a consistent snapshot.



It is much easier in MySQL 8 now because, with atomic DDL, MySQL 8 should provide crash-safe, consistent snapshots without additional locking.
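For 5.7, the quiesce sequence can be sketched as a dry run; the paths, the SLEEP window, and the ordering here are assumptions for illustration, not an official procedure:

```shell
# Dry-run sketch of quiescing MySQL before a volume snapshot. The run()
# wrapper only prints each command; replace it with direct execution for
# real use. Note: in real use, FLUSH TABLES WITH READ LOCK must be held in
# an open session for the whole snapshot window (the lock is released when
# the client session exits).
run() { echo "would run: $*"; }

run mysql -e 'FLUSH TABLES WITH READ LOCK; SELECT SLEEP(60);'  # hold a global read lock
run fsfreeze --freeze /var/lib/mysql                           # block filesystem writes
# ...trigger the volume snapshot here...
run fsfreeze --unfreeze /var/lib/mysql                         # thaw the filesystem
```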

Let’s review how we can use this feature with Google Cloud Kubernetes Engine and Percona Kubernetes Operator for XtraDB Cluster.

First, the snapshot feature is still beta, so it is not available by default. You need to use GKE version 1.14 or later and you need to have the following enabled in your GKE: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver#enabling_on_a_new_cluster.

It is done by enabling "Compute Engine persistent disk CSI Driver".

Now we need to create a cluster using storageClassName: standard-rwo for PersistentVolumeClaims. The relevant part of the resource definition looks like this:

        storageClassName: standard-rwo
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 11Gi

Let’s assume we have cluster1 running:

NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          49m
cluster1-haproxy-1                                 2/2     Running   0          48m
cluster1-haproxy-2                                 2/2     Running   0          48m
cluster1-pxc-0                                     1/1     Running   0          50m
cluster1-pxc-1                                     1/1     Running   0          48m
cluster1-pxc-2                                     1/1     Running   0          47m
percona-xtradb-cluster-operator-79d786dcfb-btkw2   1/1     Running   0          5h34m

And we want to clone the cluster into a new cluster, provisioned with the same dataset. Of course, this can be done by restoring a backup into a new volume, but snapshot and clone make it much easier. There are still some additional required steps, which I will list as a cheat sheet.

1. Create a VolumeSnapshotClass (I am not sure why this one is not present by default):

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: onesc
driver: pd.csi.storage.gke.io
deletionPolicy: Delete

2. Create snapshot

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: snapshot-for-newcluster
spec:
  volumeSnapshotClassName: onesc
  source:
    persistentVolumeClaimName: datadir-cluster1-pxc-0

3. Clone into a new volume

Here I should note the volume name convention used by the Percona XtraDB Cluster Operator:

datadir-CLUSTERNAME-pxc-N

Where CLUSTERNAME is the name used when we create the cluster and N is the node number. So now we can clone the snapshot into a volume named:

datadir-newcluster-pxc-0

Where newcluster is the name of the new cluster.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-newcluster-pxc-0
spec:
  dataSource:
    name: snapshot-for-newcluster
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 11Gi

Important: storageClassName, accessModes, and the storage size in the volume spec should match the original volume.

After the volume claim is created, we can start newcluster. However, there is still a caveat; we need to use:

forceUnsafeBootstrap: true

Otherwise, Percona XtraDB Cluster will decide that the data from the snapshot was not left by a clean shutdown (which is true) and will refuse to start.
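A sketch of where this flag sits in the new cluster's custom resource; the placement under spec.pxc follows the operator's example cr.yaml at the time of writing, so verify it against your operator version:

```yaml
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: newcluster
spec:
  pxc:
    forceUnsafeBootstrap: true
```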

There is still a limitation to this approach which you may find inconvenient: the volume can be cloned only within the same namespace, so it can't be easily transferred from the PRODUCTION namespace into the QA namespace.

It can still be done, but it requires some extra steps and Kubernetes admin privileges; I will show how in a following blog post.


Annotations Provide Context to Timelines in Percona Monitoring and Management

About the Author: This blog was written in collaboration with my colleague Jiri Ctvrtka. Jiri is a senior software developer from Brno, Czech Republic, and has been partnering with Percona for almost a year, working on various components of our Percona Platform. He has been programming in Go since 2015, has a passion for simplicity, speed, and precision of data, and has focused that passion on understanding the impact of changes while reimagining the Percona Monitoring and Management (PMM) Annotations functionality.

“Does anyone remember what caused this large spike last Thursday at 2:24 am?”

What are Annotations?

Annotations are a way to provide context to timelines in Percona Monitoring and Management (PMM). For example, in bigger teams, they are a good way to inform others about an event or important changes that may have occurred. An annotation can contain any kind of information, but we see it most commonly used to indicate an outage, a maintenance window start/end, a deployment of new code, a security event, etc. Annotations in PMM help provide quick explanations for peaks and valleys in graphs, indicate when something took place on the timeline, and create correlations.


An example of annotations automatically added at the beginning and end of a benchmark test.

Every annotation can be shown or hidden by a simple toggle button in the filter options, so you don't need to worry about charts crowded with annotations. You can toggle them on/off and zoom in or out around events to get better detail.

Filter contains a toggle button to turn on/off the visibility of annotations.

How Can I Add Annotations?

Annotations can simply be added by the pmm-admin option: annotate. So, let’s try it:

pmm-admin annotate "Deployed web version 2.4.6"

That command will place an annotation on every chart in every node and service…maybe more than you need. What if you only needed to add an annotation for a specific node or service to indicate a broad network outage? Or add specific annotations for specific nodes or services to indicate a software update was applied? This is all possible as pmm-admin provides four flags, which can be used for just this purpose.

--node = annotate the node the pmm-admin command was run on
--node-name = annotate node with specified name
--service = annotate all services running on the node the pmm-admin command was run on
--service-name = annotate service with specified name

All these flags can be combined to annotate multiple nodes or services with just one command. The order of flags doesn't matter. Just imagine how many combinations you have! Even better, imagine how easily this can be integrated into your CI/CD or deploy pipelines!

You can also add annotations via the API with curl commands:

curl 'http://admin:admin@localhost/graph/api/annotations' -H 'Content-Type: application/json;charset=UTF-8' --data-binary '{"tags": ARRAY OF TAGS,"text":"TEXT"}'

ARRAY OF TAGS = the names of nodes or services; for node pmm-server and service pmm-server-mysql it would be "tags": ["pmm-server", "pmm-server-mysql"]
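Because single quotes prevent the shell from expanding `date`, it can be easier to build the JSON payload first and then pass it to curl; a sketch using the example host and tags from above (the curl call is commented out so nothing is posted accidentally):

```shell
# Build the annotation payload with the date expanded by the shell, then POST
# it. Host, credentials, and tag names are the example values from this post.
payload="{\"tags\": [\"pmm-server\", \"pmm-server-mysql\"], \"text\": \"$(date) - Deployed web version 2.4.6\"}"
echo "$payload"
# curl 'http://admin:admin@localhost/graph/api/annotations' \
#      -H 'Content-Type: application/json;charset=UTF-8' --data-binary "$payload"
```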

Some Examples for Better Understanding

Case 1: We have a DB node we need to take offline for maintenance and want to capture this on all its graphs to explain the gap in reporting data.

pmm-admin annotate "`date` - System is going down for Maintenance - RFC-2232-2020" --node-name="db1002"

via API:

curl 'http://admin:admin@localhost/graph/api/annotations' -H 'Content-Type: application/json;charset=UTF-8' --data-binary "{\"tags\": [\"db1002\"],\"text\":\"$(date) - System is going down for Maintenance - RFC-2232-2020\"}"

Or, if pmm-admin is running on this node, then you don't even need to know the node's name and can set the annotation as part of the shutdown routine.

pmm-admin annotate "`date` - System is being shut down/rebooted"

Case 2: We have node pmm-server and three services running on the current node (mysql, postgres, mongo). So, yeah, it’s simple right?

pmm-admin annotate "`date` - Apply configuration change - RFC-1009-2020" --service-name=mysql
pmm-admin annotate "Restarting Postgres to apply security patch - " --service-name=postgres
pmm-admin annotate "`date` - Service Disruption Reported via Support - SUP-239" --service-name=mongo

via API:

curl 'http://admin:admin@localhost/graph/api/annotations' -H 'Content-Type: application/json;charset=UTF-8' --data-binary "{\"tags\": [\"mysql\", \"postgres\", \"mongo\"],\"text\":\"$(date) - Apply configuration change - RFC-1009-2020\"}"

Or you can do it in one command:

pmm-admin annotate "Services Recycled to pick up config changes" --service

And that’s it!  All services found running on that node will be annotated.

Case 3: We have node "registrationDb" and many services running on the current node. What if we want to annotate that node and also every service running on it with just one command? Again, no problem:

pmm-admin annotate "`date` - Security alerted to possible event" --node-name=registrationDb --service

via API:

curl 'http://admin:admin@localhost/graph/api/annotations' -H 'Content-Type: application/json;charset=UTF-8' --data-binary "{\"tags\": [\"registrationDb\", \"service1\", \"service2\",...*],\"text\":\"$(date) - Security alerted to possible event\"}"

* While the pmm-admin --service flag annotates all services automatically, with the API you need to list all service names of the current node to get the same result; you can, however, make an API call to get all services on a given node.

That’s it, no matter how many services you are running on node registrationDb the annotations will be presented on all of them and on node graphs as well.

Case 4: We have 100 services running on the current node and also another node named pmm-server2 and we need to annotate all 100 services on the current node (but not the current node) along with node pmm-server2.  Simple:

pmm-admin annotate "`date` - Load Test Start" --node-name=pmm-server2 --service

via API:

curl 'http://admin:admin@localhost/graph/api/annotations' -H 'Content-Type: application/json;charset=UTF-8' --data-binary "{\"tags\": [\"pmm-server2\", \"service1\", \"service2\",...*],\"text\":\"$(date) - Load Test - Increasing to 100 threads\"}"

* While the pmm-admin --service flag annotates all services automatically, with the API you need to list all service names of the current node to get the same result; you can, however, make an API call to get all services on a given node.

The current node will not be annotated, but every service on this node will be along with node pmm-server2.

Here’s a little guide to see many of the possible combinations of flags and what will result:

--node = current node
--node-name = node with name
--node --node-name = current node and node with name
--node --service-name = current node and service with name
--node --node-name --service-name = current node, node with name, and service with name
--node --service = current node and all services of current node
--node-name --service = all services of current node, node with name
--node --node-name --service = all services of current node, node with name, and current node
--service = all services of current node
--service-name = service with name
--service --service-name = all services of current node, service with name
--service --node-name = all services of current node and node with name
--service-name --node-name = service with name and node with name
--service --service-name --node-name = service with name, all services on current node, and node with name

So, thanks to annotations, correlating events on your servers is now easier than ever. We'd love to hear or even see how you're using annotations to make your life easier; hopefully, we've given you some ideas to get started right away!

Download Percona Monitoring and Management Today


Introducing the Percona Enterprise Platform

There is an ongoing shift within many organizations where the focus placed on application engineers is expanding while, at the same time, infrastructure teams within the DBA landscape are being trimmed back. The result is application engineers with little to no database experience and DBAs who are completely overwhelmed, both looking for resources to complete the work expected of them. As we know, helping consolidate this work and empowering those responsible is not only critical but often paramount to the success of an organization.

Introducing the Percona Enterprise Platform

Our vision is to provide a truly open source platform that won’t lock you in, with a single pane of glass to easily manage your open source database infrastructure, and a self-service experience enabling fast and consistent open source database deployment. 

Our goal is to deliver a platform which provides the enterprise benefits our customers are looking for, including:

  • A single interface to deploy and manage your open source databases on-prem, in the cloud, or across hybrid and multi-cloud environments.
  • The ability to configure a database once and deploy it anywhere.
  • Critical database management operations such as backup, recovery, and patching.
  • Enhanced automation and advisory services, which allow you to find, eliminate, and prevent outages, security issues, and slowdowns.
  • A viable alternative platform to public cloud and large enterprise database vendor DBaaS offerings, enabling you to eliminate vendor lock-in.

The Percona Enterprise Platform Beta Program is Kicking Off This Year!

The beta program will officially launch before the end of 2020 and we’re hoping you’ll join us as this is your opportunity to help shape the future of our open-source database platform offering. We know you’re curious as to what that entails, so read on to learn more.

The beta program 101:
  • This beta is a year-long program with four phases
  • Each three-month phase will involve a different area of emphasis in terms of participant feedback
  • Each phase requires around 10 hours of self-paced activities during the three-month period. Participants can commit to as many phases as they like.
  • Our initial focus will be on MySQL and MongoDB, with PostgreSQL added later in the program.
  • Beta participant activities will be facilitated through our Percona Forum, so go ahead and register today if you haven’t already!

Register now in order to hear more about the beta program as we get closer to kick-off later this year!

Register Now


PMM 101: Troubleshooting MongoDB with Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source tool developed by Percona that allows you to monitor and manage your MongoDB, MySQL, and PostgreSQL databases.  This blog will give you an overview of troubleshooting your MongoDB deployments with PMM.

Let’s start with a basic understanding of the architecture of PMM. PMM has two main architectural components:

  1. PMM Client – Client that lives on each database host in your environment.  Collects server metrics, system metrics, and database metrics used for Query Analytics
  2. PMM Server – Central part of PMM that the clients report all of their metric data to.   Also presents dashboards, graphs, and tables of that data in its web interface for visualization of your metric data.

For more details on the architecture of PMM, check out our docs.

Query Analytics

PMM Query Analytics ("QAN") allows you to analyze MongoDB query performance over periods of time.  In the below screenshot you can see that the longest-running query was against the testData collection.

Percona Monitoring and Management query analytics

If we drill deeper by clicking on the query in PMM we can see exactly what it was running. In this case, the query was searching in the testData collection of the mpg database looking for records where the value of x is 987544.

Percona Monitoring and Management

This is very helpful in determining what each query is doing, how much it is running, and which queries make up the bulk of your load.

The output is from db.currentOp(), and I agree it may not be clear at a glance what the application-side (or mongo shell) command was. This is a limitation of the MongoDB API in general: the drivers send the request with perfect functional accuracy, but it does not necessarily resemble what the user typed (or programmed).  With an understanding of this, and focusing first on what the "command" field contains, it is not too hard to picture a likely original format. For example, the output above could have been produced by running "use mpg; db.testData.find({ "x": { "$lte": ..., "$gt": ... } }).skip(0)" in the shell. The last ".skip(0)" is optional, as it is 0 by default.

Additionally, you can see the full explain plan for your query just as you would by adding .explain() to your query.   In the below example we can see that the query did a full collection scan on the mpg.testData collection and we should think about adding an index to the ‘x’ field to improve the performance of this query.

Metrics Monitor

Metrics Monitor allows you to monitor, alert, and visualize different metrics related to your database overall, its internal metrics, and the systems they are running on.

Overall System Performance View

The first view that is helpful is the overall system performance view.  Here you can see, at a high level, how much CPU and memory are being used, the amount of reads from and writes to disk, network bandwidth, the number of database connections, database queries per second, RAM, and the uptime for both the host and the database.  This view can often lead you to the problematic node(s) if you're experiencing any issues and can also give you a high-level picture of the overall health of your monitored environment.

Percona Monitoring and Management system overview

WiredTiger Metrics

Next, we’ll start digging into some of the database internal metrics that are helpful for troubleshooting MongoDB.  These metrics are mostly from the WiredTiger Storage Engine that is the default storage engine for MongoDB since MongoDB 3.0.  In addition to the metrics I cover, there are more documented here.

The WiredTiger storage engine uses tickets as a way to handle concurrency; the default is for WiredTiger to have 128 read and 128 write tickets.  PMM allows you to alert when your available tickets are getting low, and you can also correlate with other metrics to see why so many tickets are being utilized. The graph sample below shows a low-load situation: only ~1 ticket out of 128 was checked out at any time.

Percona Monitoring and Management wiredtiger

One of the things that could be causing you to use a large number of tickets is high checkpoint time.   WiredTiger, by default, does a full checkpoint at least every 60 seconds; this is controlled by the WiredTiger parameter checkpoint=(wait=60).  Checkpointing flushes all the dirty pages to disk. (By the way, 'dirty' is not as bad as it sounds; it's just a storage engine term meaning 'not committed to disk yet'.)  High checkpointing times can lead to more tickets being in use.
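The checkpoint/flush interval referenced above is exposed in mongod.conf as storage.syncPeriodSecs; a sketch showing the default value (changing it is rarely advisable):

```yaml
# mongod.conf fragment (sketch): how often MongoDB flushes data to the data
# files; for WiredTiger this governs the checkpoint interval. Default: 60.
storage:
  syncPeriodSecs: 60
```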

Finally, we have the WiredTiger cache activity metrics. WiredTiger cache activity indicates the volume of data being read into or written out of the cache.  These metrics can help you baseline your normal cache activity, so you can notice if you have a large amount of data being read into the cache, perhaps from a poorly tuned query, or a lot of data being written out of the cache.

WiredTiger Cache Activity

Database Metrics

PMM also has database metrics that are not WiredTiger specific.   Here we can see the uptime for the node, queries per second, latency, connections, and number of cursors.   These are higher-level metrics which can be indicative of a larger problem such as connection storms, storage latency, and excessive queries per second.  These can help you hone in on potential issues for your database.

Percona Monitoring and Management database metrics

Node Overview Metrics

System metrics can point you towards an issue at the O/S level that may or may not correlate to your database.  CPU, CPU saturation, core usage, disk I/O, swap activity, and network traffic are some of the metrics that can help you find issues that may start at the O/S level or below.  Metrics beyond those below can be found in our documentation.

Node Overview Metrics


In this blog, we've discussed how PMM can help you troubleshoot your MongoDB deployment.  Whether you're looking at WiredTiger-specific metrics, system-level metrics, or database-level metrics, PMM has you covered.  Thanks for reading!

Additional Resources:

Download Percona Monitoring and Management

PMM for MongoDB Quick Start Guide

PMM Blog Topics

MongoDB Blog Topics


Danger of Changing Default of log_error_verbosity on MySQL/Percona Server for MySQL 5.7

Changing the default value (3) of log_error_verbosity in MySQL/Percona Server for MySQL 5.7 can have a hidden, unintended effect! What does log_error_verbosity do, exactly? As per the documentation:

The log_error_verbosity system variable specifies the verbosity for handling events intended for the error log.

Basically, a value of 1 logs only [Error] messages; 2 adds [Warning] messages; and 3 adds [Note] messages.

For example, one might be tempted to change the default of log_error_verbosity since the error.log might be bloated with thousands (or millions) of lines with [Notes], such as:

2020-10-08T17:02:56.736179Z 3 [Note] Access denied for user 'root'@'localhost' (using password: NO)
2020-10-08T17:04:48.149038Z 4 [Note] Aborted connection 4 to db: 'unconnected' user: 'root' host: 'localhost' (Got timeout reading communication packets)

(P.S. you can read more about those Notes in these other Percona blog posts):

Fixing MySQL 1045 Error: Access Denied

MySQL “Got an error reading communication packet”

The problem is, after lowering log_error_verbosity to 1 or 2, no messages about server startup or shutdown are printed to the log! That can make troubleshooting really hard in the event of issues or system failure.  For completeness, in the error.log from 5.7 at startup with the default log_error_verbosity, the following should be seen:

2020-10-08T16:36:07.302168Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-10-08T16:36:07.302188Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-10-08T16:36:07.303998Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-10-08T16:36:07.307823Z 0 [Note] InnoDB: Completed initialization of buffer pool
2020-10-08T16:36:07.497571Z 0 [Note] /usr/sbin/mysqld: ready for connections.

And on shutdown:

2020-10-08T16:36:10.447002Z 0 [Note] Giving 0 client threads a chance to die gracefully
2020-10-08T16:36:10.447022Z 0 [Note] Shutting down slave threads
2020-10-08T16:36:10.447027Z 0 [Note] Forcefully disconnecting 0 remaining clients
2020-10-08T16:36:12.104099Z 0 [Note] /usr/sbin/mysqld: Shutdown complete
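To check whether your current settings still record restarts, you can grep the error log for these markers; a sketch run against a sample excerpt (your real error log path, e.g. /var/log/mysqld.log, is an assumption):

```shell
# Write a sample excerpt of a 5.7 error log, then count the startup markers.
# In practice, point grep at your actual error log file instead.
cat > /tmp/sample-error.log <<'EOF'
2020-10-08T16:36:07.497571Z 0 [Note] /usr/sbin/mysqld: ready for connections.
2020-10-08T16:36:12.104099Z 0 [Note] /usr/sbin/mysqld: Shutdown complete
EOF
grep -c 'ready for connections' /tmp/sample-error.log   # prints 1: startup was logged
```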

With log_error_verbosity=2, there won't be messages about MySQL startup, but some warnings that are only printed at startup might give a hint of the time of a server restart, such as:

2020-10-08T16:30:21.966844Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2020-10-08T16:30:22.181367Z 0 [Warning] CA certificate ca.pem is self signed.
2020-10-08T16:30:22.221732Z 0 [Warning] Changed limits: table_open_cache: 431 (requested 6000)

If there is no information about server restart, system logs can be checked for information about restarts:

# cat /var/log/messages 
Oct  8 16:31:25 carlos-tutte-latest57-standalone-1 systemd: Starting MySQL Server...
Oct  8 16:31:26 carlos-tutte-latest57-standalone-1 systemd: Started MySQL Server.

If there is still no clue about when MySQL was last started, checking the "Uptime" status variable can help in calculating the last startup time.
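A sketch of that calculation with GNU date; the Uptime value here is a sample, and in practice you would fetch it with SHOW GLOBAL STATUS:

```shell
# Estimate the last startup time by subtracting Uptime from now.
# In practice, fetch the value with:
#   mysql -N -e "SHOW GLOBAL STATUS LIKE 'Uptime';" | awk '{print $2}'
# (GNU date is assumed for the -d option.)
uptime_secs=3600   # sample value for illustration
date -u -d "-${uptime_secs} seconds" '+%Y-%m-%d %H:%M:%S'
```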

The problem does NOT occur on MySQL/Percona Server for MySQL 8.0 since even with log_error_verbosity = 1, the following startup/shutdown lines are printed on the error.log:

2020-10-08T16:31:54.532504Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.20-11) starting as process 1052
2020-10-08T16:31:54.540342Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2020-10-08T16:31:55.026815Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2020-10-08T16:31:55.136125Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/lib/mysql/mysqlx.sock' bind-address: '::' port: 33060
2020-10-08T16:31:55.270669Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.20-11'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona Server (GPL), Release 11, Revision 5b5a5d2.
2020-10-08T16:32:01.708932Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.20-11)  Percona Server (GPL), Release 11, Revision 5b5a5d2.

In conclusion: if possible, avoid changing the default of log_error_verbosity on MySQL/Percona Server for MySQL 5.7. If you do need to change it, do it online with SET GLOBAL instead of through the config file; since SET GLOBAL does not persist across restarts, startup messages will still be logged after a restart.


5 Things DBAs Should Know Before Deploying MongoDB

MongoDB is one of the most popular databases and is one of the easiest NoSQL databases to set up for a DBA.  Oftentimes, relational DBAs will inherit MongoDB databases without knowing all there is to know about MongoDB.  I encourage you to check out our Percona blogs, as we have lots of great information for those both new and experienced with MongoDB.  Don't let the ease of installing MongoDB fool you; there are things you need to consider before deploying MongoDB.  Here are five things DBAs should know before deploying MongoDB in production.

1) Enable Authentication and Authorization

Security is of utmost importance to your database.  Gone are the days when security was disabled by default for MongoDB, but it’s still easy to start MongoDB without security.  Without security and with your database bound to a public IP, anyone can connect to your database and steal your data.  By simply adding some important MongoDB security configuration options to your configuration file, you can ensure that your data is protected.  You can also configure MongoDB to utilize native LDAP for authentication.  Setting up authentication and authorization is one of the simplest ways to ensure that your MongoDB database is secure.  The most important configuration option is turning on authorization which enables users and roles and requires you to authenticate and have the proper roles to access your data.

security:
  authorization: enabled
  keyFile: /path/to/your.keyfile


2) Deploy Replica Sets at a Minimum

A great feature of MongoDB is how easy it is to set up replication.  Not only is it easy but it gives you built-in High Availability.  When your primary crashes or goes down for maintenance, MongoDB automatically fails over to one of your secondaries and your application is back online without any intervention. Replica Sets allow you to offload reads, as long as your application can tolerate eventual consistency.  Additionally, replica sets allow you to offload your backups to a secondary.  Another important feature to know about Replica Sets is that only one member of the Replica Set can accept client writes at a time; if you want multiple nodes to accept writes, then you should look into sharded clusters.  There are some tradeoffs with replica sets that you should also be aware of; when reading from a secondary, depending on your read preference, you may be reading stale data or data that hasn’t been acknowledged by all members of the replica set.
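Getting a replica set going starts with naming it in each member's mongod.conf; a minimal sketch (rs0 is a hypothetical name, and after starting the members you would run rs.initiate() on one of them):

```yaml
# mongod.conf fragment (sketch): every member of the set uses the same name
replication:
  replSetName: rs0
```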

3) Backups

Backups are just as important with MongoDB as they are with any other database.  There are tools like mongodump, mongoexport, Percona Backup for MongoDB, and Ops Manager (Enterprise Edition only) that support point-in-time recovery, oplog backups, hot backups, and full and incremental backups.  As mentioned, backups can be run from any node in your replica set.  The best practice is to run your backup from a secondary node so you don't put unnecessary pressure on your primary node.  In addition to the above methods, you can also take snapshots of your data; this is possible as long as you pause writes to the node that you're snapshotting before freezing the file system, to ensure a consistent snapshot of your MongoDB database.

4) Indexes Are Just as Important

Indexes are just as important for MongoDB as they are with relational databases.  Just like with relational databases, indexes can help speed up your queries by reducing the size of data that your query returns thus speeding up query performance.  Indexes also help your working set more easily fit into memory.  Pay attention to the query patterns and look for full collection scans in your logfiles to know where there are queries that could benefit from an index.  In addition to that, make sure that you follow the ESR rule when creating your indexes and examining your query patterns.  Just like relational databases, indexes aren’t free, as MongoDB has to update indexes every time there’s an insert, delete or update, so make sure your indexes are being used and aren’t unnecessarily slowing down your writes.

5) Whenever Possible, Working Set < RAM

As with any database, fitting your data into RAM will allow for faster reads than from disk.  MongoDB is no different.  Knowing how much data MongoDB has to read in for your queries can help you determine how much RAM you should allocate to your database.  For example, if your query requires a working set that is 50 GB and you only have 32 GB of RAM allocated to the Wired Tiger Cache, MongoDB is going to constantly read in more of the working set from disk and page it out to make room for the additional data, and this will lead to slower query performance and a system that is constantly using all of its available memory for its cache.  Conversely, if you have a 50 GB working set and 100 GB of RAM for your Wired Tiger cache, the working set will fit completely into memory and as long as other data doesn’t page it out of the cache, MongoDB should serve reads much faster as it will all be in memory.
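Following the 100 GB example above, the WiredTiger cache can be sized explicitly in mongod.conf; a sketch (by default, MongoDB sizes the cache to roughly half of system RAM):

```yaml
# mongod.conf fragment (sketch): pin the WiredTiger cache to 100 GB so the
# 50 GB working set from the example fits comfortably in memory
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 100
```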

Avoiding reads from disk is not always possible or practical.  To assess how many reads you're doing from disk, you can use commands like db.serverStatus() or measure it with tools like Percona Monitoring and Management.  Be sure to use the MongoDB-recommended XFS filesystem and, whenever possible, solid-state drives (SSDs) to speed up disk performance.   Because of database workloads, you should also make sure to provision enough IOPS for your database servers to avoid disk bottlenecks as much as possible.   Multi-core systems will also tend to work better, as they allow faster checkpointing for the WiredTiger storage engine, but you will still be limited by how fast the disks can go.


While MongoDB is easy to get started with and has a low barrier to entry, just like any other database there are some key things that you, as a DBA, should consider before deploying MongoDB.   We've covered enabling authentication and authorization to ensure you have a secure deployment, deploying replica sets to ensure high availability, backups to keep your data safe, the importance of indexes to your query performance, and having sufficient memory to cover your queries (and what to do if you don't). We hope this gives you a better idea of how to deploy MongoDB and how to support it well.  Thanks for reading!


Announcing pg_stat_monitor Tech Preview: Get Better Insights Into Query Performance in PostgreSQL


I am very passionate about database observability, and I believe query performance observability is the most important insight you can get in your database.  Why? Because if you look from an application developer’s point of view, once a database is provisioned and you can connect to it, responding to your queries promptly and correctly is essentially all that you need from the database. This applies both to the databases which you deploy on-prem or in cloud-based DBaaS offerings.

PostgreSQL has a fantastic extension for capturing query performance called pg_stat_statements, which captures a lot of query execution detail; however, it did not capture all the details we wanted in order to provide deep query performance insights in Percona Monitoring and Management. So, we developed the new pg_stat_monitor extension, which has slightly different design goals and usage patterns, while providing everything you’re used to from pg_stat_statements, and more.

At this point, pg_stat_monitor is released as a separate project from Percona, so it is easier for us to experiment with new approaches and ideas. At the same time, we’re very much looking forward to working with the PostgreSQL community to see if there are any features that have enough consensus to be incorporated into pg_stat_statements, too.

pg_stat_monitor Differences

  • Time Bucketing: Instead of supplying one set of ever-increasing counters, pg_stat_monitor computes stats for a configured number of time intervals. This allows for much better data accuracy, especially when you need high-resolution data or the network is unreliable.
  • Multi-Dimensional grouping:  While pg_stat_statements groups counters by (userid, dbid, queryid),  pg_stat_monitor uses (userid, clientip, dbid, queryid). This allows you to drill down into the performance of queries coming from particular client addresses, which we at Percona have found to be very valuable in a number of cases.
  • Capture Actual Parameters in the Queries:  pg_stat_monitor allows you to choose if you want to see queries with placeholders for parameters or actual query examples.  We found having the full query example is very helpful, as you can run EXPLAIN on it or easily play with modifying the query to make it run better, as well as making communication about the query clearer when discussing with other DBAs and application developers.
  • Store Information about tables accessed by statement: This allows us to identify all queries which accessed a given table easily.  Such an approach is more reliable than parsing queries to extract such information.
  • Response time distribution histogram:  While min/max/avg query execution statistics are great, especially when computed over a short time bucket, they are hard to understand when trying to look at them for a long period of time. For this reason, we have added a query response time histogram which can offer better insights.
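To illustrate the extra dimensions, the sketch below queries the pg_stat_monitor view directly; the column names (bucket, client_ip, total_time) reflect the Tech Preview and may change between releases:

```sql
-- Top queries per time bucket and client address (illustrative columns;
-- check \d pg_stat_monitor for the exact schema of your version).
SELECT bucket, client_ip, substr(query, 1, 60) AS query,
       calls, total_time
FROM pg_stat_monitor
ORDER BY bucket DESC, total_time DESC
LIMIT 10;
```

Because results are grouped by time bucket and client address, the same normalized query can appear several times, once per interval and origin.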

It is important to note that we are able to achieve these additional features while having performance overhead comparable to the original pg_stat_statements extension.

pg_stat_monitor Planned Features

There are a number of features we want to implement but have not yet figured out how to achieve them with the PostgreSQL extension (and if it is possible at all without patching PostgreSQL code). If you have ideas on how to achieve any of these, let us know.

  • Capturing All the Queries:  Capture all queries (not only successful queries) so the queries which terminated with errors are not missed (or require another data source to capture). Once we have this, we can measure performance for successful and failed queries separately, so that an elevated failure rate can be seen as a different signal than different query performance.
  • Errors and Warnings Histogram:  Not all queries succeed and when they fail you really want to know why. We would like to see not only the number of times a given query failed but what the cause of failure was. The same query may fail due to a permission error or foreign key violation which requires an entirely different action to remediate.  It is also very helpful to know which query triggers particular warnings.
  • Accurate Wait Event Attribution to Queries:  For a given query type it would be great to see where its response time comes from. Wait Events is one way to capture this and currently, query attribution can be done through a high-frequency sampling of the pg_stat_activity table. However, it is not accurate and not scalable with a large number of active backends.

pg_stat_monitor  is now available as a Technical Preview and we’re releasing it to get feedback from the community on what we’re doing right and what we should do differently before we release pg_stat_monitor as a generally available version to be supported for years to come. Please check it out,  drop us a note, file an issue, or make a pull request!
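Setting it up follows the usual pattern for extensions that hook into the executor; a sketch, assuming the library is already installed on the server (package names and paths depend on your platform):

```sql
-- In postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'pg_stat_monitor'
-- Then, in the target database:
CREATE EXTENSION pg_stat_monitor;
-- Verify the view is available:
SELECT count(*) FROM pg_stat_monitor;
```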

Our white paper “Why Choose PostgreSQL?” looks at the features and benefits of PostgreSQL and presents some practical usage examples. We also examine how PostgreSQL can be useful for companies looking to migrate from Oracle.

Download PDF


Setup Teams/Users With Limited Access in Percona Monitoring and Management


From time to time we are asked how to limit users to viewing only some dashboards or servers in Percona Monitoring and Management (PMM). Here are some hints on how to do this.

Let’s imagine you want the following:

  • Users user1 and user2 are only allowed to see the “CPU Utilization Details” dashboard for server1, server2 and pmm-server;
  • User user3 is only allowed to see the “CPU Utilization Details” dashboard for server3;
  • All users are allowed to see MySQL dashboards for any services.

1. First, let’s create users user1, user2, and user3. Their roles should be set to “Viewer”.


2. Now let’s create two folders, Team1 and Team2


3. We limit access to folder “Team1” to user1/user2, and to folder “Team2” to user3

4. The Viewer role has to be removed from all original folders except MySQL. In our scenario, all users are allowed access to the MySQL dashboards.

5. Now we make copies of the “CPU Utilization Details” dashboard in folders “Team1” and “Team2”

So now users can view only the dashboards in the folders “MySQL”, “Team1”, and “Team2”. In the next step, we will apply per-server limits.

6. We are going to limit which servers the dashboards in the new folders can display. To do this, we must modify the node_name variable.
Navigate to “Settings” … “Variables” … “node_name”.
Allowed servers are added into the “Regex” field:
For Team1: /server1|server2|pmm-server/
For Team2: /server3/

That’s it. Let’s login as user1 and check what we’ve got.

(Please notice that the Home dashboard is located in the “Internal” folder, so it is not visible to our users either. The list of allowed dashboards can therefore be accessed through the left menu only.)

As we can see, MySQL dashboards and “CPU Utilization Details” dashboards are accessible. But let’s also check the servers in the dropdown list of the last dashboard.

So it’s correct; User1 can see data for pmm-server.

You can read more about this in the official Grafana documentation, in the “Manage users” section. Also, please keep in mind that users with the “Editor” role have access to dashboard settings and can remove or modify the regex filtering for servers/services. So it’s better to avoid assigning the “Editor” role to users in this setup.


MySQL 101: Troubleshooting MySQL with Percona Monitoring and Management


Percona Monitoring and Management (PMM) is a free and open source platform for managing and monitoring MySQL, MongoDB, and PostgreSQL databases. In this blog, we will look at troubleshooting MySQL issues and performance bottlenecks with the help of PMM.

We can start troubleshooting MySQL after selecting the appropriate server or cluster from the PMM dashboard. There are many aspects to check which should get priority depending on the issue you are facing.

Things to look at:


CPU Usage:

Check to see if there is a spike or gradual increase in the CPU usage on the database server other than the normal pattern. If so, you can check the timeframe of the spike or starting point of the increasing load and review the database connections/thread details from the MySQL dashboard for that time interval.


CPU Saturation Metrics and Max Core Usage:

This is an important metric as it shows the saturation level of the CPU with normalized CPU load. Normalized CPU Load is very helpful for understanding when the CPU is overloaded. An overloaded CPU causes response times to increase and performance to degrade.


Disk Latency / Disk IO Utilization:

Check to see if there is any latency observed on the disk. If Disk IO utilization reaches 100%, queries will experience latency because the disk cannot keep up with the reads/writes, causing queries to gradually pile up and hence the spike. The issue might be with the underlying disk or hardware.


Memory Utilization:

Any sudden change in memory consumption could indicate a process hogging memory, for example, the MySQL process when many concurrent queries or long-running queries are in progress. We can also see an increase when a backup job or a scheduled batch job is running on the server.

Network Details:

Check the inbound and outbound network traffic for the duration of the issue; any sudden dip would point to a network problem.


MySQL Client Thread Activity / MySQL Connections:

If the issue at hand is a large number of running threads or a connection spike, you can check the MySQL Connections and MySQL Client Thread Activity graphs to get the timeframe when the connections started increasing. More details about the threads (or queries running) can be seen in the InnoDB dashboards. As mentioned previously, a spike in Disk IO utilization reaching 100% can also cause connections to pile up, so it is important to check all aspects before coming to a conclusion.


MySQL Slow Queries:

If queries were performing slowly, this is reported in the MySQL Slow Queries graph. The slowness could come from old queries degraded by multiple concurrent queries or underlying disk load, or from newly introduced queries that need analysis and optimization. Look at the timeframe involved and then check the slow logs and QAN to get the exact queries.
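If the slow query log is not yet enabled, you can turn it on dynamically; a sketch, assuming you want to capture queries that take longer than one second:

```sql
-- Enable the slow query log at runtime (persist the settings in my.cnf too).
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;   -- threshold in seconds
SHOW GLOBAL VARIABLES LIKE 'slow_query_log_file';
```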


MySQL Aborted Connections:

If a large number of users or threads were unable to establish a connection to the server, this would be reported as a spike in aborted connections.
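The raw counters behind this graph can be checked directly on the server:

```sql
-- Aborted_connects: failed connection attempts (bad credentials, timeouts).
-- Aborted_clients: established connections that were closed improperly.
SHOW GLOBAL STATUS LIKE 'Aborted%';
```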

InnoDB Metrics:

InnoDB Transactions:

This metric shows the History List Length (HLL) on the server, which is basically the number of ‘undo log’ entries kept to provide a consistent view of the data for open transactions. An increase in HLL over time is caused by long-running transactions on the server. If you see a gradual increase in HLL, check SHOW ENGINE INNODB STATUS\G on your server, look for the culprit query/transaction, and try to kill it if it’s not truly needed. While not an immediate issue, a growing HLL can hamper the performance of a server once the value is in the millions and still increasing.
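Besides SHOW ENGINE INNODB STATUS, information_schema makes it easy to spot the long-running transactions that keep the history list growing:

```sql
-- Oldest open InnoDB transactions; long-running ones block purge
-- and cause History List Length to grow.
SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started ASC
LIMIT 5;
-- If a transaction is not truly needed:
-- KILL <trx_mysql_thread_id>;
```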


InnoDB Row Operations:

When you see a spike in thread activity, check here next to get more details about the threads that are running.  A spike in reads, inserts, or deletes? This dashboard gives the count of each row operation, which will help you understand what kind of queries were running on the server and find the particular job responsible.


InnoDB Locking > InnoDB Row Lock Time:

Row Lock Waits indicates how many times per second a transaction waited on a row lock. Row Lock Wait Load is a rolling five-minute average of Row Lock Waits. Average Row Lock Wait Time is the row lock wait time divided by the number of row locks.
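The underlying counters are also available as status variables if you want to cross-check the graphs:

```sql
-- Innodb_row_lock_waits: how many times transactions waited for a row lock.
-- Innodb_row_lock_time: total time spent waiting (milliseconds).
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';
```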


InnoDB Logging > InnoDB Log file usage Hourly:

If there is an increase in writes on the server, we can see an increase in hourly log file usage; the graph shows how many GB of data were written to the redo log files (ib_logfiles) per hour.
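You can estimate the hourly redo write volume yourself by sampling the Innodb_os_log_written counter twice and taking the difference:

```sql
-- Bytes written to the redo log since server start; sample this twice
-- (e.g. an hour apart) and subtract to get the write volume per hour.
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
```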

Performance Schema Details:

PS File IO (Events):

This dashboard will provide details on the wait IO for various events as shown in the image:


PS File IO (Load):

Similar to events, this will display the load corresponding to the event.


PMM Query Analytics:

The Query Analytics dashboard shows how queries are executed and where they spend their time. To get more information out of QAN you should have QAN prerequisites enabled. Select the timeframe and check for the slow queries that caused most of the load. Most probably these are the queries that you need to optimize to avoid any further issues.


Daniel’s blog on How to find query slowdowns using PMM will help you troubleshoot better with QAN.

For more details on MySQL Troubleshooting and Performance Optimizations, you can check our CEO Peter Zaitsev’s webinar on MySQL Troubleshooting and Performance Optimization with PMM.


Dangerous Edge Case Warning for Percona Toolkit and pt-online-schema-change


Recently I was dealing with an unexpected issue raised by one of our Support customers, in which data became inconsistent after a schema change was applied.

After some investigation, it turned out that affected tables had a special word in the comments of some columns, which triggered an already known (and fixed) issue with the TableParser.pm library of Percona Toolkit.  The problem is that the customer was using an outdated Toolkit version, where pt-online-schema-change was using that buggy parser.

This bug applies only to Percona Toolkit versions up to 3.0.10, so if you already have 3.0.11 or newer installed, you can skip the rest of this post, as those versions are no longer affected.

I am writing this post to warn every user of pt-online-schema-change who has not upgraded the toolkit, as the problem is potentially very dangerous and can lead to the silent loss of data. 
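If you cannot upgrade immediately, it may help to check whether any of your columns carry comments containing the trigger word; the examples below use comments containing the word “generated”. A hypothetical check via information_schema:

```sql
-- Hypothetical check: list columns whose comments contain the word that
-- can confuse the old TableParser.pm (the examples here use 'generated').
SELECT table_schema, table_name, column_name, column_comment
FROM information_schema.columns
WHERE column_comment LIKE '%generated%';
```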

The problem can manifest in two ways. The first one, although confusing, is not really dangerous as the operation is canceled. It happens when the columns with culprit comments do not allow NULL values. For example:

CREATE TABLE `test_not_null` (
`id` int NOT NULL,
`add_id` int NOT NULL COMMENT 'my generated test case',

Schema change operation on this one will look like this:

$ ./pt-online-schema-change-3.0.10 u=msandbox,p=msandbox,h=localhost,S=/tmp/mysql_sandbox5735.sock,D=test,t=test_not_null --print --alter "engine=InnoDB" --execute
Altering `test`.`test_not_null`...
Creating new table...
CREATE TABLE `test`.`_test_not_null_new` (
`id` int(11) NOT NULL,
`add_id` int(11) NOT NULL COMMENT 'my generated test case',
Created new table test._test_not_null_new OK.
Altering new table...
ALTER TABLE `test`.`_test_not_null_new` engine=InnoDB
Altered `test`.`_test_not_null_new` OK.
2020-09-30T21:25:22 Creating triggers...
2020-09-30T21:25:22 Created triggers OK.
2020-09-30T21:25:22 Copying approximately 3 rows...
INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_not_null_new` (`id`) SELECT `id` FROM `test`.`test_not_null` LOCK IN SHARE MODE /*pt-online-schema-change 1438 copy table*/
2020-09-30T21:25:22 Dropping triggers...
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_not_null_del`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_not_null_upd`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_not_null_ins`
2020-09-30T21:25:22 Dropped triggers OK.
2020-09-30T21:25:22 Dropping new table...
DROP TABLE IF EXISTS `test`.`_test_not_null_new`;
2020-09-30T21:25:22 Dropped new table OK.
`test`.`test_not_null` was not altered.
2020-09-30T21:25:22 Error copying rows from `test`.`test_not_null` to `test`.`_test_not_null_new`: 2020-09-30T21:25:22 Copying rows caused a MySQL error 1364:
Level: Warning
Code: 1364
Message: Field 'add_id' doesn't have a default value
Query: INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_not_null_new` (`id`) SELECT `id` FROM `test`.`test_not_null` LOCK IN SHARE MODE /*pt-online-schema-change 1438 copy table*/

So the reason for the failed operation may be unclear, but at least no data gets damaged. A much worse result happens when the column with the comment allows NULLs:

CREATE TABLE `test_null` (
`id` int NOT NULL,
`add_id` int DEFAULT NULL COMMENT 'my generated test case',

mysql [localhost:5735] {msandbox} (test) > select * from test_null;
| id | add_id |
| 1  |      1 |
| 2  |      2 |
| 3  |      3 |
3 rows in set (0.01 sec)

For this one, the schema change command runs without any errors:

$ ./pt-online-schema-change-3.0.10 u=msandbox,p=msandbox,h=localhost,S=/tmp/mysql_sandbox5735.sock,D=test,t=test_null --print --alter "engine=InnoDB" --execute
Altering `test`.`test_null`...
Creating new table...
CREATE TABLE `test`.`_test_null_new` (
`id` int(11) NOT NULL,
`add_id` int(11) DEFAULT NULL COMMENT 'my generated test case',
Created new table test._test_null_new OK.
Altering new table...
ALTER TABLE `test`.`_test_null_new` engine=InnoDB
Altered `test`.`_test_null_new` OK.
2020-09-30T21:28:11 Creating triggers...
2020-09-30T21:28:11 Created triggers OK.
2020-09-30T21:28:11 Copying approximately 3 rows...
INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_null_new` (`id`) SELECT `id` FROM `test`.`test_null` LOCK IN SHARE MODE /*pt-online-schema-change 3568 copy table*/
2020-09-30T21:28:11 Copied rows OK.
2020-09-30T21:28:11 Analyzing new table...
2020-09-30T21:28:11 Swapping tables...
RENAME TABLE `test`.`test_null` TO `test`.`_test_null_old`, `test`.`_test_null_new` TO `test`.`test_null`
2020-09-30T21:28:11 Swapped original and new tables OK.
2020-09-30T21:28:11 Dropping old table...
DROP TABLE IF EXISTS `test`.`_test_null_old`
2020-09-30T21:28:11 Dropped old table `test`.`_test_null_old` OK.
2020-09-30T21:28:11 Dropping triggers...
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_null_del`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_null_upd`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_null_ins`
2020-09-30T21:28:11 Dropped triggers OK.
Successfully altered `test`.`test_null`.

But… the table data is not the same after:

mysql [localhost:5735] {msandbox} (test) > select * from test_null;
| id | add_id |
|  1 |   NULL |
|  2 |   NULL |
|  3 |   NULL |
3 rows in set (0.00 sec)

To summarize, it is essential to make sure you are using an up-to-date Percona Toolkit, especially the pt-online-schema-change tool, to avoid potential disaster. The current latest stable release as of this writing is version 3.2.1, and the fixed version for this particular bug, 3.0.11, was released in July 2018.



