Introducing Performance Improvement of Window Functions in PostgreSQL 15

When working with databases, there are always projects oriented to performing analytics and reporting tasks over the information stored in the database. Usually, these tasks leverage window functions to do calculations “across a set of table rows that are somehow related to the current row,” as described in the documentation.

There are several built-in window functions available in PostgreSQL. In the latest release, PostgreSQL 15, some performance improvements were added for the rank(), row_number(), and count() functions. First, let’s review what these functions can do.

The window functions

As mentioned above, window functions let us perform calculations on a set of table rows related to the current one. The “set of table rows” is usually identified as a “partition” defined by a column or group of columns. As we read in the documentation, the aforementioned functions work as follows:

rank () → bigint

Returns the rank of the current row, with gaps; that is, the row_number of the first row in its peer group.

row_number () → bigint

Returns the current row number within its partition, counting from 1.

count ( * ) → bigint

Computes the number of input rows.
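To make the three definitions concrete before running them in SQL, here is a small Python sketch of their semantics over a single partition already sorted by the ORDER BY key. This is an illustration only, not how PostgreSQL implements them, and it uses an artificial tie to show the gap behavior of rank():

```python
# Toy model of rank(), row_number(), and count(*) over one partition,
# assuming the rows are already sorted by the ORDER BY key.

def window_rank(sorted_keys):
    """rank(): peers (equal keys) share a rank, and ranks have gaps."""
    ranks, prev = [], None
    for i, key in enumerate(sorted_keys, start=1):
        if key == prev:
            ranks.append(ranks[-1])  # same peer group -> same rank
        else:
            ranks.append(i)          # rank = row_number of first peer
        prev = key
    return ranks

# One partition, ORDER BY salary DESC, with a tie at 4000:
salaries = [7000, 4000, 4000, 2000]
print(window_rank(salaries))              # [1, 2, 2, 4] -- gap after the tie
print(list(range(1, len(salaries) + 1)))  # row_number(): [1, 2, 3, 4]
print(len(salaries))                      # count(*): 4 rows in the partition
```

Note how the tie makes rank() skip a value (no rank 3), while row_number() always numbers rows consecutively.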

Consider the following table and data:

                                          Table "public.employee"
     Column      |           Type           | Collation | Nullable |                 Default
-----------------+--------------------------+-----------+----------+------------------------------------------
 emp_id          | integer                  |           | not null | nextval('employee_emp_id_seq'::regclass)
 emp_name        | character varying(20)    |           | not null |
 emp_country     | character varying(35)    |           | not null |
 emp_salary      | integer                  |           | not null |
 date_of_joining | timestamp with time zone |           | not null | now()
Indexes:
    "employee_pkey" PRIMARY KEY, btree (emp_id)

demo=# SELECT * FROM employee ;
emp_id  |   emp_name   | emp_country | emp_salary |    date_of_joining
      1 | KELIO        | Japon       |       2000 | 2021-10-26 00:00:00+00
      2 | JEAN-VINCENT | Canada      |       6500 | 2021-01-22 00:00:00+00
      3 | JUNO         | Japon       |       4000 | 2021-02-27 00:00:00+00
      4 | GUY-EMMANUEL | Salvador    |       2000 | 2020-07-27 00:00:00+00
      5 | WALI         | Japon       |       7000 | 2021-01-31 00:00:00+00
      6 | HENRI-PAUL   | Canada      |       4500 | 2021-08-19 00:00:00+00
      7 | MUHAMED      | France      |       5000 | 2021-07-20 00:00:00+00
      8 | MUHITTIN     | Madagascar  |       2500 | 2021-12-31 00:00:00+00
      9 | DEVLIN       | Madagascar  |       7000 | 2022-04-03 00:00:00+00
     10 | JOSUE        | Salvador    |       5500 | 2020-09-25 00:00:00+00
(10 rows)


We can use the rank() function to get the rank of employees (id) per country based on their salary. Look at the next example.

demo=# SELECT emp_id, emp_salary, emp_country,
         rank() OVER (PARTITION BY emp_country ORDER BY emp_salary DESC) 
       FROM employee;

emp_id  | emp_salary | emp_country | rank
      2 |       6500 | Canada      |    1
      6 |       4500 | Canada      |    2
      7 |       5000 | France      |    1
      5 |       7000 | Japon       |    1
      3 |       4000 | Japon       |    2
      1 |       2000 | Japon       |    3
      9 |       7000 | Madagascar  |    1
      8 |       2500 | Madagascar  |    2
     10 |       5500 | Salvador    |    1
      4 |       2000 | Salvador    |    2
(10 rows)


In the next example, the row_number() function gets a sorted list of employees’ names per country and their relative numeric position.

demo=# SELECT emp_id, emp_name, emp_country,
         row_number() OVER (PARTITION BY emp_country ORDER BY emp_name) 
       FROM employee;

emp_id  |   emp_name   | emp_country | row_number
      6 | HENRI-PAUL   | Canada      |          1
      2 | JEAN-VINCENT | Canada      |          2
      7 | MUHAMED      | France      |          1
      3 | JUNO         | Japon       |          1
      1 | KELIO        | Japon       |          2
      5 | WALI         | Japon       |          3
      9 | DEVLIN       | Madagascar  |          1
      8 | MUHITTIN     | Madagascar  |          2
      4 | GUY-EMMANUEL | Salvador    |          1
     10 | JOSUE        | Salvador    |          2
(10 rows)


The count() function is an old acquaintance of almost everyone with access to a SQL engine. It is part of the aggregate functions list, but it can act as a window function when an OVER clause follows the call. So we can use it to know how many employees share the same salary as a given employee.

demo=# SELECT emp_name, emp_salary, emp_country,
         count(*) OVER (PARTITION BY emp_salary) 
       FROM employee;

  emp_name   | emp_salary | emp_country | count
KELIO        |       2000 | Japon       |     2
GUY-EMMANUEL |       2000 | Salvador    |     2
MUHITTIN     |       2500 | Madagascar  |     1
JUNO         |       4000 | Japon       |     1
HENRI-PAUL   |       4500 | Canada      |     1
MUHAMED      |       5000 | France      |     1
JOSUE        |       5500 | Salvador    |     1
JEAN-VINCENT |       6500 | Canada      |     1
WALI         |       7000 | Japon       |     2
DEVLIN       |       7000 | Madagascar  |     2
(10 rows)

Window functions in PostgreSQL 15

Now that we have refreshed what the window functions are, let’s consider what the PostgreSQL 15 release notes say:

Improve the performance of window functions that use row_number(), rank(), and count() (David Rowley)

Accordingly, if we use window functions and move from a previous version to version 15, we should see an improvement in the performance of our workload. Let’s test it.

Laboratory case

To test the performance of the window functions, I created three instances of PostgreSQL: (a) version 13, (b) version 14, and (c) version 15.

I used the same public.employee table from the previous examples, loaded with 10K rows, and executed the same window function queries we saw before. I got the output from an EXPLAIN (ANALYZE) command, which executes the query and shows the timing of each node, including the window function's WindowAgg node.

The shape of the EXPLAIN (ANALYZE) plan was the same on every version of PostgreSQL; only the timings differ. The outputs below correspond to PostgreSQL 15, 14, and 13, respectively.



                                                      QUERY PLAN
WindowAgg  (cost=1288.49..1488.49 rows=10000 width=25) (actual time=11.946..18.732 rows=10000 loops=1)
  ->  Sort  (cost=1288.49..1313.49 rows=10000 width=17) (actual time=11.928..12.803 rows=10000 loops=1)
        Sort Key: emp_country, emp_salary DESC
        Sort Method: quicksort  Memory: 980kB
        ->  Seq Scan on employee  (cost=0.00..180.00 rows=10000 width=17) (actual time=0.008..2.402 rows=10000 loops=1)
Planning Time: 0.143 ms
Execution Time: 19.268 ms
(7 rows)


                                                      QUERY PLAN
WindowAgg  (cost=844.39..1044.39 rows=10000 width=25) (actual time=12.585..20.921 rows=10000 loops=1)
  ->  Sort  (cost=844.39..869.39 rows=10000 width=17) (actual time=12.560..13.545 rows=10000 loops=1)
        Sort Key: emp_country, emp_salary DESC
        Sort Method: quicksort  Memory: 1020kB
        ->  Seq Scan on employee  (cost=0.00..180.00 rows=10000 width=17) (actual time=0.011..1.741 rows=10000 loops=1)
Planning Time: 0.449 ms
Execution Time: 21.407 ms
(7 rows)


                                                      QUERY PLAN
WindowAgg  (cost=844.39..1044.39 rows=10000 width=25) (actual time=18.949..28.619 rows=10000 loops=1)
  ->  Sort  (cost=844.39..869.39 rows=10000 width=17) (actual time=18.896..19.998 rows=10000 loops=1)
        Sort Key: emp_country, emp_salary DESC
        Sort Method: quicksort  Memory: 1020kB
        ->  Seq Scan on employee  (cost=0.00..180.00 rows=10000 width=17) (actual time=0.011..1.228 rows=10000 loops=1)
Planning Time: 0.460 ms
Execution Time: 29.111 ms
(7 rows)

Looking at the WindowAgg node, we can easily see that the total time was smaller on PG15 than on the other two. The performance improvement is clear here.

To verify this is consistent, I collected the total time of the WindowAgg node from 500 executions and plotted the following graph.

[Figure: total WindowAgg time over 500 executions, PG13 vs. PG14 vs. PG15]

We can see the timing on PG15 is better than on the other versions. I also added a trend line: PG13 performed the “worst,” and even though PG14 showed a better trend, PG15 was the best.
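For reference, the WindowAgg total time can be pulled out of each EXPLAIN (ANALYZE) output with a few lines of scripting. A minimal sketch of the parsing step (the regex and helper name are my own, not part of any official tooling):

```python
import re

# The second number in "actual time=a..b" on the WindowAgg line is the
# node's total time in milliseconds.
WINDOWAGG_RE = re.compile(r"WindowAgg.*actual time=([\d.]+)\.\.([\d.]+)")

def windowagg_total_ms(plan_text):
    """Return the WindowAgg node's total actual time in ms, or None."""
    m = WINDOWAGG_RE.search(plan_text)
    return float(m.group(2)) if m else None

plan = ("WindowAgg  (cost=844.39..1044.39 rows=10000 width=25) "
        "(actual time=12.585..20.921 rows=10000 loops=1)")
print(windowagg_total_ms(plan))  # 20.921
```

Running the query 500 times and feeding each plan through a helper like this produces the series plotted above.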

I did the same exercise for the row_number() and count() functions. 


[Figure: WindowAgg timing for row_number() and count() over 500 executions]


I also collected the results of 500 executions for these last two functions. In both cases, the difference was smaller than for rank(), but PG15 still showed better results, as seen in the trend line.

The performance improvement for the rank(), row_number(), and count() window functions introduced in PostgreSQL 15 will let all those analytics and reporting projects process their data faster than in previous versions. As always, every environment has its own characteristics and challenges, but as we saw in these quick tests, the new version delivers the improvement right out of the box.


Parallel Commits for Transactions Using postgres_fdw on PostgreSQL 15

Consuming data from a remote foreign database into our local database just like another local table is a great feature that adds an excellent level of flexibility to the design of a database solution. 

This option is present in a number of database products: Oracle (DBLINK), MySQL (FEDERATED engine), and, of course, PostgreSQL (dblink, postgres_fdw). PostgreSQL supports multiple foreign data wrappers, but in this blog, we will review a specific feature added to postgres_fdw in the new version 15.

How postgres_fdw works

Let’s say that you have a local and a remote PostgreSQL server. The second one has a table that your application using the local server needs to access. You might think of any of the following solutions:

  1. Configure your application to be able to connect both servers.
  2. Set up some background jobs to perform a copy every now and then.
  3. Configure logical replication between the two servers so the local table syncs up with the remote one.

Each of the above has its own advantages and disadvantages. For the first one, you might need to configure your network and grant access privileges to both DB servers; all this adds complexity and can open the door to security issues. The second option adds extra load on both servers, and since the data is not up to date in real time, some reading inconsistencies might happen. The third one could actually be a good option, but if you plan to MODIFY the synced data, you might face collisions.

Then the solution might lie in the Foreign Data Wrapper. From the official documentation, the following steps are required to access data from a remote PostgreSQL server:

  1. Install the postgres_fdw extension using CREATE EXTENSION.
  2. Create a foreign server object, using CREATE SERVER, to represent each remote database you want to connect to. Specify connection information, except user and password, as options of the server object.
  3. Create a user mapping, using CREATE USER MAPPING, for each database user you want to allow to access each foreign server. Specify the remote user name and password to use as user and password options of the user mapping.
  4. Create a foreign table, using CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA, for each remote table you want to access. The columns of the foreign table must match the referenced remote table. You can, however, use table and/or column names different from the remote tables, if you specify the correct remote names as options of the foreign table object.

You will end with something like this:

[Diagram: a local PostgreSQL server accessing remote tables through postgres_fdw]

So now, with the proper privileges, your application can access the data from the remote table just like another local table: you can SELECT from it or even execute DML statements on it.

What has changed in PG15?

The described functionality has been present in PostgreSQL for a while. The new enhancement available in version 15 is the parallel transaction commit when using postgres_fdw. This is particularly useful when your solution design relies on multiple remote servers, such as a distributed PostgreSQL database.

From the Transaction Management documentation, we know that “during a query that references any remote tables on a foreign server, postgres_fdw opens a transaction on the remote server if one is not already open corresponding to the current local transaction.” If our local transaction references multiple remote tables from different foreign servers, then a (sub)transaction is opened on each remote server.

Up to version 14, when multiple (sub)transactions are tied to a local transaction and the transaction is committed locally, postgres_fdw commits each remote (sub)transaction one by one (sequentially). This behavior is still the default in version 15, but now we can enable the parallel_commit option for each foreign server tied to a local transaction; in that case, when the local transaction commits, all the remote (sub)transactions commit at once (in parallel).
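The intuition can be sketched with a toy Python model (my own illustration, not postgres_fdw internals): with sequential commits, the per-server round-trip latencies add up; with parallel commit, they overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

COMMIT_LATENCY = 0.05  # assumed 50 ms round trip per remote COMMIT

def remote_commit(server):
    time.sleep(COMMIT_LATENCY)  # stand-in for the network round trip
    return server

servers = [f"remote_server{i}" for i in range(1, 8)]  # seven foreign servers

# Sequential (the default, and the only option up to PG14): ~7 x 50 ms.
t0 = time.perf_counter()
for s in servers:
    remote_commit(s)
sequential = time.perf_counter() - t0

# Parallel (parallel_commit 'true'): all COMMITs are in flight at once.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    list(pool.map(remote_commit, servers))
parallel = time.perf_counter() - t0

print(f"sequential={sequential:.3f}s parallel={parallel:.3f}s")
```

With seven remote servers, the sequential total is roughly seven round trips, while the parallel total is close to a single round trip, which is exactly the effect the benchmark below tries to measure.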

Laboratory case

I built a laboratory to test the sequential and the parallel remote (sub)transactions commit and catch the differences. 

I decided to use the pgbench tool for the benchmark, with the option to create partitions for the pgbench_accounts table. I adapted these partitions to be foreign tables, just as described in a very interesting Percona blog about sharding on PostgreSQL, which is worth a look. Finally, I created a partitioned table with 70 partitions distributed over seven remote servers, each one tied to a postgres_fdw foreign server.

The next diagram gives some insight.

[Diagram: partitioned pgbench_accounts with 70 partitions distributed over seven postgres_fdw foreign servers]

Pgbench execution

To test the new feature, I tried two different loads, first with the parallel_commit option disabled (the default) and then enabled:

  • I ran the “normal” pgbench workload. It accesses one row per transaction, which means every local transaction opened only one remote (sub)transaction.
  • I ran a pgbench “custom” script to retrieve 100 rows per transaction. In this case, every local transaction opened multiple remote (sub)transactions.

Running the regular pgbench load against the PG15 server was as simple as:

pgbench --progress-timestamp -c 16 -j 4 -T 3600 -P 2 \
  -h pg15 -U pgbench pgbench

After getting the results, I enabled the parallel_commit option on each foreign server as follows:

ALTER SERVER remote_server<N> OPTIONS (ADD parallel_commit 'true');

And repeated the pgbench load. The results are as follows.

[Chart: TPS for the regular pgbench load, parallel_commit OFF vs. ON]

We see no big difference between the load with the parallel option OFF or ON. This was expected since, in the regular pgbench load, every local transaction opens only one remote transaction.

For the second test scenario, I used a custom pgbench script that reads 100 rows in each transaction. It was as follows:

$ cat pgbench_custom.sql
\set v1 random(1, 100000 * :scale)
\set v2 :v1 + 100

  SELECT abalance FROM pgbench_accounts
  WHERE aid BETWEEN :v1 AND :v2;

Before running the load, I disabled the parallel_commit option on each foreign server by executing:

ALTER SERVER remote_server<N> OPTIONS (DROP parallel_commit);

And then I ran the custom load with NO parallel commit:

pgbench --progress-timestamp -c 8 -j 4 -T 3600 -P 2 \
  -h pg15 -U pgbench -f pgbench_custom.sql pgbench

Then I enabled it on each foreign server with the command used before and ran the custom load again. The results are as follows.

[Chart: TPS for the custom 100-row load, parallel_commit OFF vs. ON]

Now we see a clear improvement with the parallel_commit option enabled. The number of TPS was consistently higher during the test when the option was enabled, showing how performance can improve.

In general, TPS dropped compared with the regular pgbench load. The details are out of the scope of this blog post; the goal here was to get an idea of how performance might improve using the new parallel_commit option under similar conditions.


The capabilities a PostgreSQL database gets with postgres_fdw have been known for a while, and multiple system designs have taken advantage of them. 

The addition of the new parallel_commit option when using postgres_fdw in PostgreSQL 15 brings a very good opportunity to improve the performance of our databases if they are designed as distributed systems and our local transactions are tied to multiple remote (sub)transactions. Every solution and system design requires its own analysis. This blog post only intends to show the new option and how we can deliver higher throughput under the same conditions just by enabling it.


Percona Operator for MongoDB and LDAP: How Hard Can it Be?

As a member of the Percona cloud team, I run a lot of databases in Kubernetes, and our Percona Operator for MongoDB in particular. Most of us feel comfortable with a rather small and tidy infrastructure, which is easy to control and operate. However, the natural growth of an organization brings a lot of challenges to be tackled, especially in the context of access management. We’ve all been there: every new account requires platform-specific efforts, adding to the operational cost of your infrastructure support. Such a burden can be lifted by LDAP-based software, like OpenLDAP or Microsoft Active Directory, acting as the source of truth for the authentication/authorization process. Let’s dive into the details of how Percona Distribution for MongoDB, managed by the Kubernetes operator, can be connected to an LDAP server.


Our scenario is based on integrating the OpenLDAP server with Percona Distribution for MongoDB and the corresponding Kubernetes Operator. We are going to keep the setup as simple as possible, so no complicated domain relationships will be listed here.


On the OpenLDAP side, the following settings may be used:

 0-percona-ous.ldif: |-
   dn: ou=perconadba,dc=ldap,dc=local
   objectClass: organizationalUnit
   ou: perconadba
 1-percona-users.ldif: |-
   dn: uid=percona,ou=perconadba,dc=ldap,dc=local
   objectClass: top
   objectClass: account
   objectClass: posixAccount
   objectClass: shadowAccount
   cn: percona
   uid: percona
   uidNumber: 1100
   gidNumber: 100
   homeDirectory: /home/percona
   loginShell: /bin/bash
   gecos: percona
   userPassword: {crypt}x
   shadowLastChange: -1
   shadowMax: -1
   shadowWarning: -1

Also, a read-only user should be created for database-issued user lookups.

If everything is done correctly, the following command should work:

$ ldappasswd -s percona -D "cn=admin,dc=ldap,dc=local" -w password -x "uid=percona,ou=perconadba,dc=ldap,dc=local"


Percona Operator for MongoDB will do all the work inside Kubernetes. You can use any supported platform from the System Requirements.

In order to get MongoDB connected with OpenLDAP, we need to configure both:

  • mongod
  • The internal MongoDB role

As for mongod, you may use the following configuration snippet:

security:
  authorization: "enabled"
  ldap:
    authz:
      queryTemplate: 'ou=perconadba,dc=ldap,dc=local??sub?(&(objectClass=group)(uid={USER}))'
    servers: "openldap"
    transportSecurity: none
    bind:
      queryUser: "cn=readonly,dc=ldap,dc=local"
      queryPassword: "password"
    userToDNMapping:
      '[
          {
            match : "(.+)",
            ldapQuery: "OU=perconadba,DC=ldap,DC=local??sub?(uid={0})"
          }
      ]'
setParameter:
  authenticationMechanisms: 'PLAIN,SCRAM-SHA-1'

The internals of the snippet are a topic for another blog post, though. Basically, we are providing mongod with LDAP-specific parameters: the domain name of the LDAP server (servers), an explicit lookup user, domain mapping rules, etc.

Put the snippet on your local machine and create a Kubernetes Secret object named after the MongoDB cluster name from Install Percona Server for MongoDB on Kubernetes.

$ kubectl create secret generic cluster1-rs0-mongod --from-file=mongod.conf=<path-to-mongod-ldap-configuration>

Percona Operator for MongoDB is able to pass custom configuration through any of these Kubernetes objects: a Custom Resource, a ConfigMap, or a Secret. Since we are using security-sensitive information, we picked the Kubernetes Secret.

The next step is to start the MongoDB cluster up as it’s described in Install Percona Server for MongoDB on Kubernetes.

On successful completion of the steps from the doc, we can proceed with setting the LDAP users’ roles inside MongoDB. Let’s log in to MongoDB as administrator and execute:

var admin = db.getSiblingDB("admin")
admin.createRole(
  {
    role: "ou=perconadba,dc=ldap,dc=local",
    privileges: [],
    roles: [ "userAdminAnyDatabase" ]
  }
)

Now the percona user created inside OpenLDAP is able to log in to MongoDB as an administrator. Please replace <mongodb-rs-endpoint> with a valid replica set endpoint.

$ mongo --username percona --password 'percona' --authenticationMechanism 'PLAIN' --authenticationDatabase '$external' --host <mongodb-rs-endpoint> --port 27017


At the time of writing this blog post, Percona Operator for MongoDB supports only unencrypted transport between MongoDB and LDAP servers. We’ll do our best to bring such a feature in the future; in the meantime, feel free to use it in security-relaxed scenarios.


This blog post describes only one possible case out of a huge variety of combinations. It is rather simple and does not discuss topics like TLS-secured transport setup, directory design, etc. We encourage you to try the described MongoDB LDAP integration for proof-of-concept or development environment setups, where security concerns are not so demanding. Don’t hesitate to bring more complexity with sophisticated directory structures, user privilege schemes, etc. If you find your own configuration interesting and worth sharing with the community, please visit our Percona Community Forum. We’ll be glad to check your outcomes.

Have fun with Percona Operator for MongoDB!


Window Functions in MongoDB 5.0

I have already presented in previous posts some of the new features available in MongoDB 5.0: resharding and time series collections. Please have a look if you missed them:

MongoDB 5.0 Time Series Collections

Resharding in MongoDB 5.0

In this article, I would like to present another new feature: window functions.

Window functions are quite popular in relational databases; they run a window across sorted documents, producing calculations at each step of the window. Typical use cases are calculating rolling averages, correlation scores, or cumulative totals. You can achieve the same results even with older versions of MongoDB or with databases where window functions are not available, but this comes at the cost of more complexity, because multiple queries are usually required, and temporary data needs to be saved somewhere as well.

Instead, window functions let you run a single query and get the expected results in a more efficient and elegant way.

Let’s see how the feature works on MongoDB 5.0.

The window functions

A new aggregation stage, $setWindowFields, is available in MongoDB 5.0. This is the stage that provides the window functions capability.

The following is the syntax of the stage:

{
  $setWindowFields: {
    partitionBy: <expression>,
    sortBy: {
      <sort field 1>: <sort order>,
      <sort field 2>: <sort order>,
      <sort field n>: <sort order>
    },
    output: {
      <output field 1>: {
        <window operator>: <window operator parameters>,
        window: {
          documents: [ <lower boundary>, <upper boundary> ],
          range: [ <lower boundary>, <upper boundary> ],
          unit: <time unit>
        }
      },
      <output field 2>: { ... },
      <output field n>: { ... }
    }
  }
}

  • partitionBy (optional): an expression to group the documents by. If omitted, all the documents are grouped into a single partition by default
  • sortBy (required in some cases): how to sort the documents. Uses the $sort syntax
  • output (required): specifies the fields to append to the documents in the result set. Basically, this is the parameter that provides the result of the window function
  • window (optional): defines the inclusive window boundaries and how the boundaries should be used for the calculation of the window function result

Well, the definitions may look cryptic but a couple of simple examples will clarify how you can use them.

The test dataset

I have a Percona Server for MongoDB 5.0 running, and I got some public data about COVID-19 infections, hospitalizations, and other info from Italy. The data is available on a per-day and per-region basis from the following link:

I loaded just a few months of data spanning 2021 and 2022. The data is labeled in Italian, so I created a similar, reduced collection just for the needs of this article.

Here is a sample of the documents:

> db.covid.find({"region":"Lombardia"}).sort({"date":1}).limit(5)
{ "_id" : ObjectId("62ab5f7d017d030e4cb314e9"), "region" : "Lombardia", "total_cases" : 884125, "date" : ISODate("2021-10-01T15:00:00Z") }
{ "_id" : ObjectId("62ab5f7d017d030e4cb314fe"), "region" : "Lombardia", "total_cases" : 884486, "date" : ISODate("2021-10-02T15:00:00Z") }
{ "_id" : ObjectId("62ab5f7d017d030e4cb31516"), "region" : "Lombardia", "total_cases" : 884814, "date" : ISODate("2021-10-03T15:00:00Z") }
{ "_id" : ObjectId("62ab5f7d017d030e4cb31529"), "region" : "Lombardia", "total_cases" : 884920, "date" : ISODate("2021-10-04T15:00:00Z") }
{ "_id" : ObjectId("62ab5f7d017d030e4cb3153d"), "region" : "Lombardia", "total_cases" : 885208, "date" : ISODate("2021-10-05T15:00:00Z") }

Each document contains the daily number of total COVID infections from the beginning of the pandemic for a specific Italian region.

Calculate daily new cases

Let’s create our first window function.

Since the collection only has the number of total cases, we would like to calculate the number of new cases per day. This way, we can understand whether the pandemic is getting worse or improving.

You can achieve that by issuing the following aggregation pipeline:

> db.covid.aggregate( [
{ $setWindowFields: {
    partitionBy : "$region",
    sortBy: { date: 1 },
    output: {
      previous: {
        $push: "$total_cases",
        window: {
          range: [-1, -1],
          unit: "day"
        }
      }
    }
  }
},
{ $unwind:"$previous"},
{ $addFields: {
    new_cases: {
      $subtract: ["$total_cases","$previous"]
    }
  }
},
{ $match: { "region": "Lombardia" } },
{ $project: { _id:0, region:1, date:1, new_cases: 1}  }
] )
{ "region" : "Lombardia", "date" : ISODate("2021-10-02T15:00:00Z"), "new_cases" : 361 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-03T15:00:00Z"), "new_cases" : 328 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-04T15:00:00Z"), "new_cases" : 106 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-05T15:00:00Z"), "new_cases" : 288 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-06T15:00:00Z"), "new_cases" : 449 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-07T15:00:00Z"), "new_cases" : 295 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-08T15:00:00Z"), "new_cases" : 293 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-09T15:00:00Z"), "new_cases" : 284 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-10T15:00:00Z"), "new_cases" : 278 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-11T15:00:00Z"), "new_cases" : 87 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-12T15:00:00Z"), "new_cases" : 306 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-13T15:00:00Z"), "new_cases" : 307 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-14T15:00:00Z"), "new_cases" : 273 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-15T15:00:00Z"), "new_cases" : 288 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-16T15:00:00Z"), "new_cases" : 432 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-17T15:00:00Z"), "new_cases" : 297 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-18T15:00:00Z"), "new_cases" : 112 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-19T15:00:00Z"), "new_cases" : 412 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-20T15:00:00Z"), "new_cases" : 457 }
{ "region" : "Lombardia", "date" : ISODate("2021-10-21T15:00:00Z"), "new_cases" : 383 }


The pipeline also contains stages to make the output more readable. Let’s focus on the $setWindowFields stage anyway.

In the first stage, we define the window function to create, for each document, a new field containing the total cases from the previous day. The field is obviously named previous.

Then we use this information in the following stages to simply calculate the difference between the total cases of “today” and “yesterday”, which gives us the daily increase.

Take a look at how the window function is created. We used $push to fill the new field with the value of total_cases. In the window document, we defined the range as [-1, -1]. These numbers represent the lower and upper boundaries of the window, evaluated against the sortBy field in units of one day, and they both correspond to the previous day (-1); the window spans only one document: yesterday’s. The usage of sortBy is relevant here because it tells MongoDB in which order to consider the documents in the window. The trick of defining the range as [-1, -1] to get yesterday’s data works because the documents are properly sorted.
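If it helps, the effect of the [-1, -1] day window can be mimicked in plain Python with a lookup of the previous day's total (an illustration with made-up names, not MongoDB code; the sample numbers are the Lombardia totals shown earlier):

```python
from datetime import date, timedelta

# (region, day) -> total_cases, as in the first documents of the collection
totals = {
    ("Lombardia", date(2021, 10, 1)): 884125,
    ("Lombardia", date(2021, 10, 2)): 884486,
    ("Lombardia", date(2021, 10, 3)): 884814,
}

def daily_new_cases(totals):
    out = {}
    for (region, day), total in totals.items():
        # The [-1, -1] window: the document dated exactly one day earlier.
        prev = totals.get((region, day - timedelta(days=1)))
        if prev is not None:  # no previous day -> row dropped, like $unwind
            out[(region, day)] = total - prev
    return out

print(daily_new_cases(totals))  # 361 on Oct 2 and 328 on Oct 3, as above
```

The first day of the series has no "yesterday", so it is dropped, which is why the pipeline output starts on October 2.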

Calculate moving average

Let’s now calculate the moving average. We’ll consider the last week of data to calculate the average of new cases on a daily basis. This kind of metric was very popular during the peak of the pandemic, triggering a lot of discussion around forecasts and driving government decisions. Well, that’s a simplification; there were also other relevant parameters, but the moving average was one of them.

To calculate the moving average, we need the daily new cases we calculated in the previous example. We can reuse those values in different ways: adding another $setWindowFields stage to the previous pipeline, adding the new_cases field to the existing documents, or creating another collection, as I did for simplicity, using the $out stage:

> db.covid.aggregate( [ { $setWindowFields: { partitionBy : "$region", sortBy: { date: 1 }, output: { previous: { $push: "$total_cases", window: { range: [-1, -1],  unit: "day" } } } } }, { $unwind:"$previous"},  { $addFields: { new_cases: { $subtract: ["$total_cases","$previous"] } } }, { $project: { region:1, date:1, new_cases: 1} }, { $out: "covid_daily"  }  ] )

Now we can calculate the moving average on the covid_daily collection. Let’s do it with the following aggregation:

> db.covid_daily.aggregate([
{ $setWindowFields: {
    partitionBy : "$region",
    sortBy : { date: 1 },
    output: {
      moving_average: {
        $avg: "$new_cases",
        window: {
          range: [-6, 0],
          unit: "day"
        }
      }
    }
  }
},
{ $project: { _id:0  } }
])
{ "region" : "Abruzzo", "date" : ISODate("2021-10-02T15:00:00Z"), "new_cases" : 49, "moving_average" : 49 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-03T15:00:00Z"), "new_cases" : 36, "moving_average" : 42.5 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-04T15:00:00Z"), "new_cases" : 14, "moving_average" : 33 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-05T15:00:00Z"), "new_cases" : 35, "moving_average" : 33.5 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-06T15:00:00Z"), "new_cases" : 61, "moving_average" : 39 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-07T15:00:00Z"), "new_cases" : 54, "moving_average" : 41.5 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-08T15:00:00Z"), "new_cases" : 27, "moving_average" : 39.42857142857143 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-09T15:00:00Z"), "new_cases" : 48, "moving_average" : 39.285714285714285 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-10T15:00:00Z"), "new_cases" : 19, "moving_average" : 36.857142857142854 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-11T15:00:00Z"), "new_cases" : 6, "moving_average" : 35.714285714285715 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-12T15:00:00Z"), "new_cases" : 55, "moving_average" : 38.57142857142857 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-13T15:00:00Z"), "new_cases" : 56, "moving_average" : 37.857142857142854 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-14T15:00:00Z"), "new_cases" : 45, "moving_average" : 36.57142857142857 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-15T15:00:00Z"), "new_cases" : 41, "moving_average" : 38.57142857142857 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-16T15:00:00Z"), "new_cases" : 26, "moving_average" : 35.42857142857143 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-17T15:00:00Z"), "new_cases" : 39, "moving_average" : 38.285714285714285 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-18T15:00:00Z"), "new_cases" : 3, "moving_average" : 37.857142857142854 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-19T15:00:00Z"), "new_cases" : 45, "moving_average" : 36.42857142857143 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-20T15:00:00Z"), "new_cases" : 54, "moving_average" : 36.142857142857146 }
{ "region" : "Abruzzo", "date" : ISODate("2021-10-21T15:00:00Z"), "new_cases" : 72, "moving_average" : 40 }


Note we have defined the range boundaries as [-6, 0] so that the window spans the current document plus the six preceding days, i.e., the last week.
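The same trailing seven-day average can be sketched in plain Python to double-check the results; feeding it the first new-case values of the Abruzzo output above reproduces the reported averages:

```python
# Trailing 7-day moving average, mirroring a [-6, 0] day window:
# for each position, average the current value plus up to six preceding ones.
def moving_average(new_cases, days=7):
    """new_cases: list of daily counts for one region, oldest first."""
    averages = []
    for i in range(len(new_cases)):
        window = new_cases[max(0, i - days + 1) : i + 1]
        averages.append(sum(window) / len(window))
    return averages

# Matches the first Abruzzo moving_average values shown above.
print(moving_average([49, 36, 14, 35, 61, 54, 27]))
```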

Notes about window functions

We have used unit: “day” in the window definition, but this optional field can also take other values such as year, quarter, month, week, hour, minute, and so on.

Multiple operators can be used with $setWindowFields: $avg, $count, $first, $last, $max, $min, $derivative, $sum, $rank, and many others you can check in the documentation.

There are a few restrictions on window function usage. Please have a look at the official documentation in case you hit one of them.


Window functions are a very good feature delivered in MongoDB 5.0. They could make life easier for a lot of developers.

For more details, and to check the restrictions, have a look at the following page:

Percona Server for MongoDB 5.0 is a drop-in replacement for MongoDB Community. You can use it for free and you can rely on enterprise-class features like encryption at rest, LDAP authentication, auditing, and many others. You can also rely on all new features of MongoDB Community 5.0, including window functions.

Take a look at Percona Server for MongoDB.


Debug Symbols for Percona Server for MongoDB and MongoDB

Both Percona Server for MongoDB and vanilla MongoDB packages do not contain debug symbols by default. This is because the debug symbols package can be up to a 3GB download, depending on the version and target platform. Fortunately, you only need debug symbols in those rare cases when you have a serious enough issue and really want to debug it. So for most users, it is an absolutely reasonable decision not to download gigabytes of debug symbols by default.

This blog post provides pointers to where to get debug symbols for Percona Server for MongoDB or vanilla MongoDB.

Percona Server for MongoDB

Using the corresponding package manager

Percona provides debug symbols packages for Percona Server for MongoDB. It is recommended to install Percona packages from official Percona repositories using the corresponding tool for your system:

Installing debug symbols manually

You can also download packages from the Percona website and install them manually using dpkg or rpm.

To get debug symbols for Percona Server for MongoDB, go to the downloads page. Then look for the “Percona Server for MongoDB” section and click the “Download X.X Now” button corresponding to the version you are interested in.

On the new page, select the minor release version from the Version: dropdown list and the target platform from the Software: dropdown list. This will reveal a list of available packages for the selected platform. You can search for a dbg or debuginfo package (depending on the target platform) and download it.

In most cases, it is possible to download debug symbols either as a separate package or as part of the “All Packages” bundle. For example, on the Percona Server for MongoDB 5.0.9-8 page for Ubuntu 20.04, you can download either a separate percona-server-mongodb-dbg_5.0.9-8.focal_amd64.deb package or the all-packages bundle, which contains the debug symbols package: percona-server-mongodb-5.0.9-8-r15a95b4-focal-x86_64-bundle.tar

MongoDB Debug Symbols

There are no debug symbols packages provided by MongoDB Inc., but fortunately, it is possible to download binary tarballs containing debug symbols files from the non-advertised location: 

Be careful: the above link opens a huge list containing thousands of tgz archives created since 2009. This is virtually the full MongoDB history, in a single directory.

The names of those files speak for themselves: each is a combination of architecture, platform, and MongoDB version.

For example, there are two files for MongoDB 5.0.9 on Ubuntu 20.04:

The first of those files contains just the server core binaries:

$ tar -tf mongodb-linux-x86_64-ubuntu2004-5.0.9.tgz

The second file contains corresponding debug symbols files:

$ tar -tf mongodb-linux-x86_64-ubuntu2004-debugsymbols-5.0.9.tgz

Thus, you can use those symbol files with gdb to analyze the core dump file if you have one or to debug a running instance of MongoDB Community Edition.

Each xxxx.debug file is a debug symbols file for the corresponding xxxx binary. If you accidentally try to debug with a mismatched symbols file, gdb will politely inform you about that:

Reading symbols from ./mongod...
warning: the debug information found in "/home/igor/5.0.9/bin/mongod.debug" does not match "/home/igor/5.0.9/bin/mongod" (CRC mismatch).

(no debugging symbols found)...done.

This can especially happen if you upgrade the binaries package but not the debug symbols.


It is a really rare case that you will need to debug Percona Server for MongoDB or MongoDB, but if you do, I hope this debug symbols information saves you a few minutes.

Happy debugging!


Percona Monthly Bug Report: June 2022

Here at Percona, we operate on the premise that full transparency makes a product better. We strive to build the best open-source database products, but also to help you manage any issues that arise in any of the databases that we support. And, in true open source form, we report back on any issues or bugs you might encounter along the way.

We constantly update our bug reports and monitor other boards to ensure we have the latest information, but we wanted to make it a little easier for you to keep track of the most critical ones. These posts are a central place to get information on the most noteworthy open and recently resolved bugs. 

In this June 2022 edition of our monthly bug report, we have the following list of bugs:


Percona Server for MySQL/MySQL Bugs

PS-8250(MySQL#107352): Rows_examined, whether checked via performance_schema or the slow query log, has confusing values. It looks like only rows examined by the outer query are counted.

The examined row count is retrieved from thd->get_examined_row_count(), which returns the member ha_rows m_rows_examined. This variable is incremented in THD::inc_examined_row_count() after joins.

In 8.0, the optimizer moved to a new style, the query iterator model, and the examined row count is calculated differently.

In 5.7, rows examined includes the rows examined in all phases of the query, and we expect the same in 8.0 as well.

Affected version/s: 8.0.28, 8.0.29


PS-8271(MySQL#82599): slave_type_conversions messing with data that still fits within the allowed range.

It may happen that you want to prepare for int column growth and change datatype from int to bigint, so you do it first on the slave.

Unfortunately, after passing the maximum value for signed int (2147483647), when the source has unsigned int, regardless of whether the replica has signed or unsigned bigint, the values get either truncated to 0 (when unsigned) or stored as negative numbers counting from -2147483647 downwards.
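To illustrate the wrap-around effect described above, here is a small Python sketch of reinterpreting an unsigned 32-bit value as a signed one. This illustrates two's-complement wrapping in general, not MySQL's exact conversion logic, which depends on the slave_type_conversions flags:

```python
# Reinterpret an unsigned 32-bit integer as a signed 32-bit integer.
# Values above 2147483647 (the signed int maximum) come out negative.
def as_signed_int32(value):
    value &= 0xFFFFFFFF                       # keep only the low 32 bits
    return value - 2**32 if value >= 2**31 else value

print(as_signed_int32(2147483647))  # 2147483647, still fits
print(as_signed_int32(2147483648))  # -2147483648, wrapped negative
```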

Affected version/s: 5.7.14, 5.6, 5.6.32, 8.0.28


PS-8070(MySQL#106560): Regression on DDL statements with big stage/sql/checking permissions.

The reason for the performance degradation is that DDL operations in 8.0 are all atomic, while 5.7 operations were not. On the other hand, 8.0 has been designed to provide much better performance under multi-threaded loads, hence the recommendation to run your DDLs in many threads. That also means dumping your schemas with mysqlpump instead of mysqldump.

Affected version/s: 8.0


PS-8050(MySQL#106616) : PS 8.0 upgrade (from PS 5.7) crashes with Assertion failure: == nullptr

Please note that this bug is fixed and the fix is available in 8.0.28-19

Affected version/s: 8.0
Fixed version: 8.0.28-19


PS-8115(MySQL#106728) : CREATE EVENT DEFINER=test@usr@localhost processed as ‘test’@’usr@localhost’

The event scheduler looks for the first @ sign to separate the user name from the hostname, but MySQL can create users with an @ sign in the middle of the user name: create user test@usr@localhost;

Affected version/s: 5.7.36-39
Fixed version: 5.7.37-40

Please note that 8.0 is unaffected by this issue.


Percona XtraDB Cluster

PXC-3923 : ANALYZE TABLE crashes node when [super_]read_only is set!

When read_only or super_read_only is set, the ANALYZE TABLE command gives the error “ERROR 1290 (HY000): The MySQL server is running with the --read-only option so it cannot execute this statement”, which is expected, but in the background the node is evicted from the cluster with an “Inconsistency detected” message.

Affected version/s: 8.0.26-16, 8.0.27-18
Fixed version : 8.0.28-19


Percona Toolkit

PT-2070 : pt-table-checksum to be run on PXC with pxc_strict_mode=ENFORCING

pt-table-checksum cannot be run with pxc_strict_mode=ENFORCING as STATEMENT events are rejected in this mode.

Setting the pxc_strict_mode=PERMISSIVE enables pt-table-checksum to run successfully.

Affected Version/s:  3.3.1


PT-2066 : pt-online-schema-change can’t connect on non-default port

When trying to use pt-online-schema-change to connect to a MySQL DB on a non-default port, it throws an error: Can’t connect to MySQL server on ‘[db instance]‘ (110) at /usr/bin/pt-online-schema-change line 2345.

As a workaround to this issue, we can pass the connection options via DSN.

Affected Version/s:  3.3.1


Percona Monitoring and Management (PMM)

PMM-9413 : pmm-managed gets deadlocked when an agent connects with a duplicate agent_id.

A VM with the PMM client installed is replicated in a cloud environment, so the PMM client runs in a replica with the same configuration data as the original VM. pmm-managed detects the duplicate agent ID, which causes a failure to add any new resources for monitoring, and the OOM killer kills pmm-managed.

Affected version/s:  2.25.0
Fixed version: 2.27.0


Percona Operator for MySQL based on Percona XtraDB Cluster

K8SPXC-804 : Mark pxc container restarts in logs container output

mysqld could be killed by the OOM killer or by kill -9 in private clouds.

There are no messages generated between the last valid log entry and the first mysqld startup message.

Affected version/s: 1.7.0
Fixed version: 1.12.0


K8SPXC-1036 : Liveness check fails when XtraBackup is running and wsrep_sync_wait is set

Backups run on non-writer nodes. When XtraBackup runs, it performs a LOCK TABLES FOR BACKUP on the instance. If the writer instance performs a DDL, such as creating a table, on the node where the backup is running, the DDL will wait behind the BACKUP LOCK to complete. If wsrep_sync_wait is != 0, subsequent SELECTs will remain on the write-set queue until the node is in sync.

Affected version/s: 1.10.0
Fixed version: 1.12.0



We welcome community input and feedback on all our products. If you find a bug or would like to suggest an improvement or a feature, learn how in our post, How to Report Bugs, Improvements, New Feature Requests for Percona Products.

For the most up-to-date information, be sure to follow us on Twitter, LinkedIn, and Facebook. 

Quick References:

Percona JIRA

MySQL Bug Report

Report a Bug in a Percona Product

MySQL 8.0.27 Release notes


About Percona:

As the only provider of distributions for all three of the most popular open source databases—PostgreSQL, MySQL, and MongoDB—Percona provides expertise, software, support, and services no matter the technology.

Whether it’s enabling developers or DBAs to realize value faster with tools, advice, and guidance, or making sure applications can scale and handle peak loads, Percona is here to help.

Percona is committed to being open source and preventing vendor lock-in. Percona contributes all changes to the upstream community for possible inclusion in future product releases.


Working With Large PostgreSQL Databases

Working With Large PostgreSQL Databases

It’s a funny thing when the topic of database sizes comes up. Calling one small, medium, large, or even huge isn’t as straightforward as you’d think. Distinguishing the size of a database is based upon a number of factors whose characteristics can be classified as either “tangible”, things you can measure in an objective manner, or “intangible”, those attributes best expressed using the catch-all phrase “it depends”. For example, a 2TB database is, to many people, a large database. On the other hand, a veteran DBA could describe a PostgreSQL database cluster as large when it enters the realm of petabytes.

Here’s a recap of some of PostgreSQL’s basic limits:

  • database size: unlimited
  • number of databases: 4,294,950,911
  • relations per database: 1,431,650,303
  • table size: 32 TB
  • rows per table: defined by the number of tuples that can fit onto 4,294,967,295 pages
  • fields per table: 1,600
  • field size: 1 GB
  • identifier length: 63 bytes
  • indexes per table: unlimited
  • columns per index: 32
  • partition keys: 32

NB: Despite possible physical constraints one faces when creating large numbers of schemas, there is no theoretical limitation to the number created in PostgreSQL.

I’ve come to differentiate a small database from a large one using the following caveats. And while it is true that some of the caveats for a large database can be applied to a small one, and vice-versa, the fact of the matter is that most of the setups out there in the wild follow these observations:

  1. Small databases are often administered by a single person
  2. Small databases can be managed manually.
  3. Small databases are minimally tuned.
  4. Small databases can typically tolerate production inefficiencies more than large ones.
  5. Large databases are managed using automated tools.
  6. Large databases must be constantly monitored and go through an active tuning life cycle.
  7. Large databases require rapid response to imminent and ongoing production issues to maintain optimal performance.
  8. Large databases are particularly sensitive to technical debt.

Large databases often bring up the following questions and issues:

  • Is the system performance especially sensitive to changes in production load?
  • Is the system performance especially sensitive to minor tuning effects?
  • Are there large amounts of data churn?
  • Does the database load system saturate your hardware’s capabilities?
  • Do the maintenance activities, such as logical backups and repacking tables, take several days or even more than a week?
  • Does your Disaster Recovery Protocol require having a very small Recovery Point Objective (RPO) or Recovery Time Objective (RTO)?

The key difference between a small vs large database is how they are administered:

  1. Whereas it is common for small databases to be administered manually, albeit not best practice, automation is the industry-default mode of operation for large databases.
  2. Because circumstances can change quickly, large databases are particularly sensitive to production issues.
  3. Tuning is constantly evolving; while it is true that newly installed architectures are often well-tuned, circumstances change as they age and large databases are especially vulnerable.

Good planning is your friend: the goal is to address potential issues for a large database by anticipating future conditions, i.e., testing the entire infrastructure before it goes into production.

Scripting your build environment using tools such as Ansible, Puppet, Terraform, etc. mitigates human error when provisioning the underlying infrastructure. It’s important to be able to build in a consistent and repeatable manner.

Once a database is in production it must be monitored and wired with alerts for the various critical thresholds. Aside from the standard errors, consider configuring your monitoring solution to follow the “Rule Of Three”. Select and watch only three metrics that track and alert for a specific “change of state”. This is not to be confused with following a particular issue, rather it is meant to inform you that you should pay attention to your system in order to understand that something has changed from what is considered normal behavior. Depending on your preferences you may want to watch for known production issues or when the system is stable you might be more interested in trending alerts such as query performance which have slowed below a predefined threshold.

In regards to system tuning: while small databases can, after a fashion, perform in a satisfactory manner using the default values, large databases cannot. Configuring initial tuning parameters such as shared_buffers is de rigueur, but you should also monitor the environment in order to trend issues such as, for example, bloat and long-term query performance. Remember, the most common problem experienced by an otherwise stable and well-thought-out architecture is table and index bloat. Addressing bloat by tuning the autovacuum characteristics is essential.

Monitoring, especially before and after maintenance windows, is required because it can catch potential problems with an update before they become production issues.

Pay close attention to following the regular maintenance activities during the life-cycle of your system:

  • Logical backups and misc database redundancies
  • Architectural evolution:
    • application stack updates, upgrades, and rollbacks
    • application/database scaling
  • PostgreSQL server upgrades:
    • minor
    • major
  • Database migrations
  • Hardware upgrades
  • Scaling the cluster by adding and removing server nodes

Maintenance activities such as logical backups and PostgreSQL minor upgrades are performed at regular intervals.

Plan for space utilization requirements of logical dumps and WAL archives.

In regards to logical backups: it can be difficult to justify backing up an entire database when doing so takes a week. Alternatively, differential backups are a potential solution: tables that are updated and deleted regularly can be archived at a faster frequency than slower-changing tables, which can be stored without changes for a longer period of time. This approach, however, requires the appropriate architectural design considerations, such as using table partitioning.

An alternative to logical backups is to consider Copy On Write (COW), or stacked file systems, such as ZFS and BTRFS. Environments within containers for example can leverage snapshots and clones allowing for near-instant recoveries in a disaster recovery scenario.

Complex operations, such as hardware and database scaling, encompass many sub-activities and can often involve working with several teams at the same time. In this case, maintaining reference documentation is critical. Activities such as these are best tracked and planned in a Kanban, or Scrum, environment.

In regards to Disaster Recovery (DR) consider automating the following operations:

  • Recovery via Logical backups
  • Failover of a PRIMARY to a REPLICA
  • Dropping/Building a new REPLICA
  • Point In Time Recovery (PITR): rebuilding a cluster to a point in time

As an aside to PITR: instead of rebuilding an entire data cluster from scratch to a particular point in time, one can instead create a STANDBY host that is replicated on a delay and can be recovered to a particular point in time or promoted in its current state. Refer to run-time parameter recovery_min_apply_delay for more information.
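As a sketch, such a delayed STANDBY needs only one extra setting on the replica side (the delay value below is illustrative, not a recommendation):

```ini
# postgresql.conf on the delayed standby (illustrative value)
recovery_min_apply_delay = '4h'   # replay each WAL record no sooner than 4 hours after the primary committed it
```

With the replica held four hours behind, an accidental DROP TABLE on the primary leaves a four-hour window in which the standby can be paused and recovered or promoted before the mistake replays.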

In conclusion, while small databases can be managed by administrating in an ad hoc manner, the administration of a large database must always be performed using a more rigorous and conscientious approach. And what you learn from administering a large database can be carried over to administering a small one.



Hidden Cost of Foreign Key Constraints in MySQL

Do you wonder if MySQL tells you the truth about writes to tables that have foreign key constraints? The situation is complex, and getting visibility into what is really happening can be a problem.

I found this issue intriguing and decided to share and highlight some examples.

Query Plan

Let us take this example table:

CREATE TABLE `product` (
  `category` int NOT NULL,
  `id` int NOT NULL,
  `price` decimal(10,0) DEFAULT NULL,
  PRIMARY KEY (`category`,`id`)
);

We want to know how costly an example UPDATE against this table will be:

mysql > EXPLAIN update product set id=id+1 where id=65032158 and category=3741760\G
*************************** 1. row ***************************
           id: 1
  select_type: UPDATE
        table: product
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: const,const
         rows: 1
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

This shows that just one table and one row will be considered. This looks right as we’re using a primary key lookup, and one row matches:

mysql > select * from product where id=65032158;
| category | id       | price |
|  3741760 | 65032158 |     2 |
1 row in set (0.02 sec)

In our case, this was not the entire truth, as the table has FK relationships; querying information_schema.KEY_COLUMN_USAGE for constraints referencing it shows:

*************************** 1. row ***************************
            TABLE_NAME: product_order
           COLUMN_NAME: product_category
       CONSTRAINT_NAME: product_order_ibfk_1
*************************** 2. row ***************************
            TABLE_NAME: product_order
           COLUMN_NAME: product_id
       CONSTRAINT_NAME: product_order_ibfk_1
2 rows in set (0.01 sec)

The related table has a defined ON UPDATE CASCADE action linked to our table:

CREATE TABLE `product_order` (
  `no` int NOT NULL AUTO_INCREMENT,
  `product_category` int NOT NULL,
  `product_id` int NOT NULL,
  `customer_id` int NOT NULL,
  PRIMARY KEY (`no`),
  KEY `product_category` (`product_category`,`product_id`),
  KEY `customer_id` (`customer_id`),
  CONSTRAINT `product_order_ibfk_1` FOREIGN KEY (`product_category`, `product_id`) REFERENCES `product` (`category`, `id`) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT `product_order_ibfk_2` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
);

Yet the EXPLAIN plan completely fails to recognize this fact, and tries to convince us that such an update will only change one row in our database.

Another typical method to analyze slow queries is checking the per-session status handlers. In this case, it looks like this:

mysql > flush status; update product set id=id+1 where id=65032158 and category=3741760\G
Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql > show status like 'handler%';
| Variable_name              | Value |
| Handler_commit             | 2     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 2     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 1     |
| Handler_write              | 0     |
18 rows in set (0.01 sec)

Handler_update, as well as the rows-changed information in the query output, does not take changes to the referencing product_order table into account either. Without additional checks, we don’t even know that additional work has been done!


Let’s see how foreign constraints can impact monitoring database activities.

We already know monitoring Handler_update won’t work as expected. Let’s check the InnoDB engine-related counter (on an idle MySQL instance as this is a global only counter):

mysql > show status like 'Innodb_rows_updated';
| Variable_name       | Value |
| Innodb_rows_updated | 21369 |
1 row in set (0.00 sec)

Here is our UPDATE and how many data rows it really changes:

mysql > select product_id,count(*) from product_order where product_id in (65032159,65032160) group by product_id;
| product_id | count(*) |
|   65032159 |      897 |
1 row in set (0.02 sec)

mysql > update product set id=id+1 where id=65032159 and category=3741760\G
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql > select product_id,count(*) from product_order where product_id in (65032159,65032160) group by product_id;
| product_id | count(*) |
|   65032160 |      897 |
1 row in set (0.01 sec)

mysql > show status like 'Innodb_rows_updated';
| Variable_name       | Value |
| Innodb_rows_updated | 22267 |
1 row in set (0.00 sec)

mysql > select 22267-21369;
| 22267-21369 |
|         898 |
1 row in set (0.00 sec)

This confirms that InnoDB exposes the real number of rows updated correctly here.

Child table locks are also exposed for active transactions with SHOW ENGINE INNODB STATUS (after enabling innodb_status_output_locks).

What about Performance Schema, another popular method to monitor the database? 

mysql > truncate `performance_schema`.`table_io_waits_summary_by_table`;
Query OK, 0 rows affected (0.00 sec)

mysql > select OBJECT_NAME,COUNT_UPDATE,COUNT_WRITE,COUNT_READ,COUNT_DELETE,COUNT_INSERT from `performance_schema`.`table_io_waits_summary_by_table` where OBJECT_NAME like "prod%";
| product       |            0 |           0 |          0 |            0 |            0 |
| product_order |            0 |           0 |          0 |            0 |            0 |
2 rows in set (0.00 sec)

mysql > update product set id=id+1 where id=65032159 and category=3741760\G
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql > select OBJECT_NAME,COUNT_UPDATE,COUNT_WRITE,COUNT_READ,COUNT_DELETE,COUNT_INSERT from `performance_schema`.`table_io_waits_summary_by_table` where OBJECT_NAME like "prod%";
| product       |            1 |           1 |          1 |            0 |            0 |
| product_order |            0 |           0 |          0 |            0 |            0 |
2 rows in set (0.00 sec)

mysql > select product_id,count(*) from product_order where product_id in (65032159,65032160) group by product_id;
| product_id | count(*) |
|   65032160 |      897 |
1 row in set (0.02 sec)

mysql > select OBJECT_NAME,COUNT_UPDATE,COUNT_WRITE,COUNT_READ,COUNT_DELETE,COUNT_INSERT from `performance_schema`.`table_io_waits_summary_by_table` where OBJECT_NAME like "prod%";
| product       |            1 |           1 |          1 |            0 |            0 |
| product_order |            0 |           0 |      54028 |            0 |            0 |
2 rows in set (0.00 sec)

Unfortunately, Performance Schema completely missed what happened due to the foreign key constraint! I think this is a serious issue, so I reported it here:


I hope I was able to draw your attention to the need for caution when investigating DML queries and system load when foreign key constraints are used! Were you ever surprised that a simple single-row update or delete took so much time? It could be that, under the hood, MySQL changed thousands of rows and hid this fact from you!


MySQL Shell For VS Code – Your New GUI?

MySQL Shell for VS Code integrates the MySQL Shell directly into VS Code development workflows and was released last week. This extension to the popular VS Code platform enables interactive editing and execution of SQL for MySQL databases, and optionally the MySQL Database Service, across several simultaneous sessions. It is a preview release and not ready for production, but it has several features that may make it the MySQL GUI of choice.


The installation itself is easy, but you will need to download the code from here and not the usual places for MySQL products. You will, of course, have to have VS Code installed first, and be warned that some of the more tantalizing links, for things like documentation, are not yet connected.

MySQL Shell for VS Code installation screen and yes, you will need VS Code installed first.


The interface is similar to that of MySQL Workbench but seems more intuitive. You will need to set up a connection to your server with all the usual host, user, and password information. From there you can create a session to that server.

The big change is remembering to use Control-ENTER to send commands to the MySQL instance. The GUI allows easy SQL query editing, but be warned that there are problems with the preview. A familiar query to the world database did not work with the GUI but ran perfectly with the stand-alone version of MySQL Shell. Other queries worked well.

The first query was not correct. The second query went a little better.

For some reason, the first query did not get sent correctly to the server.

The correct answer to the first query.

Quick Take

This is an impressive product even though it is a preview. It is far better than the old MySQL CLI, and the GUI editing makes the very good MySQL Shell features extremely useful.


Migrating to utf8mb4: Things to Consider

The utf8mb4 character set is the new default as of MySQL 8.0, and this change neither affects existing data nor forces any upgrades.

Migration to utf8mb4 has many advantages including:

  • It can store more symbols, including emojis
  • It has new collations for Asian languages
  • It is faster than utf8mb3

Still, you may wonder how migration affects your existing data. This blog covers multiple aspects of it.

Storage Requirements

As the name suggests, the maximum number of bytes one character can take in the utf8mb4 character set is four. This is larger than utf8mb3, which takes at most three bytes per character, and than many other MySQL character sets.

Fortunately, utf8mb3 is a subset of utf8mb4, and migrating existing data does not increase its size on disk: each character takes only as many bytes as it needs. For example, any digit or letter in the Latin alphabet requires one byte, while characters from other alphabets can take up to four bytes. This can be verified with a simple test.

mysql > set names utf8mb4;
Query OK, 0 rows affected (0,00 sec)

mysql > CREATE TABLE charset_len( name VARCHAR(255), val CHAR(1) ) CHARACTER SET=utf8mb4;
Query OK, 0 rows affected (0,03 sec)

mysql > INSERT INTO charset_len VALUES('Latin A', 'A'),  ('Cyrillic А', 'А'), ('Korean ㉿', '㉿'), ('Dolphin 🐬', '🐬');
Query OK, 4 rows affected (0,02 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql > SELECT name, val, HEX(val), BIT_LENGTH(val)/8 FROM charset_len;
| name         | val  | HEX(val) | BIT_LENGTH(val)/8 |
| Latin A      | A    | 41       |            1.0000 |
| Cyrillic А   | А    | D090     |            2.0000 |
| Korean ㉿    | ㉿   | E389BF   |            3.0000 |
| Dolphin 🐬   | 🐬   | F09F90AC |            4.0000 |
4 rows in set (0,00 sec)

As a result, all your existing data that uses a maximum of three bytes per character will not change, and you will additionally be able to store characters that require four-byte encoding.
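These byte counts can be cross-checked outside MySQL, since utf8mb4 is standard UTF-8. A small Python sketch using the same characters as the table above:

```python
# utf8mb4 is standard UTF-8, so Python's encoder reports the same
# per-character byte counts and hex sequences as MySQL's HEX(val)
samples = {
    "Latin A": "A",       # 1 byte
    "Cyrillic А": "А",    # 2 bytes (U+0410)
    "Korean ㉿": "㉿",     # 3 bytes (U+327F)
    "Dolphin 🐬": "🐬",    # 4 bytes (U+1F42C)
}

for name, ch in samples.items():
    encoded = ch.encode("utf-8")
    print(f"{name}: {encoded.hex().upper()} -> {len(encoded)} byte(s)")
```

Running this prints `41 -> 1`, `D090 -> 2`, `E389BF -> 3`, and `F09F90AC -> 4`, matching the MySQL output.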

Maximum Length of the Column

While the data storage does not change, when MySQL calculates the maximum amount of data that the column can store, it may fail for some column size definitions that work fine for utf8mb3. For example, you can have a table with this definition:

mysql> CREATE TABLE len_test(
    -> foo VARCHAR(16384)
    -> ) ENGINE=InnoDB CHARACTER SET utf8mb3;
Query OK, 0 rows affected, 1 warning (0,06 sec)

If you decide to convert this table to use the utf8mb4 character set, the operation will fail:

mysql> ALTER TABLE len_test CONVERT TO CHARACTER SET utf8mb4;
ERROR 1074 (42000): Column length too big for column 'foo' (max = 16383); use BLOB or TEXT instead

The reason is that the maximum number of bytes MySQL can store in a VARCHAR column is 65,535, which corresponds to 21845 characters in the utf8mb3 character set but only 16383 characters in the utf8mb4 character set.

Therefore, if you have columns that could contain more than 16383 characters, you will need to convert them to the TEXT or LONGTEXT data type.
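The limits quoted above follow directly from dividing the 65,535-byte VARCHAR ceiling by the per-character maximum of each character set; a quick sanity check:

```python
# Maximum number of bytes MySQL can store in a VARCHAR column
MAX_VARCHAR_BYTES = 65535

# Maximum characters per VARCHAR column for each character set:
# utf8mb3 reserves up to 3 bytes per character, utf8mb4 up to 4
max_chars_utf8mb3 = MAX_VARCHAR_BYTES // 3
max_chars_utf8mb4 = MAX_VARCHAR_BYTES // 4

print(max_chars_utf8mb3)  # 21845
print(max_chars_utf8mb4)  # 16383 -- matches the limit in ERROR 1074 above
```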

You can find all such columns if you run the query:

SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM information_schema.columns
WHERE CHARACTER_MAXIMUM_LENGTH > 16383 AND
      DATA_TYPE NOT LIKE '%text%' AND 
      DATA_TYPE NOT LIKE '%blob%' AND
      TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema')\G

For example, in my test environment, it returns:

*************************** 1. row ***************************
DATA_TYPE: varchar
1 row in set (0,02 sec)


Index Storage Requirement

MySQL does not know in advance which characters you will store in a column when you create an index on it. Therefore, when it calculates the storage required for the index, it assumes the maximum byte width of the chosen character set. As a result, you may hit the index storage limit when converting from another character set to utf8mb4. For InnoDB, the maximum size of the index key is 767 bytes for the REDUNDANT and COMPACT row formats, and 3072 bytes for the DYNAMIC and COMPRESSED row formats. See the MySQL Reference Manual for details.
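Because InnoDB reserves four bytes per utf8mb4 character when sizing an index key, each byte limit translates into a character limit. A short illustration of the arithmetic:

```python
# InnoDB index key size limits, in bytes
LIMIT_COMPACT = 767    # REDUNDANT and COMPACT row formats
LIMIT_DYNAMIC = 3072   # DYNAMIC and COMPRESSED row formats

# utf8mb4 reserves 4 bytes per character when the index size is calculated
def max_indexed_chars(limit_bytes, bytes_per_char=4):
    return limit_bytes // bytes_per_char

print(max_indexed_chars(LIMIT_COMPACT))  # 191 characters
print(max_indexed_chars(LIMIT_DYNAMIC))  # 768 characters
```

This is why indexed utf8mb4 columns wider than VARCHAR(191) fail on the older row formats, while the DYNAMIC format allows up to 768 characters.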

That means you need to check if you have indexes that could grow to exceed these values before performing the update. You can do this with the following query:

WITH indexes AS (
     WITH tables AS  (
          SELECT SUBSTRING_INDEX(t.NAME, '/', 1) AS `database`, SUBSTRING_INDEX(t.NAME, '/', -1) AS `table`, i.NAME AS `index`, ROW_FORMAT
          FROM information_schema.INNODB_INDEXES i JOIN information_schema.INNODB_TABLES t USING(TABLE_ID)
     )
     SELECT `database`, `table`, `index`, ROW_FORMAT, GROUP_CONCAT(kcu.COLUMN_NAME) AS columns,
            SUM(c.CHARACTER_MAXIMUM_LENGTH) * 4 AS index_len_bytes
     FROM tables JOIN information_schema.KEY_COLUMN_USAGE kcu
          ON (`database` = kcu.TABLE_SCHEMA AND `table` = kcu.TABLE_NAME AND `index` = kcu.CONSTRAINT_NAME)
          JOIN information_schema.COLUMNS c ON (kcu.COLUMN_NAME = c.COLUMN_NAME AND `database` = c.TABLE_SCHEMA AND `table` = c.TABLE_NAME)
     GROUP BY `database`, `table`, `index`, ROW_FORMAT ORDER BY index_len_bytes
) SELECT * FROM indexes WHERE index_len_bytes >= 768;

Here is the result of running the query in my test environment:

+----------+--------------+---------+------------+------------+-----------------+
| database | table        | index   | ROW_FORMAT | columns    | index_len_bytes |
+----------+--------------+---------+------------+------------+-----------------+
| cookbook | hitcount     | PRIMARY | Dynamic    | path       |            1020 |
| cookbook | phrase       | PRIMARY | Dynamic    | phrase_val |            1020 |
| cookbook | ruby_session | PRIMARY | Dynamic    | session_id |            1020 |
+----------+--------------+---------+------------+------------+-----------------+
3 rows in set (0,04 sec)

Once you have identified such indexes, check the columns and adjust the table definition accordingly.

Note: The query uses CTEs (common table expressions), available as of MySQL 8.0. If you are still on version 5.7 or earlier, you will need to rewrite the query.

Temporary Tables

One more issue you can hit after converting to the utf8mb4 character set is an increased size of the implicit temporary tables that MySQL creates to resolve queries. Since utf8mb4 may store more data than other character sets, the columns of such implicit tables will also be bigger. To figure out whether you are affected by this issue, watch the global status variable Created_tmp_disk_tables. If it starts increasing significantly after the migration, you may consider adding RAM to your machine and increasing the maximum size of in-memory temporary tables (tmp_table_size and max_heap_table_size). Note that this issue could also be a symptom that some of your queries are poorly optimized.


Converting to the utf8mb4 character set brings you the advantages of better performance and a larger range of characters that you can use, including emojis and new collations (sorting rules). This conversion comes at almost no price, and it can be done smoothly, provided that:


  • You converted all VARCHAR columns that could store more than 16383 characters to the TEXT or LONGTEXT data type
  • You adjusted index definitions that could take more than 767 bytes for the REDUNDANT and COMPACT row formats, or 3072 bytes for the DYNAMIC and COMPRESSED row formats, after migration
  • You optimized your queries so that they should not start using internal disk-based temporary tables
