Apr
12
2021
--

Replay the Execution of MySQL With RR (Record and Replay)

Chasing bugs can be a tedious task, and multi-threaded software doesn’t make it any easier. Threads are scheduled at different times, instructions do not have deterministic results, and reproducing a particular issue may require the exact same threads doing the exact same work at the exact same time. As you can imagine, this is not straightforward.

Let’s say your database is crashing or even having a transient stall.  By the time you get to it, the crash has happened and you are stuck restoring service quickly and doing after-the-fact forensics.  Wouldn’t it be nice to replay the work from right before or during the crash and see exactly what was happening?

Record and Replay is a technique where we record the execution of a program so that it can be replayed over and over, producing the same result. Engineers at Mozilla created RR, an open source tool that allows you to record the execution of software and replay it under the well-known GDB.

A Backup Problem

To demonstrate how powerful the tool is, we will be walking through how we used it to narrow down the issue from PXB-2180 (Special thanks to Satya Bodapati, who helped with all the InnoDB internals research for this bug). 

In summary, we were seeing Percona XtraBackup crashing at the prepare stage (remember, always test your backup!). The crash was happening randomly, sometimes after the second incremental, sometimes after the 10th incremental, with no visible pattern.

The stack trace was also not always the same. It was crashing in different parts of InnoDB, but we had one commonality across all crashes – it always happened while trying to apply a redo log record to a block with the same page number and space id:

#12 0x00000000015ad05f in recv_parse_or_apply_log_rec_body (type=MLOG_COMP_REC_INSERT, ptr=0x7f2849150556 "\003K4G", '\377' <repeats 13 times>, end_ptr=0x7f2849150573 "", space_id=<optimized out>, page_no=<optimized out>, block=0x7f2847d7da00, mtr=0x7f286857b4f0, parsed_bytes=18446744073709551615) at /home/marcelo.altmann/percona-xtrabackup/storage/innobase/log/log0recv.cc:2002
2002         ptr = page_cur_parse_insert_rec(FALSE, ptr, end_ptr, block, index, mtr);
(gdb) p block->page->id
+p block->page->id
$3 = {
  m_space = 4294967294,
  m_page_no = 5
}

Our suspicion was that the page layout on this block diverged between MySQL and XtraBackup. When working with these types of bugs, the crash is always the consequence of something that happened earlier, e.g., a crash on the sixth incremental backup could be the consequence of an issue that happened on the fourth incremental.

The main goal at this step is to prove that and identify where the page layout diverged.

With this information, we ran MySQL under RR and reran the backup until we saw the same issue at prepare. We can now replay the MySQL execution and check how it compares. Our idea is to:

  1. Read the LSNs for this same page before/after each backup prepare.
  2. Identify all changes to  m_space = 4294967294 & m_page_no = 5 at mysqld.
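
For reference, recording the server in the first place is the simple part: rr wraps the mysqld process and writes a trace that can be replayed later. A minimal sketch of such an invocation (the binary path, datadir, and options are placeholders, not the exact setup used for this bug):

rr record /usr/sbin/mysqld --datadir=/var/lib/mysql --socket=/tmp/mysql.sock
# The trace is stored under rr's trace directory (by default ~/.local/share/rr)
# and can later be replayed, as many times as needed, with:
rr replay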

Before we progress further, let’s explain a few things:

  1. m_space = 4294967294 corresponds to the MySQL data dictionary (mysql.ibd) – dict0dict.h:1146
  2. On a disk page, the LSN is stored at the 16th byte of the page and has a size of 8 bytes – fil0types.h:66
  3. Pages are written sequentially to disk; for example, with the default 16k page size, bytes 1 to 16384 hold the data for page 0, bytes 16385 to 32768 hold page 1, and so on.
  4. A frame is the raw data of a page – buf0buf.h:1358

Replaying the Execution

To start, let’s read what LSN we have on mysql.ibd for page five before the backup. We will be using od (check man od for more information) and the information explained above:

$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 10 21 85
0240030
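
Since we will repeat this check after every prepare, it can be handy to wrap the offset arithmetic in a small helper. This is just a convenience sketch, assuming the default 16k page size and the LSN position described above:

page_lsn() {  # usage: page_lsn <ibd file> <page number>
  local file=$1 page=$2 page_size=16384
  # -An suppresses the offset column, leaving only the 8 LSN bytes
  od -An -j $(( page_size * page + 16 )) -N 8 -t x1 "$file"
}

page_lsn full/mysql.ibd 5
#  00 00 00 00 01 10 21 85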

And check if it matches an LSN stamp from mysqld. For that we will add a conditional breakpoint on the replay execution of MySQL at function buf_flush_note_modification:

$ rr replay .
. . .
(rr) b buf_flush_note_modification if block->page->id->m_space == 4294967294 && block->page->id->m_page_no == 5
+b buf_flush_note_modification if block->page->id->m_space == 4294967294 && block->page->id->m_page_no == 5
Breakpoint 1 at 0x495beb1: file /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic, line 69.
(rr) c
[Switching to Thread 18839.18868]

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=17892965, end_lsn=17893015, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$1 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x10,
  [0x6] = 0x21,
  [0x7] = 0x85}
(rr)

We can see that the LSN stamp from before preparing the full backup matches the first stamp from the replay session. Time to prepare the backup, advance the replay execution, and recheck:

xtrabackup --prepare --apply-log-only --target-dir=full/
. . .
Shutdown completed; log sequence number 17897577
Number of pools: 1
210402 17:46:29 completed OK!


$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 11 07 06
0240030


(rr) c
+c
Continuing.
[Switching to Thread 18839.18868]

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=19077332, end_lsn=19077382, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$16 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x11,
  [0x6] = 0x7,
  [0x7] = 0x6}
(rr)

The LSN stamp is the same on both the server and the backup. Time to move on and start applying the incrementals:

xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc1/
. . .
Shutdown completed; log sequence number 19082430
. . .
210402 18:12:20 completed OK!


$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 23 19 06
0240030


(rr) c
+c
Continuing.
Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=20262758, end_lsn=20262808, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$17 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x23,
  [0x6] = 0x19,
  [0x7] = 0x6}
(rr)

Once again, we have a matching LSN stamp on both sides. Moving to the next incremental:

xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc2/
. . .
Shutdown completed; log sequence number 20269669
. . .
210402 18:15:04 completed OK!


$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 35 2f 98
0240030


(rr) c
+c
Continuing.

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=21449997, end_lsn=21450047, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$18 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x35,
  [0x6] = 0x2f,
  [0x7] = 0x98}
(rr)

Incremental two applied and matching LSN stamp from mysqld. Let’s keep doing this until we find a mismatch:

xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc3/
. . .
Shutdown completed; log sequence number 21455916
. . .
210402 18:18:25 completed OK!


$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 47 4d 3f
0240030


(rr) c
+c
Continuing.

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=25529471, end_lsn=25529521, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$19 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x47,
  [0x6] = 0x4d,
  [0x7] = 0x3f}
(rr)


xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc4/
. . .
Shutdown completed; log sequence number 23044902
. . .
210402 18:24:00 completed OK!

$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 5f a3 26
0240030


(rr) c
+c
Continuing.

Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=27218464, end_lsn=27218532, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69
69     ut_ad(!srv_read_only_mode ||
++rr-set-suppress-run-hook 1
(rr) p/x block->frame[16]@8
+p/x block->frame[16]@8
$242 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x9f,
  [0x6] = 0x3f,
  [0x7] = 0xc9}
(rr)

OK, here we have something. The backup files jumped from 0x01474d3f to 0x015fa326 when applying incremental four, while the server moved from 0x01474d3f to 0x019f3fc9. Perhaps we missed some other place where the LSN stamp of a page can be updated? But now we are at a point in the future of our replay execution of the MySQL server.

Replaying the Execution Backward

Here is (yet) another very cool feature of RR: it allows you to replay the execution backward. To eliminate the possibility of missing a place that also updates the LSN of this block, let’s add a hardware watchpoint on the block->frame memory address and reverse the execution:

(rr) p block->frame
+p block->frame
$243 = (unsigned char *) 0x7fd2e0758000 "\327\064X["
(rr) watch *(unsigned char *) 0x7fd2e0758000
+watch *(unsigned char *) 0x7fd2e0758000
Hardware watchpoint 2: *(unsigned char *) 0x7fd2e0758000
(rr) disa 1
+disa 1
(rr) reverse-cont
+reverse-cont
+continue
Continuing.
Hardware watchpoint 2: *(unsigned char *) 0x7fd2e0758000

Old value = 215 '\327'
New value = 80 'P'

0x0000000004c13903 in mach_write_to_4 (b=0x7fd2e0758000 "P\257\"\347", n=3610531931) at /home/marcelo.altmann/percona-server/storage/innobase/include/mach0data.ic:135
135   b[0] = static_cast<byte>(n >> 24);
++rr-set-suppress-run-hook 1
++rr-set-suppress-run-hook 1
(rr) p/x buf_flush_init_for_writing::block->frame[16]@8
+p/x buf_flush_init_for_writing::block->frame[16]@8
$11 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x9f,
  [0x6] = 0x3f,
  [0x7] = 0xc9}
(rr) reverse-cont
+reverse-cont
+continue
Continuing.
Hardware watchpoint 2: *(unsigned char *) 0x7fd2e0758000

Old value = 80 'P'
New value = 43 '+'
0x0000000004c13903 in mach_write_to_4 (b=0x7fd2e0758000 "+k*\304", n=1353655015) at /home/marcelo.altmann/percona-server/storage/innobase/include/mach0data.ic:135
135   b[0] = static_cast<byte>(n >> 24);
++rr-set-suppress-run-hook 1
++rr-set-suppress-run-hook 1
(rr) p/x buf_flush_init_for_writing::block->frame[16]@8
+p/x buf_flush_init_for_writing::block->frame[16]@8
$12 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x1,
  [0x5] = 0x47,
  [0x6] = 0x4d,
  [0x7] = 0x3f}
(rr)

By replaying the execution backward, we can see that the server indeed changed the LSN from 0x01474d3f to 0x019f3fc9. This confirms the issue is in incremental backup four, as the LSN 0x015fa326 we see at the end of incremental four was never a valid LSN in the server execution.

Root Cause

Now that we have limited the scope from six backups to a single one, things will become easier.

If we look closely at the log messages from the --prepare of the backup, we can see that the LSN stamp on the mysql.ibd page matches the log sequence number reported at the end of the backup prepare:

xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc4/
. . .
Shutdown completed; log sequence number 23044902
. . .
210402 18:24:00 completed OK!

$ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd
0240020 00 00 00 00 01 5f a3 26
0240030


$ echo $(( 16#015fa326 ))
23044902

By checking the stack trace of the issue and examining the parsed block further, we can see that this is the innodb_dynamic_metadata index:

(gdb) f 13
+f 13
#13 0x00000000015af3dd in recv_recover_page_func (just_read_in=just_read_in@entry=true, block=block@entry=0x7f59efd7da00) at /home/marcelo.altmann/percona-xtrabackup/storage/innobase/log/log0recv.cc:2624
2624       recv_parse_or_apply_log_rec_body(recv->type, buf, buf + recv->len,
(gdb) p/x block->frame[66]@8
+p/x block->frame[66]@8
$4 =   {[0x0] = 0x0,
  [0x1] = 0x0,
  [0x2] = 0x0,
  [0x3] = 0x0,
  [0x4] = 0x0,
  [0x5] = 0x0,
  [0x6] = 0x0,
  [0x7] = 0x2}

You might be wondering where 66 came from; it is the position FIL_PAGE_DATA + PAGE_INDEX_ID. That gave us index ID 2, which is below 1024, the range reserved for Data Dictionary tables. By checking which is the second table on that list, we can see that it’s innodb_dynamic_metadata. With all this information summed up, we can look at what the server does at shutdown, and it becomes clear what the issue is:
srv0start.cc:3965

/** Shut down the InnoDB database. */
void srv_shutdown() {
  . . .

  /* Write dynamic metadata to DD buffer table. */
  dict_persist_to_dd_table_buffer();
. . .
}

As part of the shutdown process during prepare, we were persisting dirty metadata back to the DD buffer table (innodb_dynamic_metadata), which is wrong. Those changes will likely be persisted and redo logged by the server itself once it performs a checkpoint. Also, more data may have been merged together between the point when the backup was taken and the point when the server itself persists this data to the DD tables. This is a result of the implementation of WL#7816 and WL#6204, which required Percona XtraBackup to change how it handles these types of redo records.

Summary

In this blog, we walked through the process of analyzing a real Percona XtraBackup bug. This bug exposes a challenge we face in various types of bugs, where the crash/malfunction is a consequence of something that happened way before, and by the time we have a stack trace/coredump, it is too late to perform a proper analysis. Record and Replay enabled us to consistently replay the execution of the source server, making it possible to narrow down the issue to where the root cause was. 


Percona XtraBackup is a free, open source, complete database backup solution for all versions of Percona Server for MySQL and MySQL.

Mar
30
2021
--

How To Automate Dashboard Importing in Percona Monitoring and Management

In this blog post, I’ll look at how to import custom dashboards into Percona Monitoring and Management (PMM) 2.x and give some tips on how to automate it.

The number of dashboards in PMM2 is constantly growing. For example, we recently added a new HAProxy dashboard to the latest 2.15.0 release. Even though the PMM server has more than fifty dashboards, it’s not possible to cover all common server applications.

The greatest source of dashboards is the official Grafana site. Here, anyone can share their own dashboards with the community or find already uploaded ones. Percona has its own account and publishes as-yet-unreleased or unique (non-PMM) dashboards.

Each dashboard has its own number which can be used to refer to it. For example, 12630 is assigned to the dashboard “MySQL Query Performance Troubleshooting”.

You can download dashboards as JSON files and import them into your PMM2 installation using the UI.

This is easy, but it overlooks the fact that publishers can update their dashboards with new revisions. So a dashboard may have received a bunch of useful changes after you downloaded it, yet you keep using the old version.

So the only way to use the latest dashboard version is to check the site from time to time. It can really be a pain in the neck, especially if you have to track more than one dashboard.

This is why it’s time to take a look at automation. Grafana has a very powerful API that I used to create this shell script. Let’s take a peek at it. It’s based on the api/dashboards/import API function. The function requires a POST request with a dashboard body.

The first step is to download a dashboard.

curl -s https://grafana.com/api/dashboards/12630/revisions/1/download --output 12630_rev1.json

Note how I used dashboard number 12630 and revision 1 in the command. By increasing the revision number I can find out the latest available dashboard version. This is exactly the approach used in the script.
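
A hedged sketch of that approach: keep probing the next revision until grafana.com stops returning a file. The dashboard ID is the one used above; the loop itself is illustrative, not the exact code from the script.

id=12630
rev=1
# -f makes curl return a non-zero exit code on HTTP errors such as 404,
# so the loop stops at the last revision that actually exists.
while curl -sf "https://grafana.com/api/dashboards/${id}/revisions/$((rev + 1))/download" -o /dev/null; do
  rev=$((rev + 1))
done
echo "latest revision of dashboard ${id}: ${rev}"
curl -s "https://grafana.com/api/dashboards/${id}/revisions/${rev}/download" --output "${id}_rev${rev}.json"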

In the next example, I’ll use a dashboard from our dashboard repository. (I will explain why later.)

curl -L -k https://github.com/percona/grafana-dashboards/raw/PMM-2.0/dashboards/Disk_Details.json --output Disk_Details.json

Now I have a file and can form a POST request to import the dashboard into a PMM installation.

$ curl -s -k -X POST -H "Content-Type: application/json" -d "{\"dashboard\":$(cat Disk_Details.json),\"overwrite\":true}" -u admin:admin https://18.218.63.13/graph/api/dashboards/import


The dashboard has been uploaded. If you take a look at the output you may notice the parameter folderId. With this, it’s possible to specify a Grafana folder for my dashboards.

Here is the command for fetching a list of existing folders.

curl -s -k -u admin:admin https://172.20.0.1/graph/api/folders

I now have folder IDs and can use them in the importing command. The folder ID should be specified in the POST request, as shown in the next example.
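
For instance, a request of roughly this shape imports the dashboard into a specific folder; the folder ID 8 is just an illustrative value, while the endpoint and credentials match the earlier example.

curl -s -k -X POST -H "Content-Type: application/json" \
  -d "{\"dashboard\":$(cat Disk_Details.json),\"folderId\":8,\"overwrite\":true}" \
  -u admin:admin https://18.218.63.13/graph/api/dashboards/import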


Now that you are familiar with the API import commands, let’s take a closer look at community dashboards.

Most of them have the parameter “Data Sources”, which means that to import a dashboard you have to specify the data source names assigned by your installation.


This makes it impossible to import downloaded dashboards through the API without modifying them first. If I execute the import command used earlier on the 12630_rev1.json file downloaded from Grafana.com, I will get an error.


So, here’s another script (cleanup-dash.py) that replaces the datasource fields in dashboards and allows the import command to go through. The script takes a dashboard file name as a parameter.
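
Conceptually, the replacement boils down to rewriting every datasource reference in the dashboard JSON to a data source name that exists in your PMM installation. A rough sketch of the idea, using jq 1.6+ rather than the actual cleanup-dash.py code (the data source name "Metrics" is an assumption; check the names in your own Grafana):

jq 'walk(if type == "object" and has("datasource") then .datasource = "Metrics" else . end)' \
  12630_rev1.json > 12630_rev1_clean.json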


The importing script calls cleanup-dash.py automatically if an initial importing attempt was unsuccessful.

Note the parameters of the importing script. Here you should set the details of your PMM installation; dashboards is an array of dashboard IDs that you want to import into PMM2.

#!/bin/bash
dashboards=(13266 12630 12470)
pmm_server="172.20.0.1"
user_pass="admin:admin"
folderName="General"

Now, you should download both scripts and try to import dashboards. Make sure that both scripts are executable and in the same folder. Here are the commands to do it.

curl -LJOs https://github.com/Percona-Lab/pmm-dashboards/raw/master/misc/import-dashboard-grafana-cloud.sh --output import-dashboard-grafana-cloud.sh
curl -LJOs https://github.com/Percona-Lab/pmm-dashboards/raw/master/misc/cleanup-dash.py --output cleanup-dash.py

chmod a+x import-dashboard-grafana-cloud.sh
chmod a+x cleanup-dash.py

./import-dashboard-grafana-cloud.sh

You can next find the imported dashboards in your PMM installation. They were put into the ‘Insight’ folder and can be found by the keyword ‘PMM2’.

By default, the script imports all dashboards designed for PMM2 from the Percona account. Folder names and dashboard IDs can also be specified as parameters for the script.

Here are some usage examples:

  • import-dashboard-grafana-cloud.sh – the default list of dashboards will be uploaded into the General folder
  • import-dashboard-grafana-cloud.sh Insight – the default list of dashboards will be uploaded into the Insight folder
  • import-dashboard-grafana-cloud.sh 13266 12630 12470 – dashboards 13266, 12630, and 12470 will be uploaded into the General folder
  • import-dashboard-grafana-cloud.sh Insight 13266 12630 12470 – dashboards 13266, 12630, and 12470 will be uploaded into the Insight folder

You can define any number of dashboards in the script parameters and run the script periodically to always have the most recent dashboard versions.


Percona Monitoring and Management is free to download and use. Try it today!

Mar
26
2021
--

Overview of MySQL Alternative Storage Engines

For MySQL, the MyISAM and InnoDB storage engines are very popular, and nowadays we mostly use InnoDB for its high reliability and high performance. Apart from those engines, there are also some alternative engines with nice features of their own. In this blog, I am going to explain some of them, listed below.

  • FEDERATED Storage Engine
  • Merge or MRG_MyISAM Engine
  • Blackhole Engine
  • CSV Engine

FEDERATED Storage Engine

Overview:

  • The FEDERATED storage engine allows you to access data on a remote server without replication or cluster technologies.
  • Using FEDERATED tables, you can spread your server load. Queries for a given table are sent over the network to another MySQL instance, so you can scale the database across many MySQL instances without changing the application code.
  • FEDERATED tables are a security concern because you need to store the remote host and user information in the table definition; it can be viewed with the SHOW CREATE TABLE command.
  • Query optimization is limited and JOINs are slow.
  • Bulk transactions may crash the local server.

By default, FEDERATED storage engine support is disabled. To enable it, you need to set “federated = ON” in the MySQL config file and restart the MySQL service.

mysql> select * from information_schema.engines where engine='federated'\G
*************************** 1. row ***************************
      ENGINE: FEDERATED
     SUPPORT: NO
     COMMENT: Federated MySQL storage engine
TRANSACTIONS: NULL
          XA: NULL
  SAVEPOINTS: NULL
1 row in set (0.00 sec)

#vi /etc/my.cnf 
[mysqld]
federated = ON

[root@mass ~]# service mysqld restart
Redirecting to /bin/systemctl restart mysqld.service
[root@mass ~]#

mysql> select * from information_schema.engines where engine='federated'\G
*************************** 1. row ***************************
      ENGINE: FEDERATED
     SUPPORT: YES
     COMMENT: Federated MySQL storage engine
TRANSACTIONS: NO
          XA: NO
  SAVEPOINTS: NO
1 row in set (0.00 sec)

How Does it work?

  • The FEDERATED table needs to be created on the local server, and the corresponding table needs to exist on the remote server.
  • Make sure that the MySQL port is reachable and that you have user access between the local and remote servers (see the grant sketch after this list).
  • Remote tables can be created with the MyISAM or InnoDB storage engine.
  • The FEDERATED table will not store any data; the data is stored on the remote server.
  • Both the local and remote tables should have the same columns and structure.
  • You can execute queries on either the local or the remote server to modify or retrieve the data.
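
A minimal sketch of the access part, run on the remote server. The user name, password, client IP, and table match the example that follows; adjust for your own environment, and keep the privilege list as small as the FEDERATED table actually needs:

mysql <<'SQL'
CREATE USER 'fed'@'172.28.128.16' IDENTIFIED BY 'Fede4!i&1';
GRANT SELECT, INSERT, UPDATE, DELETE ON percona.fed_destination TO 'fed'@'172.28.128.16';
SQL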

Example

I have two servers:

  • 172.28.128.16 (local server)
  • 172.28.128.17 (remote server)

On the local server, I am creating the FEDERATED table:

mysql> create table fed_source(id int, name varchar(16)) engine=federated connection="mysql://fed:Fede4!i&1@172.28.128.17/percona/fed_destination";
Query OK, 0 rows affected (0.02 sec)

mysql> show create table fed_source\G
*************************** 1. row ***************************
       Table: fed_source
Create Table: CREATE TABLE `fed_source` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(16) DEFAULT NULL
) ENGINE=FEDERATED DEFAULT CHARSET=latin1 CONNECTION='mysql://fed:Fede4!i&1@172.28.128.17/percona/fed_destination'
1 row in set (0.00 sec)

Syntax is:

 “connection=mysql://<user>:<password>@<remote_host_ip>/<remote_database>/<remote_table>”

On the remote server, I am creating the table with InnoDB engine:

mysql> create table fed_destination(id int, name varchar(16));
Query OK, 0 rows affected (0.00 sec)

mysql> show create table fed_destination\G
*************************** 1. row ***************************
       Table: fed_destination
Create Table: CREATE TABLE `fed_destination` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(16) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

As I mentioned earlier, the data is physically stored on the remote server; FEDERATED tables do not store the data themselves. In the example below, you can see the data file (.ibd) was created on the remote server, while the local server only has the table structure file (.frm).

Local server:

[root@mass percona]# pwd
/var/lib/mysql/percona
[root@mass percona]# ls -lrth
total 16K
-rw-r-----. 1 mysql mysql 8.4K Mar 19 18:00 fed_source.frm

Remote server:

[root@repl percona]# pwd
/var/lib/mysql/percona
[root@repl percona]# ls -lrth
total 112K
-rw-r-----. 1 mysql mysql 8.4K Mar 19 18:00 fed_destination.frm
-rw-r-----. 1 mysql mysql  96K Mar 19 18:01 fed_destination.ibd

Let’s do this experiment. On the local server, I am inserting the record. 

mysql> insert into fed_source values (1,'herc');
Query OK, 1 row affected (0.00 sec)

mysql> select * from fed_source;
+------+------+
| id   | name |
+------+------+
|    1 | herc |
+------+------+
1 row in set (0.00 sec)

And on the remote server:

mysql> select * from fed_destination;
+------+------+
| id   | name |
+------+------+
|    1 | herc |
+------+------+
1 row in set (0.00 sec)

Now, I am going to update the data on a remote server.

mysql> update fed_destination set name='hercules7sakthi' where name='herc';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from fed_destination;
+------+-----------------+
| id   | name            |
+------+-----------------+
|    1 | hercules7sakthi |
+------+-----------------+
1 row in set (0.00 sec)

At the local server:

mysql> select * from fed_source;
+------+-----------------+
| id   | name            |
+------+-----------------+
|    1 | hercules7sakthi |
+------+-----------------+
1 row in set (0.00 sec)

As you can see, you can execute queries on both the local and remote servers. The FEDERATED engine mainly supports data manipulation statements (INSERT/UPDATE/DELETE/TRUNCATE).

Merge or MRG_MyISAM Engine

Overview:

  • A collection of identical MyISAM tables can be used as a single table for better performance.
  • It is only supported for MyISAM tables.
  • Merge tables use more file descriptors.
  • You can’t perform a FULL TEXT SEARCH on merge tables.
  • Merge tables are used extremely rarely since partitioning came around.

How Does it work?

  • It works only for MyISAM tables.
  • The column order, indexes, and data types should be the same in all the tables.

Example

I have created two tables:

mysql> create table merge_1(id int, name varchar(16)) engine = myisam;
Query OK, 0 rows affected (0.00 sec)

mysql> create table merge_2(id int, name varchar(16)) engine = myisam;
Query OK, 0 rows affected (0.01 sec)

Inserting some data on both tables:

mysql> insert into merge_1 values (1,'herc'),(2,'sakthi'),(3,'sri');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> insert into merge_2 values (4,'jc'),(5,'xxx'),(3,'yyy');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

Now, creating the merge table:

mysql> create table merge_1_and_2 (id int, name varchar(16)) engine = mrg_myisam union=(merge_1,merge_2);
Query OK, 0 rows affected (0.01 sec)

Let’s query the merge table:

mysql> select * from merge_1_and_2;
+------+--------+
| id   | name   |
+------+--------+
|    1 | herc   |
|    2 | sakthi |
|    3 | sri    |
|    4 | jc     |
|    5 | xxx    |
|    3 | yyy    |
+------+--------+
6 rows in set (0.00 sec)

When I query the merge table, it merges both tables (merge_1 and merge_2) and displays the combined result.

Physically, the MERGE table does not store any row data itself (only a tiny .MRG file listing the underlying tables). When querying the table, it just merges the data from the configured tables and displays the result.

[root@mass percona]# ls -lrth
total 72K

-rw-r-----. 1 mysql mysql 8.4K Mar 19 18:44 merge_1.frm
-rw-r-----. 1 mysql mysql 1.9K Mar 19 18:51 merge_1.MYD
-rw-r-----. 1 mysql mysql 1.0K Mar 19 18:51 merge_1.MYI

-rw-r-----. 1 mysql mysql 8.4K Mar 19 18:44 merge_2.frm
-rw-r-----. 1 mysql mysql 1.6K Mar 19 18:51 merge_2.MYD
-rw-r-----. 1 mysql mysql 1.0K Mar 19 18:51 merge_2.MYI

-rw-r-----. 1 mysql mysql 8.4K Mar 19 18:48 merge_1_and_2.frm
-rw-r-----. 1 mysql mysql   16 Mar 19 18:48 merge_1_and_2.MRG

Blackhole Engine

Overview: 

  • The Blackhole engine accepts writes but never stores the data; whenever you query the table, you get an empty result.
  • It can be used for SQL syntax checking purposes.
  • It can be used for replication filtering purposes.
  • You have to be very careful when using such a table in a replication environment, because the statements are still written to the binary log.

Example

Creating the blackhole table:

mysql> create table black_hole (id int, name varchar(16)) engine = blackhole;
Query OK, 0 rows affected (0.00 sec)

Inserting and retrieving the data:

mysql> insert into black_hole values (1,'sri'),(2,'jc');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from black_hole;
Empty set (0.00 sec)

The data, however, is still written to the binary log:

# at 23445
#210319 19:19:15 server id 10  end_log_pos 23497 CRC32 0x36e22a05       Write_rows: table id 115 flags: STMT_END_F
### INSERT INTO `percona`.`black_hole`
### SET
###   @1=1 /* INT meta=0 nullable=1 is_null=0 */
###   @2='sri' /* VARSTRING(16) meta=16 nullable=1 is_null=0 */
### INSERT INTO `percona`.`black_hole`
### SET
###   @1=2 /* INT meta=0 nullable=1 is_null=0 */
###   @2='jc' /* VARSTRING(16) meta=16 nullable=1 is_null=0 */
# at 23497
#210319 19:19:15 server id 10  end_log_pos 23573 CRC32 0x4d79cba4       Query   thread_id=5     exec_time=0     error_code=0
SET TIMESTAMP=1616181555/*!*/;
COMMIT

Syntax Checking Purposes

If you want to check the syntax of SQL statements, you can execute them against blackhole tables, as the engine is not going to do anything with the data.

Replication Filter Purpose

Let’s consider that I have a source-replica setup. At the source, I have created the below table.

mysql> create table test_blackhole(id int, name varchar(16)) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

I don’t want to replicate this table to replica nodes. In this case, I just converted the table to BLACKHOLE engine on the replica node. 

At replica node:

mysql> alter table test_blackhole engine=blackhole;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test_blackhole\G
*************************** 1. row ***************************
       Table: test_blackhole
Create Table: CREATE TABLE `test_blackhole` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(16) DEFAULT NULL
) ENGINE=BLACKHOLE DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Now, at source, I am inserting some records:

mysql> insert into test_blackhole values (1,'aaa');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test_blackhole;
+------+------+
| id   | name |
+------+------+
|    1 | aaa  |
+------+------+
1 row in set (0.00 sec)

On the replica, the data is not available; it was discarded because the table was converted to the Blackhole engine.

mysql> select * from test_blackhole;
Empty set (0.00 sec)

CSV Engine

Overview: 

  • The CSV storage engine stores data in CSV files.
  • If you need the data as a CSV file, you can simply copy the table’s data file instead of exporting it with SELECT ... INTO OUTFILE (see the sketch after this list).
  • It does not support nullable columns.
  • The data is stored as comma-separated values.
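
For example, a quick way to hand the data to another tool is to flush the table and copy its .CSV file. A sketch, assuming the csv_test table created in the example below and a default datadir:

# Make sure the engine has written everything to the .CSV file, then copy it.
mysql percona -e "FLUSH TABLES csv_test;"
cp /var/lib/mysql/percona/csv_test.CSV /tmp/csv_test_export.csv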

Example

Creating the CSV table:

mysql> create table csv_test (id int not null, name varchar(16) not null, cur_time datetime default current_timestamp not null) engine = csv;
Query OK, 0 rows affected (0.00 sec)

Inserting data:

mysql> insert into csv_test (id,name) values (1,'jc'),(2,'sri'),(3,'herc');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from csv_test;
+----+------+---------------------+
| id | name | cur_time            |
+----+------+---------------------+
|  1 | jc   | 2021-03-19 19:40:40 |
|  2 | sri  | 2021-03-19 19:40:40 |
|  3 | herc | 2021-03-19 19:40:40 |
+----+------+---------------------+
3 rows in set (0.00 sec)

Physically, you can see the data is stored as a .CSV file, and you can view the data from the file itself.

[root@mass percona]# ls -lrth | grep -i csv
-rw-r-----. 1 mysql mysql 8.5K Mar 19 19:38 csv_test.frm
-rw-r-----. 1 mysql mysql   90 Mar 19 19:40 csv_test.CSV
-rw-r-----. 1 mysql mysql   35 Mar 19 19:40 csv_test.CSM

[root@mass percona]# cat csv_test.CSV
1,"jc","2021-03-19 19:40:40"
2,"sri","2021-03-19 19:40:40"
3,"herc","2021-03-19 19:40:40"

As you can see, MySQL’s alternative engines have some good features. In my opinion, I would not put them in production without a valid reason, but it is still good to know about these engines and understand their features.


Percona Distribution for MySQL: An enterprise-grade solution for your most critical business applications.

Mar
25
2021
--

The Steps Involved in Creating a Percona Product Release

Have you ever wondered what it takes to complete Percona Server for MySQL (PS), Percona XtraDB Cluster (PXC), and Percona XtraBackup (PXB) releases? 

Let’s step back just a minute and talk about what Percona stands for. We believe we “stand on the shoulders of giants.” This means we respect our upstream sources and work to add value to the base products. Over time, new functionality is added. Much of this value-add was implemented on the 5.7 series and pulled forward to the 8.0 series. Each time we receive an upstream release, we must reapply these features to the release we receive. This process is what we call the Merge Process. There are close to 40 add-on features being maintained with each release. 

The Merge Process

The Merge Process is completed first for Percona Server for MySQL (PS), and then the merged product is used as the basis for merging the release for Percona XtraDB Cluster (PXC).  In addition, Percona XtraBackup (PXB) is based on Oracle’s MySQL. 

The merge process actually applies the Oracle MySQL commits for a received release to the base Percona Server for MySQL (PS) code. Then the validation process is begun.

For each Oracle commit, an Engineer reviews the code, what it will do, and verifies there will be no adverse effect to the base PS code. In some cases, the Oracle code will replace the existing Percona Server for MySQL (PS) code (Percona may have added the same functionality previously or fixed a bug) or could cause conflicts with the existing Percona code. Each situation is then evaluated and resolved. For the last three 8.0 releases, there have been 2000-2500 commits and 50-100 for the 5.7 base in each Oracle release.

Validation and Testing

Once all commits have been applied to the Percona Server for MySQL (PS) previous base, the validation/testing process begins. This process uses automated testing to validate the new codebase. We execute 1500-2000 regression-type test cases in addition to specifically validating bug fixes and new functionality. This process can take anywhere from 5-10 days for a 5.7 merge and between 35-40 days for an 8.0 merge.

Once the Percona Server for MySQL (PS) merge is completed, the new base is used for the Percona XtraDB Cluster (PXC) merge, which combines the new Percona Server for MySQL (PS) base with any Percona XtraDB Cluster (PXC) specific functionality, using the same process as Percona Server for MySQL (PS). In addition, if a Codership (Galera) release is also available, we go through a slightly different process. In the case of Codership, beginning with the 8.0.19 release, there are no more git commits (just a source tarball), and we have to use the release notes to point to the places we need to look for changes. The 5.7 codebase uses Galera 3.0 (where we still use committed work), and the 8.0 codebase uses the 4.x base (currently on 4.6, soon to go to 4.7). This also includes the same level of validation. This process again can take anywhere from 5-10 days for a 5.7 merge and between 35-40 days for an 8.0 merge.

The last of the merge processes is to evaluate and complete processing for Percona XtraBackup (PXB). The first thing we do when Oracle releases a new wave of its product updates is verify whether the most recent Percona XtraBackup (PXB) – the one from the previous wave – is compatible with the new MySQL Server release. If we find compatibility issues, they will be documented and resolved, and be included in the merge process. The Percona XtraBackup (PXB) merge is done in the same way as Percona Server for MySQL (PS) for a small portion of the Oracle MySQL codebase. Percona XtraBackup (PXB) uses functionality related to redo log processing, such as writes/read/parsing and checksum processing for the read/write process. This process is done concurrently with the Percona Server for MySQL (PS) process and is done for both the 2.4 and 8.0 versions of Percona XtraBackup (PXB).  

For all three products, security (CVE) modifications are purposefully not exposed individually. They are included and noted, but you will not find specific commits for them.


We understand that choosing open source software for your business can be a potential minefield. You need to select the best available options, which fully support and adapt to your changing needs. In this white paper, we discuss the key features that make open source software attractive, and why Percona’s software might be the best option for your business.

Download “When is Percona Software the Right Choice?”

Mar
23
2021
--

Is a Session Analyzer a Good Tool to Simulate Real Traffic?

For a long time, we have wanted to reproduce workloads in a non-production environment, and there have been different attempts to achieve that goal (Query Playback is just one of them). But there is another point of view, where you write your own workload to do so.

Both Have Pros and Cons

Reproduce Workload:

Pros:

  • Simple to implement
  • Ready to go

Cons:

  • Need to rebuild the environment each time

Custom Scripts:

Pros:

  • Possible to have a more realistic workload
  • You can reuse the environment
  • You can use Sysbench, which allows you to change several test options, such as increasing the number of threads or limiting throughput

Cons:

  • You need to invest a lot of time to create the scripts to have a realistic workload

Is it Possible to Have the Best of Both Worlds?

My idea is simple: use the slow query log to get a usable template script of the workload. This might sound simple, but it requires defining the steps needed:

  • We need a slow query reader
  • We need to identify each query template
  • We need to keep track of the queries executed by session, as we want to simulate the sessions

Two other important aspects are collecting the data in the queries, so that we can create variables to be filled with valid value-generating functions, and tracking the “variables” that each session repeats across its whole execution.

Slow Query Reader

It is not complex, but we need to split the query into three pieces, so a query like this:

SELECT `id`, `age` FROM `person` WHERE `birthdate` > '2004-01-09'

Will have this query template:

SELECT `id`, `age` FROM `person` WHERE `birthdate` > %s

This data template:

'%s-%s-%s'

And the real data will be:

2004 01 09
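
A minimal sketch of that split for the example above, using grep and GNU sed (backticks are omitted for readability; a real slow query reader has to cope with full slow-log syntax):

query="SELECT id, age FROM person WHERE birthdate > '2004-01-09'"

# query template: replace the quoted literal with a placeholder
echo "$query" | sed -E "s/'[^']*'/%s/g"
# SELECT id, age FROM person WHERE birthdate > %s

# data template: keep the literal's shape, with digits replaced by placeholders
echo "$query" | grep -oE "'[^']*'" | sed -E 's/[0-9]+/%s/g'
# '%s-%s-%s'

# real data: the digits themselves
echo "$query" | grep -oE "'[^']*'" | grep -oE '[0-9]+' | tr '\n' ' '; echo
# 2004 01 09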

Analyzing the Data Per Query

As we are going to have queries that execute the same query template with the same data template, we can collect the real data per position and process it. For instance, if we have all these queries executed:

SELECT `id`, `age` FROM `person` WHERE `birthdate` > '2004-01-24';
SELECT `id`, `age` FROM `person` WHERE `birthdate` > '2004-12-01';
SELECT `id`, `age` FROM `person` WHERE `birthdate` > '2005-05-27';
SELECT `id`, `age` FROM `person` WHERE `birthdate` > '2005-09-13';

We need to extract all these integer values to determine that in column 1 the values will be between 2004 and 2005, in column 2 between 01 and 12, and in column 3 between 01 and 27.

The same analysis should be done to consider:

  • Integers and if they have some specific distribution
  • Alpha or Alphanumerics value and their length
  • Hexadecimal

Analyzing the Workload

As we want to reproduce the real workload, the queries need to be sent in the same order that they were executed by the application and we need to process the variables. For instance, if we have a session executing:

SELECT `id`, `age` FROM `person` WHERE `birthdate` < '2000-01-01';
UPDATE `person` SET `col1`=5050 WHERE `id` = 10;
UPDATE `person` SET `col2`='another value' WHERE `id` = 10;

We need to identify the 10 as an app-level variable for the session, one that could also appear inside other values and strings. With this information, we can develop a session template. We also need to summarize this information, as most of the time applications execute the same queries in the same order but with different data.

Merging the Information Collected

Once we have the functions that generate the values to fill up the query templates and the session templates, we are able to simulate the workload. Sysbench can read the dynamically generated data from files and send it to the database.

What’s Next?

I worked on a tool that does this, but Perl and Bash were not the best choices, so I’m now working on a tool written in C using GLib. Hopefully, it will be published soon.

Mar
22
2021
--

Storing Kubernetes Operator for Percona Server for MongoDB Secrets in Github

More and more companies are adopting GitOps as the way of implementing Continuous Deployment. Its elegant approach, built upon a well-known tool, wins the hearts of engineers. But even if your git repository is private, it’s strongly discouraged to store keys and passwords in unencrypted form.

This blog post will show how easy it is to use GitOps and keep Kubernetes secrets for Percona Kubernetes Operator for Percona Server for MongoDB securely in the repository with Sealed Secrets or Vault Secrets Operator.

Sealed Secrets

Prerequisites:

  • Kubernetes cluster up and running
  • Github repository (optional)

Install Sealed Secrets Controller

Sealed Secrets relies on asymmetric cryptography (which is also used in TLS), where the private key (which in our case is stored in Kubernetes) can decrypt a message encrypted with the public key (which can safely be stored in a public git repository). To make this task easier, Sealed Secrets provides the kubeseal tool, which helps with the encryption of the secrets.

Install the Sealed Secrets controller into your Kubernetes cluster:

kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.15.0/controller.yaml

It will install the controller into the kube-system namespace and provide the Custom Resource Definition sealedsecrets.bitnami.com. All resources in Kubernetes of kind SealedSecret will be handled by this Operator.

Download the kubeseal binary:

wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.15.0/kubeseal-linux-amd64 -O kubeseal
sudo install -m 755 kubeseal /usr/local/bin/kubeseal
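
As a side note, if you want to seal secrets from a machine that has no direct access to the cluster, kubeseal can export the controller's public certificate so it can be used offline. A sketch, with flags as documented upstream for kubeseal (verify against your version):

# Fetch the public key once (this step requires cluster access)...
kubeseal --fetch-cert --controller-namespace kube-system > pub-cert.pem

# ...then seal secrets anywhere using only the certificate.
kubeseal --cert pub-cert.pem --format yaml < secrets.yaml > mongod-secrets.yaml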

Encrypt the Keys

In this example, I intend to store important secrets of the Percona Kubernetes Operator for Percona Server for MongoDB in git along with my manifests that are used to deploy the database.

First, I will seal the secret file with system users, which is used by the MongoDB Operator to manage the database. Normally it is stored in deploy/secrets.yaml.

kubeseal --format yaml < secrets.yaml  > blog-data/sealed-secrets/mongod-secrets.yaml

This command creates a file with encrypted contents; you can see it in the blog-data/sealed-secrets repository here. It is safe to store it publicly, as it can only be decrypted with the private key.

Executing kubectl apply -f blog-data/sealed-secrets/mongod-secrets.yaml does the following:

  1. A sealedsecrets custom resource (CR) is created. You can see it by executing kubectl get sealedsecrets.
  2. The Sealed Secrets Operator receives the event that a new sealedsecrets CR is there and decrypts it with the private key.
  3. Once decrypted, a regular Secrets object is created which can be used as usual.

$ kubectl get sealedsecrets
NAME               AGE
my-secure-secret   20m

$ kubectl get secrets my-secure-secret
NAME               TYPE     DATA   AGE
my-secure-secret   Opaque   10     20m

Next, I will also seal the keys for my S3 bucket that I plan to use to store backups of my MongoDB database:

kubeseal --format yaml < backup-s3.yaml  > blog-data/sealed-secrets/s3-secrets.yaml
kubectl apply -f blog-data/sealed-secrets/s3-secrets.yaml

Vault Secrets Operator

Sealed Secrets is the simplest approach, but it is possible to achieve the same result with HashiCorp Vault and Vault Secrets Operator. It is a more advanced, mature, and feature-rich approach.

Prerequisites:

Vault Secrets Operator also relies on a Custom Resource, but all the keys are stored in HashiCorp Vault.

Preparation

Create a policy on the Vault for the Operator:

cat <<EOF | vault policy write vault-secrets-operator -
path "kvv2/data/*" {
  capabilities = ["read"]
}
EOF

The policy might look a bit different, depending on where your secrets are stored.

Create and fetch the token for the policy:

$ vault token create -period=24h -policy=vault-secrets-operator

Key                  Value                                                                                                                                                                                        
---                  -----                                                                                               
token                s.0yJZfCsjFq75GiVyKiZgYVOm
...

Write down the token, as you will need it in the next step.

Create the Kubernetes Secret so that the Operator can authenticate with the Vault:

export VAULT_TOKEN=s.0yJZfCsjFq75GiVyKiZgYVOm
export VAULT_TOKEN_LEASE_DURATION=86400

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: vault-secrets-operator
type: Opaque
data:
  VAULT_TOKEN: $(echo -n "$VAULT_TOKEN" | base64)
  VAULT_TOKEN_LEASE_DURATION: $(echo -n "$VAULT_TOKEN_LEASE_DURATION" | base64)
EOF

Deploy Vault Secrets Operator

It is recommended to deploy the Operator with Helm, but first we need to create a values.yaml file to configure the Operator.

environmentVars:
  - name: VAULT_TOKEN
    valueFrom:
      secretKeyRef:
        name: vault-secrets-operator
        key: VAULT_TOKEN
  - name: VAULT_TOKEN_LEASE_DURATION
    valueFrom:
      secretKeyRef:
        name: vault-secrets-operator
        key: VAULT_TOKEN_LEASE_DURATION
vault:
  address: "http://vault.vault.svc:8200"

The environment variables point to the Secret that was created in the previous section to authenticate with Vault. We also need to provide the Vault address for the Operator to retrieve the secrets.

Now we can deploy the Vault Secrets Operator:

helm repo add ricoberger https://ricoberger.github.io/helm-charts
helm repo update

helm upgrade --install vault-secrets-operator ricoberger/vault-secrets-operator -f blog-data/sealed-secrets/values.yaml

Give me the Secret

I have a key created in my HashiCorp Vault:

$ vault kv get kvv2/mongod-secret
…
Key                                 Value
---                                 -----                                                                                                                                                                         
MONGODB_BACKUP_PASSWORD             <>
MONGODB_CLUSTER_ADMIN_PASSWORD      <>
MONGODB_CLUSTER_ADMIN_USER          <>
MONGODB_CLUSTER_MONITOR_PASSWORD    <>
MONGODB_CLUSTER_MONITOR_USER        <>                                                                                                                                                               
MONGODB_USER_ADMIN_PASSWORD         <>
MONGODB_USER_ADMIN_USER             <>

It is time to create the secret out of it. First, we will create a Custom Resource object of kind VaultSecret.

$ cat blog-data/sealed-secrets/vs.yaml
apiVersion: ricoberger.de/v1alpha1
kind: VaultSecret
metadata:
  name: my-secure-secret
spec:
  path: kvv2/mongod-secret
  type: Opaque

$ kubectl apply -f blog-data/sealed-secrets/vs.yaml

The Operator will connect to HashiCorp Vault and create a regular Secret object automatically:

$ kubectl get vaultsecret
NAME               SUCCEEDED   REASON    MESSAGE              LAST TRANSITION   AGE
my-secure-secret   True        Created   Secret was created   47m               47m

$ kubectl get secret  my-secure-secret
NAME               TYPE     DATA   AGE
my-secure-secret   Opaque   7      47m

Deploy MongoDB Cluster

Now that the secrets are in place, it is time to deploy the Operator and the DB cluster:

kubectl apply -f blog-data/sealed-secrets/bundle.yaml
kubectl apply -f blog-data/sealed-secrets/cr.yaml

The cluster will be up in a minute or two and will use the secrets we deployed.

By the way, my cr.yaml deploys a MongoDB cluster with two shards. Multiple shards support was added in version 1.7.0 of the Operator – I encourage you to try it out. Learn more about it here: Percona Server for MongoDB Sharding.

Mar
22
2021
--

Want MongoDB Performance? You Will Need to Add and Remove Indexes!

Good intentions can sometimes end up with bad results. Adding indexes boosts performance until it doesn’t. Avoid over-indexing.

The difference between your application being fast, responsive, and scaling properly often comes down to how you use indexes in the database. MongoDB is no different; its performance (and the overall performance of your application) is heavily dependent on having the proper set of indexes on the right things. A simple index or two can speed up getting data from MongoDB a million-fold for million-record collections. But at the same time, having too many indexes on a large collection can lead to massive slowdowns in overall performance. You need to get your indexes just right.

For this blog, we are going to talk about having too many indexes and help you find both duplicate and unused indexes.  If you are interested in finding out if you need additional indexes or if your query is using an index, I would suggest reading previous Percona articles on query tuning (Part 1 & Part 2 of that series).

So, indexes are very good for getting faster queries. How many indexes do I need to create on a collection? What are the best practices for indexes? How do I find which indexes are being used or not? What if I have duplicate indexes?

Common Performance Problems

After analyzing a lot of different MongoDB environments I can provide the following list summarizing the typical errors I have seen:

  • Not creating indexes at all, other than the primary key _id created by design.
    • I’m not joking – I have seen databases without any user-created indexes, whose owners were surprised the server was overloaded and/or the queries were very slow.
  • Over-indexing the collection.
    • Some developers usually create a lot of indexes without a specific reason or just for testing a query. Then they forget to drop them.
    • In some cases, the size of all the indexes was larger than the data. This is not good; indexes should be as small as possible to be really effective.

I’m not considering the first case. I’m going to discuss instead the second one.

How Many Indexes You Need in a Collection

It depends – that’s the right answer. Basically, it depends on your application workload. You should consider the following rules when indexing a collection:

  • Create as many indexes as possible for your application.
  • Don’t create a lot of indexes.

What? These rules are stating the opposite thing! Well, we can summarize in just one simple rule:

  • You need to create all the indexes your application really needs for solving the most frequent queries. Not one more, not one less.

That’s it.

Pros and Cons of Indexing

The big advantage of indexes is that, when they are used, they allow queries, updates, and deletes to run as fast as possible (every update or delete also needs to do a lookup step first). More indexes in a collection can benefit several queries.

Unfortunately, the indexes require some extra work from MongoDB. Any time you run a write, all the indexes must be updated. The new values are stored in or dropped from the B-Tree structure, some splitting or merging is needed, and this requires some time.

The main problem is that the more indexes you have in a collection, the slower all the writes will be.

A very large collection with just 10 or 15 indexes can show a significant performance loss on writes. Also, remember that indexes have to be copied into the WiredTiger cache, so more indexes also mean more pressure on the memory cache. That pressure can then lead to more cache evictions and slowness.

A good example of this: when I was working with a customer a few weeks ago, we found 12 extra indexes on a collection that they did not need. The collection was around 80GB, and the total index size was larger than the data size. They had a relevant write load based on frequent inserts and updates running all the time. Dropping these indexes reduced their write query execution times by 25-30 percent on average. The improvement observed in this real case won’t be the same quantitative amount in other cases, but for sure, the fewer indexes you have, the faster all the writes will be.
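
If you want to see how your own collections compare, the collection stats report both the data size and the per-index sizes. A quick check from the shell might look like this (database and collection names are placeholders):

mongo --quiet mydb --eval '
  var s = db.mycollection.stats();
  print("data size (bytes):        " + s.size);
  print("total index size (bytes): " + s.totalIndexSize);
  printjson(s.indexSizes);
'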

We need to find some kind of balance: create the indexes you need, but not too many.

How to Reduce Over-Indexing

Very easy to say: drop all the indexes you don’t need.

There are two things you can do to identify the indexes to get dropped:

  • Check for the duplicates.
  • Check for the unused indexes.

For dropping an index you need to run something like the following:

db.mycollection.dropIndex("index_name")

Find Duplicate Indexes

A duplicate index is an index with the same exact definition as another index that already exists in the collection. Fortunately, MongoDB is able to check for this and does not permit creating such an index.

Let’s do a test using a simple collection with no indexes.

rs_test:PRIMARY> db.test.find()
{ "_id" : ObjectId("60521309d7268c122c7cd630"), "name" : "corrado", "age" : 49 }
{ "_id" : ObjectId("60521313d7268c122c7cd631"), "name" : "simone", "age" : 12 }
{ "_id" : ObjectId("6052131cd7268c122c7cd632"), "name" : "gabriele", "age" : 19 }
{ "_id" : ObjectId("60521323d7268c122c7cd633"), "name" : "luca", "age" : 14 }
{ "_id" : ObjectId("60521328d7268c122c7cd634"), "name" : "lucia", "age" : 49 }

# create an index on name field
rs_test:PRIMARY> db.test.createIndex( { name: 1 } )
{
   "createdCollectionAutomatically" : false,
   "numIndexesBefore" : 1,
   "numIndexesAfter" : 2,
   "commitQuorum" : "votingMembers",
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615991942, 5),
      "signature" : {
         "hash" : BinData(0,"vQN6SGIL0fAMvTusJ12KgySqKOI="),
         "keyId" : NumberLong("6890926313742270469")
      }
   },
   "operationTime" : Timestamp(1615991942, 5)
}

# check indexes available
rs_test:PRIMARY> db.test.getIndexes()
[
   {
      "v" : 2,
      "key" : {
         "_id" : 1
      },
      "name" : "_id_"
   },
   {
      "v" : 2,
      "key" : {
         "name" : 1
      },
      "name" : "name_1"
   }
]

# try to create again the same index
rs_test:PRIMARY> db.test.createIndex( { name: 1 } )
{
   "numIndexesBefore" : 2,
   "numIndexesAfter" : 2,
   "note" : "all indexes already exist",
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615991942, 5),
      "signature" : {
         "hash" : BinData(0,"vQN6SGIL0fAMvTusJ12KgySqKOI="),
         "keyId" : NumberLong("6890926313742270469")
      }
   },
   "operationTime" : Timestamp(1615991942, 5)
}

# great, MongoDB can detect the index already exists

# let's try to see if you can create the same index with a different name
rs_test:PRIMARY> db.test.createIndex( { name: 1 }, { name: "this_is_a_different_index_name" } )
{
   "operationTime" : Timestamp(1615991981, 1),
   "ok" : 0,
   "errmsg" : "Index with name: this_is_a_different_index_name already exists with a different name",
   "code" : 85,
   "codeName" : "IndexOptionsConflict",
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615991981, 1),
      "signature" : {
         "hash" : BinData(0,"whkRyQQxyJVBt+7d3HOtFvYY32g="),
         "keyId" : NumberLong("6890926313742270469")
      }
   }
}

# even in this case MongoDB doesn't permit the index creation

MongoDB is then clever enough to avoid the creation of duplicate indexes. But what about the creation of an index that is the left-prefix of an existing index? Let’s test it.

# let's drop the previous index we have created
rs_test:PRIMARY> db.test.dropIndex( "name_1" )
{
   "nIndexesWas" : 2,
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615993029, 1),
      "signature" : {
         "hash" : BinData(0,"njFiuCeyA5VcdNOOP2ASboOpWwo="),
         "keyId" : NumberLong("6890926313742270469")  
      }
   },
   "operationTime" : Timestamp(1615993029, 1)
}

# check indexes. Only _id available
rs_test:PRIMARY> db.test.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" } ]

# create a compound index 
rs_test:PRIMARY> db.test.createIndex( { name:1, age: 1 } )
{
   "createdCollectionAutomatically" : false,
   "numIndexesBefore" : 1,
   "numIndexesAfter" : 2,
   "commitQuorum" : "votingMembers",
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615993054, 5),
      "signature" : {
         "hash" : BinData(0,"gfaPsWsSM745opEiQORCt2L3HYo="),
         "keyId" : NumberLong("6890926313742270469")
      }
   },
   "operationTime" : Timestamp(1615993054, 5)
}

# create another index that is the leftmost prefix of the compound index
rs_test:PRIMARY> db.test.createIndex( { name:1 } )
{
   "createdCollectionAutomatically" : false,
   "numIndexesBefore" : 2,
   "numIndexesAfter" : 3,
   "commitQuorum" : "votingMembers",
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1615993060, 5),
      "signature" : {
         "hash" : BinData(0,"C2XWVA5mi+WWyPMn3Jw2VHTw/Dk="),
         "keyId" : NumberLong("6890926313742270469")
      }
   },
   "operationTime" : Timestamp(1615993060, 5)
}

# check indexes
rs_test:PRIMARY> db.test.getIndexes()
[
   {
      "v" : 2,
      "key" : {
         "_id" : 1 
      },
      "name" : "_id_"
   },
   {
      "v" : 2,
      "key" : {
         "name" : 1,
         "age" : 1
      },
      "name" : "name_1_age_1"
   },
   {
      "v" : 2,
      "key" : {
         "name" : 1
      },
      "name" : "name_1"
   }
]

MongoDB permits the creation in this case, but we should consider a leftmost-prefix index a duplicate as well.

To benefit from a compound index, MongoDB doesn't need to use all the fields of that index; the leftmost prefix is enough. For example, an index on (A, B, C) can be used to satisfy the combinations (A), (A, B), and (A, B, C), but not (B) or (B, C). As a consequence, if you have two indexes, one on (A, B, C) and another one on (A, B), the second is a duplicate, because the first can serve the same queries for the combinations (A, B) and (A).
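
As a quick check (a minimal sketch reusing the test collection above), you can drop the redundant left-prefix index and confirm with explain() that a query filtering only on name is still satisfied by the compound index:

# drop the redundant left-prefix index
rs_test:PRIMARY> db.test.dropIndex( "name_1" )

# the winning plan should report an IXSCAN on name_1_age_1
rs_test:PRIMARY> db.test.find( { name: "luca" } ).explain("queryPlanner").queryPlanner.winningPlan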

So, find all the duplicate indexes and drop them, since they're useless. Just be careful to check that your application doesn't use hint() on the indexes you're going to drop.
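
For reference, a forced index choice in application code looks like the call below; these are the statements worth searching for before dropping anything (the index name here is just the one from the example above):

# a query explicitly forcing an index with hint()
rs_test:PRIMARY> db.test.find( { name: "luca" } ).hint( "name_1_age_1" )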

To avoid manually checking all the collections to discover the duplicates, here is a JavaScript snippet you can run in the mongo shell:

// list all databases and skip the system ones
var ldb = db.adminCommand( { listDatabases: 1 } );

for ( var i = 0; i < ldb.databases.length; i++ ) {

   if ( ldb.databases[i].name != 'admin' && ldb.databases[i].name != 'config' && ldb.databases[i].name != 'local' ) {

      print('DATABASE ', ldb.databases[i].name);
      print("+++++++++++++++++++++++++++++++++++++++++");

      var db = db.getSiblingDB(ldb.databases[i].name);
      var cpd = db.getCollectionNames();

      // check every collection of the database
      for ( var j = 0; j < cpd.length; j++ ) {

         if ( cpd[j] != 'system.profile' ) {

            var indexes = JSON.parse(JSON.stringify(db.runCommand( { listIndexes: cpd[j] } ).cursor.firstBatch));
            print("COLL :" + cpd[j]);

            // normalize each index key into a plain string, e.g. "name":1_"age":1
            for ( var k = 0; k < indexes.length; k++ ) {

               indexes[k] = (((JSON.stringify(indexes[k].key)).replace("{","")).replace("}","")).replace(/,/g, "_");

            }

            var founddup = false;

            // an index whose key string is the starting substring of another
            // index's key string is a leftmost-prefix duplicate
            for ( var k1 = 0; k1 < indexes.length; k1++ ) {

               for ( var k2 = 0; k2 < indexes.length; k2++ ) {

                  if ( k1 != k2 ) {

                     if ( indexes[k1].startsWith(indexes[k2], 0) ) {

                        print("{ " + indexes[k2] + " } is the left prefix of { " + indexes[k1] + " } and should be dropped");

                        founddup = true;

                     }
                  }
               }
            }

            if (!founddup) {

               print("no duplicate indexes found");

            }

            print("\n");

         }
      }

      print("\n");
   }
}

Note: this script is just an initial test and could be improved, but it should work in most cases.

Find Unused Indexes

MongoDB maintains internal statistics about index usage. Any time an index is used to solve a query, a specific counter is incremented. After running MongoDB for a significant amount of time – days or weeks – these statistics become reliable, and we can find out which indexes have been used and which have not.

To look at the index stats, MongoDB provides a dedicated aggregation pipeline stage: $indexStats

Here you can see an example:

rs_test:PRIMARY> db.restaurants.aggregate([ { $indexStats: {} } ]).pretty()
{
   "name" : "borough_1",
   "key" : {
      "borough" : 1
   },
   "host" : "ip-172-30-2-12:27017",
   "accesses" : {
      "ops" : NumberLong(312),
      "since" : ISODate("2021-03-17T13:48:51.305Z")
   },
   "spec" : {
      "v" : 2,
      "key" : {
         "borough" : 1
      },
      "name" : "borough_1"
   }
}
{
   "name" : "_id_",
   "key" : {
      "_id" : 1
   },
   "host" : "ip-172-30-2-12:27017",
   "accesses" : {
      "ops" : NumberLong(12),
      "since" : ISODate("2021-03-17T13:48:51.305Z")
   },
   "spec" : {
      "v" : 2,
      "key" : {
         "_id" : 1
      },
      "name" : "_id_"
   }
}
{
   "name" : "cuisine_1_borough_1",
   "key" : {
      "cuisine" : 1,
      "borough" : 1
   },
   "host" : "ip-172-30-2-12:27017",
   "accesses" : {
      "ops" : NumberLong(0),
      "since" : ISODate("2021-03-17T13:48:51.305Z")
   },
   "spec" : { 
      "v" : 2,
      "key" : {
         "cuisine" : 1,
         "borough" : 1
      },
      "name" : "cuisine_1_borough_1"
   }
}

The accesses.ops field is the number of times the index has been used. In the example, you can see the index { borough: 1 } has been used 312 times, the _id index 12 times, and the index { cuisine: 1, borough: 1 } 0 times. The last one could be dropped.

If the database has been running for a long time and millions of queries have been executed, an index that was never used most probably won't be used in the future either.

You should then consider dropping the unused indexes in order to improve the writes, reduce the cache pressure, and save disk space as well.

Using the following script you can find out the index statistics for all the collections:

// list all databases and skip the system ones
var ldb = db.adminCommand( { listDatabases: 1 } );

for ( var i = 0; i < ldb.databases.length; i++ ) {

   print('DATABASE ', ldb.databases[i].name);

   if ( ldb.databases[i].name != 'admin' && ldb.databases[i].name != 'config' && ldb.databases[i].name != 'local' ) {

      var db = db.getSiblingDB(ldb.databases[i].name);
      var cpd = db.getCollectionNames();

      // print the $indexStats output for every collection of the database
      for ( var j = 0; j < cpd.length; j++ ) {

         if ( cpd[j] != 'system.profile' ) {

            print(cpd[j]);

            var pui = db.runCommand({ aggregate: cpd[j], pipeline: [ { $indexStats: {} } ], cursor: { batchSize: 100 } });
            printjson(pui);

         }
      }

      print('\n\n');
   }
}

Look for the indexes having “ops” : NumberLong(0).
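
If you prefer to list only the unused indexes of a single collection, a $match stage can filter the $indexStats output directly; a small sketch using the restaurants collection from the earlier example:

rs_test:PRIMARY> db.restaurants.aggregate([
   { $indexStats: {} },
   { $match: { "accesses.ops": 0 } },
   { $project: { name: 1, "accesses.ops": 1 } }
])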

Conclusion

Creating indexes to solve your queries is a good habit, but be careful not to abuse indexing. Excessive indexing can lead to slower writes, excessive pressure on the memory cache, and more evictions.

You should consider maintaining your indexes from time to time, dropping all the duplicate and unused ones. The scripts provided in this article may help with your index analysis.

Feb
26
2021
--

Connection Queuing in pgBouncer: Is it a Magical Remedy?

Connection Queuing in pgBouncer

Yes, this post is about connection queueing, not just pooling. Because “connection pooling” – pre-created connections as a pool – is a much-celebrated feature. Almost every discussion on connection pool/pgBouncer starts with the overhead of establishing a new connection to PostgreSQL… and how pre-created connections in the pool can save the world.

But there is a less-celebrated feature in pgBouncer (not denying the others) that can address some really big operational challenges: connection queueing. Many new PostgreSQL users don't know that something like this exists. In this blog post, I plan to discuss some of the problems related to connection management first, and then explore how connection queueing can address those problems.

Problem 1: Spike in Load can Jam and Halt the Server

PostgreSQL has a dedicated backend server process for every user connection. So there will be one backend process running on a CPU core for every active query/session. This means there is a one-to-one mapping between active sessions and running processes on the server. If we consider parallel execution of SQL statements, there will be even more running processes than active sessions. In many real-world cases, a sudden spike in load can result in hundreds of active queries starting at once while the server is equipped with a small number of CPUs (sometimes just virtual CPUs with only a fraction of the performance). As the number of active sessions/processes increases, the overhead of scheduling and context switches takes over. Many times, the host server becomes unresponsive, and even opening a bash shell/terminal can take time. This is quite easy to simulate: just 10 active connections on a two virtual CPU server with a SELECT-only workload can cause this.
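
For reference, a quick way to see how many sessions are currently active is a simple query against pg_stat_activity:

postgres=# SELECT count(*) FROM pg_stat_activity WHERE state = 'active';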

With two active sessions:

$ time ssh -t postgres@pghost 'ls -l'
real 0m0.474s
user 0m0.017s
sys 0m0.006s

With 10 active sessions on PostgreSQL, just establishing an ssh connection to the server took over 15 seconds:

real 0m15.307s
user 0m0.026s
sys 0m0.015s

**These are indicative numbers from a very specific system and do not qualify for a benchmark.

Generally, we see that as the number of active sessions approaches double the number of CPU cores, the performance penalty starts increasing heavily.

Many times, the problem won't end there. Session-level resource allocations (work_mem, temporary tables, etc.) add to overall server resource consumption. As the host server slows down, each session takes more time to complete while holding its resources, which leads to further accumulation of active sessions. It is a spiral of evil. There are many real-world cases where the entire show ended in a complete halt of the host server, or with the OOM killer kicking in, terminating the PostgreSQL process and forcing it into crash recovery.

 


Problem 2: “Too Many Clients Already” Errors

A few smart DBAs will prevent this database disaster by setting max_connections to a value smaller than what the database server can actually handle, which is the right thing to do from a DB server perspective. Allowing an excessive number of connections to the database can lead to different types of abuses, attacks, and disasters. But the flip side is that an abusive application may be greeted with a message like the following:

FATAL:  sorry, too many clients already

The same will be logged in the PostgreSQL log.

2021-02-15 13:40:50.017 UTC [12829] FATAL:  sorry, too many clients already

Unfortunately, this could lead to an application crash or misbehavior. From the business point of view, we have just shifted the problem from the database to the application.
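
For completeness, capping the connections on the database side is a single postgresql.conf setting (changing it requires a server restart); the value below is just an example:

# postgresql.conf
max_connections = 100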

Problem 3: Big max_connections Value and Overhead

Because of the above-mentioned problem, it is common to see max_connections set to a very high value. The overhead of connections is probably one of the most actively discussed topics these days because Postgres 14 is expected to include some connection scalability fixes. Andres Freund blogged about the details of analyzing the connection scalability problem and how it is addressed.

Even idle connections may occupy server resources like memory. The overhead is considered very low on a properly configured server; however, the impact can be heavy in reality. Again, a lot depends on the workload. There are at least a few reported cases of up to 50MB of consumption per session. That means 500 idle connections can result in up to 25GB of memory usage.

In addition to this, more connections can lead to more lock-management-related overhead. And don't forget that the system becomes more vulnerable to sudden spikes as max_connections is increased.

Solution: Connection Queueing

Simply put, connection queueing means holding client connections in a queue so that the server can absorb a sudden spike in load. Connections are put into a queue rather than being rejected outright or sent straight to the server and jamming it. This streamlines the execution: the PostgreSQL server can keep doing what it is able to do rather than dealing with a jam situation.

Let me demonstrate with an example. For this demonstration, I set max_connections to “2”, assuming that this is the maximum the server can accommodate without causing too many context switches. This way, too many connections cannot come in and overload my database.

postgres=# show max_connections ;
2

A third connection to the database will result in an error as expected.

psql: error: FATAL: sorry, too many clients already

Now let's use pgBouncer for the connection queue. Many users may not even know that this queueing exists, because it happens by default. I used the following pgBouncer configuration for testing:

[databases]
pgbounce = host=172.16.2.16 port=5432 dbname=postgres

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
logfile = pgbouncer.log
pidfile = pgbouncer.pid
admin_users = postgres
application_name_add_host=1
default_pool_size=1
min_pool_size=1
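
With this configuration in place, clients simply connect to pgBouncer's port (6432) and the database alias defined in the [databases] section instead of connecting to PostgreSQL directly; the hostname below is only illustrative:

$ psql -h pgbouncer-host -p 6432 -U postgres pgbounce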

Yes, the pooler will establish only one connection to the database. pgBouncer establishes this one connection when the first client connection arrives, because min_pool_size is 1. The pgBouncer log says:

2021-02-16 15:19:30.725 UTC [2152] LOG C-0x1fb0c58: pgbounce/postgres@172.16.2.56:54754 login attempt: db=pgbounce user=postgres tls=no
2021-02-16 15:19:30.726 UTC [2152] LOG S-0x1fbba80: pgbounce/postgres@172.16.2.16:5432 new connection to server (from 172.16.2.144:58968)

The pgBouncer pool statistics also show the details:

pgbouncer=# show pools;
 database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode 
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 pgbounce  | postgres  |         1 |          0 |         0 |       0 |       1 |         0 |        0 |       0 |          0 | session

But the beauty is that we won't get any more “FATAL: sorry, too many clients already” errors. All client connections are accepted and put into the connection queue. For example, here I have five client connections; please see the value of cl_active:

pgbouncer=# show pools;
 database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode 
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 pgbounce  | postgres  |         5 |          0 |         0 |       0 |       1 |         0 |        0 |       0 |          0 | session

As the client connections become active (each issuing a SQL statement), all but one are put into waiting:

pgbouncer=# show pools;
 database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode 
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 pgbounce  | postgres  |         1 |          4 |         1 |       0 |       0 |         0 |        0 |      28 |     438170 | session

Each client connection will be served over the available database connection, one after another. That is the case with a single database connection. If the connection count and pool size are increased, multiple client connections can hit the server at the same time and the queue size drops. The following is a case with two connections (pool size of two) to the database.

database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode 
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 pgbounce  | postgres  |         2 |          1 |         2 |       0 |       0 |         0 |        0 |       4 |     978081 | session
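
For reference, one way to move from one to two server connections is to raise the pool size settings in the pgBouncer configuration and reload it from the admin console (a minimal sketch; RELOAD re-reads the configuration file):

# pgbouncer.ini
default_pool_size=2
min_pool_size=2

pgbouncer=# RELOAD;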

Putting Connection Queueing into a Simple Test

This is not extensive benchmarking, just a quick test to see the benefits on a typical host with two virtual CPUs. I created 20 active connections to PostgreSQL with a select-only load using pgbench:

pgbench -c 20 -j 20 -h pghost -U postgres -S -T 100

As the 20 server processes started running, the load average went through the roof.

As you can see in the screenshot, the load average spiked to 17+. And as expected, the server response also became consistently poor:

time ssh -t postgres@pghost 'ls -l'
real 0m17.768s
user 0m0.019s
sys 0m0.007s

At this stage, I tried sending the same 20 active connections through the pgBouncer connection queue with a pool size of four (default_pool_size=4). pgBouncer runs on the client side.
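
Routing the load through pgBouncer only changes where pgbench connects; a sketch reusing the port and database alias from the configuration shown earlier (the hostname is illustrative):

pgbench -c 20 -j 20 -h pgbouncer-host -p 6432 -U postgres -S -T 100 pgbounce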

Since there are only four server-side processes, the load average dropped drastically. The maximum I could see was 1.73:

The server response was also very good:

$ time ssh -t postgres@pghost 'ls -l'
real 0m0.559s
user 0m0.021s
sys 0m0.008s

A load average of 17+ vs 1.73! That must be too good to be true.

There was a bit of skepticism about whether the low load on the server and the better server response were coming at the cost of database throughput. I was expecting to see not-so-great throughput numbers, so I took the same test to a more consistently performing platform (AWS r5.large with two virtual CPUs). To my surprise, the numbers were even better.

The following are the throughput numbers I got (higher is better). They are not just “not bad”; they are better. About 5 percent better.

Direct         With Queueing
20190.301781   21663.454921
20308.115945   21646.195661
20434.780692   21218.44989

Since we are using just four connections on the database side in this case, it also gives us the opportunity to reduce the max_connections value on the database side. Another check was whether switching to transaction-level pooling could save even more, because the database connection is returned to the pool after each transaction and can then serve another client connection. This could result in better concurrency.
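
For reference, transaction-level pooling is selected with the pool_mode setting in the [pgbouncer] section; a minimal sketch on top of the configuration shown earlier:

[pgbouncer]
pool_mode = transaction
default_pool_size = 4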

Queue + max_connections=5   Queue + max_connections=5 + transaction-level pool
21897.685891                23318.655016
21913.841813                23486.485856
21933.129685                23633.39607

As expected, it delivered even better numbers. I would like to encourage readers to do more tests and proper benchmarking.

Summary and References

A really large number of application/client connections can be multiplexed over very few database connections using connection queueing. This queue helps in absorbing any spike in connections without overloading the database server. Instead of session processes queueing up on the OS scheduler/run queue, the connections are kept safely outside, and the database server can operate at full throttle without contention. Streamlined database traffic also results in better throughput.

Feb
25
2021
--

Percona Monthly Bug Report: February 2021

Percona Bug report Feb 2021

Here at Percona, we operate on the premise that full transparency makes a product better. We strive to build the best open-source database products, but also to help you manage any issues that arise in any of the databases that we support. And, in true open-source form, we report back on any issues or bugs you might encounter along the way.

We constantly update our bug reports and monitor other boards to ensure we have the latest information, but we wanted to make it a little easier for you to keep track of the most critical ones. This monthly post is a central place to get information on the most noteworthy open and recently resolved bugs. 

In this February 2021 edition of our monthly bug report, we have the following list of bugs:

Percona Server for MySQL/MySQL Bugs

PS-7477 (MySQL#101257): MySQL 8 crash. Prepared statements involving stored programs could cause MySQL to crash due to heap-use-after-free memory problems.

Affects Version/s: 8.0.22  [Tested/Reported version 8.0.22]

Fixed Version/s: 8.0.23

 

PS-7567: When enforce_storage_engine = InnoDB is set and a minor version upgrade is performed, you can no longer view system and status variables.

The enforce_storage_engine option is only available in Percona Server, so the bug is only applicable to Percona Server. SHOW VARIABLES and SHOW STATUS are both affected due to the engine change of the performance_schema tables.

Affects Version/s: 8.0.22  [Tested/Reported version 8.0.22]

 

MySQL#102586: When doing a multiple-table DELETE that anticipates a foreign key ON DELETE CASCADE, the statement works on the primary, but it breaks row-based replication.

Affects Version/s: 8.0, 5.7  [Tested/Reported version 8.0.23, 5.7.33]

 

MySQL#102175: When the server is under heavy write load, the binary log position and gtid_executed shown in the performance_schema.log_status table don't match the position of that GTID shown in the binary log file. The impact of this bug: it could break replication when using this GTID position information to start replication.

Affects Version/s: 8.0.22  [Tested/Reported version 8.0.22]

 

PS-7498(MySQL#102647): Replica sql_thread getting stuck due to MASTER_DELAY. The replication applier thread in a multi-threaded replica can randomly get stuck in the “Waiting until MASTER_DELAY seconds after master executed event” state and doesn’t apply any transaction for a long time when the replica is configured with MASTER_DELAY.

The issue happens only when parallel replication is enabled (slave_parallel_workers > 1) and sql_delay is set on the replica. Disabling parallel replication works around the issue, and the sql_thread resumes work as expected.

Affects Version/s: 5.7  [Tested/Reported version 5.7.32]

We have fixed this issue in Percona Server, and the fix will be available in the next release.

 

PS-7542: SELECT query which creates temporary tables could lead to “ERROR 1022 (23000): Can’t write; duplicate key in table ‘/path/tmp/#sqla882_21_2‘” error.

Workaround: Running the same SELECT query works fine after changing internal_tmp_mem_storage_engine from TempTable (the default) to MEMORY.
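
For reference, the variable is dynamic and can be changed at the session level (a global scope is also available), for example:

SET SESSION internal_tmp_mem_storage_engine = MEMORY;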

Affects Version/s: 8.0  [Tested/Reported version 8.0.17-8, 8.0.22-13]

 


Percona XtraDB Cluster

PXC-3536: PXC Cluster data inconsistency after MDL conflict solved.

Affects Version/s: 8.0,5.7  [Tested/Reported version 8.0.21-12.1, 5.7.31-31.45.3]

 

PXC-3455: When the binlog is disabled (skip-log-bin = 1) on a PXC node, the node crashes when a table with composite keys is queried by SHOW INDEXES.

Affects Version/s: 8.0  [Tested/Reported version 8.0.19-10, 8.0.21-12.1]

 

Percona XtraBackup

PXB-2375:  In some cases, XtraBackup will write the wrong binlog filename, pos, and GTID combination info in xtrabackup_binlog_info. 

If we are using this backup with GTID position details in xtrabackup_binlog_info to create a new replica, then most likely replication will break due to incorrect GTID position.

It looks like the GTID position is not consistent with the binlog file position; they are captured differently and later printed together in the xtrabackup_binlog_info file.

Affects Version/s:  8.0 [Tested/Reported version 8.0.14]

 

PXB-2274: XtraBackup prepare fails with a wrong-LSN message when there are DML statements running during the backup stage.

Affects Version/s: 8.0 [Tested/Reported version 8.0.22-15]

Fixed Version/s: 8.0.23 [Next release]

 

Percona Toolkit

PT-1914: When using pt-osc on a table that has a column with the word 'Generated' in the column comment, it will remove all data in some columns.

Affects Version/s:  3.0.13, 3.2.1, 3.3.0

 

PT-1919: Running pt-osc on a table that has triggers, with the alter-foreign-keys-method drop_swap mode (or auto mode, which could choose drop_swap), will delete some triggers.

Affects Version/s:  3.2.1, 3.3.0

 

PMM  [Percona Monitoring and Management]

PMM-4665: Frequent error messages in pmm-agent.log for components like the TokuDB storage engine, which are not supported by the upstream MySQL instance.

Affects Version/s:  2.x  [Tested/Reported version 2.0.13]

 

Summary

We welcome community input and feedback on all our products. If you find a bug or would like to suggest an improvement or a feature, learn how in our post, How to Report Bugs, Improvements, New Feature Requests for Percona Products.

For the most up-to-date information, be sure to follow us on Twitter, LinkedIn, and Facebook.

Quick References:

Percona JIRA  

MySQL Bug Report

Report a Bug in a Percona Product

___

About Percona:

As the only provider of distributions for all three of the most popular open source databases—PostgreSQL, MySQL, and MongoDB—Percona provides expertise, software, support, and services no matter the technology.

Whether it’s enabling developers or DBAs to realize value faster with tools, advice, and guidance, or making sure applications can scale and handle peak loads, Percona is here to help.

Percona is committed to being open source and preventing vendor lock-in. Percona contributes all changes to the upstream community for possible inclusion in future product releases.

Feb
17
2021
--

The Most Important Skills for an SRE, DBRE, or DBA

Important Skills for an SRE DBRE or DBA

I have talked extensively about the DBA's evolving role and how many DBAs and operations professionals are now becoming SREs (site reliability engineers) or DBREs (database reliability engineers). Often, databases get blamed as the bottleneck for application slowdowns and issues, so DBAs have had to develop the skills needed to chase problems up and down the stack over the years. This full-stack approach to hunting out problems has resulted in many former DBAs and sysadmins successfully taking on the role of an SRE/DBRE.

The question is, then, what are the most critical skills for this important role?

I personally have interviewed thousands of technical candidates over the last 10 years and have hired hundreds into various roles here at Percona. I often get asked what the most critical skill is for the next generation of technical engineers, SREs, or DBREs. The answer has been consistent for me over my career – I want engineers, SREs, etc., with good problem-solving skills and the ability to think outside the box. I am not looking for book knowledge or a detailed understanding of every concept; I want people who can see something new and…

  1. Be curious enough to ask “Why?” and want to know the answer.
  2. Will dig into the ambiguous, want to learn, and can work out the why.
  3. Can solve the issue, answer the question, and share that knowledge effectively.

From a technical perspective, while it is wonderful to have a great depth of knowledge, I generally am not looking for the experts’ expert. Rather, I look for people who are smart, passionate, and who learn quickly. I am not alone in this. I asked the question of those involved in hiring technical people here at Percona.

Peter Zaitsev (CEO) said the number one skill he is looking for is this: “Attitude and Ability to do independent research and find information to solve the problem at hand.” For many like Peter, having an encyclopedic knowledge of how things work or the right commands to use is secondary to solving problems never seen before. Many problems and issues that come up you cannot specifically train for. The unique nature of the workload, size, and way too many external factors often offer unique challenges to even the most experienced SRE. Peter added: “So many people now have this ‘I have not been trained on this’ attitude instead of doing some basic googling for the answer.” Indeed, there is a lot of information out there, and while searching for an answer quickly during an outage or performance event may seem like a no-brainer, more than half the people I have interviewed don’t even think about it. Thinking on your feet, reacting quickly, and restoring service can save companies millions of dollars of lost revenue and business in an outage.


Marco Tusa (MySQL Tech Lead) echoed Peter’s sentiment by saying that there are two important skills for him. One of these is the ability to learn what you don’t know. “This is because no matter what, often the best one on tech knowledge won’t know some important stuff. The will to learn is the key.” Lenz Grimmer (Sr Director of Server Engineering) could not have agreed more, adding: “I’m seeking talent that is open-minded about acquiring new skills. So fast learners with the right sense of humility and the right attitude.”

Teamwork Makes the Dream Work…

Attitude and humility are critical in building an effective team (especially a remote team). This was the second trait Marco looks for. Marco went on to add that he also looks strongly at a candidate’s fit with the team and whether they will be a team player. The “no jerks” and “no soloist prima donna” mottos are very important. You have to be willing to share what you learn and to look for help from your teammates.

This is the same thing Jay Janssen (Director of IT) said when asked about the number one thing he looks for: “Humility comes to mind — smart and humble is a good combination. While kind of cliche, it’s generally true.” We are all looking to hire smart people, but smart people who are jerks or who flaunt how smart they are generally don’t operate well in a team environment. You want someone who is smart but does not make other people feel small or insignificant.

Sanja Bonic (Head of Percona’s Open Source Program Office) also values teamwork and makes sure she tries to understand how people handle positive and negative interactions as a team.  Sanja, who has previously led Engineering teams at OpenShift and now works with Percona’s community, asks people in interviews about “their best and worst experiences in teams they’ve previously worked with. This usually shows what people are paying attention to, and you’ll quickly get a hint of what people attribute value to.”

While you need people to work and learn independently, you equally need them to function as a unit (as a team). Remember, to ensure the uptime, availability, and performance of an entire application spanning potentially hundreds or thousands of nodes, you need to use all the resources at your disposal when things go wrong. Having teammates you trust, who can help, and who can augment your knowledge is very important. You can’t do it all alone, so the ability to “team up” and work with others is a must.

The strength of the team is each individual member. The strength of each member is the team.” ~ Phil Jackson

Sharing is Caring…

The ability of smart people to effectively share their knowledge and have good, meaningful conversations is also critical in this role. Vadim Tkachenko (CTO) said he is looking for “People who have a brain and can have a meaningful conversation.” He went on to say he is looking for people who “Can speak well about previous relevant experiences.” This ability to share goes a long way internally to increase the collaborative spirit within the team. But it is not merely about speaking a single language; rather, it’s about being able to talk about the technologies and match the expectations of your audience (or teammates).

Tate Mcdaniel (DBA Manager) says this is the number one thing he looks for when hiring people. His approach, in his words – “I ask questions about contentious and complicated things, and I look for answers that explain the complexity in ways a layperson can understand while giving pros/cons judiciously.” Taking the complex, explaining it, and educating others is of critical importance.

It is why Peter, Vadim, Jay, Marco, Tate, Lenz, and myself all said we look online at what people have written, where they have talked at conferences, what code they may have written, and other traces of their public persona before interviewing someone.

When I asked Lenz Grimmer if he looked at a candidate’s online persona, he said: “Absolutely, that’s one of the beauties of hiring in the open-source ecosystem. A public track record of contributions in various forms tells me much more than a CV. Forum and mailing list contributions, YouTube videos, all of which help get a better understanding of the candidate.”

One Person is an Island…

I personally highly value people’s willingness to share their insights, knowledge, and sometimes struggles. This is especially critical in the open-source space. I mentioned that no one person could manage a complex environment alone. Training and educating team members and others in the community is critical. The willingness to share and educate via online blogs, articles, and technical talks is, in my opinion, essential to the SRE/DBRE community as a whole.

So what do we see as the must-have skills?

  1. Problem-solving skills, the ability to troubleshoot unique and challenging problems.
  2. The passion and desire to learn, research, and acquire skills quickly.
  3. Humility and the ability to be a “team player” – No jerks allowed!
  4. The ability and passion for sharing their knowledge and educating others.

What do you think? Did we miss any?
