Mar
09
2021
--

Webinar March 24: Introduction to pg_stat_monitor


Join Peter Zaitsev, Percona CEO, and Ibrar Ahmed, Percona Sr. Software Engineer, as they discuss pg_stat_monitor!

If you’re tasked with optimizing PostgreSQL performance, chances are you’re relying on the pg_stat_statements extension to capture information about query performance. While this extension provides a lot of great insights, PostgreSQL makes it possible to go even further! In this webinar, we introduce pg_stat_monitor, an open source extension based on pg_stat_statements that provides such advanced query performance details. We talk about the additional design goals we had and why those are important, the additional information we capture, and how you can use it to get your PostgreSQL running even faster.

Please join Peter Zaitsev, Percona CEO, and Ibrar Ahmed, Percona Sr. Software Engineer, on Wednesday, March 24, 2021, at 1:00 PM EST for their webinar Introduction to pg_stat_monitor.

Register for Webinar

If you can’t attend, sign up anyway, and we’ll send you the slides and recording afterward.

Mar
08
2021
--

Testing the Value of ScaleFlux Computational Storage Drive (CSD) for PostgreSQL


Some time ago we at Percona were approached by ScaleFlux Inc to benchmark their latest hardware appliance, the CSD 2000 Drive, which is a next-generation SSD computational storage drive. It goes without saying that a truly relevant report requires us to be as honest and as forthright as possible. In other words, my mission was to, ahem, see what kind of mayhem I could cause.

Benchmarking is a bit like cooking; it requires key ingredients, strict adherence to a set of instructions, mixing the ingredients together, and a bit of heat to make it all happen. In this case, the ingredients include the Linux OS running Ubuntu 18.04 on both the database and the benchmarking hosts, PostgreSQL version 12, SysBench (the modular, cross-platform, and multi-threaded benchmark tool), and a comparable, competing appliance, i.e. the Intel DC P4610 series drive. The two appliances were each mounted as partitions using the same type of file system.

 

Once the environment is ready, the next step involves declaring and implementing the benchmarking rules, which consist of various types of DML and DDL activity. Keep in mind that, apart from the classic OLAP vs. OLTP modes of database processing, executing a benchmark that closely follows real production activity can be problematic: when pushed to full capacity, every production system is to some extent unique. Therefore, for our purposes, we used the testing regime SysBench offers by default.
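
For the curious, the default regime amounts to preparing and running SysBench’s bundled OLTP Lua scripts against PostgreSQL. A minimal sketch of the shape of those invocations (host, credentials, and sizing flags here are illustrative assumptions, not the exact values used):

# create the test tables and data
sysbench oltp_read_write --db-driver=pgsql --pgsql-host=pghost \
  --pgsql-user=postgres --pgsql-db=sbtest \
  --tables=10 --table-size=1000000 prepare

# run a mixed read-write workload for five minutes with 16 threads
sysbench oltp_read_write --db-driver=pgsql --pgsql-host=pghost \
  --pgsql-user=postgres --pgsql-db=sbtest \
  --tables=10 --table-size=1000000 --threads=16 --time=300 run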

Once the system was ready, loading started out slow and gentle. The idea was to develop a baseline for the various types of activity and Postgres runtime conditions. Then, the benchmarking intensity was gradually increased to the point where we eventually started getting interesting results.


Needless to say, it took quite a bit of time running the various permutations, double-checking our numbers, graphing the data, and then, after all that, interpreting the output. I’m not going to go into any great detail about the analysis itself. Instead, I encourage you to look at the whitepaper itself.

So after all this effort, what was the takeaway?

There are two key observations that I’d like to share:

  1. At peak loading, the ScaleFlux CSD 2000 Drive demonstrated less performance variance than the Intel DC P4610. Variance here is the spread of IO read-write performance between maximum and minimum values; its significance is server predictability. This becomes important when, for example, finely tuned application processes depend upon consistent performance from the RDBMS. Many a time I’ve seen applications get upset when the response time between inserting, updating, or deleting data and getting back the resultant queries would suddenly change.
  2. Remarkable space savings were realized when the Postgres fillfactor was reduced. As you know, the fillfactor can become a critical runtime parameter in regards to performance when high-frequency UPDATE and DELETE operations take place on the same tuple over and over again.
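
For reference, fillfactor is set per table. A minimal sketch (the table and the value 70 are illustrative, not taken from the whitepaper):

-- leave 30% of every heap page free, favoring in-page (HOT) updates
CREATE TABLE accounts (id int PRIMARY KEY, balance numeric)
  WITH (fillfactor = 70);

-- or lower it on an existing table; only newly written pages honor it,
-- so rewrite the table to apply it everywhere
ALTER TABLE accounts SET (fillfactor = 70);
VACUUM FULL accounts;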

Finally, one last item… I didn’t mention it but we also benchmarked MySQL for ScaleFlux. The results were pretty remarkable. It’s worth your while to have a look at that one too.

ScaleFlux White Papers:

Feb
26
2021
--

Connection Queuing in pgBouncer: Is it a Magical Remedy?


Yes, this post is about connection queueing, not just pooling. Because “connection pooling” – pre-created connections as a pool – is a much-celebrated feature. Almost every discussion on connection pool/pgBouncer starts with the overhead of establishing a new connection to PostgreSQL… and how pre-created connections in the pool can save the world.

But there is a non-celebrity feature in pgBouncer (not denying the others) that can address some of the really big operational challenges: connection queueing. Many new PostgreSQL users don’t know it exists. In this blog post, I am going to discuss some of the problems related to connection management first, and then explore how connection queueing can address those problems.

Problem 1: Spike in Load can Jam and Halt the Server

PostgreSQL has a dedicated backend server process for every user connection, so there will be one backend process running on a CPU core for every active query/session. This means there is a one-to-one mapping between active sessions and running processes in the server. If we consider parallel execution of SQL statements, there will be many more running processes than active sessions. In many real-world cases, a sudden spike in load can result in hundreds of active queries starting at once while the server is equipped with a small number of CPUs (sometimes just virtual CPUs with only a fraction of the performance). As the number of active sessions/processes increases, the overhead of scheduling and context switches takes over. Many times, the host server becomes unresponsive, and even opening a bash shell/terminal can take time. This is quite easy to simulate: just 10 active connections on a two virtual CPU server with a SELECT-only workload can cause it.
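
A quick way to watch this one-to-one mapping on a live server is the standard pg_stat_activity view (a sketch):

-- each row here corresponds to a dedicated backend process on the host;
-- compare the active count against the number of CPU cores
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';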

With two active sessions:

$ time ssh -t postgres@pghost 'ls -l'
real 0m0.474s
user 0m0.017s
sys 0m0.006s

When there are 10 active sessions on PostgreSQL, just establishing an ssh connection to the server took 15 seconds.

real 0m15.307s
user 0m0.026s
sys 0m0.015s

** These are indicative numbers from a very specific system and do not qualify as a benchmark.

Generally, we could see that as the number of active sessions approaches double the number of CPU cores, the performance penalty starts increasing heavily.

Many times, the problem won’t end there. Session-level resource allocations (work_mem, temporary tables, etc.) add to overall server resource consumption. As the host server slows down, each session takes more time to complete while holding its resources, which leads to further accumulation of active sessions. It is a spiral of evil. There are many real-world cases where the entire show ended in a complete halt of the host server, or the OOM killer kicked in, terminating the PostgreSQL process and forcing it into crash recovery.

 


Problem 2: “Too Many Clients Already” Errors

A few smart DBAs will prevent this database disaster by setting max_connections to a value the database can actually handle, which is the right thing to do from a DB server perspective. Allowing an excessive number of connections to the database can lead to different types of abuse, attacks, and disasters. But the flip side is that an application may be greeted with the following message:

FATAL:  sorry, too many clients already

The same will be logged in the PostgreSQL log.

2021-02-15 13:40:50.017 UTC [12829] FATAL:  sorry, too many clients already

Unfortunately, this could lead to an application crash or misbehavior. From the business point of view, we have just shifted the problem from the database to the application.
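
For reference, capping connections is a one-line change (the value 100 is illustrative; max_connections only takes effect after a server restart):

ALTER SYSTEM SET max_connections = 100;
-- then restart the server, e.g.: pg_ctl restart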

Problem 3: Big max_connections Value and Overhead

Because of the above-mentioned problem, it is common to see max_connections set to a very high value. The overhead of connections is probably one of the most actively discussed topics these days because PostgreSQL 14 is expected to include some connection scalability fixes. Andres Freund blogged about the details of analyzing the connection scalability problem and how it is addressed.

Even idle connections may occupy server resources like memory. The overhead is considered very low on a properly configured server; however, the impact can be heavy in reality. Again, a lot depends on the workload. There are at least a few reported cases of up to 50MB of consumption per session; at that rate, 500 idle connections can result in up to 25GB of memory usage.

In addition to this, more connections can lead to more lock-management overhead. And don’t forget that the system becomes more vulnerable to sudden spikes as max_connections is increased.

Solution: Connection Queueing

Connection queueing is just what it sounds like: incoming connections are queued so the server can absorb a sudden spike in load. Rather than rejecting connections outright, or sending them straight to the server and jamming it, they are placed in a queue. This streamlines execution: the PostgreSQL server keeps doing what it can do, rather than dealing with a jam.

Let me demonstrate with an example. For this demonstration, I set max_connections to “2”, assuming that this is the maximum the server can accommodate without causing too many context switches. This ensures too many connections won’t come in and overload my database.

postgres=# show max_connections ;
2

A third connection to the database will result in an error as expected.

psql: error: FATAL: sorry, too many clients already

Now let’s use pgBouncer for the connection queue. Many users may not know that queueing is there by default. I used the following pgbouncer configuration for testing:

[databases]
pgbounce = host=172.16.2.16 port=5432 dbname=postgres

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
logfile = pgbouncer.log
pidfile = pgbouncer.pid
admin_users = postgres
application_name_add_host=1
default_pool_size=1
min_pool_size=1

Yes, the pooler will establish only one connection to the database. pgBouncer establishes this connection when the first client connection arrives, because min_pool_size is 1. The pgBouncer log says:

2021-02-16 15:19:30.725 UTC [2152] LOG C-0x1fb0c58: pgbounce/postgres@172.16.2.56:54754 login attempt: db=pgbounce user=postgres tls=no
2021-02-16 15:19:30.726 UTC [2152] LOG S-0x1fbba80: pgbounce/postgres@172.16.2.16:5432 new connection to server (from 172.16.2.144:58968)

pgBouncer pool statistics also show the details:

pgbouncer=# show pools;
 database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode 
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 pgbounce  | postgres  |         1 |          0 |         0 |       0 |       1 |         0 |        0 |       0 |          0 | session

But the beauty is that we won’t get any more “FATAL: sorry, too many clients already” errors. All client connections are accepted and put into the connection queue. For example, here I have five client connections; please see the value of cl_active:

pgbouncer=# show pools;
 database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode 
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 pgbounce  | postgres  |         5 |          0 |         0 |       0 |       1 |         0 |        0 |       0 |          0 | session

As each client connection becomes active (with a SQL statement), it is moved to waiting.

pgbouncer=# show pools;
 database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode 
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 pgbounce  | postgres  |         1 |          4 |         1 |       0 |       0 |         0 |        0 |      28 |     438170 | session

Each client connection is served over the available database connection, one after another. This is the case with a single database connection. If the connection count and pool size are increased, multiple client connections can hit the server at the same time and the queue size drops. The following is a case with two connections (pool size two) to the database:

database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode 
-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+-----------
 pgbounce  | postgres  |         2 |          1 |         2 |       0 |       0 |         0 |        0 |       4 |     978081 | session

Putting Connection Queueing into a Simple Test

This is not extensive benchmarking, but a quick test to see the benefits on a typical host with two virtual CPUs. I created 20 active connections to PostgreSQL with a select-only load using pgbench:

pgbench -c 20 -j 20 -h pghost -U postgres -S -T 100

As the 20 server processes started running, the load average went through the roof.

The load average spiked to 17+. And as expected, the server response also became consistently poor.

time ssh -t postgres@pghost 'ls -l'
real 0m17.768s
user 0m0.019s
sys 0m0.007s

At this stage, I tried sending the same 20 active connections through the connection queue of pgbouncer with pool size four (default_pool_size=4). The pgbouncer is at the client-side.

Since there are only four server-side processes, the load average dropped drastically. The maximum I could see was 1.73.

The server response is also very good.

$ time ssh -t postgres@pghost 'ls -l'
real 0m0.559s
user 0m0.021s
sys 0m0.008s

A load average of 17+ vs. 1.73! That seems too good to be true.

There was a bit of skepticism about whether the low load on the server and the better server response were coming at the cost of database throughput; I was expecting to see not-so-great throughput numbers. So I took the same test to a more consistently-performing platform (AWS r5.large with two virtual CPUs). Somewhat to my surprise, the numbers were even better.

The following are the transactions-per-second numbers I got. At the least, it is not bad; it’s better. 5% better.

Direct       | With Queueing
-------------+--------------
20190.301781 | 21663.454921
20308.115945 | 21646.195661
20434.780692 | 21218.44989

Since we are using just four connections on the database side in this case, it also gives us the opportunity to reduce the max_connections value on the database side. Another check was whether switching to transaction-level pooling could save more, because the database connection returns to the pool after each transaction and can then serve another client connection. This could result in better concurrency.

Queue + max_connections=5 | Queue + max_connections=5 + transaction-level pool
--------------------------+---------------------------------------------------
21897.685891              | 23318.655016
21913.841813              | 23486.485856
21933.129685              | 23633.39607

As expected, it delivered even better numbers. I would like to encourage readers to do more tests and proper benchmarking.
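
For reference, switching the test to transaction-level pooling is a small change to the [pgbouncer] section shown earlier (a sketch):

[pgbouncer]
pool_mode = transaction
default_pool_size = 4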

Summary and References

A really large number of application/client connections can be multiplexed over very few database connections using connection queueing. This queue helps absorb any spike in connections without overloading the database server. Instead of session processes queueing up on the OS scheduler/run queue, the connections are kept safely outside, and the database server can operate at full throttle without contention. Streamlined database traffic also results in better throughput.

Feb
01
2021
--

PostgreSQL Database Security: Authentication


Recently, I wrote an overview of what you need to know about PostgreSQL security. In this post, I want to write about PostgreSQL authentication, which is divided into three categories: 1. PostgreSQL internal authentication, 2. OS-based authentication, and 3. external server-based authentication. In most cases, PostgreSQL is configured for internal authentication, so here we will discuss each internal authentication method in detail. The next blog will cover OS authentication methods such as PAM, Peer, and Ident.

 

Figure 1: PostgreSQL Authentication method.

 

The following is the list of supported PostgreSQL internal authentication methods.

  • Trust
  • Reject
  • md5
  • SCRAM
  • Cert

PostgreSQL authentication is configured in a file called pg_hba.conf; all the authentication-related settings are part of this configuration file. Here is a sample pg_hba.conf file:

 

host    database             user               address      auth-method    [auth-options]
------+--------------- +-----------------+----------------+--------------+---------------
Local |  all           |     all         |                |   trust 
host  |  all           |     all         | 127.0.0.1/32   |   trust
host  |  postgres      |     postgres    | 192.168.1.1/24 |   md5
host  |  replication   |     postgres    | 127.0.0.1/32   |   md5

 

The first column of the pg_hba.conf file is the connection type, which can be either local or host. local is fixed to the Unix-domain socket, while for host you need to specify the host IP address in the address column. The second column is the database, used to specify the database name; you can set the authentication method per database, meaning each database can have its own method, and the value all applies the same method to every database. The third column is the user, which means you can set separate authentication methods for different users; here, all applies to all users. The fourth column specifies the IP address range that may use this authentication method. The next column is the auth-method, which can be any of the authentication methods shown in Figure 1. The last column is auth-options, for any authentication method that takes extra options.

Trust and Reject

When you specify the authentication method Trust, any user who matches the line will not be asked for a password. Similarly, in the case of Reject, any user who matches the line will not be allowed to log in to the system. Here is an example of Trust and Reject:

host        database          user    address        auth-method    [auth-options]
------+-----------------+-----------------+----------------+-------------------------+--------------------
host  |    all          |     all   | 127.0.0.1/32    |   trust
host  |    all          |     all   | 0.0.0.0/0       |   reject

The pg_hba.conf file has two entries; the first uses the authentication method trust and the second uses reject. A localhost user does not need a password and is granted permission to log in to the system without one, while a connection from any computer other than localhost is rejected because of the second line of the pg_hba.conf file.

Trust Authentication

postgres@127.0.0.1:~$ psql postgres -h 127.0.0.1 -U postgres
psql (12.4)
Type "help" for help.
postgres=>

Reject Authentication

postgres@10.0.2.2:~$ psql postgres -h 10.0.2.1 -U postgres
psql: error: could not connect to server:
FATAL:  pg_hba.conf rejects connection for host "10.0.2.2", user "postgres", database "postgres"

 

md5

In the case of md5 authentication, you need to provide the password. Let’s look at a simple example of that.

host        database          user    address            auth-method               [auth-options] 
------+-----------------+-----------------+-----------+-------------------------+-------------------- 
host  |    all          |     all   | 10.0.2.2/32     |   md5

 

vagrant@vagrant:~$ psql postgres -h 10.0.2.1 -U postgres
Password for user postgres: 
psql (12.4)
Type "help" for help.
postgres=>

SCRAM

SCRAM, or more specifically scram-sha-256, is a challenge-response scheme that prevents password sniffing on untrusted connections. It is one of the most secure authentication methods, storing passwords on the server as cryptographic hashes.

Step 1: Change the password of the user

postgres=# SET password_encryption = 'scram-sha-256';
SET

postgres=# ALTER USER postgres WITH PASSWORD 'test';
ALTER ROLE

Step 2: Change the pg_hba.conf file.

host        database          user    address        auth-method    [auth-options] 
------+-----------------+-----------------+-----------+-------------------------+-------------------- 
host  |    all          |     all   | 10.0.2.2/32    |   scram-sha-256

Step 3: Test the connection

$ psql postgres -U postgres
Password for user postgres: 
psql (13.0)

Type "help" for help.

 

Percona Distribution for PostgreSQL is free to download and use. It brings together the best and most critical enterprise-level components from the open-source community, designed and tested to work together in one single source.

 

CERT

Server Key and Certificate

Step 1: Generate Server keys

$  openssl genrsa -des3 -out server.key 1024
Generating RSA private key, 1024 bit long modulus (2 primes)
.+++++
..................+++++
e is 65537 (0x010001)
Enter pass phrase for server.key:
Verifying - Enter pass phrase for server.key:

$ openssl rsa -in server.key -out server.key
Enter pass phrase for server.key:
writing RSA key

$ chmod og-rwx server.key

Step 2: Generate Server Certificate

$ openssl req -new -key server.key -days 3650 -out server.crt -x509
-----
Country Name (2 letter code) [AU]:PK
State or Province Name (full name) [Some-State]:ISB
Locality Name (eg, city) []:Islamabad
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Percona
Organizational Unit Name (eg, section) []:Dev
Common Name (e.g. server FQDN or YOUR name) []:localhost
Email Address []:ibrar.ahmad@gmail.com 


$ cp server.crt root.crt

Client Keys and Certificate

Step 3: Generate a client certificate

$ openssl genrsa -des3 -out /tmp/postgresql.key 1024
Generating RSA private key, 1024 bit long modulus (2 primes)
..........................+++++
.....................................................+++++
e is 65537 (0x010001)
Enter pass phrase for /tmp/postgresql.key:
Verifying - Enter pass phrase for /tmp/postgresql.key:



$ openssl rsa -in /tmp/postgresql.key -out /tmp/postgresql.key
Enter pass phrase for /tmp/postgresql.key:
writing RSA key


$ openssl req -new -key /tmp/postgresql.key -out /tmp/postgresql.csr
-----
Country Name (2 letter code) [AU]:PK
State or Province Name (full name) [Some-State]:ISB
Locality Name (eg, city) []:Islamabad
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Percona
Organizational Unit Name (eg, section) []:Dev
Common Name (e.g. server FQDN or YOUR name) []:127.0.0.1
Email Address []:ibrar.ahmad@gmail.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:pakistan
An optional company name []:Percona

Step 4: Copy root.crt to the client.

$ cp $PGDATA/root.crt /tmp/

PostgreSQL Settings

Step 5: In the postgresql.conf file, set ssl = on

# - SSL -
ssl = on

#ssl_ca_file = ''
#ssl_cert_file = 'server.crt'

Step 6: Restart PostgreSQL

pg_ctl restart

Connection

Now everything is set, and you can test the connection using the psql command.

$ psql 'host=localhost port=5432 dbname=postgres user=vagrant sslmode=verify-full sslcert=/tmp/postgresql.crt sslkey=/tmp/postgresql.key sslrootcert=/tmp/root.crt'
psql (13.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
postgres=# 

Conclusion

This is the second part of the security blog series; in the first post, we covered the main security features we need to consider. In this post, we started with authentication and focused only on PostgreSQL’s internal authentication mechanisms; we still need to see how external authentication methods work in PostgreSQL. Stay tuned!

Jan
28
2021
--

How to Create PostgreSQL Custom Builds and Debian Packages


Some time ago, I required a specific version of PostgreSQL. After searching all the repositories, I was unable to find that specific version. It seems that the PostgreSQL community removes the old package when a new minor version comes out. For example, if you need version 12.2 but version 12.3 is out, it’s really hard to find the exact version. I am a developer, so I can always build that specific version for myself, but users face two major issues: first, they require the “deb” package, not the binaries, and second, they don’t want a complete development environment on their production/staging server. So, I decided to write a blog explaining how to make a custom build for Debian or Ubuntu and produce Debian packages.

1 – Download PostgreSQL from the official FTP site (https://www.postgresql.org/ftp/source/), where archives of all versions are available. After downloading, you need to untar the archive:

PGVER=12.3
wget https://ftp.postgresql.org/pub/source/v$PGVER/postgresql-$PGVER.tar.gz
tar -zxvf postgresql-$PGVER.tar.gz

2 – Download Debian’s PostgreSQL package generator scripts from https://salsa.debian.org/postgresql/postgresql:

git clone https://salsa.debian.org/postgresql/postgresql.git

3 – Copy the “debian” folder into postgresql-$PGVER:

cp -rf postgresql/debian postgresql-$PGVER

4 – Change into the PostgreSQL source directory:

cd postgresql-$PGVER

5 – The following commands generate the Debian packages: they compile the source code, run the test cases, and then produce the packages.

sudo apt-get install dpkg-dev
dpkg-buildpackage -rfakeroot -b -uc -us

You may face some issues based on what PostgreSQL version you are building and what version of Debian you are using.

Issue – 1: You may see some unmet build dependency error messages, something like the output below. In that case, you need to install all of these packages using the apt-get command, as shown after the error output.

dpkg-checkbuilddeps: error: Unmet build dependencies: bison clang-11 debhelper-compat (= 13) dh-exec (>= 0.13~) docbook-xsl (>= 1.77) flex gdb gettext libicu-dev libio-pty-perl libipc-run-perl libkrb5-dev libldap2-dev libpam0g-dev | libpam-dev libperl-dev libreadline-dev libselinux1-dev libssl-dev libsystemd-dev libxml2-dev libxml2-utils libxslt1-dev llvm-11-dev pkg-config python3-dev systemtap-sdt-dev tcl-dev uuid-dev xsltproc zlib1g-dev | libz-dev
dpkg-buildpackage: warning: build dependencies/conflicts unsatisfied; aborting
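
A sketch assembled from the error output above (the clang/llvm packages are handled separately in Issue 3, and debhelper-compat in Issue 2):

sudo apt-get install -y bison debhelper dh-exec docbook-xsl flex gdb gettext \
  libicu-dev libio-pty-perl libipc-run-perl libkrb5-dev libldap2-dev \
  libpam0g-dev libperl-dev libreadline-dev libselinux1-dev libssl-dev \
  libsystemd-dev libxml2-dev libxml2-utils libxslt1-dev pkg-config \
  python3-dev systemtap-sdt-dev tcl-dev uuid-dev xsltproc zlib1g-dev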

Issue – 2: There is no debhelper-compat on LTS Debian.

You need to modify the control file:

$ git diff
diff --git a/debian/control b/debian/control
index 3b94937..534fd56 100644
--- a/debian/control
+++ b/debian/control
@@ -11,7 +11,7 @@ Rules-Requires-Root: no
 Build-Depends:
  bison, clang-11 [!alpha !hppa !hurd-i386 !ia64 !kfreebsd-amd64 !kfreebsd-i386 !m68k !powerpc !riscv64 !sh4 !sparc64 !x32],
- debhelper-compat (= 13),
+ debhelper, dh-exec (>= 0.13~),
  docbook-xsl (>= 1.77),
  dpkg-dev (>= 1.16.1~),

Then you need to create a compat file:

echo 9 > debian/compat

Issue – 3: You may not find llvm on Debian 9, so you need to add these lines to /etc/apt/sources.list. The complete list of repos can be found at https://apt.llvm.org/.

deb http://apt.llvm.org/stretch/ llvm-toolchain-stretch main
deb-src http://apt.llvm.org/stretch/ llvm-toolchain-stretch main

sudo apt-get update
apt-get install libllvm-11-ocaml-dev libllvm11 llvm-11 llvm-11-dev llvm-11-doc llvm-11-examples llvm-11-runtime

After fixing the issues, you can restart from step 5.

The following is the list of packages generated by the last step; these can be copied to the machine where you want to install them.

postgresql-$PGVER_amd64.deb
postgresql-client-$PGVER_amd64.deb
postgresql-server-dev-$PGVER_amd64.deb
postgresql-plperl-$PGVER_amd64.deb
postgresql-doc-$PGVER_all.deb
postgresql-pltcl-$PGVER_amd64.deb
postgresql-plpython3-$PGVER_amd64.deb
postgresql-x_amd64.buildinfo
postgresql-x_amd64.changes

Conclusion

Many people have the misconception that a custom build is the hard way, but it only requires a little bit of managed effort.

The Percona Distribution for PostgreSQL is free to download and use. It brings together the best and most critical enterprise-level components from the open-source community, designed and tested to work together in one single source.

Jan
27
2021
--

Deploying Any Version of Leading Open Source Databases for Tests and Experiments


I want to present a tool for running a specific version of open source databases in a single instance, replication setups, and Kubernetes. AnydbVer deploys MySQL/MariaDB/MongoDB/PostgreSQL for testing and experiments.

It Could Be Started By…

Docker (or Podman) or dbdeployer (MySQL-Sandbox successor) could also start a specific database version, but such installations are significantly different from production setups.

Ansible Playbook

There is an Ansible playbook with a few roles configurable by environment variables.

Bash Scripts

You may ignore the Ansible layer because the ./anydbver bash script hides it.

In LXD containers

It’s not practical to reserve physical servers or virtual machines for all tasks. Frequently you need just a few database instances and the ability to restart them with systemd and check logs with journalctl. AnydbVer spawns one or more Linux Containers managed by Canonical’s (Ubuntu) LXD; LXD containers support systemd because they don’t use a layered filesystem setup.

Best Performance with Linux Running on Hardware Directly

Linux Containers are not an emulation layer. From the “host” system, you can see all processes and files created by containers. Thus, the performance is very similar to that of the same database running on a physical server.

You Can Run Vagrant + VirtualBox as Well, For Other OS

LXD setup is relatively easy, but sometimes you may need to run AnydbVer with just a few commands. Vagrant can start an unmodified Ubuntu virtual machine and automatically configure AnydbVer for you.

You can find the required files and setup instructions at https://github.com/ihanick/anydbver.

Clone the https://github.com/ihanick/anydbver repository to setup LXD or start Vagrant.

git clone https://github.com/ihanick/anydbver.git
cd anydbver
vagrant up
vagrant ssh
cd anydbver

Single Instance Usage

Imagine that you need the exact CentOS 7 package version: Percona Server for MySQL 5.6.43-rel84.3:

$ ./anydbver deploy percona-server:5.6.43-rel84.3
$ ./anydbver ssh
$ mysql
mysql> select version();

You are not limited to using the full version specification. To use the latest matching version, reduce 5.6.43-rel84.3 down to 5.6.43 or even 5.6 (see the examples after the list). To run other databases, replace percona-server with:

  • pxc: Percona XtraDB Cluster
  • mysql: Oracle MySQL Community Server
  • mariadb: MariaDB
  • mariadb-cluster: MariaDB Galera Cluster
  • pg: Postgres build from https://www.postgresql.org/
  • ppg: Percona Distribution for PostgreSQL
  • psmdb: Percona Server for MongoDB
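
For example, shortened version specifications look like this (a sketch; the resolved versions depend on what the vendor repositories currently carry):

$ ./anydbver deploy percona-server:5.6    # latest 5.6.x build
$ ./anydbver deploy pg:12                 # latest PostgreSQL 12.x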

Multiple Instances

Several containers do not consume a significant amount of resources; five MySQL container instances consume roughly the same resources as five individual MySQL processes running on the same host. The syntax is:

$ ./anydbver deploy <default node definition> node1 <node1 definition> node2 <node2 definition> ...

For example, run two independent MySQL instances:

$ ./anydbver mysql node1 mysql

SSH access to default node:

$ ./anydbver ssh
# or
$ ./anydbver ssh default

All other nodes (replace node1 with other node names):

$ ./anydbver ssh node1

Hostnames

You may specify a server hostname. For example, let’s run two Percona Server for MySQL instances:

$ ./anydbver deploy ps:5.7 hostname:leader.percona.local node1 ps:5.7 hostname:follower.percona.local
$ ./anydbver ssh leader
or ./anydbver ssh leader.percona or leader.percona.local
[root@leader ~]# mysql --host follower.percona.local --prompt '\h mysql>'
follower.percona.local mysql>

Replication

The most interesting part of modern open-source databases is replication. Even active-active replication setups start from a single server (leader or master/primary). Start the first node normally and attach additional nodes with master:nodename or leader:nodename. PXC or Galera servers can participate in both synchronous and asynchronous replication; thus, for Galera clusters, you need the galera-master or galera-leader syntax.

Start a three-node Percona XtraDB Cluster (latest 5.7):

./anydbver deploy pxc:5.7 node1 pxc:5.7 galera-master:default node2 pxc:5.7 galera-master:default

Run master and two async slaves with Percona Server for MySQL and add all three servers to ProxySQL setup:

./anydbver deploy ps:5.7 node1 ps:5.7 master:default node2 ps:5.7 master:default node3 proxysql master:default

Set up physical replication with slots for PostgreSQL 12.3:

./anydbver deploy pg:12.3 node1 pg:12.3 master:default

Make a Mongo replica set named rs0:

./anydbver deploy psmdb replica-set:rs0 node1 psmdb master:default replica-set:rs0 node2 psmdb master:default replica-set:rs0

MongoDB Sharding

MongoDB sharding setup requires several server types: servers with data (shardsrv), configuration servers (configsrv), and mongos server:

./anydbver deploy \
psmdb:4.2 replica-set:rs0 shardsrv \
node1 psmdb:4.2 master:default replica-set:rs0 shardsrv \
node2 psmdb:4.2 master:default replica-set:rs0 shardsrv \
node3 psmdb:4.2 configsrv replica-set:cfg0 \
node4 psmdb:4.2 configsrv replica-set:cfg0 master:node3 \
node5 psmdb:4.2 configsrv replica-set:cfg0 master:node3 \
node6 psmdb:4.2 mongos-cfg:cfg0/node3,node4,node5 mongos-shard:rs0/default,node1,node2

Containers and Orchestration

The fact that we are already using containers (LXD) shouldn’t confuse you. We can still run Docker images inside our nodes; nested containers and Podman make it possible.

Run Percona Monitoring and Management Docker containers

Let’s deploy the default node with Podman and run the Percona Monitoring and Management (PMM) server docker container in it. Percona Server for MySQL 5.7 with PMM client will run on node1:

./anydbver deploy pmm node1 ps:5.7 pmm-client pmm-server:default

Run multi-node Kubernetes cluster

Kubernetes can also utilize nested containers. There are several small Kubernetes distributions: minikube, microk8s, k3s, k0s. The simplest fully functional Kubernetes can be implemented with k3s.

The first k3s LXD container executes the API server and workers. Additional LXD containers can run more workers. Multiple workers are important for running complex HA applications with hostname anti-affinity.

./anydbver deploy k3s node1 k3s-master:default node2 k3s-master:default
./anydbver ssh

On a default node, we can execute kubectl or helm:

$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ihanick-node2 Ready <none> 14m v1.19.2+k3s1
ihanick-default Ready master 17m v1.19.2+k3s1
ihanick-node1 Ready <none> 15m v1.19.2+k3s1

Percona Kubernetes Operator for Percona XtraDB Cluster

The Kubernetes cluster allows running stateful applications, like databases, services (similar to Amazon AWS S3), or monitoring solutions. Let’s start:

  • 4 node cluster
  • MinIO (implements S3 api) for backups
  • Percona XtraDB Cluster (managed by the operator)
  • Percona Monitoring and Management
./anydbver deploy k3s \
node1 k3s-master:default \
node2 k3s-master:default \
node3 k3s-master:default \
default k8s-minio k8s-pmm k8s-pxc backup
./anydbver ssh
kubectl get pods
NAME READY STATUS RESTARTS AGE
svclb-monitoring-service-n5nsx 1/1 Running 0 20m
svclb-monitoring-service-htssw 1/1 Running 0 20m
svclb-monitoring-service-n9kt4 1/1 Running 0 20m
svclb-monitoring-service-7btbh 1/1 Running 0 20m
minio-service-6db6667fb9-tk69n 1/1 Running 0 20m
monitoring-0 1/1 Running 0 20m
percona-xtradb-cluster-operator-7886ccf6b5-rtwxc 1/1 Running 0 18m
cluster1-pxc-0 2/2 Running 2 17m
cluster1-haproxy-0 2/2 Running 0 17m
cluster1-haproxy-1 2/2 Running 0 12m
cluster1-haproxy-2 2/2 Running 0 12m
cluster1-pxc-1 2/2 Running 1 13m
cluster1-pxc-2 2/2 Running 1 10m

Percona Kubernetes Operator for Percona Server for MongoDB

You can install and configure the database in the same way, regardless of the architecture:

./anydbver deploy k3s \
  node1 k3s-master:default \
  node2 k3s-master:default \
  node3 k3s-master:default \
  default k8s-minio k8s-mongo backup

Zalando Postgres Operator

You are not limited to Percona-only products and can run other K8s operators:

./anydbver deploy k3s node1 k3s-master:default node2 k3s-master:default node3 k3s-master:default default k8s-pg

Summary

AnydbVer is a useful deployment tool for experiments with single instances, replication setups, and Kubernetes deployments of MySQL, MariaDB, MongoDB, and PostgreSQL.

AnydbVer restrictions:

  • It is not a production deployment tool.
  • The deployment process takes a long time; the tool downloads and installs all packages from OS and vendor repositories.
Jan
22
2021
--

PostgreSQL on ARM-based AWS EC2 Instances: Is It Any Good?


The expected growth of ARM processors in data centers has been a hot topic of discussion for quite some time, and we were curious to see how they perform with PostgreSQL. The general availability of ARM-based servers for testing and evaluation was a major obstacle. The icebreaker came when AWS announced its ARM-based processor offering in its cloud in 2018. But we didn’t see much excitement immediately, as many considered it more “experimental” stuff. We were also cautious about recommending it for critical use and never put enough effort into evaluating it. But when the second generation of Graviton2-based instances was announced in May 2020, we decided to take an independent look at the price/performance of the new instances from the standpoint of running PostgreSQL.

Important: Note that while it’s tempting to call this comparison of PostgreSQL on x86 vs arm, that would not be correct. These tests compare PostgreSQL on two virtual cloud instances, and that includes way more moving parts than just a CPU. We’re primarily focusing on the price-performance of two particular AWS EC2 instances based on two different architectures.

Test Setup

For this test, we picked two similar instances. One is the older m5d.8xlarge, and the other is the new Graviton2-based m6gd.8xlarge. Both instances come with local “ephemeral” storage that we’ll be using here. Using very fast local drives should help expose differences in other parts of the system and avoid testing cloud storage. The instances are not perfectly identical, as you’ll see below, but are close enough to be considered the same grade. We used an Ubuntu 20.04 AMI and PostgreSQL 13.1 from the pgdg repo. We performed tests with small (in-memory) and large (IO-bound) database sizes.

Instances

Here are the specifications and on-demand pricing of the instances, as per the AWS pricing information for Linux in the Northern Virginia region. At the currently listed prices, m6gd.8xlarge is 25% cheaper.

Graviton2 (arm) Instance

Instance : m6gd.8xlarge 	
Virtual CPUs : 32
RAM  : 128 GiB 	
Storage : 1 x 1900 NVMe SSD (1.9 TiB)
Price : $1.4464 per Hour

Regular (x86) Instance

Instance : m5d.8xlarge
Virtual CPUs : 32
RAM : 128 GiB
Storage : 2 x 600 NVMe SSD (1.2 TiB)
Price : $1.808 per Hour

OS and PostgreSQL setup

We selected Ubuntu 20.04.1 LTS AMIs for the instances and didn’t change anything on the OS side. On the m5d.8xlarge instance, the two local NVMe drives were unified into a single raid0 device. PostgreSQL was installed using .deb packages available from the PGDG repository.

The PostgreSQL version string confirms the OS architecture:

postgres=# select version();
                                                                version                                                                 
----------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
(1 row)

** aarch64 stands for 64-bit ARM architecture

The following PostgreSQL configuration was used for testing.

max_connections = '200'
shared_buffers = '32GB'
checkpoint_timeout = '1h'
max_wal_size = '96GB'
checkpoint_completion_target = '0.9'
archive_mode = 'on'
archive_command = '/bin/true'
random_page_cost = '1.0'
effective_cache_size = '80GB'
maintenance_work_mem = '2GB'
autovacuum_vacuum_scale_factor = '0.4'
bgwriter_lru_maxpages = '1000'
bgwriter_lru_multiplier = '10.0'
wal_compression = 'ON'
log_checkpoints = 'ON'
log_autovacuum_min_duration = '0'

pgbench Tests

First, a preliminary round of tests was done using pgbench, the micro-benchmarking tool available with PostgreSQL. This allows us to test with different combinations of clients and jobs, like:

pgbench -c 16 -j 16 -T 600 -r

where 16 client connections and 16 pgbench jobs feeding the client connections are used.
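
A sketch of such a sweep (the exact client/job combinations used in these tests are an assumption):

for c in 16 32 64 128; do
    pgbench -c $c -j $c -T 600 -r -h pghost -U postgres
done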

Read-Write Without Checksum

The default load that pgbench creates is a TPC-B-like read-write load. We used the same on a PostgreSQL instance that didn’t have checksums enabled.

We could see a 19% performance gain on ARM.

x86 (tps) 28878
ARM (tps) 34409

Read-Write With Checksum

We were curious whether the checksum calculation has any performance impact due to the architecture difference when the PostgreSQL-level checksum is enabled. From PostgreSQL 12 onwards, checksums can be enabled using the pg_checksums utility as follows:

pg_checksums -e -D $PGDATA

x86 (tps) 29402
ARM (tps) 34701

To our surprise, the results were marginally better! Since the difference is just around 1.7%, we consider it noise. At the least, we feel it is OK to conclude that enabling checksums doesn’t cause any noticeable performance degradation on these modern processors.

Read-Only Without Checksum

Read-only loads are expected to be CPU-centric. Since we selected a database size that fully fits into memory, we could eliminate IO-related overheads.

x86 (tps) 221436.05
ARM (tps) 288867.44

The results showed a 30% gain in tps for ARM over the x86 instance.

Read-Only With Checksum

We wanted to check whether we could observe any tps change with checksums enabled when the load is purely CPU-centric.

x86 (tps) 221144.3858
ARM (tps) 279753.1082

The results were very close to the previous one, with 26.5% gains.

In the pgbench tests, we observed that as the load becomes more CPU-centric, the difference in performance increases. We couldn’t observe any performance degradation with checksums.

Note on checksums

PostgreSQL calculates and writes checksums for pages when they are written out, and verifies them when they are read into the buffer pool. In addition, hint bits are always logged when checksums are enabled, increasing the WAL IO pressure. To correctly validate the overall checksum overhead, we would need longer and larger tests, similar to the ones we did with sysbench-tpcc.
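
Incidentally, whether a cluster is running with checksums can be confirmed from SQL at any time; data_checksums is a read-only reporting parameter:

SHOW data_checksums;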

Testing With sysbench-tpcc

We decided to perform more detailed tests using sysbench-tpcc. We were mainly interested in the case where the database fits into memory. On a side note, while PostgreSQL on the ARM server showed no issues, sysbench there was much more finicky than on the x86 instance.

Each round of testing consisted of a few steps:

  1. Restore the data directory of the necessary scale (10/200).
  2. Run a 10-minute warmup test with the same parameters as the large test.
  3. Checkpoint on the PG side.
  4. Run the actual test.

In-memory, 16 threads:


With this moderate load, the ARM instance shows around 15.5% better performance than the x86 instance. Here and after, the percentage difference is based on the mean tps value.

You might be wondering why there is a sudden drop in performance towards the end of the test. It is related to checkpointing with full_page_writes. Even though we used a pareto distribution for the in-memory testing, a considerable number of pages gets written out after each checkpoint. In this case, the instance showing more performance triggered the WAL-based checkpoint earlier than its counterpart. These dips are present across all tests performed.
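
One way to confirm that such dips line up with WAL-triggered checkpoints is to watch the checkpoint counters during the run (a sketch):

-- checkpoints_req counts checkpoints forced by WAL volume (max_wal_size),
-- as opposed to checkpoints_timed fired by checkpoint_timeout
SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;
SHOW full_page_writes;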

In-memory, 32 threads:


When concurrency increased to 32, the difference in performance reduced to nearly 8%.

In-memory, 64 threads:


Pushing the instances close to their saturation point (remember, both are 32-vCPU instances), we see the difference reducing further, to 4.5%.

In-memory, 128 threads:


When both instances are past their saturation point, the difference in performance becomes negligible, although it’s still there at 1.4%. Additionally, we could observe a 6-7% drop in throughput (tps) for ARM and a 4% drop for x86 when concurrency increased from 64 to 128 on these 32-vCPU machines.

Not everything we measured favors the Graviton2-based instance. In the IO-bound tests (~200G dataset, 200 warehouses, uniform distribution), we saw less difference between the two instances, and at 64 and 128 threads, the regular m5d instance performed better. You can see this in the combined plots below.

A possible reason for this, especially the significant meltdown at 128 threads for m6gd.8xlarge, is that it lacks the second drive that m5d.8xlarge has. There’s no perfectly comparable pair of instances available currently, so we consider this a fair comparison; each instance type has an advantage. More testing and profiling is necessary to correctly identify the cause, as we expected the local drives to affect the tests only negligibly. IO-bound testing with EBS could potentially be performed to try and remove the local drives from the equation.

More details of the test setup, results of the tests, scripts used, and data generated during the testing are available from this GitHub repo.

Summary

There were not many cases where the ARM instance was slower than the x86 instance in the tests we performed. The test results were consistent throughout the testing of the last couple of days. While the ARM-based instance is 25 percent cheaper, it showed a 15-20% performance gain in most of the tests over the corresponding x86-based instance. So ARM-based instances give conclusively better price-performance in all aspects, and we should expect more and more cloud providers to offer ARM-based instances in the future. Please let us know if you wish to see any different type of benchmark tests.

Join the discussion on Hacker News

Jan
19
2021
--

pg_stat_monitor: A New Way Of Looking At PostgreSQL Metrics


Enter pg_stat_monitor: this extension, created here at Percona, has been developed as an advanced replacement for pg_stat_statements, providing new capabilities in addition to the standard fare.

As you may recall, PostgreSQL’s pg_stat_statements extension provides a means of tracking execution statistics of all SQL statements executed by the server. But sometimes just having the basics is not enough, as the queries and their generated metrics are presented as one big “clump” of data. It provides neither aggregated statistics nor histogram information, which makes it difficult to identify problem queries during peak load versus periods of normal load versus off-peak report generation. In order to trend server performance, one is therefore either forced to manually generate the needed data and calculate the aggregates, which can become quite the chore, or use a third-party monitoring solution that presents the results on a dashboard but brings its own overhead requirements (and headaches?).

Download/Compile/Install

Depending upon your circumstances, I’m going to share with you four (4) methods, one of which you can use to obtain and try out pg_stat_monitor. Although I’m limiting the installation instructions to Ubuntu, one can, of course, install it on the Red Hat/CentOS distros too.

Method 1: The Percona Distribution For PostgreSQL

The easiest way, of course, is downloading and installing the extension from our own Percona repository. The following instructions are performed as root on your OS.

First things first, update your distribution packages:

apt update
apt upgrade -y

Install the Percona repository and download the latest Percona release package:

apt install -y wget gnupg2 lsb-release curl
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb

The package percona-release_latest.generic_all.deb installs a binary you will use to install the packages of interest:

dpkg -i percona-release_latest.generic_all.deb

And now you install both pg_stat_monitor and Percona’s own bundled version of PostgreSQL at the same time!

percona-release setup ppg-12
apt install -y percona-postgresql-12 percona-pg-stat-monitor12

Attention: At this time, the only real difference between Percona’s and the community version of PostgreSQL is that it is located under a different path. However, over time we will be adding additional capabilities in the form of extensions. This is a work in progress, stay tuned!

Method 2: Compile And Install (Community PostgreSQL Repository)

Suppose you aren’t using the Percona Distribution For PostgreSQL, shame on you, but instead, the community version downloaded from postgresql.org. Although a little more detailed, this set of instructions will help you install the extension with your current version of PostgreSQL. In this case, one downloads the source code for pg_stat_monitor and compiles using the community PostgreSQL development packages.

Download the source code directly from our GitHub:

git clone https://github.com/percona/pg_stat_monitor

For demonstration purposes, we use version 12, although any version newer than 11 will work just fine. Be warned; there are a lot of packages! My tests, while writing this blog, required 500MB of additional space:

apt install -y postgresql-server-dev-12 git make gcc

Now it’s an easy matter of compiling; execute the following as root:

cd pg_stat_monitor
make USE_PGXS=1
make USE_PGXS=1 install

Method 3: Roll Your Own Packages

This method lends itself well to production environments: you bundle your own package, whether DEB or RPM, using the FPM package management system.

FPM is a command-line program designed to help you build packages.

Author’s note: I won’t go into details on using FPM due to the complexity involved in getting and configuring it… unless you’d like me to, in which case I’d be very happy to write a blog about this really cool tool.

Building a package might look something like this:

fpm -s <source type> -t <target type> [list of sources]…

“Source type” is what your package is coming from; a directory (dir), a rubygem (gem), an rpm (rpm), a python package (python), a PHP pear module (pear), etc.

“Target type” is what your output package form should be, such as RPM and DEB.

Method 4: Using PGXN

pg_stat_monitor is released on PGXN, and the latest stable version is available there. It is easily downloadable using the pgxn utility:

pgxn install pg_stat_monitor

Create Extension “pg_stat_monitor”

Once the extension is compiled and installed, the next step is straightforward. Update the PostgreSQL runtime parameters so the server sees the extension’s module:

-- there's more than one way to configure this parameter
alter system set shared_preload_libraries = 'pg_stat_monitor';

Restart the server:

systemctl restart postgresql

You can install this on any database as this is a data cluster-wide extension:

create extension pg_stat_monitor;

There are two views:

  • The first view is pg_stat_monitor which is similar to pg_stat_statements in that you can view generated metrics in real-time.
  • The second view, pg_stat_monitor_settings, returns the entire suite of parameters defining and controlling this extension’s behavior. One can edit these parameters using ALTER SYSTEM.

List of relations
Schema | Name                     | Type | Owner
-------+--------------------------+------+----------
public | pg_stat_monitor          | view | postgres
public | pg_stat_monitor_settings | view | postgres

db01=# \d pg_stat_monitor_settings

View "public.pg_stat_monitor_settings"
         Column | Type    | Collation | Nullable | Default
 ---------------+---------+-----------+----------+---------
         name   | text    |           |          |
          value | integer |           |          |
  default_value | integer |           |          |
    description | text    |           |          | 
        minimum | integer |           |          |
        maximum | integer |           |          |
        restart | integer |           |          |

Using pg_stat_monitor

Let’s generate some activity using pgbench:

pgbench -i db01
pgbench -c 4 -j 2 -T 300 -b tpcb-like db01 > /dev/null 2>&1 &

Now query the view pg_stat_monitor, returning the top ten results for all operations on the current database:

select application_name,
       userid::regrole AS user_name,
       datname AS database_name,
       substr(query,0, 50) AS query,
       calls,
       client_ip
from pg_stat_monitor, pg_database
where dbid = oid
order by calls desc, application_name
limit 10;

And here are our results; notice pg_stat_monitor returns information similar to, but not quite the same as, pg_stat_statements:

application_name | user_name | db    | query                                             | calls | client_ip
------------------+-----------+-------+---------------------------------------------------+-------+-----------
          pgbench |  postgres | db01  | UPDATE pgbench_branches SET bbalance = bbalance + | 3523  | 127.0.0.1
          pgbench |  postgres | db01  | END                                               | 3523  | 127.0.0.1
          pgbench |  postgres | db01  | UPDATE pgbench_tellers SET tbalance = tbalance +  | 3523  | 127.0.0.1
          pgbench |  postgres | db01  | SELECT abalance FROM pgbench_accounts WHERE aid = | 3523  | 127.0.0.1
          pgbench |  postgres | db01  | UPDATE pgbench_accounts SET abalance = abalance + | 3523  | 127.0.0.1
          pgbench |  postgres | db01  | INSERT INTO pgbench_history (tid, bid, aid, delta | 3523  | 127.0.0.1
          pgbench |  postgres | db01  | BEGIN                                             | 3523  | 127.0.0.1
          pgbench | postgres  | db01  | END                                               | 3257  | 127.0.0.1
          pgbench | postgres  | db01  | INSERT INTO pgbench_history (tid, bid, aid, delta | 3257  | 127.0.0.1
          pgbench | postgres  | db01  | UPDATE pgbench_branches SET bbalance = bbalance + | 3256  | 127.0.0.1

This query highlights the key difference between pg_stat_monitor and pg_stat_statements, i.e. aggregating performance over a time interval using buckets:

postgres=# SELECT bucket,
                  bucket_start_time,
                  application_name,
                  datname AS database_name,
                  substr(query,0, 50) AS query,
                  calls 
            FROM pg_stat_monitor
            LIMIT 10;

 bucket |  bucket_start_time  | application_name | database_name |                       query                       | calls 
--------+---------------------+------------------+---------------+---------------------------------------------------+-------
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | vacuum analyze pgbench_history                    |     1
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | select count(*) from pgbench_branches             |     1
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | UPDATE pgbench_accounts SET abalance = abalance + |  1375
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | begin                                             |     1
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | UPDATE pgbench_branches SET bbalance = bbalance + |  1372
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | copy pgbench_accounts from stdin                  |     1
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | create table pgbench_branches(bid int not null,bb |     1
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | alter table pgbench_accounts add primary key (aid |     1
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | INSERT INTO pgbench_history (tid, bid, aid, delta |  1372
    5 | 2021-01-14 14:49:10 | pgbench          | db01          | alter table pgbench_branches add primary key (bid |     1
(10 rows)
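
For example, here is a minimal sketch, using only the columns shown above, that counts executions of a single statement per bucket so you can spot activity spikes over time (the LIKE pattern is just an illustration):

-- Count executions of one statement per time bucket;
-- each bucket covers pgsm_bucket_time seconds of activity
SELECT bucket,
       bucket_start_time,
       calls
FROM pg_stat_monitor
WHERE query LIKE 'UPDATE pgbench_accounts%'
ORDER BY bucket;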

Updating pg_stat_monitor_settings

The view pg_stat_monitor_settings returns the runtime parameters that control how pg_stat_monitor gathers its metrics.

This example query returns the list of runtime parameters that can be edited:

select name,description from pg_stat_monitor_settings;

As you can see, pg_stat_monitor exposes substantially more parameters, allowing for targeted investigation and analysis:

                    name                      |                                               description                                                
-----------------------------------------------+----------------------------------------------------------------------------------------------------------
 pg_stat_monitor.pgsm_max                      | Sets the maximum size of shared memory in (MB) used for statement's metadata tracked by pg_stat_monitor.
 pg_stat_monitor.pgsm_query_max_len            | Sets the maximum length of query.
 pg_stat_monitor.pgsm_enable                   | Enable/Disable statistics collector.
 pg_stat_monitor.pgsm_track_utility            | Selects whether utility commands are tracked.
 pg_stat_monitor.pgsm_normalized_query         | Selects whether save query in normalized format.
 pg_stat_monitor.pgsm_max_buckets              | Sets the maximum number of buckets.
 pg_stat_monitor.pgsm_bucket_time              | Sets the time in seconds per bucket.
 pg_stat_monitor.pgsm_respose_time_lower_bound | Sets the time in millisecond.
 pg_stat_monitor.pgsm_respose_time_step        | Sets the response time steps in millisecond.
 pg_stat_monitor.pgsm_query_shared_buffer      | Sets the maximum size of shared memory in (MB) used for query tracked by pg_stat_monitor.
(10 rows)
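
Before editing anything, it can be useful to check which parameters require a server restart to take effect. Here is a minimal sketch using the restart column from the view definition shown earlier:

-- Show each parameter's current and default values, plus whether
-- changing it requires a PostgreSQL restart
SELECT name,
       value,
       default_value,
       restart
FROM pg_stat_monitor_settings;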

Altering the runtime parameters is as simple as executing ALTER SYSTEM:

-- the default length is 1,024 characters
show pg_stat_monitor.pgsm_query_max_len;

-- increase the query length to 2,048 characters
alter system set pg_stat_monitor.pgsm_query_max_len = 2048;
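
Since ALTER SYSTEM writes the new value to postgresql.auto.conf rather than applying it immediately, you can verify what is pending before restarting. A minimal sketch using the standard pg_file_settings view:

-- Inspect pending configuration changes before restarting
SELECT name, setting, applied
FROM pg_file_settings
WHERE name LIKE 'pg_stat_monitor%';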

A server restart is required for the new value to take effect:

systemctl restart postgresql

Here is the updated value:

SELECT name, 
       value 
FROM pg_stat_monitor_settings 
where name = 'pg_stat_monitor.pgsm_query_max_len';

                name                | value
------------------------------------+-------
 pg_stat_monitor.pgsm_query_max_len |  2048

Error Monitoring

pg_stat_monitor records not only successful queries but also every ERROR and WARNING.

SELECT decode_error_level(elevel) AS level,
       sqlcode,
       query,
       message
FROM pg_stat_monitor
WHERE elevel != 0;

 level | sqlcode  |         query         |             message
-------+----------+-----------------------+----------------------------------
 ERROR | 16908420 | SELECT * FROM pg_foo; | relation "pg_foo" does not exist
 ERROR | 33816706 | SELECT 1/0;           | division by zero
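
To try this yourself, deliberately run a failing statement and then query the view again; a minimal sketch:

-- Run a statement that is guaranteed to fail with "division by zero"...
SELECT 1/0;

-- ...then confirm the error was captured by pg_stat_monitor
SELECT decode_error_level(elevel) AS level,
       query,
       message
FROM pg_stat_monitor
WHERE elevel != 0;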

Tell us what you think; your feedback is important!

Jan
14
2021
--

Webinar January 28: Tuning PostgreSQL for High Performance and Optimization

Tuning PostgreSQL webinar

PostgreSQL is one of the leading open-source databases but, out of the box, the default PostgreSQL configuration is not tuned for any particular workload. The defaults are conservative enough that a system with minimal resources can run it; as a result, PostgreSQL does not deliver optimum performance on high-performance machines because it does not use all the available resources. PostgreSQL provides a system where you can tune your database according to your workload and machine specifications. In addition to PostgreSQL, we can also tune our Linux box so that the database load can work optimally.

In this webinar on Tuning PostgreSQL for High Performance and Optimization, we will learn how to tune PostgreSQL and we’ll see the results of that tuning. We will also touch on tuning some Linux kernel parameters.

Please join Ibrar Ahmed, Percona Software Engineer, on January 28, 2021, at 1 pm EST as he presents his webinar “Tuning PostgreSQL for High Performance and Optimization.”

Register for Webinar

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

Jan
13
2021
--

Percona 2020 Recap: Great Content and Software Releases

Percona 2020 content and releases

The Percona team provided the community with some excellent content and several new releases in 2020. I wanted to highlight some of your favorites (based on popularity) in case you missed them.

First up is our most-read blog from last year, which ironically was published before 2020. Ananias Tsalouchidis’s blog on when you should use Aurora and when you should use RDS MySQL continued to attract readers all year long. People don’t always understand the key differences between the two, so having a guide is great and timely for many.

What about the most read blogs or watched videos published in 2020?

PostgreSQL Takes Our Most-Read Spot of 2020

The Percona blog is known for its great in-depth MySQL coverage, but experts in the MongoDB and PostgreSQL space have also written some quality content over the last few years. It is exciting to see that the most popular blog published last year was outside of MySQL: Ibrar Ahmed’s deep dive into handling null values in PostgreSQL.

Interested in the top six PostgreSQL reads from 2020? Here they are:

We also had some fantastic conference talks this year you may want to check out. Here are the most-watched PostgreSQL videos of 2020:

Awesome PostgreSQL talks and blogs from the community:

Our Percona University Online posted its first PostgreSQL training last year; if you are looking for a deeper understanding of indexes (and who isn’t), check out our training, Deep Dive Into PostgreSQL Indexes.

MySQL is Still as Popular as Ever

Even though PostgreSQL took this year’s top spot, not too far behind was a great blog series by our CEO Peter Zaitsev on solving MySQL bottlenecks. His three-part series, 18 things you can do to remove MySQL Bottlenecks caused by high traffic, was not only highly read but also spawned one of the most-watched webinars of the year. Scalability and performance are critical and can mean life or death for any application. A vital read, and a great link to bookmark for when you hit one of those odd performance issues you cannot seem to track down!

Interested in the top five MySQL reads from 2020? Here they are:

Interested in watching some outstanding MySQL sessions? Check out some of the most-watched MySQL sessions of 2020:

Awesome MySQL talks and blogs from the community:

Our Percona University Online posted its first MySQL training; if you are looking at how to upgrade to MySQL 8, it is worth watching. Check out the training, How to Upgrade to MySQL 8.0.

The Staying Power of MongoDB is Undeniable

MongoDB growth in 2020 was undeniable, which is why it’s no surprise that another one of our top blogs was on MongoDB. Percona’s most-read MongoDB tech blog published in 2020 was Vinicius Grippa’s must-read work outlining the best practices for running MongoDB. Whether you are new to MongoDB or a longtime user, it is worth reading and double-checking to ensure you have MongoDB optimized.

Interested in the top five MongoDB reads from 2020? Here they are:

Interested in watching some MongoDB sessions? Check out some of the most-watched MongoDB sessions of 2020:

Awesome MongoDB talks and blogs from the community:

More Popular Blogs and Discussions

Sometimes topics cross databases and delve into general advice. Let’s look at some of the more popular talks and blogs that are not tied to a specific database.

If you like videos, you may want to check out these great Percona Live Sessions from last year:

Other Popular Blogs:

Finally, Some Great Percona Software Released This Year

Here is the list of interesting software changes and news on Percona software in 2020:

Percona Distributions for MongoDB and MySQL:

  • What are Percona Distributions? We take the best components from the community and ensure they work together. This way, you know your backup, HA, monitoring, etc., will all function together seamlessly.

Percona XtraDB Cluster 8.0 (PXC) was released with improved performance, scalability, and security. Long-sought-after features include:

  • Streaming replication to support larger transactions
  • More granular and improved logging and troubleshooting options
  • Multiple system tables that help you find out more about the state of the cluster
  • Percona XtraDB Cluster 8.0 now automatically launches the upgrade as needed (even for minor releases), avoiding manual intervention and simplifying operation in the cloud.

Percona Distribution for PostgreSQL 13. Version 13 of PostgreSQL was a leap forward, and our distribution was updated to support all the additional functionality. Better indexing, better performance, and better security! Sign me up!

Percona Monitoring and Management (PMM) jumped forward from 2.2 to 2.13, adding some very cool features like:

  • Alert manager integration and integrated alerting
  • A brand new Query Analyzer with awesome features to allow you to find problem queries quicker and more efficiently
  • Enhanced metrics for AWS RDS monitoring
  • Added support for External Exporters so you can monitor 3rd party and custom services through the installed PMM-agent
  • New security threat tool allows for alerts and visibility into the most common security issues
  • Support for group replication
  • Better MongoDB and PostgreSQL monitoring
  • Better support for larger environments (Monitor More Stuff Faster)
  • Plus a ton of miscellaneous small enhancements!

Percona Kubernetes Operator for Percona XtraDB Cluster continued to evolve with several new features helping users build their own DIY DBaaS:

  • Auto-Tuning MySQL Parameters
  • Integration with Percona Monitoring and Management
  • Full data encryption at rest
  • Support for Percona XtraDB Cluster 8.0
  • Support for the latest version of OpenShift and Amazon’s Elastic Container Service
  • Dual support for ProxySQL and HAProxy
  • Automated minor upgrades
  • Clone backups to set up a new PXC cluster on a different Kubernetes cluster

Percona Kubernetes Operator for Percona Server for MongoDB added several features, including:

  • Support for Percona Server for MongoDB 4.4
  • Automated management of system users
  • Support for the latest version of OpenShift and Amazon’s Elastic Container Service
  • Automated minor upgrades

While 2020 was far from the best year for many of us, and we are glad it is behind us, it did generate some good content that we can use in 2021 and beyond to help us better manage and run our databases. Thanks for reading and happy database tuning!
