Jun 11, 2021

PostgreSQL HA with Patroni: Your Turn to Test Failure Scenarios

A couple of weeks ago, Jobin and I gave a short presentation during Percona Live Online bearing a similar title to this post: “PostgreSQL HA With Patroni: Looking at Failure Scenarios and How the Cluster Recovers From Them”. We deployed a 3-node PostgreSQL environment on some recycled hardware we had lying around and set about “breaking” it in different ways: unplugging network and power cables, killing main processes, and attempting to saturate the processors, all while continuously writing and reading data from PostgreSQL. The idea was to see how Patroni would handle the failures and manage the cluster so it could continue delivering service. It was a fun demo!

We promised a follow-up post explaining how we set up the environment, so you could give it a try yourselves, and this is it. We hope you also have fun attempting to reproduce our small experiment, but mostly that you use it as an opportunity to learn how a PostgreSQL HA environment managed by Patroni works in practice: there is nothing like a hands-on lab for this!

Initial Setup

We recycled three 10-year-old Intel Atom mini-computers for our experiment, but you could use virtual machines instead: even though you will miss the excitement of unplugging real cables, these failures can still be simulated with a VM. We installed the server version of Ubuntu 20.04 and configured the machines to know “each other” by hostname; here’s what the hosts file of the first node looked like:

$ cat /etc/hosts
127.0.0.1 localhost node1
192.168.1.11 node1
192.168.1.12 node2
192.168.1.13 node3

etcd

Patroni supports a myriad of systems for its Distributed Configuration Store (DCS), but etcd remains a popular choice. We installed the version available from the Ubuntu repository on all three nodes:

sudo apt-get install etcd

It is necessary to initialize the etcd cluster from one of the nodes and we did that from node1 using the following configuration file:

$ cat /etc/default/etcd
ETCD_NAME=node1
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380"
ETCD_INITIAL_CLUSTER_TOKEN="devops_token"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.11:2380"
ETCD_DATA_DIR="/var/lib/etcd/postgresql"
ETCD_LISTEN_PEER_URLS="http://192.168.1.11:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.11:2379,http://localhost:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.11:2379"

Note how ETCD_INITIAL_CLUSTER_STATE is set to “new”.

We then restarted the service:

sudo systemctl restart etcd

We can then move on to install etcd on node2. The configuration file follows the same structure as node1’s, except that since we are adding node2 to an existing cluster, we must also indicate the other node(s):

ETCD_NAME=node2
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380,node2=http://192.168.1.12:2380"
ETCD_INITIAL_CLUSTER_TOKEN="devops_token"
ETCD_INITIAL_CLUSTER_STATE="existing"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.12:2380"
ETCD_DATA_DIR="/var/lib/etcd/postgresql"
ETCD_LISTEN_PEER_URLS="http://192.168.1.12:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.12:2379,http://localhost:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.12:2379"

Before we restart the service, we need to formally add node2 to the etcd cluster by running the following command on node1:

sudo etcdctl member add node2 http://192.168.1.12:2380

We can then restart the etcd service on node2:

sudo systemctl restart etcd

The configuration file for node3 looks like this:

ETCD_NAME=node3
ETCD_INITIAL_CLUSTER="node1=http://192.168.1.11:2380,node2=http://192.168.1.12:2380,node3=http://192.168.1.13:2380"
ETCD_INITIAL_CLUSTER_TOKEN="devops_token"
ETCD_INITIAL_CLUSTER_STATE="existing"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.1.13:2380"
ETCD_DATA_DIR="/var/lib/etcd/postgresql"
ETCD_LISTEN_PEER_URLS="http://192.168.1.13:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.1.13:2379,http://localhost:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.1.13:2379"

Remember we need to add node3 to the cluster by running the following command on node1:

sudo etcdctl member add node3 http://192.168.1.13:2380

before we can restart the service on node3:

sudo systemctl restart etcd

We can verify the cluster state to confirm it has been deployed successfully by running the following command from any of the nodes:

$ sudo etcdctl member list
2ed43136d81039b4: name=node3 peerURLs=http://192.168.1.13:2380 clientURLs=http://192.168.1.13:2379 isLeader=false
d571a1ada5a5afcf: name=node1 peerURLs=http://192.168.1.11:2380 clientURLs=http://192.168.1.11:2379 isLeader=true
ecec6c549ebb23bc: name=node2 peerURLs=http://192.168.1.12:2380 clientURLs=http://192.168.1.12:2379 isLeader=false

As we can see above, node1 is the leader at this point, which is expected since the etcd cluster has been bootstrapped from it. If you get a different result, check for etcd entries logged to /var/log/syslog on each node.
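
As an extra sanity check beyond the member list, the etcd v2 command-line client (which the output format above suggests we are using) also provides a health check; running it from any of the nodes should report all three members as healthy:

sudo etcdctl cluster-health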

Watchdog

Quoting Patroni’s manual:

Watchdog devices are software or hardware mechanisms that will reset the whole system when they do not get a keepalive heartbeat within a specified timeframe. This adds an additional layer of fail safe in case usual Patroni split-brain protection mechanisms fail.

While the use of a watchdog mechanism with Patroni is optional, you shouldn’t really consider deploying a PostgreSQL HA environment in production without it.

For our tests, we used the standard software watchdog implementation shipped with Ubuntu 20.04, a kernel module called softdog. Here’s the procedure we used on all three nodes to configure the module to load at boot:

sudo sh -c 'echo "softdog" >> /etc/modules'

Patroni will be the component interacting with the watchdog device. Since Patroni is run by the postgres user, we need to either open up the permissions of the watchdog device enough that the postgres user can write to it or make the device owned by postgres itself, which we consider the safer approach (as it is more restrictive):

sudo sh -c 'echo "KERNEL==\"watchdog\", OWNER=\"postgres\", GROUP=\"postgres\"" >> /etc/udev/rules.d/61-watchdog.rules'

These two steps looked like all that would be required for the watchdog to work, but to our surprise, the softdog module wasn’t loaded after restarting the servers. After spending quite some time digging around, we figured out the module was blacklisted by default and a stray file with such a directive was still lingering around:

$ grep blacklist /lib/modprobe.d/* /etc/modprobe.d/* |grep softdog
/lib/modprobe.d/blacklist_linux_5.4.0-72-generic.conf:blacklist softdog

Editing that file on each of the nodes to remove the line above and restarting the servers did the trick:

$ lsmod | grep softdog
softdog                16384  0

$ ls -l /dev/watchdog*
crw-rw---- 1 postgres postgres  10, 130 May 21 21:30 /dev/watchdog
crw------- 1 root     root     245,   0 May 21 21:30 /dev/watchdog0

PostgreSQL

Percona Distribution for PostgreSQL can be installed from the Percona Repository in a few easy steps:

sudo apt-get update -y; sudo apt-get install -y wget gnupg2 lsb-release curl
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
sudo dpkg -i percona-release_latest.generic_all.deb
sudo apt-get update
sudo percona-release setup ppg-12
sudo apt-get install percona-postgresql-12

An important concept to understand in a PostgreSQL HA environment like this one is that PostgreSQL should not be started automatically by systemd during the server initialization: we should leave it to Patroni to fully manage it, including the process of starting and stopping the server. Thus, we should disable the service:

sudo systemctl disable postgresql

For our tests, we want to start with a fresh PostgreSQL setup and let Patroni bootstrap the cluster, so we stop the server and remove the data directory that was created as part of the PostgreSQL installation:

sudo systemctl stop postgresql
sudo rm -fr /var/lib/postgresql/12/main

These steps should be repeated in nodes 2 and 3 as well.

Patroni

The Percona Repository also includes a package for Patroni, so with the repository already configured on the nodes we can install Patroni with a simple:

sudo apt-get install percona-patroni

Here’s the configuration file we have used for node1:

$ cat /etc/patroni/config.yml
scope: stampede
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd:
  host: node1:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
#    master_start_timeout: 300
#    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: 'on'
        max_wal_senders: 5
        max_replication_slots: 5
        wal_log_hints: "on"
        #archive_mode: "on"
        #archive_timeout: 600
        #archive_command: "cp -f %p /home/postgres/archived/%f"
        #recovery_conf:
        #restore_command: cp /home/postgres/archived/%f %p

  # some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 192.168.1.1/24 md5
  - host replication replicator 127.0.0.1/32 trust
  - host all all 192.168.1.1/24 md5
  - host all all 0.0.0.0/0 md5
#  - hostssl all all 0.0.0.0/0 md5

  # Additional script to be launched after initial cluster creation (will be passed the connection URL as parameter)
# post_init: /usr/local/bin/setup_cluster.sh
  # Some additional users users which needs to be created after initializing new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: "/var/lib/postgresql/12/main"
  bin_dir: "/usr/lib/postgresql/12/bin"
#  config_dir:
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: vagrant
    superuser:
      username: postgres
      password: vagrant
  parameters:
    unix_socket_directories: '/var/run/postgresql'

watchdog:
  mode: required # Allowed values: off, automatic, required
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

With the configuration file in place, and now that we already have the etcd cluster up, all that is required is to restart the Patroni service:

sudo systemctl restart patroni

When Patroni starts, it will take care of initializing PostgreSQL (because the service is not currently running and the data directory is empty) following the directives in the bootstrap section of Patroni’s configuration file. If everything went according to plan, you should be able to connect to PostgreSQL using the credentials in the configuration file (the password is vagrant):

$ psql -U postgres
psql (12.6 (Ubuntu 2:12.6-2.focal))
Type "help" for help.

postgres=#

Repeat the operation to install Patroni on nodes 2 and 3: the only difference is that you will need to replace the references to node1 in the configuration file with the respective node name (there are four of them: the name setting, plus the connect_address values in the restapi and postgresql sections and the host value in the etcd section).
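
For reference, here is what those four settings would look like on node2; everything else in the file stays the same, and node3 follows the same pattern:

name: node2                      # was node1

restapi:
  connect_address: node2:8008    # was node1:8008

etcd:
  host: node2:2379               # was node1:2379

postgresql:
  connect_address: node2:5432    # was node1:5432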

You can also check the state of the Patroni cluster we just created with:

$ sudo patronictl -c /etc/patroni/config.yml list
+----------+--------+-------+--------+---------+----+-----------+
| Cluster  | Member |  Host |  Role  |  State  | TL | Lag in MB |
+----------+--------+-------+--------+---------+----+-----------+
| stampede | node1  | node1 | Leader | running |  2 |           |
| stampede | node2  | node2 |        | running |  2 |         0 |
| stampede | node3  | node3 |        | running |  2 |         0 |
+----------+--------+-------+--------+---------+----+-----------+

node1 started the Patroni cluster so it was automatically made the leader – and thus the primary/master PostgreSQL server. Nodes 2 and 3 are configured as read replicas (as the hot_standby option was enabled in Patroni’s configuration file).

HAProxy

A common implementation of high availability in a PostgreSQL environment makes use of a proxy: instead of connecting directly to the database server, the application connects to the proxy instead, which forwards the request to PostgreSQL. When HAproxy is used for this, it is also possible to route read requests to one or more replicas for load balancing. However, this is not a transparent process: the application needs to be aware of it and split read-only from read-write traffic itself. With HAproxy, this is done by providing two different ports for the application to connect to. We opted for the following setup:

  • Writes → 5000
  • Reads → 5001

HAproxy can be installed as an independent server (and you can have as many as you want) but it can also be installed on the application server or the database server itself – it is a light enough service. For our tests, we planned on using our own Linux workstations (which also run Ubuntu 20.04) to simulate application traffic so we installed HAproxy on them:

sudo apt-get install haproxy

With the software installed, we modified the main configuration file as follows:

$ cat /etc/haproxy/haproxy.cfg
global
    maxconn 100

defaults
    log    global
    mode    tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen primary
    bind *:5000
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 maxconn 100 check port 8008
    server node2 node2:5432 maxconn 100 check port 8008
    server node3 node3:5432 maxconn 100 check port 8008

listen standbys
    balance roundrobin
    bind *:5001
    option httpchk OPTIONS /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 maxconn 100 check port 8008
    server node2 node2:5432 maxconn 100 check port 8008
    server node3 node3:5432 maxconn 100 check port 8008

Note there are two sections: primary, using port 5000, and standbys, using port 5001. All three nodes are included in both sections because they are all potential candidates to be either primary or secondary. For HAproxy to know which role each node currently holds, it sends an HTTP request to port 8008 of the node, and Patroni answers it. Patroni provides built-in REST API support for health check monitoring that integrates perfectly with HAproxy for this:

$ curl -s http://node1:8008
{"state": "running", "postmaster_start_time": "2021-05-24 14:50:11.707 UTC", "role": "master", "server_version": 120006, "cluster_unlocked": false, "xlog": {"location": 25615248}, "timeline": 1, "database_system_identifier": "6965869170583425899", "patroni": {"version": "1.6.4", "scope": "stampede"}}

We configured the standbys group to balance read requests in a round-robin fashion, so each connection request (or reconnection) will alternate between the available replicas. We can test this in practice; first, let’s save the postgres user’s password in a .pgpass file to facilitate the process:

echo "localhost:5000:postgres:postgres:vagrant" > ~/.pgpass
echo "localhost:5001:postgres:postgres:vagrant" >> ~/.pgpass
chmod 0600 ~/.pgpass

We can then execute two read-requests to verify the round-robin mechanism is working as intended:

$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
 192.168.1.13

$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
 192.168.1.12

as well as test write access:

$ psql -Upostgres -hlocalhost -p5000 -t -c "select inet_server_addr()"
 192.168.1.11

You can also check the state of HAproxy by visiting http://localhost:7000/ on your browser.

Workload

To best simulate a production environment for our failure scenarios, we wanted continuous reads and writes hitting the database. We could have used a benchmark tool such as Sysbench or pgbench, but we were more interested in observing how traffic is redirected upon a server failure than in the load itself. Jobin wrote a simple Python script that is perfect for this, HAtester. As was the case with HAproxy, we ran the script from our Linux workstation. Since it is a Python script, you need the PostgreSQL driver for Python installed to execute it:

sudo apt-get install python3-psycopg2
curl -LO https://raw.githubusercontent.com/jobinau/pgscripts/main/patroni/HAtester.py
chmod +x HAtester.py

Edit the script with the credentials to access the PostgreSQL servers (through HAproxy) if you are using settings different from ours. The only requirement for it to work is to have the target table created beforehand, so first connect to the postgres database (unless you are using a different target) on the primary and run:

CREATE TABLE HATEST (TM TIMESTAMP);

You can then start two different sessions:

  1. One for writes:

    ./HAtester.py 5000
  2. One for reads:
    ./HAtester.py 5001
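
Once both sessions are running, you can confirm that writes are reaching the cluster with a quick ad-hoc query through the reader port (hatest and its tm column are the table created above):

$ psql -Upostgres -hlocalhost -p5001 -t -c "select count(*), max(tm) from hatest"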

The idea is to observe what happens with database traffic when the environment experiences a failure; that is, how HAproxy will route reads and writes as Patroni adjusts the PostgreSQL cluster. You can continuously monitor Patroni from the point of view of the nodes by opening a session in each of them and running the following command:

sudo -u postgres watch patronictl -c /etc/patroni/config.yml list

To facilitate observability and better follow the changes in real-time, we used the terminal multiplexer Tmux to visualize all 5 sessions on the same screen:

  • On the left side, we have one session open for each of the 3 nodes, continuously running:

    sudo -u postgres watch patronictl -c /etc/patroni/config.yml list

    It’s better to have the Patroni view for each node independently because when you start the failure tests you will lose connection to a part of the cluster.

  • On the right side, we are executing the HAtester.py script from our workstation:
    • Sending writes through port 5000:

      ./HAtester.py 5000
    • and reads through port 5001:

      ./HAtester.py 5001

A couple of notes on the execution of the HAtester.py script:

  • Pressing Ctrl+C will break the connection, but the script will reconnect, this time to a different replica (in the case of reads) because the standbys group in HAproxy is configured with round-robin balancing.
  • When a switchover or failover takes place and the nodes are rearranged in the cluster, you may temporarily see writes sent to a node that used to be a replica and was just promoted to primary, and reads sent to a node that used to be the primary and was just demoted to a replica: that’s a limitation of the HAtester.py script, but it is “by design”; we favored faster reconnections and minimal checks on the node’s role for demonstration purposes. In a production application, this part ought to be implemented differently.

Testing Failure Scenarios

The fun part starts now! We leave it to you to test and play around to see what happens with the PostgreSQL cluster in practice following a failure. As suggestions, here are the tests we did in our presentation. For each failure scenario, observe how the cluster re-adjusts itself and the impact on read and write traffic.

1) Loss of Network Communication

  • Unplug the network cable from one of the nodes (or simulate this condition in your VM):
    • First from a replica
    • Then from the primary
  • Unplug the network cable from one replica and the primary at the same time:
    • Does Patroni experience a split-brain situation?

2) Power Outage

  • Unplug the power cable from the primary
  • Wait until the cluster is re-adjusted, then plug the power cable back in and start the node

3) SEGFAULT

Simulate an OOM/crash by killing the postmaster process in one of the nodes with kill -9.
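
One way to do this, assuming the data directory from our setup (the first line of postmaster.pid holds the postmaster’s PID), is:

sudo kill -9 $(sudo head -1 /var/lib/postgresql/12/main/postmaster.pid)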

4) Killing Patroni

Remember that Patroni is managing PostgreSQL. What happens if the Patroni process (and not PostgreSQL) is killed?
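
One way to kill only the Patroni daemon, and not the PostgreSQL processes it manages, is to target the main process of the patroni unit we have been using (a sketch; adjust the service name if yours differs):

sudo systemctl kill --signal=SIGKILL --kill-who=main patroni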

5) CPU Saturation

Simulate CPU saturation with a benchmark tool such as Sysbench, for example:

sysbench cpu --threads=10 --time=0 run

This one is a bit tricky, as the reads and writes are each single-threaded operations. You may need to decrease the priority of the HAtester.py processes with renice, and possibly increase that of Sysbench’s.
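
For example, something along these lines (illustrative nice values only):

# lower the priority of the HAtester.py sessions (higher nice value)
renice -n 10 -p $(pgrep -f HAtester.py)
# raise the priority of the sysbench workers (a negative nice value requires root)
sudo renice -n -5 -p $(pgrep -x sysbench)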

6) Manual Switchover

Patroni facilitates changes in the PostgreSQL hierarchy. Switchover operations can even be scheduled; the command below is interactive and will prompt you with options:

sudo -u postgres patronictl -c /etc/patroni/config.yml switchover

Alternatively, you can be specific and tell Patroni exactly what to do:

sudo -u postgres patronictl -c /etc/patroni/config.yml switchover --master node1 --candidate node2 --force
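
If you would rather schedule the switchover for later instead of executing it immediately, patronictl also accepts a scheduled time; assuming the --scheduled option is available in your Patroni version, something like this should work (the timestamp is just an example):

sudo -u postgres patronictl -c /etc/patroni/config.yml switchover --master node1 --candidate node2 --scheduled "2021-06-11T22:00:00" --force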


We hope you had fun with this hands-on lab! If you have questions or comments, leave us a note in the comments section below!

May 12, 2021

New Survey Shows Enterprises Increasing Their Reliance on Open Source Software

A survey of 200 IT decision-makers from medium to large enterprises was conducted in Q1 of 2021 by Vanson Bourne and sponsored by Percona in advance of Percona Live ONLINE 2021, which started today!

Percona Live Online represents dozens of projects, communities, and tech companies, and features more than 150 expert speakers across 200 sessions. There’s still time to register and attend.

Register and Attend

In his keynote address on May 12 at Noon Eastern, Peter Zaitsev, CEO of Percona, addressed the findings of the survey in more detail, along with the impact of licensing changes and the importance of keeping open source truly open. 

The survey focused on the business perspective of open source software. 25% of respondents were from medium-sized enterprises (500-999 employees) and 75% were from large enterprises (over 1,000 employees). 

The Benefit of Open Source

Respondents came from a cross-section of industries and had knowledge of open source software. Their answers showed that enterprises have a deep appreciation for the value of open source software.

100% of information technology (IT) decision-makers said that “using open source provides benefits for their organization”.

78% of respondents reported that their use of open source software had increased over the disruptive past 12 months.

Cloud Adoption

Large enterprise respondents were most likely to have moved databases and applications to cloud services. Just 13% of large enterprises continue to have all their databases and applications running at their on-premises data center, compared with 29% of medium-size enterprises.

The transition to the cloud was accelerated by the worldwide pandemic and demand for flexible, fast, and reliable technology. However, it’s likely the increase in demand led to an increase in costs for many businesses. 68% of respondents said that cloud infrastructure has become more expensive in the past year.

The survey asked how public cloud providers can contribute back to open source. 59% said by providing better security, 48% said by encouraging open source collaboration, 43% said by improving existing code quality and 43% said by enabling open source to run on their cloud.

Licensing Changes

Nearly half of survey respondents indicated concerns over changing open source licenses, such as the Business Source License (BSL) and Server Side Public License (SSPL). They said these changes will increase costs (44%), encourage lock-in (37%), discourage engagement from the open source community (34%), and discourage growth in the open source market (26%).

Download the Full Survey Report

May 12, 2021

Percona Live ONLINE: Percona Previews Open Source Database as a Service

Percona Live ONLINE 2021 starts today!

Representing dozens of projects, communities, and tech companies, and featuring more than 150 expert speakers across 200 sessions, there’s still time to register and attend. 

Register and Attend

Percona’s latest product announcements focus on a preview of Percona’s open source DBaaS and new features for the Percona Kubernetes Operators.

During Percona Live ONLINE 2021, our experts will be discussing the preview of Percona’s 100% open source Database as a Service (DBaaS), which eliminates vendor lock-in and enables users to maintain control of their data. 

As an alternative to public cloud and large enterprise database vendor DBaaS offerings, this on-demand self-service option provides users with a convenient and simple way to deploy databases quickly. Using Percona Kubernetes Operators means it is possible to configure a database once and deploy it anywhere.

“The future of databases is in the cloud, an approach confirmed by the market and validated by our own customer research,” said Peter Zaitsev, co-founder and CEO of Percona. “We’re taking this one step further by enabling open source databases to be deployed wherever the customer wants them to run – on-premises, in the cloud, or in a hybrid environment. Companies want the flexibility of DBaaS, but they don’t want to be tied to their original decision for all time – as they grow or circumstances change, they want to be able to migrate without lock-in or huge additional expenses.”

The DBaaS supports Percona open source versions of MySQL, MongoDB, and PostgreSQL. 

Critical database management operations such as backup, recovery, and patching will be managed through the Percona Monitoring and Management (PMM) component of Percona DBaaS. 

PMM is completely open source and provides enhanced automation with monitoring and alerting to find, eliminate, and prevent outages, security issues, and slowdowns in performance across MySQL, MongoDB, PostgreSQL, and MariaDB databases.

Customer trials of Percona DBaaS will start this summer. Businesses interested in being part of this trial can register here.

Easy Deployment and Management with Kubernetes Operators from Percona

The Kubernetes Operator for Percona Distribution of PostgreSQL is now available in technical preview, making it easier than ever to deploy. This Operator streamlines the process of creating a database so that developers can gain access to resources faster, and it also simplifies ongoing lifecycle management.

There are also new capabilities available in the Kubernetes Operator for Percona Server for MongoDB, which support enterprise mission-critical deployments with features for advanced data recovery. It now includes support for multiple shards, which provides horizontal database scaling, and allows for distribution of data across multiple MongoDB Pods. This is useful for large data sets when a single machine’s overall processing speed or storage capacity is insufficient. 

This Operator also allows Point-in-Time Recovery, which enables users to roll back the cluster to a specific transaction and time, or even skip a specific transaction. This is important when data needs to be restored to reverse a problem transaction or ransomware attack.

The new Percona product announcements will be discussed in more detail at our annual Percona Live ONLINE Open Source Database Conference 2021 starting today.

We hope you’ll join us! Register today to attend Percona Live ONLINE for free.

Register and Attend

May 10, 2021

War Stories and Learning From Others – Percona Live

Lessons Learned – Learning from those who blazed the trail!

Another cool thing I like about attending conferences is learning how other companies and people overcame problems, how they run their systems, and figuring out problems that I may run into in the future. Secretly, I also want to validate how I have done things and make sure I did not miss something. Percona Live has a huge group of interesting speakers, and users, customers, and companies are sharing tons of interesting war stories, how-to’s, and things they are very proud of! So what is the HOSS looking forward to hearing about?

SCALE!

Who is going to be sharing their tales of scale? 

Edmodo’s pandemic story: Natarajan Chidhambharam & Miklos Szel will be sharing “A Tale of 25x Growth in Three Weeks”. Edmodo provides online educational services/software, so when Covid hit, their entire platform’s traffic skyrocketed within days. This is a tale of massive growth and of dealing with growth you could never imagine.

Another talk to gain insight on that unexpected growth comes from Art van Scheppingen at MessageBird in his talk entitled “How to Cope With (Unexpected) Millupling of Your Workload?”.

While the pandemic caused extra load for many, high load also occurs with regular frequency for plenty of companies. Javi Santana from TinyBird is going to share his story of growth and scale in the talk “How We Processed 12 Trillion Rows During Black Friday”.

But one-time events or even recurring events are nothing compared to a constant flow of transactions. To learn how to handle that, you may want to see how a company like Adobe handles this kind of workload. Adobe’s Yeshwanth Vijayakumar will be giving us the details in his talk “How Adobe Does Millions of Records Per Second Using Apache Spark”.

The team over at Venmo/PayPal (Kushal Shah, Neeraj Wadhwani, Van Pham, & Tianshi Wang) will also be sharing the secrets of how they scale in their talk “Scaling Venmo’s Payments”. While not all of us run at this level of scale, understanding the issues they faced and how they resolved them can help many as they build what’s next.

Migrations! There and Back Again

Who doesn’t love a good story about moving to a new town, a new state, or, in the case of companies, a new cloud provider! If you are looking for information and stories from others’ cloud migration adventures, look no further than the following:

BlaBlaCar’s Maxime Fouilleul is going to be delivering “Organize the Migration of a Hundred Database Clusters to the Cloud”.  Migrating a few databases is challenging, but hundreds or thousands of them is daunting.  

Sometimes the last part of a migration is the most challenging. Box’s Jordan Moldow is going to share with us the challenges Box faced in finishing that large migration in his talk “The Last Mile: Delivering the Last 10% of a Four-Year Migration”.

Groupon recently moved its systems to AWS. Groupon’s Mani Subramanian will be sharing the ins and outs of this journey in his talk “MySQL & PostgreSQL Migration to AWS at Groupon”.

Database Operations Best Practices From the Experts

Finding better ways to do our day-to-day jobs helps us save time and get more efficient. Good thing so many great companies are willing to share what they have found works for them.

Upgrades, whether in the cloud or not, are sometimes challenging. Ashwin Nellore & Kushal Shah are going to share “Venmo’s Aurora Upgrades With Open Source Tools”, detailing how they approach and execute upgrades in an AWS environment.

Stephen Borg & Matthew Formosa (GiG) are going to talk about choosing the best tool, or in this case database, for the job in their talk “Fun and Games: Why We Picked ClickHouse To Drive Gaming Analytics at GiG”. Gaming analytics (any analytics, really) remains a hot topic. Learning why ClickHouse fits their needs should be interesting.

When your website has millions of users and 24×7 requirements, you have to set things up to scale and survive multiple outages. Companies like LinkedIn have spent lots of money and many hours solving these scale and availability challenges. That is why listening in on Karthik Appigatla talking about “Multi-colo Async Replication at LinkedIn” is going to be very informative. Later on, Karthik is joined by Apoorv Purohit to discuss scaling LinkedIn with Vitess.

When you have large datasets and lots of individual databases, even something as simple as copying databases from one environment to the next can be a challenge. Nicolai Plum from Booking.com has some practical advice and tips in his talk “The Many Ways to Copy Your Database”.

Finally, keeping track of what’s happening on any one server when you have hundreds more is a pain. That’s why I am excited to hear how Rappi’s Daniel Guzman Burgos & Rodrigo Cadaval did this in their environment. Their talk “Monitoring Hundreds of RDS PostgreSQL Instances with PMM: The Rappi Case” aims to provide us with practical ways to monitor those oversized database environments.

There is a lot to learn from those in the industry who are solving real problems at scale! These are just a few of the sessions and talks I am looking forward to learning from.

Not registered yet? There’s still time! Don’t miss it! 

May 7, 2021

Cool New Projects, Technology, and Broadening My Horizons at Percona Live

I love learning from others and exploring the implementation details other database engines provide. Percona Live has no shortage of new databases, tools, and discussions on interesting new features or ways of tackling old problems. I went through the agenda and wanted to highlight a few new databases and tools (new to me or new to everyone) that have me interested!

  • Sachin Sinha, author of BangDB, is going to be talking about BangDB in his talk “Convergence of Different Dimensions within BangDB – A High-Performance Modern NoSQL Database”. Talk about interesting implementations; let me leave you with a snippet from the talk abstract:

    “The native integration at the buffer pool or IO layer will give the user full control of every single byte being ingested and processed by the system, which will reduce the latency to allow high-speed precision processing. Further siloed (semi siloed) architecture forces too many network hops along with too many copies of data. In this scenario, even with a very high processing efficiency, low latency (or high speed) is not possible with this architecture. We need to minimize network hops and copy of data as much as possible. With convergence, we minimize both the network hops and data copy, thereby improving the performance.”

  • Jim Tommaney returns to Percona Live, this year talking about DuckDB in his talk “DuckDB: Embedded Analytics with Parallel/Vector/Columnar Performance”. Robert Hodges and I talked a week ago about the analytics track, and he said this was the talk he was most interested in hearing. Looking forward to it.

  • Of course, “What is OpenSearch?” presented by Kyle Davis is high on my list. If you have been living under a rock or on an island with no internet connection, you may not have seen all of the news around Elastic changing licenses and AWS forking Elasticsearch. Learning about the new project is a must for those who have used or are using Elastic.

  • Super excited to see “Docstore – Uber’s Highly Scalable Distributed SQL Database” by Ovais Tariq & Himank Chaudhary. Many large companies end up building their own databases and/or enhancing open source ones. Some of the greatest database advances of the past 10 years came from a company thinking differently and solving its own problems with something new. This is very interesting to me.

Not registered yet? There’s still time! Don’t miss it! 

May 6, 2021

Percona Live Sessions for Engineers Working on Building and Developing Databases!

We are in the “home stretch” for Percona Live and I am continuing to highlight some of the great content out there!  Today I would like to focus on content I think will be very interesting to the developers and contributors who are working on the core elements of open source database projects.  We have a ton of content on how to use, deploy, and maintain databases supporting applications, but this year we have some awesome content for those actively building the databases we all love and use.  So what sessions do I think our open source engineers will get the most out of?

On Wednesday, May 12th, how about these sessions:

  • David Zhao’s “Performance Comparison of MySQL and PostgreSQL Based on Kernel Level Analysis”  is a top pick for me from an engineering perspective. David is going to be talking about the differences and similarities in approaches to how “database kernel design and key algorithms” impact performance and troubleshooting bugs and other problems.
  • Valerii Kravchuk’s “Monitoring and Tracing MySQL or MariaDB Server With Bpftrace” is a must-see for developers, SREs, and DBAs alike! Finding bugs, regressions, and performance issues in software is a critical problem all engineering professionals must face. In this talk, you will get a taste of how the Bug & Support King himself Valerii uses Bpftrace to diagnose and find those troublesome issues.
  • Vadim Tkachenko’s talk on “Creating Chaos In Databases” is a dive into QA and testing for problems in some of Percona’s core software. In many modern engineering organizations, testing automation and optimization is critical. Vadim will be sharing some tips and tricks on testing our Kubernetes Operators.
  • Sergey Pronin’s talk on “Percona XtraDB Cluster Operator – Architecture Decisions” will walk through the design and implementation challenges the engineering team at Percona faced when building our Operators. If you are developing or contributing to an OSS DB project, odds are you’re going to have to work on Kubernetes deployments sooner rather than later. Learn about the trade-offs, decisions, and steps taken when building out an operator.
  • Ming Zhang from PingCAP is bringing us the background and details on “How We Built a Geo-Distributed Database With Low Latency”. Building large scalable databases is a challenge, and learning from others is critical.

On Thursday, May 13th, how about these sessions:

  • Vladimir Ozerov will be delivering a talk on “Building Cost-Based Query Optimizers With Apache Calcite”. Query optimizers are a core database feature and often one of the most important when it comes to scale and performance. Learning about a new approach to an old problem is always welcome.
  • Percona’s Sanja Bonic & Lenz Grimmer will be talking on “Default to Open: Steps and Traps”.  Can you run your engineering processes and teams completely in the open?  Can you have full transparency?  This should be a good talk.  It does have a special guest moderator… not saying who.
  • Keao Yang will be talking in the session entitled “Test Applications’ Storage Stability by Injecting Storage Errors”. This is interesting in many regards because hardware issues are so transient that they are often hard to reproduce. Having tests that automate some of these hardware errors is very useful.
  • Wenbo Zhang will be talking about “How to Develop BPF Tools with libbpf + BPF CO-RE”. This gets back to diagnosing problems and tracing performance issues. Regressions and slowdowns are the banes of many developers and engineering teams.
  • Kyle Davis from AWS is going to be explaining “How to Contribute to a Big, Complex Open Source Project”. Getting and merging contributions from the community is critical for open source projects. As developers and engineers on open source projects, understanding the contributor experience is just as important.
  • Yuvraaj Kelkar & Mehboob Alam are delivering the talk “Crave for Speed? Accelerating Open-Source Project Builds”. Crave is a product about accelerating the build process and moving faster; build times are a real challenge. It will be interesting to see how deep this gets, but the topic is super interesting.
  • Steve Shaw (Intel) is going to be talking about “HammerDB: A Better Way to Benchmark Your Open Source Database”. I love benchmarking, I love speed, and I love performance. Setting up consistent performance testing that helps track and identify bottlenecks created by new code releases is critical.
  • Karthik Ranganathan, CTO at Yugabyte, is delivering “Extending PostgreSQL to a Google Spanner Architecture”. I had Karthik on the HOSS talks FOSS podcast a few weeks ago and it was a great conversation. I am very excited about this topic and listening in on the implementation details of this effort. Today more than ever, learning from and integrating ideas, technology, and other components from other open source projects can accelerate your own development efforts.
  • Ovais Tariq &  Himank Chaudhary from Uber are talking on “Docstore – Uber’s Highly Scalable Distributed SQL Database”. Uber built their own database, why? How? The team will be talking about the work they did and lessons they learned. 

Those are the Hoss’s picks for more developer/engineering-focused picks for Percona Live!

Not registered yet? There’s still time! Don’t miss it! 

Apr 29, 2021

Percona Live ONLINE: How Companies Manage Customer & Data Volume Rises While Staying Performant

The latest schedule for Percona Live ONLINE 2021, taking place on May 12-13, allows you to filter by track to identify the sessions that interest you most!

The conference takes place May 12-13 and features more than 150 expert speakers and over 160 hours of content across 200 sessions, representing dozens of projects, communities, and tech companies.

Register and attend for FREE

This week we are highlighting presentations that focus on many Percona customers’ experiences of successfully managing the turbulent times of the worldwide pandemic, and the challenges it brought to those responsible for managing data.

– “Zoned Namespaces for the Next Era in Application Performance” Western Digital, a leader in data infrastructure, gives a joint presentation with Percona on Zoned Namespaces technology. This will detail how Zoned Namespaces can solve the next wave of data growth and help push the limits of database performance at scale.

– “Push-button deploy MongoDB with Ansible” Fiserv, a leading global provider of payments and financial services technology, describes how it tackled the challenge of deploying a large number of MongoDB servers – creating a push-button approach to deploy sharded MongoDB clusters and replica sets using Ansible. This helps the company serve millions of people and businesses in a world that never powers down.

– “MySQL & PostgreSQL Migration to AWS at Groupon” Groupon explains how it migrated its databases to the cloud without disruption and the preparation plans for the onboarding and cutover process, with special consideration for the differences between running in the data center compared with on AWS.

– “Multi-colo Async Replication at LinkedIn” Maintaining and keeping its member data synchronized at multiple data centers requires reliable asynchronous replication globally. LinkedIn will discuss its needs for open source to handle its requirements.

– “Docstore – Uber’s Highly Scalable Distributed SQL Database” Uber manages massive amounts of data and the real-time service has no tolerance for downtime. Uber will review the general-purpose multi-model database it created to serve high-volume workloads.

– “Venmo’s Aurora Upgrades With Open Source Tools” The leading mobile payment service relies on database clusters to handle high volumes of user transactions while maintaining performance and remaining always available. But what happens when it comes time for major updates to its database clusters? Venmo shows how it uses a number of open source software products to make the process smooth, with no disruptions to service.

Free, just like our software, Percona Live ONLINE is a community-focused event for database developers, administrators, and decision-makers to share their knowledge and experiences.

Attracting thousands of attendees from around the globe, Percona Live is the longest-running, largest, independent conference dedicated to the vast ecosystem of open source database technologies.

We hope you’ll join us! Register today to attend Percona Live ONLINE for free.

Register and attend for FREE

Apr 22, 2021

Percona Live ONLINE: Keynotes Now Live!

The full keynotes schedule for Percona Live ONLINE 2021, taking place on May 12-13, is now live online!

Attracting thousands of attendees from around the globe, Percona Live is the longest-running, largest, independent conference dedicated to the vast ecosystem of open source database technologies.

The conference keynote sessions are split over two days. They will highlight the business implications of open source software and the internal and external forces changing the face of the market.

Register and attend for FREE

First Day Keynotes

Wednesday, May 12 (from 12pm EDT) 

Peter Zaitsev, CEO of Percona, will deliver a state of the market address discussing the growth of open source database software adoption and how cloud providers and open source licensing are changing the market.

Amanda Brock, CEO of OpenUK, will present her extensive research on the business of open source, including revenue models and legal considerations. She will offer her perspective on the past decade and her optimistic view of where the open source ecosystem is heading.

Tidelift Co-founder and General Counsel, Luis Villa, will give his opinion on open source software licenses and what the future holds. 

Second Day Keynotes

Thursday, May 13 (from 12pm EDT) 

Kaj Arnö, CEO of the MariaDB Foundation, will discuss often overlooked communication and collaboration in open source projects and software development.

Frédéric Descamps, Community Manager for MySQL at Oracle, will talk about the latest features in MySQL 8.0.

Patrick McFadin, Vice President of Developer Relations at DataStax, will examine the transformation to cloud-native architectures that require unprecedented levels of scale, along with the impact on job roles.

Finally, Dor Laor, CEO of ScyllaDB, will review the six years of development of ScyllaDB and future plans.

Free, just like our software, Percona Live ONLINE features more than 150 expert speakers and over 160 hours of content across 200 sessions, representing dozens of projects, communities, and tech companies.

We hope you’ll join us! Register today to attend Percona Live ONLINE for free.

Register and attend for FREE

Apr 14, 2021

Percona Live ONLINE: Focus on Unprecedented Demand for Scale

The full conference schedule for Percona Live ONLINE 2021, taking place on May 12-13, is now live!

This year we see a strong focus on how organizations successfully managed the technology challenges of the past 12 months.

Register and attend for FREE!

A number of presentations demonstrate the key role databases played in achieving company success over the last year. If you are interested in ensuring your databases meet current and future business demands, we have highlighted some of the sessions you won’t want to miss!

“A Tale of 25x Growth in Three Weeks”

Discover what happened when Edmodo’s database queries per second (QPS) went from 200,000 to five million in just three weeks, as remote learning accelerated during the worldwide pandemic! Edmodo is an education company with 100 million teacher and student users around the world. 

“How We Processed 12 Trillion Rows During Black Friday”

TinyBird’s retail client performed real-time analytics on all its sales data during Black Friday. This presentation discusses how this was possible and what they found out.

“Migration of a Hundred Database Clusters to the Cloud”

The engineering manager of BlaBlaCar, the French online marketplace for carpooling, with 70 million users across 22 countries, reveals how they turned this massive project into a success story.

“Scaling Venmo’s Payments”

The leading mobile payment service details the critical scaling challenges which arose during the COVID-19 pandemic and how they resolved them.

“The Real Costs and Benefits of Open Source Database Adoption”

Percona enterprise architect Michal Nosek examines the ability of open source databases to handle mission-critical workloads and looks at the total cost of ownership (TCO).

Percona Live ONLINE 2021 also features speakers from Groupon, HubSpot, LinkedIn, PayPal, Shopify, and Uber.

Percona Live is the longest-running, largest, independent conference dedicated to the vast ecosystem of open source database technologies. MariaDB, MySQL, PostgreSQL, MongoDB, Percona, Elastic, Clickhouse, and others will all be represented during the two-day event.

Free, just like our software, Percona Live ONLINE 2021 will feature:

  • Over 100 expert speakers
  • More than 150 sessions
  • Dozens of projects, communities, and tech companies
  • 160+ hours of content.
  • Thousands of attendees from around the world

We hope you’ll join us! Register here to attend Percona Live ONLINE for free.

Look out for our keynote schedule, which will be announced next week.

Apr 6, 2021

What’s New at Percona Live ONLINE 2021?

Percona Live ONLINE is a community-focused event for database developers, administrators, and decision-makers to share their knowledge and experiences.

We would love you to join us on May 12-13 for this year’s event.

Register and attend for FREE!

This year Percona Live is going to be our biggest and best event yet! 

Free, just like our software, Percona Live ONLINE 2021 will feature:

  • Over 100 expert speakers
  • More than 150 sessions
  • Dozens of projects, communities, and tech companies
  • 160+ hours of content.
  • Thousands of attendees from around the world

Percona Live is the longest-running, largest, independent conference dedicated to the vast ecosystem of open source database technologies. MariaDB, MySQL, PostgreSQL, MongoDB, Percona, Elastic, Clickhouse, and others will all be represented during the two-day event.

Speakers from Groupon, HubSpot, LinkedIn, PayPal, Shopify, and Uber have signed up to discuss their experiences using leading open source database technologies, including MongoDB, MySQL, PostgreSQL, and Percona software.

The conference provides a unique opportunity for both beginners and experts to hear from many of the largest organizations in the world, and learn from people who solve some of the biggest database challenges.

This year hot topics include:

  • Serverless deployments;
  • Bringing cloud-native to the database space – tips, tricks, and training around running databases on Kubernetes;
  • Learning how to automate database setups and testing in CI/CD pipelines;
  • Building scalable geographically distributed databases;
  • Advice on securing and locking down infrastructure data.

This year, Percona Live ONLINE has added Community Rooms, organized and hosted by community projects, technologies, or by theme, allowing attendees to meet and discuss topics and issues relevant to their mission.

The full agenda is now live here and registration is free at Percona Live ONLINE.

We look forward to seeing you there!
