Percona Live Europe 2018 Call for Papers is Now Open

Percona Live Europe Open Source Database Conference PLE 2018

Percona Live Europe Open Source Database Conference PLE 2018Announcing the opening of the Percona Live Europe Open Source Database Conference 2018 in Frankfurt, Germany call for papers. It will be open from now until August 10, 2018.

Our theme this year is
Connect. Accelerate. Innovate.

As a speaker at Percona Live Europe, you’ll have the opportunity to CONNECT with your peers—open source database experts and enthusiasts who share your commitment to improving knowledge and exchanging ideas. ACCELERATE your projects and career by presenting at the premier open source database event, a great way to build your personal and company brands. And influence the evolution of the open source software movement by demonstrating how you INNOVATE!

Community initiatives remain core to the open source ethos, and we are proud of the contribution we make with Percona Live Europe in showcasing thought leading practices in the open source database world.

With a nod to innovation, for the first time, this year we are introducing a business track to benefit those business leaders who are exploring the use of open source and are interested in learning more about its costs and benefits.

Speaking Opportunities

The Percona Live Europe Open Source Database Conference 2018 Call for Papers is open until August 10, 2018. We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions. Classes and talks are invited for Foundation (either entry level or of general interest to all), Core (intermediate) and Masterclass (advanced) levels.

If selected, you will receive a complimentary full conference pass.

  • Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
  • Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. We encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
  • Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

Topics and Tracks

We want proposals that cover the many aspects of application development using all open source databases, as well as new and interesting ways to monitor and manage database environments. Did you just embrace open source databases this year? What are the technical and business values of moving to or using open source databases? How did you convince your company to make the move? Was there tangible ROI?

Best practices and current trends, including design, application development, performance optimization, HA and clustering, cloud, containers and new technologies, as well as new and interesting ways to monitor and manage database environments—what’s holding your focus? Share your case studies, experiences and technical knowledge with an engaged audience of open source peers.

In the submission entry you will be asked to indicate which of these tracks your proposal best fits: tutorial, business needs; case studies/use cases; operations; or developer.

A few ideas

The conference committee is looking for proposals that cover the many aspects of using, deploying and managing open source databases, including:

  • Open source – Describe the technical and business values of moving to or using open source databases. How did you convince your company to make the move? Was there tangible ROI?
  • Security – All of us have experienced security challenges. Whether they are initiated by legislature (GDPR), bugs (Meltdown/Spectre), experience (external attacks) or due diligence (planning for the worst), when do you have ‘enough’ security? Are you finding that security requirements are preventing your ability to be agile?
  • Serverless, Cloud or On-Premise – The technology landscape is no longer a simple one, and mixing infrastructures has almost become the norm. Are you designing data architectures for the new landscape, and eager to share your experience? Have microservices become an important part of your plans?
  • MySQL – Do you have an opinion on what is new and exciting in MySQL? With the release of MySQL 8.0, are you using the latest features? How and why? Are they helping you solve any business issues, or making deployment of applications and websites easier, faster or more efficient? Did the new release get you to change to MySQL? What do you see as the biggest impact of the MySQL 8.0 release? Do you use MySQL in conjunction with other databases in your environment?
  • MongoDB – How has the 3.6 release improved your experience in application development or time-to-market? How are the new features making your database environment better? What is it about MongoDB 4.0 that excites you? What are your experiences with Atlas? Have you moved to it, and has it lived up to its promises? Do you use MongoDB in conjunction with other databases in your environment?
  • PostgreSQL – Why do you use PostgreSQL as opposed to other SQL options? Have you done a comparison or benchmark of PostgreSQL vs. other types of databases related to your tasks? Why and what were the results? How does PostgreSQL help you with application performance or deployment? How do you use PostgreSQL in conjunction with other databases in your environment?
  • SQL, NewSQL, NoSQL – It’s become a perennial question without an easy answer. How do databases compare, how do you choose the right technology for the job, how do you trade off between features and their benefits in comparing databases? If you have ever tried a hybrid database approach in a single application, how did that work out? How nicely does MongoDB play with MySQL in the real world? Do you have anything to say about using SQL with NoSQL databases?
  • High Availability – What choices are you making to ensure high availability? How do you find the balance between redundancy and cost? Are you using hot backups, and if so, what happened when you needed to rollback on them?
  • Scalability – When did you recognize you needed to address data scale? Did your data growth take you by surprise or were you always in control? Did it take a degradation in performance to get your management to sit up and take notice? How do you plan for scale if you can’t predict demand?
  • What the Future Holds – What do you see as the “next big thing”? What new and exciting features are going to be released? What’s in your next release? What new technologies will affect the database landscape? AI? Machine learning? Blockchain databases? Let us know about innovations you see on the way.

How to respond to the call for papers

For information on how to submit your proposal visit our call for papers page. The conference web pages will be updated throughout the next few weeks and bios, synopsis and slides will be published on those pages after the event.


If you would like to obtain a sponsor pack for Percona Live Europe Open Source Database Conference 2018, you will find more information including a prospectus on our sponsorship page. You are welcome to contact me, Bronwyn Campbell, directly.

The post Percona Live Europe 2018 Call for Papers is Now Open appeared first on Percona Database Performance Blog.


Faster Point In Time Recovery (PITR) in PostgreSQL Using a Delayed Standby

PostgreSQL Point in Time Recovery

PostgreSQL Point in Time RecoveryThe need to recover a database back to a certain point in time can be a nerve-racking task for DBAs and for businesses. Can this be simplified? Could it be made to work faster? Can we recover to a given point in time with zero loss of transactions/records? Fortunately, the answer to these questions is yes. PostgreSQL Point in Time Recovery (PITR) is an important facility. It offers DBAs the ability to restore a PostgreSQL database simply, quickly and without the loss of transactions or data.

In this post, we’ll help you to understand how this can be achieved, and reduce the potential for pain in the event of panic situations where you need to perform a PITR.

Before proceeding further, let us understand what could force us to perform a PITR.

  1. Someone has accidentally dropped or truncated a table.
  2. A failed deployment has made changes to the database that are difficult to reverse.
  3. You accidentally deleted or modified a lot of data, and as a consequence you cannot run your applications.

In such scenarios, you would immediately look for the latest full backup and the relevant transaction logs (aka WALs in PostgreSQL) to recover up to a known point in the past, before the error occurred. But what if your backup is corrupt and not valid?

Well, it is very important to perform a backup and recovery validation to ensure that the backups are always recoverable—we will address this in a future post. But, if the backup that you are looking at is corrupt, that can be a nightmare. One such unlucky incident for GitLab, where there was a backup restoration failure, caused a major outage followed by a data loss after recovery.


Even the best of plans can be hard to realize in practice.

It may be that our backups are intact and recoverable. Can we afford to wait until we copy/download the backup and recover it to another disk or server? What if the database size is several hundreds of GBs or several TBs like GitLab’s?

The solution to the problem is: add another standby that is always delayed by a few hours or a day.

This is one of the great features available in PostgreSQL. If you have migrated from Oracle RDBMS to PostgreSQL, you can think of it as an equivalent to FLASHBACK DATABASE in Oracle. Flashback database helps you to rewind data back in time. However, the technique does not work if you have dropped a data file. In fact, this is the case for both Oracle RDBMS and PostgreSQL PITR. ?


Adding a Delayed Standby in PostgreSQL

It is important that we use features like streaming replication to achieve high availability in PostgreSQL. Most of the environments have 1 master with 1 or more slaves (standby), either in the same data centre or geographically distributed. To save the time needed for PITR, you can add another slave that can always be delayed by a certain amount of time—this could be hours or days.

For example, if I know that my deployment is determined to be successful when no issues are observed in the first 12 hours, then I might delay one of the standbys by 12 hours.

To delay a standby, once you have setup streaming replication between your PostgreSQL master and slave, the following parameter needs to be added to the recovery.conf file of the slave, followed by a restart.

recovery_min_apply_delay = '12h' # or '1min' or 1d'

Now, let’s consider an example where you have inserted 10000 records at 10:27:34 AM and you have accidentally deleted 5000 records at 10:28:43 AM. Let’s say that you have a standby that is delayed by 1 hour. The steps to perform PITR using the delayed standby through until 10:27:34 AM look like this:

Steps to perform PostgreSQL Point in Time Recovery using a delayed standby

Step 1

Stop the slave (delayed standby) immediately, as soon as you have noticed that an accidental change has happened. If you know that the change has been already applied on the slave, then you cannot perform the point in time recovery using this method.

$ pg_ctl -D $PGDATA stop -mf

Step 2

Rename the recovery.conf file in your standby to another name.

$ mv $PGDATA/recovery.conf $PGDATA/recovery.conf.old

Step 3

Create a new recovery.conf file with the required parameters for PITR.

# recovery.conf file always exists in the Data Directory of Slave
recovery_target_time = '2018-06-07 10:27:34 EDT'
restore_command = 'sh /var/lib/pgsql/scripts/restore_command_script.sh %p %f'
recovery_target_action = 'pause'
recovery_target_inclusive = 'false'


Specifies the timestamp up to which you wish to recover your database.


Shell command that can be used by PostgreSQL to fetch the required Transaction Logs (WALs) for recovery.
PostgreSQL sends the arguments %p (path to WAL file) and % f (WAL file name) to this shell command. These arguments can be used in the script you use to copy your WALs.

Here is an example script for your reference. This example relies on rsync. The script connects to the backup server to fetch the WALs requested by PostgreSQL. (We’ll cover the procedure to archive these WALs in another blog post soon: this could be a good time to subscribe to the Percona blog mailing list!)

$ cat /var/lib/pgsql/scripts/restore_command_script.sh
# Enable passwordless ssh to Backup Server
# $1 is %p substituted by postgres as the path to WAL File
# $2 is the %f substituted by postgres as the WAL File Name
rsync --no-motd -ave ssh ${Backup_Server}:${ArchiveDir}/${wal} ${wal_with_path} >>$LOG 2>&1
if [ "$?" -ne "0" ]
echo "Restore Failed for WAL : $wal" >> $LOG
exit 1


This is the action that needs to be performed after recovering the instance up to the recovery_target_time. Setting this to pause would let you modify the recovery_target_time after recovery, if you need to. You can then replay the transactions at a slow pace until your desired recovery target is reached. For example, you can recover until 2018-06-07 10:26:34 EDT and then modify recovery_target_time to 2018-06-07 10:27:34 EDT when using pause.

When you know that all the data you are looking for has been recovered, you can issue the following command to stop the recovery process, change the timeline and open the database for writes.

select pg_wal_replay_resume();

Other possible settings for this parameter are promote and shutdown. These do not allow you to replay a few more future transactions after the recovery, as you can with pause.


Whether to stop recovery just after the specified recovery_target_time(true) or before(false).

Step 4

Start PostgreSQL using pg_ctl. Now, it should read the parameters in recovery.conf and perform the recovery until the time you set in the recovery_target_time.

$ pg_ctl -D $PGDATA start

Step 5

Here is how the log appears. It says that has performed point-in-time-recovery and has reached a consistent state as requested.

2018-06-07 10:43:22.303 EDT [1423] LOG: starting point-in-time recovery to 2018-06-07 10:27:34-04
2018-06-07 10:43:22.607 EDT [1423] LOG: redo starts at 0/40005B8
2018-06-07 10:43:22.607 EDT [1423] LOG: consistent recovery state reached at 0/40156B0
2018-06-07 10:43:22.608 EDT [1421] LOG: database system is ready to accept read only connections
2018-06-07 10:43:22.626 EDT [1423] LOG: recovery stopping before commit of transaction 570, time 2018-06-07 10:28:59.645685-04
2018-06-07 10:43:22.626 EDT [1423] LOG: recovery has paused
2018-06-07 10:43:22.626 EDT [1423] HINT: Execute pg_wal_replay_resume() to continue.

Step 6

You can now stop recovery and open the database for writes after PITR.

Before executing the next command, you may want to verify that you have got all the desired data by connecting to the database and executing some SQL’s. You can still perform reads before you stop recovery. If you notice that you need another few minutes (or hours) of transactions, then modify the parameter recovery_target_time and go back to step 4. Otherwise, you can stop the recovery by running the following command.

$ psql
select pg_wal_replay_resume();

Summing up

Using PostgreSQL Point in time Recovery is the most simple of procedures that does not involve any effort in identifying the latest backups, transaction logs and space or server to restore in a database emergency. These things happen! Also, it could save a lot of time because the replay of WALs is much faster than rebuilding an entire instance using backups, especially when you have a huge database.

Important post script: I tested and recorded these steps using PostgreSQL 10.4. It is possible with PostgreSQL 9.x versions, however, the parameters could change slightly and you should refer to the PostgreSQL documentation for the correct syntax.

The post Faster Point In Time Recovery (PITR) in PostgreSQL Using a Delayed Standby appeared first on Percona Database Performance Blog.


MongoDB transactions: your very first transaction with MongoDB 4.0

MongoDB 4.0 transactions

MongoDB 4.0MongoDB 4.0 transactions is just around the corner and with rc0 we can get a good idea of what we can expect in the GA version. MongoDB 4.0 will allow transactions to run in a replica set and, in a future release, the MongoDB transaction will be cluster-wide. This is a really big change!

Multi-statement transactions are a big deal for a lot of companies. The transactions feature has been in development since MongoDB version 3.6 when sessions were added. Now, we will be able to see how both sessions and transactions work. In an earlier blog post we highlighted a few details from what was delivered in 3.6 that indicated that 4.0 would have transactions.

There are a few limitations for transactions and some operations are not allowed yet. A detailed list can be found in the MongoDB documentation of the Session.startTransaction() method.

One restriction that we must be aware of is that the collection MUST exist in order to use transactions.

A simple transaction will be declared in a very similar way to that we use for other databases. The caveat is that we need to start a session before starting a transaction. This means that multi-statement transactions are not the default behavior to write to the database.

How to use transactions in MongoDB 4.0

Download MongoDB 4.0 RC (or you can install it from the repositories).

wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu1604-4.0.0-rc1.tgz

Uncompress the files:

tar -xvzf mongodb-linux-x86_64-ubuntu1604-4.0.0-rc1.tgz

Rename the folder to mongo4.0 and create the data folder inside of the bin folder:

mv mongodb-linux-x86_64-ubuntu1604-4.0.0-rc1 mongo4.0
cd mongo4.0
cd bin
mkdir data

Start the database process:
Important: in order to have multi-statement transactions replica-set must be enabled

./mongod --dbpath data --logpath data/log.log --fork --replSet foo

Initialize the replica-set:

> rs.initiate()
foo:Primary> use percona
foo:Primary> db.createCollection('test')

Start a session and then a transaction:

session = db.getMongo().startSession()
session.getDatabase("percona").test.insert({today : new Date()})
session.getDatabase("percona").test.insert({some_value : "abc"})

Then you can decide whether to commit the transaction or abort it:


If the startTransaction throws the IllegalOperation error, make sure the database is running with replica set.

Transaction isolation level in in MongoDB 4.0: Snapshot Isolation

MongoDB 4.0 implements snapshot isolation for the transactions. The pending uncommitted changes are only visible inside the session context (the session which has started the transaction) and are not visible outside. Here is an example:

Connection 1:

foo:PRIMARY> use percona
switched to db percona
foo:PRIMARY>  db.createCollection('test')
        "ok" : 1,
        "operationTime" : Timestamp(1528903182, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1528903182, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("bdd82af7-ab9d-4cd3-9238-f08ee928f31e") }
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").test.insert({today : new Date()})
WriteResult({ "nInserted" : 1 })
foo:PRIMARY> session.getDatabase("percona").test.insert({some_value : "abc"})
WriteResult({ "nInserted" : 1 })

Connection 2: starting second transaction in its own session:

foo:PRIMARY> use percona
switched to db percona
foo:PRIMARY> db.test.find()
foo:PRIMARY> db.test.find()
foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("eb628bfd-425e-450c-a51b-733435474eaa") }
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").test.find()

Connection 1: commit

foo:PRIMARY> session.commitTransaction()

Connection 2: after connection1 commits:

foo:PRIMARY> db.test.find()
{ "_id" : ObjectId("5b21361252bbe6e5b9a70a4e"), "today" : ISODate("2018-06-13T15:19:46.645Z") }
{ "_id" : ObjectId("5b21361252bbe6e5b9a70a4f"), "some_value" : "abc" }

Outside of the session it sees the new values, however inside the opened session it will not see the new values.

foo:PRIMARY> session.getDatabase("percona").test.find()

Now if we commit the transaction inside connection 2 it will commit as well, and we will have 2 rows now (as there are no conflicts).

Sometimes, however, we may see the transient transaction error when committing or even doing find() inside a session:

foo:PRIMARY> session.commitTransaction()
2018-06-14T21:56:29.111+0000 E QUERY    [js] Error: command failed: {
        "errorLabels" : [
        "operationTime" : Timestamp(1529013385, 1),
        "ok" : 0,
        "errmsg" : "Transaction 0 has been aborted.",
        "code" : 251,
        "codeName" : "NoSuchTransaction",
        "$clusterTime" : {
                "clusterTime" : Timestamp(1529013385, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
} :

From the MongoDB doc we can read that we could retry the transaction back when we have this error.

If an operation encounters an error, the returned error may have an errorLabels array field. If the error is a transient error, the errorLabels array field contains “TransientTransactionError” as an element and the transaction as a whole can be retried.

MongoDB transactions: conflict

What about transaction conflicts in MongoDB? Let’s say we are updating the same row. Here is the demo:

First we create a record, trx, in the collection:

use percona
db.test.insert({trx : 0})

Then we create session1 and update trx to change from 0 to 1:

foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("0b7b8ce0-919a-401a-af01-69fe90876301") }
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").test.update({trx : 0}, {trx: 1})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Then (before committing) create another session which will try to change from 0 to 2:

foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("b312c662-247c-47c5-b0c9-23d77f4e9f6d") }
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").test.update({trx : 0}, {trx: 2})
        "errorLabels" : [
        "operationTime" : Timestamp(1529675754, 1),
        "ok" : 0,
        "errmsg" : "WriteConflict",
        "code" : 112,
        "codeName" : "WriteConflict",
        "$clusterTime" : {
                "clusterTime" : Timestamp(1529675754, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)

As we can see, MongoDB catches the conflict and return the error on the insert (even before the commit).

We hope this post, with its simple example how transactions will work, has been useful. Feedback is welcome: you can comment here, catch Adamo on twitter @AdamoTonete or talk to the team at @percona.

The post MongoDB transactions: your very first transaction with MongoDB 4.0 appeared first on Percona Database Performance Blog.


Chunk Change: InnoDB Buffer Pool Resizing

innodb buffer pool chunk size

Since MySQL 5.7.5, we have been able to resize dynamically the InnoDB Buffer Pool. This new feature also introduced a new variable — innodb_buffer_pool_chunk_size — which defines the chunk size by which the buffer pool is enlarged or reduced. This variable is not dynamic and if it is incorrectly configured, could lead to undesired situations.

Let’s see first how innodb_buffer_pool_size , innodb_buffer_pool_instances  and innodb_buffer_pool_chunk_size interact:

The buffer pool can hold several instances and each instance is divided into chunks. There is some information that we need to take into account: the number of instances can go from 1 to 64 and the total amount of chunks should not exceed 1000.

So, for a server with 3GB RAM, a buffer pool of 2GB with 8 instances and chunks at default value (128MB) we are going to get 2 chunks per instance:

This means that there will be 16 chunks.

I’m not going to explain the benefits of having multiple instances, I will focus on resizing operations. Why would you want to resize the buffer pool? Well, there are several reasons, such as:

  • on a virtual server you can add more memory dynamically
  • for a physical server, you might want to reduce database memory usage to make way for other processes
  • on systems where the database size is smaller than available RAM
  • if you expect a huge growth and want to increase the buffer pool on demand

Reducing the buffer pool

Let’s start reducing the buffer pool:

| innodb_buffer_pool_size | 2147483648 |
| innodb_buffer_pool_instances | 8     |
| innodb_buffer_pool_chunk_size | 134217728 |
mysql> set global innodb_buffer_pool_size=1073741824;
Query OK, 0 rows affected (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
| Variable_name           | Value      |
| innodb_buffer_pool_size | 1073741824 |
1 row in set (0.00 sec)

If we try to decrease it to 1.5GB, the buffer pool will not change and a warning will be showed:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
| Level   | Code | Message                                                                         |
| Warning | 1210 | InnoDB: Cannot resize buffer pool to lesser than chunk size of 134217728 bytes. |
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
| Variable_name           | Value      |
| innodb_buffer_pool_size | 2147483648 |
1 row in set (0.01 sec)

Increasing the buffer pool

When we try to increase the value from 1GB to 1.5GB, the buffer pool is resized but the requested innodb_buffer_pool_size is considered to be incorrect and is truncated:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
| Level   | Code | Message                                                         |
| Warning | 1292 | Truncated incorrect innodb_buffer_pool_size value: '1610612736' |
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
| Variable_name           | Value      |
| innodb_buffer_pool_size | 2147483648 |
1 row in set (0.01 sec)

And the final size is 2GB. Yes! you intended to set the value to 1.5GB and you succeeded in setting it to 2GB. Even if you set 1 byte higher, like setting: 1073741825, you will end up with a buffer pool of 2GB.

mysql> set global innodb_buffer_pool_size=1073741825;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_%size' ;
| Variable_name                 | Value      |
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 2147483648 |
2 rows in set (0.01 sec)

Interesting scenarios

Increasing size in the config file

Let’s suppose one day you get up willing to change or tune some variables in your server, and you decide that as you have free memory you will increase the buffer pool. In this example, we are going to use a server with 

innodb_buffer_pool_instances = 16

  and 2GB of buffer pool size which will be increased to 2.5GB

So, we set in the configuration file:

innodb_buffer_pool_size = 2684354560

But then after restart, we found:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
| Variable_name                 | Value      |
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 4294967296 |
2 rows in set (0.00 sec)

And the error log says:

2018-05-02T21:52:43.568054Z 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 16, chunk size = 128M

So, after we have set innodb_buffer_pool_size in the config file to 2.5GB, the database gives us a 4GB buffer pool, because of the number of instances and the chunk size. What the message doesn’t tell us is the number of chunks, and this would be useful to understand why such a huge difference.

Let’s take a look at how that’s calculated.

Increasing instances and chunk size

Changing the number of instances or the chunk size will require a restart and will take into consideration the buffer pool size as an upper limit to set the chunk size. For instance, with this configuration:

innodb_buffer_pool_size = 2147483648
innodb_buffer_pool_instances = 32
innodb_buffer_pool_chunk_size = 134217728

We get this chunk size:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
| Variable_name                 | Value      |
| innodb_buffer_pool_chunk_size | 67108864   |
| innodb_buffer_pool_size       | 2147483648 |
2 rows in set (0.00 sec)

However, we need to understand how this is really working. To get the innodb_buffer_pool_chunk_size it will make this calculation: innodb_buffer_pool_size / innodb_buffer_pool_instances with the result rounded to a multiple of 1MB.

In our example, the calculation will be 2147483648 / 32 = 67108864 which 67108864%1048576=0, no rounding needed. The number of chunks will be one chunk per instance.

When does it consider that it needs to use more chunks per instance? When the difference between the required size and the innodb_buffer_pool_size configured in the file is greater or equal to 1MB.

That is why, for instance, if you try to set the innodb_buffer_pool_size equal to 1GB + 1MB – 1B you will get 1GB of buffer pool:

innodb_buffer_pool_size = 1074790399
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:26:43.328313Z 0 [Note] InnoDB: Initializing buffer pool, total size = 1G, instances = 16, chunk size = 64M

But if you set the innodb_buffer_pool_size equals to 1GB + 1MB you will get 2GB of buffer pool:

innodb_buffer_pool_size = 1074790400
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:25:48.204032Z 0 [Note] InnoDB: Initializing buffer pool, total size = 2G, instances = 16, chunk size = 64M

This is because it considers that two chunks will fit. We can say that this is how the InnoDB Buffer pool size is calculated:

  if innodb_buffer_pool_size / innodb_buffer_pool_instances < innodb_buffer_pool_chunk_size
    innodb_buffer_pool_chunk_size = roundDownMB(innodb_buffer_pool_size / innodb_buffer_pool_instances)
  innodb_buffer_amount_chunks_per_instance = roundDown(innodb_buffer_pool_size / innodb_buffer_pool_instances / innodb_buffer_pool_chunk_size)
  if innodb_buffer_amount_chunks_per_instance * innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size - innodb_buffer_pool_size > 1024*1024
innodb_buffer_pool_size = innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size * innodb_buffer_amount_chunks_per_instance

What is the best setting?

In order to analyze the best setting you will need to know that there is a upper limit of 1000 chunks. In our example with 16 instances, we can have no more than 62 chunks per instance.

Another thing to consider is what each chunk represents in percentage terms. Continuing with the example, each chunk per instance represent 1.61%, which means that we can increase or decrease the complete buffer pool size in multiples of this percentage.

From a management point of view, I think that you might want to consider at least a range of 2% to 5% to increase or decrease the buffer. I performed some tests to see the impact of having small chunks and I found no issues but this is something that needs to be thoroughly tested.

The post Chunk Change: InnoDB Buffer Pool Resizing appeared first on Percona Database Performance Blog.


Webinar Tues 19/6: MySQL: Scaling and High Availability – Production Experience from the Last Decade(s)

scale high availability

scale high availability
Please join Percona’s CEO, Peter Zaitsev as he presents MySQL: Scaling and High Availability – Production Experience Over the Last Decade(s) on Tuesday, June 19th, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).


Percona is known as the MySQL performance experts. With over 4,000 customers, we’ve studied, mastered and executed many different ways of scaling applications. Percona can help ensure your application is highly available. Come learn from our playbook, and leave this talk knowing your MySQL database will run faster and more optimized than before.

Register Now

About Peter Zaitsev, CEO

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 140 professionals in 30 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups. Percona was named to the Inc. 5000 in 2013, 2014, 2015 and 2016.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Database Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization Volume 1 is one of percona.com’s most popular downloads. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

The post Webinar Tues 19/6: MySQL: Scaling and High Availability – Production Experience from the Last Decade(s) appeared first on Percona Database Performance Blog.


What is the Top Cause of Application Downtime Today?

Application outages lurking monster

Application outages lurking monsterI frequently talk to our customer base about what keeps them up at night. While there is a large variance of answers, they tend to fall into one of two categories. The first is the conditioned fear of some monster lurking behind the scenes that could pounce at any time. The second, of course, is the actual monster of downtime on a critical system. Ask most tech folks and they will tell you outages seem to only happen late at night or early in the morning. And that they do keep them up.

Entire companies and product lines have been built around providing those in the IT world with some ability to sleep at night. Modern enterprises have spent millions to mitigate the risk and prevent their businesses from having a really bad day because of an outage. Cloud providers are attuned to the downtime dilemma and spend lots of time, money, and effort to build in redundancy and make “High Availability” (HA) as easy as possible. The frequency of “hardware” or server issues continues to dwindle.

Where does the downtime issue start?

In my discussions, most companies I have talked to say their number one cause of outages and customer interruptions is ultimately related to the deployment of new or upgraded code. Often I hear the operations team has little or no involvement with an application until it’s put into production. It is a bit ironic that this is also the area where companies tend to drastically under-invest. They opt instead to invest in ways to “Scale Out or Up”. Or perhaps how to survive asteroids hitting two out three of their data centers.

Failing over broken or slow code from one server to another does not fix it. Adding more servers to distribute the load can mitigate a problem, but can also escalate the cost dramatically. In most cases, the solutions they apply don’t address the primary cause of the problems.

While there are some fantastic tools out there that can help with getting better visibility into code level issues — such as New Relic, AppDynamics and others — the real problem is that these often end up being used to diagnose issues after they have appeared in production. Most companies carry out some amount of testing before releasing code, but typically it is a fraction of what they should be doing. Working for a company that specializes in open source databases, we get a lot of calls on issues that have prevented companies’ end users from using critical applications. Many of these problems are fixable before they cost a loss of revenue and reputation.

I think it’s time technology companies start to rethink our QA, Testing, and Pre-Deployment requirements. How much time, effort, and money can we save if we catch these “monsters” before they make it into production?

Not to mention how much better our operations team will sleep . . .

The post What is the Top Cause of Application Downtime Today? appeared first on Percona Database Performance Blog.


This Week in Data with Colin Charles 40: a Peak at Blockchain, Lots of MariaDB News, then Back on the Road

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Shortly after the last dispatch, I jetted off for a spot of vacation (which really meant I was checking out the hype behind Blockchain with a database developer lens at the Blockchain Week NYC), and then some customer visits in Seoul, which explains the short hiatus. Here’s to making this more regular as the summer approaches.

I am about to embark on a fairly long trip, covering a few upcoming appearances: Lisbon for the Percona Engineering meeting, SouthEastLinuxFest in Charlotte, the Open Source Data Centre Conference in Berlin and then the DataOps Barcelona event. I have some discount codes: 50% discount for OSDC with the code OSDC_FOR_FRIENDS, and 50% discount for DataOps Barcelona with the code dataopsbcn50. Expect this column to reflect my travels over the next few weeks.

There has been a lot of news on the MariaDB front: MariaDB 10.3.7 went stable/GA! You might have noticed more fanfare around the release name MariaDB TX 3.0, but the reality is you can still get this download from your usual MariaDB Foundation site. It is worth noting that the MariaDB Foundation 2017 financials have also been released. Some may have noticed a couple months back there was a press release titled Report “State of the Open-Source DBMS Market, 2018” by Gartner Includes Pricing Comparison With MariaDB. This led to a Gartner report on the State of the Open-Source DBMS Market, 2018; although the report has since been pulled. Hopefully we see it surface again.

In the meantime, please do try out MariaDB 10.3.7 and it would be great to hear feedback. I also have an upcoming Percona webinar on MariaDB Server 10.3 on June 26 2018 — when the sign up link appears, I will be sure to include it here.

Well written, and something worth discussing: Should Red Hat Buy or Build a Database?. The Twitter discussion is also worth looking at.


Link List

Upcoming appearances


I look forward to receiving feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 40: a Peak at Blockchain, Lots of MariaDB News, then Back on the Road appeared first on Percona Database Performance Blog.


Don’t Drown in your Data Lake

Don't drown in your data lake

Don't drown in your data lakeA data lake is “…a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms…”1. Many companies find value in using a data lake but aren’t clear that they need to properly plan for it and maintain it in order to prevent issues.

The idea of a data lake rose from the need to store data in a raw format that is accessible to a variety of applications and authorized users. Hadoop is often used to query the data, and the necessary structures for querying are created through the query tool (schema on read) rather than as part of the data design (schema on write). There are other tools available for analysis, and many cloud providers are actively developing additional options for creating and managing your data lake. The cloud is often viewed as an ideal place for your data lake since it is inherently elastic and can expand to meet the needs of your data.

Data Lake or Data Swamp?

One of the key components of a functioning data lake is the continuing inflow and egress of data. Some data must be kept indefinitely but some can be archived or deleted after a defined period of time. Failure to remove stale data can result in a data swamp, where the out of date data is taking up valuable and costly space and may be causing queries to take longer to complete. This is one of the first issues that companies encounter in maintaining their data lake. Often, people view the data lake as a “final resting place” for data, but it really should be used for data that is accessed often, or at least occasionally.

A natural spring-fed lake can turn into a swamp due to a variety of factors. If fresh water is not allowed to flow into the lake, this can cause stagnation, meaning that plants and animals that previously were not able to be supported by the lake take hold. Similarly, if water cannot exit the lake at some point, the borders will be breached, and the surrounding land will be inundated. Both of these conditions can cause a once pristine lake to turn into a fetid and undesirable swamp. If data is no longer being added to your data lake, the results will become dated and eventually unreliable. Also, if data is always being added to the lake but is not accessed on a regular basis, this can lead to unrestricted growth of your data lake, with no real plan for how the data will be used. This can become an expensive “cold storage” facility that is likely more expensive than archived storage.

If bad or undesirable items, like old cars or garbage, are thrown into a lake, this can damage the ecosystem, causing unwanted reactions. In a data lake, this is akin to simply throwing data into the data lake with no real rules or rationale. While the data is saved, it may not be useful and can cause negative consequences across the whole environment since it is consuming space and may slow response times. Even though a basic concept of a data lake is that the data does not need to conform to a predefined structure, like you would see with a relational database, it is important that some rules and guidelines exist regarding the type and quality of data that is included in the lake. In the absence of some guidelines, it becomes difficult to access the relevant data for your needs. Proper definition and tagging of content help to ensure that the correct data is accessible and available when needed.

Unrestricted Growth Consequences

Many people have a junk drawer somewhere in their house; a drawer that is filled with old receipts, used tickets, theater programs, and the like. Some of this may be stored for sentimental reasons, but a lot of it is put into this drawer since it was a convenient dropping place for things. Similarly, if we look to the data lake as the “junk drawer” for our company, it is guaranteed to be bigger and more expensive than it truly needs to be.

It is important that the data that is stored in your data lake has a current or expected purpose. While you may not have a current use for some data, it can be helpful to keep it around in case a need arises. An example of this is in the area of machine learning. Providing more ancillary data enables better decisions since it provides a deeper view into the decision process. Therefore, maintaining some data that may not have a specific and current need can be helpful. However, there are cases where maintaining a huge volume of data can be counterproductive. Consider temperature information delivered from a switch. If the temperature reaches a specific threshold, the switch should be shut down. Reporting on the temperature in an immediate and timely manner is important to make an informed decision, but stable temperature data from days, week, or months ago could be summarized and stored in a more efficient manner. The granular details can then be purged from the lake.

So, where is the balance? If you keep all the data, it can make your data lake unwieldy and costly. If you only keep data that has a specific current purpose, you may be impairing your future plans. Obviously, the key is to monitor your access and use of the data frequently, and purge or archive some of the data that is not being regularly used.

Uncontrolled Access Concerns

Since much of the data in your data lake is company confidential, it is imperative that access to that data be controlled. The fact that the data in the lake is stored in its raw format means that it is more difficult to control access. The structures of a relational database provide some of the basis for access control, allowing us to limit who has access to specific queries, tables, fields, schemas, databases, and other objects. In the absence of these structures, controlling access requires more finesse. Determining who has access to what parts of the data in the lake must be handled, as well as isolating the data within your own network environment. Many of these restrictions may already be in place in your current environment, but they should be reviewed before being relied on fully, since the data lake may store information that was previously unavailable to some users. Access should be regularly reviewed to identify potential rogue activities. Encryption options also exist to further secure the data from unwanted access, and file system security can be used to limit access. All of these components must be considered, implemented, and reviewed to ensure that the data is secure.

User Considerations

In a relational database, the data structure inherently determines some of the consistencies and format of the data. This enables users to easily query the data and be assured that they are returning valid results. The lack of such structures in the data lake means that users must be more highly skilled at data manipulation. Having users with less skill accessing the data is possible, but it may not provide the best results. A data scientist is better positioned to access and query the complete data set. Obviously, users with a higher skill set are rare and cost more to hire, but the return may be worth it in the long run.

So What Do I Do Now?

This is an area where there are no hard and fast rules. Each company must develop and implement processes and procedures that make sense for their individual needs. Only with a plan for monitoring inputs, outputs, access patterns, and the like are you able to make a solid determination for your company’s needs. Percona can help to determine a plan for reporting usage, assess security settings, and more. As you are using the data in your data lake, we can also provide guidance regarding tools used to access the data.

1 Wikipedia, May 22, 2018

The post Don’t Drown in your Data Lake appeared first on Percona Database Performance Blog.


Deploying PMM on Linode: Your $5-Per-Month Monitoring Solution

PMM on Linode small

In this blog, I will show you how to install PMM on Linode as a low-cost database monitoring solution.

Many of my friends use Linode to run their personal sites, as well as small projects. While Linode is no match for Big Cloud providers in features, it is really wonderful when it comes to cost and simplicity: a Linode “nanode” instance offers 1GB of memory, 1 core, 20GB of storage and 1TB of traffic for just $5 a month.

A single Linode instance is powerful enough to use with Percona Monitoring and Management (PMM) to monitor several systems, so I use Linode a lot when I want to demonstrate PMM deployment through Docker, rather than Amazon Marketplace.

Here are step-by-step instructions to get you started with Percona Monitoring and Management (PMM) on Linode in five minutes (or less):

Step 1:  Pick the Linode Type, Location and launch it.

PMM on Linode

Step 2: Name your Linode

This step is optional and is not PMM-related, but you may want to give your Linode an easy-to-remember name instead of something like “linode7796908”. Click on Linode Name and then on “Settings” and enter a name in “Linode Label”.

PMM on Linode 2

Step 3:  Deploy the Image

Click on Linode Name and then on “Deploy an Image”.

PMM on Linode 3

I suggest choosing the latest Ubuntu LTS version and allocating 512MB for the swap file, especially on a Linode with a small amount of memory. Remember to set a strong root password, as Linode allows root password login by default from any IP.

Step 4: Boot Linode

Now prepare the image you need to boot your Linode. Click on the Boot button for that:

PMM on Linode 4

Step 5: Login to the system and install Docker

Use your favorite SSH client to login to the Linode you created using “root” user and password you set at Step 3, and install Docker:

apt install docker.io

Step 6: Run PMM Server

Here are detailed instructions to install the PMM Server on Docker. Below are the commands to do basic installation:

docker pull percona/pmm-server:latest
docker create
  -v /opt/prometheus/data
  -v /opt/consul-data
  -v /var/lib/mysql
  -v /var/lib/grafana
  --name pmm-data
  percona/pmm-server:latest /bin/true
docker run -d
  -p 80:80
  --volumes-from pmm-data
  --name pmm-server
  --restart always

Note: This deploys PMM Server without authentication. For anything but test usage, you should set a password by following instructions on this page.

You’re done!

You’ve now installed PMM Server and you can see it monitoring itself by going to the server IP with a browser.

PMM on Linode 5

Now you can go ahead and install the PMM Client on the nodes you want to monitor!

The post Deploying PMM on Linode: Your $5-Per-Month Monitoring Solution appeared first on Percona Database Performance Blog.


Percona Live 2018 Community Report

So, after a whirlwind few days, Percona Live 2018 has been and gone. There was a great energy about the conference, and it was fantastic to meet so many open source database enthusiasts and supporters. A few things that I experienced:

  • Your great willingness to share knowledge. It was a fantastic place to learn for those who have experience from a different field of technology. Almost everyone seemed to be very open and generous with their time.
  • The “superstars” from our industry are not so scary. They are as willing to be open and generous with their experience and views as any of the other attendees, and equally as interested in making new discoveries.
  • There aren’t many times you can sit down to a (community) dinner, to share food and anecdotes with people from USA, UK, Germany and Armenia at the same time. I thoroughly enjoyed the company, and wish there were more opportunities for similar encounters. Thanks to Pythian for setting that up.
  • My Percona colleagues are wonderful, committed human beings with more than a passing interest in music – the Percona Sessions have got to happen…
  • That you can run a very long way in a day between the Santa Clara Convention Center and the Hyatt Regency Hotel.

I had very many positive conversations with delegates. You offered any criticisms along with a suggestion of how we should tweak things for the better. Our community is a creative, generous, problem-solving machine, though I shouldn’t be surprised at that.

So, with only a few more duties to complete, I’d like to thank you for your company. For those that did not make it to this year’s event, I hope that you might be persuaded to join us in the future — either at Percona Live Europe 2018 or at Percona Live 2019.

Packt Prizes

Our media sponsor, Packt, generously provided us with three free ebooks and two free instruction videos as prizes for delegates:

  1. Mastering MongoDB 3.x
  2. MySQL 8 Cookbook
  3. MongoDB Administrator’s Guide
  4. Elastic Databases and Data Processing with AWS [Video]
  5. AWS Administration – Database, Networking, and Beyond [Video]

There are another 10 titles for which we can offer delegates a 50% discount: you should have received your emails. Thanks are due again to Packt.

Community Blog

While I have your attention, I’d like to let you know about the forthcoming Percona community blog. Having been some time in the planning, this is starting really soon, and is like a year-round, online, Percona Live. We already have some keen writers for this, but if you would be interested in creating content (whether written, podcast or webcast) for the community blog, then please get in touch. The brief is very wide — as long as your submission is relevant to the open source database community then it would be welcome.

Finally, I would like to invite feedback on how to make the event shine even brighter — please drop me an email if you have suggestions or ideas. Meanwhile, I hope you enjoy these photographs of the MySQL Community Awards Winners, presented at PL18. You can read more about this community initiative.

Perhaps you’ll be able to join us in Frankfurt in November? Time to start thinking about those submissions for the call for papers!

Or perhaps next year at Percona Live Open Source Database Conference in 2019 – wherever it may be!

Photographs: Randy Tunnell Photography

The post Percona Live 2018 Community Report appeared first on Percona Database Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com