Jan
14
2019
--

Upcoming Webinar Thurs 1/17: How to Rock with MyRocks

How to Rock with MyRocks

How to Rock with MyRocksPlease join Percona’s Chief Technology Officer, Vadim Tkachenko, as he presents How to Rock with MyRocks on Thursday, January 17th at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Register Now

MyRocks is a new storage engine from Facebook and is available in Percona Server for MySQL. In what cases will you want to use it? We will check different workloads and when MyRocks is most suitable for you. Also, as for any new engine, it’s important to set it up and tune it properly. So, we will review the most important settings to pay attention to.

Register for this webinar to learn How to Rock with MyRocks.

Jan
08
2019
--

Upcoming Webinar Wed 1/9: Walkthrough of Percona Server MySQL 8.0

Walkthrough of Percona Server for MySQL 8.0

Walkthrough of Percona Server for MySQL 8.0Please join Percona’s MySQL Product Manager, Tyler Duzan as he presents Walkthrough of Percona Server MySQL 8.0 on Wednesday, January 9th at 11:00 AM PDT (UTC-7) / 2:00 PM (UTC-4).

Register Now

Our Percona Server for MySQL 8.0 software is the company’s free, enhanced, drop-in replacement for MySQL Community Edition. The software includes all of the great features in MySQL Community Edition 8.0. Additionally, it includes enterprise-class features from Percona made available free and open source. Thousands of enterprises trust Percona Server for MySQL to deliver excellent performance and reliability for their databases and mission-critical applications. Furthermore, our open source software meets their need for a mature, proven and cost-effective MySQL solution.

In sum, register for this webinar for a walkthrough of Percona Server for MySQL 8.0.

Jan
03
2019
--

Upcoming Webinar Friday 1/4: High-Performance PostgreSQL, Tuning and Optimization Guide

High-Performance PostgreSQL, Tuning and Optimization Guide

High-Performance PostgreSQL, Tuning and Optimization GuidePlease join Percona’s Senior Software Engineer, Ibrar Ahmed as he presents his High-Performance PostgreSQL, Tuning and Optimization Guide on Friday, January, 4th, at 8:00 AM PDT (UTC-7) / 11:00 AM EDT (UTC-4).

Register Now

PostgreSQL is one of the leading open-source databases. Out of the box, the default PostgreSQL configuration is not tuned for any workload. Thus, any system with least resources can run it. PostgreSQL does not give optimum performance on high permanence machines because it is not using the all available resource. PostgreSQL provides a system where you can tune your database according to your workload and machine’s specifications. In addition to PostgreSQL, we can also tune our Linux box so that the database load can work optimally.

In this webinar on High-Performance PostgreSQL, Tuning and Optimization, we will learn how to tune PostgreSQL and we’ll see the results of that tuning. We will also touch on tuning some Linux kernel parameters.

 

Dec
11
2018
--

Upcoming Webinar Wed 12/12: MySQL 8 for Developers

MySQL 8 for Developers

MySQL 8 for DevelopersPlease join Percona’s CEO Peter Zaitsev as he presents MySQL 8 for Developers on Wednesday, December 12th, 2018 at 11:00 AM PST (UTC-7) / 2:00 PM EST (UTC-5).

Register Now

There are many great new features in MySQL 8, but how exactly can they help your application? This session takes a practical look at MySQL 8 features. It also details which limitations of previous MySQL versions are overcome by MySQL 8. Lastly, what you can do with MySQL 8 that you could not have done before is discussed.

Register for MySQL 8 for Developers to learn how MySQL’s new features can help your application and more.

Dec
05
2018
--

Nondeterministic Functions in MySQL (i.e. rand) Can Surprise You

MySQL non deterministic functions rand

Working on a test case with sysbench, I encountered this:

mysql> select * from sbtest1 where id = round(rand()*10000, 0);
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k      | c                                                                                                                       | pad                                                         |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|  179 | 499871 | 09833083632-34593445843-98203182724-77632394229-31240034691-22855093589-98577647071-95962909368-34814236148-76937610370 | 62233363025-41327474153-95482195752-11204169522-13131828192 |
| 1606 | 502031 | 81212399253-12831141664-41940957498-63947990218-16408477860-15124776228-42269003436-07293216458-45216889819-75452278174 | 25423822623-32136209218-60113604068-17409951653-00581045257 |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
2 rows in set (0.30 sec)

I was really surprised. First, and the most important, id is a primary key and the rand() function should produce just one value. How come it returns two rows? Second, why is the response time 0.30 sec? That seems really high for a primary key access.

Looking further:

CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1
mysql> explain select * from sbtest1 where id = round(rand()*10000, 0);
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | sbtest1 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 986400 |    10.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+

So it is a primary key, but MySQL does not use an index, and it returns two rows. Is this a bug?

Deterministic vs nondeterministic functions

Turned out it is not a bug at all. It is pretty logical behavior from MySQL, but it is not what we would expect. First, why a full table scan? Well, rand() is nondeterministic function. That means we do not know what it will return ahead of time, and actually that is exactly the purpose of rand() – to return a random value. In this case, it is only logical to evaluate the function for each row, each time, and compare the results. i.e. in our case

  1. Read row 1, get the value of id, evaluate the value of RAND(), compare
  2. Proceed using the same algorithm with the remaining rows.

In other words, as the value of rand() is not known (not evaluated) beforehand, so we can’t use an index.

And in this case – rand() function – we have another interesting consequence. For larger tables with an auto_increment primary key, the probability of matching the rand() value and the auto_increment value is higher, so we can get multiple rows back. In fact, if we read the whole table from the beginning and keep comparing the auto_inc sequence with “the roll of the dice”, we can get many rows back.

That behavior is totally counter-intuitive. Nevertheless, to me, it’s also the only correct behavior.

We expect to have the rand() function evaluated before running the query.  This can actually be achieved by assigning rand() to a variable:

mysql> set @id=round(rand()*10000, 0); select @id; select * from sbtest1 where id = @id;
Query OK, 0 rows affected (0.00 sec)
+------+
| @id  |
+------+
| 6068 |
+------+
1 row in set (0.00 sec)
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k      | c                                                                                                                       | pad                                                         |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 6068 | 502782 | 84971025350-12845068791-61736600622-38249796467-85706778555-74134284808-24438972515-17848828748-86869270666-01547789681 | 17507194006-70651503059-23792945260-94159543806-65683812344 |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain select * from sbtest1 where id = @id;
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | sbtest1 | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.01 sec)

This would meet our expectations.

There are (at least) two bug reports filed, with very interesting discussion:

  1. rand() used in scalar functions returns multiple rows
  2. SELECT on PK with ROUND(RAND()) give wrong errors

Other databases

I wanted to see how it works in other SQL databases. In PostgreSQL, the behavior is exactly the same as MySQL:

postgres=# select * from t2 where id = cast(random()*10000 as int);
  id  |    c
------+---------
 4093 | asdasda
 9378 | asdasda
(2 rows)
postgres=# select * from t2 where id = cast(random()*10000 as int);
  id  |    c
------+---------
 5988 | asdasda
 6674 | asdasda
(2 rows)
postgres=# explain select * from t2 where id = cast(random()*10000 as int);
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..159837.60 rows=1 width=12)
   Filter: (id = ((random() * '10000'::double precision))::integer)
(2 rows)

And SQLite seems different, evaluating the random() function beforehand:

sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
16239|asdsadasdsa
sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
32910|asdsadasdsa
sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
58658|asdsadasdsa
sqlite> explain select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     12    0                    00  Start at 12
1     OpenRead       0     30182  0     2              00  root=30182 iDb=0; t2
2     Function0      0     0     3     random(0)      00  r[3]=func(r[0])
3     Cast           3     69    0                    00  affinity(r[3])
4     Function0      0     3     2     abs(1)         01  r[2]=func(r[3])
5     Divide         4     2     1                    00  r[1]=r[2]/r[4]
6     Cast           1     68    0                    00  affinity(r[1])
7     SeekRowid      0     11    1                    00  intkey=r[1]; pk
8     Copy           1     5     0                    00  r[5]=r[1]
9     Column         0     1     6                    00  r[6]=t2.c
10    ResultRow      5     2     0                    00  output=r[5..6]
11    Halt           0     0     0                    00
12    Transaction    0     0     2     0              01  usesStmtJournal=0
13    Int64          0     4     0     92233720368547  00 r[4]=92233720368547
14    Goto           0     1     0                    00

Conclusion

Be careful when using MySQL nondeterministic functions in  a “where” condition – rand() is the most interesting example – as their behavior may surprise you. Many people believe this to be a bug that should be fixed. Let me know in the comments: do you think it is a bug or not (and why)? I would also be interested to know how it works in other, non-opensource databases (Microsoft SQL Server, Oracle, etc)

PS: Finally, I’ve got a “clever” idea – what if I “trick” MySQL by using the deterministic keyword…

MySQL stored functions: deterministic vs not deterministic

So, I wanted to see how it works with MySQL stored functions if they are assigned “deterministic” and “not deterministic” keywords. First, I wanted to “trick” mysql and pass the deterministic to the stored function but use rand() inside. Ok, this is not what you really want to do!

DELIMITER $$
CREATE FUNCTION myrand() RETURNS INT
    DETERMINISTIC
BEGIN
 RETURN round(rand()*10000, 0);
END$$
DELIMITER ;

From MySQL manual about MySQL stored routines we can read:

Assessment of the nature of a routine is based on the “honesty” of the creator: MySQL does not check that a routine declared DETERMINISTIC is free of statements that produce nondeterministic results. However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used.

The result is interesting:

mysql> select myrand();
+----------+
| myrand() |
+----------+
|     4202 |
+----------+
1 row in set (0.00 sec)
mysql> select myrand();
+----------+
| myrand() |
+----------+
|     7548 |
+----------+
1 row in set (0.00 sec)
mysql> explain select * from t2 where id = myrand()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Impossible WHERE noticed after reading const tables
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------+
| Level | Code | Message                                                                        |
+-------+------+--------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select '2745' AS `id`,'asasdas' AS `c` from `test`.`t2` where 0 |
+-------+------+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from t2 where id = 4202;
+------+---------+
| id   | c       |
+------+---------+
| 4202 | asasdas |
+------+---------+
1 row in set (0.00 sec)
mysql> select * from t2 where id = 2745;
+------+---------+
| id   | c       |
+------+---------+
| 2745 | asasdas |
+------+---------+
1 row in set (0.00 sec)

So MySQL optimizer detected the problem (somehow).

If I use the NOT DETERMINISTIC keyword, then MySQL works the same as when using the rand() function:

DELIMITER $$
CREATE FUNCTION myrand2() RETURNS INT
   NOT DETERMINISTIC
BEGIN
 RETURN round(rand()*10000, 0);
END$$
DELIMITER ;
mysql> explain select * from t2 where id = myrand2()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 262208
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

 


Photo by dylan nolte on Unsplash

Dec
04
2018
--

MongoDB 4.0: Using ACID Multi-Document Transactions

mongodb 4.0 acid compliant transactions

mongodb 4.0 acid compliant transactionsMongoDB 4.0 is around, and there are a lot of new features and improvements. In this article we’re going to focus on the major feature which is, undoubtedly, the support for multi-document ACID transactions. This novelty for a NoSQL database could be seen as a way to get closer to the relational world. Well, it’s not that—or maybe not just that. It’s a way to add to the document-based model a new, important, and often requested feature to address a wider range of use cases. The document model and its flexibility should remain the best way to start building an application on MongoDB. At this stage, transactions should be used in specific cases, when you absolutely need them: for example, because your application is aware of data consistency and atomicity. Transactions incur a greater performance cost over single document writes, so the denormalized data model will continue to be optimal in many cases and this helps to minimize the need for transactions.

Single writes are atomic by design: as long as you are able to embed documents in your collections you absolutely don’t need to use a transaction. Even so, transaction support is a very good and interesting feature that you can rely on in MongoDB from now on.

MongoDB 4.0 provides fully ACID transactions support but remember:

  • multi-document transactions are available for replica set deployments only
    • you can use transactions even on a standalone server but you need to configure it as a replica set (with just one node)
  • multi-document transactions are not available for sharded cluster
    • hopefully transactions will be available from version 4.2
  • multi-document transactions are available for the WiredTiger storage engine only

ACID transactions in MongoDB 4.0

ACID properties are well known in the world of relational databases, but let’s recap what the acronym means.

  • Atomicity: a group of commands inside the transaction must follow the “all or nothing” paradigm. If only one of the commands fails for any reason, the complete transaction fails as well.
  • Consistency: if a transaction successfully executes, it will take the database from one state that is consistent to another state that is also consistent.
  • Isolation: multiple transactions can run at the same time in the system. Isolation guarantees that each transaction is not able to view partial results of the others. Executing multiple transactions in parallel must have the same results as running them sequentially
  • Durability: it guarantees that a transaction that has committed will remain persistent, even in the case of a system failure

Limitations of transactions

The support for transactions introduced some limitations:

  • a collection MUST exist in order to use transactions
  • a collection cannot be created or dropped inside a transaction
  • an index cannot be created or dropped inside a transaction
  • non-CRUD operations are not permitted inside a transaction (for example, administrative commands like createUser are not permitted )
  • a transaction cannot read or write in config, admin, and local databases
  • a transaction cannot write to system.* collections
  • the size of a transaction is limited to 16MB
    • a single oplog entry is generated during the commit: the writes inside the transaction don’t have single oplog entries as in regular queries
    • the limitation is a consequence of the 16MB maximum size of any BSON document in the oplog
    • in case of larger transactions, you should consider splitting these into smaller transactions
  • by default a transaction that executes for longer then 60 seconds will automatically expire
    • you can change this using the configuration parameter transactionLifetimeLimitSeconds
    • transactions rely on WiredTiger snapshot capability, and having a long running transaction can result in high pressure on WiredTiger’s cache to maintain snapshots, and lead to the retention of a lot of unflushed operations in memory

Sessions

Sessions were deployed in version 3.6 in order to run the retryable writes (for example) but they are very important, too, for transactions. In fact any transaction is associated with an open session. Prior to starting a transaction, a session must be created. A transaction cannot be run outside a session.

At any given time you may have multiple running sessions in the system, but each session may run only a single transaction at a time. You can run transactions in parallel according to how many open sessions you have.

Three new commands were introduce for creating, committing, and aborting transactions:

  • session.startTransaction()
    • starts a new transaction in the current session
  • session.commitTransaction()
    • saves consistently and durably the changes made by the operations in the transaction
  • session.abortTransaction()
    • the transaction ends without saving any of the changes made by the operations in the transaction

Note: in the following examples, we use two different connections to create two sessions. We do this for the sake of simplicity, but remember that you can create multiple sessions even inside a single connection, assigning each session to a different variable.

Our first transaction

To test our first transaction if you don’t have a replica set already configured let’s start a standalone server like this:

#> mongod --dbpath /data/db --logpath /data/mongo.log --fork --replSet foo

Create a new collection, and insert some data.

foo:PRIMARY> use percona
switched to db percona
foo:PRIMARY> db.createCollection('people')
{
   "ok" : 1,
   "operationTime" : Timestamp(1538483120, 1),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1538483120, 1),
      "signature" : {
         "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
         "keyId" : NumberLong(0)
       }
    }
}
foo:PRIMARY> db.people.insert([{_id:1, name:"Corrado"},{_id:2, name:"Peter"},{_id:3,name:"Heidi"}])

Create a session

foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("dcfa7de5-527d-4b1c-a890-53c9a355920d") }

Start a transaction and insert some new documents

foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.insert([{_id: 4 , name : "George"},{_id: 5, name: "Tom"}])
WriteResult({ "nInserted" : 2 })

Now read the collection from inside and outside the session and see what happens

foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }

As you might notice, since the transaction is not yet committed, you can see the modifications only from inside the session. You cannot see any of the modifications outside of the session, even in the same connection. If you try to open a new connection to the database, then you will not be able to see any of the modifications either.

Now, commit the transaction and see that you can now read the same data both inside and outside the session, as well as from any other connection.

foo:PRIMARY> session.commitTransaction()
foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

When the transaction is committed, all the data are written consistently and durably in the database, just like any typical write. So, writing to the journal file and to the oplog takes place in the same way it as for any single write that’s not inside a transaction. As long as the transaction is open, any modification is stored in memory.

Isolation test

Let’s test now the isolation between two concurrent transactions.

Open the first connection, create a session and start a transaction:

//Connection #1
foo:PRIMARY> var session1 = db.getMongo().startSession()
foo:PRIMARY> session1.startTransaction()

do the same on the second connection:

//Connection #2
foo:PRIMARY> var session2 = db.getMongo().startSession()
foo:PRIMARY> session2.startTransaction()

Update the document on connection #1 to record Heidi’s document. Add the gender field to the document.

//Connection #1
foo:PRIMARY> session1.getDatabase("percona").people.update({_id:3},{$set:{ gender: "F" }})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

Update the same collection on connection #2 to add the same gender field to all the males:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.update({_id:{$in:[1,2,4,5]}},{$set:{ gender: "M" }},{multi:"true"})
WriteResult({ "nMatched" : 4, "nUpserted" : 0, "nModified" : 4 })
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

The two transactions are isolated, each one can see only the ongoing modifications that it has made itself.

Commit the transaction in connection #1:

//Connection #1
foo:PRIMARY> session1.commitTransaction()
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

In the connection #2 read the collection:

//Connection #2
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M"  }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M"  }
{ "_id" : 5, "name" : "Tom", "gender" : "M"  }

As you can see the second transaction still sees its own modifications, and cannot see the already committed updates of the other transaction. This kind of isolation works the same as the “REPEATABLE READ” level of MySQL and other relational databases.

Now commit the transaction in connection #2 and see the new values of the collection:

//Connection #2
foo:PRIMARY> session2.commitTransaction()
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

Conflicts

When two (or more) concurrent transactions modify the same documents, we may have a conflict. MongoDB can detect a conflict immediately, even while transactions are not yet committed. The first transaction to acquire the lock on a document will continue, the second one will receive the conflict error message and fail. The failed transaction can then be retried later.

Let’s see an example.

Create a new transaction in connection #1 to update Heidi’s document. We want to change the name to Luise.

//Connection #1
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Luise"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Let’s try to modify the same document in a concurrent transaction in connection #2. Modify the name from Heidi to Marie in this case.

//Connection #2
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Marie"}})
WriteCommandError({
    "errorLabels" : [
       "TransientTransactionError"
    ],
    "operationTime" : Timestamp(1538495683, 1),
    "ok" : 0,
    "errmsg" : "WriteConflict",
    "code" : 112,
    "codeName" : "WriteConflict",
    "$clusterTime" : {
       "clusterTime" : Timestamp(1538495683, 1),
       "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
       }
     }
})

We received an error and the transaction failed. We can retry it later.

Other details

  • the individual writes inside the transaction are not retry-able even if retryWrites is set to true
  • each commit operation is a retry-able write operation regardless of whether retryWrites is set to true. The drivers retry the commit a single time in case of an error.
  • Read Concern supports snapshot, local and majority values
  • Write Concern can be set at the transaction level. The individual operations inside the transaction ignore the write concern. Write concern is evaluated during the commit
  • Read Preference supports only primary value

Conclusions

Transaction support in MongoDB 4.0 is a very interesting new feature, but it isn’t fully mature yet, there are strong limitations at this stage: a transaction cannot be larger than 16MB, you cannot use it on sharded clusters and others. If you absolutely need a transaction in your application use it. But don’t use transactions only because they are cool, since in some cases a proper data model based on embedding documents in collections and denormalizing your data could be the best solution. MongoDB isn’t by its nature a relational database; as long as you are able to model your data keeping in mind that it’s a NOSQL database you should avoid using transactions. In specific cases, or if you already have a database with strong “informal relations” between the collections that you cannot change, then you could choose to rely on transactions.

Image modified from original photo: by Annie Spratt on Unsplash

Nov
29
2018
--

MySQL High Availability: Stale Reads and How to Fix Them

solutions for MySQL Stale Reads

solutions for MySQL Stale ReadsContinuing on the series of blog posts about MySQL High Availability, today we will talk about stale reads and how to overcome this issue.

The Problem

Stale reads is a read operation that fetches an incorrect value from a source that has not synchronized an update operation to the value (source Wiktionary).

A practical scenario is when your application applies INSERT or UPDATE data to your master/writer node, and has to read it immediately after. If this particular read is served from another server in the replication/cluster topology, the data is either not there yet (in case of an INSERT) or it still provides the old value (in case of an UPDATE).

If your application or part of your application is sensitive to stale reads, then this is something to consider when implementing HA/load balancing.

How NOT to fix stale reads

While working with customers, we have seen a few incorrect attempts to fix the issue:

SELECT SLEEP(X)

The most common incorrect approach that we see in Percona support is when customers add a sleep between the write and the read. This may work in some cases, but it’s not 100% reliable for all scenarios, and it can add latency when there is no need.

Let’s review an example where by the time you query your slave, the data is already applied and you have configured your transaction to start with a SELECT SLEEP(1). In this case, you just added 1000ms latency when there was no need for it.

Another example could be when the slave is lagging behind for more than whatever you configured as the parameter on the sleep command. In this case, you will have to create a login to keep trying the sleep until the slave has received the data: potentially it could take several seconds.

Reference: SELECT SLEEP.

Semisync replication

By default, MySQL replication is asynchronous, and this is exactly what causes the stale read. However, MySQL distributes a plugin that can make the replication semi-synchronous. We have seen customers enabling it hoping the stale reads problem will go away. In fact, that is not the case. The semi-synchronous plugin only ensures that at least one slave has received it (IO Thread has streamed the binlog event to relay log), but the action of applying the event is done asynchronously. In other words, stale reads are still a problem with semi-sync replication.

Reference: Semisync replication.

How to PROPERLY fix stale reads

There are several ways to fix/overcome this situation, and each one has its pros and cons:

1) MASTER_POS_WAIT

Consists of executing a SHOW MASTER STATUS right after your write, getting the binlog file and position, connecting on a slave, and executing the SELECT MASTER_POS_WAIT function, passing the binlog file and position as parameters. The execution will block until the slave has applied the position via the function. You can optionally pass a timeout to exit the function in case of exceeding this timeout.

Pros:

  • Works on all MySQL versions
  • No prerequisites

Cons:

  • Requires an application code rewrite.
  • It’s a blocking operation, and can add significant latency to queries in cases where a slave/node is too far behind.

Reference: MASTER_POS_WAIT.

2) WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS

Requires GTID: this is similar to the previous approach, but in this case, we need to track the executed GTID from the master (also available on SHOW MASTER STATUS).

Pros:

  • Works on all MySQL versions.

Cons:

  • Requires an application code rewrite.
  • It’s a blocking operation, can add significant latency to queries in cases where a slave/node is too far behind.
  • As it requires GTID, it only works on versions from 5.6 onwards.

Reference: WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS

3) Querying slave_relay_log_info

Consists of enabling relay_log_info_repository=TABLE and sync_relay_log_info=1 on the slave, and using a similar approach to option 1. After the write, execute  SHOW MASTER STATUS, connect to the slave, and query mysql.slave_relay_log_info , passing the binlog name and position to verify if the slave is already applying a position after the one you got from SHOW MASTER STATUS.

Pros:

  • This is not a blocking operation.
  • In cases where the slave is missing the position you require, you can try to connect to another slave and repeat the process. There is even an option to fail over back to the master if none of the slaves have the said position.

Cons:

  • Requires an application code rewrite.
  • In cases of checking multiple slaves, this can add significant latency.

Reference: slave_relay_log_info.

4) wsrep-sync-wait

Requires Galera/Percona XtraDB Cluster: Consists of setting a global/session variable to enforce consistency. This will block execution of subsequent queries until the node has applied all write-sets from it’s applier queue. It can be configured to trigger on multiple commands, such as SELECT, INSERT, and so on.

Pros:

  • Easy to implement. Built-in as a SESSION variable.

Cons:

  • Requires an application code rewrite in the event that you want to implement the solution on per session basis.
  • It’s a blocking operation, and can add significant latency to queries if a slave/node is too far behind.

Reference: wsrep-sync-wait

5) ProxySQL 2.0 GTID consistent reads

Requires MySQL 5.7 and GTID: MySQL 5.7 returns the GTID generated by a commit as part of the OK package. ProxySQL with the help of binlog readers installed on MySQL servers can keep track of which GTID the slave has already applied. With this information + the GTID received from the OK package at the moment of the write, ProxySQL will decide if it will route a subsequent read to one of the slaves/read nodes or if the master/write node will serve the read.

Pros:

  • Transparent to the application – no code changes are required.
  • Adds minimal latency.

Cons:

  • This still a new feature of ProxySQL 2.0, which is not yet GA.

Referece: GTID consistent reads.

Conclusions

Undesirable issues can arise from adding HA and distributing the load across multiple servers. Stale reads can cause an impact on applications sensitive to them. We have demonstrated various approaches you can use to overcome them.


Photo by Tim Evans on Unsplash

Nov
21
2018
--

Identifying Unused Indexes in MongoDB

mongodb index usage stats PMM visualization

Like MySQL, having too many indexes on a MongoDB collection not only affects overall write performance, but disk and memory resources as well. While MongoDB holds predictably well in scaling both reads and writes options, maintaining a heathly schema design should always remain a core character of a good application stack.

Aside from knowing when to add an index to improve query performance, and how to modify indexes to satisfy changing query complexities, we also need to know how to identify unused indexes and cut their unnecessary overhead.

First of all, you can already identify access operation counters from each collection using the

$indexStats

  (

indexStats

  command before 3.0) aggregation command. This command provides two important pieces of information: the

ops

  counter value, and

since

 , which is when the ops counter first iterated to one. It is reset when the

mongod

  instance is restarted.

m34:PRIMARY> db.downloads.aggregate( [ { $indexStats: { } } ] ).pretty()
{
	"name" : "_id_",
	"key" : {
		"_id" : 1
	},
	"host" : "mongodb:27018",
	"accesses" : {
		"ops" : NumberLong(0),
		"since" : ISODate("2018-11-10T15:53:31.429Z")
	}
}
{
	"name" : "h_id_1",
	"key" : {
		"h_id" : 1
	},
	"host" : "mongodb:27018",
	"accesses" : {
		"ops" : NumberLong(0),
		"since" : ISODate("2018-11-10T15:54:57.634Z")
	}
}

From this information, if the ops counter is zero for any index, then we can assume it has not been used either since the index was added or since the server was restarted, with a few exceptions. An index might be unique and not used at all (a uniqueness check on INSERT does not increment the ops counter). The documentation also indicates that index stats counter does not get updated by TTL indexes expiration or chunk split and migration operations.

Be aware of occasional index use

One golden rule, however, is that this type of observation based on type is subjective – before you decide to drop the index, make sure that the counter has collected for a considerable amount of time. Dropping an index that is only used once a month for some heavy reporting can be problematic.

The same information from

$indexStats

  can also be made available to PMM. By default, the

mongo_exporter

  does not include this this information but it can be enabled as an additional collection parameter.

sudo pmm-admin add mongodb:metrics --uri 127.0.0.1:27018 -- -collect.indexusage

Once enabled, we can create a custom graph for this information from any PMM dashboard, as shown below. As mentioned above, any index(es) that has zero values will not have been used for the current time range in the graph. One minor issue with the collector is that each metric does not come with the database and collection information. Consequently, we cannot filter to the collection level yet, we have an improvement request open for that.

MongoDB index usage dashboard report from percona monitoring and management

An alternative view to this information from Grafana PMM is available from the Time Series to Aggregation table panel, shown below. One advantage of having these metrics in PMM is that the data survives an instance restart. Of course, to be useful for identifying unused indexes, the retention period has to match or exceed your complete application “cycle” period.MongoDB index usage stats from PMM

Given that in a MongoDB replicaset, you can delegate data bearing member nodes to different roles, perhaps with tags and priorities. You can also have nodes with different sets of indexes. Being able to identify the sets of indexes needed at the node level allows you to optimize replication, queries, and resource usage.

More Resources

We have an introductory series of posts on MongoDB indexes available on this blog. Read Part 1 here.

You can download Percona Server for MongoDB – all Percona software is open source and free.

Nov
19
2018
--

Installing and Configuring JIT in PostgreSQL 11

JIT with PostgreSQL

JIT in PostgreSQLJust-in-time (JIT in PostgreSQL) compilation of SQL statements is one of the highlighted features in PostgreSQL 11. There is great excitement in the community because of the many claims of up to a 30% jump in performance. Not all queries and workloads get the benefit of JIT compilation. So you may want to test your workload against this new feature.

However, It is important to have a general understanding of what it does and where we can expect the performance gains. Installing PostgreSQL 11 with the new JIT compilation feature requires few extra steps and packages. Taking the time and effort to figure out how to do this shouldn’t be a reason to shy away from trying these cutting-edge features and testing a workload against the JIT feature. This blog post is for those who want to try it.

What is JIT and What it does in PostgreSQL

Normal SQL execution in any DBMS software is similar to what an interpreted language does to the source code. No machine code gets generated out of your SQL statement. But we all know that how dramatic the performance gains can be from a JIT compilation and execution of the machine code it generates. We saw the magic Google V8 engine did to JavaScript language. The quest for doing a similar thing with SQL statement was there for quite some time. But it is a challenging task.

It is challenging because we don’t have the source code (SQL statement) ready within the PostgreSQL server. The source code that needs to undergo JIT need to come from client connections and there could be expressions/functions with a different number of arguments, and it may be dealing with tables of different number and type of columns.

Generally, a computer program won’t get modified at this level while it is running, so branching-predictions are possible. The unpredictability and dynamic nature of SQL statements coming from client connections and hitting the database from time-to-time give no scope for doing advance prediction or compilation in advance. That means the JIT compiler should kick in every time the database gets an SQL statement. For this reason, PostgreSQL needs the help of compiler infrastructure like LLVM  continuously available behind. Even though there were a couple of other options, the main developer of this feature (Andres Freund) had a strong reason why LLVM was the right choice.

. In PostgreSQL 11, the JIT feature currently does:

  1. Accelerating expression evaluation: Expressions in  WHERE clauses, target lists, aggregates and projections
  2. Tuple deforming: Converting on-disk image to corresponding memory representation.
  3. In-lining: bodies of small custom functions, operators and user-defined data types are inline-ed into the expressions using them
  4. You can use compiler optimizations provided by LLVM for preparing optimized machine code.

In this blog, we are going to see how to install PostgreSQL with JIT. Just like regular PostgreSQL installations, we have two options:

  1. Get PostgreSQL from the packages in the PGDG repository
  2. Build PostgreSQL from source

Option 1. Install from PGDG repository.

Compiling from source requires us to install all compilers and tools. We might want to avoid this for various reasons. Installing packages from a PGDG repository is straightforward. On production systems or a container, you might want to install only the bare minimum required packages. Additional packages you don’t really use are always a security concern. Distributions like Ubuntu provide more recent versions of libraries and tool-sets in their default repos. However, distributions like CentOS / RHEL are quite conservative — their priority is stability and proven servers rather than cutting-edge features. So In this section of the post is mostly relevant for CentOS7/RHEL 7.

Here are the steps for the bare minimum installation of PostgreSQL with JIT feature on CentOS7

Step 1. Install PGDG repo and Install PostgreSQL server package.

This is usually the bare minimum installation if we don’t need the JIT feature.

sudo yum install https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/pgdg-centos11-11-2.noarch.rpm
sudo yum install postgresql11-server

At this stage, we can initialize the data directory and start the service if we don’t need JIT:

sudo /usr/pgsql-11/bin/postgresql*-setup initdb
sudo systemctl start postgresql-11

Step 2. Install EPEL repository

sudo yum install epel-release

Step 3. Install package for PostgreSQL with llvmjit

sudo yum install postgresql11-llvmjit

Since we have already added the EPEL repository, now the dependancy can be resolved by YUM and it can pull and install the necessary package from EPEL. Installation message contains the necessary packages.

...
Installing:
postgresql11-llvmjit      x86_64     11.1-1PGDG.rhel7     pgdg11    9.0 M
Installing for dependencies:
llvm5.0                   x86_64     5.0.1-7.el7          epel      2.6 M
llvm5.0-libs              x86_64     5.0.1-7.el7          epel      13 M
...

As we can see, there are two packages: llvm5.0 and llvm5.0-libs get installed.

Note for Ubuntu users:

As we already mentioned, Repositories of recent versions of Ubuntu contains recent versions of LLVM libraries. For example, Ubuntu 16.04 LTS repo contains libllvm6.0 by default. Moreover, PostgreSQL server package is not divided to have a separate package for jit related files. So default installation of PostgreSQL 11 can get you JIT feature also.

wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt install postgresql-11

Option 2. Building from Source

The primary means of distributing PostgreSQL is the source code. Building a minimal PostgreSQL instance requires just a C compiler. But building JIT options requires a few more things. One of the challenges you can run into is different errors during the build process due to older versions of LLVM and Clang present in the system.

Step 1. Download PostgreSQL source tarball and unpack

Tarballs are available in the repository. We can grab and unpack the latest:

curl -LO https://ftp.postgresql.org/pub/source/v11.0/postgresql-11.0.tar.bz2
tar -xvf postgresql-11.0.tar.bz2

Step 2.  Get SCL Repository and Install toolset

Latest versions of LLVM, CLang and GCC are available in SCL. We can get everything in a stretch:

sudo yum install centos-release-scl
sudo yum install llvm-toolset-7 llvm-toolset-7-llvm-devel.x86_64

Now either you can set or edit your PATH to have all new tools in PATH. I would prefer to put that into my profile file:

PATH=/opt/rh/devtoolset-7/root/usr/bin/:/opt/rh/llvm-toolset-7/root/usr/bin/:$PATH

Alternatively, we can open a shell with SCL enabled:

scl enable devtoolset-7 llvm-toolset-7 bash

We should attempt to compile the source from a shell with all these paths set.

Step 3. Install Additional libraries/tools

Based on the configuration options you want, this list may change. Consider this as a sample for demonstration purposes:

sudo yum install  readline-devel zlib-devel libxml2-devel openssl-devel

Step 4. Configure with –with-llvm option and make

Now we should be able to configure and make with our preferred options. The JIT feature will be available if the 

--with-llvm

 option is specified. For this demonstration, I am using an installation directory with my home (/home/postgres/pg11):

./configure --prefix=/home/postgres/pg11 --with-openssl --with-libxml --with-zlib --with-llvm
make
make install

Enabling JIT

You may observe that there is a new directory under the PostgreSQL’s lib folder with name

bit code

Which contains lots of files with .bc extension these are pre-generated bytecodes for LLVM for facilitating features like in-lining.

By default, the JIT feature is disabled in PostgreSQL 11. If you want to test it, you may have to enable the parameter

jit

:

postgres=# ALTER SYSTEM SET jit=on;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
postgres=# show jit;
 jit
-----
 on
(1 row)

By default, most of the simple queries won’t use JIT because of the cost. The cost is high when JIT kicks in. In case we want to test if JIT is properly configured, we can lie to PostgreSQL that that cost is very low by adjusting the parameter value. However, we should keep in mind that we are accepting negative performance gains. Let me show a quick example:

postgres=# SET jit_above_cost=5;
SET
postgres=# create table t1 (id int);
CREATE TABLE
postgres=# insert into t1 (SELECT (random()*100)::int FROM generate_series(1,800000) as g);
INSERT 0 800000
postgres=# analyze t1;
ANALYZE
postgres=# explain select sum(id) from t1;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=8706.88..8706.89 rows=1 width=8)
   ->  Gather  (cost=8706.67..8706.88 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=7706.67..7706.68 rows=1 width=8)
               ->  Parallel Seq Scan on t1  (cost=0.00..6873.33 rows=333333 width=4)
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(8 rows)

As we can see in the above example, a separate JIT section comes up in the explain plan.

We expect JIT compilation to make a difference in complex analytical queries because the overhead in JIT compilation gets compensated only if the code runs for the duration. Here is a simple aggregate query for demonstration. (I know this is not a complex query, and not the perfect example for demonstrating JIT feature):

postgres=# EXPLAIN ANALYZE SELECT COMPANY_ID,
      SUM(SHARES) TOT_SHARES,
      SUM(SHARES* RATE) TOT_INVEST,
      MIN(SHARES* RATE) MIN_TRADE,
      MAX(SHARES* RATE) MAX_TRADE,
      SUM(SHARES* RATE * 0.002) BROKERAGE
FROM TRADING
GROUP BY COMPANY_ID;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=757298.72..758741.91 rows=5005 width=138) (actual time=16992.290..17011.395 rows=5000 loops=1)
   Group Key: company_id
   ->  Gather Merge  (cost=757298.72..758466.64 rows=10010 width=138) (actual time=16992.270..16996.919 rows=15000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=756298.70..756311.21 rows=5005 width=138) (actual time=16983.900..16984.356 rows=5000 loops=3)
               Sort Key: company_id
               Sort Method: quicksort  Memory: 1521kB
               Worker 0:  Sort Method: quicksort  Memory: 1521kB
               Worker 1:  Sort Method: quicksort  Memory: 1521kB
               ->  Partial HashAggregate  (cost=755916.09..755991.16 rows=5005 width=138) (actual time=16975.997..16981.354 rows=5000 loops=3)
                     Group Key: company_id
                     ->  Parallel Seq Scan on trading  (cost=0.00..287163.65 rows=12500065 width=12) (actual time=0.032..1075.833 rows=10000000 loops=3)
 Planning Time: 0.073 ms
 Execution Time: 17013.116 ms
(15 rows)

We can switch on the JIT parameter at the session level and retry the same query:

postgres=# SET JIT=ON;
SET
postgres=# EXPLAIN ANALYZE SELECT COMPANY_ID,
      SUM(SHARES) TOT_SHARES,
      SUM(SHARES* RATE) TOT_INVEST,
      MIN(SHARES* RATE) MIN_TRADE,
      MAX(SHARES* RATE) MAX_TRADE,
      SUM(SHARES* RATE * 0.002) BROKERAGE
FROM TRADING
GROUP BY COMPANY_ID;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=757298.72..758741.91 rows=5005 width=138) (actual time=15672.809..15690.901 rows=5000 loops=1)
   Group Key: company_id
   ->  Gather Merge  (cost=757298.72..758466.64 rows=10010 width=138) (actual time=15672.781..15678.736 rows=15000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=756298.70..756311.21 rows=5005 width=138) (actual time=15661.144..15661.638 rows=5000 loops=3)
               Sort Key: company_id
               Sort Method: quicksort  Memory: 1521kB
               Worker 0:  Sort Method: quicksort  Memory: 1521kB
               Worker 1:  Sort Method: quicksort  Memory: 1521kB
               ->  Partial HashAggregate  (cost=755916.09..755991.16 rows=5005 width=138) (actual time=15653.390..15658.581 rows=5000 loops=3)
                     Group Key: company_id
                     ->  Parallel Seq Scan on trading  (cost=0.00..287163.65 rows=12500065 width=12) (actual time=0.039..1084.820 rows=10000000 loops=3)
 Planning Time: 0.072 ms
 JIT:
   Functions: 28
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 5.844 ms, Inlining 137.226 ms, Optimization 201.152 ms, Emission 125.022 ms, Total 469.244 ms
 Execution Time: 15696.092 ms
(19 rows)

Here we see a 7.7% improvement in performance. I executed this several times and found that the performance gain is consistently 7-8% for this simple query (which takes 15 seconds to execute). The gains are higher for queries with more calculations/expressions.

Summary

It is fairly simple to install and configure JIT with PostgreSQL as demonstrated above. One point we would like to highlight is that installing JIT packages and enabling the JIT feature can be done online while the database is up and running. This is because all JIT related parameters are dynamic in nature. Parameter changes can be loaded a SIGHUP signal or

SELECT pg_reload_conf()

 by the superuser. If it is not helping our workload, we can turn it off anytime. Nothing stops you from trying it in a non-production environment. We might not see a gain in small and simple queries that take less time for execution because the overhead in doing the JIT compilation can become more than executing the SQL statement. But we should expect a good gain in OLAP workload with complex queries that run for a longer duration.

Oct
30
2018
--

Percona Server for MySQL 8.0 Delivers Increased Reliability, Performance and Security

Percona Server for MySQL

Percona Server for MySQLPercona released a Release Candidate (RC) version of Percona Server for MySQL 8.0, the company’s free, enhanced, drop-in replacement for MySQL Community Edition. Percona Server for MySQL 8.0 includes all the features of MySQL Community Edition 8.0, along with enterprise-class features from Percona that make it ideal for enterprise production environments. The latest release offers increased reliability, performance and security.

Percona Server for MySQL 8.0 General Availability (GA) will be available later this year. You learn how to install the release candidate software here. Please note this release candidate is not intended nor recommended for production environments

MySQL databases remain a pillar of enterprise data centers. But as the amount of data collected continues to soar, and as the types of databases deployed for specific applications continue to expand, organizations require a powerful and cost-effective MySQL solution for their production environments. Percona meets this need with its mature, proven open source alternative to MySQL Community Edition. Percona also backs MySQL 8.0 with the support and consulting services enterprises need to achieve optimal performance and maximize the value they obtain from the software – all with lower cost and complexity.

With more than 4,550,000 downloads, Percona Server for MySQL offers self-tuning algorithms and support for extremely high-performance hardware, delivering excellent performance and reliability for production environments. Percona Server for MySQL is trusted by thousands of enterprises to provide better performance. The following capabilities are unique to Percona Server for MySQL 8.0:

  • Greater scalability and availability, enhanced backups, and increased visibility to improve performance, reliability and usability
  • Parallel doublewrite functionality for greatly improved write performance, resulting in increased speed
  • Additional write-optimized storage engine, MyRocks, which takes advantage of modern hardware and database technology to reduce storage requirements, maintenance costs, and increase ROI in both on-premises and cloud-based applications, delivered with a MySQL-compatible interface.
  • Enhanced encryption functionality – including integration with Hashicorp Vault to simplify the management of encryption keys – for increased security
  • Advanced PAM-based authentication, audit logging, and threadpool scalability – enterprise-grade features available in Percona Server for MySQL without a commercial license

Percona Server for MySQL 8.0 also contains all the new features introduced in MySQL Community Edition 8.0, including:

  • Greater Reliability – A new transactional data dictionary makes recovery from failure easier, providing users with a higher level of comfort that data will not be lost. The transactional data dictionary is now crash-safe and centralized. In addition, support for atomic Data Definition Language (DDL) statements ensures that all operations associated with a DDL transaction are committed or rejected as a unit.
  • Enhanced Performance – New functions and expressions (along with Percona’s parallel doublewrite buffer) improve overall performance, allowing users to see their data more quickly. Enhanced JSON functionality improves document storage and query capabilities, and the addition of Window functions provides greater flexibility for aggregation queries. Common Table Expressions (CTE) enable improved query syntax for complex queries.
  • Increased Security – The ability to collect a typical series of permission grant statements for a user into a defined role and then apply that role to a user in MySQL makes the database environment more secure from outside attack by allowing administrators to better manage access permissions for authorized users. SQL Roles also enable administrators to more easily and efficiently determine a set of privileges for multiple associated users, saving time and reducing errors.
  • Expanded Queries – Percona Server for MySQL 8.0 provides support for spatial data types and indexes, as well as for Spatial Reference System (SRS), the industry-standard method for geospatial lookup.

Learn more about the Percona Server for MySQL 8.0 RC release here.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com