Jun 25, 2018

Webinar Tues 6/26: MariaDB Server 10.3

Please join Percona's Chief Evangelist, Colin Charles, on Tuesday, June 26th, 2018, as he presents MariaDB Server 10.3 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

MariaDB Server 10.3 is out. It has some interesting features around system-versioned tables, Oracle compatibility, column compression, an integrated SPIDER engine, and MyRocks. Learn about what's new, how you can use it, and how it is different from MySQL.
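As a taste of one of those features, here is a minimal sketch of a system-versioned table in MariaDB 10.3 (the table name, columns, and timestamp are purely illustrative, not from the webinar):

-- Every UPDATE/DELETE keeps the old row version around
CREATE TABLE accounts (
    id      INT PRIMARY KEY,
    balance DECIMAL(10,2)
) WITH SYSTEM VERSIONING;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Query the row as it looked at an earlier point in time
SELECT * FROM accounts
FOR SYSTEM_TIME AS OF TIMESTAMP '2018-06-01 00:00:00'
WHERE id = 1;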

Register Now

Colin Charles

Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team of MariaDB Server in 2009, worked at MySQL from 2005, and has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and OpenOffice.org projects. He is well known within open source communities in APAC and has spoken at many conferences; an experienced technologist, his work in the community spans nearly two decades. He pays attention to emerging technologies from an integration standpoint and is a prolific speaker at industry-wide conferences, delivering talks and tutorials with ease. His interests include application development, systems administration, database development, migration, and Web-based technologies, and he is considered an expert in Linux and Mac OS X usage, administration, and roll-outs. Specialties: MariaDB, MySQL, Linux, open source, community, and speaking and writing to technical audiences as well as business stakeholders.


Nov 20, 2017

InnoDB Page Compression: the Good, the Bad and the Ugly

In this blog post, we'll look at some of the facets of InnoDB page compression.

Somebody recently asked me about the best way to handle JSON data compression in MySQL. I took a quick look at InnoDB page compression and wanted to share my findings.

There is also some great material on this topic that was prepared and presented by Yura Sorokin at Percona Live Europe 2017: https://www.percona.com/live/e17/sessions/percona-xtradb-compressed-columns-with-dictionaries-an-alternative-to-innodb-table-compression. Yura also implemented Compressed Columns in Percona Server.

First, the good part.

InnoDB page compression is actually really easy to use and provides a decent compression ratio. To use it, I just ran:

CREATE TABLE commententry (...) COMPRESSION="zlib";

and that's all. By the way, for my experiment I used the subset of Reddit comments stored in JSON (described here: Big Dataset: All Reddit Comments – Analyzing with ClickHouse).

This method got me a compressed table of 3.9GB. Compare this to 8.4GB for an uncompressed table and it’s about a 2.15x compression ratio.
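If you prefer to verify the effect from inside the server rather than on the filesystem, the InnoDB tablespace metadata exposes both the logical file size and the space actually allocated after hole punching. A minimal sketch, assuming MySQL or Percona Server 5.7, where this view is called INNODB_SYS_TABLESPACES:

SELECT NAME, FILE_SIZE, ALLOCATED_SIZE
FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES
WHERE NAME LIKE '%commententry%';

FILE_SIZE corresponds to the apparent (uncompressed) file size, while ALLOCATED_SIZE reflects the blocks actually used on disk.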

Now, the bad part.

As InnoDB page compression uses “hole punching,” the standard Linux utils do not always properly support files created this way. In fact, to see the size “3.9GB” I had to use

du --block-size=1 tablespace_name.ibd

as the standard

ls -l tablespace_name.ibd

shows the wrong size (8.4GB). There is a similar limitation on copying files. The standard way

cp old_file new_file

may not always work, and to be sure I had to use

cp --sparse=always old_file new_file

instead.

Speaking about copying, here’s the ugly part.

The actual time to copy the sparse file was really bad.

On a fairly fast device (a Samsung SM863), copying the sparse file mentioned above in its compressed size of 3.9GB took 52 minutes! That’s shocking, so let me repeat it again: 52 minutes to copy a 3.9GB file on an enterprise SATA SSD.

By comparison, copying the regular 8.4GB file takes 9 seconds! That's 9 seconds versus 52 minutes.

To be fair, the NVMe device (Intel® SSD DC D3600) handles sparse files much better. It took only 12 seconds to copy the same sparse file on this device.

Having considered all this, it is hard to recommend that you use InnoDB page compression for serious production. Well, unless you power your database servers with NVMe storage.

For JSON data, the Compressed Columns feature in Percona Server for MySQL should work quite well, using a compression dictionary to store the JSON keys – give it a try!
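As a rough sketch of what that looks like in Percona Server for MySQL 5.7 (the dictionary contents and table name below are illustrative, not taken from the experiment):

-- A shared dictionary of frequently repeated JSON keys
CREATE COMPRESSION_DICTIONARY reddit_keys ('"author" "body" "created_utc" "subreddit" "score"');

CREATE TABLE commententry_cc (
    id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    doc TEXT COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY reddit_keys
) ENGINE=InnoDB;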

May 05, 2015

MongoDB’s flexible schema: How to fix write amplification

Being schemaless is one of the key features of MongoDB. On the bright side, this allows developers to easily modify the schema of their collections without waiting for the database to be ready to accept a new schema. However, being schemaless is not free, and one of the drawbacks is write amplification. Let's focus on that topic.

Write amplification?

The link between schema and write amplification is not obvious at first sight. So let’s first look at a table in the relational world:

mysql> SELECT * FROM user LIMIT 2;
+----+-------+------------+-----------+-----------+----------------------------------+---------+-----------------------------------+------------+------------+
| id | login | first_name | last_name | city      | country                          | zipcode | address                           | password   | birth_year |
+----+-------+------------+-----------+-----------+----------------------------------+---------+-----------------------------------+------------+------------+
|  1 | arcu  | Vernon     | Chloe     | Paulista  | Cook Islands                     | 28529   | P.O. Box 369, 1464 Ac Rd.         | SSC44GZL5R |       1970 |
|  2 | quis  | Rogan      | Lewis     | Nashville | Saint Vincent and The Grenadines | H3T 3S6 | P.O. Box 636, 5236 Elementum, Av. | TSY29YRN6R |       1983 |
+----+-------+------------+-----------+-----------+----------------------------------+---------+-----------------------------------+------------+------------+

As all records have exactly the same fields, the field names are stored once in a separate file (the .frm file). So the field names are metadata, while the value of each field for each record is, of course, data.

Now let’s look at an equivalent collection in MongoDB:

[
        {
                "login": "arcu",
                "first_name": "Vernon",
                "last_name": "Chloe",
                "city": "Paulista",
                "country": "Cook Islands",
                "zipcode": "28529",
                "address": "P.O. Box 369, 1464 Ac Rd.",
                "password": "SSC44GZL5R",
                "birth_year": 1970
        },
        {
                "login": "quis",
                "first_name": "Rogan",
                "last_name": "Lewis",
                "city": "Nashville",
                "country": "Saint Vincent and The Grenadines",
                "zipcode": "H3T 3S6",
                "address": "P.O. Box 636, 5236 Elementum, Av.",
                "password": "TSY29YRN6R",
                "birth_year": 1983
        }
]

One difference with a table in the relational world is that MongoDB doesn't know which fields each document will have. Therefore field names are data, not metadata, and they must be stored with each document.

Then the question is: how large is the overhead in terms of disk space? To get an idea, I inserted 10M such records into an InnoDB table (adding an index on password and on birth_year to make the table look like a real table): the size on disk is around 1.4GB.
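For reference, here is a sketch of the InnoDB table shape used for this test; the column names come from the sample rows above, but the exact data types are assumptions:

CREATE TABLE user (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    login      VARCHAR(50),
    first_name VARCHAR(50),
    last_name  VARCHAR(50),
    city       VARCHAR(100),
    country    VARCHAR(100),
    zipcode    VARCHAR(20),
    address    VARCHAR(255),
    password   VARCHAR(50),
    birth_year SMALLINT,
    KEY idx_password (password),
    KEY idx_birth_year (birth_year)
) ENGINE=InnoDB;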

I also inserted the exact same 10M records in a MongoDB collection using the regular MMAPv1 storage engine, again adding an index on password and on birth_year, and this time the size on disk is … 2.97GB!

Of course, it is not an apples-to-apples comparison, as the InnoDB storage format and the MongoDB storage format are not identical. However, a 100% difference is still significant.

Compression

One way to deal with write amplification is to use compression. With MongoDB 3.0, the WiredTiger storage engine is available, and one of its benefits is compression (default algorithm: snappy). Percona TokuMX also has built-in compression, using zlib by default.
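For instance, a minimal sketch of starting mongod 3.0 with WiredTiger and choosing the collection block compressor (the dbpath is illustrative):

# snappy is the default; zlib trades CPU for a better compression ratio
mongod --storageEngine wiredTiger \
       --wiredTigerCollectionBlockCompressor zlib \
       --dbpath /data/db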

Rebuilding the collection with 10M documents and the 2 indexes now gives the following results:
WiredTiger: 1.14GB
TokuMX: 736MB

This is a 2.5x to 4x data size reduction, pretty good!

WiredTiger also provides zlib compression, and in this case the collection is only 691MB. However, CPU usage is much higher compared to snappy, so zlib will not be usable in all situations.

Conclusion

MongoDB's schemaless design is attractive, but it comes with several tradeoffs. Write amplification is one of them, and using either WiredTiger with MongoDB 3.0 or Percona TokuMX is a very simple way to fix the issue.

