Aug
30
2013
--

Stop using exclamation points!!!!

exclamationAh, the exclamation point, or exclamation mark as some call it… I believe it is seriously overused, and I like to think that most editors would agree with me. Don’t you hate reading a book where every piece of dialog ends in one? Commonly it us used to indicate a raised voice, shouting, surprise, alarm, etc. Lazy! (See what I did there?) As a writer, if you cannot show the reader a character’s emotions or tone of voice in any other way, then you aren’t being creative. Context should indicate whether they are whispering or shouting.

Am I being unfair? As a reader, do you believe that is exactly what the exclamation point is for – to indicate an exclamation? The dictionary would support this claim:

Exclamation Point
noun
1. the sign (!) used in writing after an exclamation.
2. this mark sometimes used in writing two or more times in succession to indicate intensity of emotion, loudness, etc.: Long live the Queen!!
3. this mark sometimes used without accompanying words in writing direct discourse to indicate a speaker’s dumbfounded astonishment: “His wife just gave birth to quintuplets.” ( ! )

All right, but where does it stop? Often I see double marks, e.g. “Stop it!!” So how do we interpret this? If one mark is shouting, what does two mean? Believe it or not I have read a published book (whose author shall remain nameless) who frequently used up to 5 (yes five!!!!!) exclamation points. Now we’re just getting silly. Some authors fall into grammar traps too, such as “What the hell are you doing here?!” I do believe those two punctuation marks should never appear together.

There is even a term for the overuse of the exclamation point: Bangorrhea:

1. Overusing exclamation points in a vain and failing attempt to make your writing sound more exciting. Trying to put more “bang” in your prose, but looking instead like you have exclamation point diarrhea.

My goal is to limit myself to one a page, and then only during intense dialog. I can happily go chapters at a time without using one. In the editing stage I search for them and play a game of seeing how many I can remove and still get the meaning across. Does this mean the reader has to work harder at understanding? Yes, but I don’t regard that as a bad thing, since your writing can be more nuanced without resorting to the loud “bang”. Avoid using them frequently, or they diminish in effect. Ask yourself whether the sentence is a true exclamation or is just a statement. In doubt err on the side of the period.

The simple truth is:

If everything is emphasized, nothing is.

Let’s end on a quote from an author who knows a thing or two:

“Cut out all those exclamation marks. An exclamation mark is like laughing at your own jokes.”
— F. Scott Fitzgerald —

What do you think? Am I being unfair to a perfectly acceptable element of punctuation. Please share your comments below.

Aug
29
2013
--

Considering TokuDB as an engine for timeseries data

TokuDBI am working on a customer’s system where the requirement is to store a lot of timeseries data from different sensors.

For performance reasons we are going to use SSD, and therefore there is a list of requirements for the architecture:

  • Provide high insertion rate
  • Provide a good compression rate to store more data on expensive SSDs
  • Engine should be SSD friendly (less writes per timeperiod to help with SSD wear)
  • Provide a reasonable response time (within ~50 ms) on SELECT queries on hot recently inserted data

Looking on these requirements I actually think that TokuDB might be a good fit for this task.

There are several aspects to consider. This time I want to compare TokuDB vs InnoDB on an initial load time and space consumption.

Let’s assume the schema is following

CREATE TABLE `sensordata` (
  `ts` int(10) unsigned NOT NULL DEFAULT '0',
  `sensor_id` int(10) unsigned NOT NULL,
  `data1` double NOT NULL,
  `data2` double NOT NULL,
  `data3` double NOT NULL,
  `data4` double NOT NULL,
  `data6` double NOT NULL,
  `cnt` int(10) unsigned NOT NULL,
  PRIMARY KEY (`sensor_id`,`ts`)
)

where sensor_id is in a range from 1 to about 1000 and ts is monotonically increasing timestamp.

This schema exploits both TokuDB and InnoDB clustering primary key, and all inserts are “almost” sequential, which guarantee that all inserts will not require disk access and work with data in memory.
The same for SELECTS – select queries on the most recent time periods will be executed only by a memory access.

I am doing this research on the Dell PowerEdge R420 box with 48GB of memory (40GB for InnoDB buffer pool size, and default memory allocation for TokuDB, which is 24GB for tokudb cache). The storage is a very fast PCI-e Flash card.

The test export CSV file, suitable for LOAD DATA INFILE is 40GB in size and contains over 1 bln records (1.238.201.948 exactly)

MySQL Versions:

  • For InnoDB tests I used Percona Server 5.6-RC2
  • For TokuDB tests I used mariadb-5.5.30-tokudb-7.0.4 from Tokutek website

So, first, let’s load data into InnoDB, again, I am using LOAD DATA INFILE statement

  • InnoDB, no compression. Load time is 1 hour 26 min 25.77 sec, final table size is 90GB
  • InnoDB, 8K compression. Load time 3 hours 26 min 17.06 sec, the table size is 45GB
  • InnoDB, 4K compression. Load time 17 hours 23 min 43.48 sec, the table size is 26GB

Now for TokuDB:

  • TokuDB, default compression. Load time 33 min 1.18 sec, the table size on disk is 10GB
  • TokuDB, tokudb_small table format. Load time 37 min 2.34 sec, the table size is 4.6GB

So TokuDB is the obvious leader in both load time and compression. Of course just these are not enough, and now we need to see the performance of further INSERTs and SELECTs queries. This is what I am running right now and will post the results when I have them.

The post Considering TokuDB as an engine for timeseries data appeared first on MySQL Performance Blog.

Aug
28
2013
--

Testing Intel, Samsung & SanDisk SATA SSD

While working on the service architecture for one of our projects, I considered several SATA SSD options as the possible main storage for the data. The system will be quite write intensive, so the main interest is the write performance on capacities close to full-size storage.

After some research I picked several candidates (I show prices obviously actual for the date of publishing this post):

  • Samsung 840 Pro, 512GB, the current price $450
  • Samsung 840 Evo, 750GB, price $525. I consider these new SSD because of availability of models with big capacity, 750GB-1TB
  • Intel DC S3500, 480GB, price $650. This model is more expensive in $/GB, as Intel positions these devices for data center usage
  • SanDisk Extreme II 480 GB, price $450. This device caught my attention by good results in some third-party benchmarks

The devices are all attached to a LSI MegaRAID SAS 9260-4i raid controller with 512MB cache, and configured as individual RAID0 virtual devices.

Testing workload is: random writes, 16KiB block size sending in 8 threads as asynchronous IO on ext4 file system. Asynchronous IO is used in the most recent MySQL/InnoDB and theoretically it shows the best possible throughput.

For the Samsung 840 Evo I used 600GiB file size, and for the rest 350GiB, to emulate 70-80% filling

The results in a timeline form to see the stability of the performance:
timeline

The results in a jitter form with a median throughput:
jitter

My thoughts on these results:

  • I frankly expected more from Samsung devices, but they ended up below 50MiB/sec in random writes
  • SanDisk is definitely attractive for absolute throughput, but significant instability is a concern
  • Intel shows a decent performance, but also it is the most expensive

So I am going to run further tests on Intel and SanDisk to find out which one is more suitable for our needs.

For the reference the script I used for tests:

sz=350G
sysbench --test=fileio --file-total-size=$sz --file-num=64 prepare
sysbench --test=fileio --file-total-size=$sz --file-test-mode=rndwr --max-time=180000 --max-requests=0 --num-threads=8 --rand-init=on --file-num=64 --file-io-mode=async --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=10 run

The post Testing Intel, Samsung & SanDisk SATA SSD appeared first on MySQL Performance Blog.

Aug
27
2013
--

Talking Drupal #012 – Content Approval Workflow

Show Topics

  • Content authoring, access & approval
  • Authoring: permissions by content type, taxonomy, role
  • Content Access
  • What does Drupal do out of the box?
  • Taxonomy Access Control
  • Content Access
  • Node Access
  • Field Permissions 
  • Organic Groups
  • Tips
  • Approval: editorial state, draft mode, live mode

Module of the Week 

  • Masquerade – https://drupal.org/project/masquerade

Modules

  • Contact Access Control – http://drupal.org/project/content_access
  • Field Permissions – https://drupal.org/project/field_permissions
  • Taxonomy Access Control – https://drupal.org/project/taxonomy_access
  • Node Access – https://drupal.org/project/node_access
  • Fast Permission Administration- https://drupal.org/project/fpa
  • Organic Groups – https://drupal.org/project/OG
  • Revisioning – https://drupal.org/project/revisioning
  • View unpublished – https://drupal.org/project/view_unpublished
  • Node Access User Reference – https://drupal.org/project/nodeaccess_userreference
  • Devel Module – https://drupal.org/project/devel
  • Role Assign – https://drupal.org/project/roleassign
  • Custom Permissions – https://drupal.org/project/config_perms
  • Masquerade – https://drupal.org/project/masquerade

Hosts

  • Stephen Cross – www.ParallaxInfoTech.com @stephencross
  • Jason Pamental – www.hwdesignco.com @jpamental
  • John Picozzi – www.RubicDesign.com @johnpicozzi 
  • Nic Laflin – www.nLightened.net @nicxvan
Aug
27
2013
--

Talking Drupal #012 – Content Approval Workflow

Show Topics

  • Content authoring, access & approval
  • Authoring: permissions by content type, taxonomy, role
  • Content Access
  • What does Drupal do out of the box?
  • Taxonomy Access Control
  • Content Access
  • Node Access
  • Field Permissions 
  • Organic Groups
  • Tips
  • Approval: editorial state, draft mode, live mode

Module of the Week 

  • Masquerade – https://drupal.org/project/masquerade

Modules

  • Contact Access Control – http://drupal.org/project/content_access
  • Field Permissions – https://drupal.org/project/field_permissions
  • Taxonomy Access Control – https://drupal.org/project/taxonomy_access
  • Node Access – https://drupal.org/project/node_access
  • Fast Permission Administration- https://drupal.org/project/fpa
  • Organic Groups – https://drupal.org/project/OG
  • Revisioning – https://drupal.org/project/revisioning
  • View unpublished – https://drupal.org/project/view_unpublished
  • Node Access User Reference – https://drupal.org/project/nodeaccess_userreference
  • Devel Module – https://drupal.org/project/devel
  • Role Assign – https://drupal.org/project/roleassign
  • Custom Permissions – https://drupal.org/project/config_perms
  • Masquerade – https://drupal.org/project/masquerade

Hosts

  • Stephen Cross – www.ParallaxInfoTech.com @stephencross
  • Jason Pamental – www.hwdesignco.com @jpamental
  • John Picozzi – www.RubicDesign.com @johnpicozzi 
  • Nic Laflin – www.nLightened.net @nicxvan
Aug
27
2013
--

Percona Server 5.1.71-14.9 is now available!

Percona Server version 5.1.71-14.9

Percona Server version 5.1.71-14.9

Percona is glad to announce the release of Percona Server 5.1.71-14.9 on August 27th, 2013 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.1.71, including all the bug fixes in it, Percona Server 5.1.71-14.9 is now the current stable release in the 5.1 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.1.71-14.9 milestone at Launchpad.

Bugs Fixed:

  • The buffer pool mutex split patch implemented in Percona Server could cause a race condition, involving a dirty compressed page block for which there is an uncompressed page image in the buffer pool, that could lead to a server crash. Bug fixed #1086680.
  • The buffer pool mutex split patch implemented in Percona Server could cause server crash with an assertion error on read-write workloads with compressed tables in debug builds. Bug fixed #1103850.
  • If binary log was enabled, Fake Changes transactions were binlogged. This could lead to data corruption issues with deeper replication topologies. Bug fixed #1190580.
  • Changes made to the RPM scripts for previous Percona Server version caused installer to fail if there were different datadir options in multiple configuration files. Bug fixed #1201036.
  • Percona Server used to acquire the buffer pool LRU list mutex in the I/O completion routine, for the compressed page flush list flushes where it was not necessary. Bug fixed #1181269.

Other bug fixes: bug fixed #1188162 and bug fixed #1203308.

Release notes for Percona Server 5.1.71-14.9 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.1.71-14.9 is now available! appeared first on MySQL Performance Blog.

Aug
27
2013
--

Percona Server 5.5.33-31.1 is now available

Percona Server version 5.5.33-31.1

Percona Server version 5.5.33-31.1

Percona is glad to announce the release of Percona Server 5.5.33-31.1 on August 27th, 2013 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.5.33, including all the bug fixes in it, Percona Server 5.5.33-31.1 is now the current stable release in the 5.5 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.5.33-31.1 milestone at Launchpad.

Bugs Fixed:

  • The buffer pool mutex split patch implemented in Percona Server could cause a race condition, involving a dirty compressed page block for which there is an uncompressed page image in the buffer pool, that could lead to a server crash. Bug fixed #1086680.
  • Changes made to the RPM scripts for previous Percona Server version caused installer to fail if there were different datadir options in multiple configuration files. Bug fixed #1201036.
  • If binary log was enabled, Fake Changes transactions were binlogged. This could lead to data corruption issues with deeper replication topologies. Bug fixed #1190580.
  • Percona Server shared-compat package was being built with the 5.1.66 version of the client, which didn’t work with OpenSSL. Fixed by building the shared-compat package with a more recent version. Bug fixed #1201393.
  • Fixed the upstream bug #69639 which caused compile errors for Percona Server with DTrace version Sun D 1.11 provided by recent SmartOS versions.. Bug fixed #1196460.
  • Fixed the regression introduced in Percona Server 5.5.32-31.0, where server wouldn’t be able to start if Atomic write support for Fusion-io devices was enabled. Bug fixed #1214735.
  • Percona Server used to acquire the buffer pool LRU list mutex in the I/O completion routine for the compressed page flush list flushes where it was not necessary. Bug fixed #1181269.

Other bugs fixed: bug fixed #1189743, bug fixed #1188162 and bug fixed #1203308.

Release notes for Percona Server 5.5.33-31.1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.33-31.1 is now available appeared first on MySQL Performance Blog.

Aug
26
2013
--

MySQL Security Webinar: Follow-up Q&A

MySQL Security Webinar: Follow-up Q&AThanks to everyone who attended last week’s webinar on MySQL security; hopefully you’ve all gone out and set SELinux to enforcing mode if you weren’t already running that way. If you weren’t able to attend, the recording and slides are available for viewing/download. But now, without further ado, here are the questions which we didn’t have time to cover during the presentation.

Q: Do you have a favorite software firewall you recommend that I can run on an EC2 instance in front of my MySQL server?
A: I’d probably just do this with iptables. Any of the other Linux-based software firewall packages are all going to be wrappers around iptables anyway. However, if your MySQL server is already in EC2, you’re going to be better served by Amazon security groups as your front-line firewall. You can also run inside a VPC for some additional isolation.

Q: What do you use for disk encryption in the cloud environment?
A: I might use GPG if I need to encrypt specific files, or just encrypt data in the database via the MySQL functions or in my application, and there’s also no reason why you can’t use dm-crypt/LUKS in the cloud if you’re willing to accept that mounting the device will require some manual intervention, but for the most part, I think disk encryption for servers is pretty useless, because once you enter the key to unlock and mount the volume, anyone that can get access to that machine has full access to the data. I’ve heard of Gazzang being a possible solution here, but I’ve not personally looked at it, so I can’t speak to its performance or suitability.

Q: How much overhead does SSL put on a heavily-loaded MySQL box?
A: That depends on a lot of factors, such as the cipher in use and the kinds of load you’re running. The slowest part of SSL is the connection setup, so you’re going to see a lot less overhead for replication, connection pooling, or long-running transactions than you would from an application with rapidly connects to the database and then disconnects. There’s a graph on the yaSSL site which shows roughly a 25% penalty at 32 threads, but their benchmarks are from 2011 and they were run on someone’s laptop, so I have my doubts as to how well that translates to modern server-grade hardware.

Q: Can MySQL do LDAP/Kerberos login?
A: You can use the PAM authentication plugin (either the commercial one from Oracle or Percona’s open source version) to authenticate against an LDAP database.

Thanks again for attending and submitting your questions; security is one of those massive topics where it’s only possible to scratch the surface in a one-hour webinar. Later this year I may do a follow-on presentation wherein we skip over all of the system/network/application details and do a deeper dive only on MySQL security-related tweaks and best/worst practices, so stay tuned!

The post MySQL Security Webinar: Follow-up Q&A appeared first on MySQL Performance Blog.

Aug
25
2013
--

Myth: Select * is bad

This is one of the most persistent myths I’ve seen in the field. It’s there for decades. If a myth is alive that long there must be some truth behind it. So, what could be bad about select *? Let’s have a closer look.

We all know that selecting “*” is just a short-hand for selecting all columns. Believe it or not, this makes a big difference to many people. So, lets first rephrase the question using this “finding”:

Why is it bad to select all columns?

In fact, there are a few very good reasons it is bad to select all columns if you don’t need them. And they all boil down to performance. What is surprising, however, is that the performance impact can be huge.

Up to 100x slower when preventing an Index-Only Scan

Broadly speaking, the less columns you ask for, the less data must be loaded from disk when processing your query. However, this relationship is non-linear.

Quite often, selecting from a table involves two steps: (1) use an index to find the address where the selected rows are stored; (2) load the selected rows from the table. Now imagine that you are just selecting columns that are present in the index. Why should the database still perform the second step? In fact, most databases don’t. They can process your query just with the information stored in the index—hence index-only scan.

But why should an index-only scan be 100 times faster? Simple: an ideal index stores the selected rows next to each other. It’s not uncommon that each index page holds about 100 rows—a ballpark figure; it depends on the size of the indexed columns. Nonetheless, it means that one IO operation might fetch 100 rows. The table data, on the other hand, is not organized like that (exceptions). Here it is quite common that a page just contains one of the selected rows—along with many other rows that are of no interest for the particular query. So, the reason an Index-Only Scan can be 100 times faster is that an index access can easily deliver 100 rows per IO while the table access typically just fetches a few rows per IO.

If you select a single column that’s not in the index, the database cannot do an index-only scan. If you select all columns, … , well I guess you know the answer.

Further, some databases store large objects in a separate place (e.g., LOBs in Oracle). Accessing those causes an extra IO too.

Up to 5x slower when bloating server memory footprint

Although databases avoid to store the result in the server’s main memory—instead the deliver each row after loading and forget about it again—it is sometimes inevitable. Sorting, for example, needs to keep all rows—and all selected columns—in memory to do the job. Once again, the more columns you select, the more memory the database needs. In the worst case, the database might even need to do an external sort on disk.

However, most database are extremely well tuned for this kind of workload. Although I’ve seen a sorting speed-up of factor two quite often—just by removing a few unused columns—I cannot remember having got more than factor five. However, it’s not just sorting, hash joins are rather sensitive to memory bloat too. Don’t know what that is? Please read this article.

These are just the two top issues from database perspective. Remember that the client needs to process the data too—which might put a considerable load on garbage collection.

Now that we have established a common understanding of why selecting everything is bad for performance, you may ask why it is listed as a myth? It’s because many people think the star is the bad thing. Further they believe they are not committing this crime because their ORM lists all columns by name anyway. In fact, the crime is to select all columns without thinking about it—and most ORMs readily commit this crime on behalf of their users.

The reason select * actually is bad—hence the reason the myth is very resistant—is because the star is just used as an allegory for “selecting everything without thinking about it”. This is the bad thing. But if you need a more catch phrase to remember the truth behind this myth, take this:

It’s not about the star, stupid!

If you like my way to explain things, you’ll love SQL Performance Explained.

Update 2013-11-03 – Is the star itself also bad?

Besides the performance issues mentioned above that are not caused by the star (asterisk) itself, the star itself might still cause other trouble. E.g. with software that expects the columns in a specific order when you add or drop a column. However, from my observation I’d say these issues are rather well understood in the field and usually easily identify (software stops working) fixed.

The focus of the article is on very subtle issues which are hardly understood, hard to find, and often even hard to fix (e.g. when using ORM tools). The main goal of this article is to stop people thinking about the star itself. Once people start to name the wanted columns explicitly to gain the performance benefit explained above, the issues caused by the star itself are also gone. Hence, I’ve felt no reason to add a discussion about these issues here—that’s just a distraction from the arguments that I wanted to explain with the article.

Original title and author: “Myth: Select * is bad” by Markus Winand.

Aug
20
2013
--

Amazon RDS with MySQL 5.6 – Configuration Variables

One longstanding complaint I have heard for the past several years, and still hear today, is that Amazon’s Relational Database Service (RDS) does not allow the configuration flexibility as running MySQL in an ec2 instance. While true, this ignores the consistent work that Amazon has done to provide access to the most important configuration variables needed to tune a MySQL instance (after all, how relevant is it for a customer to set bind_address in an RDS instance).

Let’s take a look visually:

Screen Shot 2013-08-18 at 11.39.50 AM

MySQL provides 523 options (35 of them NDB specific, so aren’t relevant to RDS), while RDS provides (via the web UI) 283, with 58 of those being immutable (things like basedir, datadir, and a variety of other variables).

So, what’s missing from the RDS configuration? The system variables can be roughly grouped into the following categories:

  • Audit Logs
  • Memcached Daemon
  • Binary Log Settings
  • Performance Schema
  • Relay Log Settings
  • Semi-Sync Replication
  • SSL
  • Thread Pool
  • Other

Let’s look at the relevance of these individually:

Audit Logs

The Audit log PlugIn is a commercial extension not available in the MySQL Community Edition offered by Amazon, so it’s not relevant.

Memcached Daemon

RDS is designed for relational database access, not key-value store access. If you need Memcached functionality, check out Amazon’s ElastiCache

Binary Log Settings

Binary logging is enabled by default on RDS, you just lose the ability to:

  • Use the old version of binary logging (pre-5.6.6)
  • Specify where the binlogs are saved or their base name
  • Control the maximum binary log size

The flexibility of controlling the maximum binary log size would be helpful in some workloads, but isn’t something that is generally tuned in the majority of engagements that I have been a part of.

Performance Schema

That these configuration parameters are not available via the Web UI is a bit of a misnomer. It is possible to enable/disable the Performance Schema and then control the collection via SQL as usual.

Relay Log Settings

Like the Binary Log settings, there is not much that we would want to tune here. The standard settings are appropriate for general workloads.

Semi-Sync Replication

Amazon RDS has a proprietary failover solution and block level replication across availability zones. It is not surprising that this functionality is not provided by default in the Web UI, but certainly something that could be useful for a small cross section of workloads.

SSL

For companies with strict security needs, the lack of SSL may be a deal breaker for using RDS. But, depending upon the security policies in place, can be worked around by using Amazon’s VPC with SSL. For many companies, though, this may not play a role in the decision process. I find it hard to believe that, with Amazon’s resources, providing this is an insurmountable technical challenge. Perhaps we’ll see this becoming available in future RDS releases.

Thread Pool

The Thread Pool PlugIn is a commercial extension not available in the Community Edition of MySQL, so is not relevant to what RDS provides. There are, however, solutions in both Percona Server and MariaDB that Amazon may choose to port in the future.

Conclusion

Amazon still has a ways to go to be fully compatible with configuration variables, but by and large the important ones are available to customers, with minor exception (I’m looking at you, innodb_log_file_size).

I’ll be talking about this topic in more detail, as well as a variety of other RDS 5.6-specific issues, in my upcoming Webinar on August 28 titled “Running MySQL 5.6 on Amazon RDS.”

The post Amazon RDS with MySQL 5.6 – Configuration Variables appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com