May
08
2018
--

Deploying PMM on Linode: Your $5-Per-Month Monitoring Solution

PMM on Linode small

In this blog, I will show you how to install PMM on Linode as a low-cost database monitoring solution.

Many of my friends use Linode to run their personal sites, as well as small projects. While Linode is no match for Big Cloud providers in features, it is really wonderful when it comes to cost and simplicity: a Linode “nanode” instance offers 1GB of memory, 1 core, 20GB of storage and 1TB of traffic for just $5 a month.

A single Linode instance is powerful enough to use with Percona Monitoring and Management (PMM) to monitor several systems, so I use Linode a lot when I want to demonstrate PMM deployment through Docker, rather than Amazon Marketplace.

Here are step-by-step instructions to get you started with Percona Monitoring and Management (PMM) on Linode in five minutes (or less):

Step 1:  Pick the Linode Type, Location and launch it.

PMM on Linode

Step 2: Name your Linode

This step is optional and is not PMM-related, but you may want to give your Linode an easy-to-remember name instead of something like “linode7796908”. Click on Linode Name and then on “Settings” and enter a name in “Linode Label”.

PMM on Linode 2

Step 3:  Deploy the Image

Click on Linode Name and then on “Deploy an Image”.

PMM on Linode 3

I suggest choosing the latest Ubuntu LTS version and allocating 512MB for the swap file, especially on a Linode with a small amount of memory. Remember to set a strong root password, as Linode allows root password login by default from any IP.

Step 4: Boot Linode

Now prepare the image you need to boot your Linode. Click on the Boot button for that:

PMM on Linode 4

Step 5: Login to the system and install Docker

Use your favorite SSH client to login to the Linode you created using “root” user and password you set at Step 3, and install Docker:

apt install docker.io

Step 6: Run PMM Server

Here are detailed instructions to install the PMM Server on Docker. Below are the commands to do basic installation:

docker pull percona/pmm-server:latest
docker create
  -v /opt/prometheus/data
  -v /opt/consul-data
  -v /var/lib/mysql
  -v /var/lib/grafana
  --name pmm-data
  percona/pmm-server:latest /bin/true
docker run -d
  -p 80:80
  --volumes-from pmm-data
  --name pmm-server
  --restart always
  percona/pmm-server:latest

Note: This deploys PMM Server without authentication. For anything but test usage, you should set a password by following instructions on this page.

You’re done!

You’ve now installed PMM Server and you can see it monitoring itself by going to the server IP with a browser.

PMM on Linode 5

Now you can go ahead and install the PMM Client on the nodes you want to monitor!

The post Deploying PMM on Linode: Your $5-Per-Month Monitoring Solution appeared first on Percona Database Performance Blog.

May
04
2018
--

Percona Live 2018 Community Report

So, after a whirlwind few days, Percona Live 2018 has been and gone. There was a great energy about the conference, and it was fantastic to meet so many open source database enthusiasts and supporters. A few things that I experienced:

  • Your great willingness to share knowledge. It was a fantastic place to learn for those who have experience from a different field of technology. Almost everyone seemed to be very open and generous with their time.
  • The “superstars” from our industry are not so scary. They are as willing to be open and generous with their experience and views as any of the other attendees, and equally as interested in making new discoveries.
  • There aren’t many times you can sit down to a (community) dinner, to share food and anecdotes with people from USA, UK, Germany and Armenia at the same time. I thoroughly enjoyed the company, and wish there were more opportunities for similar encounters. Thanks to Pythian for setting that up.
  • My Percona colleagues are wonderful, committed human beings with more than a passing interest in music – the Percona Sessions have got to happen…
  • That you can run a very long way in a day between the Santa Clara Convention Center and the Hyatt Regency Hotel.

I had very many positive conversations with delegates. You offered any criticisms along with a suggestion of how we should tweak things for the better. Our community is a creative, generous, problem-solving machine, though I shouldn’t be surprised at that.

So, with only a few more duties to complete, I’d like to thank you for your company. For those that did not make it to this year’s event, I hope that you might be persuaded to join us in the future — either at Percona Live Europe 2018 or at Percona Live 2019.

Packt Prizes

Our media sponsor, Packt, generously provided us with three free ebooks and two free instruction videos as prizes for delegates:

  1. Mastering MongoDB 3.x
  2. MySQL 8 Cookbook
  3. MongoDB Administrator’s Guide
  4. Elastic Databases and Data Processing with AWS [Video]
  5. AWS Administration – Database, Networking, and Beyond [Video]

There are another 10 titles for which we can offer delegates a 50% discount: you should have received your emails. Thanks are due again to Packt.

Community Blog

While I have your attention, I’d like to let you know about the forthcoming Percona community blog. Having been some time in the planning, this is starting really soon, and is like a year-round, online, Percona Live. We already have some keen writers for this, but if you would be interested in creating content (whether written, podcast or webcast) for the community blog, then please get in touch. The brief is very wide — as long as your submission is relevant to the open source database community then it would be welcome.

Finally, I would like to invite feedback on how to make the event shine even brighter — please drop me an email if you have suggestions or ideas. Meanwhile, I hope you enjoy these photographs of the MySQL Community Awards Winners, presented at PL18. You can read more about this community initiative.

Perhaps you’ll be able to join us in Frankfurt in November? Time to start thinking about those submissions for the call for papers!

Or perhaps next year at Percona Live Open Source Database Conference in 2019 – wherever it may be!







Photographs: Randy Tunnell Photography

The post Percona Live 2018 Community Report appeared first on Percona Database Performance Blog.

Apr
27
2018
--

MySQL 8.0 GA: Quality or Not?

MySQL 8.0 GA

MySQL 8.0 GAWhat does Anton Ego – a fictional restaurant critic from the Pixar movie Ratatouille – have to do with MySQL 8.0 GA?

When it comes to being a software critic, a lot.

In many ways, the work of a software critic is easy. We risk very little and thrive on negative criticism, which is fun to read and write.

But what about those who give their many hours of code development, and those who have tested such code before release? How about the many people behind the scenes who brought together packaging, documentation, multiple hours of design, marketing, online resources and more?

And all of that, I might add, is open source! Free for the world to take, copy, adapt and even incorporate in full or in part into their own open development.

It is in exactly that area that the team at MySQL shines once again – they have from their humble beginnings build up a colossally powerful database software that handles much of the world’s data, fast.

Used in every area of life – aerospace, defense, education, finances, government, healthcare, pharma, manufacturing, media, retail, telecoms, hospitality, and finally the web – it truly is a community effort.

My little contribution to this effort is first and foremost to say: well done! Well done for such an all-in-all huge endeavor. When I tested MySQL 8.0, I experienced something new: an extraordinarily clean bug report screen when I unleashed our bug hunting rats, ahem, I mean tools. This was somewhat unexpected. Usually, new releases are a fun playground even for seasoned QA engineers who look for the latest toy to break.

I have a suspicion that the team at Oracle either uses newly-improved bug-finding tools or perhaps they included some of our methods and tools in their setup. In either case, it is, was and will be welcome.

When the unexpected occurs, a fight or flight syndrome happens. I tend to be a fighter, so I upped the battle and managed to find about 30 bugs, with 21 bugs logged already. Quite a few of them are Sig 11’s in release builds. Signal 11 exceptions are unexpected crashes, and release builds are the exact same build you would download at dev.mysql.com.

The debug build also had a number of issues, but less than expected, leading me to the conclusions drawn above. Since Oracle engineers marked many of the issues logged as security bugs, I didn’t list them here. I’ll give Oracle some time to fix them, but I might add them later.

In summary, my personal recommendation is this: unless you are a funky new web company thriving on the latest technology, give Oracle the opportunity to make a few small point bugfix releases before adapting MySQL 8.0 GA. After that, providing upgrade prerequisites are matched, and that your software application is compatible, go for it and upgrade.

Before that, this is a great time to start checking out the latest and greatest that MySQL 8.0 GA has to offer!

All in all, I like what I saw, and I expect MySQL 8.0 GA to have a bright future.

Signed, a seasoned software critic.

The post MySQL 8.0 GA: Quality or Not? appeared first on Percona Database Performance Blog.

Apr
20
2018
--

The Final Countdown: Are You Ready for Percona Live 2018?

Are you ready for Percona Live

Are you ready for Percona Live 2018It’s hard to believe Percona Live 2018 starts on Monday! We’re looking forward to seeing everyone in Santa Clara next week! Here are some quick highlights to remember:

  • In addition to all the amazing sessions and keynotes we’ve announced, we’ll be hosting the MySQL Community Awards and the Lightning Talks on Monday during the Opening Reception.
  • We’ve also got a great lineup of demos in the exhibit hall all day Tuesday and Wednesday – be sure to stop by and learn more about open source database products and tools.
  • On Monday, we have a special China Track now available from Alibaba Cloud, PingCAP and Shannon Systems. We’ve just put a $20.00 ticket on sale for that track, and if you have already purchased any of our other tickets, you are also welcome to attend those four sessions.
  • Don’t forget to make your reservation at the Community Dinner. It’s a great opportunity to socialize with everyone and Pythian is always a wonderful host!

Thanks to everyone who is sponsoring, presenting and attending! The community is who makes this event successful and so much fun to be a part of!

The post The Final Countdown: Are You Ready for Percona Live 2018? appeared first on Percona Database Performance Blog.

Mar
16
2018
--

This Week in Data with Colin Charles 32: Stack Overflow Developer Survey, SCALE16x and Interesting MySQL 8 Version Numbers

Colin Charles

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

SCALE16x is over. Percona had a large showing — both Peter Zaitsev & myself had talks, and the booth in the expo hall saw Rick Golba, Marc Sherwood, and Dave Avery also pop by. The lead scanner suggests we had several hundred conversations ? — read Dave Avery’s summary. My talk went well, though during Q&A the number of questions I got about MariaDB Server was quite interesting (considering it wasn’t part of my talk!). It is clear people are concerned about compatibility (because I spent close to 45 minutes after my talk answering Q&A outside too).

I got to catch up with Dave Stokes and asked him why there were version numbers being skipped in MySQL 8 (as noted in last week’s column). Now there’s a blog post explaining it: MySQL 8.0: It Goes to 11!. It has to do with version number alignment across the product line.

This week we saw something cool come out of Stack Overflow: their Developer Survey Results 2018. There were over 100,000 developers participating in this survey, a marked increase from 2017 when they only had 64,000.

About 66,264 respondents answered the question about what databases they use. MySQL is by far the most popular with 58.7% of the respondents saying they use it. This is followed by PostgreSQL getting 32.9%, MongoDB getting 25.9%, and MariaDB 13.4%. I’m surprised that Amazon RDS/Aurora got 5.1%. In 2017, the first year they introduced the database component, only 29,452 respondents participated, with 55.6% using MySQL, 26.5% using PostgreSQL, and 21% using MongoDB (MariaDB was not broken out last year).

When it came to the most “loved/dread/wanted” databases, apparently 62% of respondents loved PostgreSQL, with 58.8% loving Amazon RDS/Aurora, 55.1% MongoDB, 53.3% MariaDB Server, and 48.7% only loving MySQL. In terms of dread, 51.3% dread MySQL, while only 46.7% dread MariaDB; MongoDB has 44.9% dreading it, and PostgreSQL only 38%. As for the most wanted databases? 18.6% for MongoDB, 11.4% for PostgreSQL, 7.5% for MySQL, and 3.4% for MariaDB Server. It’s clear MongoDB topping the list ensures they have a lot to celebrate, as evidenced by this: Stack Overflow Research of 100,000 Developers Finds MongoDB is the Most Wanted Database. (In 2017, 60.8% loved PostgreSQL, 55% MongoDB, and 49.6% for MySQL; MySQL was the 3rd most dreaded database with 50.4%, followed by 45% for MongoDB, and 39.2% for PostgreSQL; as for the most wanted, MongoDB won with 20.8%, PostgreSQL got second at 11.5%, and MySQL 8.5%).

So if Stack Overflow surveys are an indication of usage, MySQL is still way more popular than anything else, including MariaDB Server regardless of its current distribution. Speaking of MariaDB, the MariaDB Foundation now accepts donations in cryptocurrencies.

MongoDB Evolved is something you should totally check out. I wish something like this exists for MySQL, since tonnes of people ask questions, e.g. “Does MySQL support transactions?”, etc.

Releases

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 

Feb
16
2018
--

This Week in Data with Colin Charles 28: Percona Live, MongoDB Transactions and Spectre/Meltdown Rumble On

Colin Charles

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

In case you missed last week’s column, don’t forget to read the fairly lengthy FOSDEM MySQL & Friends DevRoom summary.

From a Percona Live Santa Clara 2018 standpoint, beyond the tutorials getting picked and scheduled, the talks have also been picked and scheduled (so you were very likely getting acceptance emails from the Hubb.me system by Tuesday). The rejections have not gone out yet but will follow soon. I expect the schedule to go live either today (end of week) or early next week. Cheapest tickets end March 4, so don’t wait to register!

Amazon Relational Database Service has had a lot of improvements in 2017, and the excellent summary from Jeff Barr is worth a read: Amazon Relational Database Service – Looking Back at 2017. Plenty of improvements for the MySQL, MariaDB Server, PostgreSQL and Aurora worlds.

Spectre/Meltdown and its impact are still being discovered. You need to read Brendan Gregg’s amazing post: KPTI/KAISER Meltdown Initial Performance Regressions. And if you visit Percona Live, you’ll see an amazing keynote from him too! Are you still using MyISAM? MyISAM and KPTI – Performance Implications From The Meltdown Fix suggests switching to Aria or InnoDB.

Probably the biggest news this week though? Transactions are coming to MongoDB 4.0. From the site, “MongoDB 4.0 will add support for multi-document transactions, making it the only database to combine the speed, flexibility, and power of the document model with ACID guarantees. Through snapshot isolation, transactions will provide a globally consistent view of data, and enforce all-or-nothing execution to maintain data integrity.”. You want to read the blog post, MongoDB Drops ACID (the title works if you’re an English native speaker, but maybe not quite if you aren’t). The summary diagram was a highlight for me because you can see the building blocks, plus future plans for MongoDB 4.2.

Releases

Link List

Upcoming appearances

  • SCALE16x – Pasadena, California, USA – March 8-11 2018
  • FOSSASIA 2018 – Singapore – March 22-25 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Feb
12
2018
--

Webinar Thursday, February 15, 2018: Basic Internal Troubleshooting Tools for MySQL Server

Troubleshooting Tools for MySQL

Troubleshooting Tools for MySQLPlease join Percona’s Principal Support Engineer, Sveta Smirnova, as she presents “Basic Internal Troubleshooting Tools for MySQL Server” on Thursday, February 15, 2018, at 10:00 am PST (UTC-8) / 1:00 pm EST (UTC-5).

 

MySQL Server has many built-in troubleshooting tools. They are always available and can provide many insights on what is happening internally. Many graphical tools, such as Percona Monitoring and Management (PMM), use built-ins to get data for their nice graphs.

Even if you are only going to use graphical tools, it is always good to know what data they can collect. This way, you can see their limitations and won’t have incorrect expectations in the heat of battle. Built-in troubleshooting tools are accessible via SQL commands. Most of them are standard across the server, but details are component-specific.

In this webinar, I will discuss how to use them, how to troubleshoot component-specific issues and how to find additional information. I will cover SHOW commands, Information Schema, status variables and few component-specific syntaxes. I will NOT cover Performance Schema (there will be a separate webinar on that), and I will use PMM graphs to illustrate the topics whenever possible.

Register for the Basic Internal Troubleshooting Tools for MySQL Server webinar now.

Internal Troubleshooting for MySQLSveta Smirnova, Principal Technical Services Engineer

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues, bugs, finding patterns that can solve typical issues quicker. She likes teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona Sveta worked as Support Engineer in MySQL Bugs Analysis Support Group in MySQL AB-Sun-Oracle. She is the author of book “MySQL Troubleshooting” and JSON UDF functions for MySQL.

Feb
06
2018
--

Announcing Experimental Percona Monitoring and Management (PMM) Functionality via Percona Labs

Experimental Percona Monitoring and Management

Experimental Percona Monitoring and ManagementIn this blog post, we’ll introduce how you can look at some experimental Percona Monitoring and Management (PMM) features using Percona Labs builds on GitHub.

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. While not covered by Percona support or services agreements, these handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements. 

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

This month we’re announcing access to Percona Labs builds of Percona Monitoring and Management so that you can experiment with new functionality that’s not yet in our mainline product. You can identify the unique builds at:

https://hub.docker.com/r/perconalab/pmm-server/tags/

Most of the entries here are the pre-release candidate images we use for QA, and they follow a format of all integers (for example “201802061627”). You’re fine to use these images, but they aren’t the ones that have the experimental functionality.

Today we have two builds of note (these DO have the experimental functionality):

  • 1.6.0-prom2.1
  • 1.5.3-prometheus2

We’re highlighting Prometheus 2.1 on top of our January 1.6 release (1.6.0-prom2.1), available in Docker format. Some of the reasons you might want to deploy this experimental build to take advantage of the Prometheus 2 benefits are:

  • Reduced CPU usage by Prometheus, meaning you can add more hosts to your PMM Server
  • Performance improvements, meaning dashboards load faster
  • Reduced disk I/O, disk space usage

Please keep in mind that as this is a Percona Labs build (see our note above), so in addition note the following two criteria:

  • Support is available from our Percona Monitoring and Management Forums
  • Upgrades might not work – don’t count on upgrading out of this version to a newer release (although it’s not guaranteed to block upgrades)

How to Deploy an Experimental Build from Percona Labs

The great news is that you can follow our Deployment Instructions for Docker, and the only change is where you specify a different Docker container to pull. For example, the standard way to deploy the latest stable PMM Server release with Docker is:

docker pull percona/pmm-server:latest

To use the Percona Labs build 1.6.0-prom2.1 with Prometheus 2.1, execute the following:

docker pull perconalab/pmm-server:1.6.0-prom2.1

Please share your feedback on this build on our Percona Monitoring and Management Forums.

If you’re looking to deploy Percona’s officially released PMM Server (not the Percona Labs release, but our mainline version which currently is release 1.7) into a production environment, I encourage you to consider a Percona Support contract, which includes PMM at no additional charge!

Feb
05
2018
--

Four Ways MySQL Executes GROUP BY

MySQL GROUP BY

MySQL GROUP BYIn this blog post, I’ll look into four ways MySQL executes GROUP BY. 

In my previous blog post, we learned that indexes or other means of finding data might not be the most expensive part of query execution. For example, MySQL GROUP BY could potentially be responsible for 90% or more of the query execution time. 

The main complexity when MySQL executes GROUP BY is computing aggregate functions in a GROUP BY statement. How this works is shown in the documentation for UDF Aggregate Functions. As we see, the requirement is that UDF functions get all values that constitute the single group one after another. That way, it can compute the aggregate function value for the single group before moving to another group.

The problem, of course, is that in most cases the source data values aren’t grouped. Values coming from a variety of groups follow one another during processing. As such, we need a special step to handle MySQL GROUP BY.

Let’s look at the same table we looked at before:

mysql> show create table tbl G
*************************** 1. row ***************************
      Table: tbl
Create Table: CREATE TABLE `tbl` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `k` int(11) NOT NULL DEFAULT '0',
 `g` int(10) unsigned NOT NULL,
 PRIMARY KEY (`id`),
 KEY `k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

And the same GROUP BY statements executed in different ways:

1: Index Ordered GROUP BY in MySQL

mysql> select k, count(*) c from tbl group by k order by k limit 5;
+---+---+
| k | c |
+---+---+
| 2 | 3 |
| 4 | 1 |
| 5 | 2 |
| 8 | 1 |
| 9 | 1 |
+---+---+
5 rows in set (0.00 sec)
mysql> explain select k, count(*) c from tbl group by k order by k limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: index
possible_keys: k
         key: k
     key_len: 4
         ref: NULL
        rows: 5
    filtered: 100.00
       Extra: Using index
1 row in set, 1 warning (0.00 sec)

In this case, we have an index on the column we use for GROUP BY. This way, we can just scan data group by group and perform GROUP BY on the fly (inexpensively).

It works especially well when we use LIMIT to restrict the number of groups we retrieve or when a “covering index” is in use, as a sequential index-only scan is a very fast operation.

If you have a small number of groups though, and no covering index, index order scans can cause a lot of IO. So this might not be the most optimal plan.

2: External Sort GROUP BY in MySQL

mysql> explain select SQL_BIG_RESULT g, count(*) c from tbl group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 998490
    filtered: 100.00
       Extra: Using filesort
1 row in set, 1 warning (0.00 sec)
mysql> select SQL_BIG_RESULT g, count(*) c from tbl group by g limit 5;
+---+---+
| g | c |
+---+---+
| 0 | 1 |
| 1 | 2 |
| 4 | 1 |
| 5 | 1 |
| 6 | 2 |
+---+---+
5 rows in set (0.88 sec)

If we do not have an index that allows us to scan the data in group order, we can instead get data sorted through an external sort (also referred to as “filesort” in MySQL).

You may notice I’m using an SQL_BIG_RESULT hint here to get this plan. Without it, MySQL won’t choose this plan in this case.

In general, MySQL prefers to use this plan only if we have a large number of groups, because in this case sorting is more efficient than having a temporary table (which we will talk about next).

3: Temporary Table GROUP BY in MySQL

mysql> explain select  g, sum(g) s from tbl group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 998490
    filtered: 100.00
       Extra: Using temporary
1 row in set, 1 warning (0.00 sec)
mysql> select  g, sum(g) s from tbl group by g order by null limit 5;
+---+------+
| g | s    |
+---+------+
| 0 |    0 |
| 1 |    2 |
| 4 |    4 |
| 5 |    5 |
| 6 |   12 |
+---+------+
5 rows in set (7.75 sec)

In this case, MySQL also does a full table scan. But instead of running additional sort passes, it creates a temporary table instead. This temporary table contains one row per group, and with each incoming row the value for the corresponding group is updated. Lots of updates! While this might be reasonable in-memory, it becomes very expensive if the resulting table is so large that updates are going to cause a lot of disk IO. In this case, external sort plans are usually better.

Note that while MySQL selects this plan by default for this use case, if we do not supply any hints it is almost 10x slower than the plan we get using the SQL_BIG_RESULT hint.

You may notice I added “ORDER BY NULL” to this query. This is to show you “clean” the temporary table only plan. Without it, we get this plan:

mysql> explain select  g, sum(g) s from tbl group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 998490
    filtered: 100.00
       Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

In it, we get the “worst of both worlds” with Using Temporary Table and filesort.  

MySQL 5.7 always returns GROUP BY results sorted in group order, even if this the query doesn’t require it (which can then require an expensive additional sort pass). ORDER BY NULL signals the application doesn’t need this.

You should note that in some cases – such as JOIN queries with aggregate functions accessing columns from different tables – using temporary tables for GROUP BY might be the only option.

If you want to force MySQL to use a plan that does temporary tables for GROUP BY, you can use the SQL_SMALL_RESULT  hint.

4:  Index Skip-Scan-Based GROUP BY in MySQL

The previous three GROUP BY execution methods apply to all aggregate functions. Some of them, however, have a fourth method.

mysql> explain select k,max(id) from tbl group by k G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: range
possible_keys: k
         key: k
     key_len: 4
         ref: NULL
        rows: 2
    filtered: 100.00
       Extra: Using index for group-by
1 row in set, 1 warning (0.00 sec)
mysql> select k,max(id) from tbl group by k;
+---+---------+
| k | max(id) |
+---+---------+
| 0 | 2340920 |
| 1 | 2340916 |
| 2 | 2340932 |
| 3 | 2340928 |
| 4 | 2340924 |
+---+---------+
5 rows in set (0.00 sec)

This method applies only to very special aggregate functions: MIN() and MAX(). These do not really need to go through all the rows in the group to compute the value at all.

They can just jump to the minimum or maximum group value in the group directly (if there is such an index).

How can you find MAX(ID) value for each group if the index is only built on (K) column? This is an InnoDB table. Remember InnoDB tables effectively append the PRIMARY KEY to all indexes. (K) becomes (K,ID), allowing us to use Skip-Scan optimization for this query.

This optimization is only enabled if there is a large number of rows per group. Otherwise, MySQL prefers more conventional means to execute this query (like Index Ordered GROUP BY detailed in approach #1).

While we’re on MIN()/MAX() aggregate functions, other optimizations apply to them as well. For example, if you have an aggregate function with no GROUP BY (effectively  having one group for all tables), MySQL fetches those values from indexes during a statistics analyzes phase and avoids reading tables during the execution stage altogether:

mysql> explain select max(k) from tbl G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: NULL
  partitions: NULL
        type: NULL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: NULL
    filtered: NULL
       Extra: Select tables optimized away
1 row in set, 1 warning (0.00 sec)

Filtering and Group By

We have looked at four ways MySQL executes GROUP BY.  For simplicity, I used GROUP BY on the whole table, with no filtering applied. The same concepts apply when you have a WHERE clause:

mysql> explain select  g, sum(g) s from tbl where k>4 group by g order by NULL limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: range
possible_keys: k
         key: k
     key_len: 4
         ref: NULL
        rows: 1
    filtered: 100.00
       Extra: Using index condition; Using temporary
1 row in set, 1 warning (0.00 sec)

For this case, we use the range on the K column for data filtering/lookup and do a GROUP BY when there is a temporary table.

In some cases, the methods do not conflict. In others, however, we have to choose either to use one index for GROUP BY or another index for filtering:

mysql> alter table tbl add key(g);
Query OK, 0 rows affected (4.17 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select  g, sum(g) s from tbl where k>1 group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: index
possible_keys: k,g
         key: g
     key_len: 4
         ref: NULL
        rows: 16
    filtered: 50.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> explain select  g, sum(g) s from tbl where k>4 group by g limit 5 G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: range
possible_keys: k,g
         key: k
     key_len: 4
         ref: NULL
        rows: 1
    filtered: 100.00
       Extra: Using index condition; Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

Depending on specific constants used in this query, we can see that we either use an index ordered scan for GROUP BY (and  “give up”  benefiting from the index to resolve the WHERE clause), or use an index to resolve the WHERE clause (but use a temporary table to resolve GROUP BY).

In my experience, this is where MySQL GROUP BY does not always make the right choice. You might need to use FORCE INDEX to execute queries the way you want them to.

Summary

I hope this article provides a good overview of how MySQL executes GROUP BY.  In my next blog post, we will look into techniques you can use to optimize GROUP BY queries.

Jan
22
2018
--

Webinar Wednesday, January 24, 2018: Differences between MariaDB and MySQL

MariaDB and MySQL

MariaDB and MySQLJoin Percona’s Chief Evangelist, Colin Charles as he presents Differences Between MariaDB and MySQL on Wednesday, January 24, 2018, at 7:00 am PDT / 10:00 am EDT (UTC-7).

Tags: MariaDB, MySQL, Percona Server for MySQL, DBA, SysAdmin, DevOps
Experience Level: Novice

MariaDB and MySQL. Are they syntactically similar? Where do these two query languages differ? Why would I use one over the other?

MariaDB is on the path of gradually diverging from MySQL. One obvious example is the internal data dictionary currently under development for MySQL 8.0. This is a major change to the way metadata is stored and used within the server. MariaDB doesn’t have an equivalent feature. Implementing this feature could mark the end of datafile-level compatibility between MySQL and MariaDB.

There are also non-technical differences between MySQL and MariaDB, including:

  • Licensing: MySQL offers their code as open-source under the GPL, and provides the option of non-GPL commercial distribution in the form of MySQL Enterprise. MariaDB can only use the GPL because they derive their work from the MySQL source code under the terms of that license.
  • Support services: Oracle provides technical support, training, certification and consulting for MySQL, while MariaDB has their own support services. Some people will prefer working with smaller companies, as traditionally it affords them more leverage as a customer.
  • Community contributions: MariaDB touts the fact that they accept more community contributions than Oracle. Part of the reason for this disparity is that developers like to contribute features, bug fixes and other code without a lot of paperwork overhead (and they complain about the Oracle Contributor Agreement). However, MariaDB has its own MariaDB Contributor Agreement — which more or less serves the same purpose.

Colin will take a look at some of the differences between MariaDB and MySQL and help answer some of the common questions our Database Performance Experts get about the two databases.

Register for the webinar now.

Colin CharlesColin Charles, Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team for MariaDB Server in 2009, worked in MySQL since 2005, and been a MySQL user since 2000. Before joining MySQL, Colin worked actively on the Fedora and OpenOffice.org projects. He’s well-known within many open source communities and speaks on the conference circuit.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com