Modern SQL in PostgreSQL [and other databases]

“SQL has gone out of fashion lately—partly due to the NoSQL movement, but mostly because SQL is often still used like 20 years ago. As a matter of fact, the SQL standard continued to evolve during the past decades resulting in the current release of 2011. In this session, we will go through the most important additions since the widely known SQL-92, explain how they work and how PostgreSQL supports and extends them. We will cover common table expressions and window functions in detail and have a very short look at the temporal features of SQL:2011 and the related features of PostgreSQL.”

This is the abstract for the talk I’ve given at FOSDEM in Brussels on Saturday. The PostgreSQL community was so kind to host this talk in their (way too small) devroom—thus the references to PostgreSQL. However, the talk is build upon standard SQL and covers features that are commonly available in DB2, Oracle, SQL Server and SQLite. MySQL does not yet support any of those features except OFFSET, which is evil.

One last thing before going on to the slides: Use The Index, Luke has a shop. Stickers, coasters, books, mugs. Have a look.

Find the slides on SlideShare.

Original title and author: “Modern SQL in PostgreSQL [and other databases]” by Markus Winand.


Finding All the Red M&Ms: A Story of Indexes and Full‑Table Scans

In this guest post, Chris Saxon explains a very important topic using an analogy with chocolates: When does a database use an index and when is it better not using it. Although Chris explanation has the Oracle database in mind, the principles apply to other databases too.

A common question that comes up when people start tuning queries is “why doesn’t this query use the index I expect?”. There are a few myths surrounding when database optimizers will use an index. A common one I’ve heard is that an index will be used when accessing 5% or less of the rows in a table. This isn’t the case however – the basic decision on whether or not to use an index comes down to its cost.

How do databases determine the cost of an index?

Before getting into the details, let’s talk about chocolate! Imagine you have 100 packets of M&M’s. You also have a document listing the colour of each M&M and the bag it’s stored in. This is ordered by colour, so we have all the blue sweets first, then the brown, green and so on like so:

You’ve been asked to find all the red M&M’s. There’s a couple of basic ways you could approach this task:

Method 1

Get your document listing the colour and location of each M&M. Go to the top of the “red” section. Lookup the location of the first red M&M, pick up the bag it states, and get the sweet. Go back to your document and repeat the process for the next red chocolate. Keep going back-and-forth between your document and the bags until you’ve reached the end of the red section.

Method 2

Pick up a number of bags at a time (e.g. 10), empty their contents out, pick out the red chocolates and return the others (back to their original bag).

Which approach is quicker?

Intuitively the second approach appears to be faster. You only select each bag once and then do some filtering of the items inside. Whereas with the first approach you have done a lot of back-and-forth between the document and the bags. This means you have to look into each bag multiple times.

We can be a bit more rigorous than this though. Let’s calculate how many operations we need to do to get all the red chocolates in each case.

When going between the document and the bags (method 1), each time you lookup the location of a new sweet and fetch that bag that’s a new operation. You have 100 bags with around 55 sweets in each. This means you’re doing roughly 920 (100 bags x 55 sweets / 6 colours) operations (plus some work to find the red section in your document). So the “cost” of using the document is around 920.

With the second approach you collect 10 bags in one step. This means you do ( 100 bags / 10 bags per operation = ) 10 operations (plus some filtering of the chocolates in them), giving a “cost” of 10.

Comparing these costs (920 vs. 10), method 2 is the clear winner.

Let’s imagine another scenario. Mars have started doing a promotion where around 1 in 100 bags contain a silver M&M. If you get the silver sweet, you win a prize. You want to find the silver chocolate!

In this case, using method 1, you go to the document to find the location of the single sweet. Then you go to that bag and retrieve the sweet. One operation (well two, including going to the document to find location of the silver chocolate), so we have a cost of two.

With method 2, you still need to pick up every single bag (and do some filtering) just to find one sweet – the cost is fixed at 10. Clearly method 1 is far superior in this case.

What have M&M’s got to do with databases?

When Oracle stores a record to the database, it is placed in a block. Just like there are many M&Ms in a bag, (normally) there are many rows in a block. When accessing a particular row, Oracle fetches the whole block and retrieves the requested row from within it. This is analogous to us picking up a bag of M&Ms and then picking a single chocolate out.

When doing an index-range scan, Oracle will search the index (the document) to find the first value matching your where clause. It then goes back-and-forth between the index and the table blocks, fetching the records from the location pointed to by the index. This is similar to method 1, where you continually switch between the document and the M&M bags.

As the number of rows accessed by an index increases, the database has to do more work. Therefore the cost of using an index increases in line with the number of records it is expected to fetch.

When doing a full table scan (FTS), Oracle will fetch several blocks at once in a multi-block read. The data fetched is then filtered so that only rows matching your where clause are returned (in this case the red M&Ms) – the rest are discarded. Just like in method 2.

The expected number of rows returned has little impact on the work a FTS does. Its basic cost is fixed by the size of the table and how many blocks you fetch at once.

When fetching a “high” percentage of the rows from a table, it becomes far more efficient to get several blocks at once and do some filtering than it is to visit a single block multiple times.

When does an index scan become more efficient than a FTS?

In our M&M example above, the “full-table scan” method fetches all 100 bags in 10 operations. Whereas with “index” approach requires a separate operation for each sweet. So an index is more efficient when it points to 10 M&Ms or less.

Mars puts around 55 M&M’s in each bag, so as a percentage of the “table” that’s just under ( 10 M&M’s / (100 bags * 55 sweets) * 100 = ) 0.2%!

What if Mars releases some “giant” M&Ms with only 10 sweets in a bag? In this case there’s fewer sweets in total, so the denominator in the equation above decreases. Our FTS approach is still fixed at a “cost” of 10 for the 100 bags. This means the point at which an index is better is when accessing approximately ( 10/1000*100 = ) 1% of the “rows”. A higher percentage, but still small in real terms.

If they released “mini” M&Ms with 200 in a bag, the denominator would increase. This means that the index is more efficient when accessing a very small percentage of the table!

So as you increase the space required to store a row, an index becomes more effective than a FTS. The number of rows accessed by the index remains fixed. The number of blocks required to store the data increases however, making the FTS more expensive and leading to it having a higher cost.

There’s a big assumption made in the above reasoning however. It’s that there’s no correlation between the order M&M’s are listed in the document and which bag they are in. So, for example, the first red M&M (in the document) may be in bag 1, the second in bag 56, the third in bag 20, etc.

Let’s make a different assumption – that the order of red chocolates in the document corresponds to the order they appear in the bags. So the first 9 red sweets are in bag 1, the next 9 in bag 2 etc. While you still have to visit all 100 bags, you can keep the last bag accessed in your hand, only switching bags every 9 or so sweets. This reduces the number of operations you do, making the index approach more efficient.

We can take this further still. What if Mars changes the bagging process so that only one colour appears in each bag?

Now, instead of having to visit every single bag to get all the red sweets, you only have to visit around ( 100 bags / 6 colours) 16 bags. If the sweets are also placed in the bags in the same order they are listed in the document (so M&M’s 1-55 are all blue and in bag 1, bag 2 has the blue M&M’s 56-100, and so on until bag 100, which has yellow M&M’s 496-550) you get the benefits of not switching bags compounded with the effect of having fewer bags to fetch.

This principle – how closely the order of records in a table matches the order they’re listed in a corresponding index – is referred to as the clustering factor. This figure is lower when the rows appear in the same physical order in the table as they do in the index (all sweets in a bag are the same colour) and higher when there’s little or no correlation.

The closer the clustering factor is to the number of blocks in a table the more likely it is that the index will be used (it is assigned a lower cost). The closer it is to the number of rows in a table, the more likely it is a FTS will be chosen (the index access is given a higher cost).

Bringing it all together

To sum up, we can see the cost-based optimizer decides whether to use an index or FTS by:

  • Taking the number of blocks used to store the table and dividing this by the number of blocks read in a multi-block read to give the FTS cost.

  • For each index on the table available to the query:

    • Finding the percentage of the rows in the table it expects a query to return (the selectivity)

    • This is then used to determine the percentage of the index expected to be accessed

    • The selectivity is also multiplied by the clustering factor to estimate the number of table blocks it expects to access to fetch these rows via an index

    • Adding these numbers together to give the expected cost (of the index)

  • The cost of the FTS is then compared to each index inspected and the access method with the lowest cost used.

This is just an overview of how the (Oracle) cost-based optimizer works. If you want to see the formulas the optimizer uses have a read of Wolfgang Breitling’s “Fallacies of the Cost Based Optimizer” paper or Jonathan Lewis’ Cost-Based Oracle Fundamentals book. The blogs of Jonathan Lewis, Richard Foote and of course Markus’ articles on this site also contain many posts going into this subject in more detail.


Thank You MySQL, We’ll Miss You!

Dear MySQL,

Thank you for introducing me to SQL. It must have been 1998 when we first met I and fell in love with the simplicity of SQL immediately. Before that I’ve been using C structs all the time; I had to do my joins programmatically and also create and maintain my indexes manually. It was even hard to combine several search conditions via and and or. But then there was the shiny new world of SQL you were showing me…

Everything was easily. Just write a where clause, no matter how complex, you found the right rows. Joins were equally easy to write and you took all the effort to combine the data from several tables as I needed them. I also remember how easy it became to manage the schema. Instead of writing a program to copy my data from one C struct to another, I just say alter table now—in the meanwhile it even works online in many cases! I didn’t take long until I used SQL for stuff I wouldn’t have thought a database could do for me. So I was quickly embracing group by and co.

But I haven’t spent a lot of time with you lately. It’s not because I was too busy. I’m still practicing what you have shown me! And I’ve moved on. Now I’m using common table expressions to organize complex queries and I use window functions to calculate running totals or just do a ranking. I’m also using joins more efficiently because I know about hash and sort/merge joins. A while ago I was wondering why you didn’t tell me about these things. But then I realized that you don’t know them.

I know it was not always nice what I said about you recently. But now that Oracle announced your retirement to focus on Oracle NoSQL, I realized how right this comment on reddit was. Sure you are neither the most advanced nor the most widely deployed open source SQL database, but you introduced millions of people to SQL. There should be no tears when some of them move away because they want to see what’s next. Isn’t that the greatest compliment a teacher can get? You can be proud of what you have accomplished.

Thank you!

Markus Winand

Markus Winand
April 1, 2014

ps.: Even when fooling around, I can’t resist to inform my fellow readers. Besides the claim that Oracle retires MySQL, everything is true. Initially I thought Oracle NoSQL is an Aprils fool’s joke but it isn’t.

Original title and author: “Thank You MySQL, We’ll Miss You!” by Markus Winand.


Myth: Select * is bad

This is one of the most persistent myths I’ve seen in the field. It’s there for decades. If a myth is alive that long there must be some truth behind it. So, what could be bad about select *? Let’s have a closer look.

We all know that selecting “*” is just a short-hand for selecting all columns. Believe it or not, this makes a big difference to many people. So, lets first rephrase the question using this “finding”:

Why is it bad to select all columns?

In fact, there are a few very good reasons it is bad to select all columns if you don’t need them. And they all boil down to performance. What is surprising, however, is that the performance impact can be huge.

Up to 100x slower when preventing an Index-Only Scan

Broadly speaking, the less columns you ask for, the less data must be loaded from disk when processing your query. However, this relationship is non-linear.

Quite often, selecting from a table involves two steps: (1) use an index to find the address where the selected rows are stored; (2) load the selected rows from the table. Now imagine that you are just selecting columns that are present in the index. Why should the database still perform the second step? In fact, most databases don’t. They can process your query just with the information stored in the index—hence index-only scan.

But why should an index-only scan be 100 times faster? Simple: an ideal index stores the selected rows next to each other. It’s not uncommon that each index page holds about 100 rows—a ballpark figure; it depends on the size of the indexed columns. Nonetheless, it means that one IO operation might fetch 100 rows. The table data, on the other hand, is not organized like that (exceptions). Here it is quite common that a page just contains one of the selected rows—along with many other rows that are of no interest for the particular query. So, the reason an Index-Only Scan can be 100 times faster is that an index access can easily deliver 100 rows per IO while the table access typically just fetches a few rows per IO.

If you select a single column that’s not in the index, the database cannot do an index-only scan. If you select all columns, … , well I guess you know the answer.

Further, some databases store large objects in a separate place (e.g., LOBs in Oracle). Accessing those causes an extra IO too.

Up to 5x slower when bloating server memory footprint

Although databases avoid to store the result in the server’s main memory—instead the deliver each row after loading and forget about it again—it is sometimes inevitable. Sorting, for example, needs to keep all rows—and all selected columns—in memory to do the job. Once again, the more columns you select, the more memory the database needs. In the worst case, the database might even need to do an external sort on disk.

However, most database are extremely well tuned for this kind of workload. Although I’ve seen a sorting speed-up of factor two quite often—just by removing a few unused columns—I cannot remember having got more than factor five. However, it’s not just sorting, hash joins are rather sensitive to memory bloat too. Don’t know what that is? Please read this article.

These are just the two top issues from database perspective. Remember that the client needs to process the data too—which might put a considerable load on garbage collection.

Now that we have established a common understanding of why selecting everything is bad for performance, you may ask why it is listed as a myth? It’s because many people think the star is the bad thing. Further they believe they are not committing this crime because their ORM lists all columns by name anyway. In fact, the crime is to select all columns without thinking about it—and most ORMs readily commit this crime on behalf of their users.

The reason select * actually is bad—hence the reason the myth is very resistant—is because the star is just used as an allegory for “selecting everything without thinking about it”. This is the bad thing. But if you need a more catch phrase to remember the truth behind this myth, take this:

It’s not about the star, stupid!

If you like my way to explain things, you’ll love SQL Performance Explained.

Update 2013-11-03 – Is the star itself also bad?

Besides the performance issues mentioned above that are not caused by the star (asterisk) itself, the star itself might still cause other trouble. E.g. with software that expects the columns in a specific order when you add or drop a column. However, from my observation I’d say these issues are rather well understood in the field and usually easily identify (software stops working) fixed.

The focus of the article is on very subtle issues which are hardly understood, hard to find, and often even hard to fix (e.g. when using ORM tools). The main goal of this article is to stop people thinking about the star itself. Once people start to name the wanted columns explicitly to gain the performance benefit explained above, the issues caused by the star itself are also gone. Hence, I’ve felt no reason to add a discussion about these issues here—that’s just a distraction from the arguments that I wanted to explain with the article.

Original title and author: “Myth: Select * is bad” by Markus Winand.


Pagination Done the Right Way

Here is another slide deck for my “Pagination Done the Right Way” talk that I’ve given at many occassions.

Please also have a look at this blog post by Gary Millsap about “The Ramp”. Do you see how using OFFSET implements this anti-pattern?

Original title and author: “Pagination Done the Right Way ” by Markus Winand.


Indexes: The neglected performance all-rounder

I think I’ve actually never shared the slides of my talk given in Paris at Dalibo’s PostgreSQL Session about Performance. So, here they are.

Original title and author: “Indexes: The neglected performance all-rounder” by Markus Winand.


Afraid of SSD?

When clients tell me about their plans to invest in SSD storage for their database, they often look at me like a doctor telling a patient about his deadly disease. I didn’t discuss this with my clients until recently, when one client just asked me straight away: “As SQL tuning guy, are you afraid of SSD because it kills your job?” Here is what I told that client.

Generally, people seem to believe that SSD are just much faster than HDD. As a matter of fact, this is only partially true because—as I also mentioned in Chapter 3—performance has two dimensions: response time and throughput. Although SSDs tend to deliver more performance on both axes, it has to be considered that the throughput delivered by SSD is “only” about five times as high as that of HDDs. That’s because HDDs are not bad at sequential read/write operations anyway.

Sure enough a five times faster storage makes many performance problems go away…for a while…until you have five times more data. For a decently growing startup it might just take a few months until you have the same problem again. However, this is not the crucial point here. The crucial point is that SSDs essentially fix the one performance issue where HDDs are really bad at: the response time. Due to the lack of moving parts, the response time of SSDs is about fifty times faster as that of HDDs. Well, that really helps solving problems for a while.

However, there is a catch—maybe even a Catch-22: If you want to get the factor 50 speed-up of SSDs, you’d better avoid reading large chunks of sequential data, because that’s where you can only gain a factor five improvement. To put that into database context: if you are doing many full table scans, you won’t get the full potential of SSD. On the other hand, index lookups have a tendency to cause many random IO operations and can thus benefit from the fast response time of SSDs. The fun part is that properly indexed databases get better benefits from SSD than poorly indexed ones. But guess who is most desperately betting on SSD to solve their performance problems? People having proper indexes or those who don’t have them?

The story goes on: which database operation do you think causes most random IO operations? Of course it’s our old friend the join—it is the sole purpose of joins to gather many little data fragments from different places and combine them into the result we want. Joins can also greatly benefit from SSDs. SSDs actually voids one of arguments often brought up by NoSQL folks against relational databases: with SSD it doesn’t matter that much if you fetch data from one place or from many places.

To conclude what I said to my client: No, as an indexing-focused SQL performance guy, I’m absolutely not afraid of SSD.

Original title and author: “Afraid of SSD?” by Markus Winand.


Training and Conference Dates

A few weeks ago I invited you to take part in a survey about your interest in SQL performance training for developers. In the meanwhile there is a schedule for German and English trainings available. In particular, I’d like to point out four online courses I’m giving during summer time. Hope to see you there.

Another opportunity for a short get-together are conferences I’ll attend and/or speak at. The next one is the Prague PostgreSQL Developers’ Day (p2d2) next Thursday. You can buy SQL Performance Explained there (CZK 700; in the breaks and after the conference). The next conference that I can already confirm is this years PostgreSQL Conference Europe in Dublin end of October. You might have noticed that I attended a lot of PostgreSQL conferences recently (Brussels in February, Paris in March). I do plan to attend other conferences too and I’ve just filed some proposals for talks at other conferences. I’ll let you know if they are accepted.

One more thing: in case you are involved in the organization of developer and/or database centric event—no matter how small—you might want to have a look at the small sponsoring I can offer.

Original title and author: “Training and Conference Dates” by Markus Winand.


Party time

Good news everybody. I’m in a good mood today ;) I’ll explain. First of all, the French edition of SQL Performance Explained is almost done and will ship at the end of the month (pre-order now ;). It has a different name (SQL : Au cœur des performances) but is otherwise the same. So, that triggered some kind of "project done" feeling.

Further, I’ve given a few good trainings recently and looking forward for some more in the next weeks. It’s just awesome when the audience comes up with great questions, but it’s even more awesome if they come up with great conclusions that proof their correct understanding. That triggers a "missing accomplished" feeling.

So, as the pre-sale of the French edition is running on a 10% discount, I though why not giving a general 10% discount to celebrate the French edition? And here it is: ’aucoeur’. It’s only valid for order placed directly at http://sql-performance-explained.com/ (not valid on Amazon) and expires on March 28 (the release date of the French edition).

Further, I’ll repeat the ’retweet to win’ campaign every week during March. It goes like this: I’ll tweet something like ’Retweet this for a chance to win’. After a while, I’ll select one of the retweeters as winner. You can’t win twice (sorry @d_gustafsson). So, watch out, the first giveaway tweet comes soon.

Original title and author: “Party time” by Markus Winand.


Better Performance with PostgreSQL

Dalibo, the company where Guillaume Lelarge works (he does the French translation of SQL Performance Explained) invited me to give a talk at their (mostly French) for-free conference “Better Performance with PostgreSQL” on 28th March in Paris. Seats are limited: if you’d like to go there, register now!

And if you don’t speak French, there are other good news for you. SQL Performance Explained is currently on sale at Amazon.co.uk for £25.00 (you save £1.99, -7%). Please note that this is the free-delivery threshold for these countries: Belgium, Denmark, Luxembourg, Netherlands, Andorra, Finland, Gibraltar, Greece, Iceland, Ireland, Italy, Liechtenstein, Norway, Portugal, San Marino, Spain, Sweden, Vatican city and Poland. They will also ship for free to UK of course.

Original title and author: “Better Performance with PostgreSQL” by Markus Winand.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com