A row generator is a method to generate numbered rows on demand. It is possible to build a row generator with pure standard SQL—using recursive common table expressions (CTE), the with clause. If you never heard of the with clause, it’s probably because MySQL doesn’t implement that specific part of the SQL-99 standard (feature request from 2006). This article introduces generator views for MySQL. Not as powerful as recursive CTEs, but good enough in most cases. But before going into the implementation details, I’ll show you a use case for a row generator.
Row generators are useful to fill gaps in results. Consider the following query:
SELECT COUNT(*), sale_date
FROM sales
WHERE sale_date > CURDATE() - INTERVAL 1 MONTH
GROUP BY sale_date
ORDER BY sale_date;
The result will not contain rows for days where no sales record exists at all. You can complete the result with a row generator.
Imagine a view, GENERATOR
, that returns numbered rows like this:
SELECT n
FROM generator
WHERE n < 31;
+-----+
| n |
+-----+
| 0 |
| 1 |
. . . .
| 29 |
| 30 |
+-----+
31 rows in set (0.00 sec)
This is the basic functionality of a row generator. Having that, it is quite simple to list all days since a month ago:
SELECT CURDATE() - INTERVAL n DAY dt
FROM generator
WHERE CURDATE() - INTERVAL n DAY
> CURDATE() - INTERVAL 1 MONTH;
+------------+
| dt |
+------------+
| 2011-07-29 |
| 2011-07-28 |
. . . . . . .
| 2011-07-01 |
| 2011-06-30 |
+------------+
30 rows in set (0.00 sec)
Finally, we can use an outer join to combine the original result with the generated dates:
SELECT IFNULL(sales, 0) sales, dt sale_date
FROM
( SELECT CURDATE() - INTERVAL n DAY dt
FROM generator
WHERE CURDATE() - INTERVAL n DAY
> CURDATE() - INTERVAL 1 MONTH
) dates
LEFT OUTER JOIN
( SELECT COUNT(*) sales, sale_date
FROM sales
WHERE sale_date > CURDATE() - INTERVAL 1 MONTH
GROUP BY sale_date
) sales
ON (sales.sale_date = dates.dt)
ORDER BY dt;
The left side of the join has all the generated dates so that the outer join pads the right wide with NULL
, if there were no sale on that day. The IFNULL
turns the missing sales count into a zero.
So far, so good. But the problem is that MySQL has no row generator that produces an arbitrary number of rows as needed. Still there is a technique that is good enough in most cases.
It starts with something rather ridiculous:
CREATE OR REPLACE VIEW generator_16
AS SELECT 0 n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL
SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL
SELECT 15;
It is a generator to produces 16 rows. You can use the same technique for larger generators as well. E.g., to produce one million rows, like shown at the end of this article. Just kidding. There is, of course, a better way:
CREATE OR REPLACE VIEW generator_256
AS SELECT ( hi.n * 16 + lo.n ) AS n
FROM generator_16 lo
, generator_16 hi;
This view builds the cross join (Cartesian product) of the previous view with itself. That means it pairs every row from the "lo" side with all rows from the "hi" side of the join. The result has 265 rows (16×16).
Considering the introductory example, it is not only important to generate an arbitrary number of rows, we need numbered rows to make use of it. For that purpose, I threat the two sides of the cross join like digits in a hexadecimal number. The "lo" side building the least significant, the "hi" side the most significant digit. Explained by SQL:
SELECT CONCAT(LPAD(hi.n,2,' '), ' * 16 + '
, LPAD(lo.n,2,' '), ' = '
, LPAD(hi.n*16+lo.n, 3,' '))
AS "HI LO SEQ"
FROM generator_16 lo, generator_16 hi;
+--------------------+
| HI LO SEQ |
+--------------------+
| 0 * 16 + 0 = 0 |
| 0 * 16 + 1 = 1 |
| 0 * 16 + 2 = 2 |
| 0 * 16 + 3 = 3 |
| 0 * 16 + 4 = 4 |
| 0 * 16 + 5 = 5 |
| 0 * 16 + 6 = 6 |
| 0 * 16 + 7 = 7 |
| 0 * 16 + 8 = 8 |
| 0 * 16 + 9 = 9 |
| 0 * 16 + 10 = 10 |
| 0 * 16 + 11 = 11 |
| 0 * 16 + 12 = 12 |
| 0 * 16 + 13 = 13 |
| 0 * 16 + 14 = 14 |
| 0 * 16 + 15 = 15 |
| 1 * 16 + 0 = 16 |
| 1 * 16 + 1 = 17 |
. . . . . . . . . . .
| 15 * 16 + 14 = 254 |
| 15 * 16 + 15 = 255 |
+--------------------+
256 rows in set (0.00 sec)
I guess you know how to build a larger generator now?
There are, of course, some limitations:
-
The views are bounded
They cannot produce an arbitrary number of rows. But you can prepare views that generate large number of rows (see below).
-
The views will always build the full Cartesian product
Although you can limit the result with the where clause, the work to produce all rows is still done. It’s just filtering the unneeded. That means, it is very inefficient to use a large generator view if you need just a few rows.
Another aspect is that the result order is undefined—although the example above produces the numbers is ascending order. That’s just because the way MySQL joins the two views. If you build another meta view, e.g., GENERATOR_64K
, you’ll note that the order is not ascending anymore—because of MySQLs Block Nested-Loop Join. But that doesn’t mean that the generator doesn’t work, each number is still generated exactly once. If you need a particular order, just use the ORDER BY
clause on the outermost select.
Warning
Don’t use LIMIT
to cut the view as needed because you might receive unexpected numbering. Please note that using ORDER BY
and LIMIT
in combination returns the correct result, but requires the intermediate result to be stored temporarily.
Use where to filter on the returned numbers. There is no need to store the intermediate result in that case.
Finally, here are some handy generator views up to 1 "mega-row". They use a minor improvement: bit-shift operations instead of arithmetics.
CREATE OR REPLACE VIEW generator_16
AS SELECT 0 n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL
SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL
SELECT 15;
CREATE OR REPLACE VIEW generator_256
AS SELECT ( ( hi.n << 4 ) | lo.n ) AS n
FROM generator_16 lo, generator_16 hi;
CREATE OR REPLACE VIEW generator_4k
AS SELECT ( ( hi.n << 8 ) | lo.n ) AS n
FROM generator_256 lo, generator_16 hi;
CREATE OR REPLACE VIEW generator_64k
AS SELECT ( ( hi.n << 8 ) | lo.n ) AS n
FROM generator_256 lo, generator_256 hi;
CREATE OR REPLACE VIEW generator_1m
AS SELECT ( ( hi.n << 16 ) | lo.n ) AS n
FROM generator_64k lo, generator_16 hi;
Even the last view, producing 220 rows, returns the result in a seconds on current hardware. However, the only use I have for the GENERATOR_1M
view is producing test data.
Always use the smallest possible generator for best performance.