
In this blog, I will provide answers to the Q & A for the Troubleshooting Slow Queries webinar.
First, I want to thank you for attending the April 28 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:
Q: I’ve heard that is a bad idea to use
select *
; what do you recommend?
A: When I used
SELECT *
in my slides, I wanted to underline the idea that sometimes you need to select all columns from the table. There is nothing bad about it if you need them.
SELECT *
is bad when you need only a few columns from the table. In this case, you retrieve more data than needed, which affects performance. Another issue that
SELECT *
can cause is if you hard-code the statement into your application, then change table definition; the application could start retrieving columns in wrong order and output (e.g., email instead of billing address). Or even worse, it will try to access a non-existent index in the result set array. The best practice is to explicitly enumerate all columns that your application needs.
Q: I heard that using
index_field
length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as indexing, what happens?
A: I assume you are asking about the ability to create an index with lengths smaller than the column length? They work as follows:
Assume you have a
TEXT
field which contains these user questions:
- I’ve heard that is a bad idea to use select * what do you recommend?
- I heard that using
index_field
length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as indexing, what happens?
- ….
Since this is a
TEXT
field you cannot create and index on it without specifying its length, so you need to make the index as minimal as possible to uniquely identify questions. If you create an index with length 10 it will contain:
- I’ve heard
- I heard th
- …
You will index only those parts of questions that are not very distinct from each other, and do not contain useful information about what the question is. You can create index of length 255:
- I’ve heard that is a bad idea to use select * what do you recommend?
- I heard that using index_field length will affect the indexing principle during query execution (e.g., one field is varchar and its length is not fixed, some values have short text, some values have long text. at this time). If we use this field as index
- …
In this case, the index includes the whole first question and almost all the second question. This makes the index too large and requires us to use more disk space (which causes more IO). Also, information from the second question is probably too much.
If make index of length 75, we will have:
- I’ve heard that is a bad idea to use select * what do you recommend?
- I heard that using index_field length will affect the indexing principle du
- …
This is more than enough for the first question and gives a good idea of what is in the second question. It also potentially will have enough unique entries to make its cardinality look more like the cardinality of real data distribution.
To conclude: choosing the correct index length is something that requires practice and analysis of your actual data. Try to make them as short as possible, but long enough so that the number of unique entries in the index will be similar to a number of unique entries in the table.
Q: Which view can we query to see stats?
A: Do you mean index statistics?
SHOW INDEX FROM table_name
will do it.
Q: We have an InnoDB table with 47 fields (mostly text); some are ft-indexed. I tried to do an alter table, and it ran for 24 hours. What is the best way to run an alter table to add one extra field? The table has 1.9 M rows and 47 columns with many indexes.
A: Adding a column requires a table copy. Therefore, the speed of this operation depends on the table size and speed of your disk. If you are using version 5.6 and later, adding a column would not block parallel queries (and therefore is not a big deal). If you are using an older version, you can always use the pt-online-schema-change utility from Percona Toolkit. However, it will run even more slowly than the regular
ALTER TABLE
. Unfortunately, you cannot speed up the execution of
ALTER TABLE
much. The only thing that you can do is to use a faster disk (with options, tuned to explore speed of the disk).
However, if you do not want to have this increased IO affect the production server, you can alter the table on the separate instance, then copy tablespace to production and then apply all changes to the original table from the binary logs. The steps will be something like:
- Ensure you use option
innodb_file_per_table
and the big table has individual tablespace
- Ensure that binary log is enabled
- Start a new server (you can also use an existent stand-by slave).
- Disable writes to the table
- Record the binary log position
- Copy the tablespace to the new server as described here.
- Enable writes on the production server
- Run
ALTER TABLE
on the new server you created in step 2 (it will still take 24 hours)
- Stop writes to the table on the production server
- Copy the tablespace, altered in step 7
- Apply all writes to this table, which are in the binary logs after position, recorded in step 4.
- Enable writes to the table
This scenario will take even more time overall, but will have minimal impact on the production server
Q: If there is a compound index like index1(emp_id,date), will the following query be able to use index? “select * from table1 where emp_id = 10”
A: Yes. At least it should.
Q: Are
filesort
and
temporary
in extended info for explain not good?
A: Regarding
filesort
: it depends. For example, you will always have the word
filesort
” for tables which perform
ORDER BY
and cannot use an index for
ORDER BY
. This is not always bad. For example, in this query:
mysql> explain select emp_no, first_name from employees where emp_no <20000 order by first_nameG
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: employees
partitions: NULL
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 18722
filtered: 100.00
Extra: Using where; Using filesort
1 row in set, 1 warning (0,01 sec)
the primary key used to resolve rows and
filesort
were necessary and not avoidable. You can read about different
filesort
algorithms here.
Regarding
Using temporary
: this means what during query execution temporary table will be created. This is can be not good, especially if the temporary table is large and cannot fit into memory. In this case, it would be written to disk and slow down operations. But, again, sometimes creating temporary tables in not avoidable, for example, if you have both
GROUP BY
and
ORDER BY
clauses which list columns differently as stated in the user manual.
Q: Is
key_len
length more of a good thing for query execution?
A:
key_len
field is not
NULL
for all queries that use and index, and just shows the length of the key part used. It is not good or bad, it is just for information. You can use this information, for example, to identify which part of combined index is used to resolve the query.
Q: Does an alter query go for an optimizer check?
A: No. You can check it either by enabling optimizer trace, running
ALTER
and find what trace is empty. Or by enabling the debug option and searching the resulting trace for
optimize
.
Q: A query involves four columns that are all individually covered by an index. The optimizer didn’t merge indexes because of cost, and even didn’t choose the composite index I created.
A: This depends on the table definition and query you used. I cannot provide a more detailed answer based only on this information.
Q cont.: Finally, only certain composite indexes were suitable, the column order in the complex index mattered a lot. Why couldn’t the optimizer merge the four individual single column indexes, and why did the order of the columns in the composite index matter?
A: Regarding why the optimizer could not merge four indexes, I need to see how the table is defined and which data is in these indexed columns. Regarding why the order of the columns in the composite index matters, it depends on the query. Why the optimizer can use an index, say, on
(col1, col2)
where the conditions
col1=X AND col2=Y
and
col2=Y AND col2=X
for the case when you use
OR
, the order is important. For example, for the condition
col1=X OR col2=Y
, where the part
col1=X
is always executed and the part
col2=Y
is executed only when
col1=X
is false. The same logic applies to queries like
SELECT col1 WHERE col2=Y ORDER BY col3
. See the user manual for details.
Q: When I try to obtain the optimizer trace on the console, the result is cut off. Even if I redirect the output to a file, how to overcome that?
A: Which version of MySQL Server do you use? The
TRACE
column is defined as
longtext NOT NULL
, and should not cause such issues. If it does with a newer version, report a bug at http://bugs.mysql.com/.
Q: Are there any free graphical visualizers for either EXPLAIN or the optimizer TRACE output?
A: There is graphical visualizer for
EXPLAIN
in MySQL Workbench. But it works with online data only: you cannot run it on
EXPLAIN
output, saved into a file. I don’t know about any visualizer for the optimizer
TRACE
output. However, since it is
JSON
you can simply save it to file and open in web browser. It will allow a better view than if opened in simple text editor.
Q: When do you use force index instead of
use index
hints?
A: According to user manual “
USE INDEX (index_list)
hint tells MySQL to use only one of the named indexes to find rows in the table” and “
FORCE INDEX
hint acts like
USE INDEX (index_list)
, with the addition that a table scan is assumed to be very expensive . . . a table scan is used only if there is no way to use one of the named indexes to find rows in the table.” This means that when you use
USE INDEX
, you are giving a hint for the optimizer to prefer a particular index to others, but not enforcing index usage if the optimizer prefers a table scan, while
FORCE INDEX
requires using the index. I myself use only
FORCE
and
IGNORE
index hints.
Q: Very informative session. I missed the beginning part. Are you going to distribute the recoded session later?
A: Yes. As usual slides and recording available here.