Jul 03, 2023

PostgreSQL Partitioning Using Traditional Methods

PostgreSQL Partitioning

Partitioning is the practice of logically splitting a large table into smaller pieces to improve database performance.

PostgreSQL offers three built-in partitioning methods:

  • Range partitioning
  • List partitioning
  • Hash partitioning

When to use partitioning

  • Bulk operations such as data loads and deletes can be performed efficiently using the ATTACH and DETACH partition features (see the sketch after this list).
  • The exact point at which a table will benefit from partitioning depends on the application. However, a rule of thumb is that the size of the table should exceed the physical memory of the database server.
  • As data grows, sub-partitions can be created to enhance performance, and old partitions can be removed either by detaching them into standalone tables or by dropping them entirely.
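
For illustration, here is a minimal sketch of the DETACH/ATTACH workflow. The table and partition names refer to the range-partitioning example created further below; employees_q5 is hypothetical.

-- Detach a partition so it becomes a standalone table (e.g., for archiving):
ALTER TABLE employees DETACH PARTITION employees_q1;
-- Bulk-load into a standalone table, then attach it as a new partition:
CREATE TABLE employees_q5 (LIKE employees INCLUDING ALL);
-- ... bulk load employees_q5 here ...
ALTER TABLE employees ATTACH PARTITION employees_q5
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');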

Benefits of partitioning

  • Query performance can improve dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions, as explained below.
  • When queries or updates access a large percentage of a single partition, performance can be improved by using a sequential scan of that partition instead of using an index, which would require random-access reads scattered across the whole table.
  • A partition can be dropped or truncated with DROP TABLE and TRUNCATE TABLE, respectively, avoiding the load of bulk DELETE operations (see the sketch below).
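
For instance, a quick sketch (the partition name refers to the range-partitioning example created below):

-- Instantly empty one partition, or remove it together with its data:
TRUNCATE TABLE employees_q1;
DROP TABLE employees_q1;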

Range partitioning

Range partitioning splits a table based on ranges of values in the partition key, such as dates or numeric values.

As an example, I created a range-partitioned table on a DATE column, with one partition per quarter.

CREATE TABLE employees (id INT NOT NULL, fname VARCHAR(20), lname VARCHAR(20), dob DATE NOT NULL,
joined DATE NOT NULL) PARTITION BY RANGE (joined);
CREATE TABLE employees_q1 PARTITION OF employees FOR VALUES FROM ('2022-01-01') TO ('2022-04-01');
CREATE TABLE employees_q2 PARTITION OF employees FOR VALUES FROM ('2022-04-01') TO ('2022-07-01');
CREATE TABLE employees_q3 PARTITION OF employees FOR VALUES FROM ('2022-07-01') TO ('2022-10-01');
CREATE TABLE employees_q4 PARTITION OF employees FOR VALUES FROM ('2022-10-01') TO ('2023-01-01');

The range partitions are shown in the table structure below.

\d+ employees
                                       Partitioned table "public.employees"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id       | integer               |             | not null | | plain | | |
fname    | character varying(20) |             |          | | extended | | |
lname    | character varying(20) |             |          | | extended | | |
dob      | date                  |             | not null | | plain | | |
joined   | date                  |             | not null | | plain | | |
Partition key: RANGE (joined)
    Partitions: employees_q1 FOR VALUES FROM ('2022-01-01') TO ('2022-04-01'),
                employees_q2 FOR VALUES FROM ('2022-04-01') TO ('2022-07-01'),
                employees_q3 FOR VALUES FROM ('2022-07-01') TO ('2022-10-01'),
                employees_q4 FOR VALUES FROM ('2022-10-01') TO ('2023-01-01')

Next, I inserted a year's worth of random data, one row per day (365 rows).

INSERT INTO employees (id ,fname,lname,dob ,joined) VALUES ( generate_series(1, 365) ,(array['Oswald', 'Henry', 'Bob', 'Vennie'])[floor(random() * 4 + 1)], 
(array['Leo', 'Jack', 'Den', 'Daisy' ,'Woody'])[floor(random() * 5 + 1)], '1995-01-01'::date + trunc(random() * 366 * 3)::int,
generate_series('1/1/2022'::date, '12/31/2022'::date, '1 day'));

The data is distributed among the range partitions as shown below.

SELECT employees_q1 , employees_q2 , employees_q3 , employees_q4 , employees_totalcnt from
( SELECT COUNT(*) FROM employees_q1 ) AS employees_q1,
( SELECT COUNT(*) FROM employees_q2 ) AS employees_q2, 
( SELECT COUNT(*) FROM employees_q3 ) AS employees_q3,
( SELECT COUNT(*) FROM employees_q4 ) AS employees_q4 ,
( SELECT COUNT(*) FROM employees ) AS employees_totalcnt ;
employees_q1 | employees_q2 | employees_q3 | employees_q4 | employees_totalcnt
--------------+--------------+--------------+--------------+--------------------
(90)         | (91)         | (92)         | (92)         | (365)
(1 row)

Performance of SELECT operations

Here, I created a table without partitioning and loaded it with the same data as the partitioned table.

The query plan improves when the queried data lives in a single partition or a small number of partitions, as the comparison below shows.

CREATE TABLE employees_nopartition (id INT NOT NULL, fname VARCHAR(20), lname VARCHAR(20), dob DATE NOT NULL, joined DATE NOT NULL);

INSERT INTO employees_nopartition (id ,fname,lname,dob ,joined) VALUES ( generate_series(1, 365) ,(array['Oswald', 'Henry', 'Bob', 'Vennie'])[floor(random() * 4 + 1)], 
(array['Leo', 'Jack', 'Den', 'Daisy' ,'Woody'])[floor(random() * 5 + 1)], '1995-01-01'::date + trunc(random() * 366 * 3)::int,
generate_series('1/1/2022'::date, '12/31/2022'::date, '1 day'));

EXPLAIN select * from employees_nopartition where joined >= '2022-05-12' and joined < '2022-06-10'; 
                                QUERY PLAN 
------------------------------------------------------------------------------ 
Seq Scan on employees_nopartition (cost=0.00..8.47 rows=29 width=22) 
         Filter: ((joined >= '2022-05-12'::date) AND (joined < '2022-06-10'::date)) 
(2 rows)

Here we can see that fetching the same rows from the partitioned table produces a better query plan than fetching them from the non-partitioned table: only the relevant partition (employees_q2) is scanned, at a much lower cost.

EXPLAIN select * from employees where joined >= '2022-05-12' and joined < '2022-06-10'; 
                                QUERY PLAN 
------------------------------------------------------------------------------ 
Seq Scan on employees_q2 (cost=0.00..2.37 rows=29 width=22) 
    Filter: ((joined >= '2022-05-12'::date) AND (joined < '2022-06-10'::date)) 
(2 rows)

List partitioning

List partitioning splits a table based on discrete key values. The partition key can also be an expression over a column, as in PARTITION BY LIST (expression), which is shown further below.

For example, I created a list-partitioned table with a few partitions and inserted 1,000 rows of random data.

CREATE TABLE sales (id INT NOT NULL, branch VARCHAR(3), type TEXT, amount INT) PARTITION BY LIST (branch);

CREATE TABLE HYD_sales PARTITION OF sales FOR VALUES IN ('HYD');
CREATE TABLE BLR_sales PARTITION OF sales FOR VALUES IN ('BLR');
CREATE TABLE DEL_sales PARTITION OF sales FOR VALUES IN ('DEL');
CREATE TABLE TPT_sales PARTITION OF sales FOR VALUES IN ('TPT');

INSERT into sales (id , branch ,type , amount ) VALUES ( generate_series(1, 1000) , (array['HYD', 'BLR', 'DEL', 'TPT'])[floor(random() * 4 + 1)] , 
(array['Laptops', 'Printers', 'Hardisks', 'Desktops' ,'Monitors'])[floor(random() * 5 + 1)], (random()*200000)::int );

List partitions are seen in the table definition below:

\d+ sales
                                    Partitioned table "public.sales"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
id     | integer              |   | not null |         | plain    | |
branch | character varying(3) |   |          |         | extended | |
type   | text                 |   |          |         | extended | |
amount | integer              |   |          |         | plain    | |
Partition key: LIST (branch)
       Partitions: blr_sales FOR VALUES IN ('BLR'),
                   del_sales FOR VALUES IN ('DEL'),
                   hyd_sales FOR VALUES IN ('HYD'),
                   tpt_sales FOR VALUES IN ('TPT')

The distribution of data among the list partitions is seen below (the extra row in total_cnt comes from a row inserted into the default partition later in this post):

SELECT blr_sales , del_sales , hyd_sales,tpt_sales, total_cnt from
( SELECT COUNT(*) FROM blr_sales ) AS blr_sales, ( SELECT COUNT(*) FROM del_sales ) AS del_sales, 
( SELECT COUNT(*) FROM hyd_sales ) AS hyd_sales, ( SELECT COUNT(*) FROM tpt_sales ) AS tpt_sales ,
( SELECT COUNT(*) FROM sales ) AS total_cnt;
blr_sales | del_sales | hyd_sales | tpt_sales | total_cnt
-----------+-----------+-----------+-----------+-----------
(262)     | (258)     | (228)     | (252)     | (1001)
(1 row)

List partitioning using expression

For example, I created a table that is list-partitioned on an expression over a column.

CREATE TABLE donors (id INT NOT NULL, name VARCHAR(20), bloodgroup VARCHAR(15), last_donated DATE,
contact_num VARCHAR(10)) PARTITION BY LIST (left(upper(bloodgroup),3));

CREATE TABLE A_positive PARTITION OF donors FOR VALUES IN ('A+ ');
CREATE TABLE A_negative PARTITION OF donors FOR VALUES IN ('A- ');
CREATE TABLE B_positive PARTITION OF donors FOR VALUES IN ('B+ ');
CREATE TABLE B_negative PARTITION OF donors FOR VALUES IN ('B- ');
CREATE TABLE AB_positive PARTITION OF donors FOR VALUES IN ('AB+');
CREATE TABLE AB_negative PARTITION OF donors FOR VALUES IN ('AB-');
CREATE TABLE O_positive PARTITION OF donors FOR VALUES IN ('O+ ');
CREATE TABLE O_negative PARTITION OF donors FOR VALUES IN ('O- ');

List partitions are seen in the table definition below:

\d+ donors
                                      Partitioned table "public.donors"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id           | integer               | | not null | | plain | | |
name         | character varying(20) | |          | | extended | | |
bloodgroup   | character varying(15) | |          | | extended | | |
last_donated | date                  | |          | | plain | | |
contact_num  | character varying(10) | |          | | extended | | |
Partition key: LIST ("left"(upper((bloodgroup)::text), 3))
          Partitions: a_negative FOR VALUES IN ('A- '),
                      a_positive FOR VALUES IN ('A+ '),
                      ab_negative FOR VALUES IN ('AB-'),
                      ab_positive FOR VALUES IN ('AB+'),
                      b_negative FOR VALUES IN ('B- '),
                      b_positive FOR VALUES IN ('B+ '),
                      o_negative FOR VALUES IN ('O- '),
                      o_positive FOR VALUES IN ('O+ ')

Here, I inserted 100 rows of random data.

INSERT INTO donors (id , name , bloodgroup , last_donated , contact_num) VALUES (generate_series(1, 100) ,'user_' || trunc(random()*100) ,
(array['A+ group', 'A- group', 'O- group', 'O+ group','AB+ group','AB- group','B+ group','B- group'])[floor(random() * 8 + 1)] , '2022-01-01'::date + trunc(random() * 366 * 1)::int,
CAST(1000000000 + floor(random() * 9000000000) AS bigint));

The distribution of the expression-based list-partitioned data among its partitions is seen below:

SELECT a_negative , a_positive , ab_negative , ab_positive , b_negative ,b_positive ,o_negative , o_positive , total_cnt from
( SELECT COUNT(*) FROM a_negative ) AS a_negative, ( SELECT COUNT(*) FROM a_positive ) AS a_positive, 
( SELECT COUNT(*) FROM ab_negative ) AS ab_negative, ( SELECT COUNT(*) FROM ab_positive ) AS ab_positive ,
( SELECT COUNT(*) FROM b_negative ) AS b_negative, ( SELECT COUNT(*) FROM b_positive ) AS b_positive , 
( SELECT COUNT(*) FROM o_positive ) AS o_positive , ( SELECT COUNT(*) FROM o_negative ) AS o_negative,
( SELECT COUNT(*) FROM donors ) AS total_cnt;
 a_negative | a_positive | ab_negative | ab_positive | b_negative | b_positive | o_negative | o_positive | total_cnt
------------+------------+-------------+-------------+------------+------------+------------+------------+-----------
(9)         | (19)       | (10)        | (12)        | (12)       | (10)       | (18)       | (10)       | (100)
(1 row)

Performance of DML operations

Here, again, I created a table without partitioning, inserted the same 1,000 rows of random data as in the list-partitioned sales table, and compared query performance.

CREATE TABLE sales_nopartition (id INT NOT NULL , branch VARCHAR(3),type text, Amount int );

INSERT into sales_nopartition (id , branch ,type , amount ) VALUES ( generate_series(1, 1000) ,
(array['HYD', 'BLR', 'DEL', 'TPT'])[floor(random() * 4 + 1)] , 
(array['Laptops', 'Printers', 'Hardisks', 'Desktops' ,'Monitors'])[floor(random() * 5 + 1)], (random()*200000)::int );

UPDATE Query Performance

EXPLAIN update sales_nopartition set type = 'Smart Watches' where branch = 'HYD';
                          QUERY PLAN
---------------------------------------------------------------------------
Update on sales_nopartition (cost=0.00..19.50 rows=229 width=50)
       -> Seq Scan on sales_nopartition (cost=0.00..19.50 rows=229 width=50)
          Filter: ((branch)::text = 'HYD'::text)
(3 rows)

EXPLAIN update sales set type = 'Smart Watches' where branch = 'HYD';
                           QUERY PLAN
------------------------------------------------------------------
Update on sales (cost=0.00..5.10 rows=248 width=50)
Update on hyd_sales
       -> Seq Scan on hyd_sales (cost=0.00..5.10 rows=248 width=50)
          Filter: ((branch)::text = 'HYD'::text)
(4 rows)

DELETE Query Performance

EXPLAIN DELETE from sales_nopartition where branch='HYD';
                           QUERY PLAN
--------------------------------------------------------------------------
Delete on sales_nopartition (cost=0.00..19.50 rows=229 width=6)
       -> Seq Scan on sales_nopartition (cost=0.00..19.50 rows=229 width=6)
          Filter: ((branch)::text = 'HYD'::text)
(3 rows)

EXPLAIN DELETE from sales where branch='HYD';
                          QUERY PLAN
-----------------------------------------------------------------
Delete on sales (cost=0.00..5.10 rows=248 width=6)
Delete on hyd_sales
      -> Seq Scan on hyd_sales (cost=0.00..5.10 rows=248 width=6)
         Filter: ((branch)::text = 'HYD'::text)
(4 rows)

The above examples show that UPDATE and DELETE operations touching a single partition get a better query plan than the same operations on the non-partitioned table.

Hash partitioning

A hash-partitioned table is partitioned by specifying a modulus and a remainder for each partition.

  • Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.
  • Hash partitioning works best when each partition lives in a different tablespace on a separate physical disk, so the I/O is divided evenly across more devices.

For example, I created a hash-partitioned table with five partitions, using modulus 5.

CREATE TABLE students ( id int NOT NULL, name varchar(30) NOT NULL , course varchar(100) ,joined date ) PARTITION BY hash(id);

CREATE TABLE student_0 PARTITION OF students FOR VALUES WITH (MODULUS 5,REMAINDER 0);
CREATE TABLE student_1 PARTITION OF students FOR VALUES WITH (MODULUS 5,REMAINDER 1);
CREATE TABLE student_2 PARTITION OF students FOR VALUES WITH (MODULUS 5,REMAINDER 2);
CREATE TABLE student_3 PARTITION OF students FOR VALUES WITH (MODULUS 5,REMAINDER 3);
CREATE TABLE student_4 PARTITION OF students FOR VALUES WITH (MODULUS 5,REMAINDER 4);

The table structure looks like the one below with five created partitions:

\d+ students
                              Partitioned table "public.students"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
id     | integer                | | not null | | plain | |
name   | character varying(30)  | | not null | | extended | |
course | character varying(100) | |          | | extended | |
joined | date                   | |          | | plain | |
Partition key: HASH (id)
        Partitions: student_0 FOR VALUES WITH (modulus 5, remainder 0),
                    student_1 FOR VALUES WITH (modulus 5, remainder 1),
                    student_2 FOR VALUES WITH (modulus 5, remainder 2),
                    student_3 FOR VALUES WITH (modulus 5, remainder 3),
                    student_4 FOR VALUES WITH (modulus 5, remainder 4)

Here, I inserted 100,000 rows of random data.

INSERT into students (id , name , course ,joined ) VALUES (generate_series(1, 100000) , 'student_' || trunc(random()*1000) , 
(array['Finance & Accounts', 'Business Statistics', 'Environmental Science'])[floor(random() * 3 + 1)],'2019-01-01'::date + trunc(random() * 366 * 3)::int);

The distribution of the hash-partitioned data among its partitions is seen below. Note that pg_class.reltuples is only an estimate maintained by VACUUM and ANALYZE; an exact-count query is sketched after the output.

SELECT relname,reltuples as rows FROM pg_class WHERE relname IN ('student_0','student_1','student_2','student_3','student_4') ORDER BY relname;
relname | rows
-----------+-------
student_0 | 19851
student_1 | 20223
student_2 | 19969
student_3 | 19952
student_4 | 20005
(5 rows)
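
For exact per-partition counts, you can group on the hidden tableoid column; a quick sketch:

SELECT tableoid::regclass AS partition, count(*) AS rows
FROM students
GROUP BY tableoid
ORDER BY partition;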

Benefits of hash partitioning

  • The primary benefit is to ensure an even distribution of data among a predetermined number of partitions.
  • Hash keys are used effectively and efficiently in cases where ranges are not applicable, like employee number, product ID, etc.

What if the data is out of range or list?

For this purpose, we use default partitions on range- and list-partitioned tables.

For both range and list partitioning, out-of-range data can be stored temporarily by creating a default partition; an appropriate partition can be created later.

Hash-partitioned tables cannot have a default partition: with hash partitioning, every possible key value maps to some modulus/remainder combination, so a default partition makes no sense and is not needed.

Let's see what happens when I try to insert data for which no partition exists, and how a default partition helps in this case.

INSERT into sales VALUES ( 1001 , 'MYS' , 'Scanners' , 190000);
ERROR: no partition of relation "sales" found for row
DETAIL: Partition key of the failing row contains (branch) = (MYS).

CREATE TABLE sales_default PARTITION of sales DEFAULT;
CREATE TABLE

INSERT into sales VALUES ( 1001 , 'MYS' , 'Scanners' , 190000);
INSERT 0 1

select * from sales_default ;
id | branch | type | amount
------+--------+----------+--------
1001 | MYS | Scanners | 190000
(1 row)

So the inserted row went to the default partition; proper partitions can be created later based on the data accumulated in the default partition and the existing partitions. A sketch of that workflow follows.
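
One way to later move the out-of-range rows into a proper partition is sketched below, assuming the sales table from the list-partitioning example. Note that PostgreSQL refuses to create a new partition while the default partition still contains rows that would belong to it, hence the detach/re-attach dance.

BEGIN;
-- Detach the default partition so its rows no longer block the new partition:
ALTER TABLE sales DETACH PARTITION sales_default;
-- Create the proper partition for the out-of-range key:
CREATE TABLE mys_sales PARTITION OF sales FOR VALUES IN ('MYS');
-- Move the matching rows into the new partition, then re-attach the default:
INSERT INTO sales SELECT * FROM sales_default WHERE branch = 'MYS';
DELETE FROM sales_default WHERE branch = 'MYS';
ALTER TABLE sales ATTACH PARTITION sales_default DEFAULT;
COMMIT;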

Conclusion

Here we discussed the built-in partitioning techniques in PostgreSQL using single columns; multi-column partitioning is also possible. PostgreSQL Partition Manager (pg_partman) can also be used to create and manage partitions effectively. Further details will be explained in upcoming blogs.

Also, please find below the related blogs for reference:

PostgreSQL Sharding: An Overview and MongoDB Comparison

Performing ETL Using Inheritance in PostgreSQL

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.


Download Percona Distribution for PostgreSQL Today!

Feb 21, 2019

Parallel queries in PostgreSQL


Modern CPU models have a huge number of cores. For many years, applications have been sending queries to databases in parallel. Where reporting queries deal with many table rows, the ability of a single query to use multiple CPUs helps it finish faster. Starting from PostgreSQL 9.6, parallel queries allow a report query to utilize many CPUs and finish sooner.

The initial implementation of the parallel queries execution took three years. Parallel support requires code changes in many query execution stages. PostgreSQL 9.6 created an infrastructure for further code improvements. Later versions extended parallel execution support for other query types.

Limitations

  • Do not enable parallel execution if all CPU cores are already saturated: parallel execution steals CPU time from other queries and increases their response time.
  • Most importantly, parallel processing significantly increases memory usage with high work_mem values, as each hash join or sort operation consumes up to work_mem of memory.
  • Next, low-latency OLTP queries can't be made any faster with parallel execution. In particular, queries that return a single row can perform worse when parallel execution is enabled.
  • The Pierian spring for developers is the TPC-H benchmark: check whether you have similar queries for the best parallel execution.
  • Parallel execution supports only SELECT queries without lock predicates.
  • Proper indexing might be a better alternative to a parallel sequential table scan.
  • There is no support for cursors or suspended queries.
  • Window functions and ordered-set aggregate functions are non-parallel.
  • There is no benefit for an IO-bound workload.
  • There are no parallel sort algorithms. However, queries with sorts still can be parallel in some aspects.
  • Replace CTE (WITH …) with a sub-select to support parallel execution.
  • Foreign data wrappers do not currently support parallel execution (but they could!)
  • There is no support for FULL OUTER JOIN.
  • Clients setting max_rows disable parallel execution.
  • If a query uses a function that is not marked as PARALLEL SAFE, it will be single-threaded (a quick way to check is sketched after this list).
  • SERIALIZABLE transaction isolation level disables parallel execution.
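
For instance, to check how a function is marked, you can query pg_proc; a quick sketch, where proparallel is 's' for safe, 'r' for restricted, and 'u' for unsafe:

SELECT proname, proparallel FROM pg_proc WHERE proname = 'sum';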

Test environment

The PostgreSQL development team has tried to improve the response time of TPC-H benchmark queries. You can download the benchmark and adapt it to PostgreSQL using the instructions below. This is not an official way to use the TPC-H benchmark, so you shouldn't use it to compare different databases or hardware.

  1. Download TPC-H_Tools_v2.17.3.zip (or a newer version) from the official TPC site.
  2. Rename makefile.suite to Makefile and modify it as requested at https://github.com/tvondra/pg_tpch . Compile the code with the make command.
  3. Generate the data: ./dbgen -s 10 generates a 23GB database, which is enough to see the performance difference between parallel and non-parallel queries.
  4. Convert the tbl files to csv with for + sed.
  5. Clone the pg_tpch repository and copy the csv files to pg_tpch/dss/data.
  6. Generate the queries with the qgen command.
  7. Load the data into the database with the ./tpch.sh command.

Parallel sequential scan

A parallel sequential scan might be faster not because of parallel reads, but because the CPU work is scattered across many cores. Modern operating systems cache PostgreSQL data files well, and read-ahead lets the OS fetch more blocks from storage than the PostgreSQL daemon actually requested. As a result, query performance is not limited by disk I/O; it consumes CPU cycles for:

  • reading rows one by one from table data pages
  • comparing row values and WHERE conditions

Let's try to execute a simple select query:

tpch=# explain analyze select l_quantity as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..1964772.00 rows=58856235 width=5) (actual time=0.014..16951.669 rows=58839715 loops=1)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 1146337
Planning Time: 0.203 ms
Execution Time: 19035.100 ms

The sequential scan produces too many rows without any aggregation, so the query is executed by a single CPU core.

After adding SUM(), it's clear that two workers help make the query faster:

explain analyze select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1589702.14..1589702.15 rows=1 width=32) (actual time=8553.365..8553.365 rows=1 loops=1)
-> Gather (cost=1589701.91..1589702.12 rows=2 width=32) (actual time=8553.241..8555.067 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1588701.91..1588701.92 rows=1 width=32) (actual time=8547.546..8547.546 rows=1 loops=3)
-> Parallel Seq Scan on lineitem (cost=0.00..1527393.33 rows=24523431 width=5) (actual time=0.038..5998.417 rows=19613238 loops=3)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 382112
Planning Time: 0.241 ms
Execution Time: 8555.131 ms

The more complex query is 2.2X faster compared to the plain, single-threaded select.

Parallel Aggregation

A “Parallel Seq Scan” node produces rows for partial aggregation, and a “Partial Aggregate” node reduces those rows with SUM(). At the end, the SUM counter from each worker is collected by the “Gather” node.

The final result is calculated by the “Finalize Aggregate” node. If you have your own aggregation functions, do not forget to mark them as “parallel safe”, as sketched below.
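
For example, a user-defined function can be marked PARALLEL SAFE at creation time or afterwards (for aggregates, CREATE AGGREGATE accepts PARALLEL = SAFE). A minimal sketch with a hypothetical function:

CREATE FUNCTION add_tax(amount numeric) RETURNS numeric
    AS $$ SELECT amount * 1.18 $$
    LANGUAGE sql IMMUTABLE PARALLEL SAFE;

-- or, for an existing function:
ALTER FUNCTION add_tax(numeric) PARALLEL SAFE;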

Number of workers

We can increase the number of workers without a server restart:

alter system set max_parallel_workers_per_gather=4;
select * from pg_reload_conf();
Now, there are 4 workers in explain output:
tpch=# explain analyze select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1440213.58..1440213.59 rows=1 width=32) (actual time=5152.072..5152.072 rows=1 loops=1)
-> Gather (cost=1440213.15..1440213.56 rows=4 width=32) (actual time=5151.807..5153.900 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=1439213.15..1439213.16 rows=1 width=32) (actual time=5147.238..5147.239 rows=1 loops=5)
-> Parallel Seq Scan on lineitem (cost=0.00..1402428.00 rows=14714059 width=5) (actual time=0.037..3601.882 rows=11767943 loops=5)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 229267
Planning Time: 0.218 ms
Execution Time: 5153.967 ms

What's happening here? We increased the number of workers from 2 to 4, yet the query became only 1.6599 times faster. In fact, the scaling is as good as it gets: we had two workers plus one leader, and after the configuration change it became 4+1.

So the biggest improvement we could expect from this change is 5/3 = 1.66(6)X faster.

How does it work?

Processes

Query execution always starts in the “leader” process. The leader performs all non-parallel work plus its own share of the parallel processing. Other processes executing the same query are called “worker” processes. Parallel execution uses the Dynamic Background Workers infrastructure (added in 9.4). Since PostgreSQL uses processes rather than threads, a query creating three worker processes can be up to 4X faster than traditional execution.

Communication

Workers communicate with the leader using a message queue (based on shared memory). Each worker process has two queues: one for errors and one for tuples.

How many workers to use?

Firstly, max_parallel_workers_per_gather limits the number of workers per Gather node. Secondly, the query executor takes workers from the pool limited by max_parallel_workers. Finally, the top-level limit is max_worker_processes: the total number of background processes.

Failed worker allocation leads to single-process execution.

The query planner could consider decreasing the number of workers based on a table or index size. min_parallel_table_scan_size and min_parallel_index_scan_size control this behavior.

set min_parallel_table_scan_size='8MB'
8MB table => 1 worker
24MB table => 2 workers
72MB table => 3 workers
x => log(x / min_parallel_table_scan_size) / log(3) + 1 worker

Each time the table is 3X bigger than min_parallel_(index|table)_scan_size, Postgres adds a worker. The number of workers is not cost-based! A circular dependency would make the implementation complex, so the planner uses simple rules instead.

In practice, these rules are not always acceptable in production, and you can override the number of workers for a specific table with ALTER TABLE … SET (parallel_workers = N), as sketched below.
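
A quick sketch, using the lineitem table from this benchmark:

-- Let the planner use up to 8 workers for scans of this table:
ALTER TABLE lineitem SET (parallel_workers = 8);
-- Revert to the size-based default:
ALTER TABLE lineitem RESET (parallel_workers);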

Why is parallel execution not used?

Besides the long list of parallel execution limitations, PostgreSQL also checks costs:

parallel_setup_cost avoids parallel execution for short queries. It models the time spent on memory setup, process start, and initial communication.

parallel_tuple_cost: the communication between the leader and workers can take a long time, proportional to the number of tuples sent by workers. This parameter models the communication cost.

Nested loop joins

PostgreSQL 9.6+ can execute a “Nested loop” in parallel, thanks to the simplicity of the operation.

explain (costs off) select c_custkey, count(o_orderkey)
                from    customer left outer join orders on
                                c_custkey = o_custkey and o_comment not like '%special%deposits%'
                group by c_custkey;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Finalize GroupAggregate
   Group Key: customer.c_custkey
   ->  Gather Merge
         Workers Planned: 4
         ->  Partial GroupAggregate
               Group Key: customer.c_custkey
               ->  Nested Loop Left Join
                     ->  Parallel Index Only Scan using customer_pkey on customer
                     ->  Index Scan using idx_orders_custkey on orders
                           Index Cond: (customer.c_custkey = o_custkey)
                           Filter: ((o_comment)::text !~~ '%special%deposits%'::text)

Gather happens in the last stage, so the “Nested Loop Left Join” is a parallel operation. “Parallel Index Only Scan” is available from version 10 and acts similarly to a parallel sequential scan. The c_custkey = o_custkey condition reads a single order for each customer row, so it is not a parallel operation.

Hash Join

Until PostgreSQL 11, each worker built its own hash table, so four or more workers couldn't improve performance. The new implementation uses a shared hash table, and each worker can utilize WORK_MEM to build it.

select
        l_shipmode,
        sum(case
                when o_orderpriority = '1-URGENT'
                        or o_orderpriority = '2-HIGH'
                        then 1
                else 0
        end) as high_line_count,
        sum(case
                when o_orderpriority <> '1-URGENT'
                        and o_orderpriority <> '2-HIGH'
                        then 1
                else 0
        end) as low_line_count
from
        orders,
        lineitem
where
        o_orderkey = l_orderkey
        and l_shipmode in ('MAIL', 'AIR')
        and l_commitdate < l_receiptdate
        and l_shipdate < l_commitdate
        and l_receiptdate >= date '1996-01-01'
        and l_receiptdate < date '1996-01-01' + interval '1' year
group by
        l_shipmode
order by
        l_shipmode
LIMIT 1;
                                                                                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1964755.66..1964961.44 rows=1 width=27) (actual time=7579.592..7922.997 rows=1 loops=1)
   ->  Finalize GroupAggregate  (cost=1964755.66..1966196.11 rows=7 width=27) (actual time=7579.590..7579.591 rows=1 loops=1)
         Group Key: lineitem.l_shipmode
         ->  Gather Merge  (cost=1964755.66..1966195.83 rows=28 width=27) (actual time=7559.593..7922.319 rows=6 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Partial GroupAggregate  (cost=1963755.61..1965192.44 rows=7 width=27) (actual time=7548.103..7564.592 rows=2 loops=5)
                     Group Key: lineitem.l_shipmode
                     ->  Sort  (cost=1963755.61..1963935.20 rows=71838 width=27) (actual time=7530.280..7539.688 rows=62519 loops=5)
                           Sort Key: lineitem.l_shipmode
                           Sort Method: external merge  Disk: 2304kB
                           Worker 0:  Sort Method: external merge  Disk: 2064kB
                           Worker 1:  Sort Method: external merge  Disk: 2384kB
                           Worker 2:  Sort Method: external merge  Disk: 2264kB
                           Worker 3:  Sort Method: external merge  Disk: 2336kB
                           ->  Parallel Hash Join  (cost=382571.01..1957960.99 rows=71838 width=27) (actual time=7036.917..7499.692 rows=62519 loops=5)
                                 Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                 ->  Parallel Seq Scan on lineitem  (cost=0.00..1552386.40 rows=71838 width=19) (actual time=0.583..4901.063 rows=62519 loops=5)
                                       Filter: ((l_shipmode = ANY ('{MAIL,AIR}'::bpchar[])) AND (l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND (l_receiptdate >= '1996-01-01'::date) AND (l_receiptdate < '1997-01-01 00:00:00'::timestamp without time zone))
                                       Rows Removed by Filter: 11934691
                                 ->  Parallel Hash  (cost=313722.45..313722.45 rows=3750045 width=20) (actual time=2011.518..2011.518 rows=3000000 loops=5)
                                       Buckets: 65536  Batches: 256  Memory Usage: 3840kB
                                       ->  Parallel Seq Scan on orders  (cost=0.00..313722.45 rows=3750045 width=20) (actual time=0.029..995.948 rows=3000000 loops=5)
 Planning Time: 0.977 ms
 Execution Time: 7923.770 ms

Query 12 from TPC-H, shown above, is a good illustration of a parallel hash join: each worker helps to build the shared hash table.

Merge Join

Due to the nature of merge join, it's not possible to make it parallel. Don't worry if it's the last stage of the query execution: you can still see parallel execution for queries with a merge join.

-- Query 2 from TPC-H
explain (costs off) select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from    part, supplier, partsupp, nation, region
where
        p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and p_size = 36
        and p_type like '%BRASS'
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'AMERICA'
        and ps_supplycost = (
                select
                        min(ps_supplycost)
                from    partsupp, supplier, nation, region
                where
                        p_partkey = ps_partkey
                        and s_suppkey = ps_suppkey
                        and s_nationkey = n_nationkey
                        and n_regionkey = r_regionkey
                        and r_name = 'AMERICA'
        )
order by s_acctbal desc, n_name, s_name, p_partkey
LIMIT 100;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Limit
   ->  Sort
         Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
         ->  Merge Join
               Merge Cond: (part.p_partkey = partsupp.ps_partkey)
               Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
               ->  Gather Merge
                     Workers Planned: 4
                     ->  Parallel Index Scan using part_pkey on part
                           Filter: (((p_type)::text ~~ '%BRASS'::text) AND (p_size = 36))
               ->  Materialize
                     ->  Sort
                           Sort Key: partsupp.ps_partkey
                           ->  Nested Loop
                                 ->  Nested Loop
                                       Join Filter: (nation.n_regionkey = region.r_regionkey)
                                       ->  Seq Scan on region
                                             Filter: (r_name = 'AMERICA'::bpchar)
                                       ->  Hash Join
                                             Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
                                             ->  Seq Scan on supplier
                                             ->  Hash
                                                   ->  Seq Scan on nation
                                 ->  Index Scan using idx_partsupp_suppkey on partsupp
                                       Index Cond: (ps_suppkey = supplier.s_suppkey)
               SubPlan 1
                 ->  Aggregate
                       ->  Nested Loop
                             Join Filter: (nation_1.n_regionkey = region_1.r_regionkey)
                             ->  Seq Scan on region region_1
                                   Filter: (r_name = 'AMERICA'::bpchar)
                             ->  Nested Loop
                                   ->  Nested Loop
                                         ->  Index Scan using idx_partsupp_partkey on partsupp partsupp_1
                                               Index Cond: (part.p_partkey = ps_partkey)
                                         ->  Index Scan using supplier_pkey on supplier supplier_1
                                               Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
                                   ->  Index Scan using nation_pkey on nation nation_1
                                         Index Cond: (n_nationkey = supplier_1.s_nationkey)

The “Merge Join” node is above the “Gather Merge”, so the merge itself does not use parallel execution. But the “Parallel Index Scan” node still helps with the part_pkey segment.

Partition-wise join

PostgreSQL 11 disables the partition-wise join feature by default because it has a high planning cost. Joins between similarly partitioned tables can be done partition by partition. This allows Postgres to use smaller hash tables, and each per-partition join operation can be executed in parallel.

tpch=# set enable_partitionwise_join=t;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
                    QUERY PLAN
---------------------------------------------------
 Append
   ->  Hash Join
         Hash Cond: (t2.b = t1.a)
         ->  Seq Scan on prt2_p1 t2
               Filter: ((b >= 0) AND (b <= 10000))
         ->  Hash
               ->  Seq Scan on prt1_p1 t1
                     Filter: (b = 0)
   ->  Hash Join
         Hash Cond: (t2_1.b = t1_1.a)
         ->  Seq Scan on prt2_p2 t2_1
               Filter: ((b >= 0) AND (b <= 10000))
         ->  Hash
               ->  Seq Scan on prt1_p2 t1_1
                     Filter: (b = 0)
tpch=# set parallel_setup_cost = 1;
tpch=# set parallel_tuple_cost = 0.01;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
                        QUERY PLAN
-----------------------------------------------------------
 Gather
   Workers Planned: 4
   ->  Parallel Append
         ->  Parallel Hash Join
               Hash Cond: (t2_1.b = t1_1.a)
               ->  Parallel Seq Scan on prt2_p2 t2_1
                     Filter: ((b >= 0) AND (b <= 10000))
               ->  Parallel Hash
                     ->  Parallel Seq Scan on prt1_p2 t1_1
                           Filter: (b = 0)
         ->  Parallel Hash Join
               Hash Cond: (t2.b = t1.a)
               ->  Parallel Seq Scan on prt2_p1 t2
                     Filter: ((b >= 0) AND (b <= 10000))
               ->  Parallel Hash
                     ->  Parallel Seq Scan on prt1_p1 t1
                           Filter: (b = 0)

Above all, a partition-wise join can use parallel execution only if the partitions are big enough.

Parallel Append

With Parallel Append, workers each process a different subplan (for example, a partition), instead of all workers splitting the blocks of one table among themselves. You will usually see this with UNION ALL queries. The drawback is less parallelism, because every worker may ultimately end up working alone on a single subplan.

Here, just two workers are launched even though four workers are enabled:

tpch=# explain (costs off) select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day union all select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '2000-12-01' - interval '105' day;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Gather
   Workers Planned: 2
   ->  Parallel Append
         ->  Aggregate
               ->  Seq Scan on lineitem
                     Filter: (l_shipdate <= '2000-08-18 00:00:00'::timestamp without time zone)
         ->  Aggregate
               ->  Seq Scan on lineitem lineitem_1
                     Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)

Most important variables

  • work_mem limits the memory usage of each process, not of the whole query: work_mem * processes * joins can lead to significant memory usage.
  • max_parallel_workers_per_gather – how many workers the executor will use for the parallel execution of a planner node
  • max_worker_processes – adapt the total number of background workers to the number of CPU cores installed on the server
  • max_parallel_workers – the same, but for the number of parallel query workers

Summary

Starting from 9.6, parallel query execution can significantly improve the performance of complex queries that scan many rows or index records. In PostgreSQL 10, parallel execution was enabled by default. Do not forget to disable parallel execution on servers with a heavy OLTP workload. Sequential scans or index scans still consume a significant amount of resources; if you are not running a report against the whole dataset, you may improve query performance simply by adding missing indexes or by using proper partitioning.

Image compiled from photos by Nathan Gonthier and Pavel Nekoranec on Unsplash

Jan 28, 2019

Monitor and Optimize Slow Queries with PMM and EverSQL – Part 2


EverSQL is a platform that intelligently tunes your SQL queries by providing query optimization recommendations and feedback on missing indexes. This is the second post of our EverSQL series; if you missed our introductory post, take a look at that first and then come back to this article.

We’ll use the Stackoverflow data set again as we did in our first post.

Diving into query optimization

We'll grab the worst-performing query in the list from PMM and optimize it. This query builds a list of the 100 most recent posts that have at least one comment with a score greater than two, and it involves joining two large tables: posts and comments. The original runtime of the query is above 20 minutes, and it causes high load on the server while running.

[Screenshot: the worst-performing query in PMM Query Analytics]

Assuming you have EverSQL's Chrome extension installed, you'll see a new button in the PMM Query Analytics page that lets you send the query and schema structure directly to EverSQL, to retrieve indexing and query optimization recommendations.

[Screenshots: EverSQL recommendations and the EverSQL dashboard]

After implementing EverSQL’s recommendations, the query’s execution duration significantly improved:

[Screenshot: improved query response time in PMM]

Optimization Internals

So what was the actual optimization in this specific case, and why did it work so well? Let's look at the original query:

SELECT
   p.title
FROM
   so.posts p
       INNER JOIN
   so.comments c ON p.id = c.postid
WHERE
c.score > 2
GROUP BY p.id
ORDER BY p.creationdate DESC
LIMIT 100;

The tables’ structure:

CREATE TABLE `posts` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  ...
  PRIMARY KEY (`Id`),
  KEY `posts_idx_creationdate` (`CreationDate`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `comments` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  `PostId` int(11) NOT NULL,
  `Score` int(11) DEFAULT NULL,
  ....
  PRIMARY KEY (`Id`),
  KEY `comments_idx_postid` (`PostId`),
  KEY `comments_idx_postid_score` (`PostId`,`Score`),
  KEY `comments_idx_score` (`Score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

This query will return the post title of the latest 100 stackoverflow posts, which had at least one popular comment (with a score higher than two). The posts table contains 39,646,923 records, while the comments table contains 64,510,258 records.

This is the execution plan MySQL (v5.7.20) chose:

[Screenshot: the original query's execution plan]

One of the challenges with this query is that the GROUP BY and ORDER BY clauses contain different fields, which prevents MySQL from using an index for the ORDER BY. As MySQL's documentation states:

“In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it may still use indexes to find the rows that match the WHERE clause. Examples:  … The query has different ORDER BY and GROUP BY expressions.”.

Now let’s look into the optimized query:

SELECT
   p.title
FROM
   so.posts p
WHERE
   EXISTS( SELECT
           1
       FROM
           so.comments c
       WHERE
           p.id = c.postid AND c.score > 2)
ORDER BY p.creationdate DESC
LIMIT 100;

Since the comments table is joined in this query only to check for the existence of matching records, we can use an EXISTS subquery instead. This lets us avoid inflating the results (with the JOIN) and then deflating them (with the GROUP BY), which are costly operations.

Now that the GROUP BY is redundant and removed, the database can optionally choose to use an index for the ORDER BY clause.

The new execution plan MySQL chooses is:

[Screenshot: the optimized query's execution plan]

As mentioned above, this transformation reduced the query execution duration from ~20 minutes to 370ms.

We hope you enjoyed this post, please let us know your experiences using the integration between PMM Query Analytics and EverSQL!

Co-Author: Tomer Shay


Tomer Shay is the Founder of EverSQL. He loves being where the challenge is. In the last 12 years, he had the privilege to code a lot and lead teams of developers, while focusing on databases and performance. He enjoys using technology to bring ideas into reality, help people and see them smile.

Jan 22, 2019

Monitor and Optimize Slow Queries with PMM and EverSQL – Part One


A common challenge with continuously deployed applications is that new and modified SQL queries are constantly being introduced to the application. Many companies choose to use a database monitoring system (such as PMM) to identify those slow queries. But identifying slow queries is only the start – what about actually optimizing them?

In this post we'll demonstrate a new way to both identify and optimize slow queries, by utilizing the recent integration of Percona Monitoring and Management with the EverSQL Query Optimizer via a Chrome browser extension. This integration allows you to identify slow queries using PMM, and optimize them automatically using EverSQL Query Optimizer.

Optimizing queries with PMM & EverSQL

We’re using PMM to monitor our MySQL instance, which was pre-loaded with the publicly available StackOverflow dataset. PMM is configured to monitor for slow queries from MySQL’s slow log file.

[Screenshot: PMM dashboard monitoring slow queries]

We'll begin with a basic example of how EverSQL can provide value for a simple SELECT statement. In a follow-up blog post we'll go through a more sophisticated multi-table query and show how response time can be reduced from 20 minutes to milliseconds(!) using EverSQL.

Let’s have a look at one of the slow queries identified by PMM:

[Screenshot: a slow query identified by PMM]

In this example, the posts table contains two indexes by default (in addition to the primary key): one on the AnswerCount column, and the other on the CreationDate column.

CREATE TABLE `posts` (
 `Id` int(11) NOT NULL,
 `AcceptedAnswerId` int(11) DEFAULT NULL,
 `AnswerCount` int(11) DEFAULT NULL,
 `ClosedDate` datetime DEFAULT NULL,
 ….
 `CreationDate` datetime NOT NULL,
  ….
 `ViewCount` int(11) NOT NULL,
 PRIMARY KEY (`Id`),
 KEY `posts_idx_answercount` (`AnswerCount`),
 KEY `posts_idx_creationdate` (`CreationDate`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

As you can see below, EverSQL identifies that a composite index containing both columns would be more beneficial in this case, and recommends adding an index on posts(AnswerCount, CreationDate); a possible DDL form is sketched after the screenshot.

[Screenshot: EverSQL optimization report]
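
In DDL terms, the recommendation corresponds to something like the following sketch (the index name here is my own choice, not necessarily what EverSQL generates):

ALTER TABLE posts ADD INDEX posts_idx_answercount_creationdate (AnswerCount, CreationDate);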

After using pt-online-schema-change to apply the schema modification, we can observe in PMM that the query execution duration dropped from 3m 40s to 83 milliseconds!

[Screenshot: query execution time improvement after applying the index]

Note that this extension is available for Chrome from the Chrome Web Store:

EverSQL for Database Monitoring Applications

Summary

If you’re looking for an easy way to both monitor for slow queries and quickly optimize them, consider deploying Percona Monitoring and Management and then integrating it with EverSQL’s Chrome extension!

Co-Author: Tomer Shay


Tomer Shay is the Founder of EverSQL. He loves being where the challenge is. In the last 12 years, he had the privilege to code a lot and lead teams of developers, while focusing on databases and performance. He enjoys using technology to bring ideas into reality, help people and see them smile.


Mar 14, 2014

Tools and tips for analysis of MySQL’s Slow Query Log

MySQL has a nice feature, the slow query log, which allows you to log all queries that exceed a predefined amount of time to execute. Peter Zaitsev first wrote about this back in 2006. There have been a few other posts here on the MySQL Performance Blog since then (check this and this, too), but I wanted to revisit his original subject in today's post.

Query optimization is essential for good database server performance, and DBAs usually need to ensure the best possible performance for all queries. In MySQL, the desirable way is to generate a query log for all running queries within a specific time period and then run a query analysis tool to identify the bad queries. Percona Toolkit's pt-query-digest is one of the most powerful tools for SQL analysis, because it can generate a very comprehensive report that spots problematic queries very efficiently. It works equally well with Oracle's MySQL server. This post will focus mainly on pt-query-digest.

The slow query log is great at spotting really slow queries that are good candidates for optimization. Beginning with MySQL 5.1.21, the minimum value for long_query_time is 0, and the value can be specified to a resolution of microseconds. In Percona Server, additional statistics may be output to the slow query log; you can find the full details here. For our clients, we often need to identify the queries that impact an application the most, and these are not always the slowest ones: queries that run more frequently with a lower execution time per call put more load on a server than queries that run less frequently. We of course want to get rid of really slow queries, but to really optimize application throughput we also need to investigate the queries that generate most of the load. Further, if you enable the log_queries_not_using_indexes option (as shown below), MySQL will log queries doing full table scans, which does not always mean a query is slow: in some situations the query optimizer deliberately chooses a full table scan rather than using an available index, for example when showing all records of a small table.
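
A minimal sketch of enabling that option at runtime (it can also be set in my.cnf):

mysql> SET GLOBAL log_queries_not_using_indexes=1;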

Our usual recommendation is to generate the slow log with long_query_time=0. This records all the traffic, but it is I/O intensive and can eat up disk space very quickly depending on your workload, so be sure to run with long_query_time=0 only for a specific period of time and then revert to logging only very slow queries. In Percona Server there is a nice option, log_slow_rate_limit, which limits the rate of logging. Filtering the slow query log is very helpful in some cases, too: if we know the main performance issue is table scans, we can log only queries doing full table scans, or if we see that I/O is the bottleneck, we can collect queries doing full scans and queries creating on-disk temporary tables. Again, this is only possible in Percona Server, with the log_slow_filter option (see the sketch below). Alternatively, you can collect everything in the slow query log and then filter with pt-query-digest; depending on I/O capacity, you might prefer one way or the other, as collecting everything allows us to investigate other queries too if needed. Finally, use pt-query-digest to generate an aggregate report over the slow query log, which highlights the problematic part very efficiently. pt-query-digest itself can drive server load up, so our usual recommendation is to move the slow query log to a staging/dev server and run pt-query-digest there to generate the report.
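
For illustration, a sketch of such filtering in Percona Server; full_scan and tmp_table_on_disk are documented log_slow_filter values:

mysql> SET GLOBAL log_slow_filter='full_scan,tmp_table_on_disk';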

Note: changing the long_query_time parameter only affects newly created connections. In Percona Server there is a feature that changes the variable's scope to global instead of session: with slow_query_log_use_global_control enabled, already-connected sessions also honor a changed long_query_time threshold. You can read more about this patch here.

I am not going to show a detailed pt-query-digest report and explain each part of it here, because my colleague Ovais Tariq has already covered that well in this post. However, I will show you some other aspects of the pt-query-digest tool.

Let me show you code snippets that enable the slow query log for only a specific time period, with long_query_time=0 and log_slow_verbosity set to 'full'. log_slow_verbosity is a Percona Server variable which logs extra statistics such as information on the query cache, filesort, temporary tables, InnoDB statistics, etc. Once you are done collecting logs, revert long_query_time to its previous value, and finally run pt-query-digest on the log to generate the report. Note: run the code below in the same MySQL session.

-- Save previous settings
mysql> SELECT @@global.log_slow_verbosity INTO @__log_slow_verbosity;
mysql> SELECT @@global.long_query_time INTO @__long_query_time;
mysql> SELECT @@global.slow_query_log INTO @__slow_query_log;
mysql> SELECT @@global.log_slow_slave_statements INTO @__log_slow_slave_statements;
-- Keep this in safe place, we'll need to run pt-query-digest
mysql> SELECT NOW() AS "Time Since";
-- Set values to enable query collection
mysql> SET GLOBAL slow_query_log_use_global_control='log_slow_verbosity,long_query_time';
mysql> SET GLOBAL log_slow_verbosity='full';
mysql> SET GLOBAL slow_query_log=1;
mysql> SET GLOBAL long_query_time=0;
mysql> SET GLOBAL log_slow_slave_statements=1;
-- Verify settings are OK
mysql> SELECT @@global.long_query_time, @@global.slow_query_log, @@global.log_slow_verbosity;
-- wait for 30 - 60 minutes
-- Keep this one too, also for pt-query-digest
mysql> SELECT NOW() AS "Time Until";
-- Revert to previous values
mysql> SET GLOBAL slow_query_log=@__slow_query_log;
mysql> SET GLOBAL long_query_time=@__long_query_time;
mysql> SET GLOBAL log_slow_verbosity=@__log_slow_verbosity; -- if percona server
mysql> SET GLOBAL log_slow_slave_statements=@__log_slow_slave_statements;
-- Verify settings are back to previous values
mysql> SELECT @@global.long_query_time, @@global.slow_query_log, @@global.log_slow_verbosity, @@global.slow_query_log_file;
-- Then with pt-query-digest run like (replace values for time-since, time-until and log name)
$ pt-query-digest --since='<time-since>' --until='<time-until>' --limit=100% /path/to/slow_query_log_file.log > /path/to/report.out
-- If you're not using Percona Server then you need to remove all references to log_slow_verbosity, slow_query_log_use_global_control and log_slow_slave_statements (prior to MySQL 5.6).

My colleague Bill Karwin wrote a bash script that does almost the same as the above code. You can find the script to collect slow logs here. This script doesn't hold a connection to the database session while you wait for logs to accumulate, and it sets all the variables back to the state they were in before. For full documentation, view this.

Further, you can also get EXPLAIN output into the report from the pt-query-digest tool. For that you need to use the --explain parameter, as follows.

$ pt-query-digest --explain u=<user>,p=<password>,h=<hostname> /path/to/slow.log > /path/to/report.out

The EXPLAIN output in the query report gives you the information on the query execution plan and signals how that particular query is going to be executed. Note that if you run pt-query-digest over a slow query log on a server other than the one where the log originated, e.g. staging/dev as I mentioned above, you may get a different execution path for the query in the report, or a lower number of rows examined, etc., because staging/dev servers usually have different data distribution, different MySQL versions, or different indexes. MySQL EXPLAIN adds overhead, as queries need to be prepared on the server to generate the intended query execution path. For this reason, you may want to run pt-query-digest with --explain on a production replica.

It's worth mentioning that logging queries with log_slow_verbosity in Percona Server is really handy, as it shows lots of additional statistics, and it is especially helpful in situations when the explain plan reports a different execution path than the one used when the query actually executes. On that particular topic, you may want to check this nice post.

pt-query-digest also supports filters; you can read more about them here. Let me show you an example. The following command will discard everything apart from insert/update/delete queries in the pt-query-digest output report.

$ pt-query-digest --filter '$event->{arg} =~ m/^(insert|update|delete)/i' --since='<time-since>' --until='<time-until>' --limit=100% /path/to/slow_query_log_file.log > /path/to/report.out

If you’re looking for some GUI tools for pt-query-digest then I would recommend reading this nice blogpost from my colleague Roman. Further, our CEO Peter Zaitsev also wrote a post recently where he shows the comparison between performance_schema and slow query log. Check here for details.

In related news, Percona recently announced Percona Cloud Tools, the next generation of tools for MySQL. It runs a client-side agent (pt-agent) which runs pt-query-digest on the server at intervals and uploads the aggregated data to the Percona Cloud Tools API, which processes it further. Query Analytics is one tool from Percona Cloud Tools that provides advanced query metrics; it is a nice visualization tool. You may be interested to learn more about it here, and it's also worth viewing this related webinar about Percona Cloud Tools from our CTO Vadim Tkachenko.

Conclusion:
pt-query-digest from Percona Toolkit is a versatile (and free) tool for slow query log analysis. It provides good insight into every individual query, especially in Percona Server with log_slow_verbosity enabled: queries are logged with microsecond precision, along with information about the query's execution plan. On top of that, Percona Cloud Tools includes Query Analytics, which gives you good visuals of query performance and a view of historical data.

