Oct
08
2018
--

Detailed Logging for Enterprise-Grade PostgreSQL


In this penultimate post from our series on building an enterprise-grade PostgreSQL environment, we cover the parameters we have enabled to configure detailed logging in the demo setup we will showcase in our upcoming webinar.

Detailed logging in PostgreSQL and log analyzer

Like other RDBMSs, PostgreSQL allows you to maintain a log of activities and error messages. Until PostgreSQL 9.6, log files were generated in the pg_log directory (inside the data directory) by default. Since PostgreSQL 10, pg_log has been renamed to simply log. The location can be changed to a different directory by setting the log_directory parameter.

Unlike MySQL, PostgreSQL writes the error and activity log to the same log file, so it may grow to several GBs when detailed logging is enabled. In these cases logging becomes IO-intensive, so it is recommended to store log files on separate storage from the one hosting the data directory.

Parameters to enable detailed logging

Here’s a list of parameters used to customize logging in PostgreSQL. All of them need to be modified in the postgresql.conf or postgresql.auto.conf files.

logging_collector: in order to log any activity in PostgreSQL this parameter must be enabled. The backend process responsible for logging database activity is called the logger; it is started when logging_collector is set to ON. Changing this parameter requires a PostgreSQL restart.

log_min_duration_statement: this parameter sets a time threshold: queries running longer than this value are logged (as “slow queries”). Setting it to -1 disables logging of statements. Setting it to 0 enables the logging of every statement running in the database, regardless of its duration. A time unit can follow the value, for example: 250ms, 250s, 250min, 1h. Changing this parameter does not require a PostgreSQL restart – a simple reload of the configuration is enough. For example:

log_min_duration_statement = 5s

  logs every statement running for 5 seconds or longer.
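After changing the value in postgresql.conf, you can apply it without a restart; one way of doing so from psql is shown below.

-- reload the configuration without restarting the server
SELECT pg_reload_conf();

-- confirm the new value took effect
SHOW log_min_duration_statement;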

log_line_prefix: helps you customize every log line being printed in the PostgreSQL log file. You can log the process id, application name, database name and other details for every statement as required. The following log_line_prefix may be helpful in most scenarios:

log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h'

The above setting records the following for every statement being logged:

%t : Time stamp without milliseconds
%p : Process id
%l-1 : Number of the log line for each session or process, starting at 1
%u : User name
%d : Database name
%a : Application name
%h : Remote host name or IP address

With the above settings employed for log_line_prefix, the log appears as follows:

2018-06-08 12:36:26 EDT [21315]: [1-1] user=postgres,db=percona,app=psql,client=192.168.0.12 LOG: duration: 2.611 ms statement: select * from departments where id = 26;
2018-06-08 12:36:49 EDT [21323]: [1-1] user=postgres,db=percona,app=psql,client=192.168.0.12 LOG: duration: 2.251 ms statement: select count(*) from departments;

You can refer to the PostgreSQL documentation for further details on log_line_prefix.

log_duration: enabling this parameter records the duration of every completed statement in the PostgreSQL log, irrespective of log_min_duration_statement. Keep in mind that, as with log_min_duration_statement, enabling log_duration may increase log file usage and affect the server’s general performance. For this reason, if you already have log_min_duration_statement enabled it is often suggested to disable log_duration, unless there’s a specific need to keep track of both.

log_lock_waits: when log_lock_waits is enabled a log message is recorded when a session waits longer than deadlock_timeout to acquire a lock.
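A minimal sketch of the related postgresql.conf settings (the one-second value shown is simply PostgreSQL's default deadlock_timeout; tune it to your needs):

log_lock_waits = on
deadlock_timeout = 1s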

log_checkpoints: logs all checkpoints and restart points to the PostgreSQL log file.

log_rotation_size: defines the size limit for each log file; once it reaches this threshold the log file is rotated.
Example: 

log_rotation_size = '500MB'

  – every log file is limited to a size of 500 MB.

log_rotation_age: determines the maximum life span for a log file, forcing its rotation once this threshold is reached. This parameter is usually set in terms of hours, or maybe days; the minimum granularity is a minute. Note that if log_rotation_size is reached first, the log gets rotated anyway, irrespective of this setting.
Example: 

log_rotation_age = 1d

log_statement: controls which types of SQL statements are logged. The recommended setting is ddl, which logs all DDLs that are executed. Tracking those allows you to later audit when a given DDL was executed, and by whom. By monitoring and understanding the amount of information it may write to the log file you may consider modifying this setting. Other possible values are none, mod (DDLs plus data-modifying statements) and all.
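For example, to record only DDLs, following the same postgresql.conf convention as the examples above:

log_statement = 'ddl'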

log_temp_files: logs the use of temporary files whose size is greater than this value (in kB).
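For example, to log every temporary file of 10 MB or larger (the value is in kilobytes; 0 logs all temporary files and -1 disables this logging):

log_temp_files = 10240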

log_directory: defines the directory in which log files are created. Once more, please note that if you have enabled detailed logging it is recommended to have a separate disk—different from the data directory disk—allocated for log_directory.
Example: 

log_directory = '/this_is_a_new_disk/pg_log'

Log Analyzer – pgBadger

In PostgreSQL you cannot have a separate error log and slow query log: everything is written to one log file, which may be periodically rotated based on time and size. Over a period of time, this log file may grow to several MBs or even GBs, depending on the amount of logging that has been enabled. It can get difficult for a DBA/developer to parse the log files and get a clear view of what is running slowly and how many times a query has run. To help with this task you can use pgBadger, a log analyzer for PostgreSQL, to parse log files and generate a rich HTML-based report that you can access from a browser. An example report can be seen in the screenshots below:

We’ll be showing detailed logging and pgBadger in action in a couple of days so, if there’s still time, sign up for our October webinar!

Oct
05
2018
--

PostgreSQL Extensions for an Enterprise-Grade System


In this series of blog posts we have been discussing various relevant aspects of building an enterprise-grade PostgreSQL setup, such as security, backup strategy, high availability, and different methods to scale PostgreSQL. In this blog post, we review some of the most popular open source extensions for PostgreSQL, used to expand its capabilities and address specific needs. We’ll cover some of them during a demo in our upcoming webinar on October 10.

Expanding with PostgreSQL Extensions

PostgreSQL is one of the world’s most feature-rich and advanced open source RDBMSs. Its features are not limited to those released by the community through major/minor releases. There are hundreds of additional features developed using the extension capabilities of PostgreSQL, which can cater to the needs of specific users. Some of these extensions are very popular and useful for building an enterprise-grade PostgreSQL environment. We previously blogged about a couple of FDW extensions (mysql_fdw and postgres_fdw) which allow PostgreSQL databases to talk to remote homogeneous or heterogeneous databases such as PostgreSQL, MySQL, and MongoDB. We will now cover a few additional extensions that can expand your PostgreSQL server capabilities.

pg_stat_statements

The pg_stat_statements module provides a means for tracking execution statistics of all SQL statements executed by a server. The statistics gathered by the module are made available via a view named pg_stat_statements. This extension must be installed in each of the databases you want to track, and like many of the extensions in this list, it is available in the contrib package from the PostgreSQL PGDG repository.
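As a brief sketch of how it is typically used (the module must first be added to shared_preload_libraries, which requires a restart; exact column names vary slightly between PostgreSQL versions):

CREATE EXTENSION pg_stat_statements;

-- top five statements by total execution time
SELECT query, calls, total_time, rows
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 5;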

pg_repack

Tables in PostgreSQL may end up with fragmentation and bloat due to PostgreSQL’s MVCC implementation, or simply due to a high number of rows being removed over time. This can lead not only to unused space being held inside the table but also to sub-optimal execution of SQL statements. pg_repack is the most popular way to address this problem, by reorganizing and repacking the table. It can reorganize the table’s content without placing an exclusive lock on it during the process: DMLs and queries can continue while repacking is happening. Version 1.2 of pg_repack introduces new features such as parallel index builds and the ability to rebuild just the indexes. Please refer to the official documentation for more details.

pgaudit

PostgreSQL has a basic statement logging feature. It can be implemented using the standard logging facility with

log_statement = all

But this is not sufficient for many audit requirements. One of the essential features for enterprise deployments is the capability for fine-grained auditing of the user interactions/statements issued to the database. This is a major compliance requirement for many security standards. The pgaudit extension caters to these requirements.

The PostgreSQL Audit Extension (pgaudit) provides detailed session and/or object audit logging via the standard PostgreSQL logging facility. Please refer to the settings section of its official documentation for more details.
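As a rough sketch only (the parameter names below are the ones documented by the pgaudit project, and the audit classes chosen here are just an example), session auditing is typically enabled by preloading the library in postgresql.conf:

shared_preload_libraries = 'pgaudit'
pgaudit.log = 'ddl, write'

After a restart, running CREATE EXTENSION pgaudit in the target database completes the setup, and the selected statement classes start appearing in the regular PostgreSQL log.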

pldebugger

This is a must-have extension for developers who work on stored functions written in PL/pgSQL. This extension is well integrated with GUI tools like pgAdmin, which allows developers to step through their code and debug it. Packages for pldebugger are also available in the PGDG repository and installation is straightforward. Once it is set up, we can step through and debug the code remotely.

The official git repo is available here

plprofiler

This is a wonderful extension for finding out where the code is slowing down. It is very helpful, particularly during complex migrations from proprietary databases, such as from Oracle to PostgreSQL, which affect application performance. This extension can prepare a report of the overall execution time in tabular form, including flamegraphs, with clear information about each line of code. This extension is not, however, available from the PGDG repo: you will need to build it from source. Details on building and installing plprofiler will be covered in a future blog post. Meanwhile, the official repository and documentation are available here

PostGIS

PostGIS is arguably the most versatile implementation of the Open Geospatial Consortium’s specifications. PostGIS offers a large list of features that are rarely available in other RDBMSs.

There are many users who have primarily opted to use PostgreSQL because of the features supported by PostGIS. In fact, all these features are not implemented as a single extension, but are instead delivered by a collection of extensions. This makes PostGIS one of the most complex extensions to build from source. Luckily, everything is available from the PGDG repository:

$ sudo yum install postgis24_10.x86_64

Once the postgis package is installed, we are able to create the extensions on our target database:

postgres=# CREATE EXTENSION postgis;
CREATE EXTENSION
postgres=# CREATE EXTENSION postgis_topology;
CREATE EXTENSION
postgres=# CREATE EXTENSION postgis_sfcgal;
CREATE EXTENSION
postgres=# CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION
postgres=# CREATE EXTENSION postgis_tiger_geocoder;
CREATE EXTENSION
postgres=# CREATE EXTENSION address_standardizer;
CREATE EXTENSION

Language Extensions: PL/Python, PL/Perl, PL/V8, PL/R, etc.

Another powerful feature of PostgreSQL is its support for server-side programming languages. You can code database functions/procedures in pretty much every popular language.

Thanks to the enormous number of libraries available, including machine learning ones, and its vibrant community, Python has claimed the third spot amongst the most popular languages of choice according to the TIOBE Programming index. Your team’s skills and libraries remain valid for PostgreSQL server coding too! Teams that regularly code in JavaScript for Node.js or Angular can easily write PostgreSQL server code in PL/V8. All of the packages required are readily available from the PGDG repository.

cstore_fdw

cstore_fdw is an open source columnar store extension for PostgreSQL. Columnar stores provide notable benefits for analytics use cases where data is loaded in batches. cstore_fdw’s columnar nature delivers performance by only reading relevant data from disk. It may compress data by 6 to 10 times, reducing the space required for data archival. The official repository and documentation are available here
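A minimal sketch of how a columnar table is typically created with cstore_fdw (the table definition below is invented for illustration; check the project documentation for the full list of options):

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- an illustrative analytics table stored in columnar format
CREATE FOREIGN TABLE page_views (
    view_time timestamptz,
    page_id   bigint,
    user_id   bigint
) SERVER cstore_server
  OPTIONS (compression 'pglz');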

HypoPG

HypoPG is an extension for adding support for hypothetical indexes – that is, without actually adding the index. This helps us to answer questions such as “how will the execution plan change if there is an index on column X?”. Installation and setup instructions are part of its official documentation
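A minimal sketch of the idea, using the hypopg_create_index function documented by the project (the table and column names are hypothetical):

CREATE EXTENSION hypopg;

-- create a hypothetical index; it takes no disk space and is local to the session
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- a plain EXPLAIN (without ANALYZE) will now consider the hypothetical index
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;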

mongo_fdw

mongo_fdw presents collections from MongoDB as tables in PostgreSQL. This is a case where the NoSQL world meets the SQL world and features combine. We will be covering this extension in a future blog post. The official repository is available here

tds_fdw

Another important FDW (foreign data wrapper) extension in the PostgreSQL world is tds_fdw. Both Microsoft SQL Server and Sybase use the TDS (Tabular Data Stream) format. This FDW allows PostgreSQL to use tables stored in a remote SQL Server or Sybase database as local tables. It makes use of the FreeTDS libraries.

orafce

As previously mentioned, there are a lot of migrations underway from Oracle to PostgreSQL. Incompatible functions in PostgreSQL are often painful for those who are migrating server code. The orafce project implements some of the functions from the Oracle database. The functionality was verified on Oracle 10g and the module is useful for production work. Please refer to its official documentation for the list of Oracle functions implemented in PostgreSQL

TimescaleDB

In this new world of IoT and connected devices, there is a growing need to store and analyze time-series data. TimescaleDB can convert PostgreSQL into a scalable time-series data store. The official site is available here with all relevant links.
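As a small sketch, assuming the timescaledb library has been added to shared_preload_libraries (the table and columns below are purely illustrative):

CREATE EXTENSION timescaledb;

CREATE TABLE metrics (
    time        timestamptz NOT NULL,
    device_id   integer,
    temperature double precision
);

-- turn the regular table into a time-partitioned hypertable
SELECT create_hypertable('metrics', 'time');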

pg_bulkload

Is loading a large volume of data into your database in an efficient and fast way a challenge for you? If so, pg_bulkload may help you solve that problem. Official documentation is available here

pg_partman

PostgreSQL 10 introduced declarative partitions. But creating new partitions and maintaining existing ones, including purging unwanted partitions, requires a good dose of manual effort. If you are looking to automate part of this maintenance you should have a look at what pg_partman offers. The repository with documentation is available here.

wal2json

PostgreSQL has logical replication support built in. Extra information is recorded in the WAL, which facilitates logical decoding. wal2json is a popular output plugin for logical decoding. It can be used for different purposes, including change data capture. In addition to wal2json, there are other output plugins: a concise list is available in the PostgreSQL wiki.
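A quick sketch of trying it out from SQL, assuming wal_level = logical and the wal2json plugin installed on the server (the slot name is arbitrary):

SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'wal2json');

-- run some DML in another session, then fetch the decoded changes as JSON
SELECT data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);

-- clean up when done
SELECT pg_drop_replication_slot('demo_slot');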

There are many more extensions that help us build an enterprise-grade PostgreSQL set up using open source solutions. Please feel free to comment and ask us if we know about one that satisfies your particular needs. Or, if there’s still time, sign up for our October webinar and ask us in person!

Apr
05
2017
--

Evaluation of PMP Profiling Tools


In this blog post, we’ll look at some of the available PMP profiling tools.

While debugging or analyzing issues with Percona Server for MySQL, we often need a quick understanding of what’s happening on the server. Percona experts frequently use the pt-pmp tool from Percona Toolkit (inspired by http://poormansprofiler.org).

The

pt-pmp

 tool collects application stack traces using GDB and then post-processes them. From this you get a condensed, ordered list of the stack traces. The list helps you understand where the application spent most of its time: either running something or waiting for something.

Getting a profile with

pt-pmp

 is handy, but it has a cost: it’s quite intrusive. In order to get stack traces, GDB has to attach to each thread of your application, which results in interruptions. Under high loads, these stops can be quite significant (up to 15-30-60 secs). This means that the

pt-pmp

 approach is not really usable in production.

Below I’ll describe how to reduce GDB overhead, and also what other tools can be used instead of GDB to get stack traces.

  • GDB
    By default, the symbol resolution process in GDB is very slow. As a result, getting stack traces with GDB is quite intrusive (especially under high loads). There are two options available that can help notably reduce GDB tracing overhead:

      1. Use readnever patch. RHEL and other distros based on it include GDB with the readnever patch applied. This patch allows you to avoid unnecessary symbol resolving with the 
        --readnever

         option. As a result you get  up to 10 times better speed.

      2. Use gdb_index. This feature was added to address the symbol resolving issue by creating and embedding a special index into the binaries. This index is quite compact: I’ve created and embedded a gdb_index for the Percona Server binary (it increases the size by around 7-8MB). The addition of the gdb_index speeds up obtaining stack traces/resolving symbols two to three times.
    # to check if index already exists:
      readelf -S  | grep gdb_index
    # to generate index:
      gdb -batch mysqld -ex "save gdb-index /tmp" -ex "quit"
    # to embed index:
      objcopy --add-section .gdb_index=tmp/mysqld.gdb-index --set-section-flags .gdb_index=readonly mysqld mysqld
  • eu-stack (elfutils)
    The eu-stack from the elfutils package prints the stack for each thread in a process or core file. Symbol resolving is also not very optimized in eu-stack. By default, if you run it under load it will take even more time than GDB. But eu-stack allows you to skip resolving completely, so it can get stack frames quickly and then resolve them without any impact on the workload later.
  • Quickstack
    Quickstack is a tool from Facebook that gets stack traces with minimal overheads.

Now let’s compare all the above profilers. We will measure the amount of time needed to take all the stack traces from Percona Server for MySQL under a high load (sysbench OLTP_RW with 512 threads).

The results show that eu-stack (without resolving) got all stack traces in less than a second, and that Quickstack and GDB (with the readnever patch) got very close results. For other profilers, the time was around two to five times higher. This is quite unacceptable for profiling (especially in production).

There is one more note regarding the

pt-pmp

 tool. The current version only supports GDB as the profiler. However, there is a development version of this tool that supports GDB, Quickstack, eu-stack and eu-stack with offline symbol resolving. It also allows you to look at stack traces for specific threads (tids). So for instance, in the case of Percona Server for MySQL, we can analyze just the purge, cleaner or IO threads.

Below are the command lines used in testing:

# gdb & gdb+gdb_index
  time gdb  -ex "set pagination 0" -ex "thread apply all bt" -batch -p `pidof mysqld` > /dev/null
# gdb+readnever
  time gdb --readnever -ex "set pagination 0" -ex "thread apply all bt" -batch -p `pidof mysqld` > /dev/null
# eu-stack
  time eu-stack -s -m -p `pidof mysqld` > /dev/null
# eu-stack without resolving
  time eu-stack -q -p `pidof mysqld` > /dev/null
# quickstack - 1 sample
  time quickstack  -c 1 -p `pidof mysqld` > /dev/null
# quickstack - 1000 samples
  time quickstack  -c 1000 -p `pidof mysqld` > /dev/null

Dec
16
2015
--

Using dbsake to recover table structure from .frm files and process mysqldump output

We work on data recoveries quite often. In many cases, we recover table structures from the .frm files because there was no backup available. There is already a great blog post by my colleague Miguel Ángel Nieto about how we can recover structures from .frm files using MySQL utilities.

This works pretty well and we prefer to run mysqlfrm with the "--server" option to get all possible information from a .frm file. However, this option expects that MySQL is up and running so that mysqlfrm can spawn a new MySQL instance, and run the structure recovery there.

Recently I came across a tool that makes this job easier. The tool is called dbsake, a collection of command-line tools that perform various DBA-related tasks for MySQL. In this blog, we will look at two very useful dbsake commands.

Installation is very easy and straightforward. It’s an executable Python zip archive with all dependencies included.

# curl -s http://get.dbsake.net > dbsake
# chmod u+x dbsake
# ./dbsake --version
dbsake, version 2.1.0 9525896

Recovering table structures from MySQL .frm files with dbsake

To recover table structures using dbsake, you need to use the “dbsake frmdump” command, followed by the .frm file path. The frmdump command decodes the MySQL .frm file and provides a “CREATE TABLE” or “CREATE VIEW” statement in the output. The good thing is that it doesn’t require a running MySQL server instance, and it interprets the .frm file according to rules similar to those used by the MySQL server.

Let’s see an example:

# ./dbsake frmdump /var/lib/mysql/sakila/staff.frm
--
-- Table structure for table `staff`
-- Created with MySQL Version 5.6.27
--
CREATE TABLE `staff` (
`staff_id` tinyint(3) unsigned NOT NULL AUTO_INCREMENT,
`first_name` varchar(45) NOT NULL,
`last_name` varchar(45) NOT NULL,
`address_id` smallint(5) unsigned NOT NULL,
`picture` blob,
`email` varchar(50) DEFAULT NULL,
`store_id` tinyint(3) unsigned NOT NULL,
`active` tinyint(1) NOT NULL DEFAULT '1',
`username` varchar(16) NOT NULL,
`password` varchar(40) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`staff_id`),
KEY `idx_fk_store_id` (`store_id`),
KEY `idx_fk_address_id` (`address_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The result looks pretty good, and has recovered the character set and collation information as well. We can also see the MySQL version (5.6.27) retrieved from the .frm file. It is important to mention that the command only decodes the information available in .frm file, which means that it cannot recover InnoDB foreign-key references and AUTO_INCREMENT values. These items are stored outside of the .frm file.

The frmdump command makes the recovery process easy and fast. We can easily script this and recover the structure of a large number of tables. For example, if we need to recover the structure of all tables from the world database, we can do the following:

mysql> create database world_recover;
# for tbl in `ls -1 /var/lib/mysql/world/*.frm`; do ./dbsake frmdump $tbl | mysql world_recover; done;
mysql> show tables from world_recover;
+-------------------------+
| Tables_in_world_recover |
+-------------------------+
| city                    |
| country                 |
| countrylanguage         |
+-------------------------+
3 rows in set (0.00 sec)

Filter and transform a mysqldump stream with dbsake

It’s a very common requirement to filter one or more tables from a mysqldump full database backup. The “dbsake sieve [options]” command helps us to filter or transform mysqldump output.

Let’s see how to extract a single table from a mysqldump file.

# mysqldump world > world.sql
# cat world.sql | ./dbsake sieve -t world.city > world.city.sql
Processed . Output: 1 database(s) 1 table(s) and 0 view(s)

The “-t” or “--table” option tells the command to only output the table matching the given pattern. It will also show the number of databases, tables or views processed in the output.

To extract multiple tables, you can pass “-t db.tbl” multiple times.

# cat world.sql | ./dbsake sieve -t world.city -t world.country > world.city_country.sql
Processed . Output: 1 database(s) 2 table(s) and 0 view(s)
# cat world.city_country.sql | grep -i 'create table'
CREATE TABLE `city` (
CREATE TABLE `country` (

The latest Percona Server added a new mysqldump option, “--innodb-optimize-keys”. It changes the way InnoDB tables are dumped, so that secondary keys are created after loading the data, thus taking advantage of InnoDB fast index creation. This is a really great feature in that it helps us to restore data more efficiently than the default incremental rebuild that mysqldump performs.

Using the “dbsake sieve [options]” command, we can transform the regular mysqldump output to take advantage of fast index creation. The “--defer-indexes” option rewrites the output of CREATE TABLE statements, and arranges for secondary indexes to be created after the table data is loaded. Similarly, the “--defer-foreign-keys” option can be added to create foreign key constraints after loading the table data.

Let’s see an example:

# cat world.sql | ./dbsake sieve --defer-indexes --defer-foreign-keys -t world.city  > world.city.sql
Processed . Output: 1 database(s) 1 table(s) and 0 view(s)

This means that world.city.sql will have the table structure with the primary key first, then the INSERT statements to load the data, and an additional ALTER TABLE statement to create the secondary keys when there is at least one secondary index to be added. Foreign keys will also be created along with the secondary indexes.

The original structure of table world.city:

CREATE TABLE `city` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `country` (`Code`)
) ENGINE=InnoDB AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1;

And the transformation done by dbsake:

...........
CREATE TABLE `city` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1;
...........
LOCK TABLES `city` WRITE;
...........
INSERT INTO `city` VALUES .....................
...........
UNLOCK TABLES;
--
-- InnoDB Fast Index Creation (generated by dbsake)
--
ALTER TABLE `city`
  ADD KEY `CountryCode` (`CountryCode`),
  ADD CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `country` (`Code`);
...........

For more sieve command options, please read the online manual.

Conclusion

The dbsake command makes it easier to recover table structures from .frm files. We can also filter and transform the mysqldump output easily without writing a complex awk or sed script. There are some more useful features of this tool that you can read about in the manual.


Jul
12
2013
--

innotop 1.9.1 released

Lefred and I spent a bit of time making innotop 1.9.1.
We’ve released a new version mainly to include MySQL 5.6 support, as well as some bug fixes by Baron Schwartz and Frédéric Descamps.

You can download the .tar.gz and RPMs (new!) at
http://code.google.com/p/innotop/downloads/list

Bugs fixed:

  • Issue 71: DBI prints to STDERR instead of just throwing an inactive statement error
  • Issue 73: innotop fails to start if perl is not installed under /usr/bin/
  • Issue 74: Use of uninitialized value $obj in regexp compilation at ./innotop-1.9.0/innotop line 6935.
  • Issue 82&83: Some pages broken with MySQL 5.6
  • Issue 70: .spec doesn’t build rpm if some Perl modules are missing
  • Issue 84: feature request: Deadlock clearing transactions set sql_log_bin=0


Jan
03
2013
--

Percona Toolkit by example – pt-stalk

pt-stalk recipes: Gather forensic data about MySQL when a server problem occurs

It happens to us all from time to time: a server issue arises that leaves you scratching your head. That’s when Percona Toolkit’s pt-stalk comes into play, helping you diagnose the problem by capturing diagnostic data that helps you pinpoint what’s causing the havoc hitting your database.

From the documentation (http://www.percona.com/doc/percona-toolkit/pt-stalk.html):

pt-stalk watches for a trigger condition to become true, and then collects data to help in diagnosing problems. It is designed to run as a daemon with root privileges, so that you can diagnose intermittent problems that you cannot observe directly. You can also use it to execute a custom command, or to gather the data on demand without waiting for the trigger to happen.

There are some common options that you can use in all examples, so I recommend reading the documentation if you have any specific questions.

Be prepared! It’s wise to have pt-stalk running 24/7, because problems such as MySQL lockups or spikes of activity typically leave no evidence to use in root cause analysis. By continuously running pt-stalk, you’ll have the data it gathers when the trouble occurs.

Let’s look at some specific “pt-stalk recipes.”

Just collect the information:

pt-stalk will collect the information and will exit after that.

$ pt-stalk --no-stalk  -- --defaults-file=./my.default.cnf

Every hour for one day:

Collect the information every hour (--sleep=3600), 24 times (--iterations=24), without waiting for any condition (--threshold=0), and run in the background (--daemonize).

$ pt-stalk --sleep=3600 --threshold=0 --iterations=24 \
  --daemonize -- --defaults-file=./my.default.cnf

A host has more than 10 connections:

Collect all the information when the server 10.0.0.23 (--match 10.0.0.23) has more than 10 (--threshold 10) connections opened. You can use any variable from the “show processlist” command; in this case, I’m using the “Host” variable.

$ pt-stalk --function processlist --variable Host\
  --match 10.0.0.23 --threshold 10 -- --defaults-file=./my.default.cnf

More than one variable:

In some cases you want to check more than one variable; in those cases, you will have to write a small script to do this.

The script:
The script must contain a shell function called “trg_plugin”, and that function must return a number; this number is the one that pt-stalk will match against the --threshold option.

$ cat  pt-stalk-function 
# This function will count the commands that come from localhost ($4 in this case) 
# and with execution time ($7) greater than 5000.
# 
# The $EXT_ARGV variable holds the connection options that you pass to pt-stalk.
#   
#  $2    $3         $4          $5   $6        $7     $8      $9
#+-----+----------+-----------+----+---------+------+-------+------------------+
#| Id  | User     | Host      | db | Command | Time | State | Info             |
#+-----+----------+-----------+----+---------+------+-------+------------------+
#

trg_plugin() {
	mysqladmin $EXT_ARGV processlist |  awk -F\| 'BEGIN{counter=0} 
                     { if ($4 ~ /localhost/ && $7 > 5000 ) counter++ ; } END {print counter}'
}

The pt-stalk command:
Collect all the information when the function called trg_plugin inside the script ./pt-stalk-function (--function ./pt-stalk-function) returns more than 100 (--threshold 100).

$ pt-stalk --function ./pt-stalk-function \
  --threshold 100  --  --defaults-file=./my.default.cnf

Custom collector with plugins:

Plugins are useful for collecting information that is not included in pt-stalk by default. For example, if you want to collect pid status information from /proc/[mysqlpid]/status you can use a plugin for this.

The plugin:
The script in this case contains a shell function called “before_collect”, and pt-stalk will run this function before collecting the information (you can also collect after, or before and after, etc.; please check the documentation for more information).

$ cat  pt-stalk-pidplugin
# This plugin will collect the /proc/[mysqlpid]/status from all mysqld pids running in 
# this box and it will store the information in a file in the default directory and 
# with the default prefix and adding the pid number and -status at the end. 
# For example: 2013_01_02_22_58_55-2082-status

before_collect()
{
        for __p in $(pidof mysqld); do 
            cat /proc/${__p}/status > ${OPT_DEST}/${prefix}-${__p}-status ; 
        done
}

The pt-stalk command:
Before collecting the information, pt-stalk will run the plugin ./pt-stalk-pidplugin (--plugin ./pt-stalk-pidplugin).

$ pt-stalk --plugin ./pt-stalk-pidplugin \
  --threshold 10  -- --defaults-file=./my.default.cnf

Have any comments or questions? Just let me know below!


Nov
22
2012
--

Get Me Some Query Logs!

One of my favorite tools in the Percona Toolkit is pt-query-digest.  This tool is indispensable for identifying your top SQL queries, and analyzing which queries are accounting for your database load.

But the report you get from pt-query-digest is only as good as the log of queries you give it as input.  You need a large enough sample of query logs, collected over a period of time when you have representative traffic on your database.

You also need the log to include all the queries, not just those that take more than N seconds.  The reason is that some queries are individually quick, and would not be logged if you set the long_query_time configuration variable to 1 or more seconds.  You want that threshold to be 0 seconds while you’re collecting logs.

However, activating such high-volume query log collection can be costly.  Every statement executed on your  database will cause file I/O, even when the query itself was served out of your buffer pool memory.  That’s a lot of overhead, so we need to be careful about how and when we collect logs, and for how long we leave that running.

I’ve put together a simple shell script to help automate this.  I have given it the functional but unimaginative name full-slow-log.

The script configures full logging, then sleeps for a number of seconds to allow queries to be collected in the logs.  After it finishes sleeping, or if you interrupt the script, the script restores log configuration back to the values they started with.

$ full-slow-log [ -v ] [ -s seconds ] [ -c config ]
  • -v is for verbose output.
  • -s seconds allows you to specify the number of seconds to sleep.  The default is 5 seconds, which is probably too short for most sites, but the value is chosen to be as low impact as possible if you forget to give another value.
  • -c config allows you to specify a MySQL config file other than $HOME/.my.cnf, so you can store host, user, and password.

Here’s an example of running it with verbose output:

$ full-slow-log -v
Discovering slow_query_log=1
Discovering slow_query_log_file=mysql-slow.log
Discovering long_query_time=60.000000
Setting long_query_time=0
Setting slow_query_log_file=mysql-slow.log-full-20121122112413
Setting slow_query_log=1
Flushing slow query log
Sleeping 5 seconds... done.
Restoring slow_query_log_file=mysql-slow.log
Restoring long_query_time=60.000000
Restoring slow_query_log=1
Flushing logs during restore

Notice that the script also redirects the slow query log to a new file, with a filename based on the timestamp.  This is so you have a distinct file that contains only the specific time range of logs you collected.

The restoration of settings is in a “trap” which is a shell scripting feature that serves as both an exit handler and signal handler.  So if you interrupt the script before it’s done, you have some assurance that it will do the right thing to restore settings anyway.
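For reference, the manual equivalent of what the script automates boils down to a handful of statements run from the mysql client. This is only a sketch of the idea (the file names and restored values simply mirror the verbose output above), not the script itself:

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0;
SET GLOBAL slow_query_log_file = 'mysql-slow.log-full-20121122112413';
FLUSH SLOW LOGS;

-- ... let queries accumulate for the sampling window, then restore ...

SET GLOBAL long_query_time = 60;
SET GLOBAL slow_query_log_file = 'mysql-slow.log';
FLUSH SLOW LOGS;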

My full-slow-log script is now available on a Github project (along with a few other experimental scripts I have written).  See https://github.com/billkarwin/bk-tools

I hope you find this script a useful complement to my upcoming talks at the Percona Live MySQL Conference in London, UK on December 3-4, 2012.

If you can make it to London in December, we’d love to see you there!  If not, look for future Percona Live conferences.

Tweet the link to this blog for a chance to win a free full conference pass. Make sure to use hashtag #perconalive! Winner will be chosen at the end of the day. 

Aug
29
2012
--

Here’s a quick way to Foresee if Replication Slave is ever going to catch up and When!

If you ever had a replication slave that is severely behind, you probably noticed that it’s not catching up with a busy master at a steady pace. Instead, the “Seconds behind master” value goes up and down, so you can’t really tell whether the replica is catching up or not by looking at just a few samples, unless these are spread well apart. And even then you can’t tell at a glance when it is going to catch up.

Normally, the “severely behind” thing should not happen, but it does often happen in our consulting practice:

  • sometimes replication would break and then it needs to catch up after it is fixed,
  • other times new replication slave is built from a backup which is normally hours behind,
  • or, it could be that the replication slave became too slow to catch up due to a missing index

Whatever the case is, the single question I am asked by the customer every time this happens is: “When is the replica going to catch up?”

I used to tell them “I don’t know, it depends…” and indeed it is not an easy question to answer. There are a few reasons catching up is so unstable:

  1. If you have restarted the server, or started a new one, caches are cold and there’s a lot of IO happening,
  2. Not all queries are created equal – some would run for seconds, while others can be instant,
  3. Batch jobs: some sites run nightly tasks like building statistics tables or table checksums – these are usually very intense and cause the slave to back up slightly.

I didn’t like my own answer to the question, so I decided to do something about it. And because I love awk, I did that something in awk:

delay=60
cmd="mysql -e 'show slave status\G' | grep Seconds_Behind_Master | awk '{print \$2}'"
while sleep $delay; do
  eval $cmd
done | awk -v delay=$delay '
{
   passed += delay;
   if (count%10==0)   
      printf("s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h\n");
   if (prev==NULL){
      prev = $1;
      start = $1;
   }
   speed = (prev-$1)/delay;
   o_speed = (start-($1-passed))/passed
   if (speed == 0)    speed_d = 1;
     else             speed_d = speed;
   eta = $1/speed_d;
   if (eta<0)         eta = -86400;
   o_eta = $1/o_speed;
   printf("%8d %8.6f %9.3f %7.3f | %9.3f %7.3f %7.2f\n",
      $1, $1/86400, speed, eta/86400, o_speed, o_eta/86400, o_eta/3600);
   prev=$1;
   count++;
}'

I don't know if this is ever going to become a part of Percona Toolkit; however, since it's pretty much a one-liner, I just keep it in my snippets pool for easy copy'n'paste.

Here's a piece of an output from a server that was almost 27 days behind just yesterday:

// at the beginning:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 2309941 26.735428     0.000  26.735 |     1.000  26.735  641.65
 2309764 26.733380     2.950   9.062 |     2.475  10.801  259.23
 2308946 26.723912    13.633   1.960 |     6.528   4.094   98.25
 2308962 26.724097    -0.267  -1.000 |     5.079   5.262  126.28
 2309022 26.724792    -1.000  -1.000 |     4.063   6.577  157.85
...
// after one hour:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 2264490 26.209375    39.033   0.671 |    13.418   1.953   46.88
 2262422 26.185440    34.467   0.760 |    13.774   1.901   45.63
 2261702 26.177106    12.000   2.181 |    13.762   1.902   45.65
...
// after three hours:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 2179124 25.221343    13.383   1.885 |    13.046   1.933   46.40
 2178937 25.219178     3.117   8.092 |    12.997   1.940   46.57
 2178472 25.213796     7.750   3.253 |    12.973   1.943   46.64
...
// after 12 hours:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 1824590 21.117940    20.233   1.044 |    12.219   1.728   41.48
 1823867 21.109572    12.050   1.752 |    12.221   1.727   41.46
 1823089 21.100567    12.967   1.627 |    12.223   1.726   41.43
...
// after 21 hours:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 1501659 17.380312    -0.533  -1.000 |    11.768   1.477   35.44
 1501664 17.380370    -0.083  -1.000 |    11.760   1.478   35.47
 1501689 17.380660    -0.417  -1.000 |    11.751   1.479   35.50
...

Of course, it is still not perfectly accurate and it does not account for any potential changes in queries, workload, warm-up, or the time it takes to run the mysql CLI, but it does give you an idea of the direction the replication slave is going. Note that negative values mean replication isn't catching up, but the values themselves are mostly meaningless.

Here's what the weird acronyms stand for:

  • s_behind - current Seconds_Behind_Master value
  • d_behind - number of days behind based on current s_behind
  • c_sec_s - how many seconds per second were caught up during last interval
  • eta_d - this is ETA based on last interval
  • O_c_sec_s - overall catch-up speed in seconds per second
  • O_eta_d - ETA based on overall catch-up speed (in days)
  • O_eta_h - same like previous but in hours

Let me know if you ever find this useful.

May
06
2011
--

Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to significantly improve the performance of queries by spreading the work over all available compute resources. In this test, Shard-Query averages a nearly 6x (max over 10x) improvement over the baseline, as shown in the following graph:

One significant advantage of Shard-Query over Hive is that it works with existing MySQL data sets and queries. Another advantage is that it works with all MySQL storage engines.

This set of benchmarks evaluates how well Infobright community edition (ICE) performs in combination with Shard-Query.

Data set

It was important to choose a data set that was large enough to create queries that would run for a decent amount of time, but not so large that it was difficult to work with. The ontime flight performance statistics data, available online from the United States Bureau of Transportation Statistics (BTS) made a good candidate for testing, as it had been tested before:
Another MPB post
Lucid DB testing

The raw data is a completely denormalized schema (single table). In order to demonstrate the power of Shard-Query it is important to test complex queries involving joins and aggregation. A star schema is the most common OLAP/DW data model, since it typically represents a data mart. See also: “Data mart or data warehouse?”. As it is the most common data model, it is desirable to benchmark using a star schema, even though it involves work to transform the data.

Star schema

Transforming the data was straightforward. I should note that I did this preprocessing with the MyISAM storage engine, then I dumped the data to tab delimited flat files using mysqldump. I started by loading the raw data from the BTS into a single database table called ontime_stage.

Then, the airport information was extracted:

create table dim_airport(
airport_id int auto_increment primary key,
unique key(airport_code)
)
as
select
  Origin as `airport_code`,
  OriginCityName as `CityName`,
  OriginState as `State`,
  OriginStateFips as `StateFips`,
  OriginStateName as `StateName` ,
  OriginWac as `Wac`
FROM ontime_stage
UNION
select
  Dest as `airport_code`,
  DestCityName as `CityName`,
  DestState as `State`,
  DestStateFips as `StateFips`,
  DestStateName as `StateName` ,
  DestWac as `Wac`
FROM ontime_stage;

After extracting flight/airline and date information in a similar fashion, a final table `ontime_fact` is created by joining the newly constructed dimension tables to the staging table, omitting the dimension columns from the projection and replacing them with the dimension keys:

select dim_date.date_id,
       origin.airport_id as origin_airport_id,
       dest.airport_id as dest_airport_id,
       dim_flight.flight_id,
       ontime_stage.TailNum, ...
from ontime_stage
join dim_date using(FlightDate)
join dim_airport origin on ontime_stage.Origin = origin.airport_code
join dim_airport dest on ontime_stage.Dest = dest.airport_code
join dim_flight using (UniqueCarrier,AirlineID,Carrier,FlightNum);

The data set contains ontime flight information for 22 years, which can be confirmed by examining the contents of the date dimension:

mysql> select count(*),
min(FlightDate),
max(FlightDate)
from dim_date\G
*************************** 1. row ***************************
       count(*): 8401
min(FlightDate): 1988-01-01
max(FlightDate): 2010-12-31
1 row in set (0.00 sec)

The airport dimension is a puppet dimension. It is called a puppet because it serves as both origin and destination dimensions, being referenced by origin_airport_id and dest_airport_id in the fact table, respectively. There are nearly 400 major airports included in the data set.

mysql> select count(*) from dim_airport;
+----------+
| count(*) |
+----------+
|      396 |
+----------+
1 row in set (0.00 sec)

The final dimension is the flight dimension, which contains the flight numbers and air carrier hierarchies. Only the largest air carriers must register and report ontime information with the FAA, so there are only 29 air carriers in the table:

mysql> select count(*),
count(distinct UniqueCarrier)
from dim_flight\G
*************************** 1. row ***************************
                     count(*): 58625
count(distinct UniqueCarrier): 29
1 row in set (0.02 sec)

In total, the fact table contains over 135 million flights:

mysql> select count(*) from ontime_one.ontime_fact;
+-----------+
| count(*)  |
+-----------+
| 135125787 |
+-----------+
1 row in set (0.00 sec)

This should be made fully clear by the following schema diagram:

Star schema (ontime_fact, dim_date, dim_flight, dim_airport)

Diagram of the ontime dimensional schema

Test environment

For this benchmark, a test environment consisting of a single commodity database server with 6 cores (+6ht) and 24GB of memory was selected. The selected operating system was Fedora 14. Oracle VirtualBox OSE was used to create six virtual machines, each running Fedora 14. Each of the virtual machines was granted 4GB of memory. A SATA 7200rpm RAID10 battery backed RAID array was used as the underlying storage for the virtual machines.

Baseline:
The MySQL command line client was used to execute a script file containing the 11 queries. This same SQL script was used in the Shard-Query tests. For the baseline, the results and response times were captured with the \T command. The queries were executed on a single database schema containing all of the data. Before loading, there was approximately 23GB of data. After loading, ICE compressed this data to about 2GB. The test virtual machine was assigned 12 cores in this test.

Scale-up:
Shard-Query was given the following configuration file, which lists only one server. A single schema (ontime_one) contained all of the data. The test virtual machine was assigned 12 cores for this test. The same VM was used as the previous baseline test. This VM was rebooted between tests. A SQL script was fed to the run_query.php script and the output was captured with the ‘tee’ command.

$ cat one.ini
[default]
user=mysql
db=ontime_one
password=
port=5029
column=date_id
mapper=hash
gearman=127.0.0.1:7000
inlist=*
between=*

[shard1]
host=192.168.56.102

Scale-out
In addition to adding parallelism via scale-up, Shard-Query can improve the performance of almost all queries by spreading them over more than one physical server. This is called “scaling out” and it allows Shard-Query to vastly improve the performance of queries which have to examine a large amount of data. Shard-Query includes a loader (loader.php) which can be used either to split data into multiple files (one per shard, for later loading) or to load files directly, in parallel, to multiple hosts.

Shard-Query will execute queries in parallel over all of these machines. With enough machines, massive parallelism is possible and very large data sets may be processed. As each machine examines only a small subset of the data, performance can be improved significantly:

$ cat shards.ini
[default]
user=mysql
db=ontime
password=
port=5029
column=date_id
mapper=hash
gearman=127.0.0.1:7000
inlist=*
between=*

[shard1]
host=192.168.56.101
db=ontime

[shard2]
host=192.168.56.102
db=ontime

[shard3]
host=192.168.56.103
db=ontime

[shard4]
host=192.168.56.105
db=ontime

[shard5]
host=192.168.56.106
db=ontime

[shard6]
host=192.168.56.104
db=ontime

In this configuration, each shard has about 335MB-350MB of data (23GB of raw data, compressed to about 2GB in total, then spread over six servers). Due to ICE limitations, the data was split before loading. The splitting/loading process will be described in another post.

Complex queries

Shard-Query was tested with the simple single-table version of this data set in a previous blog post. Previous testing was limited to a subset of Vadim’s test queries (see that post). As this new test schema is normalized, Vadim’s test queries must be modified to reflect the more complex schema structure. For this benchmark each of the original queries has been rewritten to conform to the new schema, and two new test queries have been added. To model real-world complexity, each of the test queries features at least one join, and many of the filter conditions are placed on attributes in the dimension tables. It will be very interesting to test these queries on various engines and databases.

Following is a list of the queries, followed by a table recording the actual response times for each query. The queries should be self-explanatory.

Performance, at a glance

You will notice that Shard-Query is faster in nearly every case. The performance of the queries is improved significantly by scaling out, even more so than scaling up, because additional parallelism is added beyond what can be exploited by scale-up. Scale-up can improve query performance when there are enough resources to support the added parallelism, and when one of the following is used in the query: BETWEEN or IN clauses, subqueries in the FROM clause, UNION or UNION ALL clauses. If none of those features are used, then parallelism can’t be added. Q9 is an example of such a query.

Query details:

-- Q1
 SELECT DayOfWeek, count(*) AS c
   from ontime_fact
   JOIN dim_date using (date_id)
  WHERE Year
BETWEEN 2000 AND 2008
  GROUP BY DayOfWeek
  ORDER BY c DESC;

-- Q2
SELECT DayOfWeek, count(*) AS c
  from ontime_fact
  JOIN dim_date using (date_id)
 WHERE DepDelay>10 AND Year BETWEEN 2000 AND 2008
 GROUP BY DayOfWeek
 ORDER BY c DESC;

-- Q3
SELECT CityName as Origin, count(*) AS c
  from ontime_fact
  JOIN dim_date using (date_id)
  JOIN dim_airport origin
    ON origin_airport_id = origin.airport_id
 WHERE DepDelay>10
   AND Year BETWEEN 2000 AND 2008
 GROUP BY 1
 ORDER BY c
 LIMIT 10;


The next queries show how performance is improved when Shard-Query adds parallelism when “subqueries in the from clause” are used. There are benefits with both “scale-up” and “scale-out”, but once again, the “scale-out” results are the most striking.

-- Q4
SELECT Carrier, count(*) as c
  from ontime_fact
  JOIN dim_date using (date_id)
  join dim_flight using(flight_id)
 WHERE DepDelay>10
   AND Year=2007
 GROUP BY Carrier
 ORDER BY c DESC;

-- Q5
SELECT t.Carrier, c, c2, c*1000/c2 as c3
FROM
     (SELECT Carrier, count(*) AS c
        from ontime_fact
        join dim_date using(date_id)
        join dim_flight using(flight_id)
       WHERE DepDelay>10
         AND Year=2007
       GROUP BY Carrier) t
JOIN (SELECT Carrier, count(*) AS c2
        from ontime_fact
        join dim_date using(date_id)
        join dim_flight using(flight_id)
       WHERE Year=2007
       GROUP BY Carrier) t2
  ON (t.Carrier=t2.Carrier)
ORDER BY c3 DESC;

-- Q6
SELECT t.Year, c1 / c2 as ratio
FROM
     (select Year, count(*)*1000 as c1
        from ontime_fact
        join dim_date using (date_id)
       WHERE DepDelay>10
       GROUP BY Year) t
JOIN (select Year, count(*) as c2
        from ontime_fact
        join dim_date using (date_id)
       WHERE DepDelay>10
       GROUP BY Year) t2
  ON (t.Year=t2.Year);

-- Q7
SELECT t.Year, c1 / c2 as ratio
  FROM (select Year, count(Year)*1000 as c1
          from ontime_fact
          join dim_date using (date_id)
         WHERE DepDelay>10
         GROUP BY Year) t
  JOIN (select Year, count(*) as c2
          from ontime_fact
          join dim_date using (date_id)
         GROUP BY Year) t2
    ON (t.Year=t2.Year);


The performance of the following queries depends on the size of the date range:

-- Q8.0
SELECT dest.CityName, COUNT( DISTINCT origin.CityName)
  from ontime_fact
  JOIN dim_airport dest on ( dest_airport_id = dest.airport_id)
  JOIN dim_airport origin on ( origin_airport_id = origin.airport_id)
  JOIN dim_date using (date_id)
 WHERE Year BETWEEN 2001 and 2001
 GROUP BY dest.CityName
 ORDER BY 2 DESC;

-- Q8.1
SELECT dest.CityName, COUNT( DISTINCT origin.CityName)
  from ontime_fact
  JOIN dim_airport dest on ( dest_airport_id = dest.airport_id)
  JOIN dim_airport origin on ( origin_airport_id = origin.airport_id)
  JOIN dim_date using (date_id)
 WHERE Year BETWEEN 2001 and 2005
 GROUP BY dest.CityName
 ORDER BY 2 DESC;

-- Q8.2
SELECT dest.CityName, COUNT( DISTINCT origin.CityName)
  from ontime_fact
  JOIN dim_airport dest on ( dest_airport_id = dest.airport_id)
  JOIN dim_airport origin on ( origin_airport_id = origin.airport_id)
  JOIN dim_date using (date_id)
 WHERE Year BETWEEN 2001 and 2011
 GROUP BY dest.CityName
 ORDER BY 2 DESC;

-- Q8.3
SELECT dest.CityName, COUNT( DISTINCT origin.CityName)
  from ontime_fact
  JOIN dim_airport dest on ( dest_airport_id = dest.airport_id)
  JOIN dim_airport origin on ( origin_airport_id = origin.airport_id)
  JOIN dim_date using (date_id)
 WHERE Year BETWEEN 1990 and 2011
 GROUP BY dest.CityName
 ORDER BY 2 DESC;

-- Q8.4
SELECT dest.CityName, COUNT( DISTINCT origin.CityName)
  from ontime_fact
  JOIN dim_airport dest on ( dest_airport_id = dest.airport_id)
  JOIN dim_airport origin on ( origin_airport_id = origin.airport_id)
  JOIN dim_date using (date_id)
 WHERE Year BETWEEN 1980 and 2011
 GROUP BY dest.CityName
 ORDER BY 2 DESC;


Finally, Shard-Query performance continues to improve when grouping and filtering is used. Again, notice Q9. It doesn’t use any features which Shard-Query can use to add parallelism. Thus, in the scale up configuration it does not perform any better than the baseline, and actually performed just a little worse. Since scale out splits the data between servers, it performs about 6x better as the degree of parallelism is controlled by the number of shards.

-- Q9
select Year ,count(Year) as c1
  from ontime_fact
  JOIN dim_date using (date_id)
 group by Year;

-- Q10
SELECT Carrier, dest.CityName, COUNT( DISTINCT origin.CityName)
  from ontime_fact
  JOIN dim_airport dest on ( dest_airport_id = dest.airport_id)
  JOIN dim_airport origin on ( origin_airport_id = origin.airport_id)
  JOIN dim_date using (date_id)
  JOIN dim_flight using (flight_id)
 WHERE Year BETWEEN 2009 and 2011
 GROUP BY Carrier,dest.CityName
 ORDER BY 3 DESC;

-- Q11
SELECT Year, Carrier, dest.CityName, COUNT( DISTINCT origin.CityName)
  from ontime_fact
  JOIN dim_airport dest on ( dest_airport_id = dest.airport_id)
  JOIN dim_airport origin on ( origin_airport_id = origin.airport_id)
  JOIN dim_date using (date_id)
  JOIN dim_flight using (flight_id)
 WHERE Year BETWEEN 2000 and 2003
   AND Carrier = 'AA'
 GROUP BY Year, Carrier,dest.CityName
 ORDER BY 4 DESC;


Conclusion

The divide and conquer approach is very useful when working with large quantities of data. Shard-Query can be used with existing data sets easily, improving the performance of queries significantly if they use common query features like BETWEEN or IN. It is also possible to spread your data over multiple machines, scaling out to improve query response times significantly.

These queries are a great test of Shard-Query features. It is currently approaching RC status. If you decide to test it and encounter issues, please file a bug on the bug tracker. You can get Shard-Query (currently in development release form as a checkout from SVN) here: Shard-Query Google code project

Full disclosure

Justin Swanhart, the author of this article, is also the creator and maintainer of Shard-Query. The author has previously worked in cooperation with Infobright, including on benchmarking. These particular tests were performed independently of Infobright, without their knowledge or approval. Infobright was, however, given the chance to review this document before publication, as a courtesy. All findings are represented truthfully, transparently, and without any intended bias.

Apr
17
2011
--

Aspersa tools bit.ly download shortcuts

I use the Aspersa tools a lot and I find myself going to the website just to download one of the tools all the time. I love that I can download Maatkit with a simple wget maatkit.org/get/tool command, so I made bit.ly shortcuts for all of the current Aspersa tools. Here’s the full list, with my favorite on the top and least favorite (but nonetheless very useful) on the bottom:

So now it’s as simple as wget bit.ly/aspersa-tool. I hope you will find these useful too.

Aurimas

Update: I made an error with the align tool, which now links to the root directory of the trunk. I don’t think this is a big deal as I use align least often, but what I did to fix it is create even shorter shortcuts for all the tools, so you can also use wget bit.ly/a-tool
