Mar
31
2011
--

InnoDB Flushing: a lot of memory and slow disk

You may have seen in the last couple of weekly news posts that Baron mentioned we are working on a new adaptive flushing algorithm in InnoDB. In fact, we already have three such algorithms in Percona Server (reflex, estimate, keep_average). Why do we need one more? Okay, first let me start by showing the current problems, and then we will go to solutions.

The basic problem is that, unfortunately, none of the existing flushing implementations (neither MySQL's native adaptive flushing nor those in Percona Server) handles this case properly. Our latest invention, “keep_average”, does a very good job on systems based on SSD/Flash storage, but it is not so good for regular slow hard drives.

Let me state the following: If you have a lot of memory (and this is not rare nowadays, for example Cisco UCS C250), your database fits into memory, and your write rate is significant, then the slow disk is, by definition, not able to keep up with all the changes you do in memory.

Here is a graph for the tpcc-mysql benchmark (100W, roughly 10GB of data; 12G innodb_buffer_pool_size; 1G innodb_log_file_size) on MySQL 5.5.10 with innodb_adaptive_flushing=ON (the default). The problem here is that the system has quite slow hard drives (2 hard disks in software RAID0).

As you can see, after the initial warmup, throughput drops constantly, all the way to zero, and may stay in that state for a long time (minutes). This comes from the fact that MySQL performs changes in memory at a faster rate than it can write to disks. Eventually it gets into an “async” state, where InnoDB tries to flush as many pages as possible in order to keep the checkpoint age fitting into the InnoDB transaction logs, and this makes the situation even worse (these are the periods of zero throughput on the graph).

MySQL’s general recommendation for this case is to increase innodb_io_capacity and decrease innodb_max_dirty_pages_pct so as to have fewer dirty pages. But I call that tuning by shooting yourself in the left foot. (You can’t run fast with a broken foot, right?) And in practice it does not work, as MySQL is not able to keep the number of dirty pages within the given innodb_max_dirty_pages_pct limit.

Another possible solution would be to increase innodb_log_file_size, but that: 1) only delays the problem; 2) increases recovery time (an important factor with slow disks); and 3) is limited, since MySQL does not support innodb_log_file_size > 4GB (it is supported in Percona Server).
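
For reference, here is a minimal my.cnf sketch of that conventional tuning (the values are purely illustrative, not recommendations, and as explained above this approach does not really solve the problem):

[mysqld]
# Ask background flushing to do more work per second:
innodb_io_capacity          = 400
# Try to keep fewer dirty pages in the buffer pool:
innodb_max_dirty_pages_pct  = 50
# Bigger logs only postpone the problem, and MySQL does not support log files over 4GB:
innodb_log_file_size        = 1G
innodb_adaptive_flushing    = ON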

To make things more interesting, let’s touch on the topic of flushing neighbor pages. While researching this, we found it works in a way we really did not expect. My understanding was that flushing neighbor pages was implemented to perform sequential writes where possible and avoid random writes, which is quite critical for hard drives. What we found is that InnoDB does it in the following way. Say we want to flush page P. InnoDB looks at an area of 128 pages around page P and flushes every page in that area that is dirty. To illustrate, say we have an area of memory like this: ...D...D...D....P....D....D...D....D where each dot is a page that does not need flushing, each “D” is a dirty page that InnoDB will flush, and P is our page.

So, as a result of how it works, instead of performing 1 random write, InnoDB will perform 8 random writes. I do not have a good explanation for why it is implemented this way.

To make the situation even worse, the flushed neighbor pages are counted toward the number of pages we asked to be flushed. Say, for example, we see that to make an improvement in the checkpoint_age we need to flush 8 pages. InnoDB flushes page P and its 7 “neighbors” (which are not really neighbors), and then stops (alas, 8 pages flushed). That is, instead of the 8 pages we would expect to be flushed, InnoDB flushes 1 needed page and 7 random pages, which may not even be relevant for improving the checkpoint age.

This makes calculating how many pages we need to flush extremely hard (read “impossible”), and this is one of the reasons why innodb_adaptive_flushing is not able to keep up.

What if we disable flushing of neighbor pages (we have the option innodb_flush_neighbor_pages in Percona Server)? Well, it may help to some extent, but remember that hard disks love sequential operations, and your throughput in this case may turn out significantly worse.

So what is the solution? This was exactly the topic of our research. We set these goals:

  • Provide stable throughput by avoiding big jumps in flushing pages.
  • Make the algorithm independent of innodb_io_capacity and innodb_max_dirty_pages_pct. (This is important for systems where I/O capacity may vary a lot, like EC2 systems, or is affected by periodic tasks like backup or heavy background queries.)
  • If we see that flushing to disk is not able to keep up with changes in memory, we need to have a throttling mechanism that will limit the rate of changes in memory.
  • Optimize the flushing of sequential neighbor pages in a way that makes sense.

We have made good progress, and I will keep you posted on our ideas and results.

Mar
28
2011
--

MySQL on Amazon RDS part 2: Determining Peak Throughput

This is a continuation of my series of benchmark posts comparing Amazon RDS to a server running on Amazon EC2. Upcoming posts (probably 6 or 8 in total) will extend the scope of the benchmark to include data on our Dell r900 with traditional hard drives in RAID10, and a server in the Joyent cloud. As a reminder, my goal was to run a long-term benchmark and see how the instance performed over time. Can it sustain performance over a several-day period of intense workload? The first step was to determine the number of threads that should be used for the benchmark.

To gauge this, I ran a series of 60-second benchmarks on the RDS server, extracted the transactions per second from each, and used the peak throughput as my target configuration. The benchmark was sysbench oltp complex, with 400 million rows (88GB of data and indexes, which is larger than memory in all of the servers I benchmarked). Here are the results:

#Threads    #TPS
       8  319.47
      16  496.63
      32  885.46
      64 1254.46
     128  348.32
     256  194.67
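
For context, each of these short runs was a standard sysbench OLTP invocation, roughly along these lines (a sketch only: the connection parameters are placeholders and the exact options used may have differed):

# Prepare the 400-million-row table once (this takes a long time):
sysbench --test=oltp --oltp-table-size=400000000 \
  --mysql-host=RDS_ENDPOINT --mysql-user=USER --mysql-password=PASS prepare

# One 60-second run at a given concurrency (here, 64 threads):
sysbench --test=oltp --oltp-test-mode=complex --oltp-table-size=400000000 \
  --num-threads=64 --max-time=60 --max-requests=0 \
  --mysql-host=RDS_ENDPOINT --mysql-user=USER --mysql-password=PASS run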

With that as a rough guide, I ran another series of 60-second benchmarks, with the following thread counts:

#Threads    #TPS
       1   44.26
       2   78.66
       4  183.47
       8  316.79
      16  527.50
      24  576.44
      32  671.08
      40  796.30
      48 1128.46
      56 1278.88
      64 1333.63
      72 1312.06
      80 1203.37
      88  900.18
      96  733.57
     104  655.50
     112  567.87

The following is a chart of both benchmark runs.

Now, you may ask the question, is a duration of 60 seconds a good way to decide on a number of threads for a benchmark that runs for a long time? I answered that in my previous post on choosing a good benchmark length. The answer is no, 60 seconds is not enough to decide. However, while testing a variety of setups, I actually ran some other thread counts for durations of a few hours at a time. For example, I benchmarked some machines at up to 256 threads for a few hours.

I did not prepare publishable reports on all of the benchmarks I ran. While running the benchmarks, I gathered many gigabytes of performance and diagnostic metrics, and getting all of that ready to publish on this blog takes a lot of care. Instead, during the benchmarks I performed live bottleneck analysis, observed data such as snapshots of the processlist and status counters, and created one-off graphs of various metrics to determine how the machines and configurations were behaving. Based on my observations (not my formal analysis), I thought that 64 threads was a good balance between over-working some machines and under-working others. We have to find something in the middle so that we can benchmark all systems at the same number of threads, or the benchmarks will not be apples-to-apples.

It is important to note that these are the results I achieved on this RDS server, from this EC2 “driver” machine, at this specific point in time. There will be variations between machines and from time to time, so your benchmark mileage may vary.

Based on these results, I decided to benchmark the system at the RDS server’s peak short-term throughput of 64 threads and see how things went in longer benchmarks. The next blog post in this series will discuss the results.

Mar
28
2011
--

What’s up with HandlerSocket?

I’ve recently presented about HandlerSocket at two different venues, and the number one question that always arises is:

Why hasn’t HandlerSocket become more popular than it is?

Considering how fast and awesome HandlerSocket is, its adoption has not been as rapid as some might expect. I theorize that there are five reasons for this:

Bugs, Bugs, Bugs

Up until the beginning of the year, HandlerSocket had a couple of bugs that a lot of people considered deal-breakers, and it’s not widely known that these issues have been fixed.

Distribution

Lots of organizations simply use what is provided to them. This was the case with MyISAM in standard distributions of MySQL for the longest time; many people were on MyISAM simply because it was the default. Even today, many organizations using recent versions of MySQL 5.1 are not using the InnoDB plugin because it is not enabled by default! If people cannot be bothered to add a couple of lines to my.cnf to enable the InnoDB plugin, they certainly cannot be expected to build the HandlerSocket plugin from source.

This is changing with Percona Server: the HandlerSocket plugin is now included by default, and it takes only a couple of steps to set it up. Hopefully we’ll see HandlerSocket included in other places in the MySQL community as well.
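
For the curious, the setup is roughly as follows (check the Percona Server documentation for the exact steps for your version; the port and thread values shown are just the commonly used defaults). In my.cnf, the loose_ prefix lets mysqld start even before the plugin is installed:

[mysqld]
loose_handlersocket_port    = 9998   # listener for reads
loose_handlersocket_port_wr = 9999   # listener for writes
loose_handlersocket_threads = 16     # reader threads

Then install the plugin from the mysql client:

mysql> INSTALL PLUGIN handlersocket SONAME 'handlersocket.so';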

No releases

There are no real releases or release schedules that are easy to digest. There is active development, bugs are being fixed, and the documentation is improving, but that does not translate into people knowing that their bugs are fixed and there is new code available.

No popular use cases

This is sort of a chicken-and-egg problem where people need to see use cases or endorsements of well-known companies in order to adopt a technology. Generally speaking, engineers are more inclined to remember a use case for a particular technology when it is well documented.

Not enough benchmarks

Finally, the benchmarks are not thorough enough. Nobody has yet taken the time to show diverse workloads on a variety of data sets. The existing work has focused almost entirely on primary-key-select workloads because that is where HandlerSocket truly shines; but people still want to know how the rest of the functionality performs. Also, there is an inherent distrust of benchmarks when done only by parties who have an interest in the product; there needs to be a wide variety of benchmarks from an even wider variety of industries and companies.

Summary

Most of these “issues” will work themselves out naturally as HandlerSocket matures. More people will publish benchmarks and use cases. As companies upgrade their Percona Server installations, HandlerSocket will become available and easily accessible to them. Finally, as the code matures and the community grows, we’ll see a lot more third-party involvement with bug reports, feature requests, and patches. I see lots of interest in HandlerSocket and am tremendously optimistic about its future.

Mar
28
2011
--

Percona Server and XtraBackup available on Solaris

I am happy to announce that Percona Server and Percona XtraBackup are now available for the Solaris platform!

  • You can download the latest Percona Server 5.1.55-12.6 stable release from our download area.
  • As for Percona XtraBackup, we made Solaris binaries for the upcoming XtraBackup-1.6 release, and they are available from the “pre-release” directory.

There are a couple of features not available in the Solaris build of Percona Server: Response Time Distribution (it depends on GCC platform-specific functions), HandlerSocket (compilation issues on Solaris), and the Maatkit UDFs.

I want to say a special thanks to Joyent (a cloud computing provider) for Solaris resources and help with building binaries. Joyent provides templates for Percona Server and Percona XtraBackup by request.

Mar
28
2011
--

Promote Percona Server and XtraBackup with Web Badges

We have web badges for promoting Percona Server and XtraBackup! If you use these products and want to put little buttons in your website’s footer or sidebar, it is now easy: there is copy-paste-ready HTML for three styles of badge for each of them: Percona Server badges and Percona XtraBackup badges.

Mar
28
2011
--

Maatkit’s mk-query-digest filters

Have you ever seen BIG weird numbers in an mk-query-digest report that just seem wrong? I have! Here’s one report I got today:

...
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time          5088s     1us    171s     2ms   467us   104ms    28us
# Lock time            76s       0      3s    26us    69us     3ms       0
# Rows sent          9.80M       0   1.05M    3.50    0.99  642.32       0
# Rows examine       5.59G       0  82.56M   2.00k    0.99  97.41k       0
# Rows affecte     457.30k       0   2.62k    0.16    0.99    1.68       0
# Rows read          2.16G       0  82.56M  788.53   21.45  82.91k    0.99
# Bytes sent         2.14T       0   4.00G 781.27k   3.52k  47.84M   84.10
# Merge passes     273.47G       0   4.00G  97.69k       0  10.35M       0
# Tmp tables       225.85G       0   4.00G  80.67k       0   7.89M       0
# Tmp disk tbl     381.88G       0   4.00G 136.41k       0  14.08M       0
# Tmp tbl size     255.54G       0   2.39G  91.28k       0   8.59M       0
# Query size       418.38M       6 257.39k  149.45  563.87   1.49k   42.48
# InnoDB:
# IO r bytes       272.92G       0   1.58G 479.74k       0  18.19M       0
# IO r ops         328.05G       0   2.00G 576.65k       0  23.08M       0
# IO r wait        237964272912s       0 3377771733s 398921s       0 22888946s       0
# pages distin     156.50G       0   2.39G 275.11k   27.38  14.48M    4.96
# queue wait       143150489533s       0 3377769328s 239976s       0 16014027s       0
# rec lock wai     216352062699s       0 4085510331s 362690s       0 27625029s       0
...

That can’t be right! It turns out there are a couple of bugs in the slowlog patch. They aren’t really critical, since the numbers are only wrong for administrator commands, yet when aggregated they mess up the whole mk-query-digest report.

There’s a simple solution to that, though: the mk-query-digest --filter option. With just a small change, it will completely ignore administrator commands:

mk-query-digest --filter='$event->{arg} !~ m/administrator command/' in > out
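
The --filter option accepts an arbitrary Perl expression over the event attributes, and an event is reported only when the expression is true. As another illustration (assuming the standard $event attributes that mk-query-digest documents), you could limit the report to SELECT statements like this:

mk-query-digest --filter='$event->{fingerprint} =~ m/^select/' in > out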

If these bugs are really bugging you, feel free to sponsor the fix. Otherwise enjoy the power of the almighty mk-query-digest!

Mar
27
2011
--

Percona Server and XtraBackup weekly news, March 28th

It’s time for the weekly roundup of news for Percona Server and XtraBackup. To follow up on my note from last week, we’re still checking and validating the download stats. It appears that Percona Server has been downloaded well over 100,000 times, and Percona XtraBackup has been downloaded over 200,000 times. But when we changed the script to separate out the downloads for our 5.5 release series, the overall number came back slightly decreased, which makes no sense. We haven’t had the chance to vet the analysis script yet, but we should be able to do that shortly.

For Percona Server,

  • We worked on Solaris builds for the server.
  • Peter proposed a feature for the server to enable more efficient incremental backups with XtraBackup.
  • Peter, Vadim, and Nickolay continued investigating alternative methods of implementing flush in Percona Server. (I am involved, but not as deeply.) The task is complex and there are a great many scenarios to consider. For example, there is fast storage and slow storage, there is the case when the data is larger than the buffer pool and when the data is smaller than the buffer pool (which is actually worse in some ways), there is flush list flushing and LRU flushing, and on and on. We have been benchmarking various types of scenarios for modified neighbor flushing, for example, to investigate how the oldest LSN is advanced in different cases. We still have nothing conclusive, but we have learned some things that gave us ideas about how we could make the flushing algorithm self-tuning in more cases. We will keep you posted.

In XtraBackup news,

  • Stewart Smith of the Drizzle project contributed a number of improvements to XtraBackup so that it can be a first-class backup tool for Drizzle. Thanks, Stewart!
  • Continuing with the theme of community contributions, we are also delighted that Lachlan Mulcahy is working on software to manage backups with XtraBackup. It is called XtraBackup Manager and is hosted on Google Code.
  • We finished preparing XtraBackup to be used with Percona Server 5.5.
  • We worked on Solaris builds for XtraBackup.
  • We started from scratch and wrote documentation for innobackupex, removing mention of obsolete features and documenting command-line options that were not previously mentioned in the --help output. The docs will now be available with --help, via perldoc, as a man page, and on our documentation wiki.

In an unrelated note, we have a new home for the documentation of our InnoDB data recovery tools. The tools moved from Google Code to Launchpad a couple of years ago, but the documentation never got updated, and became stale and inaccurate. It is now hosted on our documentation wiki along with all of our other software documentation.

Mar
25
2011
--

Using Flexviews – part two, change data capture

In my previous post I introduced materialized view concepts. This post begins with an introduction to change data capture technology and describes some of the ways in which it can be leveraged for your benefit. This is followed by a description of FlexCDC, the change data capture tool included with Flexviews. It continues with an overview of how to install and run FlexCDC, and concludes with a demonstration of the utility.

As a reminder, the first post covered the following topics:

  1. What is a materialized view (MV)?
  2. An MV can pre-compute joins and can aggregate and summarize data.
  3. Using the aggregated data can significantly improve query response times compared to accessing the non-aggregated data.
  4. Keeping MVs up to date (refreshing them) is usually expensive.
  5. A change data capture tool can be used to refresh them efficiently.

What is Change Data Capture (CDC)?

As the name implies, CDC software captures the changes made to database rows and makes those changes available in a convenient form which can be accessed by other programs. CDC applications exist for many commercial databases but until recently this type of software was not available for MySQL.

Change Data Capture can be used to:

  1. Monitor a database table, or tables for changes.
  2. Improve ETL performance by identifying the data which has changed.
  3. Maintain materialized views with Flexviews (the primary purpose of FlexCDC).
  4. Feed search engines like Sphinx or Solr only the rows that change.
  5. Feed third party replication systems.
  6. Provide data to “external triggers” such as Gearman jobs.

CDC tools usually operate in one of the following ways:

  1. Timestamps (usually more than one) to identify rows that have changed
  2. Triggers to capture changes synchronously
  3. Database log reading to capture the changes asynchronously

The first method has serious drawbacks: it can’t identify deleted rows, and MySQL timestamps may not be flexible enough.

The trigger method has a lot of problems. Triggers add significant overhead. When the structure of a table is changed, the triggers must be changed too. The work in the trigger is immediate and affects every transaction. Finally, MySQL has limited trigger support, some of which is the cause of the aforementioned problems. The biggest problem, at least from the standpoint of how Flexviews works, is that triggers cannot, under normal conditions, detect the commit order of transactions. This above all makes triggers an unacceptable CDC method.

This leaves the third method, log-based capture, as the best option: it imposes less overhead than triggers, and the change data capture can be done remotely and asynchronously.

Binary log based CDC

The CDC tool included with Flexviews is called FlexCDC. It seemed like an appropriate name. The Binary Log is the MySQL log which records changes to tables in the database. FlexCDC reads the binary log to determine what rows have changed. Because of this, FlexCDC requires that you use row-based binary logs (RBR). If you don’t have MySQL 5.1 or aren’t using RBR, then it is possible to set up a dedicated MySQL slave which has log_slave_updates=1 and binlog_format=row to process SBR changes from a MySQL master. I’ll talk more about that in another blog post.

FlexCDC does not implement a full binary log parser. Instead, it invokes the ‘mysqlbinlog’ utility and processes that program’s predictable output. mysqlbinlog can always read the binary logs of the MySQL version it ships with (and of previous versions), so there is no worry about binary log format changes. FlexCDC is written in PHP, so it is portable.
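
If you are curious about the data FlexCDC consumes, you can run mysqlbinlog by hand against one of your binary logs; with row-based logging, the -v option decodes the row events into commented pseudo-SQL. (This is just to illustrate the data source, not necessarily the exact invocation FlexCDC uses.)

$ mysqlbinlog -v --base64-output=DECODE-ROWS binary_log.000001 | less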

Setting up FlexCDC

FlexCDC has some basic requirements (a minimal my.cnf sketch follows the list):

  • MySQL 5.1+
  • row-based logging (binlog_format=ROW)
  • unique server_id in the my.cnf
  • log_slave_updates=1, if this is a MySQL slave
  • transaction_isolation=READ-COMMITTED
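
Putting those together, the relevant part of my.cnf on the server whose changes you want to capture might look something like this (a minimal sketch; the server_id is arbitrary as long as it is unique, and the value below simply matches the examples later in this post):

[mysqld]
server_id             = 999
log_bin               = binary_log      # binary logging must be enabled
binlog_format         = ROW             # row-based logging
log_slave_updates     = 1               # only needed if this server is a slave
transaction_isolation = READ-COMMITTED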

You can get FlexCDC directly out of the Flexviews SVN. I suggest that you just grab all of Flexviews:

$ svn checkout http://flexviews.googlecode.com/svn/trunk/ flexviews

Next you have to customize the example ini file. FlexCDC is located in the flexviews/consumer/ subdirectory.

Create the settings file:

Change to the flexviews/consumer directory, copy the consumer.ini.example file to consumer.ini, and edit it, making appropriate changes. The file is well commented. The example settings file should work for most MySQL installations which allow connections as root with no password from localhost. It is possible to read from and/or write to remote servers, but this example uses the local machine, which is the usual configuration for Flexviews, since it requires local access to the tables and the changelogs in order to maintain materialized views. Most database servers have some spare CPU for binary log parsing.
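
That is, something like:

$ cd flexviews/consumer
$ cp consumer.ini.example consumer.ini
$ vi consumer.ini   # or your editor of choice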

Run the setup script:

This will create the metadata tables and capture the initial binary log position.

$ php ./setup_flexcdc.php --ini consumer.ini
setup starting
setup completed

If the setup detects any problems (such as binary logging not being enabled), it will exit with an error; otherwise it will exit with the message “setup completed”.

Verify installation

FlexCDC stores its binary log progress in a metadata table:

$ mysql -e 'select * from flexviews.binlog_consumer_status\G' -uroot
*************************** 1. row ***************************
server_id: 999
master_log_file: binary_log.000001
master_log_size: 214652
exec_master_log_pos: 214652

If you select from that table you won’t see anything changing, even if you are writing into your database. This isn’t anything to worry about, since the background process isn’t running yet.

Starting the background process:

FlexCDC includes a consumer_safe.sh script that will start up a copy of FlexCDC and restart it if it exits with error. You can shut down FlexCDC by sending it a HUP signal. The script will drop a .pid file so you know what process to HUP.

$ ./consumer_safe.sh --ini=consumer.ini &
[1] 959
$ ps
PID TTY          TIME CMD
959 pts/1    00:00:00 consumer_safe.sh
960 pts/1    00:00:00 php
967 pts/1    00:00:00 ps
6248 pts/1    00:00:00 bash
$ cat flexcdc.pid
960

Adding a changelog to a table

FlexCDC copies the contents of database rows which change into special tables called changelogs. Each changelog is located in the flexviews database and is named $SCHEMA_$TABLE where $SCHEMA is the schema in which the source table is located and $TABLE the name of the source table. If that is confusing it should be clear in a moment.

Let’s create a table, insert a row, add a changelog, insert some more rows, and then delete a row:

mysql> create table
> test.demo (
> c1 int auto_increment primary key,
> c2 int
> )
> engine=innodb;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test.demo values (1,1);
Query OK, 1 row affected (0.00 sec)

Even though FlexCDC is running in the background, it didn’t capture any changes from that insert. We need to add the table to the list of tables to changelog. There is a utility included with FlexCDC called ‘add_table.php’. This script automates the process of adding a table to the list of tables to changelog. It does this by adding an entry to the `flexviews`.`mvlogs` metadata table, and it creates the changelog table itself.

$ php add_table.php --schema=test --table=demo
success

Note that you can enable auto_changelog=true in the config file to automatically record changes for any table, starting from the first time a change is seen for that table. This is generally only useful if you have a small number of tables, and you want to track changes on all of them.

You may have also noted that I did not include --ini=consumer.ini. That is because consumer.ini is the default config filename to search for. I included it in the earlier examples for illustration purposes.

Examine the changes

Now that the changelog has been added, any changes to `test`.`demo` will automatically be captured.

Insert data in one transaction (two rows):

mysql> insert into test.demo values (NULL,2),(NULL,3);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

And delete data in a second transaction:

mysql> delete from test.demo where c1=1;
Query OK, 1 row affected (0.00 sec)

The changelog is flexviews.test_demo. This is because the source table is `test`.`demo`.

mysql> select * from flexviews.test_demo\G
*************************** 1. row ***************************
dml_type: 1
uow_id: 10
fv$server_id: 999
c1: 2
c2: 2
*************************** 2. row ***************************
dml_type: 1
uow_id: 10
fv$server_id: 999
c1: 3
c2: 3
*************************** 3. row ***************************
dml_type: -1
uow_id: 11
fv$server_id: 999
c1: 1
c2: 1
3 rows in set (0.00 sec)

As you can see, there are three rows in the changelog, each representing one of the changes we made.

You will notice that the source table only has two columns, but the changelog contains five. All change logs contain three additional metadata columns: dml_type, uow_id and fv$server_id. These columns represent the type of change (insert is 1, delete -1), the transaction order and the source of the changes, respectively.

Finally, note that the two insertions happened inside the same transaction, and that the insertions happened before the deletion. Though none are shown here, updates are represented by a deletion followed by an insertion.

Mar
23
2011
--

Using Flexviews – part one, introduction to materialized views

If you know me, then you probably have heard of Flexviews. If not, then it might not be familiar to you. I’m giving a talk on it at the MySQL 2011 CE, and I figured I should blog about it before then. For those unfamiliar, Flexviews enables you to create and maintain incrementally refreshable materialized views.

You might be asking yourself “what is an incrementally refreshable materialized view?”. If so, then keep reading. This is the first in a multi-part series describing Flexviews.

edit:
You can find part 2 of the series here: http://www.mysqlperformanceblog.com/2011/03/25/using-flexviews-part-two-change-data-capture/


The output of any SQL query is logically an in-memory table
The output of a SQL query, what we normally think of as a result set, is really a virtual table. It has columns and rows, just like a database table, but it is temporary in nature, usually existing only in memory for a short time. This concept comes from relational algebra, upon which SQL is built. All SQL can be broken down into relational algebra, which is convenient because it means there are lots of transformations that can be applied without changing the meaning or the output. The output of every relational algebraic expression is a table, so conceptually the output of a SQL statement is a table too.

MySQL even includes a SQL statement that makes this perfectly clear: CREATE TABLE .. AS SELECT (CTAS). The results of the SELECT portion of the statement are stored in a table. Storing the results of a SQL statement into a table (even a temporary table) is called materializing the results.
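
For example (the table and column names here are made up purely for illustration), the following statement materializes an aggregate result set into a real table:

CREATE TABLE sales_summary AS
SELECT customer_id, COUNT(*) AS orders, SUM(amount) AS total_amount
FROM sales
GROUP BY customer_id;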

Views are similar to regular SQL statements.
A view is a SQL statement which acts like a table. When you execute a query on a view, the result set is ephemeral, generated at run-time for consumption and then immediately discarded.
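
Continuing the hypothetical example above, the same query defined as a view stores nothing; the grouping and aggregation are recomputed every time the view is queried:

CREATE VIEW sales_summary_v AS
SELECT customer_id, COUNT(*) AS orders, SUM(amount) AS total_amount
FROM sales
GROUP BY customer_id;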

Views are not generally considered to be a performance optimization because:

  • The contents of the result set are computed each time the view is accessed.
  • If multiple statements access the same view repeatedly or concurrently, then this computation is likely to be expensive
  • If the view accesses large quantities of data, then the computation is likely expensive.
  • Views containing grouping, aggregation, sorting, distinct or other conditions must be fully computed and stored in a temporary table before they can be accessed, which is very expensive.

What is a materialized view (MV)?
A materialized view is similar to a regular view, in that it represents the result set of a query, but the contents are stored (materialized!) as a real table. So an MV is similar to a table created with the CTAS command described above. This similarity is fairly superficial, though. While both store results in a table, the MV represents the results of a SQL statement at a specific point in time, essentially a consistent snapshot of the query result. It is not possible to create multiple different tables via CTAS and have them all be transactionally consistent with one another, unless you stop all database write activity.

Materialized views can be used to enhance performance by acting as a cache. Further, the cost of a cache miss is lower because incremental refresh is faster than recomputing the contents from scratch.

  • The contents of the result set are updated periodically, not each time the view is accessed.
  • If multiple statements access the same view repeatedly or concurrently it is not likely to be very expensive.
  • If the view is large, accessing the table will be considerably cheaper
  • You can add indexes to the MV.
  • Since the data is already joined together and pre-aggregated, CPU and memory usage may be reduced compared to computing the results.
  • You can often generate multiple different blocks of content from the summarized data in one or more views

Materialized views must be refreshed.
Refreshing an MV brings it up to date to reflect the changes in the database since the view was either first created or last refreshed, whichever is more recent. More importantly, an MV can be refreshed to a specific point in time, not just “now”. This means that you can maintain multiple MVs and keep them synced to the same point in time. There are two different methods by which an MV can be refreshed.

The first is the complete refresh method, which rebuilds the entire contents of the view from scratch. This is the less desirable method:

  • During a complete refresh, the view contents are completely recalculated, which could be very expensive
  • On some databases, the contents of the view are not available during complete refresh (not true of Flexviews)
  • During refresh the view may occupy twice as much space (similar to ALTER TABLE)
  • Supports all SQL syntax (like outer join) but can’t be refreshed to a specific point in time.
  • Performs no better than CTAS, but gives a convenient method of creating and refreshing the materialized results

The second refresh method is incremental refresh. This method updates the view in place, usually examining only the rows which have changed since the view was last refreshed.

Incremental refresh has obvious benefits which include:

  • Refreshing large views is orders of magnitude faster than complete refresh.
  • When updating the view, only a subset of the database rows must be examined
  • The rows examined are related to the rows which have changed since the last refresh
  • The view can be refreshed forward to a specific transactional point in time
  • Multiple views can be rolled forward to the exact same consistent point in time, with respect to committed transactions
  • The processing can be done on a slave dedicated to such work.

And some drawbacks:

  • Not all SQL syntax is supported (no outer joins, no non-deterministic functions, etc.).
  • There is overhead for change-data capture.
  • Some extra storage is used for the changes and deltas.
  • Creating an MV is not as simple as I’d like.

Incremental refresh capabilities imply that Flexviews has some way to capture the changes that happen in the database and then apply those changes to the views.

These capabilities break down into three main categories which will be blogged about in subsequent posts:

  • Change Data Capture – How Flexviews figures out what changed in the database
  • Delta computation – How it uses those changes to compute the differences between the old results and the new results
  • Delta application – How Flexviews uses the deltas to actually make the changes
  • Finally, the last blog post will describe how to use Flexviews for online schema change, or as an ELT* tool and it will cover how to create the materialized views which Flexviews manages.

*ELT – extract, load and transform, a variant of the data integration Extract-Transform-Load process (ETL).


Mar
22
2011
--

Two quick site updates: videos online; Kindle edition

Two notes for readers. First, we have been publishing a great deal of video material on our companion site Percona.TV. This includes our recordings from past conferences, user group meetings, screencasts, and most recently our webinars. (I hate the proprietary formats and silly technical restrictions that the webinar companies impose. I figured out how to download the streams and convert them to formats that will upload to blip.tv and embed, so now they’re easy to watch.)

Second, you can now subscribe to this blog through Kindle, if that’s how you like to read your MySQL technical geekery. We’re in the Kindle store as MySQL Performance Blog. Happy reading!
