Nov
16
2016
--

All You Need to Know About GCache (Galera-Cache)

GCache

GCacheThis blog discusses some important aspects of GCache.

Why do we need GCache?

Percona XtraDB Cluster is a multi-master topology, where a transaction executed on one node is replicated on another node(s) of the cluster. This transaction is then copied over from the group channel to Galera-Cache followed by apply action.

The cache can be discarded immediately once the transaction is applied, but retaining it can help promote a node as a DONOR node serving write-sets for a newly booted node.

So in short, GCache acts as a temporary storage for replicated transactions.

How is GCache managed?

Naturally, the first choice to cache these write-sets is to use memory allocated pool, which is governed by gcache.mem_store. However, this is deprecated and buggy and shouldn’t be used.

Next on the list is on-disk files. Galera has two types of on-disk files to manage write-sets:

  • RingBuffer File:
    • A circular file (aka RingBuffer file). As the name suggests, this file is re-usable in a circular queue fashion, and is pre-created when the server starts. The size of this file is preconfigured and can’t be changed dynamically, so selecting a proper size for this file is important.
    • The user can set the size of this file using gcache.size. (There are multiple blogs about how to estimate size of the Galera Cache, which is generally linked to downtime. If properly planned, the next booting node will find all the missing write-sets in the cache, thereby avoiding need for SST.)
    • Write-sets are appended to this file and, when needed, the file is re-cycled for use.
  • On-demand page store:
    • If the transaction write-set is large enough not to fit in a RingBuffer File (actually large enough not to fit in half of the RingBuffer file) then an independent page (physical disk file) is allocated to cache the write-sets.
    • Again there are two types of pages:
      • Page with standard size: As defined by gcache.page_size (default=128M).
      • Page with non-standard page size: If the transaction is large enough not to fit into a standard page, then a non-standard page is created for the transaction. Let’s say
        gcache.page_size=1M

         and transaction

        write_set = 1.5M

        , then a separate page (in turn on-disk file) will be created with a size of 1.5M.

How long are on demand pages retained? This is controlled using following two variables:

  • gcache.keep_pages_size
    • keep_pages_size

       defines total size of allocated pages to keep. For example, if

      keep_pages_size = 10M

       then N pages that add up to 10M can be retained. If N pages add to more than 10M, then pages are removed from the start of the queue until the size falls below set threshold. A size of 0 means don’t retain any page.

  • gcache.keep_pages_count (PXC specific)
    • But before pages are actually removed, a second check is done based on
      page_count

      . Let’s say

      keep_page_count = N+M

      , then even though N pages adds up to 10M, they will be retained as the 

      page_count

       threshold is not yet hit. (The exception to this is non-standard pages at the start of the queue.)

So in short, both condition must be satisfied. The recommendation is to use whichever condition is applicable in the user environment.

Where are GCache files located?

The default location is the data directory, but this can be changed by setting gcache.dir. Given the temporary nature of the file, and iterative read/write cycle, it may be wise to place these files in a faster IO disk. Also, the default name of the file is gcache.cache. This is configurable by setting gcache.name.

What if one of the node is DESYNCED and PAUSED?

If a node desyncs, it will continue to received write-sets and apply them, so there is no major change in gcache handling.

If the node is desynced and paused, that means the node can’t apply write-sets and needs to keep caching them. This will, of course, affect the desynced/paused node and the node will continue to create on-demand page store. Since one of the cluster nodes can’t proceed, it will not emit a “last committed” message. In turn, other nodes in the cluster (that can purge the entry) will continue to retain the write-sets, even if these nodes are not desynced and paused.

Sep
08
2014
--

How to calculate the correct size of Percona XtraDB Cluster’s gcache

How to calculate the correct size of Percona XtraDB Cluster's gcacheWhen a write query is sent to Percona XtraDB Cluster all the nodes store the writeset on a file called gcache. By default the name of that file is galera.cache and it is stored in the MySQL datadir. This is a very important file, and as usual with the most important variables in MySQL, the default value is not good for high-loaded servers. Let’s see why it’s important and how can we calculate a correct value for the workload of our cluster.

What’s the gcache?
When a node goes out of the cluster (crash or maintenance) it obviously stops receiving changes. When you try to reconnect the node to the cluster the data will be outdated. The joiner node needs to ask a donor to send the changes happened during the downtime.

The donor will first try to transfer an incremental (IST), that is, the writesets the cluster received while the node was down. The donor checks the last writeset received by the joiner and then checks local gcache file. If all needed writesets are on that cache the donor sends them to the joiner. The joiner applies them and that’s all, it is up to date and ready to join the cluster. Therefore, IST can only be achieved if all changes missed by the node that went away are still in that gcache file of the donor.

On the other hand, if the writesets are not there a full transfer would be needed (SST) using one of the supported methods, XtraBackup, Rsync or mysqldump.

In a summary, the difference between a IST and SST is the time that a node needs to join the cluster. The difference could be from seconds to hours. In case of WAN connections and large datasets maybe days.

That’s why having a correct gcache is important. It work as a circular log, so when it is full it starts to rewrite the writesets at the beginning. With a larger gcache a node can be out of the cluster more time without requiring a SST. My colleague Jay Janssen explains in more detail about how IST works and how to find the right server to use as donor.

Calculating the correct size
When trick is pretty similar to the one used to calculate the correct InnoDB log file size. We need to check how many bytes are written every minute. The variables to check are:

wsrep_replicated_bytes: Total size (in bytes) of writesets sent to other nodes.

wsrep_received_bytes: Total size (in bytes) of writesets received from other nodes.

mysql> show global status like 'wsrep_received_bytes';
show global status like 'wsrep_replicated_bytes';
select sleep(60);
show global status like 'wsrep_received_bytes';
show global status like 'wsrep_replicated_bytes';
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| wsrep_received_bytes | 83976571 |
+----------------------+----------+
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| wsrep_replicated_bytes | 0     |
+------------------------+-------+
[...]
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| wsrep_received_bytes | 90576957 |
+----------------------+----------+
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| wsrep_replicated_bytes | 800   |
+------------------------+-------+

Therefore:

Bytes per minute:

(second wsrep_received_bytes – first wsrep_received_bytes) + (second wsrep_replicated_bytes – first wsrep_replicated_bytes)

(90576957 – 83976571) + (800 – 0) = 6601186 bytes or 6 MB per minute.

Bytes per hour:

6MB * 60 minutes = 360 MB per hour of writesets received by the cluster.

If you want to allow one hour of maintenance (or downtime) of a node, you need to increase the gcache to that size. If you want more time, just make it bigger.

The post How to calculate the correct size of Percona XtraDB Cluster’s gcache appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com