Aug
20
2018
--

Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost

AWS EC2 MySQL cost savings

AWS EC2 MySQL cost savingsIf you are using large EBS GP2 volumes for MySQL (i.e. 10TB+) on AWS EC2, you can increase performance and save a significant amount of money by moving to local SSD (NVMe) instance storage. Interested? Then read on for a more detailed examination of how to achieve cost-benefits and increase performance from this implementation.

EBS vs Local instance store

We have heard from customers that large EBS GP2 volumes can be affected by short term outages—IO “stalls” where no IO is going in or out for a couple of minutes. This can happen, especially, in the largest AWS region us-east-1. Statistically, with so many disks in disk arrays (which back EBS volumes) we can expect frequent disk failures. If we allocate a very large EBS GP2 volume, i.e. 10Tb+, hitting such failure events can be common.

In the case of MySQL/InnoDB, such an IO “stall” will be obvious, particularly with the highly loaded system where MySQL needs to do physical IO. During the stall, you will see all write queries are waiting, or “hang”.  Some of the writes may error out with “Error 1030” (MySQL error code 1030 (ER_GET_ERRNO): Got error %d from storage engine). There is nothing MySQL can do here – if the IO subsystem is not available, it will need to wait for it.

The good news is: many of the newer EC2 instances (i.e. i3, m5d, etc) have local SSD disks attached (NVMe). Those disks are local to the physical server and should not suffer from the EBS issues described above. Using local disks can be a very good solution:

  1. They are faster, as they are local to the server, and do not suffer from the EBS issues
  2. They are much cheaper compared to large EBS volumes.

Please note, however, that local storage does not guarantee persistence. More about this below.

Another potential option will be to use IO1 volumes with provisional IOPS. However, it will be significantly more expensive for the large volumes and high traffic.

A look at costs

To estimate the costs, I’ve used the AWS simple monthly calculator. Estimated costs are based on 1 year reserved instances. Let’s imagine we will need to use 14TB volume (to store ~10Tb of MySQL data including binary logs). The pricing estimates will look like this:

r4.4xlarge, 122GB RAM, 16 vCPUs + EBS, 14TB volume (this is what we are presumably using now)

Amazon EC2 Service (US East (N. Virginia)) $ 1890.56 / month
Compute: $ 490.56
EBS Volumes: $1400.00

Local storage price estimate:
i3.4xlarge, 122GB RAM, 16 vCPUs, 3800 GiB disk (2 x 1900 NVMe SSD)

Amazon EC2 Service (US East (N. Virginia)) $ 627.21 / month
Compute: $ 625.61

i3.8xlarge, 244GB RAM, 32 vCPUs, 7600 GiB disk (4 x 1900 NVMe SSD)

Amazon EC2 Service (US East (N. Virginia)) $1252.82 / month
Compute: $ 1251.22

As we can see, even if we switch to i3.8xlarge and get 2x more RAM and 2x more virtual CPUs, faster storage, 10 gigabit network we can still pay 1.5x less per box what we are presumably paying now. Include replication, then that’s paying 1.5x less per each of the replication servers.

But wait … there is a catch.

How to migrate to local storage from EBS

Well, we have some challenges here to migrate from EBS to local instance NVMe storage.

  1. Wait, we are storing ~10Tb and i3.8xlarge have 7600 GiB disk. The answer is simple: compression (see below)
  2. Wait, but the local storage is ephemeral, if we loose the box we will loose our data – that is unacceptable.  The answer is also simple: replication (see below)
  3. Wait, but we use EBS snapshots for backups. That answer is simple too: we can still use EBS (and use snapshots) on 1 of the replication slave (see below)

Compression

To fit i3.8xlarge we only need 2x compression. This can be done with InnoDB row compression (row_format=compressed) or InnoDB page compression, which requires sparse file and hole punching support. However, InnoDB compression may be slower and will only compress ibd files—it does not compress binary logs, frm files, etc.

ZFS

Another option: use the ZFS filesystem. ZFS will compress all files, including binary logs and frm. That can be very helpful if we use a “schema per customer” or “table per customer” approach and need to store 100K – 200K tables in a single MySQL instance. If the data is compressible, or new tables were provisioned without much data in those, ZFS can give a significant disk savings.

I’ve used ZFS (followed Yves blog post, Hands-On Look at ZFS with MySQL). Here are the results of data compression with ZFS (this is real data, not a generated data):

# du -sh --apparent-size /mysqldata/mysql/data
8.6T	/mysqldata/mysql/data
# du -sh /mysqldata/mysql/data
3.2T	/mysqldata/mysql/data

Compression ratio:

# zfs get all | grep -i compress
...
mysqldata/mysql/data  compressratio         2.42x                  -
mysqldata/mysql/data  compression           gzip                   inherited from mysqldata/mysql
mysqldata/mysql/data  refcompressratio      2.42x                  -
mysqldata/mysql/log   compressratio         3.75x                  -
mysqldata/mysql/log   compression           gzip                   inherited from mysqldata/mysql
mysqldata/mysql/log   refcompressratio      3.75x                  -

As we can see, the original 8.6Tb of data was compressed to 3.2Tb, the compression ratio for MySQL tables is 2.42x, for binary logs 3.75x. That will definitely fit i3.8xlarge.

(For another test, I’ve generated 40 million tables spread across multiple schemas (databases). I’ve added some data only to one schema, leaving others blank. For that test I achieved ~10x compression ratio.)

Conclusion: ZFS can provide you with very good compression ratio, will allow you to use different EC2 instances on AWS, and save you a substantial amount of money. Although compression is not free performance-wise, and ZFS can be slower for some workloads, using local NVMe storage can compensate.

You can find some performance testing for ZFS on linux in this blog post: About ZFS Performance. Some benchmarks comparing EBS and local NVMe SSD storage (i3 instances) can be found in this blog post: Percona XtraDB Cluster on Amazon GP2 Volumes

MyRocks

Another option for compression would be using the MyRocks storage engine in Percona Server for MySQL, which provides compression.

Replication and using local volumes

As the local instance storage is ephemeral we need redundancy: we can use MySQL replication or Percona XtraDB cluster (PXC). In addition, we can use one replication slave—or we can attach a replication slave to PXC—and have it use EBS storage.

Local storage is not durable. If you stop the instance and then start it again, the local storage will probably disappear. (Though reboot is an exception, you can reboot the instance and the local storage will be fine.) In addition if the local storage disappears we will have to recreate MySQL local storage partition (for ZFS, i.e. zpool create or for EXT4/XFS, i.e. mkfs)

For example, using MySQL replication:

master - local storage (AZ 1, i.e. 1a)
-> slave1 - local storage (AZ 2, i.e. 1b)
-> slave2 - ebs storage (AZ 3, i.e. 1c)
   (other replication slaves if needed with local storage - optional)

MySQL Master AZ 1a, Local storage

Then we can use slave2 for ebs snapshots (if needed). This slave will be more expensive (as it is using EBS) but it can also be used to either serve production traffic (i.e. we can place smaller amount of traffic) or for other purposes (for example analytical queries, etc).

For Percona XtraDB cluster (PXC) we can just use 3 nodes, 1 in each AZ. PXC uses auto-provisioning with SST if the new node comes back blank. For MySQL replication we need some additional things:

  1. Failover from master to a slave if the master will go down. This can be done with MHA or Orchestrator
  2. Ability to clone slave. This can be done with Xtrabackup or ZFS snapshots (if using ZFS)
  3. Ability to setup a new MySQL local storage partition (for ZFS, i.e. zpool create or for EXT4/XFS, i.e. mkfs)

Other options

Here are some totally different options we could consider:

  1. Use IO1 volumes (as discussed). That can be way more expensive.
  2. Use local storage and MyRocks storage engine. However, switching to another storage engine is another bigger project and requires lots of testing
  3. Switch to AWS Aurora. That can be even more expensive for this particular case; and switching to aurora can be another big project by itself.

Conclusions

  1. Using EC2 i3 instances with local NVMe storage can increase performance and save money. There are some limitations: local storage is ephemeral and will disappear if the node has stopped. Reboot is fine.
  2. ZFS filesystem with compression enabled can decrease the storage requirements so that a MySQL instance will fit into local storage. Another option for compression could be to use InnoDB compression (row_format=compressed).

That may not work for everyone as it requires additional changes to the existing server provisioning: failover from master to a slave, ability to clone replication slaves (or use PXC), ability to setup a new MySQL local storage partition, using compression.

The post Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost appeared first on Percona Database Performance Blog.

Sep
01
2017
--

How Life360 Used ProxySQL to Lower Its Database Load

ProxySQL

ProxySQLIn this blog post, we’ll look at how to use ProxySQL to help the database load by handling PINGs.

I’ve blogged before about one of our regular clients, Life360. One of the issues they recently had was the PING command taking about 30%-40% of total queries per second across their database infrastructure. This is a non-trivial amount and was easily tens of thousands of pings per second. This added a significant amount of latency to real queries.

A large number of pings is due to the use of PHP PDO with persistent connections. Persistence, or pooling, is necessary to reduce time spent on connecting, disconnecting and reconnecting.

Unfortunately, in PHP (and other) implementations, the driver checks if the database is still alive with a PING before sending the actual command. Logic dictates that you could use the actual command as the PING, and if it fails it could return the same error it would have if the ping itself failed. Baron Schwartz has a lot to say about how unwise the use of a PING is within the drivers.

Barring rewriting PHP PDO, we thought up another solution: ProxySQL.

Before testing ProxySQL, we didn’t know how much gets forwarded to the actual hosts (including these com_admin commands). We wanted to test a quick PoC to discover what the actual behavior of ProxySQL was with respect to these commands. We had two hypothesis that we wanted to check:

  1. ProxySQL forwards everything including com_commands
  2. ProxySQL responds to the com_commands itself

In the event that ProxySQL forwarded everything, we set up a “decoy” MySQL instance to respond to the pings using ProxySQL query filtering. As it turned out, we found that ProxySQL quickly and silently replies to PINGs and doesn’t forward it onto the underlying database server backend. This is the case for other commands as well, as ProxySQL isn’t strictly a forwarding proxy (but more of a reverse proxy).

By placing ProxySQL on the application servers, Life360 was able to reduce QPS significantly. The other advantage of introducing ProxySQL is that it does connection pooling and multiplexing for you.

Here is the graph of com_ping on the day of the deployment:

ProxySQL

Overall, we are talking about hundreds of millions of pings per day down to 0.

We can see that the vanilla install of ProxySQL also reduced active threads significantly:

ProxySQL

This change has enabled Life360 to put off some of their scaling plans and provided other operational gains.

In conclusion, you can use ProxySQL as a simple (or advanced) firewall between your application and database. It consumes very little resources, but provides an immense performance gain.

While ProxySQL can be used as a more advanced firewall, these features are beyond the scope of this post. There are very specific ways to configure it as an advanced firewall. We plan to blog more on this soon.

Mar
16
2017
--

Percona Server for MongoDB: Dashing New LDAP Authentication Plugin

LDAP Authentication

This blog post is another in the series on the Percona Server for MongoDB 3.4 bundle release. In this blog, we’ll look at the new LDAP authentication plugin. 

Hear ye, hear ye, hear ye… With the arrival of version 3.4, Percona has included an LDAP plugin in Percona Server for MongoDB. Authentication is an essential part in client communication with the database backend.

What is LDAP?

LDAP stands for Lightweight Directory Access Protocol. It’s a centralized environment containing information on users or services. This information can be objects like passwords, roles or other meta information. Typically when connecting to the MongoDB server, you simply have to authenticate with the MongoDB servers local credential lists. Using an external authentication method like the one included in Percona Server for MongoDB, you can poll an external service. External authentication allows the MongoDB server to verify the client’s username and password against a separate service, such as OpenLDAP or Active Directory.

LDAP Authentication

Why should you use it?

Having a centralized LDAP offers you the ability to rely on one single “truth” for authentication and authorization. LDAP is essential in large organizations where maintaining users on a big infrastructure becomes troublesome.

In ideal situations, you would use your LDAP authentication for multiple MongoDB servers, and even other database solutions like MySQL. The idea is that you only need to modify the passwords or accounts centrally, so you can manage entries without having to modify them locally on each MongoDB instance.

Having a centralized authentication repository is often a requirement when managing highly sensitive information (due to compliancy requirements). A central repository for user information, like an LDAP server, solves the problem of a central source for authentication. When a user with database level privileges leaves the organization, simply shutting off the one central LDAP account will prevent access to all databases that use it as a source. If local accounts were created without being tied back to a central user repository, then the likelihood of an access revocation getting missed is far greater. This is why many security standards require accounts to be created with an LDAP/Active Directory type of service.

So what components do we actually use?

If you  want to visualize it in a figure:

LDAP Authentication

  • SASL Daemon. Used as a MongoDB server-local proxy for the remote LDAP service. This service is used for MongoDB as an intermediate service. It will translate the request and poll the LDAP server.
  • SASL Library. Used by the MongoDB client and server to create data necessary for the authentication mechanism. This library is used by the MongoDB client and server for making properly formatted requests to the SASL daemon.

So how does it work?

An authentication session uses the following sequence:

  1. A MongoDB client connects to a running mongod instance.
  2. The client creates a PLAIN authentication request using the SASL library.
  3. The client then sends this SASL request to the server as a special MongoDB command.
  4. The mongod server receives this SASL Message, with its authentication request payload.
  5. The server then creates a SASL session scoped to this client, using its own reference to the SASL library.
  6. Then the server passes the authentication payload to the SASL library, which in turn passes it on to the saslauthd daemon.
  7. The saslauthd daemon passes the payload on to the LDAP service to get a YES or NO authentication response (in other words, does this user exist and is the password correct).
  8. The YES/NO response moves back from saslauthd, through the SASL library, to mongod.
  9. The mongod server uses this YES/NO response to authenticate the client or reject the request.
  10. If successful, the client has authenticated and can proceed.

Below a  visualisation of the authentication path using the LDAP connector

An example of the output when using the authentication plugin

The mongod server is running with the added option:

cat /etc/default/mongod OPTIONS="-f /etc/mongod.conf --auth --setParameter authenticationMechanisms=PLAIN,SCRAM-SHA-1" STDOUT="/var/log/mongodb/mongod.stdout"
STDERR="/var/log/mongodb/mongod.stderr" First let’s make a user in MongoDB, I’ve created an Organisational Unit and associated user with password on LDAP >
db.getSiblingDB("$external").createUser({ ... user : 'dim0', ... roles: [ {role : "read", db: 'percona'} ] ... }) Successfully added user: { "user" : "utest", "roles" : [ {
"role" : "read", "db" : "percona" } ] }

At this point the user is correctly added on MongoDB.

Let’s try and perform a read query on the database “percona”:

db.getSiblingDB("$external").auth({mechanism: "PLAIN", user: 'dim0', pwd: 'test’, digestPassword: false }) 1
use percona db.foo.find() { "_id" : ObjectId("58b3e4b80322deccc99dc763"), "x" : 1 } { "_id" : ObjectId("58b3e8fee48bdc7edeb31cc5"), "x" : 2 } { "_id" :
ObjectId("58b3e931e48bdc7edeb31cc6"), "x" : 3 } > exit

Now let’s try and write something in it:

db.foo.insert({x : 5}) WriteResult({ "writeError" : { "code" : 13, "errmsg" : "not authorized on percona to execute command { insert: "foo", documents: [ { _id:
ObjectId('58b3e97f2343c5da97a2256e'), x: 5.0 } ], ordered: true }" } })

This is logical behavior, as we only allowed read interaction on the percona database.

After a correct login, you will find the following in the mongod.log:

2017-02-27T08:55:19.612+0000 I ACCESS   [conn2] Successfully authenticated as principal dim0 on $external

If an incorrect login happens, the following entry will appear in the mongod.log:

2017-02-27T09:10:55.297+0000 I ACCESS   [conn4] PLAIN authentication failed for dim0 on $external from client 127.0.0.1:34812 ; OperationFailed: SASL step did not complete: (authentication failure)

Conclusion

Percona Server for MongoDB has an easy way of integrating correctly with SASL authd. If you are looking for an option of centrally managing the users of your MongoDB environments, look no further. Keep in mind, however, that if you don’t need a centrally managed environment adding this functionality creates additional complexity to your infrastructure. You can find additional information on the LDAP plugin in our documentation at https://www.percona.com/doc/percona-server-for-mongodb/ext_authentication/index.html.

Mar
08
2017
--

Migrating MongoDB Away from MMAPv1

MMAPv1

MMAPv1This is another post in the series of blogs on the Percona Server for MongoDB 3.4 bundle release. In this blog post, we’ll discuss moving away from the MMAPv1 storage engine.

Introduction

WIth the MongoDB v3.0 release in February of 2015, the long-awaited ability to choose storage engines became a reality. As of version 3.0, you could choose two engines in MongoDB Community Server and, if you use Percona Server for MongoDB, you could choose from four. Here’s a table for ease of consumption:

Here’s a table for easy consumption:

Storage Engine Percona Server for MongoDB MongoDB Community Server MongoDB Enterprise Server (licensed)
MMAPv1

?

?

?

WiredTiger

?

?

?

MongoRocks

?

In-memory

?

?

Encrypted

?

 

Why change engines?

With increased possibilities comes an increase in the decision-making process difficult (a concept that gets reinforced every time I take my mother out a restaurant with a large menu – ordering is never quick). In all seriousness, the introduction of the storage engine API to MongoDB is possibly the single greatest feature MongoDB, Inc has released to-date.

One of the biggest gripes from the pre-v3.0 days was MongoDB’s lack of scale. This was mostly due to the MMAPv1 storage engine, which suffered from a very primitive locking scheme. If you would like a illustration of the problem, think of the world’s biggest supermarket with one checkout line – you might be able to fit in lots of shoppers, but they’re not going to accomplish their goal quickly. So, the ability to increase performance and concurrency with a simple switch is huge! Additionally, modern storage engines support compression. This should reduce your space utilization when switching by at least 50%.

All the way up to MongoDB v3.2, the default storage engine was MMAPv1. If you didn’t make a conscious decision about what storage engine to choose when you started using MongoDB, there is a good chance that MMAPv1 is what you’re on. If you’d like to find out for sure what engine you’re using, simply run the command below. The output will be the name of the storage engine. As you can see, I was running the MMAPv1 storage engine on this machine. Now that we understand where we’re at, let’s get into where we can be in the future.

db.serverStatus().storageEngine.name
mmapv1

Public Service Announcement

Before we get into what storage engine(s) to evaluate, we need to talk about testing. In my experience, a majority of the MySQL and MongoDB community members are rolling out changes to production without planning or testing. If you’re in the same boat, you’re in very good company (or at least in a great deal of company). However, you should stop this practice. It’s basic “sample size” in statistics – when engaged in risk-laden behavior, the optimal time to stop increasing the sample size is prior to the probability of failure reaching “1”. In other words, start your testing and planning process today!

At Percona, we recommend that you thoroughly test any database changes in a testing or development environment before you decide to roll them into production. Additionally, prior to rolling the changes into production (with a well thought out plan, of course), you’ll need to have a roll-back plan in case of unintended consequences. Luckily, with MongoDB’s built-in replication and election protocols, both are fairly easy. The key here is to plan. This is doubly true if you are undertaking a major version upgrade, or are jumping over major versions. With major version upgrades (or version jumps) comes the increased likelihood of a change in database behavior as it relates to your application’s response time (or even stability).

What should I think about?

In the table above, we listed the pre-packaged storage engine options that are available to us and other distributions. We also took a look at why you should consider moving off of MMAPv1 in the preceding section. To be clear, in my opinion a vast majority of MongoDB users that are on MMAPv1 can benefit from a switch. Which engine to switch to is the pressing question. Your first decision should be to evaluate whether or not your workload fits into the sweet spot for MMAPv1 by reading the section below. If that section doesn’t describe your application, then the additional sections should help you narrow down your choices.

Now, let’s take a look at what workloads match up with what storage engines.

MMAPv1

Believe it or not, there are some use cases where MMAPv1 is likely to give you as good (or better) performance as any other engine. If you’re not worried about the size of your database on disk, then you may not want to bother changing engines. Users that are likely to see no benefit from changing have read-heavy (or 100%) read applications. Also, certain update-heavy use cases, where you’re updating small amounts of data or performing $set operations, are likely to be faster on MMAPv1.

WiredTiger

WiredTiger is a the new default storage engine for MongoDB. It is a good option for general workloads that are currently running on MMAPv1. WiredTiger will give you good performance for most workloads and will reduce your storage utilization with compression that’s enabled by default. If you have a write-heavy workload, or are approaching high I/O utilization (>55%) with plans for it to rise, then you might benefit from a migration to WiredTiger.

MongoRocks (RocksDB from Facebook)

This is Facebook’s baby, which was forged in the fires of the former Parse business unit. MongoRocks, which uses LSM indexing, is advertised as “designed to work with fast storage.” Don’t let this claim fool you. For workloads that are heavy on writes, highly concurrent or approaching disk bound, MongoRocks could give you great benefits. In terms of compression, MongoRocks has the ability to efficiently handle deeper compression algorithms, which should further decrease your storage requirements.

In-Memory

The in-memory engine, whether we’re speaking about the MongoDB or Percona implementation, should be used for workloads where extreme low latency is the most important requirement. The types of applications that I’m talking about are usually low-latency, “real-time” apps — like decision making or user session tracking. The in-memory engine is not persistent, so it operates strictly out of the cache allocated to MongoDB. Consequently, the data may (and likely will) be lost if the server crashes.

Encrypted

This is for applications in highly secure environments where on-disk encryption is necessary for compliance. This engine will protect the MongoDB data in the case that a disk or server is stolen. On the flip side, this engine will not protect you from a hacker that has access to the server (MongoDB shell), or can intercept your application traffic. Another way to achieve the same level of encryption for compliance is using volume level encryption like LUKS. An additional benefit of volume level encryption, since it works outside the database, is re-use on all compliant servers (not just MongoDB).

Getting to your new engine

Switching to the new engine is actually pretty easy, especially if you’re running a replica set. One important caveat is that unlike MySQL, the storage engine can only be defined per mongod process (not per database or collection). This means that it’s an all or nothing operation on a single MongoDB process. You’ll need to reload your data on that server. This is necessary because the data files from one engine are not compatible with another engine. Thus reloading the data to transform from one engine format to another is necessary. Here are the high-level steps (assuming you’re running a replica set):

  1. Make sure you’re not in your production environment
  2. Backup your data (it can’t hurt)
  3. Remove a replica set member
  4. Rename (or delete) the old data directory. The member will re-sync with the replica set
    • Make sure you have enough disk space if you’re going to keep a copy of the old data directory
  5. Update the mongo.conf file to use a new storage engine. Here’s an example for RocksDB from our documentation:
    storage:
     engine: rocksdb
     rocksdb:
       cacheSizeGB: 4
       compression: snappy
  6. Start the MongoDB process again
  7. Join the member to the replica set (initial sync will happen)
  8. When the updated member is all caught up, pick another member and repeat the process.
  9. Continue until the primary is the only server left. At this point, you should step down the primary, but hold off switching storage engines until you are certain that the new storage engine meets your needs.

The Wrap Up

At this point I’ve explained how you can understand your options, where you can gain additional performance and what engines to evaluate. Please don’t forget to test your application with the new setup before launching into production. Please drop a comment below if you found this helpful or, on the other hand, if there’s something that would make it more useful to you. Chances are, if you’d find something helpful, the rest of the community will as well.

Feb
24
2017
--

Installing Percona Monitoring and Management (PMM) for the First Time

Percona Monitoring and Management 2

Percona Monitoring and ManagementThis post is another in the series on Percona’s MongoDB 3.4 bundle release. This post is meant to walk a prospective user through the benefits of Percona Monitoring and Management (PMM), how it’s architected and the simple install process. By the end of this post, you should have a good idea of what PMM is, where it can add value in your environment and how you can get PMM going quickly.

Percona Monitoring and Management (PMM) is Percona’s open-source tool for monitoring and alerting on database performance and the components that contribute to it. PMM monitors MySQL (Percona Server and MySQL CE), Amazon RDS/Aurora, MongoDB (Percona Server and MongoDB CE), Percona XtraDB/Galera Cluster, ProxySQL, and Linux.

What is it?

Percona Monitoring and Management is an amalgamation of exciting, best in class, open-source tools and Percona “engineering wizardry,” designed to make it easier to monitor and manage your environment. The real value to our users is the amount of time we’ve spent integrating the tools, plus the pre-built dashboards we’ve constructed that leverage the ten years of performance optimization experience. What you get is a tool that is ready to go out of the box, and installs in minutes. If you’re still not convinced, like ALL Percona software it’s completely FREE!

Sound good? I can hear you nodding your head. Let’s take a quick look at the architecture.

What’s it made of?

PMM, at a high-level, is made up of two basic components: the client and the server. The PMM Client is installed on the database servers themselves and is used to collect metrics. The client contains technology specific exporters (which collect and export data), and an “admin interface” (which makes the management of the PMM platform very simple). The PMM server is a “pre-integrated unit” (Docker, VM or AWS AMI) that contains four components that gather the metrics from the exporters on the PMM client(s). The PMM server contains Consul, Grafana, Prometheus and a Query Analytics Engine that Percona has developed. Here is a graphic from the architecture section of our documentation. In order to keep this post to a manageable length, please refer to that page if you’d like a more “in-depth” explanation.

How do I use it?

PMM is very easy to access once it has been installed (more on the install process below). You will simply open up the web browser of your choice and connect to the PMM Landing Page by typing

http://<ip_address_of _PMM_server>

. That takes you to the PMM landing page, where you can access all of PMM’s tools. If you’d like to get a look into the user experience, we’ve set up a great demo site so you can easily test it out.

Where should I use it?

There’s a good chance that you already have a monitoring/alerting platform for your production workloads. If not, you should set one up immediately and start analyzing trends in your environment. If you’re confident in your production monitoring solution, there is still a use for PMM in an often overlooked area: development and testing.

When speaking with users, we often hear that their development and test environments run their most demanding workloads. This is often due to stress testing and benchmarking. The goal of these workloads is usually to break something. This allows you to set expectations for normal, and thus abnormal, behavior in your production environment. Once you have a good idea of what’s “normal” and the critical factors involved, you can alert around those parameters to identify “abnormal” patterns before they cause user issues in production. The reason that monitoring is critical in your dev/test environment(s) is that you want to easily spot inflection points in your workload, which signal impending disaster. Dashboards are the easiest way for humans to consume and analyze this data.

Are you sold? Let’s get to the easiest part: installation.

How do you install it?

PMM is very easy to install and configure for two main reasons. The first is that the components (mentioned above) take some time to install, so we spent the time to integrate everything and ship it as a unit: one server install and a client install per host. The second is that we’re targeting customers looking to monitor MySQL and MongoDB installations for high-availability and performance. The fact that it’s a targeted solution makes pre-configuring it to monitor for best practices much easier. I believe we’ve all seen a particular solution that tries to do a little of everything, and thus actually does no particular thing well. This is the type of tool that we DO NOT want PMM to be. Now, onto the installation procedure.

There are four basic steps to get PMM monitoring your infrastructure. I do not want to recreate the Deployment Guide in order to maintain the future relevancy of this post. However, I’ll link to the relevant sections of the documentation so you can cut to the chase. Also, underneath each step, I’ll list some key takeaways that will save you time now and in the future.

  1. Install the integrated PMM server in the flavor of your choice (Docker, VM or AWS AMI)
    1. Percona recommends Docker to deploy PMM server as of v1.1
      1. As of right now, using Docker will make the PMM server upgrade experience seamless.
      2. Using the default version of Docker from your package manager may cause unexpected behavior. We recommend using the latest stable version from Docker’s repositories (instructions from Docker).
    2. PMM server AMI and VM are “experimental” in PMM v1.1
    3. When you open the “Metrics Monitor” for the first time, it will ask for credentials (user: admin pwd: admin).
  2. Install the PMM client on every database instance that you want to monitor.
    1. Install with your package manager for easier upgrades when a new version of PMM is released.
  3. Connect the PMM client to the PMM Server.
    1. Think of this step as sending configuration information from the client to the server. This means you are telling the client the address of the PMM server, not the other way around.
  4. Start data collection services on the PMM client.
    1. Collection services are enabled per database technology (MySQL, MongoDB, ProxySQL, etc.) on each database host.
    2. Make sure to set permissions for PMM client to monitor the database: Cannot connect to MySQL: Error 1045: Access denied for user ‘jon’@’localhost’ (using password: NO)
      1. Setting proper credentials uses this syntax sudo pmm-admin add <service_type> –user xxxx –password xxxx
    3. There’s good information about PMM client options in the “Managing PMM Client” section of the documentation for advanced configurations/troubleshooting.

What’s next?

That’s really up to you, and what makes sense for your needs. However, here are a few suggestions to get the most out of PMM.

  1. Set up alerting in Grafana on the PMM server. This is still an experimental function in Grafana, but it works. I’d start with Barrett Chambers’ post on setting up email alerting, and refine it with  Peter Zaitsev’s post.
  2. Set up more hosts to test the full functionality of PMM. We have completely free, high-performance versions of MySQL, MongoDB, Percona XtraDB Cluster (PXC) and ProxySQL (for MySQL proxy/load balancing).
  3. Start load testing the database with benchmarking tools to build your troubleshooting skills. Try to break something to learn what troubling trends look like. When you find them, set up alerts to give you enough time to fix them.
Feb
21
2017
--

Percona Monitoring and Management (PMM) Upgrade Guide

Percona Monitoring and Management

Percona Monitoring and ManagementThis post is part of a series of Percona’s MongoDB 3.4 bundle release blogs. The purpose of this blog post is to demonstrate current best-practices for an in-place Percona Monitoring and Management (PMM) upgrade. Following this method allows you to retain data previously collected by PMM in your MySQL or MongoDB environment, while upgrading to the latest version.

Step 1: Housekeeping

Before beginning this process, I recommend that you use a package manager that installs directly from Percona’s official software repository. The install instructions vary by distro, but for Ubuntu users the commands are:

wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb

sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb

Step 2: PMM Server Upgrade

Now that we have ensured we’re using Percona’s official software repository, we can continue with the upgrade. To check which version of PMM server is running, execute the following command on your PMM server host:

docker ps

This command shows a list of all running Docker containers. The version of PMM server you are running is found in the image description.

docker_ps

Once you’ve verified you are on an older version, it’s time to upgrade!

The first step is to stop and remove your docker pmm-server container with the following command:

docker stop pmm-server && docker rm pmm-server

Please note that this command may take several seconds to complete.
docker_stop

The next step is to create and run the image with the new version tag. In this case, we are installing version 1.1.0. Please make sure to verify the correct image name in the install instructions.

Run the command below to create and run the new image.

docker run -d
&nbsp;&nbsp;-p 80:80
&nbsp;&nbsp;--volumes-from pmm-data
&nbsp;&nbsp;--name pmm-server
&nbsp;&nbsp;--restart always
&nbsp;&nbsp;percona/pmm-server:1.1.0

docker_run

We can confirm our new image is running with the following command:

docker ps

docker_ps

As you can see, the latest version of PMM server is installed. The final step in the process is to update the PMM client on each host to be monitored.

Step 3: PMM Client Upgrade

The GA version of Percona Monitoring and Management supports in-place upgrades. Instructions can be found in our documentationOn the client side, update the local apt cache, and upgrade to the new version of pmm-client by running the following commands:

apt-get update

apt-get_update

apt-get install pmm-client

Congrats! We’ve successfully upgraded to the latest PMM version. As you can tell from the graph below, there is a slight gap in our polling data due to the downtime necessary to upgrade the version. However, we have verified that the data that existed prior to the upgrade is still available and new data is being gathered.

grafana_graph

Conclusion

I hope this blog post has given you the confidence to do an in-place Percona Monitoring and Management upgrade. As always, please submit your feedback on our forums with regards to any PMM-related suggestions or questions. Our goal is to make PMM the best-available open-source MySQL and MongoDB monitoring tool.

Jan
09
2017
--

MongoDB PIT Backups: Part 2

MongoDB PIT Backups

This blog post is the second in a series covering MongoDB PIT backups. You can find the first part here.

Sharding Makes Everything Fun(ner)

The first blog post in this series looked at MongoDB backups in a simple single-replica set environment. In this post, we’ll look at the scale-out use case. When sharding, we have exactly the same problem as we do on a single replica set. However, now the problem is multiplied by the number of replica sets in the cluster. Additionally, we have a bonus problem: each replica set has unique data. That means to get a truly consistent snapshot of the cluster, we need to orchestrate our backups to capture a single consistent point in time. Just so we’re on the same page, that means that every replica set needs to stop their backups at, or near, the same time that the slowest replica set stops. Are you sufficiently confused now? Let me get to a basic concept that I forgot to cover in the first post, and then I’ll give you a simple description of the problem.

Are you Write Concerned?

So far, I’ve neglected to talk about the very important role of “write concern” when taking consistent backups. In MongoDB, the database is not durable by default. By “durable,” I mean “on disk” when the database acknowledges receipt of an operation from your application. There are most likely several reasons for this. Most likely the biggest one originally was probably throughput given a lack of concurrency.

However, the side effect is possible data loss due to loss of operations applied only in memory. Changing the write concern to “journaled” (

j : true

) will change this behavior so that MongoDB journals changes before acknowledging them (you also need to be running with journal enabled).

TIP: For true durability in a replica set, you should use a write concern of “majority” for operations and the writeConcernMajorityJournalDefault : true on all replica set members (new to v3.4). This has the added benefit of greatly decreasing the chance of rollback after an election.

Wow, you’re inconsistent

At the risk of being repetitive, the crux of this issue is that we need to run a backup on every shard (replica set). This is necessary because every shard has a different piece of the data set. Each piece of that data set is necessary to get an entire picture of the data set for the cluster (and thus, your application). Since we’re using mongodump, we’ll only have a consistent snapshot at the point in time when the backup completes. This means we must end each shard’s backup at a consistent point in time. We cannot expect that the backup will complete in exactly the same amount of time on every shard, which is what we’ll need for a consistent point in time across the cluster. This means that Shard1 might have a backup that is consistent to 12:05 PM, and another shard that is consistent to 12:06 PM. In a high traffic environment (the kind that’s likely to need horizontal scale), this could mean thousands of lost documents. Here’s a diagram:

MongoDB PIT Backups
MongoDB PIT Backups

 

Here’s the math to illustrate the problem:

  • Shard1’s backup will contain 30,000 documents ((100 docs * 60 secs) * 5 mins)
  • Shard2’s backup will contain 36,000 documents ((100 docs * 60 secs) * 6 mins)

In this example, to get a consistent point in time you’d need to remove all insert, update and delete operations that happened on Shard 2 from the time that Shard 1’s backup completed (6,000 documents). This means examining the timestamp of every operation in the oplog and reversing it’s operation. That’s a very intensive process, and will be unique for every mongodump that’s executed. Furthermore, this is a pretty tricky thing to do. The repeatable and much more efficient method is to have backups that finish in a consistent state, ready to restore when needed.

Luckily, Percona has you covered!

You’re getting more consistent

Having data is important, but knowing what data you have is even more important. Here’s how you can be sure you know what you have in your MongoDB backups:

David Murphy has released his MongoDB Consistent Backup Tool in the Percona Labs github account, and has written a very informative blog post about it. My goal with these blog posts is to make it even easier to understand the problem and how to solve it. We’ve already had an exhaustive discussion about the problem on both small and large scales. How about the solution?

It’s actually pretty simple. The solution, at a basic level, is to use a simple algorithm to decide when a cluster-wide consistent point-in-time can be reached. In the MongoDB Consistent Backup tool, this is done by the backup host kicking off backups on a “known good member” of each shard (that’s a pretty cool feature by itself) and then tracking the progress of each dump. At the same time the backup is kicked off, the backup host kicks off a separate thread that tails the oplog on each “known good member” until the mongodump on the slowest shard completes. By using this method, we have a very simple way of deciding when we can get a cluster-wide consistent snapshot. In other words, when the slowest member completes their piece of the workload. Here’s the same workload from Figure 4, but with the MongoDB Consistent Backup Tool methodology:

MongoDB PIT Backups
MongoDB PIT Backups

 

TIP: The amount of time that it takes to perform these backups is often decided by two factors:

  1. How evenly distributed the data is across the shards (balanced)
  2. How much data each shard contains (whether or not it’s balanced).

The takeaway here is that you may need to shard so that each shard has a manageable volume of data. This allows you to hit your backup/restore windows more easily.

…The Proverbial “Monkey Wrench”

There’s always a “gotcha” just when you think you’ve got your mind around any difficult concept. Of course, this is no different.

There is one very critical concept in sharding that we didn’t cover: tracking what data lies on which shard. This is important for routing the workload to the right place, and balancing the data across the shards. In MongoDB, this is completed by the config servers. If you cannot reach (or recover) your config servers, your entire cluster is lost! For obvious reasons, you need to back them up as well. With the Percona Labs MongoDB Consistent Backup Tool, there are actually two modes used to backup config servers: v3.2 and greater, and legacy. The reason is that in v3.2, config servers went from mirrors to a standard replica set. In v3.2 mode, we just treat the config servers like another replica set. They have their own mongodump and oplog tail thread. They get a backup that is consistent to the same point in time as all other shards in the cluster. If you’re on a version of MongoDB prior to v3.2, and you’re interested in an explanation of legacy mode, please refer back to David’s blog post.

The Wrap Up

We’ve examined the problems with getting consistent backups in a running MongoDB environment in this and the previous blog posts. Whether you have a single replica set or a sharded cluster, you should have a much better understanding of what the problems are and how Percona has you covered. If you’re still confused, or you’d just like to ask some additional questions, drop a comment in the section below. Or shoot me a tweet @jontobs, and I’ll make sure to get back to you.

Dec
16
2016
--

MongoDB PIT Backups In Depth

MongoDB PIT Backups

In this blog is an in-depth discussion of MongoDB PIT backups.

Note: INTIMIDATION FREE ZONE!! This post is meant to give the reader most of the knowledge that is needed to understand the material. This includes basic MongoDB knowledge and other core concepts. I have tried to include links for further research where I can. Please use them where you need background and ask questions. We’re here to help!

Intro

In this two-part series, we’re going to fill in some of the gaps you might have to help you get awesome Point-in-Time (PIT) backups in MongoDB. These procedures will work for both MongoDB and Percona Server for MongoDB. This is meant to be a prequel to David Murphy’s MongoDB PIT Backup blog post. In that blog, David shows you how to take and to restore a dump up until a problem operation happened. If you haven’t read that post yet, hold off until you read this one. This foundation will help you better understand the how and why of the necessary steps. Let’s move onto some core concepts in MongoDB – but first, let me tell you what to expect in each part.

Blog 1 (this one): Core concepts – replica set backups, problem statement and solution

Blog 2: Getting Shardy – why backup consistency is tough and how to solve it

Core Concepts

Replica Set (process name: mongod) – MongoDB uses replica sets to distribute data for DR purposes. Replica sets are made up of primaries, secondaries and/or arbiters. These are much like master/slave pairs in MySQL, but with one big caveat. There’s an election protocol included that also handles failover! That’s right, high availability (HA) too! So, in general, there is a “rule of three” when thinking about the minimum number of servers to put in your replica sets. This is necessary to avoid a split-brain scenario.

Oplog (collection name: local.oplog.rs) The oplog is the log that records changes to the data on the MongoDB primary (secondaries also have an oplog). Much like MySQL, the non-primary members (secondaries) pull operations from the oplog and apply them to their own collections. Secondaries can pull from any member of the replica set that is ahead of them. The operations in the oplog are idempotent, meaning they always result in the same change to the database no matter how many times they’re performed.

Sharding (process name: mongos) – MongoDB also has built in horizontal scalability. This is implemented in a “shared nothing” architecture. A sharded cluster is made up of several replica sets. Each replica set contains a unique range of data. The data is distributed amongst replica sets based on a sharding key. There is also a sharding router (mongos) that runs as a routing umbrella over the cluster. In a sharded setup the application solely interfaces with the sharding router (never the replica sets themselves). This is the main function for scaling reads or writes in MongoDB. Scaling both takes very thoughtful design, but may not be possible.

Mongodump (utility name: mongodump) – MongoDB has built in database dump utility that can interface with mongod or mongos. Mongodump can also use the oplog of the server that it is run on to create a consistent point in time backup by using a “roll forward” strategy.

Mongorestore (utility name: mongorestore) – MongoDB has a built in database restore utility. Mongorestore is a rather simple utility that will replay binary dumps created by mongodump. When used with –oplogReplay when restoring a dump made with mongodump’s –oplog switch, it can make for a very functional backup facility.

Tip: make sure that user permissions are properly defined when using –oplogReplay – besides restore, anyAction and anyResource need to be granted.

OK, So What?

We’re going to first need to understand how backups work in a simple environment (a single replica set). Things are going to get much more interesting when we look at sharded clusters in the next post.

Backing Up

In a single replica set, things are pretty simple. There is no sharding router to deal with. You can get an entire data set by interacting with one server. The only problem that you need to deal with is the changes that are being made to the database while your mongodump is in process. If the concept of missed operations is a little fuzzy to you, just consider this simple use case:

We’re going to run a mongodump, and we have a collection with four documents:MongoDB PIT Backups

We start mongodump on this collection. We’re also running our application at the same time, because we can’t take down production. Mongodump scans from first to last in the collection (like a range scan based on ID). In this case mongodump has backed up of all documents from id:1 through id:4MongoDB PIT Backups

At this same moment in time, our application inserts id:3 into the collection.MongoDB PIT Backups

Is the document with id:3 going to be included in the mongodump? The answer is: most likely not. The problem is that you would expect it to be in the completed dump. However, if you need to restore this backup, you’re going to lose id:3. Now, this is perfectly normal in Disaster Recovery scenarios. Knowing that this is happening is the key part. Your backups will have the consistency of swiss cheese if you don’t have a way to track changes being made while the backup is running. Unknown data loss is one of the worst situations one can be in. What we need is a process to capture changes while the backup is running.

Here’s where the inclusion of the –oplog flag is very important. The –oplog flag will allow mongodump to capture changes that are being made to the database while the backup is running. Since the oplog is idempotent, there is chance that we’ll change the data during a restore. This gives the mongodump a consistent snapshot of when the dump completes, like most “clone” type operations.

Restoring

When running mongorestore, you can use the –oplogReplay option. Using oplog recovers to the point in time when the dump completed. Back to the use case, we may not capture id:3 on the first pass in this case, but as long as we’ve captured the oplog up until the backup completes, we’ll have id:3 available. When replaying the oplog during mongorestore, we will basically re-run the insert operation, completing the dataset. The oplog BSON timestamps all entries, so we know for sure until what point in time we’ve captured.

TIP: If you need to convert the timestamp to something human-readable, here’s something helpful

The Wrap Up

Now we have a firm understanding of the problem. Once we understand the problem, we can easily design a solution to ensure our backups have the integrity that our environment demands. In the next post, we’re going to step up the complexity by examining backups in conjunction with the most complex feature MongoDB has: sharding. Until then, post your feedback and questions in the comments section below.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com