Jul 05, 2018

Configuring PMM Monitoring for MongoDB Cluster

In this blog, we will see how to configure Percona Monitoring and Management (PMM) monitoring for a MongoDB cluster. It's as simple as adding replica set or standalone instances to PMM monitoring.

For this example, I have used docker to create the PMM Server and MongoDB sharded cluster containers. If you want the steps I used to create the MongoDB® cluster environment using docker, I have shared them at the end of this blog. You can refer to them if you would like to create the same setup.

Configuring PMM Clients

For PMM installation and pmm-client setup, you can check these links. The following are the members of the MongoDB cluster:

mongos:
 mongos1  
config db: (replSet name - mongors1conf)  
 mongocfg1  
 mongocfg2  
 mongocfg3
Shard1: (replSet name - mongors1)  
 mongors1n1  
 mongors1n2
 mongors1n3
Shard2: (replSet name - mongors2)
 mongors2n1
 mongors2n2  
 mongors2n3

In this setup, I installed the pmm-client on the mongos1 server. Then I added an agent to monitor MongoDB metrics with the cluster option, as shown in the next code block. I named the cluster "mongoClusterPMM". (Note: you have to be the root user or have sudo access to execute the pmm-admin command):

root@90262f1360a0:/# pmm-admin add  mongodb --cluster mongoClusterPMM
[linux:metrics]   OK, now monitoring this system.
[mongodb:metrics] OK, now monitoring MongoDB metrics using URI localhost:27017
[mongodb:queries] OK, now monitoring MongoDB queries using URI localhost:27017
[mongodb:queries] It is required for correct operation that profiling of monitored MongoDB databases be enabled.
[mongodb:queries] Note that profiling is not enabled by default because it may reduce the performance of your MongoDB server.
[mongodb:queries] For more information read PMM documentation (https://www.percona.com/doc/percona-monitoring-and-management/conf-mongodb.html).
root@90262f1360a0:/# pmm-admin list
pmm-admin 1.11.0
PMM Server      | 172.17.0.2 (password-protected)
Client Name     | 90262f1360a0
Client Address  | 172.17.0.4 
Service Manager | unix-systemv
---------------- ------------- ----------- -------- ---------------- ------------------------
SERVICE TYPE     NAME          LOCAL PORT  RUNNING  DATA SOURCE      OPTIONS                 
---------------- ------------- ----------- -------- ---------------- ------------------------
mongodb:queries  90262f1360a0  -           YES      localhost:27017  query_examples=true     
linux:metrics    90262f1360a0  42000       YES      -                                        
mongodb:metrics  90262f1360a0  42003       YES      localhost:27017  cluster=mongoClusterPMM 
root@90262f1360a0:/#

As you can see, I used the pmm-admin add mongodb [options] command, which enables monitoring for the system, MongoDB metrics and queries. You need to enable the profiler to monitor MongoDB queries. Use the next command to enable it at the database level:

use db_name
db.setProfilingLevel(1)
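
To confirm that profiling is active, you can check the current profiling status from the same database (a quick sanity check, not part of the original steps):

use db_name
db.getProfilingStatus()
// returns something like { "was" : 1, "slowms" : 100 }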

Check this blog to know more about QAN setup and details. If you want only MongoDB metrics to be monitored, rather than queries, then you can use the command pmm-admin add mongodb:metrics [options]. After this, go to the PMM home page (in my case localhost:8080) in your browser and select MongoDB Cluster Summary from the drop-down list under the MongoDB option. The screenshot below shows the MongoDB cluster "mongoClusterPMM" statistics collected by the agent that we added on the mongos1 server.

Percona Monitoring and Management for a MongoDB cluster

Did we miss something here? And do you see any metrics in the dashboard above except “Balancer Enabled” and “Chunks Balanced”?

No. This is because PMM doesn't have enough data to show in the dashboard yet. The shards have not been added to the cluster, and as you can see it displays 0 under Shards. Let's add the two shard replica sets, mongors1 and mongors2, through the mongos1 instance, and enable sharding on the database to complete the cluster setup, as follows:

mongos> sh.addShard("mongors1/mongors1n1:27017,mongors1n2:27017,mongors1n3:27017")
{ "shardAdded" : "mongors1", "ok" : 1 }
mongos> sh.addShard("mongors2/mongors2n1:27017,mongors2n2:27017,mongors2n3:27017")
{ "shardAdded" : "mongors2", "ok" : 1 }

Now I'll add a collection, some data and a shard key, and enable sharding so that we can see some statistics in the dashboard:

use vinodh
db.setProfilingLevel(1)
db.testColl.insertMany([{id1:1,name:"test insert"},{id1:2,name:"test insert"},{id1:3,name:"insert"},{id1:4,name:"insert"}])
db.testColl.ensureIndex({id1:1})
sh.enableSharding("vinodh")
sh.shardCollection("vinodh.testColl", {id1:1})
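
As a quick sanity check (not shown in the original post), you can confirm the collection is sharded and see how its data is distributed before looking at the dashboards:

sh.status()
db.testColl.getShardDistribution()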

At last! Now you can see statistics in the graph for the MongoDB cluster:

PMM Statistics for a MongoDB Cluster 

Full Cluster Monitoring

We are not done yet. We have just added an agent to monitor the mongos1 instance, so only the cluster-related statistics and QAN data are collected through this node. To monitor all nodes in the MongoDB cluster, we need to configure all of them in PMM under the "mongoClusterPMM" cluster. This tells PMM that the configured nodes are part of the same cluster. It also lets us monitor the replica set-related metrics for the config server and shard members. Let's add the monitoring agents on the mongos1 server to monitor all MongoDB instances remotely. We'll use these commands:

pmm-admin add mongodb:metrics --uri "mongodb://mongocfg1:27017/admin?replicaSet=mongors1conf" mongocfg1replSet --cluster  mongoClusterPMM
pmm-admin add mongodb:metrics --uri "mongodb://mongocfg2:27017/admin?replicaSet=mongors1conf" mongocfg2replSet --cluster  mongoClusterPMM
pmm-admin add mongodb:metrics --uri "mongodb://mongocfg3:27017/admin?replicaSet=mongors1conf" mongocfg3replSet --cluster  mongoClusterPMM
pmm-admin add mongodb:metrics --uri "mongodb://mongors1n1:27017/admin?replicaSet=mongors1" mongors1replSetn1 --cluster  mongoClusterPMM
pmm-admin add mongodb:metrics --uri "mongodb://mongors1n2:27017/admin?replicaSet=mongors1" mongors1replSetn2 --cluster  mongoClusterPMM
pmm-admin add mongodb:metrics --uri "mongodb://mongors1n3:27017/admin?replicaSet=mongors1" mongors1replSetn3 --cluster  mongoClusterPMM
pmm-admin add mongodb:metrics --uri "mongodb://mongors2n1:27017/admin?replicaSet=mongors2" mongors2replSetn1 --cluster  mongoClusterPMM
pmm-admin add mongodb:metrics --uri "mongodb://mongors2n2:27017/admin?replicaSet=mongors2" mongors2replSetn2 --cluster  mongoClusterPMM
pmm-admin add mongodb:metrics --uri "mongodb://mongors2n3:27017/admin?replicaSet=mongors2" mongors2replSetn3 --cluster  mongoClusterPMM

Once you have added them, you can check the agents' (mongodb-exporter) status as follows:

root@90262f1360a0:/# pmm-admin list
pmm-admin 1.11.0
PMM Server      | 172.17.0.2 (password-protected)
Client Name     | 90262f1360a0
Client Address  | 172.17.0.4 
Service Manager | unix-systemv
---------------- ------------------ ----------- -------- ----------------------- ------------------------
SERVICE TYPE     NAME               LOCAL PORT  RUNNING  DATA SOURCE             OPTIONS                 
---------------- ------------------ ----------- -------- ----------------------- ------------------------
mongodb:queries  90262f1360a0       -           YES      localhost:27017         query_examples=true     
linux:metrics    90262f1360a0       42000       YES      -                                               
mongodb:metrics  90262f1360a0       42003       YES      localhost:27017         cluster=mongoClusterPMM 
mongodb:metrics  mongocfg1replSet   42004       YES      mongocfg1:27017/admin   cluster=mongoClusterPMM 
mongodb:metrics  mongocfg2replSet   42005       YES      mongocfg2:27017/admin   cluster=mongoClusterPMM 
mongodb:metrics  mongocfg3replSet   42006       YES      mongocfg3:27017/admin   cluster=mongoClusterPMM 
mongodb:metrics  mongors1replSetn1  42007       YES      mongors1n1:27017/admin  cluster=mongoClusterPMM 
mongodb:metrics  mongors1replSetn3  42008       YES      mongors1n3:27017/admin  cluster=mongoClusterPMM 
mongodb:metrics  mongors2replSetn1  42009       YES      mongors2n1:27017/admin  cluster=mongoClusterPMM 
mongodb:metrics  mongors2replSetn2  42010       YES      mongors2n2:27017/admin  cluster=mongoClusterPMM 
mongodb:metrics  mongors2replSetn3  42011       YES      mongors2n3:27017/admin  cluster=mongoClusterPMM 
mongodb:metrics  mongors1replSetn2  42012       YES      mongors1n2:27017/admin  cluster=mongoClusterPMM

So now you can monitor every member of the MongoDB cluster, including their replica set and shard statistics. The next screenshot shows one of the members from a replica set under the MongoDB Replica Set dashboard. You can select it from the dashboard like this: Cluster: mongoClusterPMM → Replica Set: mongors1 → Instance: mongors1replSetn2:

 PMM MongoDB Replica Set Dashboard

MongoDB cluster docker setup

As I said at the beginning of this blog, I've provided the steps for the MongoDB cluster setup. Since this is for testing, I have used a very simple configuration to set up the cluster environment using docker-compose. Before creating the MongoDB cluster, create a docker network so that the cluster nodes and PMM can connect to each other, like this:

Vinodhs-MBP:docker-mcluster vinodhkrish$ docker network create mongo-cluster-nw
7e3203aed630fed9b5f5f2b30e346301a58a068dc5f5bc0bfe38e2eef1d48787

I used the following docker-compose.yaml file to create the docker environment:

version: '2'
services:
  mongors1n1:
    container_name: mongors1n1
    image: mongo:3.4
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27047:27017
    expose:
      - "27017"
    environment:
      TERM: xterm
  mongors1n2:
    container_name: mongors1n2
    image: mongo:3.4
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27048:27017
    expose:
      - "27017"
    environment:
      TERM: xterm
  mongors1n3:
    container_name: mongors1n3
    image: mongo:3.4
    command: mongod --shardsvr --replSet mongors1 --dbpath /data/db --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27049:27017
    expose:
      - "27017"
    environment:
      TERM: xterm
  mongors2n1:
    container_name: mongors2n1
    image: mongo:3.4
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27057:27017
    expose:
      - "27017"
    environment:
      TERM: xterm
  mongors2n2:
    container_name: mongors2n2
    image: mongo:3.4
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27058:27017
    expose:
      - "27017"
    environment:
      TERM: xterm
  mongors2n3:
    container_name: mongors2n3
    image: mongo:3.4
    command: mongod --shardsvr --replSet mongors2 --dbpath /data/db --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27059:27017
    expose:
      - "27017"
    environment:
      TERM: xterm
  mongocfg1:
    container_name: mongocfg1
    image: mongo:3.4
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27025:27017
    environment:
      TERM: xterm
    expose:
      - "27017"
  mongocfg2:
    container_name: mongocfg2
    image: mongo:3.4
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27024:27017
    environment:
      TERM: xterm
    expose:
      - "27017"
  mongocfg3:
    container_name: mongocfg3
    image: mongo:3.4
    command: mongod --configsvr --replSet mongors1conf --dbpath /data/db --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27023:27017
    environment:
      TERM: xterm
    expose:
      - "27017"
  mongos1:
    container_name: mongos1
    image: mongo:3.4
    depends_on:
      - mongocfg1
      - mongocfg2
    command: mongos --configdb mongors1conf/mongocfg1:27017,mongocfg2:27017,mongocfg3:27017 --port 27017
    networks:
      - mongo-cluster
    ports:
      - 27019:27017
    expose:
      - "27017"
networks:
  mongo-cluster:
    external:
      name: mongo-cluster-nw

Hint:
In the docker-compose file above, if you are using compose file format version >= 3.x, you can define the "networks" option in the YAML file like this:

networks:
  mongo-cluster:
    name: mongo-cluster-nw

Now start the containers and configure the cluster quickly as follows:

Vinodhs-MBP:docker-mcluster vinodhkrish$ docker-compose up -d
Creating mongocfg1 ... 
Creating mongos1 ... 
Creating mongors2n3 ... 
Creating mongocfg2 ... 
Creating mongors1n1 ... 
Creating mongocfg3 ... 
Creating mongors2n2 ... 
Creating mongors1n3 ... 
Creating mongors2n1 ... 
Creating mongors1n2 ...

ConfigDB replica set setup

Vinodhs-MBP:docker-mcluster vinodhkrish$ docker exec -it mongocfg1 mongo --quiet
> rs.initiate({  _id: "mongors1conf",configsvr: true,
...   members: [{ _id:0, host: "mongocfg1" },{_id:1, host: "mongocfg2" },{ _id:2, host: "mongocfg3"}]
...   })
{ "ok" : 1 }

Shard1 replica set setup

Vinodhs-MBP:docker-mcluster vinodhkrish$ docker exec -it mongors1n1 mongo --quiet
> rs.initiate( { _id: "mongors1", members:[ {_id: 0, host: "mongors1n1"}, {_id:1, host: "mongors1n2"}, {_id:2, host: "mongors1n3"}] })
{ "ok" : 1 }

Shard2 replica set setup:

Vinodhs-MBP:docker-mcluster vinodhkrish$ docker exec -it mongors2n1 mongo --quiet
> rs.initiate({ _id: "mongors2",  members: [ { _id: 0, host: "mongors2n1"}, { _id:1, host: "mongors2n2"}, { _id:2, host: "mongors2n3"}] })
{ "ok" : 1 }

I hope that this blog helps you to set up a MongoDB cluster and also to configure PMM to monitor it! Have a great day!


Jun 13, 2018

Zone Based Sharding in MongoDB

MongoDB shard zones

In this blog post, we will discuss how to use zone-based sharding to deploy a sharded MongoDB cluster in a customized manner, so that queries and data are redirected per geographical grouping. This feature of MongoDB is part of its Data Center Awareness, which allows queries to be routed to particular MongoDB deployments based on the physical locations or configurations of the mongod instances.

Before moving on, let's have an overview of this feature. You might already have some questions about zone-based sharding. Was it recently introduced? If zone-based sharding is something we should use, then what about tag-aware sharding?

MongoDB has supported tag-aware sharding since its early versions. This means tagging a range of shard key values, associating that range with a shard, and redirecting operations to that specific tagged shard. Since version 3.4, tag-aware sharding is referred to as ZONES. So the only change is the name, and this is the reason the sh.addShardTag(shard, tag) method is still being used.
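
For reference, MongoDB 3.4 also introduced zone-named helpers that are equivalent to the tag methods used later in this post. A quick sketch (the shard, namespace and zone names mirror the example that follows; this is not part of the original walkthrough):

sh.addShardToZone("shard0000", "MATHS")      // equivalent to sh.addShardTag()
sh.updateZoneKeyRange(                       // equivalent to sh.addTagRange()
	"school.students",
	{ "subject" : "maths", "sId" : MinKey },
	{ "subject" : "maths", "sId" : MaxKey },
	"MATHS"
)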

How it works

  1. With the help of a shard key, MongoDB allows you to create zones of sharded data – also known as shard zones.
  2. Each zone can be associated with one or more shards.
  3. Similarly, a shard can associate with any number of non-conflicting zones.
  4. MongoDB migrates chunks to the zone range in the selected shards.
  5. MongoDB routes reads and writes for a particular zone range to the particular shards where it resides.

Useful for what kind of deployments/applications?

  1. In cases where data needs to be routed to a particular shard due to some hardware configuration restrictions.
  2. Zones can be useful if there is the need to isolate specific data to a particular shard. For example, in the case of GDPR compliance that requires businesses to protect data and privacy for an individual within the EU.
  3. If an application is being used geographically and you want a query to route to the nearest shards for both reads and writes.

Let’s consider a Scenario

Consider the scenario of a school where some students are experts in Biology, but most students are experts in Maths. So we have more data for Maths students compared to Biology students. In this example, the deployment requires that Maths students' data is routed to the shard with the better configuration for handling a large amount of data, and both reads and writes for it will be served by that specific shard. All the Biology students' data will be served by another shard. To implement this, we will add tags to deploy the zones to the shards.

For this scenario we have an environment with:

DB: “school”

Collection: “students”

Fields: "sId", "subject", "marks", and so on.

Indexed Fields: “subject” and “sId”

We enable sharding:

sh.enableSharding("school")

And create a shard key on "subject" and "sId":

sh.shardCollection("school.students", {subject: 1, sId: 1});

We have two shards in our test environment:

shards:

{  "_id" : "shard0000",  "host" : "127.0.0.1:27001",  "state" : 1 }
{  "_id" : "shard0001",  "host" : "127.0.0.1:27002",  "state" : 1 }

Zone Deployment

1) Disable balancer

To prevent migration of the chunks across the cluster, disable the balancer for the “students” collection:

mongos> sh.disableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Before proceeding further, make sure the balancer is not running. This is not a mandatory step, but it is always good practice to make sure no chunk migration takes place while configuring zones:

mongos> sh.isBalancerRunning()
false

2) Add shard to the zone

A zone can be associated with a particular shard in the form of a tag, using sh.addShardTag(), so a tag will be added to each shard. Here we are considering two zones, so the tags "MATHS" and "BIOLOGY" need to be added.

mongos> sh.addShardTag( "shard0000" , "MATHS");
{ "ok" : 1 }
mongos> sh.addShardTag( "shard0001" , "BIOLOGY");
{ "ok" : 1 }

We can see that the zones are assigned, in the form of tags, against each shard as required:

mongos> sh.status()
 shards:
        {  "_id" : "shard0000",  "host" : "127.0.0.1:27001",  "state" : 1,  "tags" : [ "MATHS" ] }
        {  "_id" : "shard0001",  "host" : "127.0.0.1:27002",  "state" : 1,  "tags" : [ "BIOLOGY" ] }

3) Define ranges for each zone

Each zone covers one or more ranges of shard key values. Note: each range a zone covers is always inclusive of its lower boundary and exclusive of its upper boundary.

mongos> sh.addTagRange(
	"school.students",
	{ "subject" : "maths", "sId" : MinKey},
	{ "subject" : "maths", "sId" : MaxKey},
	"MATHS"
)
{ "ok" : 1 }
mongos> sh.addTagRange(
	"school.students",
	{ "subject" : "biology", "sId" : MinKey},
	{ "subject" : "biology", "sId" : MaxKey},
"BIOLOGY"
)
{ "ok" : 1 }

4) Enable balancer

Now enable the balancer, so the chunks will migrate across the shards as required and all read and write queries will be routed to the appropriate shards.

mongos> sh.enableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
mongos> sh.isBalancerRunning()
true

Let’s check how documents get routed as per the tags:

We have inserted six documents: four documents with "subject": "maths" and two documents with "subject": "biology".
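
The insert statements themselves are not shown in the original post; a minimal sketch that would produce these counts (the student IDs and marks are made up) looks like this:

mongos> db.students.insertMany([
	{ sId: 1, subject: "maths",   marks: 91 },
	{ sId: 2, subject: "maths",   marks: 85 },
	{ sId: 3, subject: "maths",   marks: 78 },
	{ sId: 4, subject: "maths",   marks: 88 },
	{ sId: 5, subject: "biology", marks: 72 },
	{ sId: 6, subject: "biology", marks: 80 }
])

Verifying the counts per subject: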

mongos> db.students.find({"subject":"maths"}).count()
4
mongos> db.students.find({"subject":"biology"}).count()
2

Checking the shard distribution for the students collection:

mongos> db.students.getShardDistribution()
Shard shard0000 at 127.0.0.1:27003
data : 236B docs : 4 chunks : 4
estimated data per chunk : 59B
estimated docs per chunk : 1
Shard shard0001 at 127.0.0.1:27004
data : 122B docs : 2 chunks : 1
estimated data per chunk : 122B
estimated docs per chunk : 2

So in this test case, all the queries for the students collection have been routed as per the tags used, with four documents inserted into shard0000 and two documents inserted into shard0001.

Any queries related to MATHS will route to shard0000, and queries related to BIOLOGY will route to shard0001. Hence the load is distributed as per the configuration of each shard, keeping database performance optimized.

Zone-based sharding is a great feature provided by MongoDB. With the help of zones, data can be isolated to specific shards. And if we have any kind of hardware or configuration restrictions on the shards, it is a possible solution for routing operations accordingly.


Nov 14, 2017

Common MongoDB Topologies

MongoDB Topologies

In this blog post, we’ll look at some common MongoDB topologies used in database deployments.

The question of the best architecture for MongoDB will arise in conversations between developers and architects. In this blog, we wanted to go over the main sharded and unsharded designs, with their pros and cons.

We will first look at "Replica Sets." Replica sets are the most basic form of high availability (HA) in MongoDB, and the building blocks for sharding. From there, we will cover sharding approaches and whether you need to go that route.

Replica Set

Replica Set

From the MongoDB manual:

A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

Short of sharding, this is the ideal way to run MongoDB. Things like high availability, failover and recovery become automated, with no action typically needed. If you expect large growth or more than 200GB of data, you should consider using this plus sharding to reduce your mean time to recovery on a restore from backup.

Pros:

  • Elections happen automatically and go unnoticed by applications set up with retry logic
  • Rebuilding a new node, or adding an additional read-only node, is as easy as “rs.add(‘hostname’)”
  • Can skip building indexes to improve write speed
  • Can have members
    • hidden in another geographic location
    • with delayed replication
    • as analytics nodes via tagging

Cons:

  • Depending on the size of the oplog used, you can use 10-100+% more space to hold the change data for replication
  • You must scale up, not out, meaning more expensive hardware
  • Recovery using a sharded approach is faster than having it all on a single node (parallelism)

Flat Mongos (not load balanced)

Flat Mongos

This is one of MongoDB's more commonly suggested deployment designs. To understand why, we should talk about the driver and the fact that it supports a CSV list of mongos hosts for fail-over.

You can’t distribute writes in a single replica set. Instead, they all need to go to the primary node. You can distribute reads to the secondaries using Read Preferences. The driver keeps track of what is a primary and what is a secondary and routes queries appropriately.

Conceptually, the driver should have connections bucketed into the mongos they go to. This allowed the 3.0+ driver to be semi-stateless and ask any connection to a specific mongos to perform a getMore against that mongos. In theory, this allows slightly more concurrency. Realistically, you only use one mongos at a time, since this is only a fail-over system.
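
For illustration (the host names here are hypothetical), that CSV list of mongos hosts is simply part of the connection string the driver is given:

mongodb://mongos1.example.com:27017,mongos2.example.com:27017,mongos3.example.com:27017/mydb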

Pros:

  • Mongos is on its own gear, so it will not run the application out of memory
  • If Mongos doesn’t respond, the driver “fails-over” to the next in the list
  • Can be put closer to the database or application depending on your network and sorting needs

Cons:

  • You can't use the mongos list evenly, so it is only good for fail-over (not evenness) in most drivers. Please check your specific driver's documentation for support, and test thoroughly.

Load Balanced (preferred if possible)

Load Balanced

According to the Mongo docs:

You may also deploy a group of mongos instances and use a proxy/load balancer between the application and the mongos. In these deployments, you must configure the load balancer for client affinity so that every connection from a single client reaches the same mongos.

This is the model used by platforms such as ObjectRocket. In this pattern, you move mongos nodes to their own tier but then put them behind a load balancer. In this design, you can even out the use of mongos by using a least-connection system. The challenge, however, is that newer drivers have issues with getMores. By this we mean the getMore selects a new random connection, and the load balancer can't be sure which mongos should get it. Thus it has a one in N (number of mongos) chance of selecting the right one, or it gets a "Cursor Not Found" error.

Pros:

  • Ability to have an even use of mongos
  • Mongos are separated from each other and the applications to prevent memory and CPU contention
  • You can easily remove or add mongos to help scale the layer without code changes
  • High availability at every level (multiple mongos, multiple configs, ReplSet for high availability and even multiple applications for app failures)

Cons:

  • If batching is used, you can get "Cursor Not Found" errors due to the wrong mongos receiving the getMore and bulk connector connections, unless the load balancer is switched to an IP-pinning algorithm (which loses evenness)

App-Centric Mongos

Appcentric Mongos

By and large, this is one of the most typical deployment designs for MongoDB sharding. In it, we have each application host talking to a mongos on the local network interface. This ensures there is very little latency to the application from the mongos.

Additionally, this means if a mongos fails, at most its own host is affected instead of the wider range of all application hosts.

Pros:

  • Local mongos on the loopback interface mean low to no latency
  • Limited scope of outage if this mongos fails
  • Can be geographically farther from the data storage in cases where you have a DR site

Cons:

  • Mongos is a memory hog; you could steal from your application memory to support running it here
    • Made worse with large batches, many connections, and sorting
  • Mongos is single-threaded and could become a bottleneck for your application
  • It is possible for a slow network to cause bad decision making, including duplicate databases on different shards. The functional result is data writing intermittently to two locations, and a DBA must remediate that at some point (think MMM VIP ping pong issues)
  • All sorting and limits are applied on the application host. In cases where the sort uses an index this is OK, but if not indexed, the entire result set must be held in memory by mongos and sorted, before the limited number of results is returned to the client. This is the typical cause of mongos OOM errors, due to the memory issues listed before.

Conclusion

The topologies above cover many of the deployment needs for MongoDB environments. Hope this helps, and please list any questions in the comments below.

Oct 17, 2017

Webinar Wednesday, October 18, 2017: How to Scale with MongoDB

Scale with MongoDB

Join Percona's Senior Technical Services Engineer Adamo Tonete as he presents How To Scale with MongoDB on Wednesday, October 18, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

In this webinar, we will talk about how to scale with MongoDB, up to thousands of writes and reads per second. What are the common issues when you scale with MongoDB? Is it better to shard or to add further secondaries?

We will walk through many common scaling situations, and through the steps needed to deploy a sharded cluster: from a single instance to a sharded environment. We will also talk about common mistakes/pitfalls a company can make when scaling its database – and how to avoid such situations.

Register for the webinar.

Adamo Tonete, Senior Technical Services Engineer

Adamo joined Percona in 2015, after working as a MongoDB/MySQL database administrator for three years. As the main database member of a startup, he was responsible for suggesting the best architecture and data flows for a worldwide company in a 24×7 environment. Before that, he worked as a Microsoft SQL Server DBA for a large e-commerce company, mainly on performance tuning and automation. Adamo has almost eight years of experience working as a DBA, and in the past three has moved to NoSQL technologies without giving up relational databases. He likes to play video games and study everything that is related to engines. Adamo lives with his wife in São Paulo, Brazil.

Oct 16, 2017

When Should I Enable MongoDB Sharding?

MongoDB Sharding

In this blog post, we will talk about MongoDB sharding and walk through the main reasons why you should start a cluster (independent of the approach you have chosen).

Note: I will cover this subject in my webinar How To Scale with MongoDB on Wednesday, October 18, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

Sharding is the most complex architecture you can deploy using MongoDB, and there are two main approaches as to when to shard or not. The first is to configure the cluster as soon as possible – when you predict high throughput and fast data growth.

The second says you should use a cluster as the best alternative when the application demands more resources than the replica set can offer (such as low memory, an overloaded disk or high processor load). This approach is more corrective than preventative, but we will discuss that in the future.

1) Disaster recovery plan

Disaster recovery (DR) is a very delicate topic: how long would you tolerate an outage? If necessary, how long would it take you to restore the entire database? Depending on the database size and on disk speed, a backup/restore process might take hours or even days!
There is no hard number in gigabytes to justify a cluster. But in general, you should consider one when the database is more than 200GB, as the backup and restore processes might take a while to finish.
Let's consider the case where we have a replica set with a 300GB database. The full restore process might last around four hours, whereas if the database has two shards, it will take about two hours – and depending on the number of shards, we can improve that time further. Simple math: if there are two shards, the restore process takes half the time compared to a single replica set.

2) Hardware limitations

Disk and memory are inexpensive nowadays. However, this is not true when companies need to scale out to high numbers (such as TB of RAM). Suppose your cloud provider can only offer you up to 5,000 IOPS in the disk subsystem, but the application needs more than that to work correctly. To work around this performance limitation, it is better to start a cluster and divide the writes among instances. That said, if there are two shards the application will have 10,000 IOPS available to use for writes and reads in the disk subsystem.

3) Storage engine limitations

There are a few storage engine limitations that can be a bottleneck in your use case. MMAPv1 has a lock per collection, while WiredTiger has tickets that limit the number of writes and reads happening concurrently. Although we can tweak the number of tickets available in WiredTiger, there is a virtual limit – which means that changing the available tickets might generate processor overload instead of increasing performance. If one of these situations becomes a bottleneck in your system, you should start a cluster. Once you shard the collection, you distribute the load/lock among the different instances.
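
As a hedged illustration (check the defaults for your MongoDB version before changing anything), the WiredTiger concurrency tickets mentioned above can be inspected and adjusted at runtime:

// Check current ticket usage (read/write "out", "available" and "totalTickets")
db.serverStatus().wiredTiger.concurrentTransactions

// Raise the number of concurrent write tickets (128 is the usual default)
db.adminCommand({ setParameter: 1, wiredTigerConcurrentWriteTransactions: 256 })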

4) Hot data vs. cold data

Several databases only work with a small percentage of the data being stored. This is called hot data or working set. Cold data or historical data is rarely read, and demands considerable system resources when it is. So why spend money on expensive machines that only store cold data or low-value data? With a cluster deployment we can choose where the cold data is stored, and use cheap devices and disks to do so. The same is true for hot data – we can use better machines to have better performance. This methodology also speeds up writes and reads on the hot data, as the indexes are smaller and add less overhead to the system.

5) Geo-distributed data

It doesn’t matter whether this need comes from application design or legal compliance. If the data must stay within continent or country borders, a cluster helps make that happen. It is possible to limit data localization so that it is stored solely in a specific “part of the world.” The number of shards and their geographic positions is not essential for the application, as it only views the database. This is commonly used in worldwide companies for better performance, or simply to comply with the local law.
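
A hedged sketch of how this is typically implemented with zone (tag-aware) sharding – the shard names, namespace and shard key below are made up for illustration, and in practice you would add one range per country code:

mongos> sh.addShardTag("shard-eu-01", "EU")
mongos> sh.addShardTag("shard-us-01", "US")
mongos> sh.addTagRange(
	"app.users",
	{ "country" : "DE", "userId" : MinKey },
	{ "country" : "DE", "userId" : MaxKey },
	"EU"
)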

6) Infrastructure limitations

Infrastructure and hardware limitations are very similar. When thinking about infrastructure, however, we focus on specific cases where the instances must be small. An example is running MongoDB on Mesos. Some providers only offer a few cores and a limited amount of RAM. Even if you are willing to pay more, it is not possible to purchase more than what they offer. A cluster provides the option to split a small amount of data among a lot of shards, reaching the same performance a big and expensive machine provides.

7) Failure isolation

Consider that a replica set or a single instance holds all the data. If for any reason this instance/replica set goes down, the whole application goes down. In a cluster, if we lose one of the five shards, 80% of the data is still available. Running a few shards helps to isolate failures. Obviously, running a bunch of instances makes the cluster prone to have a failed instance, but as each shard must have at least three instances the probability of the entire shard being down is minimal. For providers that offer different zones, it is good practice to have different members of the shard in different availability zones (or even different regions).

8) Speed up queries

Queries can take too long, depending on the number of reads they perform. In a clustered deployment, queries can run in parallel and speed up the query response time. If a query runs in ten seconds in a replica set, it is very likely that the same query will run in five to six seconds if the cluster has two shards, and so on.

I hope this helps with MongoDB sharding. Having a cluster solves several other problems as well, and we have listed only a few of them. Don’t miss our webinar regarding scaling out MongoDB next Wednesday, October 18, 2017!

Feb 03, 2017

Percona Live Featured Tutorial with Derek Downey, David Turner and René Cannaò — ProxySQL Tutorial

Percona Live Featured Tutorial

Welcome to another post in the series of Percona Live featured tutorial speakers blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Derek Downey (OSDB Practice Advocate, Pythian), David Turner (Storage SRE, Uber) and René Cannaò (MySQL SRE, Dropbox / ProxySQL). Their session is ProxySQL Tutorial. There is a stigma attached to database proxies when it comes to MySQL. This tutorial hopes to blow away that stigma by showing you what can be done with a proxy designed from the ground up to perform. I had a chance to speak with Derek, David and René and learn a bit more about ProxySQL:

Percona: How did you get into database technology? What do you love about it?

Percona Live Featured Tutorial
Derek Downey

Derek: I took a relational database course in college based on Oracle. Set theory and the relational model made a lot of sense to me. After a few years as a web developer at a small company, I transitioned to a hybrid SysAdmin/DBA role and got my first taste of the potential of “the cloud” (and some of the drawbacks).

I really came to understand that data is the lifeblood of any organization, and making sure it is always available through any disaster – from human error to hurricanes – is a unique and exciting challenge.

You should never notice the DBA if they’re doing their job right. There’s not much praise for a DBA on a job well done. But it’s a vital position to keep a company running. And that suits me just fine.

David: I started working for the Advanced Projects Group at the University of Missouri, now known as MOREnet. They were largely responsible for connecting all of the libraries and schools in the state to the Internet. I was initially helping them with their Internet presence as a webmaster. Later they needed help with their databases. I got very excited about working with Oracle at the time, and decided to join that team.

My relationship with MySQL started primarily because the cost of sharding Oracle was so high. Additionally, MySQL’s replication allowed us to use slaves. Oracle’s Dataguard/standby options wouldn’t allow reads from the slaves at that time. Lastly, MySQL was sort of “wild west” fun, since it lacked so many other features that Oracle had long ago. You had to get creative. It has been humbling to see how much innovation has come from the community and how far MySQL has come. And this is only the beginning!

Percona Live Featured Tutorial
René Cannaò

René: My career followed the classic path of a system administrator that ends up becoming a DBA. I used to work for a few companies as webmaster,  and finally as SysAdmin for a web hosting company. I always saw a similar pattern: “the bottleneck is in the database.” Nobody ever knew why the database was the bottleneck. I volunteered to improve the performance of this “unknown system.” Learning was a fun experience, and the result was extremely rewarding. I love understanding how databases operate and their internals. This is the only way to be able to get the maximum performance: “scientia potentia est”!

Percona: Your tutorial is called “ProxySQL Tutorial.” What exactly is ProxySQL, and what does it do?

Derek: I’ll leave it to René, the creator of ProxySQL, to give more detail on exactly what it is. But for a DBA trying to ensure their data is always available, it is a new and exciting tool in our toolbox.

René: ProxySQL is the MySQL data gateway. It’s the Stargate that can inspect, control, transform, manage, and route all traffic between clients and database servers. It builds reliable and fault-tolerant networks. It is a software bridge that empowers MySQL DBAs, built by DBAs for DBAs, allowing them to control all MySQL traffic where previously such traffic could not be controlled either on the client side (normally the developers’ realm) or server side (where there are not enough tools).

Percona Live Featured Tutorial
David Turner

David: Architecturally, ProxySQL is a separate process between the client and the database. Because traffic passes through it, ProxySQL can become many things (three of which got my attention). It can be a multiplexer, a filter, and a replicator.

Multiplexers reduce many signals down to a few. Web servers often open many static connections to MySQL. Since MySQL can only support a limited number of connections before performance suffers, ProxySQL’s ability to transparently manage tens of thousands of connections while only opening a few to the database is a great feature.

Administrators can update ProxySQL to filter and even rewrite queries based on patterns they decide on. As someone that has worked in operations and seen how long it can take to disable misbehaving applications, this is a very compelling feature. With ProxySQL in place, I can completely block a query from the database in no time.

ProxySQL’s replication or mirroring capability means that all of the queries sent to one database can now be sent to N databases. As someone that has to roll out new versions of MySQL, test index changes, and benchmark hardware this is also a compelling feature.

Percona: What are the advantages of using ProxySQL in a database environment?

René: ProxySQL is the bridge between the clients and the servers. It creates two layers, and controls all the communication between the two. Sitting in the middle, ProxySQL provides a lot of advantages normally impossible to achieve in a standard database environment, such as throttling or blocking queries, rewriting queries, implementing sharding, read/write splitting, caching, duplicating traffic, handling backend failures, failovers, integration with HA solutions, generating real-time statistics, etc. All this, without any application change, and completely transparent to the application.

Derek: For me, ProxySQL decouples the application from the data layer. This provides more control over the backend database environment to the DBA in regards to queries, maintenance and failovers, without impact to the application.

David: In addition to the roles noted above, ProxySQL can be part of a failover solution, routing queries to a new master when the current master fails. Other advantages are splitting queries over multiple databases to distribute the load, provide additional metrics, etc.

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Derek: This tutorial highlights what ProxySQL is trying to achieve, and discusses how to add ProxySQL to common architectures environments. Attendees will get hands-on experience with the technology, and learn how to install and configure ProxySQL to achieve query rewriting, seamless failover, and query mirroring.

David: Not only hands-on experience with ProxySQL, but an understanding of how much they can leverage with it. The more I use ProxySQL, the more advantages I see to it. For example, I did not realize that by clustering ProxySQL processes I can distribute the query matching and rewrites over many hosts, as well as use them as a caching layer.

René: ProxySQL is built upon very innovative technologies. Certain architectural concepts like hostgroups, chaining of query rules, granular routing and sharding, query retries and the very powerful Admin interface are concepts not always intuitive for DBAs with experience using other proxies. This tutorial helps understand these concepts, and attendees get hand-on experience in the configuration of ProxySQL in various scenarios.

Percona: What are you most looking forward to at Percona Live?

David: First, the people. Next, everything everyone is working on. We’re really lucky to work in such an innovative and collaborative industry. As databases evolve, we are on the ground floor of their evolution. What an exciting place to be.

Derek: I am mostly looking forward to reconnecting with my peers in the MySQL community. Both ones I’ve formerly worked with or previously met at Percona Live, as well as meeting new open source database professionals and hearing how they are providing solutions for their companies.

René: I am looking forward to attending sessions regarding new features in MySQL 8 and other new technologies. But moreover, I am excited to interact with MySQL users and get more input on how to improve ProxySQL so that it can become an indispensable tool in any MySQL environment.

Register for Percona Live Data Performance Conference 2017, and see Derek, David and René present their ProxySQL Tutorial. Use the code FeaturedTutorial and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Jan 30, 2017

MySQL Sharding Models for SaaS Applications

MySQL Sharding Models

In this blog post, I'll discuss MySQL sharding models, and how they apply to SaaS application environments.

MySQL is one of the most popular database technologies used to build many modern SaaS applications, ranging from simple productivity tools to business-critical applications for the financial and healthcare industries.

Pretty much any large scale SaaS application powered by MySQL uses sharding to scale. In this blog post, we will discuss sharding choices as they apply to these kinds of applications.

In MySQL, unlike in some more modern technologies such as MongoDB, there is no standard sharding implementation that the vast majority of applications use. In fact, if anything “no standard” is the standard. The common practice is to roll your own sharding framework, as famous MySQL deployments such as Facebook and Twitter have done. MySQL Cluster – the MySQL software that has built-in Automatic Sharding functionality – is rarely deployed (for a variety of reasons). MySQL Fabric, which has been the official sharding framework, has no traction either.

When sharding today, you have a choice of rolling your own system from scratch, using a comprehensive sharding platform such as Vitess, or using a proxy solution to assist you with sharding. For proxy solutions, MySQL Router is the official solution. But in reality, third-party solutions such as open source ProxySQL, commercial ScaleArc and semi-commercial (BSL) MariaDB MaxScale are widely used. Keep in mind, however, that traffic routing is only one of the problems that exist in large scale sharding implementations.

Beneath all these "front end" choices for sharding on the application database connection framework or database proxy, there are some lower level decisions that you've got to make. Namely, around how your data is going to be laid out and organized on the MySQL nodes.

When it comes to SaaS applications, at least one answer is simple. It typically makes sense to shard your data by “customer” or “organization” using some sort of mapping tables. In the vast majority of cases, single node (or replicated cluster) should be powerful enough to handle all the data and load coming from each customer.

What Should I Ask Myself Now?

The next set of questions you should ask yourself is about your SaaS application:

  • How much revenue per customer are you generating?
  • Do your customers (or regulations) require data segregation?
  • Are all the customers about the same, or are there outliers?
  • Are all your customers running the same database schema?

I address the answers in the sections below.

How Much Revenue?

How much revenue per customer you're generating is an important number. It defines how much infrastructure costs per customer you can afford. In the case of "freemium" models, with customers generating less than $1 a month on average, you might need to ensure low overhead per customer (even if you have to compromise on customer isolation).

Typically with low revenue customers, you have to co-locate the data inside the same MySQL instance (potentially even same tables). In the case of high revenue customers, isolation in separate MySQL instances (or even containers or virtualized OS instances) might be possible.

Data Segregation?

Isolation is another important area of consideration. Some enterprise customers might require that their data is physically separate from others. There could also be government regulations in play that require customer data to be stored in a specific physical location. If this is the case, you’re looking at completely dedicated customer environments. Or at the very least, separate database instances (which come with additional costs).

Customer Types?

Customer size and requirements are also important. A system designed to handle all customers of approximately the same scale (for example, personal accounting) is going to be different than if you are in the business of blog hosting. Some blogs might be 10,000 times more popular than the average.

Same Database Schema?

Finally, there is the big question of whether all your customers are running the same database schema and the same software version. If you want to support different software versions (if your customers require a negotiated maintenance window for software upgrades, for example) or different database schemas (if the schema is dependent on the custom functionality and modules customers might use, for example), keeping such customers in different MySQL schemas makes sense.

Sharding Models

This gets us to the following sharding isolation models, ranging from lowest to highest:

  • Customers Share Schemas. This is the best choice when you have very large numbers of low-revenue customers. In this case, you would map multiple customers to the same set of tables, and include something like a customer_id field in them to filter customer data. This approach minimizes customer overhead, but reduces customer isolation. It's harder to backup/restore data for individual customers, and it is easier to introduce coding mistakes that can access other customers' data. This method does not mean there is only one schema, but that there is a one-to-many relationship between schemas and customers. For example, you might have 100 schemas per MySQL instance, each handling 1,000 to 10,000 customers (depending on the application). Note that with a well-designed sharding implementation, you should be able to map customers individually to schemas. This allows you to have key customer data stored in dedicated schemas, or even on dedicated nodes.
  • Schema per Customer. This is probably the most common sharding approach in MySQL powered SaaS applications. Especially ones that have substantial revenue ($10+ per month / per customer). In this model, each customer’s data is stored in its own schema (database). This makes it very easy to backup/restore individual customers. It allows customers to have different schemas (i.e., add custom tables). It also allows them to run different versions of the application if desired. This approach allows the application server to use different MySQL users connecting on behalf of different customers, which adds an extra level of protection from accidental (or intentional) access of data that belongs to different customers. The schema per customer approach also makes it easier to move the shards around, and limits maintenance impact. The downside of this approach is higher overhead. It also results in a large number of tables per instance, and potentially larger numbers of files (which can be hard to manage).
  • Database Instance per Customer. You achieve even better isolation by having a MySQL instance per customer. This approach, however, increases overhead even further. The recent rise of light virtualization technologies and containers has reduced its usage.
  • OS Instance/Container per Customer. This approach allows you to improve isolation even further. It can be used for any customer, but can also be applied to selected customers in a model that uses the Schema per Customer model for the majority of them. A dedicated OS instance, with improved isolation and better performance SLAs, might be a feature of some premium customer tiers. This method not only allows better isolation, but it also lets you handle outliers better. You might choose to run a majority of your customers on the hardware (or cloud instance) that has the best price/performance numbers, and also place some of the larger customers on the highest performance nodes.
  • Environment per customer. Finally, if you take this all the way you can build completely separate environments for customers. This includes databases, application servers and other required components. This is especially useful if you need to deploy the application close to the customer – which includes the appliance model, or deployment in the customer’s data center or cloud provider. This also allows you to accommodate customers if their data must be stored in a specific location. This is often due to government regulations. It is worth noting that many SaaS applications, even if they do not quite have one environment per customer, have multiple independent environments. These are often hosted in different locations or availability zones. Such setups allow you to reduce the impact of large-scale failures to only a portion of your customers. This avoids overloading your customer service group and allowing the operational organization to focus on repairing smaller environments.

The farther you go down this route – from the shared schema to an environment per customer – the more important it is to have a high level of automation. With a shared schema, you can often get by with little automation (with some environments manually set up) and all the schemas pre-created. If customer sign-up requires setting up a dedicated database instance or a whole environment, a manual implementation doesn't scale. For this type of setup, you need state-of-the-art automation and orchestration.

Conclusion

I hope this helps you to understand your options for MySQL sharding models. Each of the different sharding models for SaaS applications powered by MySQL has benefits and drawbacks. As you can see, many of these approaches require you to work with a large number of tables in MySQL – this will be the topic of one of my next posts!

Jan 09, 2017

MongoDB PIT Backups: Part 2

MongoDB PIT Backups

This blog post is the second in a series covering MongoDB PIT backups. You can find the first part here.

Sharding Makes Everything Fun(ner)

The first blog post in this series looked at MongoDB backups in a simple single-replica set environment. In this post, we’ll look at the scale-out use case. When sharding, we have exactly the same problem as we do on a single replica set. However, now the problem is multiplied by the number of replica sets in the cluster. Additionally, we have a bonus problem: each replica set has unique data. That means to get a truly consistent snapshot of the cluster, we need to orchestrate our backups to capture a single consistent point in time. Just so we’re on the same page, that means that every replica set needs to stop their backups at, or near, the same time that the slowest replica set stops. Are you sufficiently confused now? Let me get to a basic concept that I forgot to cover in the first post, and then I’ll give you a simple description of the problem.

Are you Write Concerned?

So far, I've neglected to talk about the very important role of "write concern" when taking consistent backups. In MongoDB, the database is not durable by default. By "durable," I mean "on disk" when the database acknowledges receipt of an operation from your application. There are likely several reasons for this; the biggest one originally was probably throughput, given a lack of concurrency.

However, the side effect is possible data loss due to loss of operations applied only in memory. Changing the write concern to "journaled" (j: true) will change this behavior so that MongoDB journals changes before acknowledging them (you also need to be running with journaling enabled).

TIP: For true durability in a replica set, you should use a write concern of “majority” for operations and the writeConcernMajorityJournalDefault : true on all replica set members (new to v3.4). This has the added benefit of greatly decreasing the chance of rollback after an election.
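
A minimal sketch of what that looks like from the mongo shell (the collection and document are made up; the replica set setting is the v3.4+ option mentioned above):

// Acknowledge the write only after a majority of members have journaled it
db.orders.insert(
	{ orderId: 1001, total: 25.50 },
	{ writeConcern: { w: "majority", j: true } }
)

// Make "majority" imply journaled durability by default (MongoDB 3.4+)
cfg = rs.conf()
cfg.writeConcernMajorityJournalDefault = true
rs.reconfig(cfg)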

Wow, you’re inconsistent

At the risk of being repetitive, the crux of this issue is that we need to run a backup on every shard (replica set). This is necessary because every shard has a different piece of the data set. Each piece of that data set is necessary to get an entire picture of the data set for the cluster (and thus, your application). Since we’re using mongodump, we’ll only have a consistent snapshot at the point in time when the backup completes. This means we must end each shard’s backup at a consistent point in time. We cannot expect that the backup will complete in exactly the same amount of time on every shard, which is what we’ll need for a consistent point in time across the cluster. This means that Shard1 might have a backup that is consistent to 12:05 PM, and another shard that is consistent to 12:06 PM. In a high traffic environment (the kind that’s likely to need horizontal scale), this could mean thousands of lost documents. Here’s a diagram:

MongoDB PIT Backups

Here’s the math to illustrate the problem:

  • Shard1’s backup will contain 30,000 documents (100 docs/sec * 60 secs/min * 5 mins)
  • Shard2’s backup will contain 36,000 documents (100 docs/sec * 60 secs/min * 6 mins)

In this example, to get a consistent point in time you’d need to remove all insert, update and delete operations that happened on Shard 2 after the time that Shard 1’s backup completed (6,000 documents). This means examining the timestamp of every operation in the oplog and reversing its operation. That’s a very intensive process, and it would be unique for every mongodump that’s executed. Furthermore, it’s a pretty tricky thing to do. The repeatable and much more efficient method is to have backups that finish in a consistent state, ready to restore when needed.
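To give a feel for what that would involve, here is a hedged mongo shell sketch of inspecting a shard primary’s oplog for the operations you’d have to unwind; the cutoff timestamp is purely hypothetical and would correspond to the moment the fastest shard’s dump completed:

// Run against a shard primary; the oplog lives in the "local" database
var local = db.getSiblingDB("local")
var cutoff = Timestamp(1483963500, 0)   // hypothetical: the second Shard1's backup finished
local.oplog.rs.find({ ts: { $gt: cutoff } }).sort({ $natural: 1 })   // every op you'd have to reverse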

Luckily, Percona has you covered!

You’re getting more consistent

Having data is important, but knowing what data you have is even more important. Here’s how you can be sure you know what you have in your MongoDB backups:

David Murphy has released his MongoDB Consistent Backup Tool in the Percona Labs GitHub account, and has written a very informative blog post about it. My goal with these blog posts is to make it even easier to understand the problem and how to solve it. We’ve already had an exhaustive discussion of the problem at both small and large scales. How about the solution?

It’s actually pretty simple. The solution, at a basic level, is to use a simple algorithm to decide when a cluster-wide consistent point in time can be reached. In the MongoDB Consistent Backup tool, this is done by the backup host kicking off backups on a “known good member” of each shard (that’s a pretty cool feature by itself) and then tracking the progress of each dump. At the same time, the backup host starts a separate thread that tails the oplog on each “known good member” until the mongodump on the slowest shard completes. This gives us a very simple way of deciding when we can get a cluster-wide consistent snapshot: when the slowest member completes its piece of the workload. Here’s the same workload from Figure 4, but with the MongoDB Consistent Backup Tool methodology:

MongoDB PIT Backups

TIP: The amount of time that it takes to perform these backups is often decided by two factors:

  1. How evenly the data is distributed across the shards (how well balanced the cluster is)
  2. How much data each shard contains (whether or not the cluster is balanced).

The takeaway here is that you may need to shard so that each shard has a manageable volume of data. This allows you to hit your backup/restore windows more easily.
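If you want a rough idea of how evenly your data is spread, one hedged way (assuming you can connect through a mongos) is to count chunks per shard from the cluster metadata:

// Count how many chunks each shard currently owns
db.getSiblingDB("config").chunks.aggregate([
    { $group: { _id: "$shard", chunks: { $sum: 1 } } },
    { $sort: { chunks: -1 } }
])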

…The Proverbial “Monkey Wrench”

There’s always a “gotcha” just when you think you’ve got your mind around any difficult concept. Of course, this is no different.

There is one very critical concept in sharding that we didn’t cover: tracking what data lies on which shard. This is important for routing the workload to the right place and for balancing the data across the shards. In MongoDB, this is handled by the config servers. If you cannot reach (or recover) your config servers, your entire cluster is lost! For obvious reasons, you need to back them up as well. With the Percona Labs MongoDB Consistent Backup Tool, there are actually two modes used to back up config servers: v3.2 and greater, and legacy. The reason is that in v3.2, config servers went from mirrors to a standard replica set. In v3.2 mode, we just treat the config servers like another replica set: they have their own mongodump and oplog tail thread, and they get a backup that is consistent to the same point in time as all the other shards in the cluster. If you’re on a version of MongoDB prior to v3.2, and you’re interested in an explanation of legacy mode, please refer back to David’s blog post.

The Wrap Up

We’ve examined the problems with getting consistent backups in a running MongoDB environment in this and the previous blog post. Whether you have a single replica set or a sharded cluster, you should now have a much better understanding of what the problems are and how Percona has you covered. If you’re still confused, or you’d just like to ask some additional questions, drop a comment in the section below, or shoot me a tweet @jontobs and I’ll make sure to get back to you.

Jan
04
2017
--

MongoDB 3.4: Sharding Improvements

Sharding Improvements

In this blog post, we will discuss some of the Sharding improvements in the recent MongoDB 3.4 GA release.

Sharding?

Let’s go over what MongoDB Sharding “is” at a simplified, high level.

The concept of “sharding” exists to allow MongoDB to scale to very large data sets that may exceed the available resources of a single node or replica set. When a MongoDB collection is sharding-enabled, its data is broken into ranges called “chunks.” These are intended to be evenly distributed across many nodes or replica sets (called “shards”). MongoDB computes the range of a given chunk based on a mandatory document key called a “shard key.” The shard key is used in all read and write queries to route a database request to the right shard.
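As a quick illustration, enabling sharding on a collection from the mongo shell looks like this (the database, collection and shard key names are made up for the example):

sh.enableSharding("app")                          // mark the "app" database as sharded
sh.shardCollection("app.users", { userId: 1 })    // "userId" is the shard key
sh.status()                                       // summarizes shards, databases and chunk ranges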

The MongoDB ecosystem introduced additional architectural components so that this could happen:

  1. A shard. A single MongoDB node or replica set used for storing the cluster data. There are usually many shards in a cluster and more shards can be added/removed to scale.
  2. “mongos” router. A sharding-aware router for handling client traffic. There can be one or more mongos instances in a cluster.
  3. The “config servers”. Used for storing the cluster metadata. Config servers are essentially regular MongoDB servers dedicated to storing the cluster metadata within the “config” database. Client database traffic does not access these servers directly; only the mongos does.

Under sharding, all client database traffic is directed to one or more of the mongos router process(es), which use the cluster metadata, such as the chunk ranges, the members of the cluster, etc., to service requests while keeping the details of sharding transparent to the client driver. Without the cluster metadata, the routing process(es) do not know where the data is, making the config servers a critical component of sharding. Due to this, at least three config servers are required for full fault tolerance.
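If you’re curious what that metadata looks like, you can peek at it through any mongos; the collections below live in the standard “config” database:

var conf = db.getSiblingDB("config")
conf.shards.find()              // the shards registered in the cluster
conf.chunks.find().limit(5)     // a few chunk ranges and the shard that owns each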

Sharding: Chunk Balancer

To ensure chunks are always balanced among the cluster shards, a process named the “chunk balancer” (or simply the “balancer”) runs periodically, moving chunks from shard to shard so that data stays evenly distributed. When a chunk is migrated, the balancer doesn’t actually move any data itself; it merely coordinates the transfer between the source and destination shards and updates the cluster metadata once the chunk has moved.
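The balancer can be inspected and toggled from the mongo shell through any mongos; a minimal sketch:

sh.getBalancerState()     // is balancing enabled at all?
sh.isBalancerRunning()    // is a balancing round in progress right now?
sh.stopBalancer()         // commonly done before taking consistent backups
sh.startBalancer()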

Before MongoDB 3.4, the chunk balancer would run on whichever mongos process could acquire a cluster-wide balancer lock first. From my perspective this was a poor architectural decision for these reasons:

  1. Predictability. Due to the first-to-lock nature, the mongos process running the balancer is essentially chosen at random. This can complicate troubleshooting as you try to chase down which mongos process is the active balancer to see what it is doing, read its logs, etc. As a further example: in some deployments it is common for the mongos process to run locally on application servers, and in large organizations it is common for a DBA not to have access to application hosts – something I’ve run into many times myself.
  2. Efficiency. mongos was supposed to be a stateless router, not a critical administrative process! As all client traffic passes in-line through the mongos process, it is important for it to be as simple, reliable and efficient as possible.
  3. Reliability. In order to operate, the mongos process must read and write cluster metadata hosted on the config servers. As the mongos is almost always running on a physically separate host from the config servers, any disruption (network, hardware, etc.) between the balancer and the config server nodes will break balancing!

Luckily, MongoDB 3.4 has come to check this problem (and many others) off of my holiday wish list!

MongoDB 3.4: Chunk Balancer Moved to Config Servers

In MongoDB 3.4, the chunk balancer was moved to the Primary config server, which addresses each of my concerns about the chunk balancer:

  1. Predictability. The balancer is always running in a single, predictable place: the primary config server.
  2. Efficiency. Removing the balancer from “mongos” allows it to worry about routing only. Also, as config servers are generally dedicated nodes that are never directly hit by client database traffic, in my opinion this is a more efficient place for the balancer to run.
  3. Reliability. Perhaps the biggest win I see with this change is the balancer can no longer lose connectivity with the cluster metadata that is stored on separate hosts. The balancer now runs inside the same node as the metadata!
  4. Centralized. As a freebie, now all the background/administrative components of Sharding are in one place!

Note: although we expect the overhead of the balancer to be negligible, keep in mind that this change does add a small amount of work to the config server Primary node.

See more about this change here: https://docs.mongodb.com/manual/release-notes/3.4/#balancer-on-config-server-primary

MongoDB 3.4: Required Config Server Replica Set

In MongoDB releases before 3.2, the set of cluster config servers received updates using a mode called Sync Cluster Connection Config (SCCC) to ensure all nodes received the same change. This essentially meant that any update to cluster metadata was sent N times from the mongos to the config servers in a fan-out pattern. This is another legacy design choice that always confused me, considering MongoDB already has a facility for reliably replicating data: MongoDB replication. Plus, without transactions in MongoDB, there are some areas where SCCC can fail.

Luckily, MongoDB 3.2 introduced Replica-Set based config servers as an optional feature. This moved us away from the SCCC fan-out mode to traditional replication and write concerns for consistency. It brought many benefits: rebuilding a config server node became simpler, backups became more straightforward and flexible, and relying on replication for consistent metadata updates simplified the architecture.

MongoDB 3.4 requires Replica-Set based config servers and removes the SCCC mode entirely. This might require some changes for some deployments, but I think the benefits outweigh the cost. For more details on how to upgrade from SCCC to Replica-Set based config servers, see this article.
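For reference, initiating a Replica-Set based config server deployment from the mongo shell looks roughly like this; the host names are hypothetical, and each config mongod is assumed to have been started with --configsvr and --replSet configReplSet:

rs.initiate({
    _id: "configReplSet",
    configsvr: true,                                 // marks this replica set as a config server set
    members: [
        { _id: 0, host: "cfg1.example.net:27019" },
        { _id: 1, host: "cfg2.example.net:27019" },
        { _id: 2, host: "cfg3.example.net:27019" }
    ]
})

The mongos processes then point at the set with --configdb configReplSet/cfg1.example.net:27019,cfg2.example.net:27019,cfg3.example.net:27019.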

Note: the balancer in MongoDB 3.4 always runs on the config server that is the ‘PRIMARY’ of the replica set.

MongoDB 3.4: Parallel Balancing

As intended, MongoDB sharded clusters can get very big, with tens, hundreds or even thousands of shards. Historically, MongoDB’s balancer worked serially, meaning it could only coordinate one chunk-balancing round at any given time within the cluster. On very large clusters, this poses a huge throughput limitation on balancing: all chunk moves have to wait in a serial queue.

In MongoDB 3.4, the chunk balancer can now perform several chunk moves in parallel, provided they’re between distinct source and destination shards. Given shards A, B, C and D, this means that a migration from A -> B can happen at the same time as a migration from C -> D, as they have mutually exclusive source and destination shards. Of course, you need four or more shards to really see the benefit of this change.

On large clusters, this change could significantly increase network bandwidth usage, since several balancing operations can occur at once. Be sure to test your network capacity with this change.

See more about this change here: https://docs.mongodb.com/manual/release-notes/3.4/#faster-balancing

Conclusion

There were many other improvements to sharding and other areas in 3.4; these are just some of my personal highlights. We hope to cover more in the future.

For more information about what has changed in the new GA release, see: MongoDB 3.4 Release Notes. Also, please check out our beta release of Percona Server for MongoDB 3.4. This includes all the improvements in MongoDB 3.4 plus additional storage engines and features.


Sep
22
2016
--

Percona Live Europe featured talk with Anthony Yeh — Launching Vitess: How to run YouTube’s MySQL sharding engine

Percona Live Europe featured talk

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Anthony Yeh, Software Engineer, Google. His talk will be on Launching Vitess: How to run YouTube’s MySQL sharding engine. Vitess is YouTube’s solution for scaling MySQL horizontally through sharding, built as a general-purpose, open-source project. Now that Vitess 2.0 has reached general availability, they’re moving beyond “getting started” guides and working with users to develop and document best practices for launching Vitess in their own production environments.

I had a chance to speak with Anthony and learn a bit more about Vitess:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Anthony: Before joining YouTube as a software engineer, I worked on photonic integrated circuits as a graduate student researcher at U.C. Berkeley. So I guess you could say I took a rather circuitous path to the database field. My co-presenter Dan and I have that in common. If you see him at the conference, I recommend asking him about his story.

I don’t actually think of myself as being in database development though; that’s probably more Sugu‘s area. I treat Vitess as just another distributed system, and my job is to make it more automated, more reliable, and easier to administer. My favorite part of this job is when open-source contributors send us new features and plug-ins, and all I have to do is review them. Keep those pull requests coming!

Percona: Your talk is going to be on “Launching Vitess: How to run YouTube’s MySQL sharding engine.” How has Vitess moved from a YouTube fix to a viable enterprise data solution?

Anthony: I joined Vitess a little over two years ago, right when they decided to expand the team’s focus to include external usability as a key goal. The idea was to transform Vitess from a piece of YouTube infrastructure that happens to be open-source, into an open-source solution that YouTube happens to use.

At first, the biggest challenge was getting people to tell us what they needed to make Vitess work well in their environments. Attending Percona Live is a great way to keep a pulse on how the industry uses MySQL, and talk with exactly the people who can give us that feedback. Progress really picked up early this year when companies like Flipkart and Pixel Federation started not only trying out Vitess on their systems, but contributing back features, plug-ins, and connectors.

My half of the talk will summarize all the things we’ve learned from these early adopters about migrating to Vitess and running it in various environments. We also convinced one of our Site Reliability Engineers to give the second half of the talk, to share firsthand what it’s like to run Vitess in production.

Percona: What new features and fixes can people look forward to in the latest release?

Anthony: The biggest new feature in Vitess 2.0 is something that was codenamed “V3” (sorry about the naming confusion). In a nutshell, this completes the transition of all sharding logic from the app into Vitess: at first you had to give us a shard name, then you just had to tell us the sharding key value. Now you just send a regular query and we do the rest.

To make this possible, Vitess has to parse and analyze the query, for which it then builds a distributed execution plan. For queries served by a single shard, the plan collapses to a simple routing decision without extra processing. But for things like cross-shard joins, Vitess will generate new queries and combine results from multiple shards for you, in much the same way your app would otherwise do it.

Percona: Why is sharding beneficial to databases? Are there pros and cons to sharding?

Anthony: The main pro for sharding is horizontal scalability, the holy grail of distributed databases. It offers the promise of a magical knob that you simply turn up when you need more capacity. The biggest cons have usually been that it’s a lot of work to make your app handle sharding, and it multiplies the operational overhead as you add more and more database servers.

The goal of Vitess is to create a generalized solution to these problems, so we can all stop building one-off sharding layers within our apps, and replace a sea of management scripts with a holistic, self-healing distributed database.

Percona: Vitess is billed as being for web applications based in cloud and dedicated hardware infrastructures. Was it designed specifically for one or the other, and does it work better for certain environments?

Anthony: Vitess started out on dedicated YouTube hardware and later moved into Borg, which is Google’s internal precursor to Kubernetes. So we know from experience that it works in both types of environments. But like any distributed system, there are lots of benefits to running Vitess under some kind of cluster orchestration system. We provide sample configs to get you started on Kubernetes, but we would love to also have examples for other orchestration platforms like Mesos, Swarm, or Nomad, and we’d welcome contributions in this area.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Anthony: I hope to meet people who have ideas about how to make Vitess better, and I look forward to learning more about how others are solving similar problems.

You can read more about Anthony and Vitess on the Vitess blog.

Want to find out more about Anthony, Vitess, YouTube and sharding? Register for Percona Live Europe 2016, and come see his talk Launching Vitess: How to run YouTube’s MySQL sharding engine.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!
