Nov 29, 2018

MySQL High Availability: Stale Reads and How to Fix Them

Continuing our series of blog posts about MySQL High Availability, today we will talk about stale reads and how to overcome this issue.

The Problem

A stale read is a read operation that fetches an incorrect value from a source that has not synchronized an update operation to the value (source: Wiktionary).

A practical scenario is when your application INSERTs or UPDATEs data on your master/writer node and has to read it immediately afterwards. If this particular read is served from another server in the replication/cluster topology, the data is either not there yet (in the case of an INSERT) or still has the old value (in the case of an UPDATE).

If your application or part of your application is sensitive to stale reads, then this is something to consider when implementing HA/load balancing.

How NOT to fix stale reads

While working with customers, we have seen a few incorrect attempts to fix the issue:

SELECT SLEEP(X)

The most common incorrect approach that we see in Percona support is when customers add a sleep between the write and the read. This may work in some cases, but it’s not 100% reliable for all scenarios, and it can add latency when there is no need.

Consider an example where, by the time you query your slave, the data has already been applied, yet you have configured your transaction to start with SELECT SLEEP(1). In this case, you have just added 1000ms of latency for no reason.

Another example could be when the slave is lagging behind by more than whatever you configured as the parameter of the sleep command. In this case, you would have to create a loop that keeps retrying the sleep until the slave has received the data: potentially this could take several seconds.
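
As an illustration only, the anti-pattern typically looks something like this (the users table and the one-second sleep are hypothetical examples, not a recommendation):

-- On the master/writer:
INSERT INTO users (id, name) VALUES (1, 'alice');

-- On the slave/reader (do NOT do this):
SELECT SLEEP(1);                      -- hope one second is enough for replication to catch up
SELECT name FROM users WHERE id = 1;  -- may still return stale data, or no row at all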

Reference: SELECT SLEEP.

Semisync replication

By default, MySQL replication is asynchronous, and this is exactly what causes the stale read. However, MySQL distributes a plugin that can make replication semi-synchronous. We have seen customers enable it hoping the stale reads problem will go away. In fact, that is not the case. The semi-synchronous plugin only ensures that at least one slave has received the transaction (the IO thread has streamed the binlog event to the relay log), but the action of applying the event is still done asynchronously. In other words, stale reads are still a problem with semi-sync replication.

Reference: Semisync replication.

How to PROPERLY fix stale reads

There are several ways to fix/overcome this situation, and each one has its pros and cons:

1) MASTER_POS_WAIT

This consists of executing SHOW MASTER STATUS right after your write, getting the binlog file and position, connecting to a slave, and executing the MASTER_POS_WAIT function with that binlog file and position as parameters. The function blocks until the slave has applied that position. You can optionally pass a timeout so the function returns once the timeout is exceeded.
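
A minimal sketch of this flow, using the same hypothetical users table as above and made-up binlog coordinates:

-- On the master/writer, right after the write:
INSERT INTO users (id, name) VALUES (2, 'bob');
SHOW MASTER STATUS;   -- e.g. File: mysql-bin.000123, Position: 456789

-- On the slave/reader, before the read:
-- Blocks until the slave has applied that position; returns the number of events
-- waited for, -1 if the 2-second timeout is exceeded, or NULL if the SQL thread is not running.
SELECT MASTER_POS_WAIT('mysql-bin.000123', 456789, 2);
SELECT name FROM users WHERE id = 2;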

Pros:

  • Works on all MySQL versions
  • No prerequisites

Cons:

  • Requires an application code rewrite.
  • It’s a blocking operation, and can add significant latency to queries in cases where a slave/node is too far behind.

Reference: MASTER_POS_WAIT.

2) WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS

Requires GTID: this is similar to the previous approach, but in this case, we need to track the GTID set executed on the master (also available in SHOW MASTER STATUS), and then call the WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS function on the slave, passing that GTID set and, optionally, a timeout.
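
A minimal sketch, using a made-up GTID set and the same hypothetical users table:

-- On the master/writer, right after the write (GTID-based replication enabled):
SHOW MASTER STATUS;   -- note the Executed_Gtid_Set, e.g. '3E11FA47-71CA-11E1-9E33-C80AA9429562:1-77'

-- On the slave/reader, before the read:
-- Blocks until the slave's SQL thread has applied that GTID set, or times out after 2 seconds.
SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('3E11FA47-71CA-11E1-9E33-C80AA9429562:1-77', 2);
SELECT name FROM users WHERE id = 2;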

Pros:

  • No need to track the binlog file and position: a single GTID set identifies the write.

Cons:

  • Requires an application code rewrite.
  • It's a blocking operation, and can add significant latency to queries in cases where a slave/node is too far behind.
  • As it requires GTID, it only works on versions from 5.6 onwards.

Reference: WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS.

3) Querying slave_relay_log_info

This consists of enabling relay_log_info_repository=TABLE and sync_relay_log_info=1 on the slave, and using a similar approach to option 1. After the write, execute SHOW MASTER STATUS, connect to the slave, and query mysql.slave_relay_log_info, comparing the binlog file and position you got from SHOW MASTER STATUS with what the slave has already applied, to verify whether the slave has caught up to that position.
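
A minimal sketch of the check, again with made-up coordinates; the comparison itself has to live in your application code:

-- On the slave, one-time configuration (my.cnf):
--   relay_log_info_repository = TABLE
--   sync_relay_log_info       = 1

-- On the master/writer, right after the write:
SHOW MASTER STATUS;   -- e.g. File: mysql-bin.000123, Position: 456789

-- On the slave/reader: a non-blocking check of what the SQL thread has applied so far.
SELECT Master_log_name, Master_log_pos FROM mysql.slave_relay_log_info;
-- If this is at or past ('mysql-bin.000123', 456789), it is safe to read from this slave;
-- otherwise try another slave, or fall back to the master.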

Pros:

  • This is not a blocking operation.
  • In cases where the slave is missing the position you require, you can try to connect to another slave and repeat the process. There is even the option of falling back to the master if none of the slaves has the said position.

Cons:

  • Requires an application code rewrite.
  • In cases of checking multiple slaves, this can add significant latency.

Reference: slave_relay_log_info.

4) wsrep-sync-wait

Requires Galera/Percona XtraDB Cluster: this consists of setting a global/session variable to enforce consistency. It blocks the execution of subsequent queries until the node has applied all write-sets from its applier queue. It can be configured to apply to several types of statement, such as SELECT, INSERT, and so on.
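
A minimal per-session sketch; the value 1 enables the check for read statements, and the users table is again just a hypothetical example:

-- On the PXC/Galera node serving the read, for the current session only:
SET SESSION wsrep_sync_wait = 1;      -- enforce the causality check for read statements

-- This SELECT now waits until the node has applied all write-sets received
-- up to this point, then returns up-to-date data.
SELECT name FROM users WHERE id = 2;

-- Optionally disable the check again for latency-tolerant reads.
SET SESSION wsrep_sync_wait = 0;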

Pros:

  • Easy to implement. Built-in as a SESSION variable.

Cons:

  • Requires an application code rewrite if you want to implement the solution on a per-session basis.
  • It’s a blocking operation, and can add significant latency to queries if a slave/node is too far behind.

Reference: wsrep-sync-wait.

5) ProxySQL 2.0 GTID consistent reads

Requires MySQL 5.7 and GTID: MySQL 5.7 returns the GTID generated by a commit as part of the OK packet. With the help of binlog readers installed on the MySQL servers, ProxySQL can keep track of which GTIDs each slave has already applied. Combining this information with the GTID received in the OK packet at the moment of the write, ProxySQL decides whether to route a subsequent read to one of the slaves/read nodes or have the master/writer node serve the read.
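
The sketch below shows roughly how this might look in the ProxySQL admin interface. It assumes the gtid_port column in mysql_servers (pointing at the binlog reader running next to each MySQL server) and the gtid_from_hostgroup column in mysql_query_rules that the 2.0 feature introduces; hostnames, ports, and hostgroup numbers are made up, and the exact schema should be verified against the ProxySQL 2.0 documentation:

-- ProxySQL admin interface (assumed ProxySQL 2.0 schema):
-- hostgroup 0 = writer, hostgroup 1 = readers; 3306 = MySQL, 3307 = binlog reader (example ports)
INSERT INTO mysql_servers (hostgroup_id, hostname, port, gtid_port)
VALUES (0, '10.0.0.1', 3306, 3307),
       (1, '10.0.0.2', 3306, 3307);

-- Route SELECTs to the readers, but only to a reader that has already applied
-- the GTIDs of the writes this session sent to hostgroup 0.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, gtid_from_hostgroup, apply)
VALUES (1, 1, '^SELECT', 1, 0, 1);

LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;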

Pros:

  • Transparent to the application – no code changes are required.
  • Adds minimal latency.

Cons:

  • This is still a new feature of ProxySQL 2.0, which is not yet GA.

Reference: GTID consistent reads.

Conclusions

Undesirable issues can arise from adding HA and distributing the load across multiple servers. Stale reads can affect applications that are sensitive to them. We have outlined various approaches you can use to overcome them.


Photo by Tim Evans on Unsplash

Aug 22, 2018

Webinar Thurs 8/23: MySQL vs MongoDB – Choosing the Right Technology for Your Application

Please join Percona's CEO, Peter Zaitsev, as he presents MySQL vs MongoDB – Choosing the Right Technology for Your Application on Thursday, August 23, 2018, at 10:30 AM PDT (UTC-7) / 1:30 PM EDT (UTC-4).

Are you considering adopting the most popular open source relational database or the most popular open source NoSQL database? Which one is right for your particular application?

In this presentation, we will look into the advantages and disadvantages of both and examine the applications for which MySQL or MongoDB is the more appropriate choice.

Register Now

Aug 16, 2018

MongoDB: how to use the JSON Schema Validator

The flexibility of MongoDB as a schemaless database is one of its strengths. In early versions, it was left to application developers to ensure that any necessary data validation was implemented. With the introduction of the JSON Schema Validator, there are new techniques to enforce data integrity for MongoDB. In this article, we use examples to show you how to use the JSON Schema Validator to introduce validation checks at the database level, and consider the pros and cons of doing so.

Why validate?

MongoDB is a schemaless database. This means that we don't have to define a fixed schema for a collection. We just need to insert a JSON document into a collection and that's all. Documents in the same collection can have a completely different set of fields, and even the same field can have different types in different documents: the same field can be a string in some documents and a number in others.

The schemaless feature has given MongoDB great flexibility and the capability to adapt the database to the changing needs of applications. Let's say that this flexibility is one of the main reasons to use MongoDB. Relational databases are not so flexible: you always need to define a schema first. Then, adding new columns, creating new tables, or changing the existing architecture to respond to the needs of the application can be a very hard task.

The real world can often be messy and MongoDB can really help, but in most cases the real world requires some kind of backbone architecture too. In real applications built on MongoDB there is always some kind of "fixed schema" or "validation rules" for collections and documents. It's possible to have two documents in a collection that represent two completely different things.

Well, it’s technically possible, but it doesn’t make sense in most cases for the application. Most of the arguments for enforcing a schema on the data are well known: schemas maintain structure, giving a clear idea of what’s going into the database, reducing preventable bugs and allowing for cleaner code. Schemas are a form of self-documenting code, as they describe exactly what type of data something should be, and they let you know what checks will be performed. It’s good to be flexible, but behind the scenes we need some strong regulations.

So, what we need to do is find a balance between flexibility and schema validation. In real world applications, we need to define a sort of "backbone schema" for our data while retaining the flexibility to manage specific particularities. In the past, developers implemented schema validation in their applications, but starting from version 3.6, MongoDB supports the JSON Schema Validator. We can rely on it to define a fixed schema and validation rules directly in the database and free the application from that responsibility.

Let’s have a look at how it works.

JSON Schema Validator

In fact, a validation feature was already introduced in 3.2, but the new JSON Schema Validator introduced in the 3.6 release is by far the better and friendlier way to manage validation in MongoDB.

What we need to do is define the rules using the $jsonSchema operator in the db.createCollection command. The $jsonSchema operator requires a JSON document in which we specify all the rules to be applied to each inserted or updated document: for example, which fields are required, what type each field must be, what range of values is allowed, what pattern a specific field must match, and so on.

Let's have a look at the following example, where we create a people collection and define validation rules with the JSON Schema Validator.

db.createCollection( "people" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "name", "surname", "email" ],
      properties: {
         name: {
            bsonType: "string",
            description: "required and must be a string" },
         surname: {
            bsonType: "string",
            description: "required and must be a string" },
         email: {
            bsonType: "string",
            pattern: "^.+\@.+$",
            description: "required and must be a valid email address" },
         year_of_birth: {
            bsonType: "int",
            minimum: 1900,
            maximum: 2018,
            description: "the value must be in the range 1900-2018" },
         gender: {
            enum: [ "M", "F" ],
            description: "can be only M or F" }
      }
   }
}})

Based on what we have defined, only 3 fields are strictly required in every document of the collection: name, surname, and email. In particular, the email field must match a specific pattern to be sure the content is a valid address. (Note: to properly validate an email address you need a more complex regular expression; here we use a simpler version just to check that the @ symbol is present.) The other fields are not required, but if someone inserts them, they must satisfy the validation rules we have defined.

Let's insert some example documents to test whether everything is working as expected.

Insert a document with one of the required fields missing:

MongoDB > db.people.insert( { name : "John", surname : "Smith" } )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})

Insert a document with all the required fields but with an invalid email address:

MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith.gmail.com" } )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})

Finally, insert a valid document:

MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith@gmail.com" } )
WriteResult({ "nInserted" : 1 })

Now let's try more inserts that include the other fields.

MongoDB > db.people.insert( { name : "Bruce", surname : "Dickinson", email : "bruce@gmail.com", year_of_birth : NumberInt(1958), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Corrado", surname : "Pandiani", email : "corrado.pandiani@percona.com", year_of_birth : NumberInt(1971), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Marie", surname : "Adamson", email : "marie@gmail.com", year_of_birth : NumberInt(1992), gender : "F" } )
WriteResult({ "nInserted" : 1 })

The records were inserted correctly because all the rules on the required fields, and on the other non-required fields, were satisfied. Now let's look at cases where the year_of_birth or gender fields are not valid.

MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(1980), gender : "X" } )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", year_of_birth : NumberInt(1899), gender : "F" } )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})

In the first insert, gender is X, but the only valid values are M or F. In the second insert, the year of birth is outside the permitted range.

Now let's try to insert documents with arbitrary extra fields that are not in the JSON Schema Validator.

MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(2000), gender : "M", shirt_size : "XL", preferred_band : "Coldplay" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", gender : "F", shirt_size : "M", preferred_band : "Maroon Five" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.find().pretty()
{
    "_id" : ObjectId("5b6b12e0f213dc83a7f5b5e8"),
    "name" : "John",
    "surname" : "Smith",
    "email" : "john.smith@gmail.com"
}
{
    "_id" : ObjectId("5b6b130ff213dc83a7f5b5e9"),
    "name" : "Bruce",
    "surname" : "Dickinson",
    "email" : "bruce@gmail.com",
    "year_of_birth" : 1958,
    "gender" : "M"
}
{
    "_id" : ObjectId("5b6b1328f213dc83a7f5b5ea"),
    "name" : "Corrado",
    "surname" : "Pandiani",
    "email" : "corrado.pandiani@percona.com",
    "year_of_birth" : 1971,
    "gender" : "M"
}
{
    "_id" : ObjectId("5b6b1356f213dc83a7f5b5ed"),
    "name" : "Marie",
    "surname" : "Adamson",
    "email" : "marie@gmail.com",
    "year_of_birth" : 1992,
    "gender" : "F"
}
{
    "_id" : ObjectId("5b6b1455f213dc83a7f5b5f0"),
    "name" : "Tom",
    "surname" : "Tom",
    "email" : "tom@gmail.com",
    "year_of_birth" : 2000,
    "gender" : "M",
    "shirt_size" : "XL",
    "preferred_band" : "Coldplay"
}
{
    "_id" : ObjectId("5b6b1476f213dc83a7f5b5f1"),
    "name" : "Luise",
    "surname" : "Luise",
    "email" : "tom@gmail.com",
    "gender" : "F",
    "shirt_size" : "M",
    "preferred_band" : "Maroon Five"
}

As we can see, we have the flexibility to add new fields with no restrictions on the permitted values.

Having a really fixed schema

The behavior we have seen so far, permitting the addition of extra fields that are not in the validation rules, is the default. If we would like to be more restrictive and have a truly fixed schema for the collection, we need to add the additionalProperties: false parameter to the createCollection command.

In the following example, we create a validator that permits only the fields listed in the schema. No other extra fields are permitted.

db.createCollection( "people2" , {
   validator: {
     $jsonSchema: {
        bsonType: "object",
        additionalProperties: false,
        properties: {
           _id : {
              bsonType: "objectId" },
           name: {
              bsonType: "string",
              description: "required and must be a string" },
           age: {
              bsonType: "int",
              minimum: 0,
              maximum: 100,
              description: "required and must be in the range 0-100" }
        }
     }
}})

Note a couple of differences:

  • we don't need to specify the list of required fields; using additionalProperties: false means that only the fields listed under properties are permitted
  • we need to list even the _id field explicitly

As you can see in the following test, we are no longer allowed to add extra fields: documents may only contain the name and age fields (plus _id).

MongoDB > db.people2.insert( {name : "George", age: NumberInt(30)} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people2.insert( {name : "Maria", age: NumberInt(35), surname: "Peterson"} )
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 121,
        "errmsg" : "Document failed validation"
    }
})

In this case we lose flexibility, and flexibility is the main benefit of having a NoSQL database like MongoDB.

Well, it’s up to you to use it or not. It depends on the nature and goals of your application. I wouldn’t recommend it in most cases.

Add validation to existing collections

So far we have seen how to create a new collection with validation rules. But what about existing collections? How can we add rules to them?

This is quite trivial. The syntax to use in $jsonSchema remains the same; we just need to use the collMod command instead of createCollection. The following example shows how to create validation rules on an existing collection.

First, we create a new collection, people3, and insert some documents.

MongoDB > db.people3.insert( {name: "Corrado", surname: "Pandiani", year_of_birth: NumberLong(1971)} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( {name: "Tom", surname: "Cruise", year_of_birth: NumberLong(1961), gender: "M"} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( {name: "Kevin", surname: "Bacon", year_of_birth: NumberLong(1964), gender: "M", shirt_size: "L"} )
WriteResult({ "nInserted" : 1 })

Let’s create the validator.

MongoDB > db.runCommand( { collMod: "people3",
   validator: {
      $jsonSchema : {
         bsonType: "object",
         required: [ "name", "surname", "gender" ],
         properties: {
            name: {
               bsonType: "string",
               description: "required and must be a string" },
            surname: {
               bsonType: "string",
               description: "required and must be a string" },
            gender: {
               enum: [ "M", "F" ],
               description: "required and must be M or F" }
         }
       }
},
validationLevel: "moderate",
validationAction: "warn"
})

The two new options validationLevel and validationAction are important in this case.

validationLevel can have the following values:

  • "off": validation is not applied
  • "strict": the default value. Validation applies to all inserts and updates
  • "moderate": validation applies to inserts, and to updates of existing documents that are already valid. Updates of existing documents that are not valid are not checked.

When creating validation rules on existing collections, the "moderate" value is the safest option.

validationAction can have the following values:

  • “error”: it’s the default value. The document must pass the validation in order to be written
  • “warn”: a document that doesn’t pass the validation is written but a warning message is logged

When adding validation rules to an existing collection, the safest option is "warn".

These two options can be applied even with createCollection. We didn't use them because the default values are good in most cases.

How to investigate a collection definition

If we want to see how a collection was defined and, in particular, what the validation rules are, we can use the db.getCollectionInfos() command. The following example shows how to investigate the "schema" we created for the people collection.

MongoDB > db.getCollectionInfos( {name: "people"} )
[
  {
    "name" : "people",
    "type" : "collection",
    "options" : {
      "validator" : {
        "$jsonSchema" : {
          "bsonType" : "object",
          "required" : [
            "name",
            "surname",
            "email"
          ],
          "properties" : {
            "name" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "surname" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "email" : {
              "bsonType" : "string",
              "pattern" : "^.+@.+$",
              "description" : "required and must be a valid email address"
             },
             "year_of_birth" : {
               "bsonType" : "int",
               "minimum" : 1900,
               "maximum" : 2018,
               "description" : "the value must be in the range 1900-2018"
             },
             "gender" : {
               "enum" : [
                 "M",
                 "F"
               ],
             "description" : "can be only M or F"
        }
      }
    }
  }
},
"info" : {
  "readOnly" : false,
  "uuid" : UUID("5b98c6f0-2c9e-4c10-a3f8-6c1e7eafd2b4")
},
"idIndex" : {
  "v" : 2,
  "key" : {
    "_id" : 1
  },
"name" : "_id_",
"ns" : "test.people"
}
}
]

Limitations and restrictions

Validators cannot be defined for collections in the following databases: admin, local, config.

Validators cannot be defined for system.* collections.

A limitation of the current implementation of the JSON Schema Validator is that the error messages are not very helpful for understanding which of the rules the document failed to satisfy. You have to figure this out manually by running some tests, and that's not so easy when dealing with complex documents. Having more specific error strings, ideally derived from the validator definition, would be very useful when debugging application errors and warnings. This is definitely something that should be improved in future releases.

While waiting for improvements, someone has developed a wrapper for the mongo client that produces more descriptive error strings. You can have a look at https://www.npmjs.com/package/mongo-schemer. You can test it and use it, but pay attention to the caveat in its documentation: "Running in prod is not recommended due to the overhead of validating documents against the schema".

Conclusions

Doing schema validation in the application remains, in general, a best practice, but the JSON Schema Validator is a good tool for enforcing validation directly in the database.

Even though it needs some improvements, the JSON Schema feature is good enough for most common cases. We suggest testing it and using it when you really need to create a backbone structure for your data.

Aug 15, 2018

Webinar Thurs 16/8: Developing an App on MongoDB: Tips and Tricks

Please join Percona’s Sr. Technical Operations Architect Tim Vaillancourt as he presents Developing an App on MongoDB: Tips and Tricks on Thursday, August 16th, 2018, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

A lot of developers prefer using MongoDB to other open source databases when developing applications. But why? How do you work with MongoDB to create a well-functioning application?

This webinar will help developers understand what MongoDB does and how it processes requests from applications.

In this webinar, we will cover:

  • Data, Queries and Indexes
  • Using indices efficiently
  • Reducing index and storage size with correct data types
  • The aggregation framework
  • Using the Explain and Operation Profiling features
  • MongoDB features to avoid
  • Using Read and Write Concerns for Integrity
  • Performance
  • Scaling read queries using Read Preference
  • What is MongoDB Sharding?
  • Using Percona Monitoring and Management (PMM) to visualize database usage
  • MongoDB users and built-in roles for application security
  • Using SRV DNS record support

By the end of the lesson, you will know how to avoid common problems with MongoDB at the application design stage, instead of having to fix them in production.

Register Now

Jun 14, 2018

What is the Top Cause of Application Downtime Today?

I frequently talk to our customer base about what keeps them up at night. While there is a large variance of answers, they tend to fall into one of two categories. The first is the conditioned fear of some monster lurking behind the scenes that could pounce at any time. The second, of course, is the actual monster of downtime on a critical system. Ask most tech folks and they will tell you outages seem to happen only late at night or early in the morning. And those outages do keep them up.

Entire companies and product lines have been built around providing those in the IT world with some ability to sleep at night. Modern enterprises have spent millions to mitigate the risk and prevent their businesses from having a really bad day because of an outage. Cloud providers are attuned to the downtime dilemma and spend lots of time, money, and effort to build in redundancy and make “High Availability” (HA) as easy as possible. The frequency of “hardware” or server issues continues to dwindle.

Where does the downtime issue start?

In my discussions, most companies I have talked to say their number one cause of outages and customer interruptions is ultimately related to the deployment of new or upgraded code. Often I hear that the operations team has little or no involvement with an application until it's put into production. It is a bit ironic that this is also the area where companies tend to drastically under-invest. They opt instead to invest in ways to "scale out or up", or perhaps in how to survive asteroids hitting two out of three of their data centers.

Failing over broken or slow code from one server to another does not fix it. Adding more servers to distribute the load can mitigate a problem, but can also escalate the cost dramatically. In most cases, the solutions they apply don’t address the primary cause of the problems.

While there are some fantastic tools out there that can help with getting better visibility into code level issues — such as New Relic, AppDynamics and others — the real problem is that these often end up being used to diagnose issues after they have appeared in production. Most companies carry out some amount of testing before releasing code, but typically it is a fraction of what they should be doing. Working for a company that specializes in open source databases, we get a lot of calls on issues that have prevented companies’ end users from using critical applications. Many of these problems are fixable before they cost a loss of revenue and reputation.

I think it's time we in the technology industry start to rethink our QA, testing, and pre-deployment requirements. How much time, effort, and money can we save if we catch these "monsters" before they make it into production?

Not to mention how much better our operations team will sleep . . .

Jun 13, 2018

Zone Based Sharding in MongoDB

In this blog post, we will discuss how to use zone based sharding to deploy a sharded MongoDB cluster in a customized manner, so that queries and data are redirected per geographical grouping. This feature of MongoDB is part of its Data Center Awareness, which allows queries to be routed to particular MongoDB deployments based on the physical locations or configurations of the mongod instances.

Before moving on, let’s have an overview of this feature. You might already have some questions about zone based sharding. Was it recently introduced? If zone-based sharding is something we should use, then what about tag-aware sharding?

MongoDB has supported tag-aware sharding since its early versions. This means tagging a range of shard key values, associating that range with a shard, and redirecting operations to that specific tagged shard. Since version 3.4, tag-aware sharding has been referred to as zones. So, the only change is the name, and this is the reason the sh.addShardTag(shard, tag) method is still used.

How it works

  1. With the help of a shard key, MongoDB allows you to create zones of sharded data – also known as shard zones.
  2. Each zone can be associated with one or more shards.
  3. Similarly, a shard can associate with any number of non-conflicting zones.
  4. MongoDB migrates the chunks covered by a zone range to the shards associated with that zone.
  5. MongoDB routes reads and writes for a zone range to the shards where that range resides.

Useful for what kind of deployments/applications?

  1. In cases where data needs to be routed to a particular shard due to some hardware configuration restrictions.
  2. Zones can be useful if there is the need to isolate specific data to a particular shard. For example, in the case of GDPR compliance that requires businesses to protect data and privacy for an individual within the EU.
  3. If an application is being used geographically and you want a query to route to the nearest shards for both reads and writes.

Let’s consider a Scenario

Consider the scenario of a school where some students are experts in Biology, but most students are experts in Maths. So we have more data for the Maths students compared to the Biology students. In this example, the deployment requires that the Maths students' data is routed to the shard with the better configuration for a large amount of data. Both reads and writes will be served by that specific shard. All the Biology students will be served by another shard. To implement this, we will add tags to deploy the zones to the shards.

For this scenario we have an environment with:

DB: “school”

Collection: “students”

Fields: "sId", "subject", "marks" and so on.

Indexed Fields: “subject” and “sId”

We enable sharding:

sh.enableSharding("school")

And create a shard key on "subject" and "sId":

sh.shardCollection("school.students", {subject: 1, sId: 1});

We have two shards in our test environment:

shards:

{  "_id" : "shard0000",  "host" : "127.0.0.1:27001",  "state" : 1 }
{  "_id" : "shard0001",  "host" : "127.0.0.1:27002",  "state" : 1 }

Zone Deployment

1) Disable balancer

To prevent migration of the chunks across the cluster, disable the balancer for the “students” collection:

mongos> sh.disableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Before proceeding further, make sure the balancer is not running. This is not mandatory, but it is always good practice to make sure no chunk migrations take place while configuring zones:

mongos> sh.isBalancerRunning()
false

2) Add shard to the zone

A zone can be associated with a particular shard in the form of a tag, using the sh.addShardTag() method, so a tag will be added to each shard. Here we are considering two zones, so the tags "MATHS" and "BIOLOGY" need to be added.

mongos> sh.addShardTag( "shard0000" , "MATHS");
{ "ok" : 1 }
mongos> sh.addShardTag( "shard0001" , "BIOLOGY");
{ "ok" : 1 }

We can see that the zones are assigned, in the form of tags, to each shard as required.

mongos> sh.status()
 shards:
        {  "_id" : "shard0000",  "host" : "127.0.0.1:27001",  "state" : 1,  "tags" : [ "MATHS" ] }
        {  "_id" : "shard0001",  "host" : "127.0.0.1:27002",  "state" : 1,  "tags" : [ "BIOLOGY" ] }

3) Define ranges for each zone

Each zone covers one or more ranges of shard key values. Note: each range a zone covers is always inclusive of its lower boundary and exclusive of its upper boundary.

mongos> sh.addTagRange(
	"school.students",
	{ "subject" : "maths", "sId" : MinKey},
	{ "subject" : "maths", "sId" : MaxKey},
	"MATHS"
)
{ "ok" : 1 }
mongos> sh.addTagRange(
	"school.students",
	{ "subject" : "biology", "sId" : MinKey},
	{ "subject" : "biology", "sId" : MaxKey},
"BIOLOGY"
)
{ "ok" : 1 }

4) Enable balancer

Now enable the balancer, so the chunks migrate across the shards as required and all read and write queries are routed to the appropriate shards.

mongos> sh.enableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
mongos> sh.isBalancerRunning()
true

Let’s check how documents get routed as per the tags:

We have inserted 6 documents: 4 documents with "subject":"maths" and 2 documents with "subject":"biology".

mongos> db.students.find({"subject":"maths"}).count()
4
mongos> db.students.find({"subject":"biology"}).count()
2

Checking the shard distribution for the students collection:

mongos> db.students.getShardDistribution()
Shard shard0000 at 127.0.0.1:27001
data : 236B docs : 4 chunks : 4
estimated data per chunk : 59B
estimated docs per chunk : 1
Shard shard0001 at 127.0.0.1:27002
data : 122B docs : 2 chunks : 1
estimated data per chunk : 122B
estimated docs per chunk : 2

So in this test case, all the queries for the students collection have been routed according to the tags used, with four documents inserted into shard0000 and two documents inserted into shard0001.

Any queries related to MATHS will route to shard0000 and queries related to BIOLOGY will route to shard0001, hence the load will be distributed as per the configuration of the shard, keeping the database performance optimized.

Sharding MongoDB using zones is a great feature provided by MongoDB. With the help of zones, data can be isolated to specific shards. And if we have any kind of hardware or configuration restrictions on the shards, it is a possible solution for routing operations accordingly.

Jun 09, 2014

Architecture and Design of MySQL-powered applications: June 11 Webinar

The architecture of MySQL-powered applications is one of my favorite topics to talk about. It's a very important topic because if you do not get the architecture right, you're very likely to fail with your project – either by failing to meet performance, high availability, or security requirements… or by failing to deliver on time and at the planned cost.

It’s also a great topic because there is so much knowledge available these days about MySQL-powered applications. MySQL has been around for a rather long time compared with many other solutions – and now we know what architectures have enabled people to build successful MySQL-powered applications and grow them to hundreds of millions of users as well as what applications did not work out.

This level of maturity really allows us to essentially take “off-the-shelf” MySQL architectures that can be used to build very successful applications with very low risk, at least when it comes to the database backend.

Architecture and Design of MySQL Powered Applications: June 11 webinar

June 11 at 10 a.m. Pacific

On Wednesday, June 11 at 10 a.m. Pacific I'll be leading a webinar titled "Architecture and Design of MySQL Powered Applications," during which I'll be covering a lot of high-level scenarios such as "architectures that have been proven to be successful." I'll also discuss the architectures best suited for different kinds of applications and different kinds of teams. Additionally, I'll address what's changed in the MySQL space in the last few years and how those changes have impacted modern MySQL architecture design approaches.

I hope to see you June 11. Register now to reserve your spot!
