Percona Live 2019 – Save the Date!

Austin Texas

Austin State Capitol

After much speculation following the announcement in Santa Clara earlier this year, we are delighted to announce Percona Live 2019 will be taking place in Austin, Texas.

Save the dates in your diary for May, 28-30 2019!

The conference will take place just after Memorial Day at The Hyatt Regency, Austin on the shores of Lady Bird Lake.

This is also an ideal central location for those who wish to extend their stay and explore what Austin has to offer! Call for papers, ticket sales and sponsorship opportunities will be announced soon, so stay tuned!

In other Percona Live news, we’re less than 4 weeks away from this year’s European conference taking place in Frankfurt, Germany on 5-7 November. The tutorials and breakout sessions have been announced, and you can view the full schedule here. Tickets are still on sale so don’t miss out, book yours here today!



Percona Monitoring and Management (PMM) 1.15.0 Is Now Available

Percona Monitoring and Management

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL® and MongoDB® performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL® and MongoDB® servers to ensure that your data works as efficiently as possible.

Percona Monitoring and Management

This release offers two new features for both the MySQL Community and Percona Customers:

  • MySQL Custom Queries – Turn a SELECT into a dashboard!
  • Server and Client logs – Collect troubleshooting logs for Percona Support

We addressed 17 new features and improvements, and fixed 17 bugs.

MySQL Custom Queries

In 1.15 we are introducing the ability to take a SQL SELECT statement and turn the result set into metric series in PMM.  The queries are executed at the LOW RESOLUTION level, which by default is every 60 seconds.  A key advantage is that you can extend PMM to profile metrics unique to your environment (see users table example), or to introduce support for a table that isn’t part of PMM yet. This feature is on by default and only requires that you edit the configuration file and use vaild YAML syntax.  The configuration file is in /usr/local/percona/pmm-client/queries-mysqld.yml.

Example – Application users table

We’re going to take a fictional MySQL users table that also tracks the number of upvotes and downvotes, and we’ll convert this into two metric series, with a set of seven labels, where each label can also store a value.

Browsing metrics series using Advanced Data Exploration Dashboard

Lets look at the output so we understand the goal – take data from a MySQL table and store in PMM, then display as a metric series.  Using the Advanced Data Exploration Dashboard you can review your metric series. Exploring the metric series  app1_users_metrics_downvotes we see the following:

PMM Advanced Data Exploration Dashboard

MySQL table

Lets assume you have the following users table that includes true/false, string, and integer types.

SELECT * FROM `users`
| id | app  | user_type    | last_name | first_name | logged_in | active_subscription | banned | upvotes | downvotes |
|  1 | app2 | unprivileged | Marley    | Bob        |         1 |                   1 |      0 |     100 |        25 |
|  2 | app3 | moderator    | Young     | Neil       |         1 |                   1 |      1 |     150 |        10 |
|  3 | app4 | unprivileged | OConnor   | Sinead     |         1 |                   1 |      0 |      25 |        50 |
|  4 | app1 | unprivileged | Yorke     | Thom       |         0 |                   1 |      0 |     100 |       100 |
|  5 | app5 | admin        | Buckley   | Jeff       |         1 |                   1 |      0 |     175 |         0 |

Explaining the YAML syntax

We’ll go through a simple example and mention what’s required for each line.  The metric series is constructed based on the first line and appends the column name to form metric series.  Therefore the number of metric series per table will be the count of columns that are of type GAUGE or COUNTER.  This metric series will be called app1_users_metrics_downvotes:

app1_users_metrics:                                 ## leading section of your metric series.
  query: "SELECT * FROM app1.users"                 ## Your query. Don't forget the schema name.
  metrics:                                          ## Required line to start the list of metric items
    - downvotes:                                    ## Name of the column returned by the query. Will be appended to the metric series.
        usage: "COUNTER"                            ## Column value type.  COUNTER will make this a metric series.
        description: "Number of upvotes"            ## Helpful description of the column.

Full queries-mysqld.yml example

Each column in the SELECT is named in this example, but that isn’t required, you can use a SELECT * as well.  Notice the format of schema.table for the query is included.

  query: "SELECT app,first_name,last_name,logged_in,active_subscription,banned,upvotes,downvotes FROM app1.users"
    - app:
        usage: "LABEL"
        description: "Name of the Application"
    - user_type:
        usage: "LABEL"
        description: "User's privilege level within the Application"
    - first_name:
        usage: "LABEL"
        description: "User's First Name"
    - last_name:
        usage: "LABEL"
        description: "User's Last Name"
    - logged_in:
        usage: "LABEL"
        description: "User's logged in or out status"
    - active_subscription:
        usage: "LABEL"
        description: "Whether User has an active subscription or not"
    - banned:
        usage: "LABEL"
        description: "Whether user is banned or not"
    - upvotes:
        usage: "COUNTER"
        description: "Count of upvotes the User has earned.  Upvotes once granted cannot be revoked, so the number can only increase."
    - downvotes:
        usage: "GAUGE"
        description: "Count of downvotes the User has earned.  Downvotes can be revoked so the number can increase as well as decrease."

We hope you enjoy this feature, and we welcome your feedback via the Percona forums!

Server and Client logs

We’ve enhanced the volume of data collected from both the Server and Client perspectives.  Each service provides a set of files designed to be shared with Percona Support while you work on an issue.


From the Server, we’ve improved the logs.zip service to include:

  • Prometheus targets
  • Consul nodes, QAN API instances
  • Amazon RDS and Aurora instances
  • Version
  • Server configuration
  • Percona Toolkit commands

You retrieve the link from your PMM server using this format:   https://pmmdemo.percona.com/managed/logs.zip


On the Client side we’ve added a new action called summary which fetches logs, network, and Percona Toolkit output in order to share with Percona Support. To initiate a Client side collection, execute:

pmm-admin summary

The output will be a file you can use to attach to your Support ticket.  The single file will look something like this:


New Features and Improvements

  • PMM-2913 – Provide ability to execute Custom Queries against MySQL – Credit to wrouesnel for the framework of this feature in wrouesnel/postgres_exporter!
  • PMM-2904 – Improve PMM Server Diagnostics for Support
  • PMM-2860 – Improve pmm-client Diagnostics for Support
  • PMM-1754Provide functionality to easily select query and copy it to clipboard in QAN
  • PMM-1855Add swap to AMI
  • PMM-3013Rename PXC Overview graph Sequence numbers of transactions to IST Progress
  • PMM-2726 – Abort data collection in Exporters based on Prometheus Timeout – MySQLd Exporter
  • PMM-3003 – PostgreSQL Overview Dashboard Tooltip fixes
  • PMM-2936Some improvements for Query Analytics Settings screen
  • PMM-3029PostgreSQL Dashboard Improvements

Fixed Bugs

  • PMM-2976Upgrading to PMM 1.14.x fails if dashboards from Grafana 4.x are present on an installation
  • PMM-2969rds_exporter becomes throttled by CloudWatch API
  • PMM-1443The credentials for a secured server are exposed without explicit request
  • PMM-3006Monitoring over 1000 instances is displayed imperfectly on the label
  • PMM-3011PMM’s default MongoDB DSN is localhost, which is not resolved to IPv4 on modern systems
  • PMM-2211Bad display when using old range in QAN
  • PMM-1664Infinite loading with wrong queryID
  • PMM-2715Since pmm-client-1.9.0, pmm-admin detects CentOS/RHEL 6 installations using linux-upstart as service manager and ignores SysV scripts
  • PMM-2839Tablestats safety precaution does not work for RDS/Aurora instances
  • PMM-2845pmm-admin purge causes client to panic
  • PMM-2968pmm-admin list shows empty data source column for mysql:metrics
  • PMM-3043 Total Time percentage is incorrectly shown as a decimal fraction
  • PMM-3082Prometheus Scrape Interval Variance chart doesn’t display data

How to get PMM Server

PMM is available for installation using three methods:

Help us improve our software quality by reporting any Percona Monitoring and Management bugs you encounter using our bug tracking system.


MongoDB Replica set Scenarios and Internals

MongoDB replica sets replication internals r

MongoDB replica sets replication internals rThe MongoDB® replica set is a group of nodes with one set as the primary node, and all other nodes set as secondary nodes. Only the primary node accepts “write” operations, while other nodes can only serve “read” operations according to the read preferences defined. In this blog post, we’ll focus on some MongoDB replica set scenarios, and take a look at the internals.

Example configuration

We will refer to a three node replica set that includes one primary node and two secondary nodes running as:

"members" : [
"_id" : 0,
"name" : "",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 3533,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"electionTime" : Timestamp(1537797392, 2),
"electionDate" : ISODate("2018-09-24T13:56:32Z"),
"configVersion" : 3,
"self" : true
"_id" : 1,
"name" : "",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 3063,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
"optimeDurable" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"optimeDurableDate" : ISODate("2018-09-24T14:49:44Z"),
"lastHeartbeat" : ISODate("2018-09-24T14:49:45.539Z"),
"lastHeartbeatRecv" : ISODate("2018-09-24T14:49:44.664Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "",
"configVersion" : 3
"_id" : 2,
"name" : "",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2979,
"optime" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
"optimeDurable" : {
"ts" : Timestamp(1537800584, 1),
"t" : NumberLong(1)
"optimeDate" : ISODate("2018-09-24T14:49:44Z"),
"optimeDurableDate" : ISODate("2018-09-24T14:49:44Z"),
"lastHeartbeat" : ISODate("2018-09-24T14:49:45.539Z"),
"lastHeartbeatRecv" : ISODate("2018-09-24T14:49:44.989Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "",
"configVersion" : 3

Here, the primary is running on port 25001, and the two secondaries are running on ports 25002 and 25003 on the same host.

Secondary nodes can only sync from Primary?

No, it’s not mandatory. Each secondary can replicate data from the primary or any other secondary to the node that is syncing. This term is also known as chaining, and by default, this is enabled.

In the above replica set, you can see that secondary node


  is syncing from another secondary node



"syncingTo" : "" 

This can also be found in the logs as here the parameter

chainingAllowed :true

   is the default setting.

settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 60000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5ba8ed10d4fddccfedeb7492') } }


That means that a secondary member node is able to replicate from another secondary member node instead of from the primary node. This helps to reduce the load from the primary. If the replication lag is not tolerable, then chaining could be disabled.

For more details about chaining and the steps to disable it please refer to my earlier blog post here.

Ok, then how does the secondary node select the source to sync from?

If Chaining is False

When chaining is explicitly set to be false, then the secondary node will sync from the primary node only or could be overridden temporarily.

If Chaining is True

  • Before choosing any sync node, TopologyCoordinator performs validations like:
    • Whether chaining is set to true or false.
    • If that particular node is part of the current replica set configurations.
    • Identify the node ahead with oplog with the lowest ping time.
    • The source code that includes validation is here.
  • Once the validation is done, SyncSourceSelector relies on SyncSourceResolver which contains the result and details for the new sync source
  • To get the details and response, SyncSourceResolver coordinates with ReplicationCoordinator
  • This ReplicationCoordinator is responsible for the replication, and co-ordinates with TopologyCoordinator
  • The TopologyCoordinator is responsible for topology of the cluster. It finds the primary oplog time and checks for the maxSyncSourceLagSecs
  • It will reject the source to sync from if the maxSyncSourceLagSecs  is greater than the newest oplog entry. The code for this can be found here
  • If the criteria for the source selection is not fulfilled, then BackgroundSync thread waits and restarts the whole process again to get the sync source.

Example for “unable to find a member to sync from” then, in the next attempt, finding a candidate to sync from

This can be found in the log like this. On receiving the message from rsBackgroundSync thread

could not find member to sync from

, the whole internal process restarts and finds a member to sync from i.e.

sync source candidate:

, which means it is now syncing from node running on port 25001.

2018-09-24T13:58:43.197+0000 I REPL     [rsSync] transition to RECOVERING
2018-09-24T13:58:43.198+0000 I REPL     [rsBackgroundSync] could not find member to sync from
2018-09-24T13:58:43.201+0000 I REPL     [rsSync] transition to SECONDARY
2018-09-24T13:58:59.208+0000 I REPL     [rsBackgroundSync] sync source candidate:

  • Once the sync source node is selected, SyncSourceResolver probes the sync source to confirm that it is able to fetch the oplogs.
  • RollbackID is also fetched i.e. rbid  after the first batch is returned by oplogfetcher.
  • If all eligible sync sources are too fresh, such as during initial sync, then the syncSourceStatus Oplog start is missing and earliestOpTimeSeen will set a new minValid.
  • This minValid is also set in the case of rollback and abrupt shutdown.
  • If the node has a minValid entry then this is checked for the eligible sync source node.

Example showing the selection of a new sync source when the existing source is found to be invalid

Here, as the logs show, during sync the node chooses a new sync source. This is because it found the original sync source is not ahead, so not does not contain recent oplogs from which to sync.

2018-09-25T15:20:55.424+0000 I REPL     [replication-1] Choosing new sync source because our current sync source,, has an OpTime ({ ts: Timestamp 1537879296000|1, t: 4 }) which is not ahead of ours ({ ts: Timestamp 1537879296000|1, t: 4 }), it does not have a sync source, and it's not the primary (sync source does not know the primary)

2018-09-25T15:20:55.425+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source (config version: 3; last applied optime: { ts: Timestamp 1537879296000|1, t: 4 }; sync source index: -1; primary index: -1) is no longer valid

  • If the secondary node is too far behind the eligible sync source node, then the node will enter maintenance node and then resync needs to be call manually.
  • Once the sync source is chosen, BackgroundSync starts oplogFetcher.

Example for oplogFetcher

Here is an example of fetching oplog from the “oplog.rs” collection, and checking for the greater than required timestamp.

2018-09-26T10:35:07.372+0000 I COMMAND  [conn113] command local.oplog.rs command: getMore { getMore: 20830044306, collection: "oplog.rs", maxTimeMS: 5000, term: 7, lastKnownCommittedOpTime: { ts: Timestamp 1537955038000|1, t: 7 } } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1537903865000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 7, readConcern: { afterOpTime: { ts: Timestamp 1537903865000|1, t: 6 } } } planSummary: COLLSCAN cursorid:20830044306 keysExamined:0 docsExamined:0 numYields:1 nreturned:0 reslen:451 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } protocol:op_command 3063398ms

When and what details replica set nodes communicate with each other?

At a regular interval, all the nodes communicate with each other to check the status of the primary node, check the status of the sync source, to get the oplogs and so on.

ReplicationCoordinator has ReplicaSetConfig that has a list of all the replica set nodes, and each node has a copy of it. This makes nodes aware of other nodes under same replica set.

This is how nodes communicate in more detail:

Heartbeats: This checks the status of other nodes i.e. alive or die

heartbeatInterval: Every node, at an interval of two seconds, sends the other nodes a heartbeat to make them aware that “yes I am alive!”

heartbeatTimeoutSecs: This is a timeout, and means that if the heartbeat is not returned in 10 seconds then that node is marked as inaccessible or simply die.

Every heartbeat is identified by these replica set details:

  • replica set config version
  • replica set name
  • Sender host address
  • id from the replicasetconfig

The source code could be referred to from here.

When the remote node receives the heartbeat, it processes this data and validates if the details are correct. It then prepares a ReplSetHeartbeatResponse, that includes:

  • Name of the replica set, config version, and optime details
  • Details about primary node as per the receiving node.
  • Sync source details and state of receiving node

This heartbeat data is processed, and if primary details are found then the election gets postponed.

TopologyCoordinator checks for the heartbeat data and confirms if the node is OK or NOT. If the node is OK then no action is taken. Otherwise it needs to be reconfigured or else initiate a priority takeover based on the config.

Response from oplog fetcher

To get the oplogs from the sync source, nodes communicate with each other. This oplog fetcher fetches oplogs through “find” and “getMore”. This will only affect the downstream node that gets metadata from its sync source to update its view from the replica set.

OplogQueryMetadata only comes with OplogFetcher responses

OplogQueryMetadata comes with OplogFetcher response and ReplSetMetadata comes with all the replica set details including configversion and replication commands.

Communicate to update Position commands:

This is to get an update for replication progress. ReplicationCoordinatorExternalState creates SyncSourceFeedback sends replSetUpdatePosition commands.

It includes Oplog details, Replicaset config version, and replica set metadata.

If a new node is added to the existing replica set, how will that node get the data?

If a new node is added to the existing replica set then the “initial sync” process takes place. This initial sync can be done in two ways:

  1. Just add the new node to the replicaset and let initial sync threads restore the data. Then it syncs from the oplogs until it reaches the secondary state.
  2. Copy the data from the recent data directory to the node, and restart this new node. Then it will also sync from the oplogs until it reaches the secondary state.

This is how it works internally

When “initial sync” or “rsync” is called by ReplicationCoordinator  then the node goes to “STARTUP2” state, and this initial sync is done in DataReplicator

  • A sync source is selected to get the data from, then it drops all the databases except the local database, and oplogs are recreated.
  • DatabasesCloner asks syncsource for a list of the databases, and for each database it creates DatabaseCloner.
  • For each DatabaseCloner it creates CollectionCloner to clone the collections
  • This CollectionCloner calls ListIndexes on the syncsource and creates a CollectionBulkLoader for parallel index creation while data cloning
  • The node also checks for the sync source rollback id. If rollback occurred, then it restarts the initial sync. Otherwise, datareplicator is done with its work and then replicationCoordinator assumes the role for ongoing replication.

Example for the “initial sync” :

Here node enters  

"STARTUP2"- "transition to STARTUP2"

Then sync source gets selected and drops all the databases except the local database.  Next, replication oplog is created and CollectionCloner is called.

Local database not dropped: because every node has its own “local” database with its own and other nodes’ information, based on itself, this database is not replicated to other nodes.

2018-09-26T17:57:09.571+0000 I REPL     [ReplicationExecutor] transition to STARTUP2
2018-09-26T17:57:14.589+0000 I REPL     [replication-1] sync source candidate:
2018-09-26T17:57:14.590+0000 I STORAGE  [replication-1] dropAllDatabasesExceptLocal 1
2018-09-26T17:57:14.592+0000 I REPL     [replication-1] creating replication oplog of size: 990MB... 2018-09-26T17:57:14.633+0000 I REPL     [replication-0] CollectionCloner::start called, on ns:admin.system.version

Finished fetching all the oplogs, and finishing up initial sync.

2018-09-26T17:57:15.685+0000 I REPL     [replication-0] Finished fetching oplog during initial sync: CallbackCanceled: Callback canceled. Last fetched optime and hash: { ts: Timestamp 1537984626000|1, t: 9 }[-1139925876765058240]
2018-09-26T17:57:15.685+0000 I REPL     [replication-0] Initial sync attempt finishing up.

What are oplogs and where do these reside?

oplogs stands for “operation logs”. We have used this term so many times in this blog post as these are the mandatory logs for the replica set. These operations are in the capped collection called “oplog.rs”  that resides in “local” database.

Below, this is how oplogs are stored in the collection “oplog.rs” that includes details for timestamp, operations, namespace, output.

rplint:PRIMARY> use local
rplint:PRIMARY> show collections
rplint:PRIMARY> db.oplog.rs.findOne()
 "ts" : Timestamp(1537797392, 1),
 "h" : NumberLong("-169301588285533642"),
 "v" : 2,
 "op" : "n",
 "ns" : "",
 "o" : {
 "msg" : "initiating set"

It consists of rolling update operations coming to the database. Then these oplogs replicate to the secondary node(s) to maintain the high availability of the data in case of failover.

When the replica MongoDB instance starts, it creates an oplog ocdefault size. For Wired tiger, the default size is 5% of disk space, with a lower bound size of 990MB. So here in the example it creates 990MB of data. If you’d like to learn more about oplog size then please refer here

2018-09-26T17:57:14.592+0000 I REPL     [replication-1] creating replication oplog of size: 990MB...

What if the same oplog is applied multiple times, will that not lead to inconsistent data?

Fortunately, oplogs are Idempotent that means the value will remain unchanged, or will provide the same output, even when applied multiple times.

Let’s check an example:

For the $inc operator that will increment the value by 1 for the filed “item”, if this oplog is applied multiple times then the result might lead to an inconsistent record if this is not Idempotent. However, rather than increasing the item value multiple times, it is actually applied only once.

rplint:PRIMARY> use db1
//inserting one document
rplint:PRIMARY> db.col1.insert({item:1, name:"abc"})
//updating document by incrementing item value with 1
rplint:PRIMARY> db.col1.update({name:"abc"},{$inc:{item:1}})
//updated value is now item:2
rplint:PRIMARY> db.col1.find()
{ "_id" : ObjectId("5babd57cce2ef78096ac8e16"), "item" : 2, "name" : "abc" }

This is how these operations are stored in oplog, here this $inc value is stored in oplog as $set

rplint:PRIMARY> db.oplog.rs.find({ns:"db1.col1"})
//insert operation
{ "ts" : Timestamp(1537987964, 2), "t" : NumberLong(9), "h" : NumberLong("8083740413874479202"), "v" : 2, "op" : "i", "ns" : "db1.col1", "o" : { "_id" : ObjectId("5babd57cce2ef78096ac8e16"), "item" : 1, "name" : "abc" } }
//$inc operation is changed as ""$set" : { "item" : 2"
{ "ts" : Timestamp(1537988022, 1), "t" : NumberLong(9), "h" : NumberLong("-1432987813358665721"), "v" : 2, "op" : "u", "ns" : "db1.col1", "o2" : { "_id" : ObjectId("5babd57cce2ef78096ac8e16") }, "o" : { "$set" : { "item" : 2 } } }

That means that however many  times it is applied, it will generate the same results, so no inconsistent data!

I hope this blog post helps you to understand multiple scenarios for MongoDB replica sets, and how data replicates to the nodes.


Upcoming Webinar Thurs 10/11: Build Highly Scalable IoT Architectures with Percona Server for MongoDB

Highly Scalable IoT Architectures

Highly Scalable IoT ArchitecturesPlease join Percona’s Product Manager for Percona Server for MongoDB, Jeff Sandstrom; Sr. Tech Ops Architect for MongoDB, Tim Vaillancourt; and Mesosphere’s Senior Director of Community and Evangelism, Matt Jarvis, on Thursday, October 11, 2018 at 10:00 AM PDT (UTC–7) / 1:00 PM EDT (UTC–4), as they demonstrate how to build highly scalable Internet of Things architectures with Percona Server for MongoDB on DC/OS.


Percona Server for MongoDB is a free and open-source drop-in replacement for MongoDB Community Edition. It combines all the features and benefits of MongoDB Community Edition with enterprise-class features from Percona, including an in-memory engine, log redaction, auditing, and hot backups.

Mesosphere DC/OS is an enterprise-grade, datacenter-scale operating system, providing a single platform for running containers, data services, and distributed applications on a single unified computing environment.

In this webinar, we’ll:

  • Review the benefits of Percona Server for MongoDB
  • Discuss a variety of use cases for Percona Server for MongoDB on DC/OS
  • Demonstrate exactly how you can use Percona Server for MongoDB on DC/OS to capture data from Internet of Things devices
  • Tell you how you can participate in the beta program for this exciting solution

Register for this webinar to learn how to build highly scalable IoT architectures with Percona Server for MongoDB on DC/OS.


MongoDB-Disable Chained Replication

MongoDB chained replication

MongoDB chained replicationIn this blog post, we will learn what MongoDB chained replication is, why you might choose to disable it, and the steps you need to take to do so.

What is chain replication?

Chain Replication in MongoDB, as the name suggests, means that a secondary member is able to replicate from another secondary member instead of a primary.

Default settings

By default, chained replication is enabled in MongoDB. It helps to reduce the load from the primary but it may lead to a replication lag. When enabled, the secondary node selects its target using the ping time for the closest node.

Reasons to disable chained replication

The main reason to disable chain replication is replication lag. In other words, the length of the delay between MongoDB writing an operation on the primary and replicating the same operation to the secondary.

In either case—chained replication enabled or disabled—replication works in the same way when the primary node fails: the secondary server will promote to the primary. Therefore, writing and reading of data from the application is not affected.

Steps to disable chained replication

1) Check the current status of chained replication in replica set configuration for “settings” like this:

PRIMARY> rs.config().settings
"chainingAllowed" : true,

2) Disable chained replication, set “chainingAllowed” to false and then reconfig to implement changes.

PRIMARY> cg = rs.config()
PRIMARY> cg.settings.chainingAllowed = false
PRIMARY> rs.reconfig(cg)

3) Check again for the current status of chained replication and its done.

PRIMARY> rs.config().settings
	"chainingAllowed" : false,

Can I override sync source target even after disabling chaining?

Yes, even after you have disabled chained replication, you can still override sync target, though only temporarily. That means it will be overridden until:

  • mognod instance restarts
  • established connection between sync source and secondary node.
  • Additional: if chaining is enabled and sync source falls more than 30 seconds behind another member then the SyncSourceResolver will choose other member having recent oplogs to sync from.

Override sync source

Parameter “replSetSyncFrom” could be used, for example, the secondary node is syncing from host

  and we would like to sync it from

1) Check for the current host it is syncing from:

PRIMARY> rs.status()
			"_id" : 1,
			"name" : "",
			"syncingTo" : "",
			"syncSourceHost" : "",

2) Login to that mongod, and execute:

SECONDARY> db.adminCommand( { replSetSyncFrom: "" })

3) Check replica set status again

SECONDARY> rs.status()
			"_id" : 1,
			"name" : "",
			"syncingTo" : "",
			"syncSourceHost" : "",

This is how we can override the sync source in case of testing, maintenance or while the replica is not syncing from the required host.

I hope this blog helps you to understand how to disable chained replication or override the sync source for the specific purpose or reason. The preferred setting of the chainingAllowed parameter is true as it reduces the load from the primary node and also a default setting.


Percona Live Europe 2018 Session Programme Published

PLE 2018 Full Agenda Announced

PLE 2018 Full Agenda AnnouncedOffering over 110 conference sessions across Tuesday, 6 and Wednesday, 7 November, and a full tutorial day on Monday 5 November, we hope you’ll find that this fantastic line up of talks for Percona Live Europe 2018 to be one of our best yet! Innovation in technology continues to arrive at an accelerated rate, and you’ll find plenty to help you connect with the latest developments in open source database technologies at this acclaimed annual event.

Representatives from companies at the leading edge of our industry use the platform offered by Percona Live to showcase their latest developments and share plans for the future. If your career is dependent upon the use of open source database technologies you should not miss this conference!

Conference Session Schedule

Conference sessions will take place on Tuesday and Wednesday, November 6-7 and will feature more than 110 in-depth talks by industry experts. Conference session examples include:

  • Deep Dive on MySQL Databases on Amazon RDS – Chayan Biswas, AWS
  • MySQL 8.0 Performance: Scalability & Benchmarks – Dimitri Kravtchuk, Oracle
  • MySQL 8 New Features: Temptable engine – Pep Pla, Pythian
  • Artificial Intelligence Database Performance Tuning – Roel Van de Paar, Percona
  • pg_chameleon – MySQL to PostgreSQL replica made easy – Federico Campoli, Loxodata
  • Highway to Hell or Stairway to Cloud? – Alexander Kukushkin, Zalando
  • Zero to Serverless in 60 Seconds – Sandor Maurice, AWS
  • A Year in Google Cloud – Carmen Mason, Alan Mason, Vital Source Technologies
  • Advanced MySQL Data at Rest Encryption in Percona Server for MySQL – Iwo Panowicz, Percona, and Bart?omiej Ole?, Severalnines
  • Monitoring Kubernetes with Prometheus – Henri Dubois-Ferriere, Sysdig
  • How We Use and Improve Percona XtraBackup at Alibaba Cloud – Bo Wang, Alibaba Cloud
  • Shard 101 – Adamo Tonete, Percona
  • Top 10 Mistakes When Migrating From Oracle to PostgreSQL – Jim Mlodgenski, AWS
  • Explaining the Postgres Query Optimizer – Bruce Momjian, EnterpriseDB
  • MariaDB 10.3 Optimizer and Beyond – Vicentiu Ciorbaru, MariaDB FoundationHA and Clustering Solution: ProxySQL as an Intelligent Router for Galera and Group Replication – René Cannaò, ProxySQL
  • MongoDB WiredTiger WriteConflicts – Paul Agombin, ObjectRocket
  • PostgreSQL Enterprise Features – Michael Banck, credativ GmbH
  • What’s New in MySQL 8.0 Security – Georgi Kodinov, Oracle
  • The MariaDB Foundation and Security – Finding and Fixing Vulnerabilities the Open Source Way – Otto Kekäläinen, MariaDB Foundation
  • ClickHouse 2018: How to Stop Waiting for Your Queries to Complete and Start Having Fun – Alexander Zaitsev, Altinity
  • Open Source Databases and Non-Volatile Memory – Frank Ober, Intel Memory Group
  • MyRocks Production Case Studies at Facebook – Yoshinori Matsunobu, Facebook
  • Need for Speed: Boosting Apache Cassandra’s Performance Using Netty – Dinesh Joshi, Apache Cassandra
  • Demystifying MySQL Replication Crash Safety – Jean-François Gagné, Messagebird

See the full list of sessions

Tutorial schedule

Tutorials will take place throughout the day on Monday, November 5, 2018. Tutorial session examples include:

  • Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics – Jaime Crespo, Wikimedia Foundation
  • ElasticSearch 101 – Antonios Giannopoulos, ObjectRocket
  • MySQL InnoDB Cluster in a Nutshell: The Saga Continues with 8.0 – Frédéric Descamps, Oracle
  • High Availability PostgreSQL and Kubernetes with Google Cloud – Alexis Guajardo, Google
  • Best Practices for High Availability – Alex Rubin and Alex Poritskiy, Percona

See the full list of tutorials.


We are grateful for the support of our sponsors:

  • Platinum – AWS
  • Silver – Altinity, PingCap
  • Start Up – Severalnines
  • Branding – Intel, Idera
  • Expo – Postgres EU

If you would like to join them Sponsorship opportunities for Percona Live Open Source Database Conference Europe 2018 are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

Ready to register? What are you waiting for? Costs will only get higher!
Register now!




Percona Server for MongoDB 3.4.17-2.15 Is Now Available

Percona Server for MongoDB

Percona Server for MongoDB 3.4Percona announces the release of Percona Server for MongoDB 3.4.17-2.15 on September 27, 2018. Download the latest version from the Percona website or the Percona Software Repositories.

Percona Server for MongoDB 3.4 is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.4 Community Edition. It supports MongoDB 3.4 protocols and drivers.

Percona Server for MongoDB extends MongoDB Community Edition functionality by including the Percona Memory Engine and MongoRocks storage engines, as well as several enterprise-grade features:

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.17. There are no additional improvements or new features on top of those upstream fixes.

The post Percona Server for MongoDB 3.4.17-2.15 Is Now Available appeared first on Percona Database Performance Blog.


Automating MongoDB Log Rotation

MongoDB Log Rotation

MongoDB Log RotationIn this blog post, we will look at how to do MongoDB® log rotation in the right—and simplest—way.

Log writing is important for any application to track history. But when the log file size grows larger, it can cause disk space issues. For database servers especially, it may cause performance issues as the database needs to write to a large file continuously. By scheduling regular log rotation, we can avoid such situations proactively and keep the log file size below a predetermined threshold.

MongoDB Log File

In MongoDB, the log is not rotated automatically so we need to rotate it manually. Usually, the log size of the MongoDB server depends on the level of information configured and the slow log configured. By default, commands taking more than 100ms, or whatever the value set for the slowOpThresholdMs parameter, are written into the MongoDB log file. Let’s see how to automate log rotation on Linux based servers.

Rotate MongoDB Log Methods

The following two methods could be used to rotate the log in MongoDB. The first uses the command shown below from within mongo shell:

> use admin
> db.adminCommand( { logRotate : 1 } )

The alternative is to use SIGUSR1 signal to rotate the logs for a single process in Linux/Unix-based systems:

# kill -SIGUSR1 ${pidof mongod)

The behaviour of log rotation in MongoDB differs according to the value of the parameter logRotate which was introduced in version 3.0 (note that this should not be confused with the logRotate command that we’ve seen above). The two values are:

  • rename – renames the log file and creates a new file specified by the logpath parameter to write further logs
  • reopen – closes and reopens the log file following the typical Linux/Unix log rotation behavior. You also need to enable logAppend if you choose reopen. You should use reopen when using the Linux/Unix log rotate utility to avoid log loss.

In versions 2.6 and earlier, the default behavior when issuing the logRotate command is the same as when using rename, i.e. it renames the original log file from mongod.log to mongod.log.xxxx-xx-xxTxx-xx-xx format where x is filled with the rotation date time, and creates a new log file mongod.log to continue to write logs. You can see an example of this below:

mongodb-osx-x86_64-3.2.19/bin/mongod --fork \
  --dbpath=/usr/local/opt/mongo/data6 \
  --logpath=/usr/local/opt/mongo/data6/mongod.log \
  --logappend --logRotate rename
about to fork child process, waiting until server is ready for connections.
forked process: 59324
child process started successfully, parent exiting
ls -ltrh data6/mongod.log*
-rw-r--r-- 1 vinodhkrish admin 4.9K Sep 14 16:57 mongod.log
MongoDB shell version: 3.2.19
connecting to: test
> db.adminCommand( { logRotate : 1 } )
{ "ok" : 1 }
> exit
ls -ltrh data6/mongod.log*
-rw-r--r-- 1 vinodhkrish admin 4.9K Sep 14 16:57 mongod.log.2018-09-14T11-29-54
-rw-r--r-- 1 vinodhkrish admin 1.0K Sep 14 16:59 mongod.log

Using the above method, we need to compress and move the rotated log file manually. You can of course write a script to automate this. But when using the logrotate=reopen option, the mongod.log is just closed and opened again. In this case, you need to use the command alongside with Linux’s logrotate utility to avoid the loss of log writing in the course of the log rotation operation. We will see more about this in the next section.

Automating MongoDB logRotate using the logrotate utility

I wasn’t a fan of this second method for long time! But MongoDB log rotation seems to work well when using Linux/Unix’s logrotate tool. Now I prefer this approach, since it doesn’t need the complex script writing that’s needed for the first log rotation method described above. Let’s see in detail how to configure log rotation with Linux/Unix’s logrotate utility.

MongoDB 3.x versions

Start MongoDB with the following options:

  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
  logRotate: reopen
  pidFilePath: /var/run/mongodb/mongod.pid

As mentioned in the section about Linux’s logrotation utility, you need to create a separate config file /etc/logrotate.d/mongod.conf for MongoDB’s log file rotation. Add the content shown below into that config file:

/var/log/mongodb/mongod.log {
  size 100M
  rotate 10
  create 640 mongod mongod
    /bin/kill -SIGUSR1 `cat /var/run/mongodb/mongod.pid 2>/dev/null` >/dev/null 2>&1

In this config file, we assume that log path is set as /var/log/mongodb/mongod.log in /etc/mongod.conf file, and instruct Linux’s logrotation  utility to do the following:

  • Check the size, and start rotation if the log file is greater than 100M
  • Move the mongod.log file to mongod.log.1
  • Create a new mongod.log file with mongod permissions
  • Compress the files from mongod.log.2 but retain up to mongod.log.10 as per delaycompress and rotate 10
  • MongoDB continues to write to the old file mongod.log.1 (based on Linux’s inode) – remember that now there is no mongod.log file
  • In postrotate, it sends the kill -SIGUSR1 signal to mongod mentioned with the pid file, and thus mongod creates the mongod.log and starts writing to it. So make sure you have the pid file path set to the same as pidFilepath from the /etc/mongod.conf file

Please test the logrotate manually using the created /etc/logrotate.d/mongod.conf file to make sure it is working as expected. Here’s how:

cd /var/log/mongodb/
ls -ltrh
total 4.0K
-rw-r-----. 1 mongod mongod 1.3K Sep 14 12:45 mongod.log
logrotate -f /etc/logrotate.d/mongod
ls -ltrh
total 8.0K
-rw-r-----. 1 mongod mongod 1.4K Sep 14 12:58 mongod.log.1
-rw-r-----. 1 mongod mongod 1.3K Sep 14 12:58 mongod.log
logrotate -f /etc/logrotate.d/mongod
ls -ltrh
total 12K
-rw-r-----. 1 mongod mongod 491 Sep 14 12:58 mongod.log.2.gz
-rw-r-----. 1 mongod mongod 1.4K Sep 14 12:58 mongod.log.1
-rw-r-----. 1 mongod mongod 1.3K Sep 14 12:58 mongod.log


Adaptations for MongoDB 2.x and earlier, or when using logRotate=rename

Since the introduction of logRotate parameter in MongoDB 3.0, the log rotate script needs an extra step when you are using logRotate=rename or when using <=2.x versions.

Start the MongoDB with the following options (for 2.4 and earlier, ) :


Start MongoDB with the following options YAML format (YAML config introduced from version 2.6) :

  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
  pidFilePath: /var/run/mongodb/mongod.pid

The config file /etc/logrotate.d/mongod.conf for MongoDB’s log file rotation should be set up like this:

/var/log/mongodb/mongod.log {
  size 100M
  rotate 10
  create 640 mongod mongod
    /bin/kill -SIGUSR1 `cat /var/run/mongodb/mongod.pid 2>/dev/null` >/dev/null 2>&1
    find /var/log/mongodb -type f -size 0 -regextype posix-awk \
-regex "^\/var\/log\/mongodb\/mongod\.log\.[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}-[0-9]{2}-[0-9]{2}$" \
-execdir rm {} \; >/dev/null 2>&1

In this case, the logrotate utility behaves as follows:

  • Check for the size, and start rotation if the log file size exceeds 100M
  • Move mongod.log file to mongod.log.1
  • Create a new mongod.log file with mongod permissions
  • MongoDB continues to write to the old file, mongod.log.1
  • In postrotate, when the SIGUSR1 signal is sent, mongod rotates the log file. This includes renaming the new mongod.log file (0 bytes) created by logrotate to mongod.log.xxxx-xx-xxTxx-xx-xx format and creating a new mongod.log file to which, now, mongod starts writing the logs.
  • the Linux command

      identifies mongod.log.xxxx-xx-xxTxx-xx-xx file formats that are sized at 0 bytes, and these are removed

If you enjoyed this blog…

You might also benefit from this recorded webinar led by my colleague Tim Vaillancourt MongoDB Backup and Recovery Field Guide or perhaps the Percona Solution Brief Security for MongoDB.


The post Automating MongoDB Log Rotation appeared first on Percona Database Performance Blog.


Upcoming Webinar Thurs 9/27: Best Practices Using Indexes in MongoDB

Best Practices Using Indexes in MongoDB

Best Practices Using Indexes in MongoDBPlease join Percona’s Principal Architect Alex Rubin as he presents Best Practices Using Indexes in MongoDB on Thursday, September 27th at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).


Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement.

In this webinar we’ll discuss the following:

  • How MongoDB uses indexes
  • What types of indexes exist in MongoDB
  • How to find slow queries
  • Which index to create to speed up your queries and how to verify that the index will be used
  • How to monitor and evaluate the performance of your indexes

Register for this webinar on Best Practices Using Indexes in MongoDB.

The post Upcoming Webinar Thurs 9/27: Best Practices Using Indexes in MongoDB appeared first on Percona Database Performance Blog.


Microsoft updates its planet-scale Cosmos DB database service

Cosmos DB is undoubtedly one of the most interesting products in Microsoft’s Azure portfolio. It’s a fully managed, globally distributed multi-model database that offers throughput guarantees, a number of different consistency models and high read and write availability guarantees. Now that’s a mouthful, but basically, it means that developers can build a truly global product, write database updates to Cosmos DB and rest assured that every other user across the world will see those updates within 20 milliseconds or so. And to write their applications, they can pretend that Cosmos DB is a SQL- or MongoDB-compatible database, for example.

CosmosDB officially launched in May 2017, though in many ways it’s an evolution of Microsoft’s existing Document DB product, which was far less flexible. Today, a lot of Microsoft’s own products run on CosmosDB, including the Azure Portal itself, as well as Skype, Office 365 and Xbox.

Today, Microsoft is extending Cosmos DB with the launch of its multi-master replication feature into general availability, as well as support for the Cassandra API, giving developers yet another option to bring existing products to CosmosDB, which in this case are those written for Cassandra.

Microsoft now also promises 99.999 percent read and write availability. Previously, it’s read availability promise was 99.99 percent. And while that may not seem like a big difference, it does show that after more of a year of operating Cosmos DB with customers, Microsoft now feels more confident that it’s a highly stable system. In addition, Microsoft is also updating its write latency SLA and now promises less than 10 milliseconds at the 99th percentile.

“If you have write-heavy workloads, spanning multiple geos, and you need this near real-time ingest of your data, this becomes extremely attractive for IoT, web, mobile gaming scenarios,” Microsoft CosmosDB architect and product manager Rimma Nehme told me. She also stressed that she believes Microsoft’s SLA definitions are far more stringent than those of its competitors.

The highlight of the update, though, is multi-master replication. “We believe that we’re really the first operational database out there in the marketplace that runs on such a scale and will enable globally scalable multi-master available to the customers,” Nehme said. “The underlying protocols were designed to be multi-master from the very beginning.”

Why is this such a big deal? With this, developers can designate every region they run Cosmos DB in as a master in its own right, making for a far more scalable system in terms of being able to write updates to the database. There’s no need to first write to a single master node, which may be far away, and then have that node push the update to every other region. Instead, applications can write to the nearest region, and Cosmos DB handles everything from there. If there are conflicts, the user can decide how those should be resolved based on their own needs.

Nehme noted that all of this still plays well with CosmosDB’s existing set of consistency models. If you don’t spend your days thinking about database consistency models, then this may sound arcane, but there’s a whole area of computer science that focuses on little else but how to best handle a scenario where two users virtually simultaneously try to change the same cell in a distributed database.

Unlike other databases, Cosmos DB allows for a variety of consistency models, ranging from strong to eventual, with three intermediary models. And it actually turns out that most CosmosDB users opt for one of those intermediary models.

Interestingly, when I talked to Leslie Lamport, the Turing award winner who developed some of the fundamental concepts behind these consistency models (and the popular LaTeX document preparation system), he wasn’t all that sure that the developers are making the right choice. “I don’t know whether they really understand the consequences or whether their customers are going to be in for some surprises,” he told me. “If they’re smart, they are getting just the amount of consistency that they need. If they’re not smart, it means they’re trying to gain some efficiency and their users might not be happy about that.” He noted that when you give up strong consistency, it’s often hard to understand what exactly is happening.

But strong consistency comes with its drawbacks, too, which leads to higher latency. “For strong consistency there are a certain number of roundtrip message delays that you can’t avoid,” Lamport noted.

The CosmosDB team isn’t just building on some of the fundamental work Lamport did around databases, but it’s also making extensive use of TLA+, the formal specification language Lamport developed in the late 90s. Microsoft, as well as Amazon and others, are now training their engineers to use TLA+ to describe their algorithms mathematically before they implement them in whatever language they prefer.

“Because [CosmosDB is] a massively complicated system, there is no way to ensure the correctness of it because we are humans, and trying to hold all of these failure conditions and the complexity in any one person’s — one engineer’s — head, is impossible,” Microsoft Technical Follow Dharma Shukla noted. “TLA+ is huge in terms of getting the design done correctly, specified and validated using the TLA+ tools even before a single line of code is written. You cover all of those hundreds of thousands of edge cases that can potentially lead to data loss or availability loss, or race conditions that you had never thought about, but that two or three years ago after you have deployed the code can lead to some data corruption for customers. That would be disastrous.”

“Programming languages have a very precise goal, which is to be able to write code. And the thing that I’ve been saying over and over again is that programming is more than just coding,” Lamport added. “It’s not just coding, that’s the easy part of programming. The hard part of programming is getting the algorithms right.”

Lamport also noted that he deliberately chose to make TLA+ look like mathematics, not like another programming languages. “It really forces people to think above the code level,” Lamport noted and added that engineers often tell him that it changes the way they think.

As for those companies that don’t use TLA+ or a similar methodology, Lamport says he’s worried. “I’m really comforted that [Microsoft] is using TLA+ because I don’t see how anyone could do it without using that kind of mathematical thinking — and I worry about what the other systems that we wind up using built by other organizations — I worry about how reliable they are.”

more Microsoft Ignite 2018 coverage

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com