Feb
27
2017
--

Ambitious Alibaba takes aim at the kings of cloud computing

When you think of the biggest cloud players in the world, one company you might not consider is Alibaba, the Chinese e-commerce giant that held a record $25 billion U.S. IPO in 2014. Alibaba entered the cloud computing business in 2009, just three years after Amazon launched its cloud division, AWS — and Alibaba’s cloud computing efforts are among the ambitious projects that…

Feb
27
2017
--

Workfit raises $5.5 million seed round to be your AI meeting assistant

Conversational AI is pushing deeper into enterprise with Workfit, a new startup promising to make conference call follow-ups and mid-meeting CRM updates as easy as playing a song or checking the weather on Google Home or Amazon Echo. Battery Ventures, Greycroft Partners, Salesforce Ventures and a number of angels joined together to finance a $5.5 million seed investment in the…

Feb
25
2017
--

Veeva defied detractors when it launched a cloud life sciences biz a decade ago

It’s safe to say that Veeva is not a household name, but 10 years ago this month the company launched the first of a set of enterprise tools, like content management and CRM, designed for the unique requirements of the pharmaceutical and life sciences industry. You have to remember, this was a fairly revolutionary notion a decade ago when very few companies were running enterprise…

Feb
24
2017
--

Installing Percona Monitoring and Management (PMM) for the First Time

This post is another in the series on Percona’s MongoDB 3.4 bundle release. This post is meant to walk a prospective user through the benefits of Percona Monitoring and Management (PMM), how it’s architected and the simple install process. By the end of this post, you should have a good idea of what PMM is, where it can add value in your environment and how you can get PMM going quickly.

Percona Monitoring and Management (PMM) is Percona’s open-source tool for monitoring and alerting on database performance and the components that contribute to it. PMM monitors MySQL (Percona Server and MySQL CE), Amazon RDS/Aurora, MongoDB (Percona Server and MongoDB CE), Percona XtraDB/Galera Cluster, ProxySQL, and Linux.

What is it?

Percona Monitoring and Management is an amalgamation of exciting, best-in-class open-source tools and Percona “engineering wizardry,” designed to make it easier to monitor and manage your environment. The real value to our users is the time we’ve spent integrating the tools, plus the pre-built dashboards we’ve constructed that leverage our years of performance optimization experience. What you get is a tool that is ready to go out of the box and installs in minutes. If you’re still not convinced: like ALL Percona software, it’s completely FREE!

Sound good? I can hear you nodding your head. Let’s take a quick look at the architecture.

What’s it made of?

PMM, at a high level, is made up of two basic components: the client and the server. The PMM client is installed on the database servers themselves and is used to collect metrics. It contains technology-specific exporters (which collect and export data) and an “admin interface” (which makes managing the PMM platform very simple). The PMM server is a “pre-integrated unit” (Docker, VM or AWS AMI) containing four components that gather the metrics from the exporters on the PMM client(s): Consul, Grafana, Prometheus and a Query Analytics engine that Percona has developed. There is a graphic in the architecture section of our documentation; in order to keep this post to a manageable length, please refer to that page if you’d like a more in-depth explanation.

How do I use it?

PMM is very easy to access once it has been installed (more on the install process below). Simply open the web browser of your choice and connect to the PMM landing page by typing http://<ip_address_of_PMM_server>. That takes you to the PMM landing page, where you can access all of PMM’s tools. If you’d like to get a look into the user experience, we’ve set up a great demo site so you can easily test it out.

Where should I use it?

There’s a good chance that you already have a monitoring/alerting platform for your production workloads. If not, you should set one up immediately and start analyzing trends in your environment. If you’re confident in your production monitoring solution, there is still a use for PMM in an often overlooked area: development and testing.

When speaking with users, we often hear that their development and test environments run their most demanding workloads. This is often due to stress testing and benchmarking. The goal of these workloads is usually to break something. This allows you to set expectations for normal, and thus abnormal, behavior in your production environment. Once you have a good idea of what’s “normal” and the critical factors involved, you can alert around those parameters to identify “abnormal” patterns before they cause user issues in production. The reason that monitoring is critical in your dev/test environment(s) is that you want to easily spot inflection points in your workload, which signal impending disaster. Dashboards are the easiest way for humans to consume and analyze this data.

Are you sold? Let’s get to the easiest part: installation.

How do you install it?

PMM is very easy to install and configure for two main reasons. The first is that the components (mentioned above) take some time to install, so we spent the time to integrate everything and ship it as a unit: one server install and a client install per host. The second is that we’re targeting customers looking to monitor MySQL and MongoDB installations for high-availability and performance. The fact that it’s a targeted solution makes pre-configuring it to monitor for best practices much easier. I believe we’ve all seen a particular solution that tries to do a little of everything, and thus actually does no particular thing well. This is the type of tool that we DO NOT want PMM to be. Now, onto the installation procedure.

There are four basic steps to get PMM monitoring your infrastructure. I do not want to recreate the Deployment Guide, in order to maintain the future relevancy of this post; instead, I’ll link to the relevant sections of the documentation so you can cut to the chase. Underneath each step, I’ll list some key takeaways that will save you time now and in the future, and a consolidated command-line sketch follows the list.

  1. Install the integrated PMM server in the flavor of your choice (Docker, VM or AWS AMI)
    1. Percona recommends Docker to deploy PMM server as of v1.1
      1. As of right now, using Docker will make the PMM server upgrade experience seamless.
      2. Using the default version of Docker from your package manager may cause unexpected behavior. We recommend using the latest stable version from Docker’s repositories (instructions from Docker).
    2. PMM server AMI and VM are “experimental” in PMM v1.1
    3. When you open the “Metrics Monitor” for the first time, it will ask for credentials (user: admin pwd: admin).
  2. Install the PMM client on every database instance that you want to monitor.
    1. Install with your package manager for easier upgrades when a new version of PMM is released.
  3. Connect the PMM client to the PMM Server.
    1. Think of this step as sending configuration information from the client to the server. This means you are telling the client the address of the PMM server, not the other way around.
  4. Start data collection services on the PMM client.
    1. Collection services are enabled per database technology (MySQL, MongoDB, ProxySQL, etc.) on each database host.
    2. Make sure to set permissions for the PMM client to monitor the database; otherwise you will see an error like: Cannot connect to MySQL: Error 1045: Access denied for user ‘jon’@’localhost’ (using password: NO)
      1. Setting proper credentials uses this syntax: sudo pmm-admin add <service_type> --user xxxx --password xxxx
    3. There’s good information about PMM client options in the “Managing PMM Client” section of the documentation for advanced configurations/troubleshooting.
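To make the steps above concrete, here is a minimal end-to-end sketch. It assumes Docker is already installed, uses the PMM 1.1.1 image tag, and uses placeholder values for the server IP address and MySQL credentials; adjust all of these for your environment.

# On the monitoring host: pull the server image, create the data container, start PMM server
docker pull percona/pmm-server:1.1.1
docker create -v /opt/prometheus/data -v /opt/consul-data -v /var/lib/mysql -v /var/lib/grafana \
    --name pmm-data percona/pmm-server:1.1.1 /bin/true
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:1.1.1

# On each database host, after installing the pmm-client package:
sudo pmm-admin config --server 192.168.1.10       # point the client at the PMM server
sudo pmm-admin add mysql --user pmm --password pmm-pass
sudo pmm-admin list                               # sanity check: all services running?

The final pmm-admin list is a quick way to confirm that every collection service started and is connected to the server.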

What’s next?

That’s really up to you, and what makes sense for your needs. However, here are a few suggestions to get the most out of PMM.

  1. Set up alerting in Grafana on the PMM server. This is still an experimental function in Grafana, but it works. I’d start with Barrett Chambers’ post on setting up email alerting, and refine it with Peter Zaitsev’s post (a sample SMTP configuration sketch follows this list).
  2. Set up more hosts to test the full functionality of PMM. We have completely free, high-performance versions of MySQL, MongoDB, Percona XtraDB Cluster (PXC) and ProxySQL (for MySQL proxy/load balancing).
  3. Start load testing the database with benchmarking tools to build your troubleshooting skills. Try to break something to learn what troubling trends look like. When you find them, set up alerts to give you enough time to fix them.
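As a starting point for suggestion 1, email alerting requires an SMTP section in Grafana’s configuration file (grafana.ini inside the PMM server container). A minimal sketch, where the host, user, password and addresses are all placeholders:

[smtp]
enabled = true
host = smtp.example.com:587
user = alerts@example.com
password = your-smtp-password
from_address = pmm-alerts@example.com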
Feb
24
2017
--

Quest for Better Replication in MySQL: Galera vs. Group Replication

UPDATE: Some of the language in the original post was considered overly critical of Oracle by some community members. That was not my intent, and I’ve modified the language to be less so. I’ve also changed the term “synchronous” (which was inaccurate and misleading here) to “virtually synchronous.” This term is more accurate, is already used by both technologies’ founders, and should be less misleading.

I also wanted to thank Jean-François Gagné for pointing out the incorrect sentence about multi-threaded slaves in Group Replication, which I also corrected accordingly.

In today’s blog post, I will briefly compare two major virtually synchronous replication technologies available today for MySQL.

More Than Asynchronous Replication

Thanks to the Galera plugin, developed by the Codership team, we’ve had the choice between asynchronous and virtually synchronous replication in the MySQL ecosystem for quite a few years already. Moreover, we can choose from at least three software providers: Codership, MariaDB and Percona, each with its own Galera implementation.

The situation recently became much more interesting when MySQL Group Replication went into GA (stable) stage in December 2016.

Oracle, the upstream MySQL provider, introduced its own replication implementation that is very similar in concept. Unlike the others mentioned above, it isn’t based on Galera: Group Replication was built from the ground up as a new solution, though it shares many concepts with Galera. This post doesn’t cover MySQL Cluster, a separate and fully synchronous solution that existed long before Galera; it is a much different product for different use cases.

In this post, I will point out a couple of interesting differences between Group Replication and Galera, which hopefully will be helpful to those considering switching from one to another (or if they are planning to test them).

This is certainly not a full list of all the differences, but rather things I found interesting during my explorations.

It is also important to know that Group Replication evolved a lot before it went GA (its whole cluster layer was replaced). I won’t cover how things looked before the GA stage, and will concentrate on the latest available version, 5.7.17. Nor will I spend much time on how Galera implementations looked in the past; I will use Percona XtraDB Cluster 5.7 as a reference.

Multi-Master vs. Master-Slave

Galera has always been multi-master by default, so it does not matter to which node you write. Many users use a single writer due to workload specifics and multi-master limitations, but Galera has no single master mode per se.

Group Replication, on the other hand, promotes just one member as primary (master) by default, and other members are put into read-only mode automatically. This is what happens if we try to change data on non-master node:

mysql> truncate test.t1;
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

To change from single-primary mode to multi-primary (multi-master), you have to start Group Replication with the group_replication_single_primary_mode variable disabled.
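As a sketch, the relevant my.cnf settings for multi-primary mode would look like this (the loose- prefix lets mysqld start even before the plugin is installed; the second variable enables the extra safety checks that multi-primary operation needs):

# Multi-primary Group Replication (must be set before the group is bootstrapped)
loose-group_replication_single_primary_mode = OFF
loose-group_replication_enforce_update_everywhere_checks = ON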
Another interesting fact is that you have no influence over which cluster member becomes the master in single-primary mode: the cluster auto-elects it. You can only check it with a query:

mysql> SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME like 'group_replication%';
+----------------------------------+--------------------------------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 329333cd-d6d9-11e6-bdd2-0242ac130002 |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

Or just:

mysql> show status like 'group%';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 329333cd-d6d9-11e6-bdd2-0242ac130002 |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)

To show the hostname instead of the UUID:

mysql> select member_host as "primary master" from performance_schema.global_status join performance_schema.replication_group_members where variable_name='group_replication_primary_member' and member_id=variable_value;
+----------------+
| primary master |
+----------------+
| f18ff539956d   |
+----------------+
1 row in set (0.00 sec)

Replication: Majority vs. All

Galera delivers write transactions synchronously to ALL nodes in the cluster. (Later, applying happens asynchronously in both technologies.) However, Group Replication needs just a majority of the nodes confirming the transaction. This means a transaction commit on the writer succeeds and returns to the client even if a minority of nodes still have not received it.

In the example of a three-node cluster, if one node crashes or loses the network connection, the two others continue to accept writes (or just the primary node in Single-Primary mode) even before a faulty node is removed from the cluster.

If the separated node is the primary one, it denies writes due to the lack of a quorum (it will report the error ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.). If one of the remaining nodes has a quorum, it will be elected primary once the faulty node is removed from the cluster, and will then accept writes.

With that said, the “majority” rule in Group Replication means there is no guarantee that you won’t lose data if the majority of nodes is lost: those nodes could have applied transactions that were never delivered to the surviving minority at the moment they crashed.

In Galera, a single node’s network interruption makes the other nodes wait for it, and pending writes can be committed once either the connection is restored or the faulty node is removed from the cluster after a timeout. So the chance of losing data in a similar scenario is lower, as transactions always reach all nodes. Data can be lost in Percona XtraDB Cluster only in a truly unlucky scenario: a network split happens, the remaining majority of nodes form a quorum, the cluster reconfigures and allows new writes, and then shortly afterwards the majority part is damaged.

Schema Requirements

For both technologies, one of the requirements is that all tables must be InnoDB and have a primary key. This requirement is now enforced by default in both Group Replication and Percona XtraDB Cluster 5.7. Let’s look at the differences.

Percona XtraDB Cluster:

mysql> create table nopk (a char(10));
Query OK, 0 rows affected (0.08 sec)
mysql> insert into nopk values ("aaa");
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits use of DML command on a table (test.nopk) without an explicit primary key with pxc_strict_mode = ENFORCING or MASTER
mysql> create table m1 (id int primary key) engine=myisam;
Query OK, 0 rows affected (0.02 sec)
mysql> insert into m1 values(1);
ERROR 1105 (HY000): Percona-XtraDB-Cluster prohibits use of DML command on a table (test.m1) that resides in non-transactional storage engine with pxc_strict_mode = ENFORCING or MASTER
mysql> set global pxc_strict_mode=0;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into nopk values ("aaa");
Query OK, 1 row affected (0.00 sec)
mysql> insert into m1 values(1);
Query OK, 1 row affected (0.00 sec)

Before Percona XtraDB Cluster 5.7 (or in other Galera implementations), there were no such enforced restrictions. Users unaware of these requirements often ended up with problems.

Group Replication:

mysql> create table nopk (a char(10));
Query OK, 0 rows affected (0.04 sec)
mysql> insert into nopk values ("aaa");
ERROR 3098 (HY000): The table does not comply with the requirements by an external plugin.
2017-01-15T22:48:25.241119Z 139 [ERROR] Plugin group_replication reported: 'Table nopk does not have any PRIMARY KEY. This is not compatible with Group Replication'
mysql> create table m1 (id int primary key) engine=myisam;
ERROR 3161 (HY000): Storage engine MyISAM is disabled (Table creation is disallowed).

I am not aware of any way to disable these restrictions in Group Replication.

GTID

Galera has its own Global Transaction ID, which has existed since MySQL 5.5 and is independent from MySQL’s GTID feature introduced in MySQL 5.6. If MySQL’s GTID is enabled on a Galera-based cluster, both numbering schemes exist, each with its own sequences and UUIDs.

Group Replication is based on MySQL’s native GTID feature and relies on it. Interestingly, a separate sequence block range (1M by default) is pre-assigned to each cluster member.
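The block size is controlled by a dedicated variable, so you can verify (or change) the pre-assigned range; with the default setting you should see something like:

mysql> SHOW GLOBAL VARIABLES LIKE 'group_replication_gtid_assignment_block_size';
+-----------------------------------------------+---------+
| Variable_name                                 | Value   |
+-----------------------------------------------+---------+
| group_replication_gtid_assignment_block_size  | 1000000 |
+-----------------------------------------------+---------+
1 row in set (0.00 sec)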

WAN Support

The MySQL Group Replication documentation isn’t very optimistic about WAN support, stating that “Low latency, high bandwidth network connections are a requirement” and that “Group Replication is designed to be deployed in a cluster environment where server instances are very close to each other, and is impacted by both network latency as well as network bandwidth.” These statements are found here and here. However, there is one network traffic optimization: message compression.

I don’t see any group communication level tunings available yet, like the evs.* series in Galera’s wsrep_provider_options.

Galera’s founders actually encourage trying it in geo-distributed environments, and some WAN-dedicated settings are available (the most important being WAN segments).
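For example, a node in a second datacenter could be marked as a separate WAN segment, with the evs.* timeouts relaxed for the higher latency. A sketch of the my.cnf line; the values here are illustrative, not recommendations:

wsrep_provider_options="gmcast.segment=2; evs.keepalive_period=PT3S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M"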

But both technologies need a reliable network for good performance.

State Transfers

Galera has two types of state transfers that allow syncing data to nodes when needed: incremental (IST) and full (SST). Incremental transfer is used when a node has been out of the cluster for some time and, once it rejoins, the other nodes still have the missing write sets in their Galera cache. A full SST is used when incremental transfer is not possible, especially when a new node is added to the cluster. SST automatically provisions the node with fresh data, taken as a snapshot from one of the running nodes (the donor). The most common SST method uses Percona XtraBackup, which takes a fast, non-blocking binary data snapshot (hot backup).
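In Percona XtraDB Cluster, the SST method and the credentials it uses on the donor are set in my.cnf; a typical XtraBackup-based sketch (the user and password are placeholders):

wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:s3cretPass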

In Group Replication, state transfers are fully based on binary logs with GTID positions. If there is no donor with all of the binary logs (including the ones needed by new nodes), a DBA has to provision the new node with an initial data snapshot first. Otherwise, the joiner fails with a very familiar error:

2017-01-16T23:01:40.517372Z 50 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236

The official documentation mentions that provisioning a node before adding it to the cluster may speed up joining (the recovery stage). Another difference: in the case of a state transfer failure, a Galera joiner aborts after the first try and shuts down its mysqld instance, whereas the Group Replication joiner falls back to another donor in an attempt to succeed. Here I found something slightly annoying: if no donor can satisfy the joiner’s demands, it keeps trying the same donors over and over, for a fixed number of attempts:

[root@cd81c1dadb18 /]# grep 'Attempt' /var/log/mysqld.log |tail
2017-01-16T22:57:38.329541Z 12 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2017-01-16T22:57:38.539984Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
2017-01-16T22:57:38.806862Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 3/10'
2017-01-16T22:58:39.024568Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 4/10'
2017-01-16T22:58:39.249039Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 5/10'
2017-01-16T22:59:39.503086Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 6/10'
2017-01-16T22:59:39.736605Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 7/10'
2017-01-16T23:00:39.981073Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 8/10'
2017-01-16T23:00:40.176729Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 9/10'
2017-01-16T23:01:40.404785Z 12 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 10/10'

After the last try, even though it fails, mysqld keeps running and allows client connections…

Auto Increment Settings

Galera adjusts the auto_increment_increment and auto_increment_offset values according to the number of members in the cluster. So, for a 3-node cluster, auto_increment_increment will be “3” and auto_increment_offset from “1” to “3” (depending on the node). If the number of nodes changes later, these are updated immediately. This feature can be disabled using the wsrep_auto_increment_control setting. If needed, these settings can also be set manually.

Interestingly, in Group Replication the auto_increment_increment seems to be fixed at 7, and only auto_increment_offset is set differently on each node. This is the case even in the default Single-Primary mode! It seems like a waste of available IDs, so make sure you adjust the group_replication_auto_increment_increment setting to a saner number before you start using Group Replication in production.
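You can confirm this behavior on any Group Replication member; with the defaults you should see something like the following (the offset will differ per node):

mysql> SHOW GLOBAL VARIABLES LIKE 'auto_increment%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| auto_increment_increment | 7     |
| auto_increment_offset    | 2     |
+--------------------------+-------+
2 rows in set (0.00 sec)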

Multi-Threaded Slave Side Applying

Galera developed its own multi-threaded slave feature, available even in 5.5 versions, for workloads that include tables in the same database. It is controlled with the wsrep_slave_threads variable. Group Replication uses a feature introduced in MySQL 5.7, where the number of applier threads is controlled with slave_parallel_workers. Galera parallelizes applying based on potential conflicts of changed/locked rows; Group Replication parallelism is based on an improved LOGICAL_CLOCK scheduler, which uses information from writeset dependencies. This can allow it to achieve much better results than normal asynchronous replication in MTS mode. More details can be found here: http://mysqlhighavailability.com/zooming-in-on-group-replication-performance/
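For reference, a sketch of the applier-parallelism settings on each side; the thread counts here are arbitrary examples, not recommendations:

# Galera (Percona XtraDB Cluster)
wsrep_slave_threads = 8

# Group Replication (MySQL 5.7 multi-threaded applier)
slave_parallel_workers = 8
slave_parallel_type = LOGICAL_CLOCK
slave_preserve_commit_order = ON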

Flow Control

Both technologies use a technique to throttle writes when nodes are slow in applying them. Interestingly, the default size of the allowed applier queue is very different in the two: in Galera, the gcs.fc_limit provider option defaults to just 16 write sets, while Group Replication’s group_replication_flow_control_applier_threshold defaults to 25000 transactions.

Moreover, Group Replication provides a separate certifier queue size that is also eligible to trigger flow control: group_replication_flow_control_certifier_threshold. One thing I found difficult is checking the actual applier queue size, as the only metric exposed via performance_schema.replication_group_member_stats is Count_Transactions_in_queue (which only shows the certifier queue).
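You can inspect all of the Group Replication flow control knobs in one place; with 5.7.17 defaults you should see something like:

mysql> SHOW GLOBAL VARIABLES LIKE 'group_replication_flow_control%';
+-----------------------------------------------------+-------+
| Variable_name                                       | Value |
+-----------------------------------------------------+-------+
| group_replication_flow_control_applier_threshold    | 25000 |
| group_replication_flow_control_certifier_threshold  | 25000 |
| group_replication_flow_control_mode                 | QUOTA |
+-----------------------------------------------------+-------+
3 rows in set (0.00 sec)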

Network Hiccup/Partition Handling

In Galera, when the network connection between nodes is lost, those who still have a quorum will form a new cluster view. Those who lost a quorum keep trying to re-connect to the primary component. Once the connection is restored, separated nodes will sync back using IST and rejoin the cluster automatically.

This doesn’t seem to be the case for Group Replication. Separated nodes that lose the quorum will be expelled from the cluster, and won’t join back automatically once the network connection is restored. In its error log we can see:

2017-01-17T11:12:18.562305Z 0 [ERROR] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
2017-01-17T11:12:18.631225Z 0 [Note] Plugin group_replication reported: 'getstart group_id ce427319'
2017-01-17T11:12:21.735374Z 0 [Note] Plugin group_replication reported: 'state 4330 action xa_terminate'
2017-01-17T11:12:21.735519Z 0 [Note] Plugin group_replication reported: 'new state x_start'
2017-01-17T11:12:21.735527Z 0 [Note] Plugin group_replication reported: 'state 4257 action xa_exit'
2017-01-17T11:12:21.735553Z 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
2017-01-17T11:12:21.735558Z 0 [Note] Plugin group_replication reported: 'new state x_start'

Its status changes to:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 329333cd-d6d9-11e6-bdd2-0242ac130002 | f18ff539956d |        3306 | ERROR        |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
1 row in set (0.00 sec)

It seems the only way to bring it back into the cluster is to manually restart Group Replication:

mysql> START GROUP_REPLICATION;
ERROR 3093 (HY000): The START GROUP_REPLICATION command failed since the group is already running.
mysql> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (5.00 sec)
mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (1.96 sec)
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 24d6ef6f-dc3f-11e6-abfa-0242ac130004 | cd81c1dadb18 |        3306 | ONLINE       |
| group_replication_applier | 329333cd-d6d9-11e6-bdd2-0242ac130002 | f18ff539956d |        3306 | ONLINE       |
| group_replication_applier | ae148d90-d6da-11e6-897e-0242ac130003 | 0af7a73f4d6b |        3306 | ONLINE       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
3 rows in set (0.00 sec)

Note that in the above output, after the network failure, Group Replication did not stop. It waits in an error state. Moreover, in Group Replication a partitioned node keeps serving dirty reads as if nothing happened (for non-super users):

cd81c1dadb18 {test} ((none)) > SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | 24d6ef6f-dc3f-11e6-abfa-0242ac130004 | cd81c1dadb18 |        3306 | ERROR        |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
1 row in set (0.00 sec)
cd81c1dadb18 {test} ((none)) > select * from test1.t1;
+----+-------+
| id | a     |
+----+-------+
|  1 | dasda |
|  3 | dasda |
+----+-------+
2 rows in set (0.00 sec)
cd81c1dadb18 {test} ((none)) > show grants;
+-------------------------------------------------------------------------------+
| Grants for test@%                                                             |
+-------------------------------------------------------------------------------+
| GRANT SELECT, INSERT, UPDATE, DELETE, REPLICATION CLIENT ON *.* TO 'test'@'%' |
+-------------------------------------------------------------------------------+
1 row in set (0.00 sec)

A privileged user can disable super_read_only, but still won’t be able to write:

cd81c1dadb18 {root} ((none)) > insert into test1.t1 set a="split brain";
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.
cd81c1dadb18 {root} ((none)) > select * from test1.t1;
+----+-------+
| id | a     |
+----+-------+
|  1 | dasda |
|  3 | dasda |
+----+-------+
2 rows in set (0.00 sec)

I found an interesting thing here, which I consider to be a bug. In this case, a partitioned node can actually perform DDL, despite the error:

cd81c1dadb18 {root} ((none)) > show tables in test1;
+-----------------+
| Tables_in_test1 |
+-----------------+
| nopk            |
| t1              |
+-----------------+
2 rows in set (0.01 sec)
cd81c1dadb18 {root} ((none)) > create table test1.split_brain (id int primary key);
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.
cd81c1dadb18 {root} ((none)) > show tables in test1;
+-----------------+
| Tables_in_test1 |
+-----------------+
| nopk            |
| split_brain     |
| t1              |
+-----------------+
3 rows in set (0.00 sec)

In a Galera-based cluster, you are automatically protected from this: a partitioned node refuses to allow both reads and writes. It throws the error ERROR 1047 (08S01): WSREP has not yet prepared node for application use. You can force dirty reads using the wsrep_dirty_reads variable.

There are many more subtle (and less subtle) differences between these technologies, but this blog post is long enough already. Maybe next time?

Article with Similar Subject

http://lefred.be/content/group-replication-vs-galera/

Feb
24
2017
--

SoFi confirms $500 million in new funding as it pushes beyond lending

Online finance startup SoFi got its start refinancing student loans but gradually has been adding other services to members. To expand into new regions and move closer to becoming a full-service financial services company, SoFi has confirmed that it raised an additional $500 million in equity financing led by Silver Lake.

Feb
24
2017
--

Communication platform Layer raises $15M and acquires interactive messaging startup Cola

Layer, the messaging platform that won TechCrunch’s Startup Battlefield back in 2013, is making two big announcements today: It’s raised $15 million in Series B funding, and it’s acquiring another startup, Cola. Layer makes it easy for businesses to add messaging capabilities to their iOS, Android and web products — customers include Trunk Club, Staples and Udacity. In…

Feb
23
2017
--

Percona MongoDB 3.4 Bundle Release: Percona Server for MongoDB 3.4 Features Explored

This blog post continues the series on the Percona MongoDB 3.4 bundle release. This release includes Percona Server for MongoDB, Percona Monitoring and Management, and Percona Toolkit. In this post, we’ll look at the features included in Percona Server for MongoDB.

I apologize for the long post, but there is a good deal of important information to cover: not just what the new features are, but also why they are so important. I have tried to break this down into clear areas, covering as much ground as possible while also linking to further reading on these topics.

The first and biggest new feature for many people is the addition of collation in MongoDB. Here is what Wikipedia says about collation:

Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books.

In other words, a collation is an ordering of characters for a given character set. Different languages order the alphabet differently, or even have different base characters (as in Asian, Middle Eastern and other non-Latin scripts). Collations are critical for multi-language support and for sorting non-English words for index ordering.
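As a quick illustration of where collation surfaces in 3.4 (at collection creation and per query), here is a hypothetical French-locale sketch; the collection and field names are made up:

// Create a collection with a default French collation (strength 2 = case-insensitive)
db.createCollection("contacts", { collation: { locale: "fr", strength: 2 } })

// Or apply a collation to a single query
db.contacts.find({ name: "rené" }).collation({ locale: "fr", strength: 2 })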

Sharding

General

All members of a cluster are now aware of sharding (all members, the sharding set name, etc.). Because of this, sharding.clusterRole must be defined on all shard nodes, which is a new requirement.

Mongos processes MUST connect to 3.4 mongod instances (shard and config nodes); connecting to 3.2 and lower is not possible.
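A minimal sketch of what the new requirement means for a shard node’s mongod configuration (YAML format; the replica set name is a placeholder):

sharding:
  clusterRole: shardsvr
replication:
  replSetName: rs-shard-01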

Config Servers

Balancer on Config Server PRIMARY

In MongoDB 3.4, the cluster balancer is moved from the mongos processes (any) to the config server PRIMARY member.

Moving to a config-server-based balancer has the following benefits:

Predictability: the balancer process is always the config server PRIMARY. Before 3.4, any mongos processes could become the balancer, often chosen at random. This made troubleshooting difficult.

Lighter “mongos” process: the mongos/shard router benefits from being as light and thin as possible. This removes some code and potential for breakage from “mongos.”

Efficiency: config servers have dedicated nodes with very low resource utilization and no direct client traffic, for the most part. Moving the balancer to the config server set moves usage away from critical “router” processes.

Reliability: balancing relies on fewer components. Now the balancer can operate on the “config” database metadata locally, without the chance of network interruptions breaking balancing.

Config servers are a more permanent member of a cluster, unlikely to scale up/down or often change, unlike “mongos” processes that may be located on app hosts, etc.

Config Server Replica Set Required

In MongoDB 3.4, the former “mirror” config server strategy (SCCC) is no longer supported. This means all sharded clusters must use a replica-set-based set of config servers.

Using a replica-set based config server set has the following benefits:

Adding and removing config servers is greatly simplified.

Config servers have oplogs (useful for investigations).

Simplicity/Consistency: removing mirrored/SCCC config servers simplifies the high-level and code-level architecture.

Chunk Migration / Balancing Example

(from docs.mongodb.com)

Parallel Migrations

Prior to MongoDB 3.4, the balancer could only perform a single chunk migration at any given time. When a chunk migrates, a “source” shard and a “destination” shard are chosen, and the balancer coordinates moving the chunks from the source to the destination. In a large cluster with many shards, this is inefficient: a migration only involves two shards, while a cluster may contain tens or hundreds of shards.

In MongoDB 3.4, the balancer can now perform many chunk migrations in parallel, as long as they do not involve the same source and destination shards. This means that in clusters with more than two shards, many chunk migrations can occur at the same time when they are mutually exclusive to one another. The effective outcome is (Number of Shards / 2) - 1 == maximum number of parallel migrations: in other words, an increase in the speed of the migration process.

For example, if you have ten shards, then 10/2 = 5 and 5-1 = 4, so you can have four concurrent moveChunk (balancing) actions.

Tags and Zone

Sharding Zones supersede tag-aware sharding. There are mostly no changes to the functionality: this is mostly a naming change, plus some new helper functions.

New commands/shell-methods added (a short usage sketch follows this list):

addShardToZone / sh.addShardToZone().

removeShardFromZone / sh.removeShardFromZone().

updateZoneKeyRange / sh.updateZoneKeyRange() + sh.removeRangeFromZone().
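A hypothetical usage sketch of the new helpers, using made-up shard, zone, and namespace names:

sh.addShardToZone("shard0000", "archive")
sh.updateZoneKeyRange("records.history", { created: MinKey }, { created: ISODate("2016-01-01") }, "archive")
// Later, to detach the shard from the zone:
sh.removeShardFromZone("shard0000", "archive")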

You might recall that MongoDB has long supported the idea of shard and replication tags. They break into two main areas: hardware-aware tags and access-pattern tags. The idea behind hardware-aware tags was that you could have one shard with slow disks and, as data ages, a process moves documents to a collection that lives on that shard (or you tell specific ranges to live on that shard), while your other, faster shards (and multiples of them) handle the high-speed processing of current data.

The other case is based more in replication, where you want to allow BI and other reporting systems to access your data without hurting your primary customer interactions. To do this, you could tag a node in a replica set as {reporting: true}, and all reporting queries would use this tag to avoid affecting the nodes where user-generated work lives. Zones are the same idea, simplified into a better-understood term. For now, there is no major difference between these areas, but it could be something to watch in the MongoDB 3.6 and 3.8 versions.

Replication

New “linearizable” Read Concern: reflects all successful writes issued with a “majority” and acknowledged before the start of the read operation.

Adjustable Catchup for Newly Elected Primary: the time limit for a newly elected primary to catch up with the other replica set members that might have more recent writes.

Write Concern Majority Journal Default replset-config option: determines the behavior of the { w: "majority" } write concern when the write concern does not explicitly specify the journal option j.
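This option lives in the replica set configuration document; a sketch of flipping it from the shell:

cfg = rs.conf()
cfg.writeConcernMajorityJournalDefault = false   // w:"majority" no longer implies j:true
rs.reconfig(cfg)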

Initial-sync improvements:

Now the initial sync builds the indexes as the documents are copied.

Improvements to the retry logic make it more resilient to intermittent failures on the network.

Data Types

MongoDB 3.4 adds support for the decimal128 format with the new decimal data type. The decimal128 format supports numbers with up to 34 decimal digits (i.e., significant digits) and an exponent range of -6143 to +6144.

When performing comparisons among different numerical types, MongoDB conducts a comparison of the exact stored numerical values without first converting values to a common type.

Unlike the double data type, which only stores an approximation of the decimal values, the decimal data type stores the exact value. For example, a decimal NumberDecimal("9.99") has a precise value of 9.99, whereas a double 9.99 would have an approximate value of 9.9900000000000002131628….

To test for the decimal type, use the $type operator with the literal "decimal" or 19:

db.inventory.find( { price: { $type: "decimal" } } )

The new number wrapper object type is used like this on insert:

db.inventory.insert( {_id: 1, item: "The Scream", price: NumberDecimal("9.99"), quantity: 4 } )

To use the new decimal data type with a MongoDB driver, an upgrade to a driver version that supports the feature is necessary.

Aggregation Changes

Stages

Recursive Search

MongoDB 3.4 introduces a stage to the aggregation pipeline that allows for recursive searches.

$graphLookup: Performs a recursive search on a collection. To each output document, it adds a new array field that contains the traversal results of the recursive search for that document.

Faceted Search

Faceted search allows for the categorization of documents into classifications. For example, given a collection of inventory documents, you might want to classify items by a single category (such as price range), or by multiple groupings (such as by price range as well as, separately, by department).

3.4 introduces stages to the aggregation pipeline that allow for faceted search.

$bucket: Categorizes or groups incoming documents into buckets that represent a range of values for a specified expression.

$bucketAuto: Categorizes or groups incoming documents into a specified number of buckets that constitute a range of values for a specified expression. MongoDB automatically determines the bucket boundaries.

$facet: Processes multiple pipelines on the input documents and outputs a document that contains the results of these pipelines. By specifying facet-related stages ($bucket, $bucketAuto, and $sortByCount) in these pipelines, $facet allows for multi-faceted search.

$sortByCount: Categorizes or groups incoming documents by a specified expression to compute the count for each group. Output documents are sorted in descending order by the count.

Reshaping Documents

MongoDB 3.4 introduces stages to the aggregation pipeline that facilitate replacing documents as well as adding new fields.

$addFields: Adds new fields to documents. The stage outputs documents that contain all existing fields from the input documents as well as the newly added fields.

$replaceRoot: Replaces a document with the specified document. You can specify a document embedded in the input document to promote the embedded document to the top level.

Count

MongoDB 3.4 introduces a new stage to the aggregation pipeline that facilitates counting documents.

$count: Returns a document that contains a count of the number of documents input to the stage.

Operators

Array Operators

$in: Returns a boolean that indicates if a specified value is in an array.

$indexOfArray: Searches an array for an occurrence of a specified value and returns the array index (zero-based) of the first occurrence.

$range: Returns an array whose elements are a generated sequence of numbers.

$reverseArray: Returns an output array whose elements are those of the input array, but in reverse order.

$reduce: Takes an array as input and applies an expression to each item in the array to return the final result of the expression.

$zip: Returns an output array where each element is itself an array, consisting of the elements at the corresponding index position from the input arrays.
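A small aggregation sketch combining a few of these operators, against a hypothetical scores collection where each document has tags (an array of strings) and points (an array of numbers):

db.scores.aggregate([
  { $project: {
      firstRed: { $indexOfArray: [ "$tags", "red" ] },   // zero-based index, or -1 if absent
      reversed: { $reverseArray: "$tags" },
      total:    { $reduce: { input: "$points", initialValue: 0,
                             in: { $add: [ "$$value", "$$this" ] } } }
  } }
])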

Date Operators

$isoDayOfWeek: Returns the ISO 8601 weekday number, ranging from 1 (for Monday) to 7 (for Sunday).

$isoWeek: Returns the ISO 8601 week number, which can range from 1 to 53. Week numbers start at 1 with the week (Monday through Sunday) that contains the year's first Thursday.

$isoWeekYear: Returns the ISO 8601 year number, where the year starts on the Monday of week 1 (ISO 8601) and ends with the Sunday of the last week (ISO 8601).

String Operators

$indexOfBytes: Searches a string for an occurrence of a substring and returns the UTF-8 byte index (zero-based) of the first occurrence.

$indexOfCP: Searches a string for an occurrence of a substring and returns the UTF-8 code point index (zero-based) of the first occurrence.

$split: Splits a string by a specified delimiter into string components and returns an array of the string components.

$strLenBytes: Returns the number of UTF-8 bytes for a string.

$strLenCP: Returns the number of UTF-8 code points for a string.

$substrBytes: Returns the substring of a string. The substring starts with the character at the specified UTF-8 byte index (zero-based) in the string, for the length specified.

$substrCP: Returns the substring of a string. The substring starts with the character at the specified UTF-8 code point index (zero-based) in the string, for the length specified.

Others/Misc

Other new operators:

$switch: Evaluates, in sequential order, the case expressions of the specified branches and enters the first branch for which the case expression evaluates to "true".

$collStats: Returns statistics regarding a collection or view.

$type: Returns a string which specifies the BSON type of the argument.

$project: Adds support for field exclusion in the output document. Previously, you could only exclude the _id field in the stage.

Views

MongoDB 3.4 adds support for creating read-only views from existing collections or other views. To specify or define a view, MongoDB 3.4 introduces:

    • the viewOn and pipeline options to the existing create command:
      • db.runCommand( { create: <view>, viewOn: <source>, pipeline: <pipeline> } )
    • or, if specifying a default collation for the view:
      • db.runCommand( { create: <view>, viewOn: <source>, pipeline: <pipeline>, collation: <collation> } )
    • and a corresponding mongo shell helper, db.createView():
      • db.createView(<view>, <source>, <pipeline>, <collation>)
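For instance, here is a hypothetical read-only view of cheap items built with the new shell helper (the collection and field names are made up):

db.createView("cheapItems", "inventory", [ { $match: { price: { $lt: NumberDecimal("10.00") } } } ])
db.cheapItems.find()   // reads run against the underlying pipeline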

For more information on creating views, see Views.

Feb
23
2017
--

DeepCoder builds programs using code it finds lying around

Like all great programmers I get most of my code from StackOverflow questions. Can’t figure out how to add authentication to Flask? Easy. Want to shut down sendmail? Boom. Now, thanks to all the code on the Internet, a robot can be as smart as a $180,000 coder. The system, called DeepCoder, basically searches a corpus of code to build a project that works to spec. It’s been used…

Feb
23
2017
--

FOSSA scores $2.2M to help developers manage open source licenses

FOSSA wants to help developers manage the tricky terrain of open source license management, and today it announced a $2.2 million seed round. The company also announced that its license management product by the same name was available in open Beta. Let’s start with the funding, which was led by Bain Capital Ventures along with an all-star list of participants including Salesforce…
