Jul 14, 2017

Percona Monitoring and Management 1.2.0 is Now Available


Percona announces the release of Percona Monitoring and Management 1.2.0 on July 14, 2017.

For installation instructions, see the Deployment Guide.


Changes in PMM Server

PMM Server 1.2.0 introduced the following changes:

New Features

  • PMM-737: New graphs in System Overview dashboard:
      • Memory Advanced Details
      • Saturation Metrics

  • PMM-1090: Added ESXi support for PMM Server virtual appliance.

UI Fixes

  • PMM-707: Fixed QPS metric in MySQL Overview dashboard to always show queries per second regardless of the selected interval.
  • PMM-708: Fixed tooltips for graphs that displayed incorrectly.
  • PMM-739, PMM-797: Fixed the PMM Server update feature on the landing page.
  • PMM-823: Fixed arrow padding for collapsible blocks in QAN.
  • PMM-887: Disabled the Add button when no table is specified for showing query info in QAN.
  • PMM-888: Disabled the Apply button in QAN settings when nothing is changed.
  • PMM-889: Fixed the switch between UTC and local time zone in the QAN time range selector.
  • PMM-909: Added the message “No query example” when no example for a query is available in QAN.
  • PMM-933: Fixed empty tooltips for Per Query Stats column in the query details section of QAN.
  • PMM-937: Removed the percentage of total query time in query details for the TOTAL entry in QAN (because it is 100% by definition).
  • PMM-951: Fixed the InnoDB Page Splits graph formula in the MySQL InnoDB Metrics Advanced dashboard.
  • PMM-953: Enabled stacking for graphs in MySQL Performance Schema dashboard.
  • PMM-954: Renamed Top Users by Connections graph in MySQL User Statistics dashboard to Top Users by Connections Created and added the Connections/sec label to the Y-axis.
  • PMM-957: Refined titles for the Client Connections and Client Questions graphs in the ProxySQL Overview dashboard to mention that they show metrics for all host groups (not only the selected one).
  • PMM-961: Fixed the formula for Client Connections graph in ProxySQL Overview dashboard.
  • PMM-964: Fixed the gaps for high zoom levels in MySQL Connections graph on the MySQL Overview dashboard.
  • PMM-976: Fixed Orchestrator handling by supervisorctl.
  • PMM-1129: Updated the MySQL Replication dashboard to support new connection_name label introduced in mysqld_exporter for multi-source replication monitoring.
  • PMM-1054: Fixed typo in the tooltip for the Settings button in QAN.
  • PMM-1055: Fixed link to Query Analytics from Metrics Monitor when running PMM Server as a virtual appliance.
  • PMM-1086: Removed HTML code that showed up in the QAN time range selector.

Bug Fixes

  • PMM-547: Added warning page to Query Analytics app when there are no PMM Clients running the QAN service.
  • PMM-799: Fixed Orchestrator to show correct version.
  • PMM-1031: Fixed initialization of Query Profile section in QAN that broke after upgrading Angular.
  • PMM-1087: Fixed QAN package building.

Other Improvements

  • PMM-348: Added daily log rotation for nginx.
  • PMM-968: Added Prometheus build information.
  • PMM-969: Updated the Prometheus memory usage settings to leverage a new flag. For more information about setting memory consumption by PMM, see the FAQ.
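
As a sketch of how the new memory setting is typically applied when PMM Server runs in Docker (assuming the METRICS_MEMORY environment variable, in kilobytes, that the FAQ describes; the container names follow the deployment guide’s conventions):

# Hypothetical example: allow Prometheus roughly 2 GB of memory
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server \
  -e METRICS_MEMORY=2097152 \
  --restart always percona/pmm-server:1.2.0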

Changes in PMM Client

PMM Client 1.2.0 introduced the following changes:

New Features

  • PMM-1114: Added PMM Client packages for Debian 9 (“stretch”).
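
A minimal sketch of installing the new package on Debian 9, assuming Percona’s standard APT repository setup (the percona-release package URL shown here is illustrative and may differ for your environment):

# Hypothetical install flow on Debian 9 ("stretch")
wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
sudo apt-get update
sudo apt-get install pmm-client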

Bug Fixes

  • PMM-481, PMM-1132: Fixed fingerprinting for queries with multi-line comments.
  • PMM-623: Fixed mongodb_exporter to display correct version.
  • PMM-927: Fixed bug with empty metrics for MongoDB query analytics.
  • PMM-1126: Fixed promu build for node_exporter.
  • PMM-1201: Fixed node_exporter version.

Other Improvements

  • PMM-783: Directed mongodb_exporter log messages to stderr and excluded many generic messages from the default INFO logging level.
  • PMM-756: Merged upstream node_exporter version 0.14.0.
    PMM deprecated several collectors in this release:

    • gmond – Out of scope.
    • megacli – Requires forking, to be moved to textfile collection.
    • ntp – Out of scope.

    It also introduced the following breaking change:

    • Collector errors are now a separate metric, node_scrape_collector_success, instead of a label on node_exporter_scrape_duration_seconds.
  • PMM-1011: Merged upstream mysqld_exporter version 0.10.0.
    This release introduced the following breaking change:

    • mysql_slave_... metrics now include an additional connection_name label to support MariaDB multi-source replication.

About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

Percona Monitoring and Management is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

Jun 21, 2017

Percona Monitoring and Management 1.1.5 is Now Available


Percona announces the release of Percona Monitoring and Management 1.1.5 on June 21, 2017.

For installation instructions, see the Deployment Guide.


Changes in PMM Server

  • PMM-667: Fixed the Latency graph in the ProxySQL Overview dashboard to plot microsecond values instead of milliseconds.

  • PMM-800: Fixed the InnoDB Page Splits graph in the MySQL InnoDB Metrics Advanced dashboard to show correct page merge success ratio.

  • PMM-1007: Added links to Query Analytics from MySQL Overview and MongoDB Overview dashboards. The links also pass selected host and time period values.

    NOTE: These links currently open QAN2, which is still considered experimental.

Changes in PMM Client

  • PMM-931: Fixed pmm-admin script when adding MongoDB metrics monitoring for secondary in a replica set.

About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

May 29, 2017

Percona Monitoring and Management 1.1.4 is Now Available


Percona announces the release of Percona Monitoring and Management 1.1.4 on May 29, 2017.

For installation instructions, see the Deployment Guide.

This release includes experimental support for MongoDB in Query Analytics, including an updated QAN interface.

Query Analytics for MongoDB

To enable MongoDB query analytics, use the mongodb:queries alias when adding the service. As an experimental feature, it also requires the --dev-enable option:

sudo pmm-admin add --dev-enable mongodb:queries

NOTE: Currently, it monitors only collections that are present when you enable MongoDB query analytics. Query data for collections that you add later is not gathered. This is a known issue and it will be fixed in the future.
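
To confirm that the service was added, you can list the active services (output format varies by version):

sudo pmm-admin list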

Query Analytics Redesign

The QAN web interface was updated for better usability and functionality (including the new MongoDB query analytics data). The new UI is experimental and available by specifying /qan2 after the URL of PMM Server.


NOTE: The button on the main landing page still points to the old QAN interface.

You can check out the new QAN web UI at https://pmmdemo.percona.com/qan2.

New in PMM Server

  • PMM-724: Added the Index Condition Pushdown (ICP) graph to the MySQL InnoDB Metrics dashboard.
  • PMM-734: Fixed the MySQL Active Threads graph in the MySQL Overview dashboard.
  • PMM-807: Fixed the MySQL Connections graph in the MySQL Overview dashboard.
  • PMM-850: Updated the MongoDB RocksDB and MongoDB WiredTiger dashboards.
  • Removed the InnoDB Deadlocks and Index Condition Pushdown graphs from the MariaDB dashboard.
  • Added tooltips with descriptions for graphs in the MySQL Query Response Time dashboard. Similar tooltips will be gradually added to all graphs.

New in PMM Client

  • PMM-801: Improved PMM Client upgrade process to preserve credentials that are used by services.
  • Added options for pmm-admin to enable MongoDB cluster connections.

About Percona Monitoring and Management

Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

Apr 21, 2017

Percona Monitoring and Management 1.1.3 is Now Available


Percona announces the release of Percona Monitoring and Management 1.1.3 on April 21, 2017.

For installation instructions, see the Deployment Guide.

This release includes several new graphs in dashboards related to InnoDB and MongoDB operation, as well as smaller fixes and improvements.

New in PMM Server

  • PMM-649: Added the InnoDB Page Splits and InnoDB Page Reorgs graphs to the MySQL InnoDB Metrics Advanced dashboard.
  • Added the following graphs to the MongoDB ReplSet dashboard:
    • Oplog Getmore Time
    • Oplog Operations
    • Oplog Processing Time
    • Oplog Buffered Operations
    • Oplog Buffer Capacity
  • Added descriptions for graphs in the following dashboards:
    • MongoDB Overview
    • MongoDB ReplSet
    • PMM Demo

Changes in PMM Client

  • PMM-491: Improved pmm-admin error messages.
  • PMM-523: Added the --verbose option for pmm-admin add.
  • PMM-592: Added the --force option for pmm-admin stop.
  • PMM-702: Added the db.serverStatus().metrics.repl.executor stats to mongodb_exporter. These new stats will be used for graphs in future releases.
  • PMM-731: Added real-time checks to pmm-admin check-network output.
  • The following commands no longer require connection to PMM Server:
    • pmm-admin start --all
    • pmm-admin stop --all
    • pmm-admin restart --all
    • pmm-admin show-passwords

    NOTE: If you want to start, stop, or restart a specific service, connection to PMM Server is still required.
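
For example, the following sequence works even while PMM Server is unreachable (a sketch; output varies by setup):

sudo pmm-admin stop --all
sudo pmm-admin show-passwords
sudo pmm-admin start --all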

About Percona Monitoring and Management

Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

Apr 3, 2017

Percona Monitoring and Management 1.1.2 is Now Available


Percona announces the release of Percona Monitoring and Management 1.1.2 on April 3, 2017.

For installation instructions, see the Deployment Guide.

This release includes several new dashboards in Metrics Monitor, updated versions of software components used in PMM Server, and a number of small bug fixes.

Thank You to the Community!

We would like to acknowledge the key contributors to this release, and thank the community for its continued support of PMM.

New Dashboards and Graphs

This release includes the following new dashboards:

  • MongoDB MMAPv1 Dashboard
  • MongoDB InMemory Dashboard
  • The MariaDB dashboard includes three new graphs for the Aria storage engine (there will be a detailed blog post about monitoring possibilities with these new graphs):
    • Aria Pagecache Reads and Writes
    • Aria Pagecache Blocks
    • Aria Transaction Log Syncs

The new MariaDB dashboard also includes four new graphs for monitoring InnoDB within MariaDB. We are planning to move them into one of the existing InnoDB dashboards in the next PMM release:

  • The InnoDB Defragmentation graph shows how OPTIMIZE TABLE impacts defragmentation on tables when running MariaDB with innodb_file_per_table=1 and innodb_defragment=1.
  • The InnoDB Online DDL graph includes metrics related to online DDL operations when using ALTER TABLE ... ALGORITHM=INPLACE in MariaDB.
  • The InnoDB Deadlocks Detected graph currently works only with MariaDB 10.1. We are planning to add support for MariaDB 10.2, Percona Server, and MySQL in the next PMM release.
  • The Index Condition Pushdown graph shows how InnoDB leverages the Index Condition Pushdown (ICP) routines. Currently this graph works only with MariaDB, but we are planning to add support for Percona Server and MySQL in the next PMM release.

Updated Software

PMM is based on several third-party open-source software components. We ensure that PMM includes the latest versions of these components in every release, making it the most secure, stable and feature-rich database monitoring platform possible. Here are some highlights of changes in the latest releases:

  • Grafana 4.2 (from 4.1.1)
    • HipChat integration
    • Templating improvements
    • Alerting enhancements
  • Consul 0.7.5 (from 0.7.3)
    • Bug fix for serious server panic
  • Prometheus 1.5.2 (from 1.5.1)
    • Prometheus binaries are built with Go1.7.5
    • Fixed two panic conditions and one series corruption bug
  • Orchestrator 2.0.3 (from 2.0.1)
    • GTID improvements
    • Logging enhancements
    • Improved timing resolution and faster discoveries

Other Changes in PMM Server

  • Migrated the PMM Server docker container to use CentOS 7 as the base operating system.
  • Changed the entry point so that supervisor is PID 1.
  • PMM-633: Set the following default values in my.cnf:
    [mysqld]
    # Default MySQL Settings
    innodb_buffer_pool_size=128M
    innodb_log_file_size=5M
    innodb_flush_log_at_trx_commit=1
    innodb_file_per_table=1
    innodb_flush_method=O_DIRECT
    # Disable Query Cache by default
    query_cache_size=0
    query_cache_type=0
  • PMM-676: Added descriptions for graphs in Disk Performance and Galera dashboards.

Changes in PMM Client

  • Fixed pmm-admin remove --all to clear all saved credentials.
  • Several fixes to mongodb_exporter including PMM-629 and PMM-642.
  • PMM-504: Added ability to change the name of a client with running services:
    $ sudo pmm-admin config --client-name new_name --force

    WARNING: Some Metrics Monitor data may be lost when renaming a running client.

About Percona Monitoring and Management

Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

Dec 16, 2016

Percona Live 2017 Sneak Peek Schedule Up Now! See the Available Sessions!


We are excited to announce that the sneak peek schedule for the Percona Live 2017 Open Source Database Conference is up! The Percona Live Open Source Database Conference 2017 is April 24th–27th at the Hyatt Regency Santa Clara and the Santa Clara Convention Center.

The Percona Live Open Source Database Conference 2017 is the premier event for the rich and diverse MySQL, MongoDB and open source database ecosystems. This conference provides an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience.

Below are some of our top picks for MySQL, MongoDB and open source database sessions:

Tutorials

MySQL 101 Tracks

MongoDB 101 Tracks

Breakout Talks

Register for the Percona Live Open Source Database Conference here.

Early Bird Discounts

Just a reminder to everyone out there: our Early Bird discount rate for the Percona Live Open Source Database Conference 2017 is only available ‘til January 8, 2017, 11:30 pm PST! This rate gets you all the excellent and amazing opportunities that Percona Live offers, at a very reasonable price!

Sponsor Percona Live

Become a conference sponsor! We have sponsorship opportunities available for this annual MySQL, MongoDB and open source database event. Sponsors become a part of a dynamic and growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.

Nov 9, 2016

Orchestrator and ProxySQL


In this blog post, I am going to show you how you can use Orchestrator and ProxySQL together.

In my previous blog post, I showed how to use bash scripts and move virtual IPs with Orchestrator. As in that post, I assume you already have Orchestrator working. If not, you can find the installation steps here.

In the case of a failover, Orchestrator changes the MySQL topology and promotes a new master. But who lets the application know about this change? This is where ProxySQL helps us.

ProxySQL

You can find the ProxySQL install steps here. In our test, we use the following topology:

(Topology diagram: 192.168.56.105 is the master; 192.168.56.106 and 192.168.56.107 are slaves.)

For this topology, we need the following rules in ProxySQL:

-- Define the servers: hostgroup 600 is the writer hostgroup, 601 the reader hostgroup
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.106',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.105',601,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.56.105',600,3306,1000,0);
-- Tie the writer (600) and reader (601) hostgroups together
INSERT INTO mysql_replication_hostgroups VALUES (600,601,'');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
-- Route traffic per user: testuser_w writes (600), testuser_r reads (601),
-- and testuser_rw sends its SELECTs to the readers
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('testuser_w',600,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('testuser_r',601,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES ('testuser_rw',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
-- Create the application users
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('testuser_w','Testpass1.',1,600,'test',1);
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('testuser_r','Testpass1.',1,601,'test',1);
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('testuser_rw','Testpass1.',1,600,'test',1);
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;

See the connection pool:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host       | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.56.105 | 3306     | ONLINE | 4        | 0        | 4      | 0       | 2833    | 224351          | 0               | 3242       |
| 601       | 192.168.56.107 | 3306     | ONLINE | 1        | 1        | 11     | 0       | 275443  | 11785750        | 766914785       | 431        |
| 601       | 192.168.56.106 | 3306     | ONLINE | 1        | 1        | 10     | 0       | 262509  | 11182777        | 712120599       | 1343       |
| 601       | 192.168.56.105 | 3306     | ONLINE | 1        | 1        | 2      | 0       | 40598   | 1733059         | 111830195       | 3242       |
+-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
4 rows in set (0.00 sec)

It shows us that “192.168.56.105” is in hostgroup 600, which means that server is the master.

How does ProxySQL decide who the new master is?

ProxySQL does not know what the topology looks like, which is really important. ProxySQL monitors the read_only variables on the MySQL servers, and the server where read_only=OFF is going to get the writes. If the old master went down and we changed our topology, we have to change the read_only variables on the new master. Of course, applications like MHA or Orchestrator can do that for us.
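
You can watch this mechanism from the ProxySQL admin interface; a sketch using ProxySQL’s standard monitor log table (default admin credentials and port, adjust as needed):

# Show the most recent read_only checks ProxySQL has performed
mysql -u admin -padmin -h 127.0.0.1 -P 6032 \
  -e "SELECT hostname, port, read_only, error FROM monitor.mysql_server_read_only_log ORDER BY time_start_us DESC LIMIT 5;"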

We have two possibilities here: the master went down, or we want to promote a new master.

Master is down

If the master goes down, Orchestrator is going to change the topology and set read_only=OFF on the promoted master. ProxySQL is going to realize the master went down and send the write traffic to the server where read_only=OFF.

Let’s do a test. After we stopped MySQL on “192.168.56.105”, Orchestrator promoted “192.168.56.106” as the new master. ProxySQL is now using it as the master:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host       | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.56.106 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 790        |
| 601       | 192.168.56.107 | 3306     | ONLINE  | 0        | 0        | 13     | 0       | 277953  | 11894400        | 774312665       | 445        |
| 601       | 192.168.56.106 | 3306     | ONLINE  | 0        | 0        | 10     | 0       | 265056  | 11290802        | 718935768       | 790        |
| 601       | 192.168.56.105 | 3306     | SHUNNED | 0        | 0        | 2      | 0       | 42961   | 1833016         | 117959313       | 355        |
+-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
4 rows in set (0.00 sec)

This happens quickly and does not require any application, VIP or DNS modification.

Promoting a new Master

When we perform a graceful-master-takeover with Orchestrator, it promotes a slave as the new master, removes the old master from the replicaset and sets read_only=ON.

From Orchestrator’s point of view, this is great. It promoted a slave as the new master, and the old master is not part of the replicaset anymore. But as I mentioned earlier, ProxySQL does not know what the replicaset looks like.

It only knows that we changed the read_only variables on some servers. It is going to send reads to the old master, but the old master does not have up-to-date data anymore. This is not good at all.

We have two options to avoid this.

Remove master from read hostgroup

If the master is not part of the read hostgroup, ProxySQL won’t send any traffic there after we promote a new master. But in this case, if we lose the slaves, ProxySQL cannot redirect the reads to the master. If we have a lot of slaves and replication stopped on the slaves because of an error or mistake, the master probably won’t be able to handle all the read traffic. But if we only have a few slaves, it would be good if the master could also handle reads when there is an issue on the slaves.

Using Scheduler

In this great blog post from Marco Tusa, we can see that ProxySQL can use “Schedulers”. We can use the same idea here as well. I wrote a script based on Marco’s that can recognize if the old master is no longer a part of the replicaset.

The script checks the following:

  • read_only=ON – the server is read-only (on the slave servers, this has to be ON)
  • repl_lag is NULL – on the master, this should be NULL (if seconds_behind_master is not defined, ProxySQL will report repl_lag as NULL)

If read_only=ON, it means the server is not the master at the moment. But if repl_lag is NULL, it means the server is not replicating from anywhere, and it probably was a master. It has to be removed from the Hostgroup.

Adding a Scheduler

INSERT  INTO scheduler (id,interval_ms,filename,arg1) values (10,2000,"/var/lib/proxysql/server_monitor.pl","-u=admin -p=admin -h=127.0.0.1 -G=601 -P=6032 --debug=0  --log=/var/lib/proxysql/server_check");
LOAD SCHEDULER TO RUNTIME;SAVE SCHEDULER TO DISK;

The script has parameters like username, password or port. But we also have to define the read Hostgroup (-G).

Let’s see what happens with ProxySQL after we run the command orchestrator -c graceful-master-takeover -i rep1 -d rep2:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host       | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.56.106 | 3306     | ONLINE       | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 504        |
| 601       | 192.168.56.107 | 3306     | ONLINE       | 0        | 2        | 2      | 0       | 6784    | 238075          | 2175559         | 454        |
| 601       | 192.168.56.106 | 3306     | ONLINE       | 0        | 0        | 2      | 0       | 6761    | 237409          | 2147005         | 504        |
| 601       | 192.168.56.105 | 3306     | OFFLINE_HARD | 0        | 0        | 2      | 0       | 6170    | 216001          | 0               | 435        |
+-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
4 rows in set (0.00 sec)

As we can see, the status changed to OFFLINE_HARD:

mysql> select * from mysql_servers;
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname       | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 601          | 192.168.56.107 | 3306 | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
| 601          | 192.168.56.106 | 3306 | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
| 9601         | 192.168.56.105 | 3306 | ONLINE | 1000   | 0           | 1000            | 0                   | 0       | 0              |         |
| 600          | 192.168.56.106 | 3306 | ONLINE | 1000   | 0           | 1000            | 10                  | 0       | 0              |         |
+--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.00 sec)

This is because the script changed the “hostgroup_id” to 9601, which is what we wanted: the old master won’t get any more traffic.

Conclusion

Because ProxySQL redirects traffic based on the read_only variables, it is important to start the servers with read_only=ON (even on the master). That way, we can avoid writes landing on many servers at the same time.
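
A minimal sketch of enforcing that default at startup, set in my.cnf on every server (the file location varies by distribution):

[mysqld]
# Start read-only; the failover tooling flips this OFF on the promoted master
read_only = ON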

If we want to use graceful-master-takeover with Orchestrator, we have to use a scheduler that can remove the old master from the read Hostgroup.

Nov 3, 2016

Orchestrator: Moving VIPs During Failover


In this post, I’ll discuss how to move VIPs during a failover using Orchestrator.

In our previous post, we showed you how Orchestrator works. In this post, I am going to give you a proof of concept on how Orchestrator can move VIPs in case of failover. For this post, I’m assuming Orchestrator is already installed and able to manage the topology.

Hooks

Orchestrator is a topology manager. Nothing less, nothing more. In the case of failover, it will reorganize the topology, promote a new master and connect the slaves to it. But it won’t do any DNS changes, and it won’t move VIPs (or anything else).

However, Orchestrator supports hooks. Hooks are external scripts that can be invoked during the recovery process. There are six different hooks:

  • OnFailureDetectionProcesses
  • PreFailoverProcesses
  • PostIntermediateMasterFailoverProcesses
  • PostMasterFailoverProcesses
  • PostFailoverProcesses
  • PostUnsuccessfulFailoverProcesses

More details are in the Orchestrator manual.

With these hooks, we can call our own external scripts, which fit our architecture and can make modifications or let the application know who the new master is.

There are different ways to do this:

  • Updating a CNAME: if a CNAME is pointing to the master, an external script can easily do a DNS update after failover.
  • Moving a VIP to the new master: this solution is similar to what the MHA and MHA-helper scripts do (this post discusses this solution).

Parameters

When Orchestrator calls an external script, it can also pass parameters. Here is an example using the parameters available with “PostFailoverProcesses”:

{failureType}, {failureDescription}, {failedHost}, {failureCluster}, {failureClusterAlias}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countSlaves}, {slaveHosts}, {isDowntimed}, {isSuccessful}, {lostSlaves}

Without these parameters, we wouldn’t know who the new master is and which host died.

Moving VIPs

As I already mentioned, in this post I am going to show you how you can move VIPs with Orchestrator. I think many people are familiar with MHA. This solution is a bit similar to what MHA and MHA-helper do.

The main requirement is also the main disadvantage: this solution requires SSH access from the Orchestrator node to the MySQL servers.

Adding User

First we have to add a user on the MySQL servers and Orchestrator node (you can change the username):

useradd -m orchuser -s /bin/bash

Adding sudo permissions:

vi /etc/sudoers.d/orch
Defaults !requiretty
orchuser ALL=(ALL) NOPASSWD: /usr/sbin/arping,/sbin/ip,/bin/ping

We have to add the public key from the Orchestrator node on the MySQL servers to the “/home/orchuser/.ssh/authorized_keys” file.
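
A minimal sketch of that key distribution, assuming the orchuser account created above (ssh-copy-id prompts for the password once; afterwards key-based login works):

# On the Orchestrator node: generate a key pair for orchuser
sudo -u orchuser ssh-keygen -t rsa -N '' -f /home/orchuser/.ssh/id_rsa
# Copy the public key to each MySQL server
sudo -u orchuser ssh-copy-id orchuser@192.168.56.106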

Now we can SSH from the Orchestrator server to the others without a password:

ssh orchuser@192.168.56.106

Failover Script

Now we need a failover script. I wrote two small bash scripts that can do it for us.

The first one is called orch_hook.sh. Orchestrator calls this script like so:

vi /etc/orchestrator.conf.json
...
"PostFailoverProcesses": [
"echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log",
"/usr/local/bin/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
],
...

Because Orchestrator can handle multiple clusters, we have to define some cluster parameters:

vi /usr/local/bin/orch_hook.sh
...
rep=( eth0 "192.168.56.121" orchuser )

Where “rep” is the name of the cluster, “eth0” is the name of the interface where the VIP should be added, “192.168.56.121” is the VIP on this cluster and “orchuser” is the SSH user. If we have multiple clusters, we have to add more arrays like this with the cluster details.

Orchestrator executes this script with parameters:

/usr/local/bin/orch_hook.sh DeadMaster rep mysql1 mysql2

After the script recognizes the cluster, it calls the next script.

The next script is named orch_vip.sh. It is called by “orch_hook.sh” and it is going to move the VIP to the new master. It is executed like this:

/usr/local/bin/orch_vip.sh -d 1 -n mysql2 -i eth0 -I 192.168.56.121 -u orchuser -o mysql1

  • -d 1 – the master is dead
  • -n mysql2 – the new master
  • -i eth0 – the network interface
  • -I 192.168.56.121 – the VIP
  • -u orchuser – the SSH user
  • -o mysql1 – the old master

The script requires the “arping” and “mail” commands.
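
Both are small utilities; a sketch of installing them on Debian/Ubuntu (package names vary by distribution, and the sudoers entry above expects the /usr/sbin/arping variant shipped in the arping package):

sudo apt-get install arping mailutils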

Conclusion

With these two small scripts, Orchestrator is able to move VIPs from the master to the new master, and the application can work again. However this script is not production ready, and there could be cases that it cannot handle. You can test it, but use it at your own risk.

I would appreciate any comments or pull requests so we can make it better. But stay tuned: in my next blog post I am going to show you how Orchestrator can work with “ProxySQL.”

Apr 26, 2016

How We Made MySQL Great Again, or Upgrading MySQL with Orchestrator


In this blog post, we’ll discuss upgrading MySQL with Orchestrator.

I recently had a client, Life360, that wanted to upgrade from Percona Server 5.5 to Percona Server 5.6, and implement GTID in their high transaction environment. They had co-masters and multiple read slaves.

Orchestrator made this job much easier for us. My colleague, Tibi, recently posted about Orchestrator here and here.

Daniel from Life360 saw Orchestrator and was very interested. So here is how he set up Orchestrator, in his own words:

I did some initial testing with the vagrant boxes provided in the Orchestrator repo, to see how to configure the agents and get the Orchestrator server to do what we want.

I then moved to install the Orchestrator server, Orchestrator backend on RDS, and deploy the clients on the slaves and masters in our Amazon VPC MySQL instances.

Once the server setup was done, the clients were auto-detected through CNAME discovery of the masters, and the agents talked to the server (it took a while as CNAMEs weren’t working as expected, but that’s fixed in the new server version).

We were pretty amazed at the number of actions you can do through Orchestrator itself, such as: moving slaves to a different master through drag and drop, enabling GTID on a node with the push of a button, setting up GTID-based failover, taking LVM snapshots using Orchestrator Agent, etc.

We went ahead and tested the master change on drag and drop, and after a few successful attempts, we even brought it back to where it was initially. After those tests, we were pretty confident that we could leverage Orchestrator as one of our main tools to assist in the coming upgrade.

Here is a screenshot of the initial setup:


Manjot: Once Daniel had Orchestrator set up, he wanted to leverage it to help with the MySQL upgrade. We set out to create a plan that worked within his constraints and still kept best practices in mind.

First, we installed Percona Server 5.6 fresh on our dedicated backup slave. That first 5.6 slave was created with MyDumper to achieve forward compatibility and not have any legacy tablespaces. Since MyDumper was already installed with the Percona Backup Service that Life360 has, this was fairly easy to accomplish.

The MyDumper slave rebuild works in the following way:

To take a mydumper backup:

  1. Go to your desired backups directory
  2. Install mydumper (sudo apt-get install mydumper)
  3. Run mydumper -t 8 -L mydumper.log --compress

To restore:

  1. Make sure MyDumper is installed: sudo apt-get install mydumper
  2. Copy the MyDumper backups over to a backups dir
  3. Export your BACKUP_DIR as env var
  4. Run this to restore with MyLoader (from https://gist.github.com/Dnile/4658b338d4a101cbe2eeb5080ebddf8e):
    #!/usr/bin/env sh
    # Restore a MyDumper backup with MyLoader
    export BACKUP_DIR=/vol_mysql/backups
    export DESTHOST=127.0.0.1
    cd $BACKUP_DIR
    # Start a temporary server without grant tables for the restore
    mysqld --skip-grant-tables &
    # Recreate each database and load its schema dumps
    for i in `ls -1 *-schema.dump.gz | cut -d'-' -f1`; do mysql -h $DESTHOST -e "CREATE DATABASE IF NOT EXISTS $i"; zcat $i-schema.dump.gz | mysql -h $DESTHOST $i; zcat $i-schema-post.dump.gz | mysql -h $DESTHOST $i; done
    # Load the table data
    /usr/bin/myloader --host=$DESTHOST --directory=$BACKUP_DIR --enable-binlog --threads=10 --queries-per-transaction=20 -v 3
    # Fix ownership of the restored datadir
    chown -R mysql:mysql /var/lib/mysql/

Once the first 5.6 slave was caught up, we used Xtrabackup to backup 5.6 and then restored to each slave, cycling them out of the read slave pool one at a time.

Once all the slaves were upgraded, we created a new 5.6 master and had it replicate off our primary 5.5 master.

Then we moved all of the slaves to replicate off the new 5.6 master.

Life360 had long cron jobs that ran on the second 5.5 master. We moved the cron applications to write to the primary 5.5 master, and locked all tables. We then stopped replication on the second co-master. Daniel stopped MySQL and decommissioned it.

We then moved all application writes to the new 5.6 master. While Orchestrator can use external scripts to move IPs, we used a manual process here to change application DSNs and HAProxy configuration.

On the 5.5 master that remained, we used Orchestrator to set it to read only.


Daniel says this didn’t do a whole lot to get rid of connections that were still open on this server.

On the new master, we used the stop slave and reset slave buttons in the Orchestrator panel so it would no longer slave from the old master.

Once some of the thousands of connections had moved to the new master, we stopped MySQL on the 5.5 master, which took care of the rest and the application “gracefully” reconnected to the new 5.6 master.

There was some write downtime, as some connections did not drop off until they were forced to because php-fpm refused to let go. There is also always a high volume of transactions in this environment.

At this point our topology looks like this (ignore the globe icons for now):


But as always Daniel wanted MOAR. It was time for GTID. While we could have done this during the upgrade, Life360 wanted to manage risk and not make too many production changes at one time.

We followed Percona’s guide, Online GTID Deployment, but used Orchestrator to shuffle the old and new masters and toggle read_only on and off. This made our job a lot easier and faster, and saved us from any downtime.
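
The read_only toggles themselves are plain SQL, so the manual equivalent of what we had Orchestrator do is roughly the following sketch (hypothetical host names; credentials omitted):

# Demote the old master, then open the new one for writes
mysql -h old-master.example.com -e "SET GLOBAL read_only = ON;"
mysql -h new-master.example.com -e "SET GLOBAL read_only = OFF;"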

The globes in the topology screenshot above show that the slaves are now using GTID replication.

Orchestrator makes upgrades and changes much easier than before; just use caution and understand what it is doing in the background.

 

Mar 8, 2016

Orchestrator: MySQL Replication Topology Manager


This blog post discusses Orchestrator: MySQL Replication Topology Manager.

What is Orchestrator?

Orchestrator is a replication topology manager for MySQL.

It has many great features:

  • The topology and status of the replication tree is automatically detected and monitored
  • Either a GUI, CLI or API can be used to check the status and perform operations
  • Supports automatic failover of the master, and the replication tree can be fixed when servers in the tree fail – either manually or automatically
  • It is not dependent on any specific version or flavor of MySQL (MySQL, Percona Server, MariaDB or even MaxScale binlog servers)
  • Orchestrator supports many different types of topologies, from a single master -> slave setup to complex multi-layered replication trees consisting of hundreds of servers
  • Orchestrator can make topology changes and will do so based on the state at that moment; it does not require a configuration to be defined with what corresponds to the database topology
  • The GUI is not only there to report the status – one of the cooler things you can do is change replication just by doing a drag and drop in the web interface (of course you can do this and much more through the CLI and API as well)

Here’s a gif that demonstrates this:


Orchestrator’s manual is quite extensive and detailed, so the goal of this blog post is not to go through every installation and configuration step. It will just give a global overview of how Orchestrator works, while mentioning some important and interesting settings.

How Does It Work?

Orchestrator is a Go application (binaries, including rpm and deb packages, are available for download).

It requires its own MySQL database as a backend server to store all information related to the Orchestrator-managed database cluster topologies.

There should be at least one Orchestrator daemon, but it is recommended to run many Orchestrator daemons on different servers at the same time. They will all use the same backend database, but only one Orchestrator is going to be “active” at any given moment in time. (You can check which one is active under the Status menu on the web interface, or in the active_node table in the backend database.)

Using MySQL As Database Backend, Isn’t That A SPOF?

If the Orchestrator MySQL database is gone, it doesn’t mean the monitored MySQL clusters stop working. Orchestrator just won’t be able to control the replication topologies anymore. This is similar to how MHA works: everything will keep working, but you cannot perform a failover until MHA is back up again.

At this moment, it’s required to have a MySQL backend and there is no clear/tested support for having this in high availability (HA) as well. This might change in the future.

Database Server Installation Requirements

Orchestrator only needs a MySQL user with limited privileges (SUPER, PROCESS, REPLICATION SLAVE, RELOAD) to connect to the database servers. With those permissions, it is able to check the replication status of each node and perform replication changes if necessary. It supports different ways of replication: binlog file positions, MySQL and MariaDB GTID, Pseudo-GTID and binlog servers.

There is no need to install any extra software on the database servers.
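
A minimal sketch of creating such a user on each monitored server (hypothetical user, host and password; adjust to your environment — IDENTIFIED BY within GRANT is valid on the MySQL versions of this era):

mysql -e "GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'orch.example.com' IDENTIFIED BY 'orch_password';"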

Automatic Master Failure Recovery

One example of what Orchestrator can do is promote a slave if a master is down. It will choose the most up-to-date slave to be promoted.

Let’s see what it looks like:


In this test we lost rep1 (the master) and Orchestrator promoted rep4 to be the new master, then started replicating the other servers from the new master.

With the default settings, if rep1 comes back, rep4 is going to continue replicating from rep1. This behavior can be changed with the setting ApplyMySQLPromotionAfterMasterFailover:True in the configuration.
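
In the daemon’s configuration file, that would look something like this fragment (assuming the /etc/orchestrator.conf.json layout used elsewhere in this post):

vi /etc/orchestrator.conf.json
...
"ApplyMySQLPromotionAfterMasterFailover": true,
...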

Command Line Interface

Orchestrator has a nice command line interface too. Here are some examples:

Print the topology:

> orchestrator -c topology -i rep1:3306 cli
rep1:3306   [OK,5.6.27-75.0-log,ROW,>>]
+ rep2:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep3:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep4:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep5:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]

Move a slave:

orchestrator -c relocate -i rep2:3306 -d rep4:3306

Print the topology again:

> orchestrator -c topology -i rep1:3306 cli
rep1:3306     [OK,5.6.27-75.0-log,ROW,>>]
+ rep3:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep4:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]
  + rep2:3306 [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep5:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]

As we can see, rep2 is now replicating from rep4.

Long Queries

One nice addition to the GUI is how it displays slow queries on all servers inside the replication tree. You can even kill bad queries from within the GUI.


Orchestrator Configuration Settings

Orchestrator’s daemon configuration can be found in /etc/orchestrator.conf.json. There are many configuration options, some of which we elaborate on here:

  • SlaveLagQuery – Custom queries can be defined to check slave lag.
  • AgentAutoDiscover – If set to True, Orchestrator will auto-discover the topology.
  • HTTPAuthPassword and HTTPAuthUser – Prevent everybody from being able to access the Web GUI and change your topology.
  • RecoveryPeriodBlockSeconds – Avoids flapping.
  • RecoverMasterClusterFilters – Defines which clusters should auto failover/recover.
  • PreFailoverProcesses – Orchestrator will execute this command before the failover.
  • PostFailoverProcesses – Orchestrator will execute this command after the failover.
  • ApplyMySQLPromotionAfterMasterFailover – Detaches the promoted slave after failover.
  • DataCenterPattern – If there are multiple data centers, you can mark them using a pattern (they will get different colors in the GUI).

Limitations

While being a very feature-rich application, there are still some missing features and limitations of which we should be aware.

One of the key missing features is that there is no easy way to promote a slave to be the new master. This could be useful in scenarios where the master server has to be upgraded, there is a planned failover, etc. (this is a known feature request).

Some known limitations:

  • Slaves can not be manually promoted to be a master
  • Does not support multi-source replication
  • Does not support all types of parallel replication
  • At this moment, combining this with Percona XtraDB Cluster (Galera) is not supported

Is Orchestrator Your High Availability Solution?

In order to integrate this into your HA architecture or include it in your failover processes, you still need to manage many aspects manually. All of these can be done using the different hooks available in Orchestrator:

  • Updating application connectivity:
    • VIP handling,
    • Updating DNS
    • Updating proxy server (MaxScale, HAProxy, ProxySQL…) connections.
  • Automatically setting slaves to read only to avoid writes happening on non-masters and causing data inconsistencies
  • Fencing (STONITH) of the dead master, to avoid split-brain in case a crashed master comes back online (and applications still try to connect to it)
  • If semi-synchronous replication needs to be used to avoid data loss in case of master failure, this has to be manually added to the hooks as well

The work that needs to be done is comparable to having a setup with MHA or MySQLFailover.

This post also doesn’t completely describe the decision process that Orchestrator takes to determine whether a server is down. The way we understand it right now, one active Orchestrator node will make the decision if a node is down or not. It checks a broken node’s slaves’ replication state to determine whether Orchestrator isn’t the only one losing connectivity (in which case it should just do nothing with the production servers). This is already a big improvement compared to MySQLFailover, MHA or even MaxScale’s failover scripts, but it still might cause problems in some cases (more information can be found on Shlomi Noach’s blog).

Summary

The amount of flexibility, power and fun that this tool gives you, with a very simple installation process, is yet to be matched. Shlomi Noach did a great job developing this at Outbrain, Booking.com and now at GitHub.

If you are looking for a MySQL topology manager, Orchestrator is definitely worth looking at.
