May
18
2021
--

Google Cloud launches Vertex AI, a new managed machine learning platform

At Google I/O today Google Cloud announced Vertex AI, a new managed machine learning platform that is meant to make it easier for developers to deploy and maintain their AI models. It’s a bit of an odd announcement at I/O, which tends to focus on mobile and web developers and doesn’t traditionally feature a lot of Google Cloud news, but the fact that Google decided to announce Vertex today goes to show how important it thinks this new service is for a wide range of developers.

The launch of Vertex is the result of quite a bit of introspection by the Google Cloud team. “Machine learning in the enterprise is in crisis, in my view,” Craig Wiley, the director of product management for Google Cloud’s AI Platform, told me. “As someone who has worked in that space for a number of years, if you look at the Harvard Business Review or analyst reviews, or what have you — every single one of them comes out saying that the vast majority of companies are either investing or are interested in investing in machine learning and are not getting value from it. That has to change. It has to change.”


Wiley, who was also the general manager of AWS’s SageMaker service from 2016 to 2018 before coming to Google in 2019, noted that Google and the other companies that were able to make machine learning work for themselves saw what a transformational impact it can have. But he also noted that the way the big clouds started offering these services was by launching dozens of them, “many of which were dead ends,” according to him (including some of Google’s own). “Ultimately, our goal with Vertex is to reduce the time to ROI for these enterprises, to make sure that they can not just build a model but get real value from the models they’re building.”

Vertex then is meant to be a very flexible platform that allows developers and data scientists across skill levels to quickly train models and then manage the entire lifecycle of those models. Google says, for example, that training a model on Vertex takes about 80% fewer lines of code than on some competing platforms.


The service is also integrated with Vizier, Google’s AI optimizer that can automatically tune hyperparameters in machine learning models. This greatly reduces the time it takes to tune a model and allows engineers to run more experiments and do so faster.

Vertex also offers a “Feature Store” that helps its users serve, share and reuse machine learning features, as well as Vertex Experiments to help them accelerate the deployment of their models into production with faster model selection.

Deployment is backed by a continuous monitoring service and Vertex Pipelines, a rebrand of Google Cloud’s AI Platform Pipelines, which helps teams manage the workflows involved in preparing and analyzing data for their models, then training, evaluating and deploying them to production.

To give a wide variety of developers the right entry points, the service provides three interfaces: a drag-and-drop tool, notebooks for advanced users and — and this may be a bit of a surprise — BigQuery ML, Google’s tool for using standard SQL queries to create and execute machine learning models in its BigQuery data warehouse.
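For readers unfamiliar with that last option, BigQuery ML reduces model training to SQL run inside the data warehouse itself; a rough sketch of the idea (dataset, table and column names are invented for illustration):

-- Train a simple churn classifier on data that already lives in BigQuery
CREATE OR REPLACE MODEL demo_dataset.churn_model
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT age, plan_type, monthly_spend, churned
FROM demo_dataset.customers;

-- Score new rows with the trained model
SELECT *
FROM ML.PREDICT(MODEL demo_dataset.churn_model,
                (SELECT age, plan_type, monthly_spend FROM demo_dataset.new_customers));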

“We had two guiding lights while building Vertex AI: get data scientists and engineers out of the orchestration weeds, and create an industry-wide shift that would make everyone get serious about moving AI out of pilot purgatory and into full-scale production,” said Andrew Moore, vice president and general manager of Cloud AI and Industry Solutions at Google Cloud. “We are very proud of what we came up with in this platform, as it enables serious deployments for a new generation of AI that will empower data scientists and engineers to do fulfilling and creative work.”

Sep
09
2015
--

Percona Server audit log plugin best practices

Auditing your database means tracking access and changes to your data and db objects. The Audit Log Plugin has been shipped with Percona Server since 5.5.37/5.6.17, for a little over 12 months. Prior to the Audit Log Plugin, you had to work in darker ways to achieve some incarnation of an audit trail.

We have seen attempts at creating audit trails using approaches such as ‘sniffing the wire’, init files, in-schema ‘on update’ fields, triggers, proxies and trying to parse the traditional logs of MySQL (slow, general, binary, error). All of these attempts miss a piece of the pie: if you’re sniffing TCP traffic you’ll miss local connections; if you’re parsing binary logs you’re missing any reads. Your reasons for audit logging might come down to compliance requirements (HIPAA, PCI DSS), or you may need a way to examine database activity or track incoming connections.

Over the past months I’ve met many support requests with the answer ‘install an audit plugin’. These requests have been varied, ranging from finding out whether a user is still active and what the impact of decommissioning it would be, to measuring the frequency of specific queries and checking whether a slave is being written to, to name but a few.

So then, let’s look at installation. In general we want to install the Audit Log Plugin on an existing instance. As we discussed in previous Percona blog posts, installation of the plugin is trivial, but let’s recap. Let’s perform a couple of basic checks before we run the install command from the client. First, query MySQL for the location of the plugin directory:

mysql> show global variables like 'plugin_dir';
+---------------+--------------------------+
| Variable_name | Value                    |
+---------------+--------------------------+
| plugin_dir    | /usr/lib64/mysql/plugin/ |
+---------------+--------------------------+
1 row in set (0.00 sec)

Once that’s known we’ll check that the audit log plugin shared library is present;

[moore@randy ~]$ ls -l /usr/lib64/mysql/plugin/audit*
-rwxr-xr-x. 1 root root 42976 Jul  1 09:24 /usr/lib64/mysql/plugin/audit_log.so

Great, we are in good shape to move to the client and install;

mysql> install plugin audit_log soname 'audit_log.so';
Query OK, 0 rows affected (0.00 sec)
mysql> select * from mysql.plugin;
+-------------------------------+--------------+
| name                          | dl           |
+-------------------------------+--------------+
| audit_log                     | audit_log.so |
...
+-------------------------------+--------------+
8 rows in set (0.00 sec)

Voilà! It’s that simple. So, what does that provide us? Well, thanks to the default variable values we’ve got the following options set:

mysql> show global variables like 'audit%';
+---------------------------+---------------+
| Variable_name             | Value         |
+---------------------------+---------------+
| audit_log_buffer_size     | 1048576       |
| audit_log_file            | audit.log     |
| audit_log_flush           | OFF           |
| audit_log_format          | OLD           |
| audit_log_handler         | FILE          |
| audit_log_policy          | ALL           |
| audit_log_rotate_on_size  | 0             |
| audit_log_rotations       | 0             |
| audit_log_strategy        | ASYNCHRONOUS  |
| audit_log_syslog_facility | LOG_USER      |
| audit_log_syslog_ident    | percona-audit |
| audit_log_syslog_priority | LOG_INFO      |
+---------------------------+---------------+
12 rows in set (0.00 sec)

So what we can tell from that output is that our audit plugin is enabled, it’s logging out to the default location ({datadir}/audit.log) and we’re grabbing all events (ALL) on the server and sending the output in XML format (OLD). From the list of variables above we’ve only got one dynamic variable. This means to change the logfile location or the format we need to put these options into our my.cnf and restart the instance. Not very convenient. Personally, it’s my preference to store the audit.log file away from my datadir.

I also prefer the JSON log format over the XML formats. It is also advised, especially on busier systems, to enable the rotation options, audit_log_rotate_on_size and audit_log_rotations, so that you don’t end up filling your disk with a huge audit log. Restarting your production instances isn’t very convenient, but you’ll be happy to learn there is another way.

Let’s rewind to before we installed the plugin. We had checked the existence of our plugin shared library and were itching to run the install command. Now we can open our my.cnf file and add our preferred options prior to installation. Whilst it’s far from a secret, not many will know that in the plugin installation phase MySQL will re-read the my.cnf file to check for configuration relevant to the plugin. So let’s add some variables here:

## Audit Logging ##
audit_log_policy=ALL
audit_log_format=JSON
audit_log_file=/var/log/mysql/audit.log
audit_log_rotate_on_size=1024M
audit_log_rotations=10

A quick review of the above. I intend to log all events in JSON format to the /var/log/mysql location. The log will rotate each time the active file hits 1G, cycling through 10 files, meaning I will not have more than 10G of audit logs on my filesystem.

Now with our predefined configuration in the my.cnf we can install the plugin from cold and begin with our preferred options;

mysql> show global variables like 'audit%';
Empty set (0.00 sec)
mysql> install plugin audit_log soname 'audit_log.so';
Query OK, 0 rows affected (0.00 sec)
mysql> show global variables like 'audit%';
+---------------------------+--------------------------+
| Variable_name             | Value                    |
+---------------------------+--------------------------+
| audit_log_buffer_size     | 1048576                  |
| audit_log_file            | /var/log/mysql/audit.log |
| audit_log_flush           | OFF                      |
| audit_log_format          | JSON                     |
| audit_log_handler         | FILE                     |
| audit_log_policy          | ALL                      |
| audit_log_rotate_on_size  | 1073741824               |
| audit_log_rotations       | 10                       |
| audit_log_strategy        | ASYNCHRONOUS             |
| audit_log_syslog_facility | LOG_USER                 |
| audit_log_syslog_ident    | percona-audit            |
| audit_log_syslog_priority | LOG_INFO                 |
+---------------------------+--------------------------+
12 rows in set (0.00 sec)

Something to remember: if you add these variables before installing the plugin and then restart your instance (or suffer a crash), your instance will not start, because mysqld does not recognise the audit_log options until the plugin is loaded.

[moore@randy ~]$ sudo systemctl restart mysql
[moore@randy ~]$ sudo egrep 'ERROR' /var/log/mysqld.log
2015-09-02 11:55:16 8794 [ERROR] /usr/sbin/mysqld: unknown variable 'audit_log_policy=ALL'
2015-09-02 11:55:16 8794 [ERROR] Aborting
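One way to avoid that trap, not covered above but standard mysqld option handling, is to write the options with the loose_ prefix, which downgrades an unknown variable from a fatal error to a startup warning while the plugin isn’t loaded:

## Audit Logging (tolerated even before the plugin is installed) ##
loose_audit_log_policy=ALL
loose_audit_log_format=JSON
loose_audit_log_file=/var/log/mysql/audit.log
loose_audit_log_rotate_on_size=1024M
loose_audit_log_rotations=10

Once the plugin is installed, the loose_ prefix is simply ignored and the values apply as normal.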

When it’s all up and running we can check that the content is finding its way to our log file by opening it up and taking a look. Our JSON output will store a new line of JSON per event; here’s an example:

{"audit_record":{"name":"Query","record":"1067824616_2015-09-02T10:04:26","timestamp":"2015-09-02T10:54:53 UTC","command_class":"show_status","connection_id":"6","status":0,"sqltext":"SHOW /*!50002 GLOBAL */ STATUS","user":"pct[pct] @ localhost [127.0.0.1]","host":"localhost","os_user":"","ip":"127.0.0.1"}}

Compare that with the ‘OLD’ XML output format, which spans multiple lines, making parsing a more difficult task:

<AUDIT_RECORD
 NAME="Query"
 RECORD="2745742_2015-09-02T21:12:10"
 TIMESTAMP="2015-09-02T21:12:22 UTC"
 COMMAND_CLASS="show_status"
 CONNECTION_ID="8"
 STATUS="0"
 SQLTEXT="SHOW /*!50002 GLOBAL */ STATUS"
 USER="pct[pct] @ localhost [127.0.0.1]"
 HOST="localhost"
 OS_USER=""
 IP="127.0.0.1"
/>
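Because the JSON handler writes one self-contained record per line, ad-hoc analysis is straightforward with standard tools. For example, assuming jq is available (and using a made-up application user), we can pull out every statement that user issued, or get a quick breakdown of activity by command class:

$ jq -r 'select(.audit_record.user | startswith("app_user")) | .audit_record.sqltext' /var/log/mysql/audit.log
$ jq -r '.audit_record.command_class' /var/log/mysql/audit.log | sort | uniq -c | sort -rn | head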

Cost

One of the common assumptions about invoking the Audit Log Plugin is that it’s going to take an almighty hit on load. Logging all connections, queries and admin statements… surely? Well, not so. I spent some time observing the impact on the resources of a humbly specced home server: a small machine running a quad core Xeon, 32G of RAM and a Samsung PRO SSD, with a 7.2k rpm disk for the logs. Below is a collection of graphs illustrating the impact of turning on audit logging in asynchronous mode; as you will see, the results encouragingly show little impact from activating full logging. In each image, audit logging was off and subsequently turned on.

[Graphs: CPU, disk and overall load, with audit logging off then on]

Summary

We can install the Percona audit log plugin with our preferred options on a running system, without interrupting it, by adding our variables to the my.cnf. Doing this prior to installing the plugin gives us best-practice options without needing to restart the instance for static variables to take effect. Due to the lightweight nature of the audit plugin you can add this new log file to track access and changes to the data without the performance hit of the slow or general log. The audit log is a great aid to debugging and can serve as a security measure and malpractice deterrent.


Sep
03
2015
--

Percona Toolkit and systemd

After some recent work with systemd I’ve realized its power and I can come clean that I am a fan. I realize that there are multitudes of posts out there with arguments both for and against systemd, but let’s look at some nice ways to have systemd provide us with (but not limited to) pt-kill-as-a-service.

This brief post introduces you to a systemd unit file and how we can leverage one to enable pt-kill at startup, have it start after mysqld and ensure that MySQL is running by using the mysql service as a dependency of pt-kill. By using systemd to handle this we don’t have to complicate matters by ‘monitoring the monitor’ using hacky shell scripts, cron or utilities like monit.

So then, a quick primer on systemd, because let’s face it, we’ve all been avoiding it. Systemd is not new, but it made recent headlines in the Linux world due to some of the major distros announcing their intentions to migrate upcoming releases to systemd.

What is it? Well, due to its depth it is best described as a suite of management daemons, libraries and tools that will replace the traditional init scripts. So essentially, remember how you start a service, mount a volume or read the system logs… well, start forgetting all of that, because systemd is disrupting this space. With systemd come some really neat tricks for administering your machines, and I’m really only beginning to see the tip of this iceberg. There is admittedly a lot to learn with systemd, but this should serve as a pragmatic entrée.

Systemd what? When did this happen?

Linux distribution              Date released as default
Arch Linux                      October 2012
CoreOS                          October 2013 (v94.0.0)
Debian                          April 2015 (v8 aka jessie)
Fedora                          May 2011 (v15)
Gentoo Linux                    N/A
Mageia                          May 2012 (v2.0)
openSUSE                        September 2012 (v12.2)
Red Hat Enterprise Linux        June 2014 (v7.0)
Slackware                       N/A
SUSE Linux Enterprise Server    October 2014 (v12)
Ubuntu                          April 2015 (v15.04)

Lennart Poettering, the name most frequently attached to systemd, is seeking to modernize the most fundamental processes of the Linux startup system, bringing in the paradigms of modern computing: concurrency, parallelism and efficiency. The dependency tree of processes and services is more intuitive, and the structure of the underlying startup scripts is unified. I feel that the direction proposed by systemd is an evolutionary one which promotes consistency within the startup scripts, enabling conventions that can be more easily understood by a broader audience.

Systemd and Percona Toolkit

This post aims to show that we can rely on systemd to handle processes such as pt-kill, pt-stalk, and other daemonized scripts that we like to have running perpetually, fired at startup and reinstated after failure.

The scenario is this: I want pt-kill to drop all sleeping connections from a certain application user, let’s call them ‘caffeinebob’, because they never close connections. Due to various reasons we can’t make changes in the application, so we’re employing Percona Toolkit favourite pt-kill to do this for us. For convenience we want this result to persist across server restarts. In the olden days we might have had some cron job that fires a shell script in combination with a sentinel file to ensure it’s running. I’m pretty sure that this kitty could be skinned many ways.

The systemd Unit File

After some research and testing, the below unit file will play nicely on a CentOS 7 node with systemd at its core. In this example I am running Percona Server 5.6, installed using Percona’s yum repo, with the mysql.service unit file generated at installation. I suspect that there could be some systemd deviation with other MySQL variants; however, this configuration is working for me.

[Unit]
Description = pt-kill caffeinebob
After=syslog.target mysql.service
Requires=mysql.service

[Service]
Type = simple
PIDFile = /var/run/ptkill.pid
ExecStart = /usr/bin/pt-kill \
 --daemonize \
 --pid=/var/run/ptkill.pid \
 --interval=5 \
 --defaults-file=/root/.my.cnf \
 --log=/var/log/ptkill.log \
 --match-user caffeinebob \
 --busy-time 10 \
 --kill \
 --print
Restart=on-abort

[Install]
WantedBy=multi-user.target

Let’s examine the above and see what we’re working with. Systemd unit files come in various types. The example above is a simple service unit file, which means we are running a process controlled and supervised by systemd. The significance of the After directive is that this service will not attempt to start until after syslog.target and mysql.service have been started. The Requires directive makes ptkill.service dependent on the mysql.service startup being successful.

The next part, the [Service] grouping, details the actions to be taken by the service. The Type can be one of many, but as it’s a simple call to a script I’ve used the simple type. We are describing the command and how it is handled. ExecStart is evidently the pt-kill command that we would usually run from the shell prompt or from within a shell script. This is a very coarse example; we could instead parameterize the command with the assistance of an environment file, as sketched below. Note the use of the Restart directive, used so that systemd can react should a failure interrupt the process.
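As a rough illustration of that environment-file approach (the file path and variable names below are invented for this example), the match criteria could be pulled out of the unit file and referenced from ExecStart:

# /etc/sysconfig/ptkill : values used to parameterize the service (example file)
PTKILL_MATCH_USER=caffeinebob
PTKILL_BUSY_TIME=10

# and in the [Service] section of ptkill.service:
EnvironmentFile=/etc/sysconfig/ptkill
ExecStart=/usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid \
 --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log \
 --match-user ${PTKILL_MATCH_USER} --busy-time ${PTKILL_BUSY_TIME} --kill --print

systemd reads the environment file at service start, so changing the target user only needs an edit to /etc/sysconfig/ptkill followed by a restart of ptkill.service.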

Finally, under the [Install] grouping, we’re telling systemd that this service should start up on a multi-user system, which can be thought of as runlevel 2 or 3 (multi-user mode).

So, provided we’ve got all the relevant paths, users and dependencies in place, once you reboot your host mysql.service will start mysqld first and, when that dependency is met, systemd will start pt-kill with our desired parameters to cull connections that meet the criteria stipulated in our configuration. This means you rely on systemd to manage pt-kill for you, and you don’t need to remember to start this or similar processes when you restart your node.

Start up & enable

Now, to invoke our service manually and enable it to run at startup, we run the following systemctl commands:

[moore@localhost ~]$ sudo systemctl start ptkill.service
[moore@localhost ~]$ sudo systemctl enable ptkill.service

No feedback, but no errors either, so we can check the status of the service:

[moore@localhost ~]$ sudo systemctl status ptkill -l
ptkill.service - keep pt-kill persistent across restarts
   Loaded: loaded (/etc/systemd/system/ptkill.service; enabled)
   Active: active (running) since Wed 2015-08-12 02:39:13 BST; 1h 19min ago
 Main PID: 2628 (perl)
   CGroup: /system.slice/ptkill.service
           └─2628 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print

Perfect. We can also instruct systemd to disable and/or stop our service once the application is changed and caffeinebob closes all those open connections.

[moore@localhost ~]$ sudo systemctl stop ptkill.service
[moore@localhost ~]$ sudo systemctl disable ptkill.service

Now after successful implementation we see that our process is running delightfully;

[moore@localhost ~]$ ps -ef | grep pt-kill
root      2547     1  0 02:37 ?        00:00:00 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print

Catch me if I fall

Let’s issue a kill signal to the process and observe its behaviour using journalctl:

[moore@localhost ~]$ sudo kill -SEGV 2547

This will write similar entries into the system log;

[moore@localhost ~]$ sudo journalctl -xn -f
Aug 12 02:39:13 localhost.localdomain sudo[2624]: moore : TTY=pts/1 ; PWD=/home/moore ; USER=root ; COMMAND=/bin/kill -SEGV 2547
Aug 12 02:39:13 localhost.localdomain systemd[1]: ptkill.service: main process exited, code=killed, status=11/SEGV
Aug 12 02:39:13 localhost.localdomain systemd[1]: Unit ptkill.service entered failed state.
Aug 12 02:39:13 localhost.localdomain systemd[1]: ptkill.service holdoff time over, scheduling restart.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Stopping keep pt-kill persistent across restarts...
-- Subject: Unit ptkill.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ptkill.service has begun shutting down.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Starting keep pt-kill persistent across restarts...
-- Subject: Unit ptkill.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ptkill.service has begun starting up.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Started keep pt-kill persistent across restarts.
-- Subject: Unit ptkill.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ptkill.service has finished starting up.
--
-- The start-up result is done.

pt-kill is killed by the signal, but systemd has been instructed to restart it on failure, so we don’t see caffeinebob saturate our processlist with sleeping connections.

Another bonus with this workflow is use within orchestration. Any standardized unit files can be propagated to your fleet of hosts with tools such as Ansible, Chef, Puppet or Saltstack.

Closing note

I’d love to hear from the pragmatists from the systemd world to understand if this approach can be improved or whether there are any flaws in this example unit file that would require addressing. This is very much a new-school of thought for me and feedback is both welcome and encouraged.

Thank you for your time, happy systemd-ing.


Dec
04
2014
--

Sneak peek at the Percona Live MySQL Conference & Expo 2015

You know you’ll be there, so why not save some $$ by registering now for the Percona Live MySQL Conference & Expo 2015 (April 13-16 in Santa Clara, Calif.)? Super Saver registration discounts are available through Dec. 14 at 11:30 p.m. PST. (That’s just 10 days away!)

What to expect this year? The Percona Live 2015 conference committee is putting together another fantastic event for the global MySQL community’s next meeting in April. The full conference agenda will be announced in January, but the initial roster includes:

  • Sunny Bains, Senior Engineering Manager at Oracle; “InnoDB 5.7- What’s New”
  • Yoshinori Matsunobu, Database Engineer at Facebook; “Fast Master Failover Without Data Loss”
  • Jeremy Cole, Senior Systems Engineer at Google, Inc.; “Exploring Your Data With InnoDB Explorer”
  • Tom Krouper, Staff Database Administrator at Twitter; “Upgrading to MySQL 5.6 @ Scale”
  • Jenni Snyder, Database Administrator at Yelp; “Schema changes multiple times a day? OK!”
  • Ike Walker, Database Architect at Flite; “Assembling the Perfect MySQL Toolbox”
  • Jean-François Gagné, Senior System Engineer/Architect at Booking.com; “Binlog Servers at Booking.com”
  • Jeremy Glick, Lead DBA at MyDBAteam, and Andrew Moore, MySQL DBA at Percona; “Using MySQL Audit Plugins and Elasticsearch ELK”
  • Tomáš Komenda, Team Lead and Database Specialist, and Lukáš Putna, Senior Developer and Database Specialist at Seznam.cz; “MySQL and HBase Ecosystem for Real-time Big Data Overviews”
  • Alexander Rubin, Principal Consultant at Percona; “Advanced MySQL Query Tuning”

And while the call for papers deadline has expired, there are still sponsorship opportunities available for the world’s largest annual MySQL event. Sponsors become a part of a dynamic and growing ecosystem and interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.

Current sponsors include:

  • Diamond Plus: VMware
  • Gold: Codership, Pythian
  • Silver: Box, SpringbokSQL, Yelp
  • Exhibit Only: FoundationDB, Severalnines, Tokutek, VividCortex
  • Other Sponsors: MailChimp
  • Media Sponsor: Database Trends & Applications, Datanami, InfoQ, Linux Journal, O’Reilly Media

Percona Live 2015 will feature a variety of formal tracks and sessions related to High Availability, DevOps, Programming, Performance Optimization, Replication and Backup, MySQL in the Cloud, MySQL and NoSQL, MySQL Case Studies, Security, and What’s New in MySQL.

As usual the conference will be held in the heart of Silicon Valley at the Hyatt Regency Santa Clara and Santa Clara Convention Center. But this year Percona has also unveiled OpenStack Live 2015, a new conference that will run in parallel with Percona Live MySQL Conference & Expo 2015 on April 13 and 14.

And don’t forget, Super Saver registration discounts are available through Dec. 14 at 11:30 p.m. PST. I hope to see you in Santa Clara!


Dec
02
2014
--

Tips from the trenches for over-extended MySQL DBAs

This post is a follow-up to my November 19 webinar, “Tips from the Trenches: A Guide to Preventing Downtime for the Over-Extended DBA,” during which I described some of the most common reasons DBAs experience avoidable downtime. The session was aimed at the “over-stretched DBA”: the MySQL DBA who is short of time, or an engineer of another discipline without deep knowledge of MySQL. The over-stretched DBA may be prone to making fundamental mistakes that cause downtime through poor response times, operations that block access to important data, or administrative mishaps stemming from the lack of best-practice monitoring and alerting. (You can download my slides and view the recorded webinar here.)

Monitor the things
One of the aids to keeping the system up and running is ensuring that your finger is on the pulse of the environment. Here on the Percona Managed Services team, we leverage Percona Monitoring Plugins (open source plugins for Nagios, Cacti and Zabbix) to ensure we have visibility of our clients’ operations. Having a handle on basics such as disk space, memory usage and MySQL operational metrics ensures that we avoid trivial downtime that would affect the client’s uptime or, worse, their bottom line.

Road Blocks
One of the most common reasons that an application is unable to serve data to its end user is that access to a table is being blocked due to another ongoing operation. This can be blamed on a variety of sources: backups, schema changes, poor configuration and long running transactions can all lend themselves to costly blocking. Understanding the impact of actions on a MySQL server can be the difference between a happy end user and a frustrated one.

During the webinar I made reference to some resources and techniques that can help the over-extended DBA avoid downtime; here are some highlights.

Monitoring and Alerting
It’s important that you have some indication when something is reaching its capacity. It might be the disk, connections to MySQL or the auto_increment limit on a heavily used table. There is quite a landscape to cover, but here are a handful of helpful tools:
* Percona Monitoring Plugins
* Monyog
* New Relic
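As one concrete example of the kind of check worth automating, the illustrative query below lists the tables whose auto_increment counters have climbed the highest; a complete check would also compare each counter against the maximum value of the column’s data type:

mysql> SELECT table_schema, table_name, auto_increment
    ->   FROM information_schema.tables
    ->  WHERE auto_increment IS NOT NULL
    ->  ORDER BY auto_increment DESC
    ->  LIMIT 10;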

Query Tuning
Poorly performing SQL can be indicative that the configuration is incorrect, that there’s a missing index or that your development team needs a quick lesson on MySQL anti-patterns. Arm yourself with proof that the SQL statements are substandard using these resources and work with the source to make things more efficient:
* Percona Cloud Tools
* pt-query-digest, explain, indexes
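For instance, a first pass over the slow query log with pt-query-digest quickly surfaces the worst offenders; the log path below is only an example, so check slow_query_log_file on your own instance:

$ pt-query-digest --limit 10 /var/lib/mysql/slow.log > /tmp/slow-digest.txt

The report ranks queries by overall response time, which is usually the right place to start.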

High Availability
If you need to ensure that your application survives hiccups such as hardware failure or network impairment, a well deployed HA solution will give you the peace of mind that you can quickly mitigate bumps in the road.
* MHA
* Percona XtraDB Cluster, Galera
* Percona Replication Manager
* LinuxHA/Corosync/DRBD

Backups
A wise man once said, “A backup today saves you tomorrow.” Covering all bases can be the difference between recovering from a catastrophic failure and job hunting. Mixing logical, physical and incremental backups, while adding in some offsite copies, provides a safety net in the event that a small mistake like a dropped table occurs or, worse, all working copies of data and backups are lost in a SAN failure. It happens, so be prepared.
* Percona XtraBackup
* mydumper
* mysqldump
* mysqlbinlog (5.6)
* mylvmbackup
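To make the “mix logical and physical” advice concrete, here is a minimal sketch of taking one of each with the tools above; the credentials, paths and timestamped directory name are all placeholders:

$ innobackupex --user=bkpuser --password=bkppass /data/backups/            # physical hot copy; creates a timestamped subdirectory
$ innobackupex --apply-log /data/backups/2014-12-02_10-00-00/              # prepare that copy so it is consistent for restore
$ mysqldump --user=bkpuser --password=bkppass --single-transaction --all-databases | gzip > /data/backups/logical.sql.gz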

We had some great questions from the attendees and regrettably were unable to answer them all, so here are some of them with my response.

Q: I use MySQL on Amazon RDS. Isn’t much of the operational work automated, or do these tips still apply?
A: It’s not completely automated. There are still challenges to address and configuration opportunities, but understanding the limitations of RDS is key. For example, the location and size of the tmpdir is something you are unable to customise on RDS. You would typically review this config in a production environment if your workload required it. Any costly queries that perform operations requiring tmp area to sort (think OLAP) might not be a good fit on RDS due to this limitation. Getting to know the limitations around hosted or DBaaS services is time well spent to avoid explaining what keeps taking the application down in peak hours.

Q: What other parts of Percona Toolkit do you recommend for MySQL operations?
A: Percona Toolkit is a well-evolved suite of tools that all MySQL DBAs should familiarize themselves with. In particular I will fit many tools into my weekly workflow:

Operations

  • pt-online-schema-change
  • pt-table-checksum
  • pt-table-sync

Troubleshooting

  • pt-stalk
  • pt-pmp
  • pt-config-diff

Knowledge Gathering

  • pt-summary
  • pt-mysql-summary
  • pt-duplicate-key-checker

The key with Percona Toolkit is that many common tasks or problems that could cause you to reinvent the wheel are covered, mature and production ready. As with any tool, you should always read the label, or in this case the documentation, so that you’re well aware of what the tools can do, the risks and the features you can make use of.

Q: HA – are there any solutions that you would stay away from?
A: Choosing any particular HA solution is going to be another R&D exercise. You will need to understand the tradeoffs and configuration options, and compare products. Some might have a higher TCO or lack functionality. Once the chosen solution is implemented it’s pertinent that the engineers understand the technology well enough to troubleshoot it, or to use its functionality when failover needs to be instigated. I like HA solutions that are fast to fail over to; some entail starting MySQL from cold.

Q: You mentioned having tested backups. How do you perform this?
A: Percona’s method uses a dedicated host with access to the backup files. Then, with a combination of mysqlsandbox and pt-table-checksum, we can discover whether we can trust the files we capture for disaster recovery. Many people underestimate the importance of this task.

Q: Percona Cloud Tools – how much does it cost?
A: Right now it’s a free service. Visit cloud.percona.com for more information, but in a nutshell Percona Cloud Tools is a hosted service providing access to query performance insights for all MySQL users.

Q: Is there API access to Percona Cloud Tools for application integration?
A: There is currently not a public API available. It is on the roadmap, though. We’d be interested to hear more about your use case so please sign up for the service and try it out. After signing in, all pages include a Feedback link to share your thoughts and ideas such as how you’d like to use a public API.

Q: Can you use MHA with Percona XtraDB Cluster?
A: MHA is not something that can be used with Percona XtraDB Cluster (PXC). It’s common to partner PXC with HAProxy for making sure your writes are going to the appropriate node.

Q: Can MHA make automatic failover? If MHA has automatic failover, what do you recommend? Configure it for automatic failover?
A: MHA can make an automatic failover. Personally I prefer managed failover. When working with automated failover it’s important that failback is manual to avoid “flapping.” “Splitbrain” is an ailment that you don’t want to suffer from as well and auto failover removes the human judgment from the decision to relocate operations from a failed node onto a standby node. If you are going to vote for an automatic failover it is advised to test all potential failure scenarios and to employ a STONITH method to really ensure that the unresponsive node is not serving read/write traffic.

Q: What is the best way to detect database blocking from DML statements? Is there a tool that will show blocking after the fact so you don’t have to catch it real-time?
A: Once again, Percona has a tool for this: pt-deadlock-logger can detect and log deadlocks. Detecting locking can be achieved using “SHOW ENGINE INNODB STATUS” or by querying the information_schema.innodb_locks table. Some engineering might be required to keep a history of this, but those resources exist for use.
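For catching blocking live, the INNODB_TRX and INNODB_LOCK_WAITS tables that sit alongside innodb_locks in the 5.5/5.6 INFORMATION_SCHEMA can be joined to show exactly who is blocking whom:

mysql> SELECT r.trx_mysql_thread_id AS waiting_thread,
    ->        r.trx_query           AS waiting_query,
    ->        b.trx_mysql_thread_id AS blocking_thread,
    ->        b.trx_query           AS blocking_query
    ->   FROM information_schema.innodb_lock_waits w
    ->   JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
    ->   JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;

Logging the output of that join on a schedule gives you a record of blocking after the fact.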

Q: Since you mentioned tinkering with ELK I was wondering if you had any tips on good Kibana dashboards to build to monitor MySQL databases/clusters?
A: ELK is something that I’m looking to publish some information on soon so watch this space!

Thanks again everyone for the great questions! And as a reminder, you can download my slides and view the recorded webinar here.

