Sep
07
2015
--

super_read_only and GTID replication

Percona Server 5.6.21+ and MySQL 5.7.8+ offer the super_read_only option that was first implemented in WebScaleSQL. Unlike read_only, this option prevents all users from running writes (even those with the SUPER privilege). Sure enough, this is a great feature, but what’s the relation with GTID? Read on!

TL;DR

Enabling super_read_only on all slaves when using GTID replication makes your topology far less sensitive to errant transactions. Failover is then easier and safer because creating errant transactions is much harder.

GTID replication is awesome…

For years, all MySQL DBAs in the world have been fighting with positioning when working with replication. Each time you move a slave from one master to another, you must be very careful to start replicating at the correct position. That was boring and error-prone.

GTID replication is a revolution because it allows auto-positioning: when you configure server B to replicate from A, both servers will automatically negotiate which events should be sent by the master. Of course this assumes the master has all missing events in its binlogs. Otherwise the slave will complain that it can’t get all the events it needs and you will see an error 1236.
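
For reference, here is a minimal sketch of pointing a slave at a new master with auto-positioning (the hostname and credentials are placeholders, not values from this post):

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO
    ->   MASTER_HOST = 'serverA',       -- placeholder hostname
    ->   MASTER_USER = 'repl',          -- placeholder replication user
    ->   MASTER_PASSWORD = 'secret',
    ->   MASTER_AUTO_POSITION = 1;      -- let GTIDs do the positioning
mysql> START SLAVE;

No MASTER_LOG_FILE or MASTER_LOG_POS is needed: the slave simply asks the master for every GTID it has not yet executed.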

… but there’s a catch

Actually GTID replication has several issues, the main one in MySQL 5.6 being the inability to switch from position-based to GTID-based replication without downtime. This has been fixed since then fortunately.

The issue I was thinking of is errant transactions. Not familiar with this term? Let me clarify.

Say you have a slave (B) replicating from a master (A) using the traditional position-based replication. Now you want to create a new database. This is easy: just connect to B and run:

mysql> CREATE DATABASE new_db;

Ooops! You’ve just made a big mistake: instead of creating the database on the master, you’ve just created it on the slave. But the change is easy to undo: run DROP DATABASE on B, followed by CREATE DATABASE on A.

Nobody will ever know about your mistake, and next time you’ll be more careful.

However, with GTID replication, it’s another story: when you run a write statement on B, you create an associated GTID. And this associated GTID will be recorded forever (even if the binlog containing the transaction is purged at some point).

Now you can still undo the transaction but there is no way to undo the GTID. What you’ve created is called an errant transaction.

This minor mistake can have catastrophic consequences: say that 6 months later, B is promoted as the new master. Because of auto-positioning, the errant transaction will be sent to all slaves. But it’s very likely that the corresponding binlog has been purged, so B will be unable to send the errant transaction. As a result replication will be broken everywhere. Not nice…
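
Before promoting a slave, you can check for errant transactions by comparing GTID sets. A minimal sketch, run on B, with a made-up value standing in for the gtid_executed set copied from A:

mysql> SELECT GTID_SUBTRACT(@@GLOBAL.gtid_executed,
    ->        'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-100') AS errant_transactions;

If the result is not empty, B has executed transactions the master knows nothing about, and you should deal with them before any failover.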

super_read_only can help

Enter super_read_only. If it is enabled on all slaves, the above scenario won’t happen because the write on B will trigger an error and no GTID will be created.
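
As a rough sketch of what that looks like (the exact error text varies between versions):

mysql> SET GLOBAL super_read_only = ON;   -- on recent versions this also implies read_only = ON
mysql> CREATE DATABASE new_db;
ERROR 1290 (HY000): The MySQL server is running with the --read-only (super) option so it cannot execute this statement

The statement is rejected even for accounts with the SUPER privilege, so no GTID is ever generated on the slave.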

With super_read_only, tools that were not reliable with GTID replication become reliable enough to be used again. For instance, MHA supports failover in a GTID-based setup but it doesn’t check errant transactions when failing over, making it risky to use with GTID replication. super_read_only makes MHA attractive again with GTID.

However, note that super_read_only can’t prevent all errant transactions. The setting is dynamic, so if you have privileged access you can still disable super_read_only, create an errant transaction, and enable it back. But at least it should prevent errant transactions that are created by accident.


Oct
29
2014
--

Facebook MySQL database engineers ready for Percona Live London 2014

With 1.28 billion active users, Facebook MySQL database engineers are active and extremely valuable contributors to the global MySQL community. So naturally they are also active participants in Percona Live MySQL conferences! And next week’s Percona Live London 2014 (Nov. 3-4) is no exception. (Register now and use the promotional code “Facebook” to save £30!)

I spoke with Facebook database engineers Yoshinori “Yoshi” Matsunobu and Shlomo Priymak about their upcoming sessions along with what’s new at Facebook since our last conversation back in April.


Tom: Yoshi, last year Facebook deployed MySQL 5.6 on all production environments – what have you and your team learned since doing that? And do you have a few best practices you could share? I realize you’ll be going into detail during your session in London (MySQL 5.6 and WebScaleSQL at Facebook), but maybe a few words on a couple of the bigger ones?

Yoshi: MySQL 5.6 has excellent replication enhancements for use in large-scale deployments. For example, the crash-safe slave makes it possible to recover without rebuilding a slave instance after a server crash. This can greatly minimize slave downtime, especially if your database size is large. There are many other new features such as GTID, multi-threaded slave, and streaming mysqlbinlog, and we actively use them in production.
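
For readers who want to try the replication features Yoshi mentions, here is a minimal, illustrative my.cnf fragment (the values are examples, not Facebook’s actual configuration):

[mysqld]
# Crash-safe slave: store replication positions in InnoDB tables and
# recover the relay log automatically after a crash
master_info_repository    = TABLE
relay_log_info_repository = TABLE
relay_log_recovery        = 1
# GTID-based replication
log_bin                   = mysql-bin
log_slave_updates         = 1
gtid_mode                 = ON
enforce_gtid_consistency  = 1
# Multi-threaded slave (5.6 parallelizes per database/schema)
slave_parallel_workers    = 4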

For InnoDB, Online DDL is a good example of a feature that eases operations. Many MySQL users do schema changes by switching masters. This can minimize downtime but requires operational effort. Online DDL makes things much easier.
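
As a simple illustration (the table and column names are made up), an online schema change in 5.6 can be requested explicitly like this; the statement fails, rather than silently blocking writes, if the operation cannot be done in place:

mysql> ALTER TABLE orders ADD COLUMN note VARCHAR(255), ALGORITHM=INPLACE, LOCK=NONE;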

Tom: Facebook is an active and extremely valuable part of the overall MySQL community and ecosystem – what are some of the key features and improvements you’ve contributed in the past year since moving to MySQL 5.6?

Yoshi: For InnoDB, I think online defragmentation and faster full table scan are the most valuable contributions from Facebook in 5.6. I have received very positive feedback about faster InnoDB full table scan (Logical ReadAhead). My colleague Rongrong will speak about something interesting regarding online defragmentation at Percona Live London. For Replication, we have done many optimizations to make GTID and MTS work without pain. Semi-Synchronous mysqlbinlog and backported Loss-Less semisync from MySQL 5.7 are very useful when you use Semi-Synchronous replication.

Tom: Shlomo, your session, “MySQL Automation at Facebook Scale,” will be of great interest to DBAs at large and growing organizations, considering that Facebook has one of the world’s largest MySQL database clusters. What are the two or three most significant things that you’ve learned as a database engineer operating a cluster of this size? And has anything surprised you along the way (so far)?

Shlomo: This is a great question! We like to speak of “10x” at Facebook when thinking of scaling. For example, what would you do differently if the number of servers you had was 10x more than what it is? This type of mental exercise is surprisingly useful when working with systems at scale. If you, or any of the readers, try to extrapolate this about systems you manage, there will be things you’ll be imagining about how a system like this would be – and you won’t be too far from our reality in many aspects.

You’d imagine that we automate much of the single units of work, like master/slave failover, upgrades and schema changes. You’d suspect we have automated fault detection, self managing systems, good alarming and self remediation. You’d presume that if you’re used to running a command on 100 machines, you’ll now be running it on 1000. At least that’s what I thought to myself, so these are not the things that surprised me. There are a few fundamental shifts in one’s thinking when you get to these sizes, which I didn’t foresee.

The first one is that there is absolutely no such thing as “one-off.” If there is a server somewhere that hits a problem every three years, and you have 1000 servers, this will be happening daily! Take it to 10,000 servers, and you can see absolutely nothing is a “one-off”. We can’t write things off as “worst case, I’ll get an SMS.” Whatever it is, we have to chase it down and fix it. Not just that – deploying a fix at scale can require writing fairly large amounts of code for something that a DBA in a smaller environment could simply fix by hand.

The second one is adapting to constraints that are very pragmatic and tangible. If you’re on AWS, you’re pretty much isolated from things like worrying about where your servers are physically located, when they reach the end of their lifetime, and whether the firmware on the switch in the rack needs to be upgraded. If you’re a small shop and have a few racks up in a co-lo, hardware maintenance is just not as frequent, but it becomes more painful as you grow.

At Facebook, we run our own datacenters! We need to work around interesting challenges, such as running datacenters that have highly variable compositions of server hardware. Since we have so many servers, something is always going on. Racks of servers need to be moved. Whole clusters need to be rebuilt or refreshed, to be made better, faster, stronger.  New datacenters are constructed, others decommissioned.

Tom: And this is where automation comes into the picture, right?

Shlomo: We have had to build a lot of automation to make these operations seamless, and we work closely with the Site Ops teams on the ground to coordinate these logistically complicated processes.

Another thing my team does in this space is planning capacity and hardware purchases. Since we build our own servers, the turnaround time between ordering and getting machines is quite long, so proper planning is paramount. Buy too much, and you’ve wasted millions of dollars. Buy too few servers, and there won’t be space for user growth and upcoming projects. The sheer scale makes these decisions more complicated and involved.

These things have actually made my job much more interesting, and I think I’d find it hard to adjust to a smaller environment.

Tom: Last April Facebook announced a move to the newly created WebScaleSQL. Yoshi, do you have an update on where WebScaleSQL is today? And I know it’s early, but has there been any impact on Facebook yet?

Yoshi: WebscaleSQL is a collaboration among engineers from several companies that face similar challenges in running MySQL at scale. Collaboration is nothing new to the MySQL community. The intent is to make this collaboration more efficient.

We are based on the latest upstream (currently MySQL 5.6.21) and have added many features. We added patches to improve InnoDB performance around compression, LRU flushing, locking, NUMA support, and doublewrite. We statically link Semi-Sync based on lessons learned in Facebook’s environment (the plugin lock caused hot mutex contention). We have many upcoming features such as async clients.

We will continue to track the upstream branch that is the latest, production-ready release (currently MySQL 5.6). We are continuing to push the generally useful changes we have from all of the participants.  If you think you have something to contribute, get in touch!

Tom: I remember being surprised earlier this year when you told me there was usually just one MySQL Operations team member on call at any given time thanks to “robots.” How many robots did your team build and what do they do? Oh, and should rank-and-file DBAs around the world be worried about losing their day jobs to these robots? ;-)

Shlomo: Instead of becoming obsolete as some fear, our team is shifting its focus from smaller to larger problems, as we rise higher in the levels of abstraction. Our team has progressed with the requirements of the role. From being a team of DBAs that automate some of their work, we have become more like Production Engineers. We design, write and maintain MySQL/Facebook-specific automation that does our work for us.

While we build these software “robots” to do our work, we also have to maintain them. The job of the oncall is to fix these robots when they malfunction, and that can sometimes be difficult due to the size of our codebase.

In regards to employment concerns, I’d say our work has become more interesting, and the amount has increased. It definitely did not decrease, so if Facebook is indicative of other companies, jobs are not at risk just yet. Speaking of jobs – if what we’re doing sounds interesting, we’re hiring!

Oh, and as for details about these “robots” – that’s the topic of my talk next week in London. Come and hear me speak if you want to know more!

Tom: Yoshi, you also will host a session titled “Fast Master Failover without Data Loss.” I don’t want to give too much away, but how did you get failover to work at scale – across vast datacenters?

Yoshi: Master failure is the norm at Facebook because of the large number of servers. Without automation, it is not realistic for a limited number of people to manage them. We have a very interesting infrastructure to automate failure handling at Facebook scale. To automate things, reliability is important. Unreliable automation makes engineers spend lots of time fixing things manually, and that increases downtime. It is also important to define what to automate and what we shouldn’t automate: define failure scenarios, write good test cases, and continuously integrate. There are multiple failure scenarios like the ones below, and you’ll hear about each in detail at my session:

– mysqld crash
– mysqld stalls
– kernel panic and reboot
– error spikes caused by H/W failure
– error spikes caused by bad application logic
– rack switch down
– multiple rack switches down
– datacenter down

Tom: What other sessions, keynotes or events are you looking forward to at Percona Live London 2014? And are you guys planning on attending the MySQL Community Dinner?

Yoshi: “MySQL 5.7: Performance and Scalability Benchmark” (led by Oracle MySQL performance architect Dimitri Kravtchuk). And yes, we’re looking forward to meeting people at the MySQL Community Dinner!

Tom:  Thanks again Yoshi and Shlomo for taking the time to speak with me and I look forward to seeing you both in London next week!

Percona Live London 2014 MySQL Community Dinner

And readers, I invite you to register now for Percona Live London using the promotional code “Facebook” to save £30. I also hope to see you at the MySQL Community Dinner next Monday (Nov. 3). Space is limited so be sure to reserve your spot now and join us aboard our private double-decker bus to the restaurant.

I’d also like to thank the Percona Live London 2014 Conference Committee for putting together a terrific event this year! The conference committee includes:

  • Dailymotion’s Cédric Peintre, conference chairman
  • Percona’s David Busby
  • MariaDB’s Colin Charles
  • eBay Classifieds Group’s Luis Motta Campos
  • Booking.com’s Nicolai Plum
  • Oracle’s Morgan Tocker
  • Spil Games’ Art van Scheppingen


Oct
21
2014
--

MySQL community set to meet at Percona Live London 2014

The countdown is on for Europe’s largest annual MySQL event, Percona Live London 2014. The two days of technical tutorials and sessions, November 3-4, will focus on the latest MySQL industry trends, news, best practices – and a look at what’s on the near- and long-term horizon within the global MySQL ecosystem.

Percona Live London 2014 will bring attendees up to date on key areas including MySQL 5.7, database security, database as a service (DBaaS), Hadoop and high availability (HA), disaster recovery, replication and backup, performance and scalability, WebScaleSQL and much, much more.

Team Oracle will be in London, led by Tomas Ulin, vice president of Oracle’s MySQL engineering team, who will explain why MySQL just keeps getting better with the latest news for the MySQL Database, MySQL Cluster, MySQL Workbench… and more. Oracle’s Luis Soares, principal software engineer, and Andrew Morgan, MySQL HA product management, will provide insight into what’s in the latest 5.7 development milestone release and also what’s going on in the labs… particularly around MySQL replication. Seize the opportunity to learn how to leverage MySQL 5.7 replication to grow your business from their session, “MySQL Replication: What’s New in 5.7 and Beyond.”

If anything keeps DBAs up at night it’s database security – especially with recent revelations of vulnerabilities like the POODLE SSLv3 security flaw (CVE-2014-3566) and “Bash Bug,” also known as Shellshock (CVE-2014-6271). Attendees will have the opportunity to talk face-to-face with database security expert David Busby of Percona, who will also lead a session titled, “Security it’s more than just your database you should worry about.”

The official Percona Live London 2014 t-shirt!

Observe how to incorporate semi-synchronous replication to achieve failover – without data loss. Facebook’s Yoshinori Matsunobu and Santosh Banda will share how they did it at scale (across data centers) by extending MySQL internals along with some handy new self-made tools.

Meet the next-generation C connector for MySQL: libAttachSQL. It’s a new lightweight async C connector library for MySQL being developed from scratch by HP’s Advanced Technology Group. Andrew Hutchings, principal software engineer at Hewlett-Packard, will be on hand to share the latest on libAttachSQL.

Successful applications often become limited by MySQL performance. But tracking down and fixing those issues can be a huge drain on time and resources. Unless you think smart – spending time on what gives you the best return. Percona CEO Peter Zaitsev will explain how in his session, “Practical MySQL Performance Optimization.”

Percona Live London attendees will also learn from the real-life experiences of MySQL experts who share case studies. Shake hands with Art van Scheppingen, head of database engineering at Spil Games, who will explain how to serve out any page with an HA Sphinx environment.

Save yourself a quarter century by absorbing Tim Callaghan’s MySQL performance benchmarking tips, tricks and lessons learned. Tim, vice president of engineering at Tokutek, will share what he’s learned in the past 25 years maintaining the performance of database applications.

And of course there will be a MySQL community dinner! But be sure to register now for the dinner because space is limited – especially if you want to enjoy a ride to the restaurant on a vintage double-decker London bus (you do not need to attend the conference to join the dinner).

Register now for Percona Live London 2014 and save £30 with discount code “MPB30”. See you in London!


May
22
2014
--

A technical WebScaleSQL review and comparison with Percona Server

The recent WebScaleSQL announcement has made quite a splash in the MySQL community over the last few weeks, and with good reason. The collaboration between the major MySQL-at-scale users to develop a single code branch that addresses the needs of, well, web scale, is going to benefit the whole community. But I feel that the majority of community opinions and comments to date have been based on the announcement itself and organizational matters only. What we have been missing is an actual look at the code. What actual new features and bug fixes are there? Let’s take a look.

At the same time, as Percona also develops an enhanced MySQL replacement database server, it’s natural to try to compare the two. So let’s do that as well, but an important caveat applies here. Both MySQL branches (a branch and an upstream-tracking fork would be more exact) are being developed with different goals and for different end users. WebScaleSQL is all things scale – performance, diagnostics, specific features – for a relatively narrow and highly proficient group of users. There are no binary releases, and the code base is supposed to serve as a basis for further code branches, specific to each corporate contributor. On the other hand, Percona Server is a general-purpose server that is developed with broad input from Percona’s customers, professional services departments, and the general community. Thus, it would be unfair to say that one of the branches should be considered better than the other just because a certain feature is missing or does not reach as far. The software serves different needs.

The rest of this post is an annotated list of WebScaleSQL-specific code commits: user-visible features, performance fixes, general fixes, and finally the stuff of interest to developers.

New features

  • Ability for clients to specify millisecond read/write/connect timeouts (as opposed to the second granularity in stock MySQL). This patch also carries an internal cleanup that introduces a timeout data type to avoid second-to-millisecond unit conversion errors.
  • Super read-only when regular read-only is not enough, that is, when writes by SUPER users need to be prevented as well.

All of the above are absent from Percona Server but possible to merge if there is interest. I’d also note that this list is rather short at the moment, with some obvious stuff missing, such as the user statistics patch. I’d expect this to change in the near future.

Performance-related features and fixes

It is Web Scale, remember.

These two changes require an understanding, on the user’s part, of the trade-offs involved. They are missing in Percona Server, but, again, possible to merge if there is interest in them.

  • Fix for MySQL bug #72123. Helps if you have many row or table lock waits.
  • Prefix index query optimisation. Avoids a primary index read for rows that have the whole row data in the prefix index itself (a rough illustration appears at the end of this section).

Again Percona Server does not carry these. Different from the previous ones, these should be safe for every single user and we could merge them without having to give any further thought to their merits. Oracle MySQL should do the same.

That is one general performance-related change that Percona Server has too.
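
As a rough illustration of the prefix index query optimisation listed above (the table and column names are made up): when the stored value fits entirely within the index prefix, the secondary index already contains the full value, so the extra clustered-index (primary key) read can be skipped.

mysql> CREATE TABLE t (id INT PRIMARY KEY, s VARCHAR(255), KEY s_prefix (s(10))) ENGINE=InnoDB;
mysql> SELECT s FROM t WHERE s = 'short';  -- 'short' fits within the 10-character prefix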

InnoDB flushing performance fixes

These would belong to the previous section, but I’d like to highlight them separately. We spent a lot of effort to analyse and improve the 5.6 InnoDB flushing before the Percona Server 5.6 GA release and continue to do so in the point releases. The WebScaleSQL changes below show that we and they have discovered a lot of identical improvement areas independently, and provided different fixes for the same issues. For an overview of XtraDB 5.6 changes in this area, go here and here. Note that these changes are somewhat more extensive, especially for the high-concurrency cases.

  • Back port of 5.7 WL #7047 and a fix for MySQL bug #71411 (fixed in Percona Server too). WL #7047 reduces the buffer list scan complexity. We have identified the same issue but attempted to work around it with flushing heuristic tweaks.
  • Fix for MySQL bugs #70500 and #71988 to remove potential flushing instabilities. Both fixed in Percona Server.
  • Fix for MySQL bug #62534, enabling finer-grained setting of innodb_max_dirty_pages_pct and unbreaking it for value zero. Not fixed in Percona Server, but the Oracle fix should be coming in 5.6.19.
  • Fix for MySQL bug #70899, removing redundant flush on server startup, which should speed up crash recovery with large buffer pools. Not fixed in Percona Server. Oracle fix expected in 5.6.20.
  • Ability to specify the idle-system flushing rate. Absent in Percona Server. I believe it should be possible to get a similar effect by tuning existing variables – setting innodb_io_capacity lower and innodb_io_capacity_max higher – but it needs experimenting before being able to tell for sure (a rough sketch of that tuning follows this list).
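
A minimal sketch of the tuning mentioned in the last item, with purely illustrative values that would need to be validated against your own workload:

mysql> SET GLOBAL innodb_io_capacity = 200;       -- modest background flushing rate
mysql> SET GLOBAL innodb_io_capacity_max = 4000;  -- allow bursts when flushing falls behind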

General fixes

Fixes for assorted MySQL bugs. None of them are present in Percona Server; they might be merged as needed. Our own list of assorted MySQL bugs we have fixed is here. I have omitted fixes for MySQL developer-specific bugs; these are listed in the next section.

  • Fix for MySQL bug #64751 – Make NO_UNSIGNED_SUBTRACTION SQL mode work for additions too, i.e. unifying the cases of “foo - 1” and “foo + -1” (a short illustration follows this list).
  • Stop spawning one extra thread on server startup to work around a bug in glibc that was fixed in 2006. Interestingly I was not able to find any MySQL bug report for this. Anyone?
  • Preserve slave I/O thread connection settings if compression is used. Again, is there a MySQL bug report for this?
  • Fix for MySQL bug #64347 –  database option mix-up if lower_case_table_names = 0 and the database names differ only by case.
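
A short illustration of the NO_UNSIGNED_SUBTRACTION item above; the behaviour of the addition case is what the WebScaleSQL patch adds (I have not run that build myself):

mysql> SET SESSION sql_mode = 'NO_UNSIGNED_SUBTRACTION';
mysql> SELECT CAST(0 AS UNSIGNED) - 1;   -- returns -1 instead of an out-of-range error
mysql> SELECT CAST(0 AS UNSIGNED) + -1;  -- with the WebScaleSQL fix, this returns -1 as well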

Build changes

Making MySQL play nicer with the system libs, and other assorted changes.

  • Static linking of semisync replication plugins, based on a MariaDB patch. This might have a performance angle to it – MySQL bug #70218?
  • Do not embed OpenSSL and zlib in the static libraries.
  • Fix building with system OpenSSL and zlib. Using system libraries whenever possible makes packaging easier and more conformant to Linux distribution requirements. We have been working on this too.
  • Fix building with system libreadline (MySQL bug #63130, closed without fix). Likewise.

Developer changes

These are patches of interest to MySQL / WebScaleSQL developers and not immediately visible for end users. I’m omitting some things, such as testcase compatibility with various build options, testsuite timeout tweaks, and patches that integrate with tools used for project development: Jenkins, Phabricator, etc.

  • Switch to C++11 and C99, the newer C and C++ language versions. It’s a big change from a development perspective and one that is possible to pull off only if the project does not need to support older systems and their compilers (or even the newest compilers on some platforms). This is precisely the kind of thing that is easier for WebScaleSQL to implement than for everybody else. As for the benefits of the change, the project already makes use of the C++11 memory model – see the next item.
  • An efficient atomic stat counter framework, using C++11. I wonder how its performance compares to that of get_sched_indexer_t, which is present in Oracle MySQL 5.6, but not used?
  • Making the Performance Schema MTR suite slightly more sane by not recording stuff that tests nothing and at the same time is prone to change. Performance Schema MTR bits are something I’m sure every single 5.6 branch developer has encountered. This particular commit fixes MySQL bug #68714. Fixed in Percona Server. This is useful if one configures the build to re-enable the Performance Schema.
  • More of the same. Half of that commit fixes MySQL bug #68635, which is fixed in Percona Server too but unfortunately was considered by upstream as not requiring any fixes.
  • Stabilising the MTR testsuite, SHOW PROCESSLIST bits. Is there a MySQL bug report for this?
  • Stabilising the MTR testsuite, missing ORDER BY in 5.6.17 bits. Likewise, is there a bug report for this one?
  • Fix for MTR breaking if there is a ‘@’ somewhere in the working directories. Jenkins CI likes to put ‘@’ there. Same question re. bug report?
  • Unbreak a bunch of tests in the parts suite. This looks to me like MySQL bug #69252, but it has been already fixed by Oracle. Is the WebScaleSQL fix still required?
  • Re-enable AIO if MTR –mem option is passed.
  • Stress tests in MTR.
  • Fix compilation with Bison 3, based on a MariaDB patch (MySQL bug #71250). Fixed in Percona Server.
  • Fix compilation warnings (more). A bug report?
  • Fix uninitialised variable use warnings as reported by AddressSanitizer. Is there a bug report?
  • Fix a potential out-of-bound access, found by AddressSanitizer. Is there a MySQL bug for this?
  • Fix CMake confusion of two different ways to ask for a debug build resulting in different builds (MySQL bug #70647, fixed in 5.7).

Notice that the last list is quite long, especially if compared to the list of user-visible features added to date. That makes perfect sense for the project at this stage: building a solid development foundation first so that the features can follow with good quality and reduced maintenance effort. Add a whole bunch of performance fixes and you get the big picture for today: a solid foundation for further development, numerous performance fixes, and a few general fixes and new features.

As for comparing to Percona Server, currently the biggest overlap is in the performance fixes – specifically the InnoDB flushing ones. For the rest, we can merge from WebScaleSQL as necessary – if you think that a certain WebScaleSQL feature or fix would benefit you, drop us a line to discuss the options. And of course we invite WebScaleSQL to take any of our fixes or patches they would like.


Apr
04
2014
--

Facebook’s Yoshinori Matsunobu on MySQL, WebScaleSQL & Percona Live

Facebook’s Yoshinori Matsunobu

I spoke with Facebook database engineer Yoshinori Matsunobu here at Percona Live 2014 today about a range of topics, including MySQL at Facebook, the company’s recent move to WebScaleSQL, new MySQL flash storage technologies – and why attending the Percona Live MySQL Conference and Expo each year is very important to him.

Facebook engineers are hosting several sessions at this year’s conference and all have been standing room only. That’s not too surprising considering that Facebook, the undisputed king of online social networks, has 1.23 billion monthly active users collectively contributing to an ocean of data-intensive tasks – making the company one of the world’s top MySQL users. And they are sharing what they’ve learned in doing so this week.

You can read my previous post where I interviewed five Facebook MySQL experts (Steaphan Greene, Evan Elias, Shlomo Priymak, Yoshinori Matsunobu and Mark Callaghan) and below is my short video interview with Yoshi, who will lead his third and final session of the conference today at 12:50 p.m. Pacific time titled, “Global Transaction ID at Facebook.”

