Feb 23, 2018

This Week in Data with Colin Charles 29: Percona Live Full Schedule, MariaDB Events, and a Matter of Compatibility

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

I think the biggest news from Percona-land is that besides the tutorial grid, the schedules for day 1 and day 2 are live! Also notice the many “sub-themes”: a 101 track, using MySQL, MongoDB, cloud, MySQL database software, PostgreSQL, containers & automation, monitoring & ops, and misc. database stuff. Learn from 107 different experts (this number is likely to increase). So register now.

This coming week, Peter Zaitsev, Tom Basil, and I will be in New York. Some of us will be at the MariaDB 2018 Developers Unconference, and all of us will be attending M|18. There is a schedule for the Developers Unconference, and I hope you find time on Sunday to join us as I present “MySQL features missing in MariaDB” between 12:15 and 13:00. Since it is an unconference, it shouldn’t just be a presentation, but also an active discussion. I recall that during the FOSDEM MySQL DevRoom, MariaDB Foundation developer Vicentiu Ciorbaru assigned himself support for the super_read_only feature (see tweet).

If you have thoughts on features you like in MySQL that are missing from MariaDB Server, please don’t hesitate to tweet at me @bytebot, or even drop me an email: colin.charles@percona.com. I will happily change and add to the slides until Sunday morning, Eastern Standard Time.

Why is this important? Quite simply, take a look at Todd Farmer’s blog post: Bitten by MariaDB 10.2 Incompatible Change. Cloudera Manager fails on specific minor versions of MariaDB Server because the behavior changed (this particular issue occurs in 10.2.8+ but not before!). I’d definitely spend some time reading the comments as well as the associated Jira ticket. Maybe with 10.3/10.4, it’s time to stop calling it a “drop-in replacement” (an initial goal when I worked on MariaDB Server), and just call it something else. Maybe something for the new Chief Marketing Officer to think about?

Releases

Link List

Upcoming appearances

  • SCALE16x – Pasadena, California, USA – March 8-11 2018
  • FOSSASIA 2018 – Singapore – March 22-25 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Feb 22, 2018

Percona Live 2018 Featured Talk – Scaling a High-Traffic Database: Moving Tables Across Clusters with Bryana Knight


Welcome to the first interview blog for the upcoming Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Bryana Knight, Platform Engineer at GitHub. Her talk is titled Scaling a High-Traffic Database: Moving Tables Across Clusters. Facing an immediate need to distribute load, GitHub came up with creative ways to move a significant amount of traffic off of their main MySQL cluster – with no user impact. In our conversation, we discussed how Bryana and GitHub solved some of these issues:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Bryana: I started at GitHub as a full-stack engineer working on a new business offering, and shortly afterward was offered the opportunity to transition to the database services team. Our priorities back then included reviewing every single database migration for GitHub.com. Having spent my whole career as a full-stack engineer, I had to level-up pretty quickly on MySQL, data modeling, data access patterns – basically everything databases. I spent the first few months learning our schema and setup through lots of reading, mentorship from other members of my team, reviewing migrations for most of our tables, and asking a million questions.

Originally, my team spent a lot of time addressing immediate performance concerns. Then we started partnering with product engineering teams to build out the backends for new features. Now we are focused on the long-term scalability and availability of our database, stemming from how we access it. I work right between our DBAs and our product and API engineers.

Percona: Your talk is titled “Scaling a High-Traffic Database: Moving Tables Across Clusters”. What were the challenges GitHub faced that required redistributing your tables?

Bryana: The biggest part of the GitHub codebase is an 8-year-old monolith. As a company, we’ve been fortunate enough to see a huge amount of user growth since the company started. User growth means data growth. The schema and setup that worked for GitHub early on, and very much allowed GitHub to get to where it is today with tons of features and an extremely robust API, is not necessarily the right schema and setup for the size GitHub is today.

We were seeing that higher than “normal” load was starting to have a more noticeable effect. The monolith aspect of our database, organic growth, plus inefficiencies in our code base were putting a lot of pressure on the master of our primary database cluster, which held our most core tables (think users, repos, permissions). From the database perspective, this meant contention, locking, and replica lag. From the user’s perspective, this meant anything from longer page loads to delays in UI updates and notifications, to timeouts. 

Percona: What were some of the other options you looked at (if any)?

Bryana: Moving tables out of our main cluster was not the only action we took to alleviate some of the pressure in our database. However, it was the highest impact change we could make in the medium-term to give us the breathing room we needed and improve performance and availability. We also prioritized efforts around moving more reads to replicas and off the master, throttling more writes where possible, index improvements and query optimizations. Moving these tables gave us the opportunity to start thinking more long-term about how we can store and access our data differently to allow us to scale horizontally while maintaining our healthy pace of feature development.

Percona: What were the issues that needed to be worked out between the different teams you mention in your description? How did they impact the project?

Bryana: Moving tables out of our main database required collaboration between multiple teams. The team I’m on, database-services, was responsible for coming up with the strategy to move tables without user impact, writing the code to handle query isolation and routing, connection switching, backgrounding writes, and so on. Our database-infrastructure team determined where the tables we were moving should go (new cluster or existing), set up the clusters, and advised us on how to safely copy the data. In some cases, we were able to use MySQL replication. When that wasn’t possible, they weighed in on other options.

We worked with production engineers to isolate data access to these tables and safely split JOINs with other tables. Everybody needed to be sure we weren’t affecting performance and user experience when doing this. We discussed with our support team the risk of what we were doing. Then we worked with them to determine if we should preemptively set our status to yellow when there was a higher risk of user impact. During the actual cut-overs, representatives from all these groups would get on a war-room-like video call and “push the button”, and we always made sure to have a roll-out and roll-back plan.
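Bryana mentions code for query isolation and routing, connection switching, and splitting JOINs. As a rough illustration only, here is a minimal Python sketch of a per-table routing map whose cut-over can be rolled back. The class, the cluster names, and the table names are all hypothetical; this is not GitHub’s implementation, just a way to see why a JOIN has to be split once its tables live on different clusters.

```python
class TableRouter:
    """Hypothetical per-table routing map (illustration only, not GitHub's code)."""

    def __init__(self, default_cluster):
        self.default_cluster = default_cluster
        self.overrides = {}  # table name -> cluster name

    def cluster_for(self, table):
        return self.overrides.get(table, self.default_cluster)

    def route(self, tables):
        """Return the single cluster that can serve a query touching `tables`."""
        clusters = {self.cluster_for(t) for t in tables}
        if len(clusters) != 1:
            # A JOIN across clusters cannot run as one SQL statement; it has to be
            # split into separate queries and combined in the application.
            raise ValueError(f"query spans clusters {sorted(clusters)}")
        return clusters.pop()

    def cut_over(self, tables, new_cluster):
        """Point `tables` at `new_cluster` and return a snapshot for roll-back."""
        snapshot = dict(self.overrides)
        for t in tables:
            self.overrides[t] = new_cluster
        return snapshot

    def roll_back(self, snapshot):
        """Restore the routing that was in place before a cut-over."""
        self.overrides = dict(snapshot)
```

Tables that are queried together are moved together in one `cut_over` call, and keeping the returned snapshot is what makes the roll-back plan a single step.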

Percona: Why should people attend your talk? What do you hope people will take away from it?

Bryana: In terms of database performance, there are a lot of little things you can do immediately to try and make improvements: things like adding indexes, tweaking queries, and denormalizing data. There are also more drastic, architectural changes you can pursue, that many companies need to do when they get to a certain scale. The topic of this talk is a valid strategy that fits between these two extremes. It relieved some ongoing performance problems and availability risk, while giving us some breathing room to think long term. I think other applications and databases might be in a similar situation and this could work for them.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Bryana: This is actually the first time I’m attending a Percona Live conference. I’m hoping to learn from some of the talks around scaling a high-traffic database and sharding. I’m also looking forward to seeing some talks from the wonderful folks on GitHub’s database-infrastructure team.

Want to find out more about this Percona Live 2018 featured talk, and Bryana and GitHub’s migration? Register for Percona Live 2018, and see her talk Scaling a High-Traffic Database: Moving Tables Across Clusters. Register now to get the best price!

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Feb 21, 2018

Percona Live 2018 Open Source Database Conference Full Schedule Now Available


The conference session schedule for the seventh annual Percona Live 2018 Open Source Database Conference, taking place April 23-25 at the Santa Clara Convention Center in Santa Clara, CA, is now live and available for review! Advance registration discounts are available through March 4, 2018, 11:30 p.m. PST.

Percona Live Open Source Database Conference 2018 is the premier open source database event. With a theme of “Championing Open Source Databases,” the conference will feature multiple tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Once again, Percona will be offering a low-cost database 101 track for beginning users who want to start learning how to use and operate open source databases.

Major areas of focus at the conference include:

  • Database operations and automation at scale, featuring speakers from Facebook, Slack, GitHub and more
  • Databases in the cloud – how database-as-a-service (DBaaS) is changing the DB landscape, featuring speakers from AWS, Microsoft, Alibaba and more
  • Security and compliance – how GDPR and other government regulations are changing the way we manage databases, featuring speakers from Fastly, Facebook, Pythian, Percona and more
  • Bridging the gap between developers and DBAs – finding common ground, featuring speakers from Square, Oracle, Percona and more

Conference Session Schedule

Conference sessions take place April 24-25 and will feature 90+ in-depth talks by industry experts related to each of the key areas. Several sessions from Oracle and Percona will focus on how the new features and enhancements in the upcoming release of MySQL 8.0 will impact businesses. Conference session examples include:

Sponsorships

Sponsorship opportunities for Percona Live Open Source Database Conference 2018 are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Continuent, VividCortex
  • Platinum Sponsors – Microsoft
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – Altinity, BlazingDB, SolarWinds, Timescale, TwinDB, Yelp
  • Other Sponsors – cPanel
  • Media Sponsors – Database Trends & Applications, Datanami, EnterpriseTech, HPCWire, ODBMS.org, Packt

Hyatt Regency Santa Clara & The Santa Clara Convention Center

Percona Live 2018 Open Source Database Conference is held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!

Feb 09, 2018

This Week in Data with Colin Charles 27: Percona Live Tutorials Released and a Comprehensive Review of the FOSDEM MySQL DevRoom

Colin Charles

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Santa Clara 2018 update: tutorials have been announced. The committee rated over 300 talks, and easily 70% of the schedule should go live next week as well. In practice, then, you should see about 50 talks announced next week. There’s been great competition: we only have 70 slots in total, so about 1 in 5 talks get picked. Talk about a competitive ratio.

FOSDEM

FOSDEM was truly awesome last week. From a Percona standpoint, we had a lot of excellent booth traffic (we were outside the PostgreSQL room on Saturday, and not too far from the MySQL room on Sunday). We gave away bottle openers (useful in Brussels with all the beer; we tried a new design with a magnet so you can attach it to your fridge), stickers, and some brochures, but most of all we had plenty of great conversations. There was quite a crowd from Percona, and it was excellent to see the MySQL & Friends DevRoom almost constantly full! A few of us brave souls managed to stay there the whole day, barely taking any breaks, so as to enjoy all the talks.

I find the quality of talks to be extremely high. And when it comes to a community run event, with all content picked by an independent program committee, FOSDEM really sets the bar high. There is plenty of competition to get a good talk in, and I enjoyed everything we picked (yes, I was on the committee too). We’ve had plenty of events in the ecosystem that sort of had “MySQL” or related days, but FOSDEM might be the only one that has really survived. I understand we will have a day of some sort at SCALE16x, but even that has been scaled down. So if you care about the MySQL ecosystem, you will really want to ensure that you are at FOSDEM next year.

This year, we started with the usual MySQL Day on Friday. I could not be present, as I was at the CentOS Dojo giving a presentation. So, the highlight of Friday for me? The community dinner. Over 80 people showed up, there was a waiting list, and lots of people were trying to get tickets at the last minute. Many missed out, too (sorry, better luck next year), and hopefully we will get a larger venue going forward. I really thank the organizers for this; we affectionately refer to them as the Belconians (i.e. a handful of Perconians based in Belgium). The conversation, the food, the drink: they were all excellent. It’s good to see representation from all parts of the community: MySQL, Percona, MariaDB, Pythian, and others. So thank you again, Liz, Dimitri, Tom, and Kenny in absentia. I think Tjerk also deserves special mention for always helping (this year with the drinks).

As for FOSDEM itself, beyond the booth, I think the most interesting stuff was the talks. There are video recordings and slides of pretty much all talks, but I will also give you the “Cliff’s Notes” of them here.

MySQL DevRoom talk quick summaries

Beyond WHERE and GROUP BY – Sergei Golubchik

  • EXCEPT is in MariaDB Server 10.3
  • recursive CTEs are good for hierarchical data, graphs, and data generation, and are Turing complete (you can even use them to solve Sudoku)
  • non-recursive CTEs can be an alternative syntax for subqueries in the FROM clause
  • Window functions:
    • Normal: one result per row, depending on that row only
    • Aggregate: one result per group, depending on the whole group
    • Window: one result per row, depending on the whole group
  • System versioned tables with AS OF
  • Aggregate stored functions
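The CTE and window-function bullets are easy to try out. This sketch uses Python’s built-in sqlite3 module purely as a convenient stand-in (SQLite 3.25 or later is needed for window functions); the SQL is close to what MariaDB Server 10.2+ accepts, but the schema and data are made up for illustration. It shows a recursive CTE walking a hierarchy, a recursive CTE generating data, and the aggregate-versus-window contrast from the list above.

```python
import sqlite3

# In-memory database with a small, made-up employee hierarchy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT,
                        manager_id INTEGER, dept TEXT, salary INTEGER);
INSERT INTO employees VALUES
  (1, 'Ada',   NULL, 'eng', 200),
  (2, 'Brian', 1,    'eng', 150),
  (3, 'Chen',  1,    'ops', 120),
  (4, 'Dana',  2,    'eng', 100);
""")

# Recursive CTE over hierarchical data: start at the root, then join children.
hierarchy = conn.execute("""
WITH RECURSIVE chain(id, name, depth) AS (
  SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, c.depth + 1
  FROM employees e JOIN chain c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

# Recursive CTE used for data generation: the numbers 1..5.
seq = [row[0] for row in conn.execute("""
WITH RECURSIVE seq(n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 5)
SELECT n FROM seq ORDER BY n
""")]

# Aggregate function: one result per group.
per_dept = conn.execute(
    "SELECT dept, SUM(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()

# Window function: one result per row, computed over the row's whole group.
windowed = conn.execute("""
SELECT name, dept, SUM(salary) OVER (PARTITION BY dept) AS dept_total
FROM employees ORDER BY id
""").fetchall()

print(hierarchy, seq, per_dept, windowed, sep="\n")
```

Note how `per_dept` collapses the four employees into two rows, while `windowed` keeps all four rows and attaches the department total to each.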

MySQL 8.0 Performance: InnoDB Re-Design – Dimitri Kravtchuk

  • Contention-Aware Transaction Scheduling (CATS), since 8.0.3. Not all transactions are equal, and FIFO may not be optimal: unblock the most blocking transactions first
  • CATS (VATS) had a few issues, and there were bugs (they thought everything worked since MariaDB Server had implemented it). It took them about 9 months to fix everything.
  • Where does CATS help? Workloads hitting row lock contention. You can monitor via SHOW ENGINE INNODB MUTEX.
  • The main problem comes from running repeatable read versus read committed transaction isolation on the same workload. You really need to understand your workload when it comes to VATS.
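The difference between FIFO and contention-aware scheduling can be shown with a toy lock-wait model. This Python sketch assumes a simplified weight: a waiting transaction counts itself plus every transaction transitively blocked behind it. That follows the “unblock the most blocking transactions first” idea from the talk, but it is not InnoDB’s actual code, and it assumes the wait graph has no cycles (deadlocks are handled separately).

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    name: str
    arrival: int                                # order in which it started waiting
    blocks: list = field(default_factory=list)  # txns waiting on locks this txn holds

def weight(txn):
    """Itself plus every transaction transitively blocked behind it (assumes no cycles)."""
    return 1 + sum(weight(t) for t in txn.blocks)

def pick_fifo(waiters):
    """Classic scheduling: grant the released lock to whoever waited longest."""
    return min(waiters, key=lambda t: t.arrival)

def pick_cats(waiters):
    """Contention-aware: grant to the heaviest waiter; ties go to the earliest arrival."""
    return max(waiters, key=lambda t: (weight(t), -t.arrival))

# t2 holds locks that t3 and t4 wait on, and t3 in turn blocks t5.
t5 = Txn("t5", 4)
t3 = Txn("t3", 2, [t5])
t4 = Txn("t4", 3)
t2 = Txn("t2", 1, [t3, t4])
t1 = Txn("t1", 0)
print(pick_fifo([t1, t2]).name, pick_cats([t1, t2]).name)
```

FIFO hands the lock to t1, which unblocks nobody else; the contention-aware choice is t2, whose progress can free three more transactions.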

MySQL 8.0 Roles – Giuseppe Maxia

  • Created like a user, granted like privileges. You need to activate them to use them.
  • Before roles, you created a user, then grant, grant, and more grants… Add another user? Same deal. Lots of repetitive work and a lot of chances to make mistakes.
  • Faster user administration – define a role, assign it many times. Centralized grant handling – grant and revoke privileges to roles, add/edit all user profiles.
  • You need to remember to set the default role.
  • A user can have many roles; default role can be a list of roles.
  • Roles are users without a login – roles are saved in user tables. This is useful from an account lock/unlock perspective.
  • You can grant a user to a user
  • SET ROLE is for session management; SET DEFAULT ROLE is a permanent assignment of a role for a user. SET ROLE DEFAULT means assign the default role for this user for this session
  • The role_edges table reports which roles are assigned to which users. default_roles keeps track of the current default roles assigned to users. A default role may not exist.
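To make the activation rules concrete, here is a toy Python model of the semantics described above: privileges flow only from roles that are active in the session, SET ROLE changes only the current session, and SET DEFAULT ROLE decides what a new session starts with. The class and method names are hypothetical; this mirrors the described behavior, not how the server stores roles in the role_edges and default_roles tables.

```python
class RoleModel:
    """Toy model of MySQL 8.0 role semantics (hypothetical, not server internals)."""

    def __init__(self):
        self.role_privs = {}  # role -> set of privileges
        self.granted = {}     # user -> roles granted (compare role_edges)
        self.defaults = {}    # user -> default roles (compare default_roles)
        self.session = {}     # user -> roles active in the current session

    def create_role(self, role, *privs):
        self.role_privs[role] = set(privs)

    def grant(self, role, user):
        self.granted.setdefault(user, set()).add(role)

    def set_default_role(self, user, *roles):
        # Like SET DEFAULT ROLE: a permanent setting used by future sessions.
        self._check_granted(user, roles)
        self.defaults[user] = set(roles)

    def connect(self, user):
        # A new session starts with only the default roles active.
        self.session[user] = set(self.defaults.get(user, set()))

    def set_role(self, user, *roles):
        # Like SET ROLE: affects only the current session.
        self._check_granted(user, roles)
        self.session[user] = set(roles)

    def privileges(self, user):
        # Granted-but-inactive roles contribute nothing.
        return set().union(*(self.role_privs[r] for r in self.session.get(user, set())))

    def _check_granted(self, user, roles):
        missing = set(roles) - self.granted.get(user, set())
        if missing:
            raise PermissionError(f"roles not granted to {user}: {sorted(missing)}")
```

This also shows why forgetting the default role is a common surprise: after `grant` and a fresh `connect`, the user still has no privileges until a role is activated.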

Histogram support in MySQL 8.0 – Øystein Grøvlen

  • You can now do ANALYZE TABLE table UPDATE HISTOGRAM on column WITH n BUCKETS;
  • New storage engine API for sampling (default implementation is full table scan even when sampling)
  • Histogram is stored in a JSON column in the data dictionary. You can read it from INFORMATION_SCHEMA.COLUMN_STATISTICS.
  • Histograms are useful for columns that are not the first column of any index, and used in WHERE conditions of JOIN queries, queries with IN-subqueries, ORDER BY … LIMIT queries. Best fit: low cardinality columns (e.g. gender, orderStatus, dayOfWeek, enums), columns with uneven distribution (skew), stable distribution (do not change much over time)
  • How many buckets? equi-height, 100 buckets should be enough.
  • Histograms are stored in the data dictionary, so will persist over restarts of course.
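An equi-height histogram puts roughly the same number of rows in each bucket, so skewed values get narrow buckets and rare values share wide ones. This minimal Python sketch builds such buckets as (upper bound, cumulative fraction) pairs and estimates the fraction of rows with value <= x. It is a simplification for illustration: it never splits equal values across buckets and does no interpolation inside a bucket, while MySQL’s own buckets carry more information (and MySQL also has singleton histograms).

```python
def build_equi_height(values, n_buckets):
    """Return [(upper_bound, cumulative_fraction), ...] with ~equal rows per bucket."""
    data = sorted(values)
    total = len(data)
    buckets = []
    start = 0
    for i in range(1, n_buckets + 1):
        end = (i * total) // n_buckets      # target end (exclusive) of bucket i
        if end <= start:
            continue                        # an earlier bucket already absorbed these rows
        upper = data[end - 1]
        while end < total and data[end] == upper:
            end += 1                        # keep equal values in one bucket
        buckets.append((upper, end / total))
        start = end
    return buckets

def estimate_le(buckets, x):
    """Estimated fraction of rows with value <= x (no intra-bucket interpolation)."""
    fraction = 0.0
    for upper, cumulative in buckets:
        if upper > x:
            break
        fraction = cumulative
    return fraction

# Skewed column: 50 rows of 1, 30 rows of 2, then 20 distinct values 3..22.
values = [1] * 50 + [2] * 30 + list(range(3, 23))
hist = build_equi_height(values, 4)
print(hist)                   # [(1, 0.5), (2, 0.8), (22, 1.0)]
print(estimate_le(hist, 2))   # 0.8
```

Only three buckets come out of the four requested because the heavy value 1 fills the first two targets on its own; that is exactly the skew a per-index statistic would hide from the optimizer.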

Let’s talk database optimizers – Vicențiu Ciorbaru

TLS for MySQL at Large Scale – Jaime Crespo

  • Literally took 3 lines in the my.cnf to turn on TLS
  • https://dbtree.wikimedia.org
  • They wanted to do a data centre failover and wanted to ensure replication would be encrypted.
  • They didn’t have proper orchestration in place (MySQL could have this too). Every time OpenSSL or MySQL had to be upgraded, the daemon needed restarting. If there was an incompatible change, you had to sync master/replicas too.
  • The automation and orchestration that Wikipedia uses: https://fosdem.org/2018/schedule/event/cumin_automation/ (it is called Cumin: https://wikitech.wikimedia.org/wiki/Cumin)
  • Server support for OpenSSL was poor, so they had to deploy their own wmf-mysql and wmf-mariadb builds
  • Currently using MariaDB 10.0, and looking to migrate to MariaDB 10.1
  • They’ve had pain with client libraries
  • TLSv1.2 from the beginning (2015).
  • 20-50x slower for actual connecting; the impact is less than 5% for the actual query performance. Just fix client libraries, make them use persistent connections. They are now very interested in ProxySQL for this purpose.
  • https://grafana.wikimedia.org/?orgId=1
  • Monty asks, would a double certificate help? Jaime says sure. But he may not actually use double certificates; might not solve CA issues, and the goal is not to restart the server.
  • Monty wonders why not to upgrade to 10.2? “Let’s talk outside because it’s a much larger question.”

MySQL InnoDB Cluster – Miguel Araújo

  • group replication: update everywhere (multi-master), virtually synchronous replication, automatic server failover, distributed recovery, group reconfiguration, GCS (implementation of Paxos – group communication system). HA is a critical factor.
  • mysqlsh: interactive and batch operations. Document store (CRUD and relational access)
  • admin API in mysqlsh: create & manage clusters, hide complexity of configuration/provisioning/orchestration of the InnoDB clusters. Works with JavaScript and Python
  • Usability. HA out of the box.
  • It’s easy to join a new node; new node goes into recovery mode (and as long as you have all the binary logs, this is easy; otherwise start from a backup)
  • SET PERSIST – run a command remotely, and the configuration is persisted in the server
  • Network flapping? Group replication will just reject the node from the cluster if it’s flapping too often
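The HA behavior above rests on Group Replication’s Paxos-based group communication needing a majority of members to agree. The arithmetic is small enough to show directly: a group of n members keeps quorum while a majority is reachable, so it tolerates floor((n-1)/2) simultaneous failures. This is a plain Python illustration of that rule, not MySQL Shell’s admin API.

```python
def majority(group_size: int) -> int:
    """Smallest number of members that forms a majority."""
    return group_size // 2 + 1

def tolerated_failures(group_size: int) -> int:
    """Members that can fail at once while the rest still form a majority."""
    return group_size - majority(group_size)

def has_quorum(group_size: int, reachable: int) -> bool:
    return reachable >= majority(group_size)

for n in range(1, 10):  # Group Replication groups have at most 9 members
    print(n, majority(n), tolerated_failures(n))
```

The output makes the usual advice visible: going from 3 to 4 members buys no extra failure tolerance, so odd group sizes are the sensible choice.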

Why we’re excited about MySQL 8 – Peter Zaitsev

  • Native data dictionary – atomic, crash-safe DDL, no more MyISAM system table requirements
  • Fast INFORMATION_SCHEMA
  • utf8mb4 as default character set
  • Security: roles, breakdown of SUPER privileges, password history, faster caching_sha2_password authentication (default), builds using OpenSSL (like Percona Server), skip-grant-tables blocks remote connections, logs now encrypted when tablespace encryption enabled
  • Persistent AUTO_INCREMENT
  • auto-managed undo tablespaces – do not use system table space for undo space. Automatically reclaim space on disks.
  • Self-tuning, limited to InnoDB (innodb_dedicated_server to auto-tune)
  • partial in-place update for JSON – update a field in a JSON object without a full rewrite. Good for counters/statuses/timestamps. Update/removal of elements is supported
  • Invisible indexes – test impact of dropping indexes before actually dropping them. Maintained but unused by the optimizer. If not needed or used, then drop away.
  • TempTable storage engine – more efficient storage engine for internal temporary tables. Efficient storage for VARCHAR and VARBINARY columns. Good for GROUP BY queries. Doesn’t support BLOB/TEXT columns yet (these revert to an InnoDB temp table now)
  • Backup locks – prevent operations which may result in inconsistent backups. LOCK INSTANCE FOR BACKUP (something Percona Server has had before)
  • Optimizer histograms – detailed statistics on columns, not just indexes
  • improved cost model for the optimizer – www.unofficialmysqlguide.com
  • Performance Schema – faster (via “fake” indexes), error instrumentation, response time histograms (global & per query), digest summaries
  • select * from sys.session – fast potential replacement for show processlist
  • RESTART (command)
  • SET PERSIST – e.g. change the buffer pool size, and this helps during a restart
  • assumes default storage is SSD now
  • binary log on by default, log_slave_updates enabled by default, and log expires after 30 days by default
  • query cache removed. Look at ProxySQL or some other caching solution
  • native partitioning only – remove partitions from MyISAM or convert to InnoDB
  • resource groups – isolation and better performance (map queries to specific CPU cores; can jail your costly queries, like analytical queries)
  • Feature requests: better single-thread performance, and parallel query support (which is still missing)

MySQL Test Framework for Support and Bugs Work – Sveta Smirnova

  • MTR allows you to add multiple connections
  • has commands for flow control

ProxySQL – GTID Consistent Reads – René Cannaò, Nick Vyzas

  • threshold is configurable in increments of 1 second. Replication lag can be monitored with ProxySQL. Want to ensure you don’t have stale reads.
  • Why is GTID important? To guarantee consistency. Auto positioning for restructuring topologies.
  • session_track_gtids is an important feature which allows the server to send the GTID for a transaction in the OK packet. Not available in MariaDB.
  • There is now a ProxySQL Binlog Reader, which sends GTID information about a MySQL server to all connected ProxySQL instances. It is a lightweight process that runs on your MySQL server.
  • ProxySQL can be configured to enforce GTID consistency for reads on any hostgroup/replication hostgroup.
  • Live demo by René
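The consistent-read decision can be sketched as: remember the GTID of the client’s last write, then send a read only to a replica whose executed GTID set already contains it, falling back to the primary otherwise. This Python sketch parses the standard GTID set text format (uuid:interval[:interval], comma-separated) and is a toy model of that idea; in ProxySQL the replicas’ GTID sets come from the Binlog Reader, and its real code differs. The replica names are hypothetical.

```python
def parse_gtid_set(text: str) -> dict:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(1, 5), (7, 7)], ...}."""
    result = {}
    for part in (p.strip() for p in text.split(",")):
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            lo, _, hi = interval.partition("-")
            ranges.append((int(lo), int(hi or lo)))
    return result

def gtid_executed(gtid_set: dict, gtid: str) -> bool:
    """True if the single GTID 'uuid:n' falls inside the parsed set."""
    uuid, _, n = gtid.rpartition(":")
    n = int(n)
    return any(lo <= n <= hi for lo, hi in gtid_set.get(uuid.lower(), []))

def pick_reader(replicas: dict, last_write_gtid: str):
    """Return the first replica that has applied the client's last write,
    or None to signal that the read should go to the primary instead."""
    for name, executed in replicas.items():
        if gtid_executed(parse_gtid_set(executed), last_write_gtid):
            return name
    return None
```

A replica lagging behind the client’s own write is simply skipped, which is what rules out stale reads without a fixed lag threshold.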

Turbocharging MySQL with Vitess – Sugu Sougoumarane

  • trend for the cloud: container instances, short-lived containers, tolerate neighbors, discoverability. No good tools yet for Kubernetes.
  • non-ideal options: application sharding, NoSQL, paid solutions, NewSQL (CockroachDB, TiDB, Yugabyte)
  • Vitess: leverage MySQL at massive scale, opensource, 8+ years of work, and multiple production examples
  • Square uses Vitess for Square Cash application.
  • Can MySQL run on Docker? Absolutely, many of the companies do huge QPS on Docker.
  • YouTube does a major re-shard roughly once every 2-3 months. No one notices nowadays when that happens.
  • The app server connects to vtgate; underneath it is a bunch of smaller databases, each running vttablet + mysqld. The lockserver is what makes it run well in the cloud.
  • pluggable architecture with no compromise on performance: monitoring, health check, ACLs, tracing, more.
  • at most, it adds about 2ms overhead to connections
  • Go coding standards are enforced, unit tests with strict coverage requirements, end-to-end tests, Travis, CodeClimate and Netlify. Readability is king.
  • On February 5 2018, it will be a CNCF project. One year of due diligence. They said there was nothing to compare it with. Looked at maturity and contributors. It’s becoming a truly community-owned project! (CNCF to Host Vitess is already live as of now)
  • roadmap: full cross-shard queries, migration tools, simplify configurability, documentation.
  • full MySQL protocol, but a limited query set – they want to get it to a point where it accepts a full MySQL query.
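Vitess shards own contiguous ranges of keyspace IDs and are named after them, such as -80 and 80- (hex). This minimal Python sketch shows how a router like vtgate can map a sharding key to a shard, and why a re-shard only changes which shard owns a range. The hash is an assumption: SHA-256 stands in for Vitess’s real hash vindex, so the keyspace IDs here will not match a real cluster.

```python
import hashlib

def keyspace_id(sharding_key: int) -> bytes:
    """Map a sharding key to an 8-byte keyspace ID (SHA-256 is a stand-in hash)."""
    return hashlib.sha256(str(sharding_key).encode()).digest()[:8]

def shard_range(name: str):
    """Turn a shard name like '-80', '40-80' or 'c0-' into (start, end) bytes.
    An empty start means 'from the beginning'; an empty end means 'to the end'."""
    start, _, end = name.partition("-")
    return bytes.fromhex(start), bytes.fromhex(end)

def route(shards, ksid: bytes) -> str:
    """Return the shard whose [start, end) range contains ksid."""
    for name in shards:
        start, end = shard_range(name)
        if ksid >= start and (not end or ksid < end):
            return name
    raise LookupError(f"no shard covers {ksid.hex()}")

# Splitting '-80' into '-40' and '40-80' leaves rows in '80-' where they were.
before = ["-80", "80-"]
after = ["-40", "40-80", "80-"]
ksid = keyspace_id(42)
print(route(before, ksid), route(after, ksid))
```

Because the application only ever talks to the router, the shard list can change underneath it, which is how a re-shard can go unnoticed by users.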

Orchestrator on Raft – Shlomi Noach

  • Raft: guaranteed to be in-order replication log, an increasing index. This is how nodes choose a leader based on who has the higher index. Get periodic snapshots (node runs a full backup).
  • HashiCorp raft, a Golang raft implementation, used by Consul
  • orchestrator manages topology for HA topologies; also want orchestrator to be highly available. Now with orchestrator/raft, remove the MySQL backend dependency, and you can have data center fencing too. Now you get: better cross-DC deploys, DC-local KV control, and also Kubernetes friendly.
  • n orchestrator nodes; each node still runs its own backend (either MySQL or SQLite). Orchestrator provides the communication for SQLite between the nodes. Only one (the Raft leader) will handle failovers
  • implementation & deployment @ GitHub – one node per DC (deployed at 3 different DCs). 1-second raft polling interval. 2 major DCs, one in the cloud. Step-down, raft-yield, SQLite-backed log store, and still a MySQL backend (SQLite backend use case is in the works)
  • They patched the HashiCorp raft library. The library doesn’t care about the identity of nodes, but at GitHub they do want to control the identity of the leader. There is an “active” data center, and locality is important. This is what they mean by raft-yield (picking a candidate leader).
  • The ability for a leader to step down is also something they had to patch.
  • HashiCorp Raft only supports LMDB and another database, so the replication log is now kept in a relational SQLite backed log store. Another patch.
  • once orchestrator can’t run its own self-health check, it recognizes this. The application can tell raft now that it’s stepping down. Takes 5 seconds to step down, and raft then promotes another orchestrator node to be the leader. This is their patch.
  • can also grab leadership
  • DC fencing handles network partitioning.
  • orchestrator is Consul-aware. Upon failover, orchestrator updates Consul KV with the identity of the promoted master.
  • considerations to watch out for: what happens if, upon replay of the Raft log, you hit two failovers for the same cluster? NOW() and otherwise time-based assumptions. Reapplying snapshot/log upon startup
  • roadmap: use Kubernetes (cluster IP based configuration in progress, already container friendly via auto-re-provisioning of nodes via Raft)
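The first bullet summarizes leader choice as “who has the higher index”. In Raft the precise rule is that a node grants its vote only if the candidate’s log is at least as up to date as its own: compare the terms of the last entries first, and only on a tie compare indexes; each node also votes at most once per term. This Python sketch models just that vote handler; it is an illustration of the Raft rule, not HashiCorp raft’s Go code or orchestrator’s patches.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogState:
    last_term: int   # term of the last entry in the node's log
    last_index: int  # index of the last entry in the node's log

def at_least_as_up_to_date(candidate: LogState, voter: LogState) -> bool:
    # Terms of the last entries decide first; only on a tie does the index decide.
    if candidate.last_term != voter.last_term:
        return candidate.last_term > voter.last_term
    return candidate.last_index >= voter.last_index

@dataclass
class Voter:
    current_term: int
    voted_for: Optional[str]
    log: LogState

    def handle_request_vote(self, term: int, candidate_id: str,
                            candidate_log: LogState) -> bool:
        if term < self.current_term:
            return False                  # candidate is from a stale term
        if term > self.current_term:
            self.current_term = term      # newer term: any earlier vote no longer counts
            self.voted_for = None
        if self.voted_for not in (None, candidate_id):
            return False                  # at most one vote per term
        if not at_least_as_up_to_date(candidate_log, self.log):
            return False                  # refuse candidates whose log is behind ours
        self.voted_for = candidate_id
        return True
```

A candidate with a shorter log can still win if its last entry comes from a newer term, which is why “higher index” alone is not quite the rule.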

MyRocks Roadmaps – Yoshinori Matsunobu

  • Facebook has a large User Database (UDB). Social graph, massively sharded, low latency, automated operations, pure flash storage (constrained by space, not CPU/IOPS)
  • They have a record cache in front of MySQL – Tao for reads. If the cache misses, then it hits the database. And all write requests go through MySQL. UDB has to be fast to ensure a good user experience.
  • At Facebook they also run 2 instances of MySQL on the same machine, because CPU usage was low, but the space savings were awesome.
  • design decisions: clustered index (same as InnoDB), slower for reads, faster for writes (bloom filters, column family), support for transactions including consistency between binlog and MyRocks. Faster data loading/deletes/replication, dynamic options (instead of having to restart mysqld), TTL (comparable to HBase TTL feature, specify the TTL, any data older than time, can be removed), online logical (for recovery purposes) & binary backup (for creating replicas)
  • Pros: smaller space, better cache hit rate, writes are faster so you get faster replication, much smaller bytes written
  • Cons: no statement based replication, GAP locks, foreign keys, full-text index, spatial index support. Need to use case sensitive collations for performance. Reads are slower, especially if the data fits in memory. Dependent on file system and OS; lack of solid direct I/O (uses buffered I/O). You need a kernel newer than 4.6. Too many tuning options beyond buffer pool such as bloom filter, compactions, etc.
  • https://twitter.com/deniszh/status/960163082642382849
  • Completed InnoDB to MyRocks migration. Saved 50% space in UDB compared to compressed InnoDB.
  • Roadmaps: getting in MariaDB and Percona Server for MySQL. Read Mark’s blog for matching read performance vs InnoDB. Supporting mixed engines. Better replication and bigger instance sizes.
  • mixed engines: InnoDB and MyRocks on the same instance, though a single transaction does not overlap engines. Plan to extend star backup to integrate `myrocks_hotbackup`. Backport gtid_pos_auto_engines from MariaDB?
  • Removing engine log. Could be caused by binlog and engine log, which requires 2pc and ordered commits. Use one log? Either binlog or binlog like service or RocksDB WAL? Rely on binlog now (semi-sync, binlog consumers), need to determine how much performance is gained by stopping writing to WAL.
  • Parallel replication apply is important in MySQL 8
  • support bigger instance sizes: shared nothing database is not a general purpose database. Today you can get 256GB+ RAM and 10TB+ flash on commodity servers. Why not run one big instance and put everything there? Bigger instances may help general purpose small-mid applications. Then you don’t have to worry about sharding. Atomic transactions, joins and secondary keys will just work. Amazon Aurora today supports a 60TB instance!
  • today: you can start deploying slaves with consistency check. Many status counters for instance monitoring.
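The design decisions above mention bloom filters, which is how an LSM engine like RocksDB avoids reading every file on a point lookup: a filter kept per SST file answers “definitely not here” or “maybe here”. This is a minimal Python sketch of the data structure itself, with arbitrary sizes and a SHA-256-based double-hashing scheme chosen for illustration; RocksDB’s own filter format and hashing differ.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.n_bits for i in range(self.n_hashes)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means the key was definitely never added, so the file can be skipped.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

bf = BloomFilter(n_bits=10_000, n_hashes=7)
keys = [f"user:{i}".encode() for i in range(1000)]
for k in keys:
    bf.add(k)
false_positives = sum(bf.might_contain(f"other:{i}".encode()) for i in range(1000))
print(false_positives)
```

With 10 bits per key and 7 hash functions the theoretical false-positive rate is a little under 1%, and because there are never false negatives, skipping a file whose filter says “no” is always safe.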

ProxySQL internals – René Cannaò

  • reduce latency, scale, maximize throughput. A single instance should handle hundreds of thousands of connections and thousands of backend servers.
  • threading models: one thread per connection (blocking I/O), thread pooling (non-blocking I/O, scalable).
  • ProxySQL thread pool implementation: known as “MySQL threads”, fixed number of worker threads (configurable), all threads listen on the same port(s), client connections are not shared between threads, all threads perform their own network I/O, and it uses poll() (does that scale? True, but there is a reason why poll over epoll)
  • threads never share client connections – no need for synchronization, thread contention is reduced, each thread calls poll(). Possibly imbalanced load as a con (one thread that has way more connections that another). Is it really a problem? Most of the time, no, connections will automatically balance.
  • poll() is O(N), epoll() is O(1). Poll() is faster than epoll() for fewer connections (around 1000). Performance degrees when there are a lot of connections. So by default, it uses poll() instead of epoll(), around 50,000 connections performance degrades badly – so ProxySQL has auxiliary threads.
  • MySQL_Session() is implemented as a state machine. Stores metadata associated with the client session (running timers, default hostgroup, etc.)
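The threading model above (a fixed set of workers, each owning its client connections and calling poll() on them itself) can be sketched as follows. This is a hypothetical Python illustration, not ProxySQL’s C++ code: `Worker`, `run_once` and `serve_in_threads` are invented names, each “query” is simply echoed back upper-cased instead of being routed to a backend, and `select.poll()` is only available on Unix-like systems.

```python
import select
import socket
import threading


class Worker:
    """Hypothetical 'MySQL thread': owns its sockets and polls them itself."""

    def __init__(self):
        self.poller = select.poll()
        self.conns = {}  # fd -> socket; never shared with another Worker

    def add(self, sock):
        sock.setblocking(False)
        self.conns[sock.fileno()] = sock
        self.poller.register(sock, select.POLLIN)

    def run_once(self, timeout_ms=100):
        """One poll() pass over this worker's sockets; returns requests handled."""
        handled = 0
        for fd, event in self.poller.poll(timeout_ms):
            sock = self.conns[fd]
            if event & (select.POLLIN | select.POLLHUP):
                data = sock.recv(4096)
                if data:
                    sock.sendall(data.upper())  # stand-in for real query handling
                    handled += 1
                else:  # peer closed the connection
                    self.poller.unregister(fd)
                    del self.conns[fd]
                    sock.close()
        return handled


def serve_in_threads(workers, timeout_ms=1000):
    """Run one poll() pass per worker, each on its own thread, with no locks."""
    results = [0] * len(workers)

    def target(i):
        results[i] = workers[i].run_once(timeout_ms)

    threads = [threading.Thread(target=target, args=(i,)) for i in range(len(workers))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


workers = [Worker(), Worker()]
pairs = [socket.socketpair() for _ in workers]
for worker, (server_side, _) in zip(workers, pairs):
    worker.add(server_side)
for _, client_side in pairs:
    client_side.sendall(b"select 1")
handled = serve_in_threads(workers)
replies = [client_side.recv(100) for _, client_side in pairs]
```

Because no socket is registered with more than one poller, the workers need no synchronization around connection state; the trade-off is the possible load imbalance mentioned above. A real proxy would loop on `run_once` forever and could swap `select.poll` for `select.epoll` on Linux once connection counts grow large.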

MySQL Point-in-time recovery like a rockstar – Frederic Descamps

Releases

  • Percona Monitoring and Management 1.7.0 (PMM) – This release features improved support for external services, which enables a PMM Server to store and display metrics for any available Prometheus exporter. For example, you could deploy the postgres_exporter and use PMM’s external services feature to store PostgreSQL metrics in PMM. Immediately, you’ll see these new metrics in the Advanced Data Exploration dashboard. Then you could leverage many of the pre-developed PostgreSQL dashboards available on Grafana.com, and with a minimal amount of edits have a working PostgreSQL dashboard in PMM!
  • MariaDB Server 10.1.31 – usual updates to storage engines, and a handful of bug fixes.

Link List

Upcoming appearances

  • SCALE16x – Pasadena, California, USA – March 8-11 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Feb
08
2018
--

Tutorial Schedule for Percona Live 2018 Is Live

Percona Live 2018

Percona has revealed the line-up of in-depth tutorials for the Percona Live 2018 Open Source Database Conference, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, Calif. Secure your spot now with Advanced Registration prices (available until March 4, 2018). Sponsorship opportunities for the conference are still available.

Percona Live 2018 Open Source Database Conference is the premier open source database event. The theme for the upcoming conference is “Championing Open Source Databases,” with a range of topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations and Business/Case Studies. 

Tutorials take place throughout the day on April 23, 2018. Tutorials provide practical, in-depth knowledge of critical open source database issues. Topics include:

Hyatt Regency Santa Clara & The Santa Clara Convention Center

Percona Live 2018 Open Source Database Conference will be held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!

Sponsorships

Sponsorship opportunities for Percona Live 2018 Open Source Database Conference are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Continuent, VividCortex
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – SolarWinds, TwinDB, Yelp
  • Media Sponsors – Datanami, EnterpriseTech, HPCWire, ODBMS.org
Feb
02
2018
--

This Week in Data with Colin Charles 26: Percona Live Schedule is Near Completion, FOSDEM Underway and a Percona Toolkit Use Case

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Santa Clara 2018 update: tutorials have been picked, and the schedule/press release should be announced by next week. We’ve (the committee) rated 300+ talks, and easily 70% of the schedule should go live next week as well.

There’s a lot happening for FOSDEM this week — so expect a longer report of some sort next week.

A friend, Yanwei Zhou, DBA at Qunar in China, gave an excellent presentation in Chinese on how they use Percona Toolkit. Check it out:

Are you on Twitter? Hope you’re following the @planetmysql account.

Releases

Link List

Upcoming appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Jan
15
2018
--

Sneak Peek of the Percona Live 2018 Open Source Database Conference Breakout Sessions!

Percona Live 2018

Take a look at the sneak peek of the breakout sessions for the Percona Live 2018 Open Source Database Conference, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, California. Early Bird registration discounts are available until February 4, 2018, and sponsorship opportunities are still available.

Conference breakout sessions will feature a range of in-depth talks related to each of the key areas. Breakout session examples include:

  • Database Security as a Function: Scaling to Your Organization’s Needs – Laine Campbell, Fastly
  • How to Use JSON in MySQL Wrong – Bill Karwin, Square
  • Scaling a High Traffic Database: Moving Tables Across Clusters – Bryana Knight, GitHub
  • MySQL: How to Save Bandwidth – Georgi Kodinov, Oracle
  • MyRocks Roadmaps and Production Deployment at Facebook – Yoshinori Matsunobu, Facebook
  • Securing Your Data on PostgreSQL – Payal Singh, OmniTI Computer Consulting, Inc.
  • The Accidental DBA – Jenni Snyder, Yelp
  • How Microsoft Built MySQL, PostgreSQL and MariaDB for the Cloud – Jun Su, Microsoft
  • MongoDB Cluster Topology, Management and Optimization – Steven Wang, Tesla
  • Ghostferry: A Data Migration Tool for Incompatible Cloud Platforms – Shuhao Wu, Shopify, Inc.

Percona Live Open Source Database Conference 2018 is the premier open source database event. The theme for the upcoming conference is “Championing Open Source Databases,” with a range of topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations and Business/Case Studies. A major conference focus will be providing strategies to help attendees meet their business goals by deploying the right mix of database solutions to obtain the performance they need while managing complexity.

Hyatt Regency Santa Clara & The Santa Clara Convention Center

Percona Live 2018 Open Source Database Conference will be held at the Hyatt Regency Santa Clara & The Santa Clara Convention Center, at 5101 Great America Parkway Santa Clara, CA 95054.

The Hyatt Regency Santa Clara & The Santa Clara Convention Center is a prime location in the heart of Silicon Valley. Enjoy this spacious venue with complimentary wifi, on-site expert staff and three great restaurants offering Tuscan cuisine, classic American or tantalizing sushi. Staying for a couple of extra days? Take time to explore the Bay Area and enjoy a day in San Francisco, only an hour away. You can reserve a room by booking through the Hyatt’s dedicated Percona Live reservation site.

Book your hotel using Percona’s special room block rate!

Sponsorships

Sponsorship opportunities for Percona Live 2018 Open Source Database Conference are available and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

 

Dec
21
2017
--

This Week in Data with Colin Charles 20: cPanel changes strategy, Percona Live CFP extended

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

I think the biggest news from last week was from cPanel – if you haven’t already read the post, please do – on Being a Good Open Source Community Member: Why we hesitated on MySQL 5.7. cPanel anticipated MariaDB being the eventual replacement for MySQL, based on movements from Red Hat, Wikipedia and Google. The perceived advantages were transparency around security disclosure and the added features/improvements. Today though, “MySQL now consistently matches or outpaces MariaDB when it comes to development and releases, which in turn is increasing the demand on us for providing those upgraded versions of MySQL by our users.” And maybe a little more telling, “when MariaDB 10.2 became stable in May 2017 it included many features found in MySQL 5.7. However, MySQL reached stable nearly 18 months earlier in October 2015.” (emphasis mine).

So cPanel is going forth and supporting MySQL 5.7. They will continue supporting MariaDB Server for the foreseeable future. This really is cPanel ensuring they are responsive to users: “The people using and building database-driven applications are doing so with MySQL in mind, and are hesitant to add support for MariaDB. Responding to our community’s desires is one of the most important things to us, and this is something that we are hearing asked for from our community consistently.”

I, of course, think this is a great move. Users deserve choice. And MySQL has features that are sometimes still not included in MariaDB Server. Have you seen the Complete list of new features in MySQL 5.7? Or my high-level response to a MariaDB Corporation white paper?

I can only hope to see more people think pragmatically like cPanel. Ubuntu as a Linux distribution still does – you get MySQL 5.7 as a default (very unlike upstream Debian, which ships MariaDB Server nowadays). I used to be a proponent of MariaDB Server being everywhere, when it was community-developed, feature-enhanced, and backward-compatible. However, the moment it stopped being a branch and became a true fork is where the trouble started for users. I think it was still marginally fine with 10.0, and maybe even 10.1, but the ability to maintain feature parity with enhanced features is long gone. Short of a rebase? But then… what would be different from the already popular branch of MySQL called Percona Server for MySQL?

While there are wins and support from cloud vendors, like Amazon RDS and Microsoft Azure, you’ll notice that they offer both MySQL and MariaDB Server. Google Cloud SQL notably only offers MySQL. IBM may be a sponsor of the MariaDB Foundation, but I don’t see their services like Compose offering anything other than MySQL (with group replication, no less!). MariaDB Foundation platinum member Alibaba Cloud offers MySQL and PostgreSQL. However, Tencent seems to suggest that MariaDB is coming soon? Naturally, one interesting statistic to watch would be user uptake.

Events

From an events standpoint, the Percona Live 2018 Call for Papers has been extended to January 12, 2018. We expect an early announcement of maybe ten talks in the week of January 5. Please submit to the CFP. Have you got your tickets yet? Nab them during our Percona Live 2018 super saver registration, while they are at their best price!

FOSDEM has got Sveta and myself speaking in the MySQL and Friends DevRoom, but we also have good news in the sense that Peter Zaitsev is also going to be at FOSDEM – speaking in the main track. We’ll also have plenty of schwag at the stand.

I think it’s important to take note of the updates to Percona bug tracking: yes, it’s Jira all the way. It would be good for everyone to also start looking at how the sausage is made.

Dgraph, a “distributed fast graph database”, just raised $3M and released 1.0. Have you used it?

On a lighter note, there seems to be a tweet going around by many, so I thought I’d share it here. Merry Christmas and Happy Holidays.

He’s making a database
He’s sorting it twice
SELECT * FROM girls_boys WHERE behaviour = “nice”
SQL Claus is coming to town!

Releases

Link List

Upcoming appearances

  • FOSDEM 2018 – Brussels, Belgium – February 3-4 2018
  • SCALE16x – Pasadena, California, USA – March 8-11 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Dec
20
2017
--

Percona Live 2018 Call for Papers Deadline Extended to January 12, 2018

Percona Live 2018 Call for Papers

Percona is extending the Percona Live 2018 call for papers deadline to January 12, 2018!

Percona’s gift to you this holiday season is the gift of time – submit your speaking topics right up until January 12, 2018!

As the year winds up, we received many requests to extend the Percona Live Open Source Database Conference 2018 call for papers. Since many speakers wanted to submit during the week that they’re planning vacations (from Christmas until New Year’s Day), we realized that December 22 was too soon.

If you haven’t submitted already, please consider doing so. Speaking at Percona Live is a great way to talk about what you’re doing, build up your personal and company brands, and get collaborators to your project. If selected, all speakers receive a full complimentary conference pass.

Percona Live 2018 is the destination to share, learn and explore all pertinent topics related to open source databases. The theme for Percona Live 2018 is “Championing Open Source Databases,” with topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations, and Business/Case Studies.

Remember, just like last year, we aren’t only looking for MySQL-ecosystem-related talks (which include MariaDB Server and Percona Server for MySQL). We are actively looking for talks around MongoDB, as well as other open source databases (so this is where you can add PostgreSQL, time series databases, graph databases, etc.). That also includes complementary technologies, such as the cloud, which keeps growing in importance, and container solutions such as Kubernetes.

Talk about your journey to open source. Describe the technical and business values of moving to or using open source databases. How did you convince your company to make the move? Was there tangible ROI? Share your case studies, best practices and technical knowledge with an engaged audience of open source peers.

We are looking for breakout sessions (25 or 50 minutes long), tutorials (3 hours or 6 hours long), and lightning talks and birds of a feather sessions. Submit as many topics as you think you can deliver well.

The conference itself features one day of tutorials and two days of talks. There will also be exciting keynote talks. Don’t forget that registration is now open, and our Super Saver tickets are the best price you can get (Super Saver tickets are on sale until January 7, 2018).

If your company is interested in sponsoring the conference, please take a look at the sponsorship prospectus.

All in, submit away and remember the Percona Live 2018 call for papers deadline is January 12, 2018. We look forward to seeing you at the conference from April 23-25, 2018 in Santa Clara.

Dec
01
2017
--

This Week in Data with Colin Charles 17: AWS Re:Invent, a New Book on MySQL Cluster and Another Call Out for Percona Live 2018

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

The CFP for Percona Live Santa Clara 2018 closes December 22, 2017: please consider submitting as soon as possible. We want to make an early announcement of talks, so we’ll definitely do a first pass even before the CFP closes. Keep in mind the expanded view of what we are after: it’s more than just MySQL and MongoDB. And don’t forget that with the conference one day shorter, there will be intense competition to fit all the content in.

A new book on MySQL Cluster is out: Pro MySQL NDB Cluster by Jesper Wisborg Krogh and Mikiya Okuno. At 690 pages, it is a weighty tome, and something I fully plan on reading, considering I haven’t played with NDBCLUSTER for quite some time.

Did you know that since MySQL 5.7.17, connection control plugins are included? They help DBAs introduce an increasing delay in server response to clients after a certain number of consecutive failed connection attempts. Read more about the connection control plugins.
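To make “an increasing delay” concrete, here is a rough model of how the added delay might grow. It is a sketch under stated assumptions, not the plugin’s source: it assumes the delay starts once the number of consecutive failures exceeds `connection_control_failed_connections_threshold`, grows by 1000 ms per extra failure, and is clamped between `connection_control_min_connection_delay` and `connection_control_max_connection_delay`; the defaults used here (3, 1000 ms, 2147483647 ms) and the exact formula should be checked against the MySQL 5.7 reference manual. The function name is hypothetical.

```python
def connection_delay_ms(failed_attempt, threshold=3, min_delay=1000, max_delay=2147483647):
    """Hypothetical model of the delay (ms) added to the Nth consecutive failed attempt.

    Assumptions: threshold == 0 disables the delay; otherwise the delay grows by
    1000 ms per failure beyond the threshold, clamped to [min_delay, max_delay].
    """
    if threshold == 0 or failed_attempt <= threshold:
        return 0
    extra_failures = failed_attempt - threshold
    return min(max(extra_failures * 1000, min_delay), max_delay)


# Under these assumptions, attempts 1-6 get 0, 0, 0, 1000, 2000 and 3000 ms of delay.
delays = [connection_delay_ms(n) for n in range(1, 7)]
```

The point of the clamp is that raising the minimum delay makes even the first penalized attempt expensive, while the maximum keeps a long brute-force run from stalling a client indefinitely.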

While there are a tonne of announcements coming out from the Amazon re:Invent 2017 event, I highly recommend also reading Some data of interest as AWS reinvent 2017 ramps up by James Governor. Telemetry data from Sumo Logic’s 1,500 largest customers suggests that NoSQL database usage has overtaken relational database workloads! Read The State of Modern Applications in the Cloud. Page 8 tells us that MySQL is the #1 database on AWS (I don’t see MariaDB Server mentioned, which is odd; did they lump the two together?), and MySQL, Redis & MongoDB account for 40% of database adoption on AWS. In other news, Andy Jassy also mentions that less than 1.5 months after hitting 40,000 database migrations, they’ve gone past 45,000 over the Thanksgiving holiday last week. Have you started using AWS Database Migration Service?

Releases

Link List

Upcoming appearances

  • ACMUG 2017 gathering – Beijing, China, December 9-10 2017 – it was very exciting being there in 2016, I can only imagine it’s going to be bigger and better in 2017, since it is now two days long!

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.
