A startup that helps businesses determine when drones are flying unwantedly or otherwise into their airspace, Dedrone, has partnered with the electronics division of civil aircraft manufacturers Airbus to bring drone detection to wide open spaces and remote locations. Through their partnership, Dedrone will integrate Airbus’s long range radar technology into its systems which are… Read More
MariaDB 10.2 includes some long-awaited features. In this blog, we are going to discuss the improvements to some table definitions: the DEFAULT clause and the CHECK constraints. These clauses describe columns default values and rules for data validation.
Note that MariaDB 10.2 is still in alpha stage. This article describes the current state of these features, which could change before MariaDB 10.2 becomes GA.
The DEFAULT clause
The DEFAULT clause has always been supported in MariaDB/MySQL, but traditionally it only accepted literal values (like “hello world” or “2”). MariaDB 10.2 removes this limitation, so DEFAULT can now accept most SQL expressions. For example:
- fiscal_year SMALLINT DEFAULT (YEAR(NOW()))
- valid_until DATE DEFAULT (NOW() + INTERVAL 1 YEAR)
- owner VARCHAR(100) DEFAULT (USER())
Additionally, MariaDB 10.2 allows you to set a DEFAULT value for the TEXT and BLOB columns. This was not possible in previous versions. While this might look like a small detail, it can be hard to add a column to an existing table that is used by production applications, if it cannot have a default value.
The DEFAULT clause has some very reasonable limitations. For example, it cannot contain a subquery or a stored function. An apparently strange limitation is that we can mention another column in DEFAULT only if it comes first in the CREATE TABLE command.
Note that DEFAULT can make use of non-deterministic functions even if the binary log uses the STATEMENT format. In this case, default non-deterministic values will be logged in the ROW format.
CHECK constraints are SQL expressions that are checked when a row is inserted or updated. If this expression result is false (0, empty string, empty date) or NULL, the statement will fail with an error. The error message states which CHECK failed in a way that is quite easy to parse:
ERROR 4022 (23000): CONSTRAINT `consistent_dates` failed for `test`.`author`
Some example of CHECK constraints:
- CONSTRAINT non_empty_name CHECK (CHAR_LENGTH(name) > 0)
- CONSTRAINT consistent_dates CHECK (birth_date IS NULL OR death_date IS NULL OR birth_date < death_date)
- CONSTRAINT past_date CHECK (birth_date < NOW())
A possible trick is checking that a column is different from its default value. This forces users to assign values explicitly.
CHECK constraints cannot be added or altered. It is only possible to drop them. This is an important limitation for production servers.
Another limitation is that CHECK metadata are not accessible via the INFORMATION_SCHEMA. The only way to find out if a table has CHECK clauses is parsing the output of SHOW CREATE TABLE.
The exact behavior of CHECK constraints in a replication environment depends on the master binary log format. If it is STATEMENT, the slaves will apply CHECK constraints to events received from the master. If it is ROW, only the master will need to apply constraints, because failed statements will not be replicated.
Thus, in all cases, we recommend having identical constraints on master and slaves, and only using deterministic constraints.
While I didn’t run a professional benchmark, I can say that both DEFAULT and CHECK clauses don’t have a noticeable impact on a simple test where we insert one million rows (on my local machine).
However, these clauses evaluate an SQL expression each time a row is inserted or updated. The overhead is at least equal to the SQL expression performance. If high-performing writes are important, you will probably not want to use complex data validation.
To check how fast an expression is, we can use the BENCHMARK() function:
MariaDB [(none)]> SELECT BENCHMARK(10000000, (555 / 100 * 20)); +---------------------------------------+ | BENCHMARK(10000000, (555 / 100 * 20)) | +---------------------------------------+ | 0 | +---------------------------------------+ 1 row in set (1.36 sec) MariaDB [(none)]> SELECT BENCHMARK(100000000, MD5('hello world')); +------------------------------------------+ | BENCHMARK(100000000, MD5('hello world')) | +------------------------------------------+ | 0 | +------------------------------------------+ 1 row in set (14.84 sec)
In this example, we executed the specified expressions ten million times.
always returns 0, but what we want to check is the execution time. We can see for example that evaluating MD5(‘hello world’) takes less than 0.000002 seconds. In some cases, we may want to retry the same expressions with different parameters (longer strings, higher numbers, etc.) to check if the execution times varies.
Unfortunately, we don’t have a status variable which tells us how many times MariaDB evaluated CHECK clauses. If our workload performs many writes, that variable could help us to find out if CHECK constraints are slowing down inserts. Maybe the MariaDB team can take this as a suggestion for the future.
French-American big data startup Talend made its debut as a public company today with a pop. After pricing its shares at $18 last night (above the expected range of $15-$17), the company began trading on Nasdaq under the ticker TLND at $27.66, up 54 percent on its IPO price, giving the company an implied valuation of $537 million. However, in the immediate hour after opening, the… Read More
When Oracle bought NetSuite yesterday for a cool $9.3 billion in cash, it let the world know it was now totally serious about growing cloud computing revenue. That wasn’t always the case. In the early days of the cloud, Oracle was content to sit on the sidelines and make fun, while it raked in traditional license and maintenance revenue. In fact, CEO Larry Ellison famously ridiculed… Read More
A few days ago Uber published the article “Why Uber Engineering Switched from Postgres to MySQL”. I didn’t read the article right away because my inner nerd told me to do some home improvements instead. While doing so my mailbox was filling up with questions like “Is PostgreSQL really that lousy?”. Knowing that PostgreSQL is not generally lousy, these messages made me wonder what the heck is written in this article. This post is an attempt to make sense out of Uber’s article.
In my opinion Uber’s article basically says that they found MySQL to be a better fit for their environment as PostgreSQL. However, the article does a lousy job to transport this message. Instead of writing “PostgreSQL has some limitations for update-heavy use-cases” the article just says “Inefficient architecture for writes,” for example. In case you don’t have an update-heavy use-case, don’t worry about the problems described in Uber’s article.
In this post I’ll explain why I think Uber’s article must not be taken as general advice about the choice of databases, why MySQL might still be a good fit for Uber, and why success might cause more problems than just scaling the data store.
The first problem Uber’s article describes in great, yet incomplete detail is that PostgreSQL always needs to update all indexes on a table when updating rows in the table. MySQL with InnoDB, on the other hand, needs to update only those indexes that contain updated columns. The PostgreSQL approach causes more disk IOs for updates that change non-indexed columns (“Write Amplification” in the article). If this is such a big problem to Uber, these updates might be a big part of their overall workload.
However, there is a little bit more speculation possible based upon something that is not written in Uber’s article: The article doesn’t mention PostgreSQL Heap-Only-Tuples (HOT). From the PostgreSQL source, HOT is useful for the special case “where a tuple is repeatedly updated in ways that do not change its indexed columns.” In that case, PostgreSQL is able to do the update without touching any index if the new row-version can be stored in the same page as the previous version. The latter condition can be tuned using the fillfactor setting. Assuming Uber’s Engineering is aware of this means that HOT is no solution to their problem because the updates they run at high frequency affect at least one indexed column.
This assumption is also backed by the following sentence in the article: “if we have a table with a dozen indexes defined on it, an update to a field that is only covered by a single index must be propagated into all 12 indexes to reflect the
ctid for the new row”. It explicitly says “only covered by a single index” which is the edge case—just one index—otherwise PostgreSQL’s HOT would solve the problem.
[Side note: I’m genuinely curious whether the number of indexes they have could be reduced—index redesign in my challenge. However, it is perfectly possible that those indexes are used sparingly, yet important when they are used.]
It seems that they are running many updates that change at least one indexed column, but still relatively few indexed columns compared to the “dozen” indexes the table has. If this is a predominate use-case, the article’s argument to use MySQL over PostgreSQL makes sense.
There is one more statement about their use-case that caught my attention: the article explains that MySQL/InnoDB uses clustered indexes and also admits that “This design means that InnoDB is at a slight disadvantage to Postgres when doing a secondary key lookup, since two indexes must be searched with InnoDB compared to just one for Postgres.” I’ve previously written about this problem (“the clustered index penalty”) in context of SQL Server.
What caught my attention is that they describe the clustered index penalty as a “slight disadvantage”. In my opinion, it is a pretty big disadvantage if you run many queries that use secondary indexes. If it is only a slight disadvantage to them, it might suggest that those indexes are used rather seldom. That would mean, they are mostly searching by primary key (then there is no clustered index penalty to pay). Note that I wrote “searching” rather than “selecting”. The reason is that the clustered index penalty affects any statement that has a where clause—not just select. That also implies that the high frequency updates are mostly based on the primary key.
Finally there is another omission that tells me something about their queries: they don’t mention PostgreSQL’s limited ability to do index-only scans. Especially in an update-heavy database, the PostgreSQL implementation of index-only scans is pretty much useless. I’d even say this is the single issue that affects most of my clients. I’ve already blogged about this in 2011. In 2012, PostgreSQL 9.2 got limited support of index-only scans (works only for mostly static data). In 2014 I even raised one aspect of my concern at PgCon. However, Uber doesn’t complain about that. Select speed is not their problem. I guess query speed is generally solved by running the selects on the replicas (see below) and possibly limited by mostly doing primary key side.
By now, their use-case seems to be a better fit for a key/value store. And guess what: InnoDB is a pretty solid and popular key/value store. There are even packages that bundle InnoDB with some (very limited) SQL front-ends: MySQL and MariaDB are the most popular ones, I think. Excuse the sarcasm. But seriously: if you basically need a key/value store and occasionally want to run a simple SQL query, MySQL (or MariaDB) is a reasonable choice. I guess it is at least a better choice than any random NoSQL key/value store that just started offering an even more limited SQL-ish query language. Uber, on the other hand just builds their own thing (“Schemaless”) on top of InnoDB and MySQL.
On Index Rebalancing
One last note about how the article describes indexing: it uses the word “rebalancing” in context of B-tree indexes. It even links to a Wikipedia article on “Rebalancing after deletion.” Unfortunately, the Wikipedia article doesn’t generally apply to database indexes because the algorithm described on Wikipedia maintains the requirement that each node has to be at least half-full. To improve concurrency, PostgreSQL uses the Lehman, Yao variation of B-trees, which lifts this requirement and thus allows sparse indexes. As a side note, PostgreSQL still removes empty pages from the index (see slide 15 of “Indexing Internals”). However, this is really just a side issue.
What really worries me is this sentence: “An essential aspect of B-trees are that they must be periodically rebalanced, …” Here I’d like to clarify that this is not a periodic process one that runs every day. The index balance is maintained with every single index change (even worse, hmm?). But the article continues “…and these rebalancing operations can completely change the structure of the tree as sub-trees are moved to new on-disk locations.” If you now think that the “rebalancing” involves a lot of data moving, you misunderstood it.
The important operation in a B-tree is the node split. As you might guess, a node split takes place when a node cannot host a new entry that belongs into this node. To give you a ballpark figure, this might happen once for about 100 inserts. The node split allocates a new node, moves half of the entries to the new node and connects the new node to the previous, next and parent nodes. This is where Lehman, Yao save a lot of locking. In some cases, the new node cannot be added to the parent node straight away because the parent node doesn’t have enough space for the new child entry. In this case, the parent node is split and everything repeats.
In the worst case, the splitting bubbles up to the root node, which will then be split as well and a new root node will be put above it. Only in this case, a B-tree ever becomes deeper. Note that a root node split effectively shifts the whole tree down and therefore keeps the balance. However, this doesn’t involve a lot of data moving. In the worst case, it might touch three nodes on each level and the new root node. To be explicit: most real world indexes have no more than 5 levels. To be even more explicit: the worst case—root node split—might happen about five times for a billion inserts. On the other cases it will not need to go the whole tree up. After all, index maintenance is not “periodic”, not even very frequent, and is never completely changing the structure of the tree. At least not physically on disk.
On Physical Replication
That brings me to the next major concern the article raises about PostgreSQL: physical replication. The reason the article even touches the index “rebalancing” topic is that Uber once hit a PostgreSQL replication bug that caused data corruption on the downstream servers (the bug “only affected certain releases of Postgres 9.2 and has been fixed for a long time now”).
Because PostgreSQL 9.2 only offers physical replication in core, a replication bug “can cause large parts of the tree to become completely invalid.” To elaborate: if a node split is replicated incorrectly so that it doesn’t point to the right child nodes anymore, this sub-tree is invalid. This is absolutely true—like any other “if there is a bug, bad things happen” statement. You don’t need to change a lot of data to break a tree structure: a single bad pointer is enough.
The Uber article mentions other issues with physical replication: huge replication traffic—partly due to the write amplification caused by updates—and the downtime required to update to new PostgreSQL versions. While the first one makes sense to me, I really cannot comment on the second one (but there were some statements on the PostgreSQL-hackers mailing list).
Finally, the article also claims that “Postgres does not have true replica MVCC support.” Luckily the article links to the PostgreSQL documentation where this problem (and remediations) are explained. The problem is basically that the master doesn’t know what the replicas are doing and might thus delete data that is still required on a replica to complete a query.
According to the PostgreSQL documentation, there are two ways to cope with this issue: (1) delaying the application of the replication stream for a configurable timeout so the read transaction gets a chance to complete. If a query doesn’t finish in time, kill the query and continue applying the replication stream. (2) configure the replicas to send feedback to the master about the queries they are running so that the master does not vacuum row versions still needed by any slave. Uber’s article rules the first option out and doesn’t mention the second one at all. Instead the article blames the Uber developers.
To quote it in all its glory: “For instance, say a developer has some code that has to email a receipt to a user. Depending on how it’s written, the code may implicitly have a database transaction that’s held open until after the email finishes sending. While it’s always bad form to let your code hold open database transactions while performing unrelated blocking I/O, the reality is that most engineers are not database experts and may not always understand this problem, especially when using an ORM that obscures low-level details like open transactions.”
Unfortunately, I understand and even agree with this argument. Instead of “most engineers are not database experts” I’d even say that most developers have very little understanding of databases because every developer that touches SQL needs to know about transactions—not just database experts.
Giving SQL training to developers is my main business. I do it at companies of all sizes. If there is one thing I can say for sure is that the knowledge about SQL is ridiculously low. In context of the “open transaction” problem just mentioned I can conform that hardly any developer even knows that read only transactions are a real thing. Most developers just know that transactions can be used to back out writes. I’ve encountered this misunderstanding often enough that I’ve prepared slides to explain it and I just uploaded these slides for the curious reader.
This leads me to the last problem I’d like to write about: the more people a company hires, the closer their qualification will be to the average. To exaggerate, if you hire the whole planet, you’ll have the exact average. Hiring more people really just increases the sample size.
The two ways to beat the odds are: (1) Only hire the best. The difficult part with this approach is to wait if no above-average candidates are available; (2) Hire the average and train them on the job. This needs a pretty long warm-up period for the new staff and might also bind existing staff for the training. The problem with both approaches is that they take time. If you don’t have time—because your business is rapidly growing—you have to take the average, which doesn’t know a lot about databases (empirical data from 2014). In other words: for a rapidly growing company, technology is easier to change than people.
The success factor also affects the technology stack as requirements change over time. At an early stage, start-ups need out-of-the-box technology that is immediately available and flexible enough to be used for their business. SQL is a good choice here because it is actually flexible (you can query your data in any way) and it is easy to find people knowing SQL at least a little bit. Great, let’s get started! And for many—probably most—companies, the story ends here. Even if they become moderately successful and their business grows, they might still stay well within the limits of SQL databases forever. Not so for Uber.
A few lucky start-ups eventually outgrow SQL. By the time that happens, they have access to way more (virtually unlimited?) resources and then…something wonderful happens: They realize that they can solve many problems if they replace their general purpose database by a system they develop just for their very own use-case. This is the moment a new NoSQL database is born. At Uber, they call it Schemaless.
On Uber’s Choice of Databases
By now, I believe Uber did not replace PostgreSQL by MySQL as their article suggests. It seems that they actually replaced PostgreSQL by their tailor-made solution, which happens to be backed by MySQL/InnoDB (at the moment).
It seems that the article just explains why MySQL/InnoDB is a better backend for Schemaless than PostgreSQL. For those of you using Schemaless, take their advice! Unfortunately, the article doesn’t make this very clear because it doesn’t mention how their requirements changed with the introduction of Schemaless compared to 2013, when they migrated from MySQL to PostgreSQL.
Sadly, the only thing that sticks in the reader’s mind is that PostgreSQL is lousy.
Like prior versions, PMM is distributed through Docker Hub and is free to download. Full instructions for download and installation of the server and client are available in the documentation.
Notable changes to the tool include:
- Upgraded to Grafana 3.1.0.
- Upgraded to Prometheus 1.0.1.
- Set default metrics retention to 30 days.
- Eliminated port 9001. Now the container uses only one configurable port, 80 by default.
- Eliminated the need to specify ADDRESS variable when creating Docker container.
- Completely re-wrote pmm-admin with more functions.
- Added ability to stop all services using the new pmm-admin.
- Added support to name instances using the new pmm-admin.
- Query Analytics Application updates:
- Redesigned queries profile table
- Redesigned metrics table
- Redesigned instance settings page
- Added sparkline charts
- Added ability to show more than ten queries
- Various updates for MongoDB dashboards.
A demonstration of the tool has been set up at pmmdemo.percona.com.
We have also implemented forums for the discussion of PMM.
Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!
Some screen shots of the updates:
Note the new sparkline that shows the current load in context (so you know if the number is higher/normal/lower than normal), and the option to “Load next 10 queries” at the bottom of the listing.
Our admin tool was completely re-written with new functions:
Join Brent Compton, Kyle Bader and Yves Trudeau on August 2, 2016 at 10 am PDT (UTC-7) for a MySQL and Ceph webinar.
Many operators select OpenStack as their control plane of choice for providing both internal and external IT services. The OpenStack user survey repeatedly shows Ceph as the dominant backend for providing persistent storage volumes through OpenStack Cinder. When building applications and repatriating old workloads, developers are discovering the need to provide OpenStack infrastructure database services. Given MySQL’s ubiquity, and it’s reliance on persistent storage, it is of utmost importance to understand how to achieve the performance demanded by today’s applications. Databases like MySQL can be incredibly IO intensive, and Ceph offers a great opportunity to go beyond the limitations presented by a single scale-up system. Since Ceph provides a mutable object store with atomic operations, could MySQL store InnoDB pages directly in Ceph?
This talk reviews the general architecture of Ceph, and then discusses benchmark results from small to mid-size Ceph clusters. These benchmarks lead to the development of prescriptive guidance around tuning Ceph storage nodes (OSDs), the impact the amount of physical memory, and the presence of SSDs, high-speed networks or RAID controllers.
Director Storage Solution Architectures, Red Hat
Sr Solution Architect, Red Hat
Oracle will acquire NetSuite for about $9.3 billion, or $109 per share in an all-cash deal, the companies announced Thursday. Both Oracle and NetSuite’s cloud service offerings aimed at enterprise customers will continue to operate and “coexist in the marketplace forever,” according to a statement by Oracle CEO Mark Hurd.
Hurd called NetSuite and Oracle’s offerings… Read More
LeadGenius announced today that it has raised $10 million in Series B funding. Formerly known as MobileWorks, LeadGenius helps customers find likely sales leads and reach out to them with automated emails. The company says revenue increased 6x in the past 20 months, with customers including eBay, Box and Weebly. LeadGenius has now raised more than $21 million in funding. The new round was… Read More
We are excited to announce that the tutorial schedule for the Percona Live Europe Amsterdam Open Source Database Conference 2016 is up!
The Percona Live Europe Amsterdam Open Source Database Conference is the premier event for the diverse and active open source community, as well as businesses that develop and use open source software. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Tackling subjects such as analytics, architecture and design, security, operations, scalability and performance, Percona Live provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs.
Tackling subjects such as analytics, architecture and design, security, operations, scalability and performance, Percona Live Europe provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience – all to help you learn how to tackle your open source database challenges in a whole new way. These tutorials are a must for any data performance professional!
The Percona Live Europe Open Source Database Conference is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.
Click through to the tutorial link right now, look them over, and pick which sessions you want to attend. Discounted passes available below!
- InnoDB Architecture and Performance Optimization
Peter Zaitsev (Percona)
- Best Practices for MySQL High Availability
Colin Charles (MariaDB)
- ProxySQL Tutorial
René Cannaò (Self Employed), Derek Downey (Pythian), David Turner(Dropbox)
- MySQL Operations in Docker
Giuseppe Maxia (VMware)
- MySQL GTID Implementation, Maintenance and Best Practices
Mark Filipi (SurveyMonkey), Gillian Gunson (GitHub), Brian Cain (Dropbox)
- Become a MySQL DBA (Part 1 and 2)
Krzysztof Ksi??ek (Severalnines AB)
- MongoDB 101 (Part 1 and 2)
David Murphy (Percona)
- NoSQL Data Stores in Research and Practice
- MySQL Performance Schema in Action
Alexander Rubin (Percona), Sveta Smirnova (Percona)
- MySQL Group Replication in a Nutshell: Hands-on Tutorial
Frédéric Descamps (Oracle)
- MyRocks Deep Dive: Flash Optimized LSM Database for MySQL, and Its Use Case at Facebook
Yoshinori Matsunobu (Facebook)
- MySQL Schema Design in Practice
Jaime Crespo (Wikimedia Foundation)
- MySQL High Availability with Percona XtraDB 5.7
- The Complete MariaDB Tutorial
Colin Charles (MariaDB)
Early Bird Discounts
Just a reminder to everyone out there: our Early Bird discount rate for the Percona Live Europe Amsterdam Open Source Database Conference is only available ‘til August 8, 2016, 11:30 pm PST! This rate gets you all the excellent and amazing opportunities that Percona Live offers, at a very reasonable price!
Sponsor Percona Live
Become a conference sponsor! We have sponsorship opportunities available for this annual MySQL, MongoDB and open source database event. Sponsors become a part of a dynamic and growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.