May
13
2019
--

Solve Query Failures Regarding ONLY_FULL_GROUP_BY SQL Mode

Solve Query Failures SQL mode

“Hey, what’s going on with my applications? I installed a newer version of MySQL. I have queries that perfectly run with the older version and now I have a lot of errors.”

This is a question some customers have asked me after upgrading MySQL. In this article, we’ll see what one of the most frequent causes of this issue is, and how to solve it.

We are talking about this error:

ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause
and contains nonaggregated column 'test.web_log.user_id' which is not functionally
dependent on columns in GROUP BY clause; this is incompatible
with sql_mode=only_full_group_by

Have you ever seen it?

SQL_MODE

As the first thing let me introduce the concept of SQL_MODE.

MySQL can work using different SQL modes that affect the syntax of the queries and validation checks. Based on the configured value of the variable sql_mode, it means that a query can be valid and regularly executes or can receive a validation error and cannot be executed.

The oldest versions of MySQL got users accustomed to writing queries that were not semantically correct because it was designed to work in the “forgiving mode”. Users could write any kind of syntactically valid query regardless of SQL standard compliance or semantic rules. This was a bad habit that was corrected introducing the sql_mode to instruct MySQL to work in a more restrictive way for query validation.

Some users are not aware of this feature because the default value was not so restrictive. Starting from 5.7, the default value is more restrictive and this the reason why some users have problems with unexpected query failures after migration to 5.7 or 8.0.

The sql_mode variable can be set in the configuration file (/etc/my.cnf) or can be changed at runtime. The scope of the variable can be GLOBAL and SESSION, so it can change by the purpose of the mode for any single connection.

The sql_mode variable can have more values, separated by a comma, to control different behaviors. For example, you can instruct MySQL how to deal with dates with zeros as ‘0000-00-00’, to ensure the date be considered as valid or not. In the “forgiving mode” (or if sql_mode variable is empty) you can INSERT such a value without problems.

# set sql mode to "forgiving mode"
mysql> set session sql_mode='';
Query OK, 0 rows affected (0.00 sec)
mysql> create table t1( mydate date );
Query OK, 0 rows affected (0.05 sec)
mysql> insert into t1 values('0000-00-00');
Query OK, 1 row affected (0.00 sec)
mysql> select * from t1;
+------------+
| mydate     |
+------------+
| 0000-00-00 |
+------------+
1 row in set (0.00 sec)

But this is not the correct behavior as stated by the TRADITIONAL mode. As good programmers know, you have to validate dates into your source code in order to avoid to have incorrect data or incorrect results.

The following is how you can dynamically instruct MySQL to behave in the traditional mode to throw an error instead:

mysql> set session sql_mode='TRADITIONAL';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t1 values('0000-00-00');
ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'mydate' at row 1

There are many other modes you can use. Covering all the modes is not the goal of the article, so please refer to the official documentation for more details and examples:

https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html

https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html

 

The ONLY_FULL_GROUP_BY issue

Let’s focus on the most frequent cause of errors when migrating to 5.7 or 8.0. As we said, 5.7 has a default SQL mode that is more restrictive than 5.6, and as such it’s for 8.0. This is true when you upgrade MySQL copying the old my.cnf file that doesn’t have a specific setting for the sql_mode variable. So, be aware.

Let’s create a sample table to store the clicks on the webpages of our site. We would like to log the page name and the id of the registered user.

mysql> create table web_log ( id int auto_increment primary key, page_url varchar(100), user_id int, ts timestamp);
Query OK, 0 rows affected (0.03 sec)
mysql> insert into web_log(page_url,user_id,ts) values('/index.html',1,'2019-04-17 12:21:32'),
    -> ('/index.html',2,'2019-04-17 12:21:35'),('/news.php',1,'2019-04-17 12:22:11'),('/store_offers.php',3,'2019-04-17 12:22:41'),
    -> ('/store_offers.php',2,'2019-04-17 12:23:04'),('/faq.html',1,'2019-04-17 12:23:22'),('/index.html',3,'2019-04-17 12:32:25'),
    -> ('/news.php',2,'2019-04-17 12:32:38');
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0
mysql> select * from web_log;
+----+--------------------+---------+---------------------+
| id | page_url           | user_id | ts                  |
+----+--------------------+---------+---------------------+
|  1 | /index.html        |       1 | 2019-04-17 12:21:32 |
|  2 | /index.html        |       2 | 2019-04-17 12:21:35 |
|  3 | /news.php          |       1 | 2019-04-17 12:22:11 |
|  4 | /store_offers.php  |       3 | 2019-04-17 12:22:41 |
|  5 | /store_offers.html |       2 | 2019-04-17 12:23:04 |
|  6 | /faq.html          |       1 | 2019-04-17 12:23:22 |
|  7 | /index.html        |       3 | 2019-04-17 12:32:25 |
|  8 | /news.php          |       2 | 2019-04-17 12:32:38 |
+----+--------------------+---------+---------------------+

 

Now we want to issue a query to calculate the most visited pages.

# let's turn the sql mode to "forgiving"
mysql> set session sql_mode='';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT page_url, user_id, COUNT(*) AS visits
    -> FROM web_log
    -> GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+---------+--------+
| page_url          | user_id | visits |
+-------------------+---------+--------+
| /index.html       |       1 |      3 |
| /news.php         |       1 |      2 |
| /store_offers.php |       3 |      2 |
| /faq.html         |       1 |      1 |
+-------------------+---------+--------+
4 rows in set (0.00 sec)

The query works, but it’s not really correct. It is easily understandable that page_url is the column of the grouping function, the value we are most interested in and we want to be unique for counting. Also, the visits column is good, as it’s the counter. But what about user_id? What does this column represent? We grouped on the page_url so the value returned for user_id is just one of the values in the group. In fact, it was not only user number1 to visit the index.html, but even users 2 and 3 visited the page. How can I consider that value? Is it the first visitor? Is it the last one?

We don’t know the right answer! We should consider the user_id column’s value as a random item of the group.

Anyway, the right answer is that the query is not semantically correct, because it has no meaning to return a value from a column that is not part of the grouping function. Then the query is expected to be invalid in the traditional sql.

Let’s test it.

mysql> SET SESSION sql_mode='ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT page_url, user_id, COUNT(*) AS visits
    -> FROM web_log
    -> GROUP BY page_url ORDER BY COUNT(*) DESC;
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause
and contains nonaggregated column 'test.web_log.user_id' which is not functionally
dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Now we have an error, as expected.

The SQL mode ONLY_FULL_GROUP_BY is part of the TRADITIONAL mode and it is enabled by default starting from 5.7.

A lot of customers had this kind of issue after migration to a recent version of MySQL.

Now we know what the cause of the issue is, but our applications are still not working. What possible solutions do we have to let the applications work again?

Solution 1 – rewrite the query

Since it’s not correct to select a column that is not part of the grouping, we can rewrite the query without those columns. Very simple.

mysql> SELECT page_url, COUNT(*) AS visits
    -> FROM web_log
    -> GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+--------+
| page_url          | visits |
+-------------------+--------+
| /index.html       |      3 |
| /news.php         |      2 |
| /store_offers.php |      2 |
| /faq.html         |      1 |
+-------------------+--------+

If you have a lot of queries affected by the problem, you have to potentially do a lot of work to retrieve and rewrite them. Or maybe the queries can be part of a legacy application you are not able or you don’t want to touch.

But this solution is the one that forces you to write correct queries and let your database configuration be restrictive in term of SQL validation.

 

Solution 2 – step back to the forgiving mode

You can change MySQL’s configuration and step back to the “forgiving” mode.

Or you can only drop the ONLY_FULL_GROUP_BY from the default. The default SQL mode in MySQL 5.7 includes these modes: ONLY_FULL_GROUP_BY, STRINCT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER

#set the complete "forgiving" mode
mysql> SET GLOBAL sql_mode='';
# alternatively you can set sql mode to the following
mysql> SET GLOBAL sql_mode='STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_ENGINE_SUBSTITUTION';

 

Solution 3 – use of aggregation functions

If your application absolutely needs to retrieve the user_id field for some valid reason, or it’s too complicated to change your source code,  you can rely on an aggregation function in order to avoid changing the sql mode configuration.

For example we can use MAX(), MIN() or even GROUP_CONCAT() aggregation functions.

mysql> SET SESSION sql_mode='ONLY_FULL_GROUP_BY';
mysql> SELECT page_url, MAX(user_id), COUNT(*) AS visits FROM web_log GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+--------------+--------+
| page_url          | MAX(user_id) | visits |
+-------------------+--------------+--------+
| /index.html       |            3 |      3 |
| /news.php         |            2 |      2 |
| /store_offers.php |            3 |      2 |
| /faq.html         |            1 |      1 |
+-------------------+--------------+--------+
mysql> SELECT page_url, GROUP_CONCAT(user_id), COUNT(*) AS visits FROM web_log GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+-----------------------+--------+
| page_url          | GROUP_CONCAT(user_id) | visits |
+-------------------+-----------------------+--------+
| /index.html       |                 1,2,3 |      3 |
| /news.php         |                   1,2 |      2 |
| /store_offers.php |                   3,2 |      2 |
| /faq.html         |                     1 |      1 |
+-------------------+-----------------------+--------+

 

MySQL provides even a specific function for solving the problem: ANY_VALUE().

mysql> SELECT page_url, ANY_VALUE(user_id), COUNT(*) AS visits FROM web_log GROUP BY page_url ORDER BY COUNT(*) DESC;
+-------------------+--------------------+--------+
| page_url          | ANY_VALUE(user_id) | visits |
+-------------------+--------------------+--------+
| /index.html       |                  1 |      3 |
| /news.php         |                  1 |      2 |
| /store_offers.php |                  3 |      2 |
| /faq.html         |                  1 |      1 |
+-------------------+--------------------+--------+

 

Conclusion

I personally prefer solution number 1 because it forces you to write SQL-92 compliant queries. Following the standards is often considered a best practice.

Solution 2 is good in case you cannot change your application code or if rewriting all the queries is really too complicated. The solution is very good to solve the issues in a matter of seconds, however, I strongly suggest to have a long term plan to rewrite the queries that are not SQL-92 compliant.

For more details: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html

Photo of sign by Ken Treloar on Unsplash

May
13
2019
--

Community Update for PL Open Source Database Conference 2019

Percona Live community dinner on the Lone Star River Boat

Percona Live 2019It’s not long now before we’ll be gathering in Austin TX for this year’s Percona Live event. It’s the only conference of its type dedicated entirely to open source databases – of all marques. From a community perspective, we have a few updates to share.

Community Dinner

We’re happy to say that PlanetScale have stepped up to the plate (!) and are sponsoring this year’s community dinner alongside Percona. This year we are taking to the water and consequently numbers are limited to 100 attendees. If you are keen to join us, be sure to not miss the boat.  ?

PlanetScale

The speakers’ perspective

For speakers who also enjoy writing, there’s an opportunity to write about forthcoming presentations on our Community Blog. There’s still time if you haven’t sent a piece and would like to do so, please get in touch. Here are the contributors so far, and there are more to come:

Even if you are not speaking at Percona Live you might be interested to know that there’s an opportunity to write for the Community Blog permanently available. You’ll find more information about that here.

Attending the conference

If you’re attending the conference this year – or would like to – we want the experience to be enjoyable, informative, and memorable – for all the right reasons! Please take a few minutes when you can to read the code of conduct and diversity statements on the conference website. If you would like to attend but need advice or assistance, there’s some information there that could help you decide. You’re welcome to contact the Community Team in confidence if you’d like to do that.

Percona Live community dinner on the Lone Star River Boat


Photo from Lone Star Riverboat, April 2018, and our venue for the Community Dinner

May
09
2019
--

Announcing Keynotes for Percona Live Open Source Database Performance Conference 2019!

Percona Live 2019

Percona Live 2019There’s less than a month to go so it’s time to announce the keynote addresses for Percona Live Open Source Database Performance Conference 2019! We’re excited to share our lineup of conference keynotes, featuring talks from Continuent, Vivid Cortex, Amazon Web Services, Facebook, PingCap and more!

The speakers will address the current status of key open source database projects MySQL®, PostgreSQL, MongoDB®, and MariaDB®. They’ll be sharing with you how organizations are shifting from single-use databases to a polyglot strategy, thereby avoiding vendor lock-in and enabling business growth.

$5 Percona Live Keynote Passes

We are once again offering $5 Percona Live keynote passes for the Percona Live Open Source Database Performance Conference 2019.

A keynote pass provides access to the keynote addresses, the expo hall and demo theater sessions. Here you will have the opportunity to meet with conference attendees, sponsors, and play retro arcade games! The first 100 people who register for an Expo and Keynote pass using the discount code “KEYNOTE5” will be able to register for just $5

Without further ado, here’s the keynote line-up for 2019!

Wednesday, May 29

The State of Open Source Databases

It has been an exciting year in the open source database industry, with more choice, more cloud, and key changes in the industry. In his keynote address, Peter Zaitsev of Percona will dive into the key developments over the last year, including the most important open source database software releases, the significance of cloud-native solutions in a multi-vendor, multi-cloud world, the new criticality of security challenges, the evolution of the open source software industry, and interesting results from a new open source database survey.

MyRocks in the Real World

In this keynote, Yoshinori will share interesting lessons learned from Facebook’s production deployment and operations of MyRocks and future MyRocks development roadmaps.

MariaDB 10.4, Where We Are Now

MariaDB Server has come a long way since it forked from MySQL in 2009. Nine releases later, it is now the default in most major Linux distributions and no longer a mere alternative to MySQL, but also to Oracle, through its compatibility mode with support for PL/SQL and other non-MySQL functionality. MariaDB Foundation has also matured since it was established in 2012. With board members from MariaDB Corporation, Visma, Booking.com, IBM, Automattic, Tencent, Alibaba and Microsoft, MariaDB Foundation has created a vibrant community of contributors, with the number of submitted pull requests being four times that of MySQL over their respective lifetimes. Join MariaDB Foundation Senior Team Lead Vicen?iu Ciorbaru for this keynote, where he compares the mission of MariaDB Foundation with its achievements and highlights the latest contributions and functionality in its freshest stable release, 10.4.

TiDB 3.0: What’s New and What’s Next

TiDB is a popular open source distributed NewSQL database. It speaks the MySQL protocol, and the majority of its syntax – so to your applications it looks like MySQL 5.7. Since its birth four years ago, TiDB has been used by more than 300 companies for core banking systems / High concurrency internet service / Real-time analysis. In this talk, I will introduce the current status in the TiDB community, the core features of TiDB 3.0, and roadmap to make TiDB a truly distributed HTAP database.

Open Source at Amazon

Customer obsession is one of the key leadership principles at Amazon. We innovate on customer’s behalf and focus on solving their problems. Over the years, customer usage and dependencies on open source technologies have been steadily increasing, and this is why we’ve long been committed to open source, and our pace of contributions to open source projects continues to accelerate. Attend this talk to learn about the motivation for open source at AWS and what drives our active participation in open-source communities.

Panel – Discussion – The changing Face of Open Source

Panel discussion with various companies to discuss the changing face of Open Source.

AWS Product Announcement

Something exciting is coming! Stop by to find out more!

MySQL Community Awards

Join us for the MySQL Community Awards presented by Emily Slocombe!

Thursday, May 30

State of the Dolphin

Where are we with MySQL 8.0 one year after GA? How was the journey and how has the Community impacted the road map? During this session come to discover our new release on MySQL allowing us to enhance performance, stability, and also innovation.

The Color of Open Source Money — Are some open source business models more acceptable than others?

The color of open source money is a curious case…

There are those organizations that make money from open source technologies, with or without making contributions back to the software itself. There also are commercial open source software vendors, which do add value to the underlying open source technology by enhancing the solution and providing support for it. And then there are commercial software vendors who extend the capability of the open source solution with proprietary solutions. And, last but not least, some cloud providers’ business model involves building SaaS solutions based on free open source technology.

Some of the above business models are deemed to be more acceptable by the open source community than others.

Now, one would think that the color of the money should be the same, but it is not. Cloud companies, especially, may be seen as blood-sucking parasites that abuse the “system” – particularly when not contributing back. Yet they seem to be making the most money from it. And while some cloud providers do feed part of that money back to open source consulting companies to enhance and support their solutions, they are not necessarily sharing those enhancements with the community. So does that make them more acceptable?

With this talk, I’m looking to examine some of the aspects of it a little closer,  and offer my perspective on the acceptable balance of commercial efforts to justify the investments needed for the sustainability of the project.

Bringing DevOps To The Database

Applying DevOps principles and practices to the database can make software development processes faster, better, and cheaper. Yet most companies do not include their databases in their DevOps initiatives because they face a variety of challenges to make it work.

Why is it hard to apply DevOps to databases? How can we get better at it? This talk addresses those questions (and more) by exploring the real-life stories of teams that successfully changed their entrenched culture, workflows, and tooling – and others who tried. Topics include:

  • What research shows about DevOps, databases, and company performance
  • Current and emerging trends in how we build/manage data tiers, and its implications
  • Elevation of the traditional dedicated DBA role and what’s happening as a result
  • Driving cultural change towards distributed database competency and responsibility
  • Why some teams succeed in this transformation, while others fail

Now you can apply DevOps principles to your database. Your work will be better for it and your business will benefit. This talk will show you how.

From respected by DBAs to loved by Application Developers

Ten years ago Postgres was a safe and stable database, but not at the same level of user friendly-ness or rich feature set as other databases. At some point that all started to change as it learned from MySQL and Mongo, and we’ll examine some of that journey of how it became respected by DBAs to beloved by application developers. We’ll close with a look at what the next five years may hold.

Don’t miss out, buy your tickets here!

Connect. Accelerate. Innovate.

With a lot of focus on the benefits of open source over proprietary models of software delivery, you surely can’t afford to miss this opportunity to connect with leading figures of the open source database world. On Tuesday, May 28 you can opt to accelerate your knowledge with our in-depth tutorials, or choose to attend our business track geared towards open source innovation and adoption.

With Tuesday and Wednesday offering sessions across 8 different tracks, there’s something for all levels of experience, addressing a range of business challenges. See the full schedule.

With thanks to our sponsors!

Diamond: Continuent, VividCortex
Platinum: Veritas, Amazon Web Services
Gold: EnterpriseDB
Silver: MySQL, Altinity, PingCAP, SmartStyle, Facebook, ScaleGrid, Vendita, Intel Optane, Galera Cluster, ProxySQL, Aiven, Pythian
Community: MariaDB Foundation, Free Software Foundation
Community Dinner: PlanetScale
Branding: Bloomberg Engineering, Yelp
Media Sponsors: Austin Technology Council

Apr
15
2019
--

A Truly Open Source Database Conference

companies represented by speakers at percona live 2019

Percona Live 2019Many of our regular attendees already know that the Percona Live Open Source Database Conference is not all about Percona software or, indeed, all about Percona. However, with moving to a new city—Austin, TX— we have realized that it’s something we ought to shout out loud and clear. Our conference really is technology agnostic! As long as submissions were related to open source databases then they were candidate for selection.

We have thirteen tracks at this year’s conference including a track entitled “Other Open Source Databases” which we are presenting alongside tracks dedicated to MySQL®, MariaDB®, MongoDB®, and PostgreSQL. And that’s not all. While most talks are technology-oriented, we also have tracks that are highly relevant if you are managing technology aspects of your business. For those still considering the impact of GDPR you’ll be able to hear talks about other issues relating to compliance and data security that you might well want to get to grips with. Or perhaps consider the talks oriented towards business and enterprise. Maybe you are looking to minimize your license costs by moving from proprietary to open source databases? In which case our migration track might be for you. There are five more tracks for you to discover… why not take a look?

We’d like to thank the volunteer conference committee again for their contributions in developing this fantastic, diverse, and intriguing program!

Companies represented by speakers at Percona Live 2019

Also, of course, not all of the talks are given by Percona speakers. As you can see from this graphic at least sixty companies are represented by speakers at the event, including some huge names not just in the open source space but in the tech space as a whole. Anyone heard of Facebook? Uber? Oracle? Walmart? MailChimp? Alibaba… I won’t list all sixty names, but you get the idea! In fact, both Facebook and Alibaba are sponsoring their own tracks at this year’s conference, alongside PingCap presenting a track dedicated to TiDB. Don’t miss out! Our advanced rate registration ends on Sunday April 21 after which the price moves to standard registration rate. Don’t delay…

Register Now

companies represented by speakers at percona live 2019

Sponsors

We base Percona Live events in major cities, use premium venues, and sponsor our own staff to speak…Percona Live is an expensive production and we heavily subsidize the tickets. We are eternally grateful to our sponsors who share the costs of keeping Percona Live special. Without their support it would be very difficult to host an event of this quality and scale.

Diamond sponsors

continuent

VividCortex

Platinum sponsors

Veritas Logo

AWS

Gold sponsors

EnterpriseDB

Silver Sponsors

mysql
altinity
PingCAP
SmartStyle
Alibaba
facebook

Branding sponsors

bloomberg

Media sponsors

Austin Technology Council

Thanks again to all of our sponsors, we appreciate your support!

Mar
14
2019
--

Percona’s Open Source Data Management Software Survey

PerconaSurvey

Click Here to Complete our New Survey!

Last year we informally surveyed the open source community and our conference attendees.
The results revealed that:

  • 48% of those in the cloud choose to self-manage their databases, but 52% were comfortable relying on the DBaaS offering of their cloud vendor.
  • 49% of people said “performance issues” when asked, “what keeps you up at night?”
  • The major decision influence for buying services was price, with 42% of respondents keen to make the most of their money.

We found this information so interesting that we wanted to find out more! As a result, we are pleased to announce the launch of our first annual Open Source Data Management Software Survey.

The final results will be 100% anonymous, and will be made freely available on Creative Commons.

How Will This Survey Help The Community?

Unlimited access to accurate market data is important. Millions of open source projects are in play, and most are dependent on databases. Accurate market data helps you track the popularity of different databases, as well as seeing how and where these databases are run. This helps us all build better software and take advantage of shifting trends.

Thousands of vendors are focused on helping SysAdmins, DBAs, and Developers get the most out of their database infrastructure. Insightful market data enables them to create better tools that meet current demands and grow the open source database market.

We want to assist companies who are still deciding what, how, and where to run their systems. This information will help them understand the industry direction and allow them to make an informed decision on the software and services they choose.

How Can You Help Make This Survey A Success?

Firstly, please share your insight into current trends and new developments in open source data management software.

Secondly, please share this survey with other people who work in the industry, and encourage them to contribute.

The more responses we receive, the more useful this will be to the whole open source community. If we missed anything, or you would like to ask other questions in future, let us know!

So tell us; who are the big fish, and which minnows are nibbling at their tails?! Is the cloud giving you altitude sickness, or are you flying high? What is the next big thing and is everyone on board, or is your company lagging behind?

Preliminary results will be presented at our annual Percona Live Conference in Austin, Texas (May 28-30, 2019) by our CEO, Peter Zaitsev and released to the open source community when finalized.

Click Here to Have Your Say!

Jan
10
2019
--

PostgreSQL Updatable Views: Performing Schema Updates With Minimal Downtime

postgres updatable views

postgres updatable viewsRecently, one of our customers asked us how to minimize downtime when upgrading the database structure with changes that are not backwards-compatible. It’s an interesting question and I would like to visit some alternatives here. I will use PostgreSQL for this series of posts and walk through updatable views, INSTEAD OF Triggers, and the Rule System. Later, we’ll discuss alternatives available for other databases like MySQL.

This first post will give an overview of the problem and also the first implementation of the solution in PostgreSQL using updatable Views.

The Motivation

Software is like a living organism and as such, they evolve. It’s not surprising that the database schemas also evolve, and this brings us a problem: how to minimize downtime when performing upgrades? Or even further, is it possible to upgrade them without activating maintenance mode thereby making the service unavailable for our customers?

Let’s say that we want to push out an update 2.0. It’s a major update, and in this update, there are application code changes and changes to the database such as altered tables, dropped columns, new tables and so on. Checking the changelog, we notice that most of the database changes are backwards-compatible but a few modified tables are not so we can’t just push out the new database changes without breaking some functionality in the existing codebase. To avoid triggering errors while we upgrade the database, we need to shutdown the application servers, update the database, update the codebase, and then get the servers back and running again. That means that we need an unwanted maintenance window!

As per our definition of the problem, we want to get to the point where we don’t have to use this maintenance window, a point where the old and new codebase could coexist for a period of time while we upgrade the system. One solution is to not make changes that the current codebase can’t handle, but, as you may have already assumed, it isn’t really an option when we are constantly trying to optimize and improve our databases. Another option, then, would be to use PostgreSQL updatable views.

Updatable Views

PostgreSQL has introduced automatically updatable views in 9.3. The documentation[1] says that simple views are automatically updatable and the system will allow INSERT, UPDATE or DELETE statements to be used on the view in the same way as on a regular table. A view is automatically updatable if it satisfies all of the following conditions:

  • The view must have exactly one entry in its FROM list, which must be a table or another updatable view.
  • The view definition must not contain WITH, DISTINCT, GROUP BY, HAVING, LIMIT, or OFFSET clauses at the top level.
  • The view definition must not contain set operations (UNION, INTERSECT or EXCEPT) at the top level.
  • The view’s select list must not contain any aggregates, window functions, or set-returning functions.

Note that the idea is to give a simple mechanism that helps when using views, and if the view is automatically updatable the system will convert any INSERT, UPDATE or DELETE statement on the view into the corresponding statement on the underlying base table. This can also be used to increase the security granularity giving the power to define privilege that operates at the level. If using a WHERE clause in the view we can use the CHECK OPTION to prevent the user from being able to UPDATE or INSERT rows that are not in the scope of the view. For example, let’s say we have a view created to limit the user to view records from a specific country.  If the user changes the country of any record, those records would disappear from the view. The CHECK OPTION can help to prevent this from happening. I recommend reading the documentation for more information about how views work in PostgreSQL.

Implementation

Using updatable views makes the implementation as simple as creating views. For our example I will use the below table:

test=# CREATE TABLE t (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, password VARCHAR(300) NOT NULL, date_created TIMESTAMP NOT NULL DEFAULT now());
CREATE TABLE
test=# INSERT INTO t(id, name, password) VALUES (1, 'user_1', 'pwd_1'), (2, 'user_2','pwd_2'),(3,'user_3','pwd_3'),(4,'user_4','pwd_4'),(5,'user_5','pwd_5');
INSERT 0 5
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
(5 rows)

We then changed the schema renaming the columns password to pwd, date_created to dt_created and added 2 more columns, pwd_salt and comment. The added columns are not a real problem because they can be either nullable or have a default value but the column name change is a problem. The changes are:

test=# create schema v_10;
CREATE SCHEMA
test=# CREATE VIEW v_10.t AS SELECT id, name, password AS password, date_created AS date_created FROM public.t;
CREATE VIEW
test=# ALTER TABLE public.t RENAME COLUMN password TO pwd;
ALTER TABLE
test=# ALTER TABLE public.t RENAME COLUMN date_created TO dt_created;
ALTER TABLE
test=# ALTER TABLE public.t ADD COLUMN pwd_salt VARCHAR(100);
ALTER TABLE
test=# ALTER TABLE public.t ADD COLUMN comment VARCHAR(500);
ALTER TABLE

To make sure our application will work properly we’ve defined that the tables will be in a specific main schema, in this example is the PUBLIC schema and the views will be in the versioned schemas. In this case, if we have a change in one specific version that needs a view guaranteeing backwards-compatibility, we just create the view inside the versioned schema and apply the changes to the table in the main schema. The application will always define the “search_path” as “versioned_schema,main_schema”, which is “v_10, public” in this example:

test=# SET search_path TO v_10, public;
SET
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
(5 rows)
test=# select * from public.t;
id | name | pwd | dt_created | pwd_salt | comment
----+--------+-------+----------------------------+----------+---------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455 | |
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455 | |
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455 | |
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455 | |
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455 | |
(5 rows)

As we can see, the application still sees the old schema, but does this work? What if someone updates the password of ID #3? Let’s check:

test=# UPDATE t SET password = 'new_pwd_3' WHERE id = 3;
UPDATE 1
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+-----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
3 | user_3 | new_pwd_3 | 2018-12-27 07:50:39.562455
(5 rows)
test=# SELECT * FROM public.t;
id | name | pwd | dt_created | pwd_salt | comment
----+--------+-----------+----------------------------+----------+---------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455 | |
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455 | |
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455 | |
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455 | |
3 | user_3 | new_pwd_3 | 2018-12-27 07:50:39.562455 | |
(5 rows)

As we can see, the updatable view worked just like a charm! The new and old application codebase can coexist and work together while we roll up our upgrades. There are some restrictions, as explained in the documentation, like having only one table or view in the WHERE clause but for its simplicity, upgradable views do a great job. For more complex cases where we need to split/join tables? Well, we will discuss these in future articles and show how we can solve them with both TRIGGERS and the PostgreSQL Rule System.

References

[1] https://www.postgresql.org/docs/current/sql-createview.html


Photo by Egor Kamelev from Pexels

Dec
31
2018
--

Great things that happened with PostgreSQL in the Year 2018

PostgreSQL in 2018

In this blog post, we’ll look back at what’s been going on in the world of PostgreSQL in 2018.

Before we start talking about the good things that have happened in the PostgreSQL in  2018, we hope you had a wonderful year and we wish you a happy and prosperous 2019.

PostgreSQL has been a choice for those who are looking for a completely community-driven open source database that is feature-rich and extensible. We have seen tremendously great things happening in PostgreSQL for many years, with 2018 being a prime example. As you could see the following snippet from DB engine rankings, PostgreSQL has topped the chart for growth in popularity in the year 2018 compared to other databases.

PostgreSQL adoption growth has been increasing year over year, and 2018 has again been one such year as we can see.

Let’s start with a recap of some of the great PostgreSQL events, and look at what we should take away from 2018 in the PostgreSQL space.

PostgreSQL 11 Released

PostgreSQL 11 was a release that incorporated a lot of features offered in commercial database software governed by an enterprise license. For example, there are times when you are required to enforce the handling of embedded transactions inside a stored procedure in your application code. There are also times when you wish to partition a table with foreign keys or use hash partitioning. This used to require workarounds. The release of PostgreSQL 11 covers these scenarios.

There were many other add-ons as well, such as Just-In-Time compilation, improved query parallelism, partition elimination, etc. You can find out more in our blog post here, or the PostgreSQL 11 release notes (if you have not seen already). Special thanks to everyone involved in such a vibrant PostgreSQL release.

End of Life for PostgreSQL 9.3

9.3.25 was the last minor release that has happened for PostgreSQL 9.3 (on November 8, 2018). There will be no more minor releases supported by the community for 9.3. If you are still using PostgreSQL 9.3 (or a major earlier release than 9.3), it is the time to start planning to upgrade your database to take advantage of additional features and performance improvements.

Watch out for future Percona webinars (dates will be out soon) on PostgreSQL migrations and upgrades that will help handle situations such as downtime and other complexities involved in migrating your partitions built using table inheritance when you migrate from legacy PostgreSQL versions to the latest versions.

PostgreSQL Minor Releases

For minor PostgreSQL release, there was nothing new in what we saw this year compared to previous years. The PostgreSQL community aims for a minor version release for all the supported versions every quarter. However, we may see more minor releases due to critical bug fixes or security fixes. One of such release was done on March 3rd, 2018 for the CVE-2018-1058 security fix. This proves that you do not necessarily need to wait for specific release dates when a security vulnerability has been identified. You may see the fix released as a minor version as soon as the development, review and testing are completed for the fix.  

There have been five minor releases this year on the following dates.

Security Fixes in All the Supported PostgreSQL Releases This Year

The PostgreSQL Global Development Team and contributors handle security fixes very seriously. There have been several instances where we have received immediate responses after reporting a problem or a bug. Likewise, we have seen many security bug fixes as soon as they have been reported.

Following are a list of security fixes we have seen in the year 2018:

We thank all the Core team, Contributors, Hackers and Users involved in making it another great year for PostgreSQL and a huge WIN for the open source world.

If you would like to participate in sharing your PostgreSQL knowledge to a wider audience, or if you have got a great topic that you would like to talk about, please submit your proposal to one of the world’s biggest open source conferences: Percona Live Open Source Database Conference 2019 in Austin, Texas from May 28-30, 2019. The Call for Papers is open until Jan 20, 2019.

You can also submit blog articles to our Open Source Database Community Blog for publishing to a wider open source database audience.

Please subscribe to our blog posts to hear many more great things that are happening in the PostgreSQL space.

Dec
20
2018
--

Benchmark PostgreSQL With Linux HugePages

Benchmarking HugePages and PostgreSQL

Linux kernel provides a wide range of configuration options that can affect performance. It’s all about getting the right configuration for your application and workload. Just like any other database, PostgreSQL relies on the Linux kernel to be optimally configured. Poorly configured parameters can result in poor performance. Therefore, it is important that you benchmark database performance after each tuning session to avoid performance degradation. In one of my previous posts, Tune Linux Kernel Parameters For PostgreSQL Optimization, I described some of the most useful Linux kernel parameters and how those may help you improve database performance. Now I am going to share my benchmark results with you after configuring Linux Huge Page with different PostgreSQL workload. I have performed a comprehensive set of benchmarks for many different PostgreSQL load sizes and different number concurrent clients.

Benchmark Machine

  • Supermicro server:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic
  • PostgreSQL: version 11

Linux Kernel Settings

I have used default kernel settings without any optimization/tuning except for disabling Transparent HugePages. Transparent HugePages are by default enabled, and allocate a page size that may not be recommended for database usage. For databases generally, fixed sized HugePages are needed, which Transparent HugePages do not provide. Hence, disabling this feature and defaulting to classic HugePages is always recommended.

PostgreSQL Settings

I have used consistent PostgreSQL settings for all the benchmarks in order to record different PostgreSQL workloads with different settings of Linux HugePages. Here is the PostgreSQL setting used for all benchmarks:

shared_buffers = '64GB'
work_mem = '1GB'
random_page_cost = '1'
maintenance_work_mem = '2GB'
synchronous_commit = 'on'
seq_page_cost = '1'
max_wal_size = '100GB'
checkpoint_timeout = '10min'
synchronous_commit = 'on'
checkpoint_completion_target = '0.9'
autovacuum_vacuum_scale_factor = '0.4'
effective_cache_size = '200GB'
min_wal_size = '1GB'
wal_compression = 'ON'

Benchmark scheme

In the benchmark, the benchmark scheme plays an important role. All the benchmarks are run three times with thirty minutes duration for each run. I took the median value from these three benchmarks. The benchmarks were carried out using the PostgreSQL benchmarking tool pgbench.  pgbench works on scale factor, with one scale factor being approximately 16MB of workload. 

HugePages

Linux, by default, uses 4K memory pages along with HugePages. BSD has Super Pages, whereas Windows has Large Pages. PostgreSQL has support for HugePages (Linux) only. In cases where there is a high memory usage, smaller page sizes decrease performance. By setting up HugePages, you increase the dedicated memory for the application and therefore reduce the operational overhead that is incurred during allocation/swapping; i.e. you gain performance by using HugePages.

Here is the Hugepage setting when using Hugepage size of 1GB. You can always get this information from /proc.

AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     100
HugePages_Free:       97
HugePages_Rsvd:       63
HugePages_Surp:        0
Hugepagesize:    1048576 kB

For more detail about HugePages please read my previous blog post.

https://www.percona.com/blog/2018/08/29/tune-linux-kernel-parameters-for-postgresql-optimization/

Generally, HugePages comes in sizes 2MB and 1GB, so it makes sense to use 1GB size instead of the much smaller 2MB size.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge
https://kerneltalks.com/services/what-is-huge-pages-in-linux/

Benchmark Results

This benchmark shows the overall impact of different sizes of HugePages. The first set of benchmarks was created with the default Linux 4K page size without enabling HugePages. Note that Transparent Hugepages were also disabled, and remained disabled throughout these benchmarks.

Then the second set of benchmarks was performed with 2MB HugePages. Finally, the third set of benchmarks is performed with HugePages set to 1GB in size.

All these benchmarks were executed with PostgreSQL version 11. The sets include a combination of different database sizes and clients. The graph below shows comparative performance results for these benchmarks with TPS (transactions per seconds) on the y-axis, and database size and the number of clients per database size on the x-axis.

 

Clearly, from the graph above, you can see that the performance gain with HugePages increases as the number of clients and the database size increases, as long as the size remains within the pre-allocated shared buffer.

This benchmark shows TPS versus clients. In this case, the database size is set to 48GB. On the y-axis, we have TPS and on the x-axis, we have the number of connected clients. The database size is small enough to fit in the shared buffer, which is set to 64GB.

With HugePages set to 1GB, the higher the number of clients, the higher the comparative performance gain.

The next graph is the same as the one above except for a database size of 96GB. This exceeds the shared buffer size, which is set to 64GB.

 

The key observation here is that the performance with 1GB HugePages improves as the number of clients increases and it eventually gives better performance than 2MB HugePages or the standard 4KB page size.

This benchmark shows the TPS versus database size. In this case, the number of connected clients it set to 32. On the y-axis, we have TPS and on the x-axis, we have database sizes.

As expected, when the database spills over the pre-allocated HugePages, the performance degrades significantly.

Summary

One of my key recommendations is that we must keep Transparent HugePages off. You will see the biggest performance gains when the database fits into the shared buffer with HugePages enabled. Deciding on the size of huge page to use requires a bit of trial and error, but this can potentially lead to a significant TPS gain where the database size is large but remains small enough to fit in the shared buffer.

Dec
19
2018
--

Using Partial and Sparse Indexes in MongoDB

MongoDb using partial sparse indexes

MongoDb using partial sparse indexesIn this article I’m going to talk about partial and sparse indexes in MongoDB® and Percona Server for MongoDB®. I’ll show you how to use them, and look at cases where they can be helpful. Prior to discussing these indexes in MongoDB in detail, though, let’s talk about an issue on a relational database like MySQL®.

The boolean issue in MySQL

Consider you have a very large table in MySQL with a boolean column. Typically you created a ENUM(‘T’,’F’) field to store the boolean information or a TINYINT column to store only 1s and 0s. This is good so far. But think now what you can do if you need to run a lot of queries on the table, with a condition on the boolean field, and no other relevant conditions on other indexed columns are used to filter the examined rows.

Why not create and index on the boolean field? Well, yes, you can, but in some cases this solution will be completely useless and will introduce an overhead for the index maintenance.

Think about if you have an even distribution of true and false values in the table, in more or less a 50:50 split. In this situation, the index on the boolean column cannot be used because MySQL will prefer to do a full scan of the large table instead of selecting half of rows using the BTREE entries. We can say that a boolean field like this one has a low cardinality, and it’s not highly selective.

Consider now the case in which you don’t have an even distribution of the values, let’s say 2% of the rows contain false and the remaining 98% contain true. In such a situation, a query to select the false values will most probably use the index. The queries to select the true values won’t use the index, for the same reason we have discussed previously. In this second case the index is very useful, but only for selecting the great minority of rows. The remaining 98% of the entries in the index are completely useless. This represents a great waste of disk space and resources, because the index must be maintained for each write.

It’s not just booleans that can have this problem in relation to index usage, but any field with a low cardinality.

Note: there are several workarounds to deal with this problem, I know. For example, you can create a multi-column index using a more selective field and the boolean. Or you could design your database differently. Here, I’m illustrating the nature of the problem in order to explain a MongoDB feature in a context. 

The boolean issue in MongoDB

How about MongoDB? Does MongoDB have the same problem?  The answer is: yes, MongoDB has the same problem. If you have a lot of documents in a collection with a boolean field or a low cardinality field, and you create an index on it, then you will have a very large index that’s not really useful. But more importantly you will have writes degradation for the index maintenance.

The only difference is that MongoDB will tend to use the index anyway, instead of doing the entire collection scan, but the execution time will be of the same magnitude as doing the COLLSCAN. In the case of very large indexes, a COLLSCAN should be preferable.

Fortunately MongoDB has an option that you can specify during index creation to define a Partial Index. Let’s see.

Partial Index

A partial index is an index that contains only a subset of values based on a filter rule. So, in the case of the unevenly distributed boolean field, we can create an index on it specifying that we want to consider only the false values. This way we avoid recording the remaining 98% of useless true entries. The index will be smaller, we’ll save disk and memory space, and the most frequent writes – when entering the true values – won’t initiate the index management activity. As a result, we won’t have lots of penalties during writes but we’ll have a useful index when searching the false values.

Let’s say that, when you have an uneven distribution, the most relevant searches are the ones for the minority of the values. This is in general the scenario for real applications.

Let’s see now how to create a Partial Index.

First, let’s create a collection with one million random documents. Each document contains a boolean field generated by the javascript function randomBool(). The function generates a false value in 5% of the documents, in order to have an uneven distribution. Then, test the number of false values in the collection.

> function randomBool() { var bool = true; var random_boolean = Math.random() >= 0.95; if(random_boolean) { bool = false }; return bool; }
> for (var i = 1; i <= 1000000; i++) { db.test.insert( { _id: i, name: "name"+i, flag: randomBool() } ) }
WriteResult({ "nInserted" : 1 })
> db.test.find().count()
1000000
> db.test.find( { flag: false } ).count()
49949

Create the index on the flag field and look at the index size using db.test.stats().

> db.test.createIndex( { flag: 1 } )
{ "createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1 }
> db.test.stats().indexSizes
{ "_id_" : 13103104, "flag_1" : 4575232 }

The index we created is 4575232 bytes.

Test some simple queries to extract the documents based on the flag value and take a look at the index usage and the execution times. (For this purpose, we use an explainable object)

// create the explainable object
> var exp = db.test.explain( "executionStats" )
// explain the complete collection scan
> exp.find( {  } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1000000,
		"executionTimeMillis" : 250,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 1000000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"nReturned" : 1000000,
			"executionTimeMillisEstimate" : 200,
			"works" : 1000002,
			"advanced" : 1000000,
			"needTime" : 1,
			"needYield" : 0,
			"saveState" : 7812,
			"restoreState" : 7812,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 1000000
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// find the documents flag=true
> exp.find( { flag: true } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : true
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[true, true]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 950051,
		"executionTimeMillis" : 1028,
		"totalKeysExamined" : 950051,
		"totalDocsExamined" : 950051,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 950051,
			"executionTimeMillisEstimate" : 990,
			"works" : 950052,
			"advanced" : 950051,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 7422,
			"restoreState" : 7422,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 950051,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 950051,
				"executionTimeMillisEstimate" : 350,
				"works" : 950052,
				"advanced" : 950051,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 7422,
				"restoreState" : 7422,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[true, true]"
					]
				},
				"keysExamined" : 950051,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// find the documents with flag=false
> exp.find( { flag: false } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : false
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 49949,
		"executionTimeMillis" : 83,
		"totalKeysExamined" : 49949,
		"totalDocsExamined" : 49949,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 49949,
			"executionTimeMillisEstimate" : 70,
			"works" : 49950,
			"advanced" : 49949,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 390,
			"restoreState" : 390,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 49949,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 49949,
				"executionTimeMillisEstimate" : 10,
				"works" : 49950,
				"advanced" : 49949,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 390,
				"restoreState" : 390,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				},
				"keysExamined" : 49949,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

As expected, MongoDB does a COLLSCAN when looking for db.test.find( {} ). The important thing here is that it takes 250 milliseconds for the entire collection scan.

In both the other cases – find ({flag:true}) and find({flag:false}) – MongoDB uses the index. But let’s have a look at the execution times:

  • for db.test.find({flag:true}) is 1028 milliseconds. The execution time is more than the COLLSCAN. The index in this case is not useful. COLLSCAN should be preferable.
  • for db.test.find({flag:false}) is 83 milliseconds. This is good. The index in this case is very useful.

Now, create the partial index on the flag field. To do it we must use the PartialFilterExpression option on the createIndex command.

// drop the existing index
> db.test.dropIndex( { flag: 1} )
{ "nIndexesWas" : 2, "ok" : 1 }
// create the partial index only on the false values
> db.test.createIndex( { flag : 1 }, { partialFilterExpression :  { flag: false }  } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
// get the index size
> db.test.stats().indexSizes
{ "_id_" : 13103104, "flag_1" : 278528 }
// create the explainalbe object
> var exp = db.test.explain( "executionStats" )
// test the query for flag=false
> exp.find({ flag: false  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : false
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 49949,
		"executionTimeMillis" : 80,
		"totalKeysExamined" : 49949,
		"totalDocsExamined" : 49949,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 49949,
			"executionTimeMillisEstimate" : 80,
			"works" : 49950,
			"advanced" : 49949,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 390,
			"restoreState" : 390,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 49949,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 49949,
				"executionTimeMillisEstimate" : 40,
				"works" : 49950,
				"advanced" : 49949,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 390,
				"restoreState" : 390,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				},
				"keysExamined" : 49949,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// test the query for flag=true
> exp.find({ flag: true  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : true
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"flag" : {
					"$eq" : true
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 950051,
		"executionTimeMillis" : 377,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 1000000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"flag" : {
					"$eq" : true
				}
			},
			"nReturned" : 950051,
			"executionTimeMillisEstimate" : 210,
			"works" : 1000002,
			"advanced" : 950051,
			"needTime" : 49950,
			"needYield" : 0,
			"saveState" : 7812,
			"restoreState" : 7812,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 1000000
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

We can notice the following:

  • db.test.find({flag:false}) uses the index and the execution time is more or less the same as before
  • db.test.find({flag:true}) doesn’t use the index. MongoDB does the COLLSCAN and the execution is better than before
  • note the index size is only 278528 bytes. now A great saving in comparison to the complete index on flag. There won’t be overhead during the writes in the great majority of the documents.

Partial option on other index types

You can use the partialFilterExpression option even in compound indexes or other index types. Let’s see an example of a compound index.

Insert some documents in the students collection

db.students.insert( [
{ _id:1, name: "John", class: "Math", grade: 10 },
{ _id: 2, name: "Peter", class: "English", grade: 6 },
{ _id: 3, name: "Maria" , class: "Geography", grade: 8 },
{ _id: 4, name: "Alex" , class: "Geography", grade: 5},
{ _id: 5, name: "George" , class: "Math", grade: 7 },
{ _id: 6, name: "Tony" , class: "English", grade: 9 },
{ _id: 7, name: "Sam" , class: "Math", grade: 6 },
{ _id: 8, name: "Tom" , class: "English", grade: 5 }
])

Create a partial compound index on name and class fields for the grade greater or equal to 8.

> db.students.createIndex( { name: 1, class: 1  }, { partialFilterExpression: { grade: { $gte: 8} } } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}

Notice that the grade field doesn’t necessarily need to be part of the index.

Query coverage

Using the students collection, we want now to show when a partial index can be used.

The important thing to remember is that a partial index is “partial”. It means that it doesn’t contain all the entries.

In order for MongoDB to use it the conditions in the query must include an expression on the filter field and the selected documents must be a subset of the index.

Let’s see some examples.

The following query can use the index because we are selecting a subset of the partial index.

> db.students.find({name:"Tony", grade:{$gt:8}})
{ "_id" : 6, "name" : "Tony", "class" : "English", "grade" : 9 }
// let's look at the explain
> db.students.find({name:"Tony", grade:{$gt:8}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"name" : {
						"$eq" : "Tony"
					}
				},
				{
					"grade" : {
						"$gt" : 8
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"grade" : {
					"$gt" : 8
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"name" : 1,
					"class" : 1
				},
				"indexName" : "name_1_class_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"name" : [ ],
					"class" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"name" : [
						"[\"Tony\", \"Tony\"]"
					],
					"class" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

The following query cannot use the index because the condition on grade > 5 is not selecting a subset of the partial index. So the COLLSCAN is needed.

> db.students.find({name:"Tony", grade:{$gt:5}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"name" : {
						"$eq" : "Tony"
					}
				},
				{
					"grade" : {
						"$gt" : 5
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"name" : {
							"$eq" : "Tony"
						}
					},
					{
						"grade" : {
							"$gt" : 5
						}
					}
				]
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

Even the following query cannot use the index. As we said the grade field is not part of the index. The simple condition on grade is not sufficient.

> db.students.find({grade:{$gt:8}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"grade" : {
				"$gt" : 8
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"grade" : {
					"$gt" : 8
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

Sparse Index

A sparse index is an index that contains entries only for the documents that have the indexed field.

Since MongoDB is a schemaless database, not all the documents in a collection will necessarily contain the same fields. So we have two options when creating an index:

  • create a regular “non-sparse” index
    • the index contains as many entries as the documents
    • the index contains entries as null for all the documents without the indexed field
  • create a sparse index
    • the index contains as many entries as the documents with the indexed field

We call it “sparse” because it doesn’t contain all the documents of the collection.

The main advantage of the sparse option is to reduce the index size.

Here’s how to create a sparse index:

db.people.createIndex( { city: 1 }, { sparse: true } )

Sparse indexes are a subset of partial indexes. In fact you can emulate a sparse index using the following definition of a partial.

db.people.createIndex(
{city:  1},
{ partialFilterExpression: {city: {$exists: true} } }
)

For this reason partial indexes are preferred over sparse indexes.

Conclusions

Partial indexing is a great feature in MongoDB. You should consider using it to achieve the following advantages:

  • have smaller indexes
  • save disk and memory space
  • improve writes performance

You are strongly encouraged to consider partial indexes if you have one or more of these use cases:

  • you run queries on a boolean field with an uneven distribution, and you look mostly for the less frequent value
  • you have a low cardinality field and the majority of the queries look for a subset of the values
  • the majority of the queries look for a limited subset of the values in a field
  • you don’t have enough memory to store very large indexes – for example, you have a lot of page evictions from the WiredTiger cache

Further readings

Partial indexes: https://docs.mongodb.com/manual/core/index-partial/

Sparse indexes: https://docs.mongodb.com/manual/core/index-sparse/

Articles on query optimization and investigation:


Photo by Mike Greer from Pexels

Dec
04
2018
--

MongoDB 4.0: Using ACID Multi-Document Transactions

mongodb 4.0 acid compliant transactions

mongodb 4.0 acid compliant transactionsMongoDB 4.0 is around, and there are a lot of new features and improvements. In this article we’re going to focus on the major feature which is, undoubtedly, the support for multi-document ACID transactions. This novelty for a NoSQL database could be seen as a way to get closer to the relational world. Well, it’s not that—or maybe not just that. It’s a way to add to the document-based model a new, important, and often requested feature to address a wider range of use cases. The document model and its flexibility should remain the best way to start building an application on MongoDB. At this stage, transactions should be used in specific cases, when you absolutely need them: for example, because your application is aware of data consistency and atomicity. Transactions incur a greater performance cost over single document writes, so the denormalized data model will continue to be optimal in many cases and this helps to minimize the need for transactions.

Single writes are atomic by design: as long as you are able to embed documents in your collections you absolutely don’t need to use a transaction. Even so, transaction support is a very good and interesting feature that you can rely on in MongoDB from now on.

MongoDB 4.0 provides fully ACID transactions support but remember:

  • multi-document transactions are available for replica set deployments only
    • you can use transactions even on a standalone server but you need to configure it as a replica set (with just one node)
  • multi-document transactions are not available for sharded cluster
    • hopefully transactions will be available from version 4.2
  • multi-document transactions are available for the WiredTiger storage engine only

ACID transactions in MongoDB 4.0

ACID properties are well known in the world of relational databases, but let’s recap what the acronym means.

  • Atomicity: a group of commands inside the transaction must follow the “all or nothing” paradigm. If only one of the commands fails for any reason, the complete transaction fails as well.
  • Consistency: if a transaction successfully executes, it will take the database from one state that is consistent to another state that is also consistent.
  • Isolation: multiple transactions can run at the same time in the system. Isolation guarantees that each transaction is not able to view partial results of the others. Executing multiple transactions in parallel must have the same results as running them sequentially
  • Durability: it guarantees that a transaction that has committed will remain persistent, even in the case of a system failure

Limitations of transactions

The support for transactions introduced some limitations:

  • a collection MUST exist in order to use transactions
  • a collection cannot be created or dropped inside a transaction
  • an index cannot be created or dropped inside a transaction
  • non-CRUD operations are not permitted inside a transaction (for example, administrative commands like createUser are not permitted )
  • a transaction cannot read or write in config, admin, and local databases
  • a transaction cannot write to system.* collections
  • the size of a transaction is limited to 16MB
    • a single oplog entry is generated during the commit: the writes inside the transaction don’t have single oplog entries as in regular queries
    • the limitation is a consequence of the 16MB maximum size of any BSON document in the oplog
    • in case of larger transactions, you should consider splitting these into smaller transactions
  • by default a transaction that executes for longer then 60 seconds will automatically expire
    • you can change this using the configuration parameter transactionLifetimeLimitSeconds
    • transactions rely on WiredTiger snapshot capability, and having a long running transaction can result in high pressure on WiredTiger’s cache to maintain snapshots, and lead to the retention of a lot of unflushed operations in memory

Sessions

Sessions were deployed in version 3.6 in order to run the retryable writes (for example) but they are very important, too, for transactions. In fact any transaction is associated with an open session. Prior to starting a transaction, a session must be created. A transaction cannot be run outside a session.

At any given time you may have multiple running sessions in the system, but each session may run only a single transaction at a time. You can run transactions in parallel according to how many open sessions you have.

Three new commands were introduce for creating, committing, and aborting transactions:

  • session.startTransaction()
    • starts a new transaction in the current session
  • session.commitTransaction()
    • saves consistently and durably the changes made by the operations in the transaction
  • session.abortTransaction()
    • the transaction ends without saving any of the changes made by the operations in the transaction

Note: in the following examples, we use two different connections to create two sessions. We do this for the sake of simplicity, but remember that you can create multiple sessions even inside a single connection, assigning each session to a different variable.

Our first transaction

To test our first transaction if you don’t have a replica set already configured let’s start a standalone server like this:

#> mongod --dbpath /data/db --logpath /data/mongo.log --fork --replSet foo

Create a new collection, and insert some data.

foo:PRIMARY> use percona
switched to db percona
foo:PRIMARY> db.createCollection('people')
{
   "ok" : 1,
   "operationTime" : Timestamp(1538483120, 1),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1538483120, 1),
      "signature" : {
         "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
         "keyId" : NumberLong(0)
       }
    }
}
foo:PRIMARY> db.people.insert([{_id:1, name:"Corrado"},{_id:2, name:"Peter"},{_id:3,name:"Heidi"}])

Create a session

foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("dcfa7de5-527d-4b1c-a890-53c9a355920d") }

Start a transaction and insert some new documents

foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.insert([{_id: 4 , name : "George"},{_id: 5, name: "Tom"}])
WriteResult({ "nInserted" : 2 })

Now read the collection from inside and outside the session and see what happens

foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }

As you might notice, since the transaction is not yet committed, you can see the modifications only from inside the session. You cannot see any of the modifications outside of the session, even in the same connection. If you try to open a new connection to the database, then you will not be able to see any of the modifications either.

Now, commit the transaction and see that you can now read the same data both inside and outside the session, as well as from any other connection.

foo:PRIMARY> session.commitTransaction()
foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

When the transaction is committed, all the data are written consistently and durably in the database, just like any typical write. So, writing to the journal file and to the oplog takes place in the same way it as for any single write that’s not inside a transaction. As long as the transaction is open, any modification is stored in memory.

Isolation test

Let’s test now the isolation between two concurrent transactions.

Open the first connection, create a session and start a transaction:

//Connection #1
foo:PRIMARY> var session1 = db.getMongo().startSession()
foo:PRIMARY> session1.startTransaction()

do the same on the second connection:

//Connection #2
foo:PRIMARY> var session2 = db.getMongo().startSession()
foo:PRIMARY> session2.startTransaction()

Update the document on connection #1 to record Heidi’s document. Add the gender field to the document.

//Connection #1
foo:PRIMARY> session1.getDatabase("percona").people.update({_id:3},{$set:{ gender: "F" }})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

Update the same collection on connection #2 to add the same gender field to all the males:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.update({_id:{$in:[1,2,4,5]}},{$set:{ gender: "M" }},{multi:"true"})
WriteResult({ "nMatched" : 4, "nUpserted" : 0, "nModified" : 4 })
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

The two transactions are isolated, each one can see only the ongoing modifications that it has made itself.

Commit the transaction in connection #1:

//Connection #1
foo:PRIMARY> session1.commitTransaction()
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

In the connection #2 read the collection:

//Connection #2
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M"  }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M"  }
{ "_id" : 5, "name" : "Tom", "gender" : "M"  }

As you can see the second transaction still sees its own modifications, and cannot see the already committed updates of the other transaction. This kind of isolation works the same as the “REPEATABLE READ” level of MySQL and other relational databases.

Now commit the transaction in connection #2 and see the new values of the collection:

//Connection #2
foo:PRIMARY> session2.commitTransaction()
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

Conflicts

When two (or more) concurrent transactions modify the same documents, we may have a conflict. MongoDB can detect a conflict immediately, even while transactions are not yet committed. The first transaction to acquire the lock on a document will continue, the second one will receive the conflict error message and fail. The failed transaction can then be retried later.

Let’s see an example.

Create a new transaction in connection #1 to update Heidi’s document. We want to change the name to Luise.

//Connection #1
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Luise"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Let’s try to modify the same document in a concurrent transaction in connection #2. Modify the name from Heidi to Marie in this case.

//Connection #2
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Marie"}})
WriteCommandError({
    "errorLabels" : [
       "TransientTransactionError"
    ],
    "operationTime" : Timestamp(1538495683, 1),
    "ok" : 0,
    "errmsg" : "WriteConflict",
    "code" : 112,
    "codeName" : "WriteConflict",
    "$clusterTime" : {
       "clusterTime" : Timestamp(1538495683, 1),
       "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
       }
     }
})

We received an error and the transaction failed. We can retry it later.

Other details

  • the individual writes inside the transaction are not retry-able even if retryWrites is set to true
  • each commit operation is a retry-able write operation regardless of whether retryWrites is set to true. The drivers retry the commit a single time in case of an error.
  • Read Concern supports snapshot, local and majority values
  • Write Concern can be set at the transaction level. The individual operations inside the transaction ignore the write concern. Write concern is evaluated during the commit
  • Read Preference supports only primary value

Conclusions

Transaction support in MongoDB 4.0 is a very interesting new feature, but it isn’t fully mature yet, there are strong limitations at this stage: a transaction cannot be larger than 16MB, you cannot use it on sharded clusters and others. If you absolutely need a transaction in your application use it. But don’t use transactions only because they are cool, since in some cases a proper data model based on embedding documents in collections and denormalizing your data could be the best solution. MongoDB isn’t by its nature a relational database; as long as you are able to model your data keeping in mind that it’s a NOSQL database you should avoid using transactions. In specific cases, or if you already have a database with strong “informal relations” between the collections that you cannot change, then you could choose to rely on transactions.

Image modified from original photo: by Annie Spratt on Unsplash

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com