Mar
14
2019
--

Percona’s Open Source Data Management Software Survey


Click Here to Complete our New Survey!

Last year we informally surveyed the open source community and our conference attendees.
The results revealed that:

  • 48% of those in the cloud chose to self-manage their databases, while 52% were comfortable relying on the DBaaS offering of their cloud vendor.
  • 49% of people said “performance issues” when asked, “what keeps you up at night?”
  • The major influence on buying decisions was price, with 42% of respondents keen to make the most of their money.

We found this information so interesting that we wanted to find out more! As a result, we are pleased to announce the launch of our first annual Open Source Data Management Software Survey.

The final results will be 100% anonymous, and will be made freely available under a Creative Commons license.

How Will This Survey Help The Community?

Unlimited access to accurate market data is important. Millions of open source projects are in play, and most are dependent on databases. Accurate market data helps you track the popularity of different databases, as well as see how and where these databases are run. This helps us all build better software and take advantage of shifting trends.

Thousands of vendors are focused on helping SysAdmins, DBAs, and Developers get the most out of their database infrastructure. Insightful market data enables them to create better tools that meet current demands and grow the open source database market.

We want to assist companies who are still deciding what, how, and where to run their systems. This information will help them understand the industry direction and allow them to make an informed decision on the software and services they choose.

How Can You Help Make This Survey A Success?

Firstly, please share your insight into current trends and new developments in open source data management software.

Secondly, please share this survey with other people who work in the industry, and encourage them to contribute.

The more responses we receive, the more useful this will be to the whole open source community. If we missed anything, or you would like to ask other questions in future, let us know!

So tell us: who are the big fish, and which minnows are nibbling at their tails? Is the cloud giving you altitude sickness, or are you flying high? What is the next big thing, and is everyone on board, or is your company lagging behind?

Preliminary results will be presented by our CEO, Peter Zaitsev, at our annual Percona Live Conference in Austin, Texas (May 28-30, 2019), and released to the open source community when finalized.

Click Here to Have Your Say!

Jan
10
2019
--

PostgreSQL Updatable Views: Performing Schema Updates With Minimal Downtime


Recently, one of our customers asked us how to minimize downtime when upgrading the database structure with changes that are not backwards-compatible. It’s an interesting question, and I would like to explore some alternatives here. I will use PostgreSQL for this series of posts and walk through updatable views, INSTEAD OF triggers, and the Rule System. Later, we’ll discuss alternatives available for other databases like MySQL.

This first post will give an overview of the problem and also a first implementation of the solution in PostgreSQL, using updatable views.

The Motivation

Software is like a living organism and, as such, it evolves. It’s not surprising that database schemas also evolve, and this brings us a problem: how do we minimize downtime when performing upgrades? Or, going further, is it possible to upgrade without activating maintenance mode, thereby making the service unavailable for our customers?

Let’s say that we want to push out update 2.0. It’s a major update that includes application code changes and database changes such as altered tables, dropped columns, new tables, and so on. Checking the changelog, we notice that most of the database changes are backwards-compatible, but a few modified tables are not, so we can’t just push out the new database changes without breaking some functionality in the existing codebase. To avoid triggering errors while we upgrade the database, we need to shut down the application servers, update the database, update the codebase, and then get the servers back up and running again. That means we need an unwanted maintenance window!

As per our definition of the problem, we want to get to the point where we don’t need this maintenance window: a point where the old and new codebases can coexist for a period of time while we upgrade the system. One solution is to avoid changes that the current codebase can’t handle but, as you may have already guessed, that isn’t really an option when we are constantly trying to optimize and improve our databases. Another option, then, is to use PostgreSQL updatable views.

Updatable Views

PostgreSQL introduced automatically updatable views in version 9.3. The documentation[1] says that simple views are automatically updatable and the system will allow INSERT, UPDATE, or DELETE statements to be used on the view in the same way as on a regular table. A view is automatically updatable if it satisfies all of the following conditions (an illustrative example follows the list):

  • The view must have exactly one entry in its FROM list, which must be a table or another updatable view.
  • The view definition must not contain WITH, DISTINCT, GROUP BY, HAVING, LIMIT, or OFFSET clauses at the top level.
  • The view definition must not contain set operations (UNION, INTERSECT or EXCEPT) at the top level.
  • The view’s select list must not contain any aggregates, window functions, or set-returning functions.
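
To make this concrete, here is an illustrative sketch (using a hypothetical users table, not part of the example that follows): the first view is automatically updatable, while the second is not because it contains GROUP BY and an aggregate.

-- Automatically updatable: one table in FROM, no aggregates or set operations
CREATE VIEW active_users AS
    SELECT id, name
    FROM users
    WHERE active;

-- NOT automatically updatable: GROUP BY and an aggregate in the select list
CREATE VIEW users_per_country AS
    SELECT country, count(*) AS user_count
    FROM users
    GROUP BY country;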

Note that the idea is to provide a simple mechanism that helps when using views: if the view is automatically updatable, the system will convert any INSERT, UPDATE, or DELETE statement on the view into the corresponding statement on the underlying base table. Views can also be used to increase security granularity, giving us the power to define privileges that operate at the view level. If the view uses a WHERE clause, we can add the CHECK OPTION to prevent the user from being able to UPDATE or INSERT rows that are not in the scope of the view. For example, let’s say we have a view created to limit the user to viewing records from a specific country. If the user changed the country of any record, those records would disappear from the view; the CHECK OPTION prevents this from happening. I recommend reading the documentation for more information about how views work in PostgreSQL.
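
As a minimal sketch of that country example (with a hypothetical customers table, not part of the example below), WITH CHECK OPTION rejects any INSERT or UPDATE through the view that would produce a row the view itself cannot see:

CREATE VIEW customers_it AS
    SELECT id, name, country
    FROM customers
    WHERE country = 'IT'
    WITH CHECK OPTION;

-- Allowed: the row remains visible through the view
UPDATE customers_it SET name = 'Mario Rossi' WHERE id = 1;

-- Rejected ("new row violates check option"): the row would leave the view
UPDATE customers_it SET country = 'DE' WHERE id = 1;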

Implementation

Using updatable views makes the implementation as simple as creating views. For our example, I will use the table below:

test=# CREATE TABLE t (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, password VARCHAR(300) NOT NULL, date_created TIMESTAMP NOT NULL DEFAULT now());
CREATE TABLE
test=# INSERT INTO t(id, name, password) VALUES (1, 'user_1', 'pwd_1'), (2, 'user_2','pwd_2'),(3,'user_3','pwd_3'),(4,'user_4','pwd_4'),(5,'user_5','pwd_5');
INSERT 0 5
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
(5 rows)

We then change the schema, renaming the column password to pwd and date_created to dt_created, and adding two more columns, pwd_salt and comment. The added columns are not a real problem because they can be either nullable or have a default value, but the column renames are a problem. The changes are:

test=# create schema v_10;
CREATE SCHEMA
test=# CREATE VIEW v_10.t AS SELECT id, name, password AS password, date_created AS date_created FROM public.t;
CREATE VIEW
test=# ALTER TABLE public.t RENAME COLUMN password TO pwd;
ALTER TABLE
test=# ALTER TABLE public.t RENAME COLUMN date_created TO dt_created;
ALTER TABLE
test=# ALTER TABLE public.t ADD COLUMN pwd_salt VARCHAR(100);
ALTER TABLE
test=# ALTER TABLE public.t ADD COLUMN comment VARCHAR(500);
ALTER TABLE

To make sure our application works properly, we’ve decided that the tables will live in a specific main schema, in this example the public schema, while the views will live in versioned schemas. This way, if a change in one specific version needs a view to guarantee backwards-compatibility, we just create the view inside the versioned schema and apply the changes to the table in the main schema. The application will always set the “search_path” to “versioned_schema,main_schema”, which is “v_10, public” in this example:

test=# SET search_path TO v_10, public;
SET
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
(5 rows)
test=# select * from public.t;
id | name | pwd | dt_created | pwd_salt | comment
----+--------+-------+----------------------------+----------+---------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455 | |
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455 | |
3 | user_3 | pwd_3 | 2018-12-27 07:50:39.562455 | |
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455 | |
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455 | |
(5 rows)

As we can see, the application still sees the old schema, but does this work? What if someone updates the password of ID #3? Let’s check:

test=# UPDATE t SET password = 'new_pwd_3' WHERE id = 3;
UPDATE 1
test=# SELECT * FROM t;
id | name | password | date_created
----+--------+-----------+----------------------------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455
3 | user_3 | new_pwd_3 | 2018-12-27 07:50:39.562455
(5 rows)
test=# SELECT * FROM public.t;
id | name | pwd | dt_created | pwd_salt | comment
----+--------+-----------+----------------------------+----------+---------
1 | user_1 | pwd_1 | 2018-12-27 07:50:39.562455 | |
2 | user_2 | pwd_2 | 2018-12-27 07:50:39.562455 | |
4 | user_4 | pwd_4 | 2018-12-27 07:50:39.562455 | |
5 | user_5 | pwd_5 | 2018-12-27 07:50:39.562455 | |
3 | user_3 | new_pwd_3 | 2018-12-27 07:50:39.562455 | |
(5 rows)

As we can see, the updatable view worked like a charm! The new and old application codebases can coexist and work together while we roll out our upgrades. There are some restrictions, as explained in the documentation, such as having only one table or view in the FROM list, but for its simplicity updatable views do a great job. What about more complex cases, where we need to split or join tables? We will discuss these in future articles and show how we can solve them with both INSTEAD OF triggers and the PostgreSQL Rule System.

References

[1] https://www.postgresql.org/docs/current/sql-createview.html


Photo by Egor Kamelev from Pexels

Dec
31
2018
--

Great things that happened with PostgreSQL in the Year 2018

PostgreSQL in 2018

In this blog post, we’ll look back at what’s been going on in the world of PostgreSQL in 2018.

Before we start talking about the good things that happened in the PostgreSQL world in 2018, we hope you had a wonderful year, and we wish you a happy and prosperous 2019.

PostgreSQL has been a choice for those looking for a completely community-driven open source database that is feature-rich and extensible. We have seen tremendously great things happening in PostgreSQL for many years, with 2018 being a prime example. As you can see in the following snippet from the DB-Engines rankings, PostgreSQL topped the chart for growth in popularity in 2018 compared to other databases.

PostgreSQL adoption has been growing year over year, and 2018 was again one such year, as we can see.

Let’s start with a recap of some of the great PostgreSQL events, and look at what we should take away from 2018 in the PostgreSQL space.

PostgreSQL 11 Released

PostgreSQL 11 was a release that incorporated many features previously only offered by commercial database software governed by enterprise licenses. For example, there are times when you need to handle embedded transactions inside a stored procedure in your application code. There are also times when you wish to partition a table with foreign keys or use hash partitioning. These used to require workarounds; the release of PostgreSQL 11 covers these scenarios.
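
As illustrative sketches of two of those features (the table and procedure names below are hypothetical, not taken from the release notes), PostgreSQL 11 lets you hash-partition a table and commit inside a stored procedure:

-- Hash partitioning, new in PostgreSQL 11
CREATE TABLE measurements (
    sensor_id int NOT NULL,
    reading   numeric,
    taken_at  timestamptz
) PARTITION BY HASH (sensor_id);

CREATE TABLE measurements_p0 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);

-- Transaction control inside a procedure, also new in PostgreSQL 11
CREATE PROCEDURE purge_old_measurements()
LANGUAGE plpgsql
AS $$
BEGIN
    DELETE FROM measurements WHERE taken_at < now() - interval '1 year';
    COMMIT;  -- an embedded commit, not possible in a plain function
END;
$$;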

There were many other additions as well, such as Just-In-Time compilation, improved query parallelism, partition elimination, etc. You can find out more in our blog post here, or in the PostgreSQL 11 release notes (if you have not seen them already). Special thanks to everyone involved in such a vibrant PostgreSQL release.

End of Life for PostgreSQL 9.3

PostgreSQL 9.3.25, released on November 8, 2018, was the last minor release for the 9.3 series: there will be no more minor releases of 9.3 supported by the community. If you are still using PostgreSQL 9.3 (or an earlier major release), it is time to start planning an upgrade of your database to take advantage of additional features and performance improvements.

Watch out for future Percona webinars (dates will be announced soon) on PostgreSQL migrations and upgrades. These will help you handle situations such as downtime and other complexities involved in migrating partitions built using table inheritance when you move from legacy PostgreSQL versions to the latest versions.

PostgreSQL Minor Releases

For minor PostgreSQL releases, this year was no different from previous years. The PostgreSQL community aims for a minor version release for all supported versions every quarter. However, we may see additional minor releases due to critical bug fixes or security fixes. One such release was made on March 3rd, 2018 for the CVE-2018-1058 security fix. This shows that you do not necessarily need to wait for scheduled release dates when a security vulnerability has been identified: you may see the fix released as a minor version as soon as development, review, and testing are completed for the fix.

There have been five minor releases this year on the following dates.

Security Fixes in All the Supported PostgreSQL Releases This Year

The PostgreSQL Global Development Group and contributors take security fixes very seriously. There have been several instances where we have received immediate responses after reporting a problem or a bug. Likewise, we have seen many security bugs fixed soon after they were reported.

The following is a list of security fixes we saw in the year 2018:

We thank all the core team members, contributors, hackers, and users involved in making it another great year for PostgreSQL and a huge WIN for the open source world.

If you would like to participate in sharing your PostgreSQL knowledge to a wider audience, or if you have got a great topic that you would like to talk about, please submit your proposal to one of the world’s biggest open source conferences: Percona Live Open Source Database Conference 2019 in Austin, Texas from May 28-30, 2019. The Call for Papers is open until Jan 20, 2019.

You can also submit blog articles to our Open Source Database Community Blog for publishing to a wider open source database audience.

Please subscribe to our blog posts to hear many more great things that are happening in the PostgreSQL space.

Dec
20
2018
--

Benchmark PostgreSQL With Linux HugePages

Benchmarking HugePages and PostgreSQL

The Linux kernel provides a wide range of configuration options that can affect performance. It’s all about getting the right configuration for your application and workload. Just like any other database, PostgreSQL relies on the Linux kernel being optimally configured. Poorly configured parameters can result in poor performance. Therefore, it is important that you benchmark database performance after each tuning session to avoid performance degradation. In one of my previous posts, Tune Linux Kernel Parameters For PostgreSQL Optimization, I described some of the most useful Linux kernel parameters and how they may help you improve database performance. Now I am going to share my benchmark results after configuring Linux HugePages with different PostgreSQL workloads. I have performed a comprehensive set of benchmarks for many different PostgreSQL load sizes and numbers of concurrent clients.

Benchmark Machine

  • Supermicro server:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic
  • PostgreSQL: version 11

Linux Kernel Settings

I used the default kernel settings without any optimization or tuning, except for disabling Transparent HugePages. Transparent HugePages are enabled by default and allocate a page size that may not be recommended for database usage. Databases generally need fixed-size HugePages, which Transparent HugePages do not provide. Hence, disabling this feature and defaulting to classic HugePages is always recommended.
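
For reference, disabling THP at runtime typically looks like the sketch below; this is illustrative rather than the exact procedure used for this benchmark, and you still need a boot parameter or startup script to make the change persistent across reboots.

# Check the current setting: the active value is shown in square brackets
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP until the next reboot
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag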

PostgreSQL Settings

I used consistent PostgreSQL settings for all the benchmarks in order to record different PostgreSQL workloads with different settings of Linux HugePages. Here are the PostgreSQL settings used for all benchmarks:

shared_buffers = '64GB'
work_mem = '1GB'
random_page_cost = '1'
maintenance_work_mem = '2GB'
synchronous_commit = 'on'
seq_page_cost = '1'
max_wal_size = '100GB'
checkpoint_timeout = '10min'
checkpoint_completion_target = '0.9'
autovacuum_vacuum_scale_factor = '0.4'
effective_cache_size = '200GB'
min_wal_size = '1GB'
wal_compression = 'ON'

Benchmark scheme

The benchmark scheme plays an important role. All the benchmarks were run three times, with a thirty-minute duration for each run, and I took the median value from these three runs. The benchmarks were carried out using the PostgreSQL benchmarking tool pgbench, which works on a scale factor, with one scale factor being approximately 16MB of workload.
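
As an illustration of the kind of pgbench run involved (the scale factor and client count below are placeholders; the actual benchmark sets combined several database sizes and client counts):

# Initialize a ~48GB database: scale factor 3000 x ~16MB
pgbench -i -s 3000 testdb

# Run for 30 minutes with 32 client connections and 32 worker threads
pgbench -c 32 -j 32 -T 1800 testdb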

HugePages

Linux, by default, uses 4K memory pages, with HugePages available in addition. BSD has Super Pages, whereas Windows has Large Pages. PostgreSQL has support for HugePages on Linux only. In cases of high memory usage, smaller page sizes decrease performance. By setting up HugePages, you increase the dedicated memory for the application and therefore reduce the operational overhead incurred during allocation and swapping; i.e. you gain performance by using HugePages.

Here are the HugePages settings when using a HugePages size of 1GB. You can always get this information from /proc/meminfo.

AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     100
HugePages_Free:       97
HugePages_Rsvd:       63
HugePages_Surp:        0
Hugepagesize:    1048576 kB

For more detail about HugePages please read my previous blog post.

https://www.percona.com/blog/2018/08/29/tune-linux-kernel-parameters-for-postgresql-optimization/

Generally, HugePages come in sizes of 2MB and 1GB, so it makes sense to use the 1GB size instead of the much smaller 2MB size.
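
For illustration, reserving 1GB HugePages usually involves a kernel boot parameter plus telling PostgreSQL to use them; the values below are placeholders rather than the exact configuration used for these benchmarks.

# Kernel command line (e.g. via GRUB): reserve 100 x 1GB pages at boot
default_hugepagesz=1G hugepagesz=1G hugepages=100

# For 2MB pages, a runtime sysctl is usually enough, e.g.:
sysctl -w vm.nr_hugepages=35000

# postgresql.conf: refuse to start if HugePages cannot be allocated
huge_pages = on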

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge
https://kerneltalks.com/services/what-is-huge-pages-in-linux/

Benchmark Results

This benchmark shows the overall impact of different sizes of HugePages. The first set of benchmarks was created with the default Linux 4K page size without enabling HugePages. Note that Transparent Hugepages were also disabled, and remained disabled throughout these benchmarks.

Then the second set of benchmarks was performed with 2MB HugePages. Finally, the third set of benchmarks was performed with HugePages set to 1GB in size.

All these benchmarks were executed with PostgreSQL version 11. The sets include a combination of different database sizes and clients. The graph below shows comparative performance results for these benchmarks with TPS (transactions per seconds) on the y-axis, and database size and the number of clients per database size on the x-axis.

 

Clearly, from the graph above, you can see that the performance gain with HugePages increases as the number of clients and the database size increases, as long as the size remains within the pre-allocated shared buffer.

This benchmark shows TPS versus clients. In this case, the database size is set to 48GB. On the y-axis, we have TPS and on the x-axis, we have the number of connected clients. The database size is small enough to fit in the shared buffer, which is set to 64GB.

With HugePages set to 1GB, the higher the number of clients, the higher the comparative performance gain.

The next graph is the same as the one above except for a database size of 96GB. This exceeds the shared buffer size, which is set to 64GB.

 

The key observation here is that the performance with 1GB HugePages improves as the number of clients increases and it eventually gives better performance than 2MB HugePages or the standard 4KB page size.

This benchmark shows TPS versus database size. In this case, the number of connected clients is set to 32. On the y-axis, we have TPS and on the x-axis, we have database sizes.

As expected, when the database spills over the pre-allocated HugePages, the performance degrades significantly.

Summary

One of my key recommendations is that we must keep Transparent HugePages off. You will see the biggest performance gains when the database fits into the shared buffer with HugePages enabled. Deciding on the size of huge page to use requires a bit of trial and error, but this can potentially lead to a significant TPS gain where the database size is large but remains small enough to fit in the shared buffer.

Dec
19
2018
--

Using Partial and Sparse Indexes in MongoDB


In this article I’m going to talk about partial and sparse indexes in MongoDB® and Percona Server for MongoDB®. I’ll show you how to use them, and look at cases where they can be helpful. Before discussing these indexes in MongoDB in detail, though, let’s talk about an issue on a relational database like MySQL®.

The boolean issue in MySQL

Imagine you have a very large table in MySQL with a boolean column. Typically you would create an ENUM(‘T’,’F’) field to store the boolean information, or a TINYINT column to store only 1s and 0s. This is fine so far. But think about what happens if you need to run a lot of queries on the table with a condition on the boolean field, and no other relevant conditions on other indexed columns are available to filter the examined rows.

Why not create an index on the boolean field? Well, yes, you can, but in some cases this solution will be completely useless and will introduce overhead for index maintenance.

Think about what happens if you have an even distribution of true and false values in the table, more or less a 50:50 split. In this situation, the index on the boolean column won't be used because MySQL will prefer to do a full scan of the large table instead of selecting half of the rows using the BTREE entries. We can say that a boolean field like this one has low cardinality, and it’s not highly selective.

Consider now the case in which you don’t have an even distribution of the values: let’s say 2% of the rows contain false and the remaining 98% contain true. In such a situation, a query to select the false values will most probably use the index, while queries to select the true values won’t, for the same reason we discussed previously. In this second case the index is very useful, but only for selecting the small minority of rows; the remaining 98% of the entries in the index are completely useless. This represents a great waste of disk space and resources, because the index must be maintained for each write.

It’s not just booleans that can have this problem in relation to index usage, but any field with a low cardinality.

Note: there are several workarounds to deal with this problem, I know. For example, you can create a multi-column index using a more selective field together with the boolean, or you could design your database differently. Here, I’m illustrating the nature of the problem in order to explain a MongoDB feature in context.
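
For example, the multi-column workaround mentioned above could look like this sketch (hypothetical orders table and columns):

-- The selective column leads the index, the low-cardinality boolean follows
ALTER TABLE orders ADD INDEX idx_customer_deleted (customer_id, is_deleted);

-- A query filtering on both columns can now use the index effectively
SELECT * FROM orders WHERE customer_id = 42 AND is_deleted = 1;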

The boolean issue in MongoDB

How about MongoDB? Does it have the same problem? The answer is: yes, MongoDB has the same problem. If you have a lot of documents in a collection with a boolean field or another low-cardinality field, and you create an index on it, then you will have a very large index that’s not really useful. More importantly, you will see write performance degradation due to index maintenance.

The only difference is that MongoDB will tend to use the index anyway, instead of doing a full collection scan, but the execution time will be of the same order of magnitude as doing the COLLSCAN. In the case of very large indexes, a COLLSCAN would be preferable.

Fortunately MongoDB has an option that you can specify during index creation to define a Partial Index. Let’s see.

Partial Index

A partial index is an index that contains only a subset of the values, based on a filter rule. So, in the case of the unevenly distributed boolean field, we can create an index on it specifying that we want to include only the false values. This way we avoid recording the remaining 98% of useless true entries. The index will be smaller, we’ll save disk and memory space, and the most frequent writes – those entering true values – won’t trigger index maintenance. As a result, we won’t pay penalties for most writes, but we’ll have a useful index when searching for the false values.

Let’s say that, when you have an uneven distribution, the most relevant searches are the ones for the minority of the values. This is in general the scenario for real applications.

Let’s see now how to create a Partial Index.

First, let’s create a collection with one million random documents. Each document contains a boolean field generated by the JavaScript function randomBool(). The function generates a false value in about 5% of the documents, in order to have an uneven distribution. Then, check the number of false values in the collection.

> function randomBool() { var bool = true; var random_boolean = Math.random() >= 0.95; if(random_boolean) { bool = false }; return bool; }
> for (var i = 1; i <= 1000000; i++) { db.test.insert( { _id: i, name: "name"+i, flag: randomBool() } ) }
WriteResult({ "nInserted" : 1 })
> db.test.find().count()
1000000
> db.test.find( { flag: false } ).count()
49949

Create the index on the flag field and look at the index size using db.test.stats().

> db.test.createIndex( { flag: 1 } )
{ "createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1 }
> db.test.stats().indexSizes
{ "_id_" : 13103104, "flag_1" : 4575232 }

The index we created is 4575232 bytes.

Test some simple queries to extract the documents based on the flag value and take a look at the index usage and the execution times. (For this purpose, we use an explainable object)

// create the explainable object
> var exp = db.test.explain( "executionStats" )
// explain the complete collection scan
> exp.find( {  } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1000000,
		"executionTimeMillis" : 250,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 1000000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"nReturned" : 1000000,
			"executionTimeMillisEstimate" : 200,
			"works" : 1000002,
			"advanced" : 1000000,
			"needTime" : 1,
			"needYield" : 0,
			"saveState" : 7812,
			"restoreState" : 7812,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 1000000
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// find the documents flag=true
> exp.find( { flag: true } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : true
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[true, true]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 950051,
		"executionTimeMillis" : 1028,
		"totalKeysExamined" : 950051,
		"totalDocsExamined" : 950051,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 950051,
			"executionTimeMillisEstimate" : 990,
			"works" : 950052,
			"advanced" : 950051,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 7422,
			"restoreState" : 7422,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 950051,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 950051,
				"executionTimeMillisEstimate" : 350,
				"works" : 950052,
				"advanced" : 950051,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 7422,
				"restoreState" : 7422,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[true, true]"
					]
				},
				"keysExamined" : 950051,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// find the documents with flag=false
> exp.find( { flag: false } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : false
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 49949,
		"executionTimeMillis" : 83,
		"totalKeysExamined" : 49949,
		"totalDocsExamined" : 49949,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 49949,
			"executionTimeMillisEstimate" : 70,
			"works" : 49950,
			"advanced" : 49949,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 390,
			"restoreState" : 390,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 49949,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 49949,
				"executionTimeMillisEstimate" : 10,
				"works" : 49950,
				"advanced" : 49949,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 390,
				"restoreState" : 390,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				},
				"keysExamined" : 49949,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

As expected, MongoDB does a COLLSCAN when looking for db.test.find( {} ). The important thing here is that it takes 250 milliseconds for the entire collection scan.

In both the other cases – find ({flag:true}) and find({flag:false}) – MongoDB uses the index. But let’s have a look at the execution times:

  • for db.test.find({flag:true}) it is 1028 milliseconds. The execution time is longer than the COLLSCAN, so the index in this case is not useful and a COLLSCAN would be preferable.
  • for db.test.find({flag:false}) it is 83 milliseconds. This is good: the index in this case is very useful.

Now, create the partial index on the flag field. To do this we must use the partialFilterExpression option of the createIndex command.

// drop the existing index
> db.test.dropIndex( { flag: 1} )
{ "nIndexesWas" : 2, "ok" : 1 }
// create the partial index only on the false values
> db.test.createIndex( { flag : 1 }, { partialFilterExpression :  { flag: false }  } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
// get the index size
> db.test.stats().indexSizes
{ "_id_" : 13103104, "flag_1" : 278528 }
// create the explainable object
> var exp = db.test.explain( "executionStats" )
// test the query for flag=false
> exp.find({ flag: false  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : false
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 49949,
		"executionTimeMillis" : 80,
		"totalKeysExamined" : 49949,
		"totalDocsExamined" : 49949,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 49949,
			"executionTimeMillisEstimate" : 80,
			"works" : 49950,
			"advanced" : 49949,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 390,
			"restoreState" : 390,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 49949,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 49949,
				"executionTimeMillisEstimate" : 40,
				"works" : 49950,
				"advanced" : 49949,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 390,
				"restoreState" : 390,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				},
				"keysExamined" : 49949,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// test the query for flag=true
> exp.find({ flag: true  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : true
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"flag" : {
					"$eq" : true
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 950051,
		"executionTimeMillis" : 377,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 1000000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"flag" : {
					"$eq" : true
				}
			},
			"nReturned" : 950051,
			"executionTimeMillisEstimate" : 210,
			"works" : 1000002,
			"advanced" : 950051,
			"needTime" : 49950,
			"needYield" : 0,
			"saveState" : 7812,
			"restoreState" : 7812,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 1000000
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

We can notice the following:

  • db.test.find({flag:false}) uses the index and the execution time is more or less the same as before
  • db.test.find({flag:true}) doesn’t use the index. MongoDB does the COLLSCAN and the execution is better than before
  • the index size is only 278528 bytes – a great saving in comparison to the complete index on flag (4575232 bytes). There won’t be index maintenance overhead for writes to the great majority of the documents.

Partial option on other index types

You can use the partialFilterExpression option even with compound indexes or other index types. Let’s see an example of a compound index.

Insert some documents in the students collection

db.students.insert( [
{ _id:1, name: "John", class: "Math", grade: 10 },
{ _id: 2, name: "Peter", class: "English", grade: 6 },
{ _id: 3, name: "Maria" , class: "Geography", grade: 8 },
{ _id: 4, name: "Alex" , class: "Geography", grade: 5},
{ _id: 5, name: "George" , class: "Math", grade: 7 },
{ _id: 6, name: "Tony" , class: "English", grade: 9 },
{ _id: 7, name: "Sam" , class: "Math", grade: 6 },
{ _id: 8, name: "Tom" , class: "English", grade: 5 }
])

Create a partial compound index on the name and class fields for grades greater than or equal to 8.

> db.students.createIndex( { name: 1, class: 1  }, { partialFilterExpression: { grade: { $gte: 8} } } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}

Notice that the grade field doesn’t necessarily need to be part of the index.

Query coverage

Using the students collection, we want now to show when a partial index can be used.

The important thing to remember is that a partial index is “partial”. It means that it doesn’t contain all the entries.

In order for MongoDB to use it, the conditions in the query must include an expression on the filter field, and the selected documents must be a subset of the index.

Let’s see some examples.

The following query can use the index because we are selecting a subset of the partial index.

> db.students.find({name:"Tony", grade:{$gt:8}})
{ "_id" : 6, "name" : "Tony", "class" : "English", "grade" : 9 }
// let's look at the explain
> db.students.find({name:"Tony", grade:{$gt:8}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"name" : {
						"$eq" : "Tony"
					}
				},
				{
					"grade" : {
						"$gt" : 8
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"grade" : {
					"$gt" : 8
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"name" : 1,
					"class" : 1
				},
				"indexName" : "name_1_class_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"name" : [ ],
					"class" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"name" : [
						"[\"Tony\", \"Tony\"]"
					],
					"class" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

The following query cannot use the index because the condition grade > 5 does not select a subset of the partial index, so the COLLSCAN is needed.

> db.students.find({name:"Tony", grade:{$gt:5}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"name" : {
						"$eq" : "Tony"
					}
				},
				{
					"grade" : {
						"$gt" : 5
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"name" : {
							"$eq" : "Tony"
						}
					},
					{
						"grade" : {
							"$gt" : 5
						}
					}
				]
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

The following query cannot use the index either. As we said, the grade field is not part of the index, and a simple condition on grade alone is not sufficient.

> db.students.find({grade:{$gt:8}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"grade" : {
				"$gt" : 8
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"grade" : {
					"$gt" : 8
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

Sparse Index

A sparse index is an index that contains entries only for the documents that have the indexed field.

Since MongoDB is a schemaless database, not all the documents in a collection will necessarily contain the same fields. So we have two options when creating an index:

  • create a regular “non-sparse” index
    • the index contains as many entries as there are documents
    • the index contains null entries for all the documents that are missing the indexed field
  • create a sparse index
    • the index contains entries only for the documents that have the indexed field

We call it “sparse” because it doesn’t contain all the documents of the collection.

The main advantage of the sparse option is to reduce the index size.

Here’s how to create a sparse index:

db.people.createIndex( { city: 1 }, { sparse: true } )

Sparse indexes are a subset of partial indexes. In fact, you can emulate a sparse index using the following partial index definition:

db.people.createIndex(
{city:  1},
{ partialFilterExpression: {city: {$exists: true} } }
)

For this reason partial indexes are preferred over sparse indexes.

Conclusions

Partial indexing is a great feature in MongoDB. You should consider using it to achieve the following advantages:

  • have smaller indexes
  • save disk and memory space
  • improve writes performance

You are strongly encouraged to consider partial indexes if you have one or more of these use cases:

  • you run queries on a boolean field with an uneven distribution, and you look mostly for the less frequent value
  • you have a low cardinality field and the majority of the queries look for a subset of the values
  • the majority of the queries look for a limited subset of the values in a field
  • you don’t have enough memory to store very large indexes – for example, you have a lot of page evictions from the WiredTiger cache

Further readings

Partial indexes: https://docs.mongodb.com/manual/core/index-partial/

Sparse indexes: https://docs.mongodb.com/manual/core/index-sparse/


Photo by Mike Greer from Pexels

Dec
04
2018
--

MongoDB 4.0: Using ACID Multi-Document Transactions


MongoDB 4.0 is here, and it brings a lot of new features and improvements. In this article we’re going to focus on the major feature which is, undoubtedly, the support for multi-document ACID transactions. This novelty for a NoSQL database could be seen as a way to get closer to the relational world. Well, it’s not that, or maybe not just that. It’s a way to add to the document-based model a new, important, and often requested feature that addresses a wider range of use cases. The document model and its flexibility should remain the best way to start building an application on MongoDB. At this stage, transactions should be used in specific cases, when you absolutely need them: for example, when your application requires consistency and atomicity across multiple documents. Transactions incur a greater performance cost than single-document writes, so the denormalized data model will continue to be optimal in many cases, and this helps to minimize the need for transactions.

Single-document writes are atomic by design: as long as you are able to embed documents in your collections, you absolutely don’t need to use a transaction. Even so, transaction support is a very good and interesting feature that you can rely on in MongoDB from now on.

MongoDB 4.0 provides full ACID transaction support, but remember:

  • multi-document transactions are available for replica set deployments only
    • you can use transactions even on a standalone server but you need to configure it as a replica set (with just one node)
  • multi-document transactions are not available for sharded clusters
    • hopefully, transactions will be available for sharded clusters from version 4.2
  • multi-document transactions are available for the WiredTiger storage engine only

ACID transactions in MongoDB 4.0

ACID properties are well known in the world of relational databases, but let’s recap what the acronym means.

  • Atomicity: a group of commands inside the transaction must follow the “all or nothing” paradigm. If only one of the commands fails for any reason, the complete transaction fails as well.
  • Consistency: if a transaction successfully executes, it will take the database from one state that is consistent to another state that is also consistent.
  • Isolation: multiple transactions can run at the same time in the system. Isolation guarantees that each transaction is not able to view partial results of the others. Executing multiple transactions in parallel must have the same results as running them sequentially
  • Durability: it guarantees that a transaction that has committed will remain persistent, even in the case of a system failure

Limitations of transactions

The support for transactions introduced some limitations:

  • a collection MUST exist in order to use transactions
  • a collection cannot be created or dropped inside a transaction
  • an index cannot be created or dropped inside a transaction
  • non-CRUD operations are not permitted inside a transaction (for example, administrative commands like createUser are not permitted )
  • a transaction cannot read or write in config, admin, and local databases
  • a transaction cannot write to system.* collections
  • the size of a transaction is limited to 16MB
    • a single oplog entry is generated during the commit: the writes inside the transaction don’t have single oplog entries as in regular queries
    • the limitation is a consequence of the 16MB maximum size of any BSON document in the oplog
    • in case of larger transactions, you should consider splitting these into smaller transactions
  • by default, a transaction that executes for longer than 60 seconds will automatically expire
    • you can change this using the configuration parameter transactionLifetimeLimitSeconds (see the sketch after this list)
    • transactions rely on WiredTiger snapshot capability, and having a long running transaction can result in high pressure on WiredTiger’s cache to maintain snapshots, and lead to the retention of a lot of unflushed operations in memory
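
For example, raising that limit to two minutes can be done with setParameter; this is a sketch, and the value should be tuned to your workload with the cache pressure mentioned above in mind.

> db.adminCommand( { setParameter: 1, transactionLifetimeLimitSeconds: 120 } )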

Sessions

Sessions were introduced in version 3.6, for example to support retryable writes, and they are very important for transactions too. In fact, any transaction is associated with an open session: prior to starting a transaction, a session must be created, and a transaction cannot be run outside a session.

At any given time you may have multiple running sessions in the system, but each session may run only a single transaction at a time. You can run transactions in parallel according to how many open sessions you have.

Three new commands were introduced for creating, committing, and aborting transactions:

  • session.startTransaction()
    • starts a new transaction in the current session
  • session.commitTransaction()
    • saves consistently and durably the changes made by the operations in the transaction
  • session.abortTransaction()
    • the transaction ends without saving any of the changes made by the operations in the transaction

Note: in the following examples, we use two different connections to create two sessions. We do this for the sake of simplicity, but remember that you can create multiple sessions even inside a single connection, assigning each session to a different variable.

Our first transaction

To test our first transaction, if you don’t have a replica set already configured, let’s start a standalone server like this:

#> mongod --dbpath /data/db --logpath /data/mongo.log --fork --replSet foo
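
If this is a brand new single-node deployment, the replica set also needs to be initiated once from the mongo shell before the prompt shows PRIMARY (a minimal sketch, assuming the default configuration):

> rs.initiate()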

Create a new collection, and insert some data.

foo:PRIMARY> use percona
switched to db percona
foo:PRIMARY> db.createCollection('people')
{
   "ok" : 1,
   "operationTime" : Timestamp(1538483120, 1),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1538483120, 1),
      "signature" : {
         "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
         "keyId" : NumberLong(0)
       }
    }
}
foo:PRIMARY> db.people.insert([{_id:1, name:"Corrado"},{_id:2, name:"Peter"},{_id:3,name:"Heidi"}])

Create a session

foo:PRIMARY> session = db.getMongo().startSession()
session { "id" : UUID("dcfa7de5-527d-4b1c-a890-53c9a355920d") }

Start a transaction and insert some new documents

foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.insert([{_id: 4 , name : "George"},{_id: 5, name: "Tom"}])
WriteResult({ "nInserted" : 2 })

Now read the collection from inside and outside the session and see what happens

foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }

As you might notice, since the transaction is not yet committed, you can see the modifications only from inside the session. You cannot see any of the modifications outside of the session, even in the same connection. If you try to open a new connection to the database, then you will not be able to see any of the modifications either.

Now, commit the transaction and see that you can now read the same data both inside and outside the session, as well as from any other connection.

foo:PRIMARY> session.commitTransaction()
foo:PRIMARY> session.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }
foo:PRIMARY> db.people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

When the transaction is committed, all the data is written consistently and durably to the database, just like any typical write. So, writing to the journal file and to the oplog takes place in the same way as for any single write that’s not inside a transaction. As long as the transaction is open, any modification is stored in memory.

Isolation test

Let’s test now the isolation between two concurrent transactions.

Open the first connection, create a session and start a transaction:

//Connection #1
foo:PRIMARY> var session1 = db.getMongo().startSession()
foo:PRIMARY> session1.startTransaction()

do the same on the second connection:

//Connection #2
foo:PRIMARY> var session2 = db.getMongo().startSession()
foo:PRIMARY> session2.startTransaction()

On connection #1, update Heidi’s document, adding the gender field:

//Connection #1
foo:PRIMARY> session1.getDatabase("percona").people.update({_id:3},{$set:{ gender: "F" }})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

Update the same collection on connection #2 to add the same gender field to all the males:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.update({_id:{$in:[1,2,4,5]}},{$set:{ gender: "M" }},{multi:"true"})
WriteResult({ "nMatched" : 4, "nUpserted" : 0, "nModified" : 4 })
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

The two transactions are isolated, each one can see only the ongoing modifications that it has made itself.

Commit the transaction in connection #1:

//Connection #1
foo:PRIMARY> session1.commitTransaction()
foo:PRIMARY> session1.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado" }
{ "_id" : 2, "name" : "Peter" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George" }
{ "_id" : 5, "name" : "Tom" }

In connection #2, read the collection:

//Connection #2
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M"  }
{ "_id" : 3, "name" : "Heidi" }
{ "_id" : 4, "name" : "George", "gender" : "M"  }
{ "_id" : 5, "name" : "Tom", "gender" : "M"  }

As you can see the second transaction still sees its own modifications, and cannot see the already committed updates of the other transaction. This kind of isolation works the same as the “REPEATABLE READ” level of MySQL and other relational databases.

Now commit the transaction in connection #2 and see the new values of the collection:

//Connection #2
foo:PRIMARY> session2.commitTransaction()
foo:PRIMARY> session2.getDatabase("percona").people.find()
{ "_id" : 1, "name" : "Corrado", "gender" : "M" }
{ "_id" : 2, "name" : "Peter", "gender" : "M" }
{ "_id" : 3, "name" : "Heidi", "gender" : "F" }
{ "_id" : 4, "name" : "George", "gender" : "M" }
{ "_id" : 5, "name" : "Tom", "gender" : "M" }

Conflicts

When two (or more) concurrent transactions modify the same documents, we may have a conflict. MongoDB can detect a conflict immediately, even before the transactions are committed. The first transaction to acquire the lock on a document will continue; the second one will receive the conflict error message and fail. The failed transaction can then be retried later.

Let’s see an example.

Create a new transaction in connection #1 to update Heidi’s document. We want to change the name to Luise.

//Connection #1
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Luise"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Let’s try to modify the same document in a concurrent transaction in connection #2. Modify the name from Heidi to Marie in this case.

//Connection #2
foo:PRIMARY> session.startTransaction()
foo:PRIMARY> session.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Marie"}})
WriteCommandError({
    "errorLabels" : [
       "TransientTransactionError"
    ],
    "operationTime" : Timestamp(1538495683, 1),
    "ok" : 0,
    "errmsg" : "WriteConflict",
    "code" : 112,
    "codeName" : "WriteConflict",
    "$clusterTime" : {
       "clusterTime" : Timestamp(1538495683, 1),
       "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
       }
     }
})

We received an error and the transaction failed. We can retry it later.
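
A common way to handle the retry is a loop that re-runs the whole transaction while the error carries the TransientTransactionError label. Below is a minimal mongo shell sketch modeled on the retry pattern suggested in the MongoDB documentation; runTransactionWithRetry is only an illustrative helper name, and a real application would add a retry limit and some back-off.

//Connection #2
function runTransactionWithRetry(txnFunc, session) {
    while (true) {
        try {
            txnFunc(session);   // run the writes and commit
            break;              // the transaction committed, exit the loop
        } catch (error) {
            // Retry only when the server labels the error as transient
            if (error.hasOwnProperty("errorLabels") &&
                error.errorLabels.includes("TransientTransactionError")) {
                print("TransientTransactionError, retrying transaction ...");
                continue;
            }
            throw error;        // any other error is not retried here
        }
    }
}

// Retry the update that failed on connection #2, once the conflicting
// transaction has committed or aborted. If the shell still reports a
// transaction in progress, abort it first with session2.abortTransaction().
runTransactionWithRetry(function(s) {
    s.startTransaction();
    s.getDatabase("percona").people.update({name:"Heidi"},{$set:{name:"Marie"}});
    s.commitTransaction();
}, session2);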

Other details

  • The individual writes inside the transaction are not retryable, even if retryWrites is set to true
  • Each commit operation is a retryable write operation, regardless of whether retryWrites is set to true. The drivers retry the commit a single time in case of an error
  • Read Concern supports the snapshot, local and majority values
  • Write Concern can be set at the transaction level; the individual operations inside the transaction ignore it, and it is evaluated during the commit (see the sketch after this list)
  • Read Preference supports only the primary value
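
As mentioned in the list above, read concern and write concern apply to the transaction as a whole. Here is a minimal sketch of passing them at transaction start in the mongo shell; the values shown are only examples:

//Connection #1
foo:PRIMARY> session1.startTransaction({ readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } })
foo:PRIMARY> session1.getDatabase("percona").people.update({_id:3},{$set:{gender:"F"}})
// the individual update ignores any per-operation write concern;
// {w: "majority"} is evaluated here, at commit time
foo:PRIMARY> session1.commitTransaction()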

Conclusions

Transaction support in MongoDB 4.0 is a very interesting new feature, but it isn't fully mature yet: there are strong limitations at this stage, such as the 16MB size limit for a transaction and the lack of support for sharded clusters. If you absolutely need transactions in your application, use them. But don't use transactions only because they are cool: in some cases a proper data model, based on embedding documents in collections and denormalizing your data, could be the better solution. MongoDB isn't by its nature a relational database; as long as you are able to model your data keeping in mind that it's a NoSQL database, you should avoid using transactions. In specific cases, or if you already have a database with strong "informal relations" between the collections that you cannot change, then you could choose to rely on transactions.

Image modified from original photo: by Annie Spratt on Unsplash

Oct
30
2018
--

Release Candidate for Percona Server 8.0.12-2rc1 Is Available

Percona Server for MySQL

Following the alpha release announced earlier, Percona announces the release candidate of Percona Server for MySQL 8.0.12-2rc1 on October 31, 2018. Download the latest version from the Percona website or from the Percona Software Repositories.

This release is based on MySQL 8.0.12 and includes all the bug fixes in it. It is a Release Candidate quality release and is not intended for production. If you want a high-quality, Generally Available release, use the current stable version (at the time of writing, the most recent stable release in the 5.7 series is 5.7.23-23).

Percona provides completely open-source and free software.

Installation

As this is a release candidate, installation is performed by enabling the testing repository and installing the software via your package manager. For Debian-based distributions see the apt installation instructions, for RPM-based distributions see the yum installation instructions. Note that in both cases, after installing the current percona-release package, you’ll need to enable the testing repository in order to install Percona Server for MySQL 8.0.12-2rc1. For manual installations you can download the packages directly from the testing repository through our website.

New Features

  • #4550: Native Partitioning support for MyRocks storage engine
  • #3911: Native Partitioning support for TokuDB storage engine
  • #4946: Add an option to prevent implicit creation of column family in MyRocks
  • #4839: Better default configuration for MyRocks and TokuDB
  • InnoDB changed page tracking has been rewritten to account for redo logging changes in MySQL 8.0.11. This fixes fast incremental backups for PS 8.0
  • #4434: The TokuDB ROW_FORMAT clause has been removed; compression may be set by using the session variable tokudb_row_format instead.

Improvements

  • Several packaging changes to bring Percona packages more in line with upstream, including split repositories. As you’ll note from our instructions above, we now ship a tool with our release packages to help manage this.

Bugs Fixed

  • #4785: Setting version_suffix to NULL could lead to handle_fatal_signal (sig=11) in Sys_var_version::global_value_ptr
  • #4788: Setting log_slow_verbosity and enabling the slow_query_log could lead to a server crash
  • #4947: Any index comment generated a new column family in MyRocks
  • #1107: Binlog could be corrupted when tmpdir got full
  • #1549: Server side prepared statements lead to a potential off-by-second timestamp on slaves
  • #4937: rocksdb_update_cf_options was useless when specified in my.cnf or on the command line.
  • #4705: The server could crash on snapshot size check in RocksDB
  • #4791: SQL injection on slave due to non-quoting in binlogged ROLLBACK TO SAVEPOINT
  • #4953: rocksdb.truncate_table3 was unstable

Other bugs fixed:

  • #4811: 5.7 Merge and fixup for old DB-937 introduces possible regression
  • #4885: Using ALTER … ROW_FORMAT=TOKUDB_QUICKLZ leads to InnoDB: Assertion failure: ha_innodb.cc:12198:m_form->s->row_type == m_create_info->row_type
  • Numerous testsuite failures/crashes

Upcoming Features

Oct
30
2018
--

20+ MongoDB Alternatives You Should Know About

alternatives to MongoDB

As MongoDB® has changed their license from AGPL to SSPL, many are concerned by this change, and by how sudden it has been. Will SSPL be protective enough for MongoDB, or will the next change be a move to an altogether proprietary license? According to our poll, many are going to explore MongoDB alternatives. This blog post provides a brief outline of technologies to consider.

Open Source Data Stores

  • PostgreSQL is the darling of the open source database community. Especially if your concern is the license, PostgreSQL’s permissive license is hard to beat. PostgreSQL has powerful JSON support, and there are many success stories of migrating from MongoDB to PostgreSQL.
  • Citus – While PostgreSQL is a powerful database, and you can store terabytes of data on a single cluster, at a larger scale you will need sharding. If so, consider the Citus PostgreSQL extension, or the DBaaS offering from the same team.
  • TimescaleDB – If, on the other hand, you are storing time series data in MongoDB, then TimescaleDB might be a good fit.
  • ToroDB – If you would love to use PostgreSQL but need MongoDB wire protocol compatibility, take a look at ToroDB. While it can’t serve as a full drop-in replacement for MongoDB server just yet, the developer told me that with some work it is possible.
  • CockroachDB – While not based on the PostgreSQL codebase, CockroachDB is PostgreSQL wire protocol compatible and natively distributed, so you will not need to shard manually.
  • MySQL® is another feasible replacement. MySQL 5.7 and MySQL 8 have great support for JSON, and it continues to get better with every maintenance release. You can also consider MySQL Cluster for medium-size sharded environments, as well as MariaDB and Percona Server for MySQL.
  • MySQL DocStore is a CRUD interface for JSON data stored in MySQL, and while it is not the same as MongoDB’s query language, it is much easier to transition to than SQL (see the short sketch after this list).
  • Vitess – Would you love to use MySQL but can’t stand manual sharding? Vitess is a powerful sharding engine for MySQL which will allow you to grow to great scale while using proven MySQL as a backend.
  • TiDB is another take on MySQL-compatible sharding. This NewSQL engine is MySQL wire protocol compatible, but underneath it is a distributed database designed from the ground up.
  • CouchDB is a document database which speaks JSON natively.
  • CouchBase is another database engine to consider. While being a document-based database, CouchBase offers the N1QL language, which has an SQL look and feel.
  • ArangoDB is a multi-model database which can be used as a document store.
  • Elastic – While not a perfect choice for every MongoDB workload, ElasticSearch can be a great alternative for workloads where document data is searched and analyzed.
  • Redis is another contender for some MongoDB workloads. Often used as a cache in front of MongoDB, it can also be used as a JSON store through extensions. While such extensions from RedisLabs are no longer open source, the GoodForm project provides open source alternatives.
  • ClickHouse may be a great contender for moving analytical workloads from MongoDB. Much faster, and with JSON support and Nested Data Structures, it can be a great choice for storing and analyzing document data.
  • Cassandra does not have a document data model, but it has proven to be extremely successful for building scalable distributed clusters. If this is your main use case for MongoDB, then you should consider Cassandra.
  • ScyllaDB is a protocol compatible Cassandra alternative which claims to offer much higher per node performance.
  • HBase is another option worth considering, especially if you already have a Hadoop/HDFS infrastructure.
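
For a feel of the MySQL Document Store CRUD interface mentioned above, here is a rough sketch in MySQL Shell’s (mysqlsh) JavaScript mode; the schema and collection names are only examples, and the exact calls should be checked against the X DevAPI documentation for your version:

// Run in MySQL Shell (mysqlsh) in JavaScript mode, connected with a default schema;
// 'db' is the default schema object provided by the shell.
var people = db.createCollection("people");   // documents are stored as JSON inside InnoDB

people.add({ _id: "1", name: "Corrado", gender: "M" }).execute();
people.add({ _id: "3", name: "Heidi" }).execute();

// CRUD-style operations instead of SQL
people.find("gender = 'M'").execute();
people.modify("name = 'Heidi'").set("gender", "F").execute();
people.remove("_id = '1'").execute();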

Public Cloud Document Stores

Most major cloud providers offer some variant of a native document database for you to consider.

  • Microsoft Azure Cosmos DB is an interesting engine that provides multiple NoSQL APIs, including for MongoDB and Cassandra.
  • Amazon DynamoDB supports key value and document based APIs. While not offering MongoDB compatibility, DynamoDB has been around for a long time, and is the most battle tested of the public cloud database offerings.
  • Google Cloud DataStore  – Google Cloud offers a number of data storage options for you to consider, and Cloud DataStore offers a data model and query language that is the most similar to MongoDB.

If you’re not ready for a major migration effort, there is one more solution for you – Percona Server for MongoDB.  Based on MongoDB Community, and enhanced by Percona with Enterprise Features, Percona Server for MongoDB offers 100% compatibility. As we wrote in a previous post, we commit to shipping a supported AGPL version until the situation around SSPL is clearly resolved.

Want help on deciding what is the best option for you, or with migration heavy lifting? Percona Professional Services can help!

Have an idea for another feasible MongoDB alternative? Please comment, and I will consider adding it to the list!


Image by JOSHUA COLEMAN on Unsplash

Oct
26
2018
--

Announcing Keynotes for Percona Live Europe!

Percona Live Keynotes

There’s just over one week to go, so it’s time to announce the keynote addresses for Percona Live Europe 2018! We’re excited to share our lineup of conference keynotes, featuring talks from Paddy Power Betfair, Amazon Web Services, Facebook, PingCAP and more!

The speakers will address the current status of key open source database projects MySQL®, PostgreSQL, MongoDB®, and MariaDB®. They’ll be sharing how organizations are shifting from a single database technology to a polyglot strategy, thereby avoiding vendor lock-in and enabling business growth.

Without further ado, here’s the full keynote line-up for 2018!

Tuesday, November 6

Maximizing the Power and Value of Open Source Databases

Open source database adoption continues to grow in enterprise organizations, as companies look to scale for growth, maintain performance, keep up with changing technologies, control risks and contain costs. In today’s environment, a single database technology or platform is no longer an option, as organizations shift to a best-of-breed, polyglot strategy to avoid vendor lock-in, increase agility and enable business growth. Percona’s CEO Peter Zaitsev shares his perspective.

Following this keynote, there will be a round of lightning talks featuring the latest releases from PostgreSQL, MongoDB and MariaDB.

Technology Lightning Talks

PostgreSQL 11

PostgreSQL benefits from over 20 years of open source development, and has become the preferred open source relational database for developers. PostgreSQL 11 was released on October 18. It provides users with improvements to the overall performance of the database system, with specific enhancements associated with very large databases and high computational workloads.

MongoDB 4.0

Do you love MongoDB? With version 4.0 you have a reason to love it even more! MongoDB 4.0 adds support for multi-document ACID transactions, combining the document model with ACID guarantees. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity. And not only transactions – MongoDB 4.0 has more exciting features like non-blocking secondary reads, improved sharding, security improvements, and more.

MariaDB 10.3

MariaDB benefits from a thriving community of contributors. The latest release, MariaDB 10.3, provides several new features not found anywhere else, as well as back-ported and reimplemented features from MySQL.

Paddy Power Betfair, Percona, and MySQL

This keynote highlights the collaborative journey Paddy Power Betfair and Percona have taken through the adoption of MySQL within the PPB enterprise. The keynote focuses on how Percona has assisted PPB in adopting MySQL, and how PPB has used this partnership to deliver a full DBaaS for a MySQL solution on OpenStack.

Wednesday, November 7

State of the Dolphin

Geir Høydalsvik (Oracle) will talk about the focus, strategy, investments, and innovations evolving MySQL to power next-generation web, mobile, cloud, and embedded applications. He will also discuss the latest and the most significant MySQL database release ever in its history, MySQL 8.0.

Amazon Relational Database Services (RDS)

Amazon RDS is a fully managed database service that allows you to launch an optimally configured, secure, and highly available database with just a few clicks. It manages time-consuming database administration tasks, freeing you to focus on your applications and business. This keynote features the latest news and announcements from RDS, including the launches of Aurora Serverless, Parallel Query, Backtrack, RDS MySQL 8.0, PostgreSQL 10.0, Performance Insights, and several other recent innovations.

TiDB 2.1, MySQL Compatibility, and Multi-Cloud Deployment

This keynote talk from PingCap will provide an architectural overview of TiDB, how and why it’s MySQL compatible, the latest features and improvements in TiDB 2.1 GA release, and how its multi-cloud fully-managed solution works.

MyRocks in the Real World

In this keynote, Yoshinori Matsunobu, Facebook, will share interesting lessons learned from Facebook’s production deployment and operations of MyRocks and future MyRocks development roadmaps. Vadim Tkachenko, Percona’s CTO, will discuss MyRocks in Percona Server for MySQL and share performance benchmark results from on-premise and cloud deployments.

Don’t miss out, buy your tickets here!

Connect. Accelerate. Innovate.

With a lot of focus on the benefits of open source over proprietary models of software delivery, you surely can’t afford to miss this opportunity to connect with leading figures of the open source database world. On Monday, November 5 you can opt to accelerate your knowledge with our in-depth tutorials, or choose to attend our business track geared towards open source innovation and adoption.

On Tuesday and Wednesday, with sessions across 8 different tracks, there’s something for all levels of experience, addressing a range of business challenges. See the full schedule.

With thanks to our sponsors!

Platinum: AWS, Percona
Gold: Facebook
Silver: Altinity, PingCAP, Shannon Systems, OlinData, MySQL
Startup: SeveralNines
Community: PostgreSQL, MariaDB Foundation
Contributing: Intel Optane, Idera, Studio3T

Media Sponsors: Datanami, Enterprise Tech, HPC Wire, ODBMS.org, Database Trends and Applications, Packt

Percona Live Keynotes

 

Oct
25
2018
--

Percona Live Europe 2018: Our Sponsors

Sponsors PLE 2018

Without our sponsors, it would be almost impossible to deliver a conference of the size and format that everyone has come to expect from Percona Live. As well as financial support, our sponsors contribute massively by supporting their teams in presenting at the conference, and by adding to the quality and atmosphere of the event. Having their support means we can present excellent in-depth technical content for the tutorials and talks, and that’s highly valued by conference delegates. This year, too, Amazon Web Services (AWS) sponsors the cloud track on day two, with a superb line-up of cloud content.

Here’s a shout out to our sponsors, you’ll find more information on the Percona Live sponsors page:

Platinum

aws

 

For over 12 years, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud platform. https://aws.amazon.com/

Gold

Facebook offer a fantastic contribution to open source databases with MyRocks, and are greatly appreciated for their ongoing support of Percona Live.
https://www.facebook.com

Silver

Altinity is the leading service provider for ClickHouse.
https://www.altinity.com/

PingCAP
PingCAP is the company and core team building TiDB, a popular open-source MySQL-compatible NewSQL hybrid database.
https://www.pingcap.com/en/

Shannon Systems

Shannon Systems is a global leader in providing enterprise-grade Flash storage devices and system solutions.
http://en.shannon-sys.com/

OlinData

OlinData is an open source infrastructure management company providing services to help companies from small to large with their infrastructure.
https://www.olindata.com/en

MySQL

MySQL is the world’s most popular OS database, delivered by Oracle.
https://www.mysql.com/

Start Up

SeveralNines
SeveralNines provide automation and management software for MySQL, MariaDB and MongoDB clusters.
https://severalnines.com/

Community Sponsors

PostgreSQL
PostgreSQL is a powerful, open source object-relational database system.
https://www.postgresql.org/
MariaDB Foundation

MariaDB Server is one of the most popular database servers in the world.
https://mariadb.org

Branding

Intel OPTANE
Intel is the world’s leading technology company, powering the cloud and billions of smart, connected computing devices.
https://www.intel.com

Idera
IDERA designs powerful software with one goal in mind – to solve customers’ most complex challenges with elegant solutions.
https://www.idera.com/

Studio 3T

Studio 3T is a GUI and IDE for developers and data engineers who work with MongoDB.
https://studio3t.com/

Media

  • datanami online portal for data science, AI and advanced analytics
  • Enterprise Tech online portal addressing high performance computing technologies at scale
  • HPC Wire covering the fastest computers in the world and the people who run them
  • odbms.org a resource portal for big data, new data management technologies, data science and AI
  • Packt online technical publications and videos

Thanks again to all – appreciated!

Sponsors PLE 2018
