Oct 30, 2018

Release Candidate for Percona Server 8.0.12-2rc1 Is Available


Following the alpha release announced earlier, Percona announces the release candidate of Percona Server for MySQL 8.0.12-2rc1 on October 31, 2018. Download the latest version from the Percona website or from the Percona Software Repositories.

This release is based on MySQL 8.0.12 and includes all of its bug fixes. It is a Release Candidate quality release and is not intended for production. If you want a high-quality, Generally Available release, use the current Stable version (the most recent stable release in the 5.7 series at the time of writing is 5.7.23-23).

Percona provides completely open-source and free software.

Installation

As this is a release candidate, installation is performed by enabling the testing repository and installing the software via your package manager. For Debian-based distributions see the apt installation instructions, and for RPM-based distributions see the yum installation instructions. Note that in both cases, after installing the current percona-release package, you’ll need to enable the testing repository in order to install Percona Server for MySQL 8.0.12-2rc1. For manual installations you can download packages from the testing repository directly through our website.
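
On a Debian-based system, enabling the testing repository and installing might look like the sketch below. The repository name ps-80 and the exact package names are assumptions here; check the linked installation instructions for the authoritative steps.

# Sketch only: assumes the percona-release package is already installed,
# and that the Percona Server 8.0 repository is named "ps-80"
$ sudo percona-release enable ps-80 testing
$ sudo apt-get update
$ sudo apt-get install percona-server-server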

New Features

  • #4550: Native Partitioning support for MyRocks storage engine
  • #3911: Native Partitioning support for TokuDB storage engine
  • #4946: Add an option to prevent implicit creation of column family in MyRocks
  • #4839: Better default configuration for MyRocks and TokuDB
  • InnoDB changed page tracking has been rewritten to account for redo logging changes in MySQL 8.0.11.  This fixes fast incremental backups for PS 8.0
  • #4434: The TokuDB ROW_FORMAT clause has been removed; compression may be set using the session variable tokudb_row_format instead, as sketched below.
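
For illustration, setting the session variable might look like this. This is a sketch only, with a hypothetical table name, using one of the existing tokudb_row_format values:

mysql> -- compression for TokuDB tables created in this session
mysql> SET SESSION tokudb_row_format = tokudb_quicklz;
mysql> CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=TokuDB;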

Improvements

  • Several packaging changes to bring Percona packages more in line with upstream, including split repositories. As you’ll note from our instructions above, we now ship a tool with our release packages to help manage this.

Bugs Fixed

  • #4785: Setting version_suffix to NULL could lead to handle_fatal_signal (sig=11) in Sys_var_version::global_value_ptr
  • #4788: Setting log_slow_verbosity and enabling the slow_query_log could lead to a server crash
  • #4947: Any index comment generated a new column family in MyRocks
  • #1107: Binlog could be corrupted when tmpdir got full
  • #1549: Server side prepared statements lead to a potential off-by-second timestamp on slaves
  • #4937: rocksdb_update_cf_options was useless when specified in my.cnf or on the command line.
  • #4705: The server could crash on snapshot size check in RocksDB
  • #4791: SQL injection on slave due to non-quoting in binlogged ROLLBACK TO SAVEPOINT
  • #4953: rocksdb.truncate_table3 was unstable

Other bugs fixed:

  • #4811: 5.7 Merge and fixup for old DB-937 introduces possible regression
  • #4885: Using ALTER … ROW_FORMAT=TOKUDB_QUICKLZ leads to InnoDB: Assertion failure: ha_innodb.cc:12198:m_form->s->row_type == m_create_info->row_type
  • Numerous testsuite failures/crashes

Upcoming Features

Oct 30, 2018

20+ MongoDB Alternatives You Should Know About


As MongoDB® has changed its license from AGPL to SSPL, many are concerned by this change, and by how sudden it has been. Will SSPL be protective enough for MongoDB, or will the next change be a move to an altogether proprietary license? According to our poll, many are going to explore MongoDB alternatives. This blog post provides a brief outline of technologies to consider.

Open Source Data Stores

  • PostgreSQL is the darling of the open source database community. Especially if your concern is the license, PostgreSQL’s permissive license is hard to beat. PostgreSQL has powerful JSON support, and there are many success stories of migrating from MongoDB to PostgreSQL.
  • Citus While PostgreSQL is a powerful database, and you can store terabytes of data on a single cluster, at a larger scale you will need sharding. If so, consider the Citus PostgreSQL extension, or the DBaaS offering from the same team.
  • TimescaleDB If, on the other hand, you are storing time series data in MongoDB, then TimescaleDB might be a good fit.
  • ToroDB If you would love to use PostgreSQL but need MongoDB wire protocol compatibility, take a look at ToroDB. While it can’t serve as a full drop-in replacement for MongoDB server just yet, the developer told me that with some work it is possible.
  • CockroachDB While not based on the PostgreSQL codebase, CockroachDB is PostgreSQL wire protocol compatible and it is natively distributed, so you will not need to do manual sharding.
  • MySQL® is another feasible replacement. MySQL 5.7 and MySQL 8 have great support for JSON (see the sketch after this list), and it continues to get better with every maintenance release. You can also consider MySQL Cluster for medium-size sharded environments, as well as MariaDB and Percona Server for MySQL.
  • MySQL DocStore is a CRUD interface for JSON data stored in MySQL, and while it is not the same as MongoDB’s query language, it is much easier to transition to than SQL.
  • Vitess Would you love to use MySQL but can’t stand manual sharding? Vitess is a powerful sharding engine for MySQL which will allow you to grow to great scale while using proven MySQL as a backend.
  • TiDB is another take on MySQL compatible sharding. This NewSQL engine is MySQL wire protocol compatible but underneath is a distributed database designed from the ground up.
  • CouchDB is a document database which speaks JSON natively.
  • CouchBase is another database engine to consider. While it is a document-based database, CouchBase offers the N1QL language, which has an SQL look and feel.
  • ArangoDB is a multi-model database, which can be used as a document store.
  • Elastic While not a perfect choice for every MongoDB workload, for workloads where document data is searched and analyzed ElasticSearch can be a great alternative.
  • Redis is another contender for some MongoDB workloads. Often used as a cache in front of MongoDB, it can also be used as a JSON store through extensions. While such extensions from RedisLabs are no longer open source, the GoodFORM project provides open source alternatives.
  • ClickHouse may be a great contender for moving analytical workloads from MongoDB. Much faster, and with JSON support and nested data structures, it can be a great choice for storing and analyzing document data.
  • Cassandra does not have a document data model, but it has proven to be extremely successful for building scalable distributed clusters. If this is your main use case for MongoDB, then you should consider Cassandra.
  • ScyllaDB is a protocol compatible Cassandra alternative which claims to offer much higher per node performance.
  • HBase is another option worth considering, especially if you already have a Hadoop/HDFS infrastructure.
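
As a small illustration of the MySQL JSON support mentioned in the list above, here is a sketch with a hypothetical table (not taken from any of the projects listed; the ->> operator requires MySQL 5.7.13 or later):

mysql> CREATE TABLE restaurants (id INT AUTO_INCREMENT PRIMARY KEY, doc JSON);
mysql> INSERT INTO restaurants (doc) VALUES ('{"name": "Morris Park Bake Shop", "borough": "Bronx"}');
mysql> -- ->> extracts a JSON field as an unquoted string
mysql> SELECT doc->>'$.name' FROM restaurants WHERE doc->>'$.borough' = 'Bronx';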

Public Cloud Document Stores

Most major cloud providers offer some variant of a native document database for you to consider.

  • Microsoft Azure Cosmos DB is an interesting engine that provides multiple NoSQL APIs, including for MongoDB and Cassandra.
  • Amazon DynamoDB supports key value and document based APIs. While not offering MongoDB compatibility, DynamoDB has been around for a long time, and is the most battle tested of the public cloud database offerings.
  • Google Cloud DataStore – Google Cloud offers a number of data storage options for you to consider, and Cloud DataStore offers a data model and query language that is the most similar to MongoDB.

If you’re not ready for a major migration effort, there is one more solution for you – Percona Server for MongoDB.  Based on MongoDB Community, and enhanced by Percona with Enterprise Features, Percona Server for MongoDB offers 100% compatibility. As we wrote in a previous post, we commit to shipping a supported AGPL version until the situation around SSPL is clearly resolved.

Want help on deciding what is the best option for you, or with migration heavy lifting? Percona Professional Services can help!

Have an idea for another feasible MongoDB alternative? Please comment, and I will consider adding it to the list!


Image by JOSHUA COLEMAN on Unsplash

Oct 26, 2018

Announcing Keynotes for Percona Live Europe!


There’s just over one week to go, so it’s time to announce the keynote addresses for Percona Live Europe 2018! We’re excited to share our lineup of conference keynotes, featuring talks from Paddy Power Betfair, Amazon Web Services, Facebook, PingCAP and more!

The speakers will address the current status of key open source database projects MySQL®, PostgreSQL, MongoDB®, and MariaDB®. They’ll be sharing with you how organizations are shifting from a single database technology to a polyglot strategy, thereby avoiding vendor lock-in and enabling business growth.

Without further ado, here’s the full keynote line-up for 2018!

Tuesday, November 6

Maximizing the Power and Value of Open Source Databases

Open source database adoption continues to grow in enterprise organizations, as companies look to scale for growth, maintain performance, keep up with changing technologies, control risks and contain costs. In today’s environment, a single database technology or platform is no longer an option, as organizations shift to a best-of-breed, polyglot strategy to avoid vendor lock-in, increase agility and enable business growth. Percona’s CEO Peter Zaitsev shares his perspective.

Following this keynote, there will be a round of lightning talks featuring the latest releases from PostgreSQL, MongoDB and MariaDB.

Technology Lightning Talks

PostgreSQL 11

PostgreSQL benefits from over 20 years of open source development, and has become the preferred open source relational database for developers. PostgreSQL 11 was released on October 18. It provides users with improvements to the overall performance of the database system, with specific enhancements associated with very large databases and high computational workloads.

MongoDB 4.0

Do you love MongoDB? With version 4.0 you have a reason to love it even more! MongoDB 4.0 adds support for multi-document ACID transactions, combining the document model with ACID guarantees. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity. And not only transactions – MongoDB 4.0 has more exciting features like non-blocking secondary reads, improved sharding, security improvements, and more.
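
As a minimal sketch of what the new transactions API looks like in the mongo shell (the database and collection names here are hypothetical, and transactions require a replica set deployment):

MongoDB > var session = db.getMongo().startSession()
MongoDB > session.startTransaction()
MongoDB > var orders = session.getDatabase("shop").orders // collection bound to the session
MongoDB > orders.insertOne({ item: "abc", qty: 1 })       // executed inside the transaction
MongoDB > session.commitTransaction()                     // all-or-nothing
MongoDB > session.endSession()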

MariaDB 10.3

MariaDB benefits from a thriving community of contributors. The latest release, MariaDB 10.3, provides several new features not found anywhere else, as well as back-ported and reimplemented features from MySQL.

Paddy Power Betfair, Percona, and MySQL

This keynote highlights the collaborative journey Paddy Power Betfair and Percona have taken through the adoption of MySQL within the PPB enterprise. The keynote focuses on how Percona has assisted PPB in adopting MySQL, and how PPB has used this partnership to deliver a full DBaaS for a MySQL solution on OpenStack.

Wednesday, November 7

State of the Dolphin

Geir Høydalsvik (Oracle) will talk about the focus, strategy, investments, and innovations evolving MySQL to power next-generation web, mobile, cloud, and embedded applications. He will also discuss the latest and the most significant MySQL database release ever in its history, MySQL 8.0.

Amazon Relational Database Services (RDS)

Amazon RDS is a fully managed database service that allows you to launch an optimally configured, secure, and highly available database with just a few clicks. It manages time-consuming database administration tasks, freeing you to focus on your applications and business. This keynote features the latest news and announcements from RDS, including the launches of Aurora Serverless, Parallel Query, Backtrack, RDS MySQL 8.0, PostgreSQL 10.0, Performance Insights, and several other recent innovations.

TiDB 2.1, MySQL Compatibility, and Multi-Cloud Deployment

This keynote talk from PingCap will provide an architectural overview of TiDB, how and why it’s MySQL compatible, the latest features and improvements in TiDB 2.1 GA release, and how its multi-cloud fully-managed solution works.

MyRocks in the Real World

In this keynote, Yoshinori Matsunobu, Facebook, will share interesting lessons learned from Facebook’s production deployment and operations of MyRocks and future MyRocks development roadmaps. Vadim Tkachenko, Percona’s CTO, will discuss MyRocks in Percona Server for MySQL and share performance benchmark results from on-premise and cloud deployments.

Don’t miss out, buy your tickets here!

Connect. Accelerate. Innovate.

With a lot of focus on the benefits of open source over proprietary models of software delivery, you surely can’t afford to miss this opportunity to connect with leading figures of the open source database world. On Monday, November 5 you can opt to accelerate your knowledge with our in-depth tutorials, or choose to attend our business track geared towards open source innovation and adoption.

On Tuesday and Wednesday, with sessions across 8 different tracks, there’s something for all levels of experience, addressing a range of business challenges. See the full schedule.

With thanks to our sponsors!

Platinum: AWS, Percona
Gold: Facebook
Silver: Altinity, PingCAP, Shannon Systems, OlinData, MySQL
Startup: Severalnines
Community: PostgreSQL, MariaDB Foundation
Contributing: Intel Optane, Idera, Studio3T

Media Sponsors: Datanami, Enterprise Tech, HPC Wire, ODBMS.org, Database Trends and Applications, Packt


 

Oct 25, 2018

Percona Live Europe 2018: Our Sponsors


Without our sponsors, it would be almost out of reach to deliver a conference of the size and format that everyone has come to expect from Percona Live. As well as financial support, our sponsors contribute massively by supporting their teams in presenting at the conference, and by adding to the quality and atmosphere of the event. Having their support means we can present excellent in-depth technical content for the tutorials and talks, and that’s highly valued by conference delegates. This year, too, Amazon Web Services (AWS) sponsors the cloud track on day two, with a superb line-up of cloud content.

Here’s a shout out to our sponsors; you’ll find more information on the Percona Live sponsors page:

Platinum

AWS

 

For over 12 years, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud platform. https://aws.amazon.com/

Gold

Facebook
Facebook offers a fantastic contribution to open source databases with MyRocks, and is greatly appreciated for its ongoing support of Percona Live.
https://www.facebook.com

Silver

Altinity
Altinity is the leading service provider for ClickHouse.
https://www.altinity.com/

PingCAP
PingCAP is the company and core team building TiDB, a popular open-source MySQL-compatible NewSQL hybrid database.
https://www.pingcap.com/en/

Shannon Systems

Shannon Systems is a global leader in providing enterprise-grade Flash storage devices and system solutions.
http://en.shannon-sys.com/

OlinData

OlinData is an open source infrastructure management company providing services to help companies from small to large with their infrastructure.
https://www.olindata.com/en

MySQL

MySQL is the world’s most popular open source database, delivered by Oracle.
https://www.mysql.com/

Start Up

SeveralNines
SeveralNines provides automation and management software for MySQL, MariaDB and MongoDB clusters.
https://severalnines.com/

Community Sponsors

PostgreSQL
PostgreSQL is a powerful, open source object-relational database system.
https://www.postgresql.org/
MariaDB Foundation

MariaDB Server is one of the most popular database servers in the world.
https://mariadb.org

Branding

Intel OPTANE
Intel is the world’s leading technology company, powering the cloud and billions of smart, connected computing devices.
https://www.intel.com

Idera
IDERA designs powerful software with one goal in mind – to solve customers’ most complex challenges with elegant solutions.
https://www.idera.com/

Studio 3T

Studio 3T is a GUI and IDE for developers and data engineers who work with MongoDB.
https://studio3t.com/

Media

  • datanami online portal for data science, AI and advanced analytics
  • Enterprise Tech online portal addressing high performance computing technologies at scale
  • HPC Wire covering the fastest computers in the world and the people who run them
  • odbms.org a resource portal for big data, new data management technologies, data science and AI
  • Packt online technical publications and videos

Thanks again to all – appreciated!


Oct 22, 2018

Percona Live Europe Presents … In Their Own Words


For those who are looking forward to Percona Live Europe in just two weeks’ time, and for those yet to make up their minds, some of our presenters have shared some insight into their talks and what they’re most looking forward to themselves. Make no mistake, this is one of the most exciting events in the conference calendar for those of us who work with open source databases.

This year, our conference previews are being hosted over on the Percona community blog and the posts have been written by the presenters.

Percona Live Europe presents…

Here are the first six posts in this series of Percona Live Europe presents. There are more to come, so do come back over the next few days to see if any of the writers can help you pinpoint the talks that you are most interested in attending this year:

  • Dinesh Joshi will be taking a look at boosting Apache Cassandra’s performance using Netty
  • Federico Razzoli on why he’s investigating MariaDB system versioned tables
  • Jaime Crespo of Wikimedia Foundation will be presenting an entry-level (but detailed) tutorial on query optimization, and a breakout talk on TLS security; you can find out more in his blog post
  • Tiago Jorge of Oracle on his talk about MySQL 8.0 replication
  • There’s going to be an ElasticSearch 101 tutorial presented by three of the team from ObjectRocket—Antonios Giannopoulos tells you more about that stellar opportunity—while last but by no means least…
  • Arjen Lentz, new CEO of MariaDB Foundation, is keen to share with you the latest information on MariaDB 10.3

Tantalized? Keep meaning to book your seat? There’s not long left now, so head straight to the registration page and book your place. Percona Live Europe will be in Frankfurt from November 5-7, 2018.

About the community blog

We’re really pleased that the community blog is gaining some great support. It offers a platform for all to write on the general topic of open source databases, commercial and non-commercial alike: those who are already prolific bloggers, and those who may only want to write a blog or two on a topic that they feel strongly about. If you’d like to join us and write for the community blog, please get in touch! You can email me.

Oct 3, 2018

Percona Live Europe 2018 Session Programme Published


Offering over 110 conference sessions on Tuesday 6 and Wednesday 7 November, plus a full tutorial day on Monday 5 November, we hope you’ll find this fantastic line-up of talks for Percona Live Europe 2018 to be one of our best yet! Innovation in technology continues to arrive at an accelerated rate, and you’ll find plenty to help you connect with the latest developments in open source database technologies at this acclaimed annual event.

Representatives from companies at the leading edge of our industry use the platform offered by Percona Live to showcase their latest developments and share plans for the future. If your career is dependent upon the use of open source database technologies you should not miss this conference!

Conference Session Schedule

Conference sessions will take place on Tuesday and Wednesday, November 6-7 and will feature more than 110 in-depth talks by industry experts. Conference session examples include:

  • Deep Dive on MySQL Databases on Amazon RDS – Chayan Biswas, AWS
  • MySQL 8.0 Performance: Scalability & Benchmarks – Dimitri Kravtchuk, Oracle
  • MySQL 8 New Features: Temptable engine – Pep Pla, Pythian
  • Artificial Intelligence Database Performance Tuning – Roel Van de Paar, Percona
  • pg_chameleon – MySQL to PostgreSQL replica made easy – Federico Campoli, Loxodata
  • Highway to Hell or Stairway to Cloud? – Alexander Kukushkin, Zalando
  • Zero to Serverless in 60 Seconds – Sandor Maurice, AWS
  • A Year in Google Cloud – Carmen Mason, Alan Mason, Vital Source Technologies
  • Advanced MySQL Data at Rest Encryption in Percona Server for MySQL – Iwo Panowicz, Percona, and Bartłomiej Oleś, Severalnines
  • Monitoring Kubernetes with Prometheus – Henri Dubois-Ferriere, Sysdig
  • How We Use and Improve Percona XtraBackup at Alibaba Cloud – Bo Wang, Alibaba Cloud
  • Shard 101 – Adamo Tonete, Percona
  • Top 10 Mistakes When Migrating From Oracle to PostgreSQL – Jim Mlodgenski, AWS
  • Explaining the Postgres Query Optimizer – Bruce Momjian, EnterpriseDB
  • MariaDB 10.3 Optimizer and Beyond – Vicentiu Ciorbaru, MariaDB Foundation
  • HA and Clustering Solution: ProxySQL as an Intelligent Router for Galera and Group Replication – René Cannaò, ProxySQL
  • MongoDB WiredTiger WriteConflicts – Paul Agombin, ObjectRocket
  • PostgreSQL Enterprise Features – Michael Banck, credativ GmbH
  • What’s New in MySQL 8.0 Security – Georgi Kodinov, Oracle
  • The MariaDB Foundation and Security – Finding and Fixing Vulnerabilities the Open Source Way – Otto Kekäläinen, MariaDB Foundation
  • ClickHouse 2018: How to Stop Waiting for Your Queries to Complete and Start Having Fun – Alexander Zaitsev, Altinity
  • Open Source Databases and Non-Volatile Memory – Frank Ober, Intel Memory Group
  • MyRocks Production Case Studies at Facebook – Yoshinori Matsunobu, Facebook
  • Need for Speed: Boosting Apache Cassandra’s Performance Using Netty – Dinesh Joshi, Apache Cassandra
  • Demystifying MySQL Replication Crash Safety – Jean-François Gagné, Messagebird

See the full list of sessions

Tutorial schedule

Tutorials will take place throughout the day on Monday, November 5, 2018. Tutorial session examples include:

  • Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics – Jaime Crespo, Wikimedia Foundation
  • ElasticSearch 101 – Antonios Giannopoulos, ObjectRocket
  • MySQL InnoDB Cluster in a Nutshell: The Saga Continues with 8.0 – Frédéric Descamps, Oracle
  • High Availability PostgreSQL and Kubernetes with Google Cloud – Alexis Guajardo, Google
  • Best Practices for High Availability – Alex Rubin and Alex Poritskiy, Percona

See the full list of tutorials.

Sponsors

We are grateful for the support of our sponsors:

  • Platinum – AWS
  • Silver – Altinity, PingCAP
  • Start Up – Severalnines
  • Branding – Intel, Idera
  • Expo – Postgres EU

If you would like to join them, sponsorship opportunities for Percona Live Open Source Database Conference Europe 2018 are available, offering the chance to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

Ready to register? What are you waiting for? Costs will only get higher!
Register now!

 

 

Sep 27, 2018

Announcement: Alpha Build of Percona Server 8.0


Alpha Build of Percona Server 8.0 Released

An alpha version of Percona Server 8.0 is now available in the Percona experimental software repositories. This is a 64-bit release only. 

You may experiment with this alpha release by running it in a Docker container:

$ docker run -d -e MYSQL_ROOT_PASSWORD=password -p 3306:3306 perconalab/percona-server:8.0.12.alpha

When the container starts, connect to it as follows:

$ docker exec -ti $(docker ps | grep -F percona-server:8.0.12.alpha | awk '{print $1}') mysql -uroot -ppassword

Note that this release is not ready for use in any production environment.

Percona Server 8.0 alpha is available for the following platforms:

  • RHEL/CentOS 6.x
  • RHEL/CentOS 7.x
  • Ubuntu 16.04 Xenial
  • Ubuntu 18.04 Bionic
  • Debian 8 Jessie
  • Debian 9 Stretch

Note: The list of supported platforms may be different in the GA release.

Fixed Bugs:

  • PS-4814: TokuDB ‘fast’ replace into is incompatible with 8.0 row replication
  • PS-4834: The encrypted system tablespace has empty uuid

Other fixed bugs: PS-4788, PS-4631, PS-4736, PS-4818, PS-4755

Unfinished Features

The following features are work in progress and are not yet in a working state:

  • Column compression with Data Dictionaries
  • Native Partitioning for TokuDB and for MyRocks
  • Encryption
    • Key Rotation
    • Scrubbing

Known Issues

  • PS-4788: Setting log_slow_verbosity and enabling the slow_query_log could lead to a server crash
  • PS-4803: ALTER TABLE … ADD INDEX … LOCK crash | handle_fatal_signal (sig=11) in dd_table_has_instant_cols
  • PS-4896: handle_fatal_signal (sig=11) in THD::thread_id likely due to enabling innodb_print_lock_wait_timeout_info
  • PS-4820: PS crashes with keyring_vault encryption
  • PS-4796: 8.0 DD and atomic DDL breaks DROP DATABASE for engines that store files in database directory
  • PS-4898: Crash during PAM authentication plugin installation.
  • PS-1782: Optimizer chooses wrong plan when joining 2 tables
  • PS-4850: Toku hot backup plugin dumps tons of info to stdout with no way to disable it
  • PS-4797: rpl.rpl_master_errors failing, likely due to binlog encryption
  • PS-4800: Recovery of prepared XA transactions seems broken in 8.0
  • PS-4853: Installing audit_log plugin causes server to crash
  • PS-4855: Replace http with https in http://bugs.percona.com in server crash messages
  • PS-4857: Improve error message handling for compressed columns
  • PS-4895: Improve error message when encrypted system tablespace was started without keyring plugin
  • PS-3944: Single variable to control logging in QRT
  • PS-4705: crash on snapshot size check in RocksDB
  • PS-4885: Using ALTER … ROW_FORMAT=TOKUDB_QUICKLZ leads to InnoDB: Assertion failure: ha_innodb.cc:12198:m_form->s->row_type == m_create_info->row_type


Sep 21, 2018

This Week in Data with Colin Charles 53: It’s MariaDB Week PLUS Percona Live Europe Update


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

This week is clearly what I’d call a “MariaDB week” — plenty of announcements coming from MariaDB Corporation and MariaDB Foundation.

It started with Alibaba Cloud and MariaDB Announce the Launch of ApsaraDB RDS for MariaDB TX, which makes Alibaba Cloud the first public cloud to offer the enterprise offering of MariaDB, MariaDB TX 3.0. As of this announcement it is not yet available to roll out from the interface, but I expect it will be soon. Exciting, as you can already get MariaDB Server on Amazon RDS for MariaDB, and you can join the waitlist preview for Azure.

MariaDB Corporation has received more funding from ServiceNow Ventures in the Series C round, and has gained a new board member in Pat Casey. ServiceNow is a user of MariaDB, and “ServiceNow’s platform runs on up to 85,000 MariaDB databases that serve more than 25 billion queries per hour.” There was an excellent keynote session at M|18 about how ServiceNow uses MariaDB. The Register refers to this as “protecting ServiceNow’s toolchain”.

For good measure, MariaDB acquired Clustrix as well. This is the second acquisition after MammothDB earlier in the year. It is worth reading the TechCrunch take on this. Clustrix, a Y Combinator company, has been around since 2006 and raised $72 million. The price of the acquisition was not announced. For a bit of behind the scenes chatter from ex-employee shareholders, Hacker News delivers.

From a MariaDB Foundation standpoint, we see Otto Kekäläinen, the MariaDB Foundation CEO, stepping down. Thanks for all your hard work, Otto! And maybe you missed it, but not long ago, Percona Became a Bronze Sponsor of MariaDB Foundation.

Speaking of conferences, the tutorial schedule and a sneak peek of sessions for Percona Live Europe Frankfurt have been announced. In addition, the Call for Papers – 2019 Annual MariaDB User Conference closes October 31, 2018.

Releases

Link List

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 


Sep 6, 2018

MongoDB: Investigate Queries with explain() and Index Usage (part 2)


This is the second part of a two-part series. In MongoDB: index usage and MongoDB explain() we introduced the main index types supported by MongoDB, and how to create and use them. In this second article, we are going to see some examples of how to use the explain() method to investigate queries. Do you need to optimize MongoDB queries? You’ll see how to use explain() to find out how your query uses indexes. Or, perhaps, that it doesn’t!

What is explain()

explain() is a method that you can apply to simple queries or to cursors to investigate the query execution plan. The execution plan is how MongoDB resolves a query. Looking at all the information returned by explain() we can find out things like:

  • how many documents were scanned
  • how many documents were returned
  • which index was used
  • how long the query took to be executed
  • which alternative execution plans were evaluated

…and other useful information.

The aim of using explain() is to find out how to improve the query: for example, by creating missing indexes, or by rewriting it in order to make better use of the indexes that already exist. If you are familiar with the EXPLAIN command in MySQL, the goals of MongoDB’s explain() method are exactly the same.

Explainable object

You can apply the explain() method to a query or a cursor in the following way, as per any other method:

MongoDB > db.mycollection.find().explain()

However, the preferred way to investigate queries in the mongo shell is first to create the explainable object.

We can create an explainable object like this:

MongoDB > var myexp = db.mycollection.explain()

Once you have created the explainable object, then any kind of operation can be run against it to investigate a query or cursor execution plan. For example:

MongoDB > myexp.find()
MongoDB > myexp.update()
MongoDB > myexp.remove()
MongoDB > myexp.aggregate()

Restaurants test database

To see some examples we need a collection with some data.
For our purposes we can use the New York restaurants database. You can download this from the following URL:

https://www.w3resource.com/mongodb-exercises/retaurants.zip

Unzip the archive, and import the JSON file into MongoDB:

$ unzip retaurants.zip
$ mongoimport --host 127.0.0.1 -d test -c restaurants --file retaurants.json

This collection has 3772 documents, all the restaurants in New York City. Here is a document sample.

MongoDB > use test
switched to db test
MongoDB > db.restaurants.find().pretty().limit(1)
{
	"_id" : ObjectId("5b71b3281979e24aa18c0121"),
	"address" : {
		"building" : "1007",
		"coord" : [
			-73.856077,
			40.848447
		],
		"street" : "Morris Park Ave",
		"zipcode" : "10462"
	},
	"borough" : "Bronx",
	"cuisine" : "Bakery",
	"grades" : [
		{
			"date" : ISODate("2014-03-03T00:00:00Z"),
			"grade" : "A",
			"score" : 2
		},
		{
			"date" : ISODate("2013-09-11T00:00:00Z"),
			"grade" : "A",
			"score" : 6
		},
		{
			"date" : ISODate("2013-01-24T00:00:00Z"),
			"grade" : "A",
			"score" : 10
		},
		{
			"date" : ISODate("2011-11-23T00:00:00Z"),
			"grade" : "A",
			"score" : 9
		},
		{
			"date" : ISODate("2011-03-10T00:00:00Z"),
			"grade" : "B",
			"score" : 14
		}
	],
	"name" : "Morris Park Bake Shop",
	"restaurant_id" : "30075445"
}

Explain() verbosity

The explain() method has three verbosity modes.

  • queryPlanner – this is the default mode. At this level, explain provides information about the winning plan, including the index used or if a collection scan is needed (COLLSCAN)
  • executionStats – this mode includes all the information provided by the queryPlanner, plus the statistics. Statistics include details such as the number of documents examined and returned, the execution time in milliseconds, and so on.
  • allPlansExecution – this mode includes all the information provided by the executionStats plus information about the discarded execution plans

We’ll see the explain() output in the following examples.

Example 1

It’s time to use the restaurants collection to run our first example: find out all the restaurants in the Manhattan borough.

First, let’s create the explainable object with the executionStats mode.

MongoDB > var exp = db.restaurants.explain("executionStats")

Then let’s investigate the query.

MongoDB > exp.find( { borough: "Manhattan"} )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"borough" : {
				"$eq" : "Manhattan"
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"borough" : {
					"$eq" : "Manhattan"
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1883,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 3772,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"borough" : {
					"$eq" : "Manhattan"
				}
			},
			"nReturned" : 1883,
			"executionTimeMillisEstimate" : 0,
			"works" : 3774,
			"advanced" : 1883,
			"needTime" : 1890,
			"needYield" : 0,
			"saveState" : 29,
			"restoreState" : 29,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 3772
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Here we can see the output of explain(). First of all, we can clearly distinguish the “queryPlanner” and the “executionStats” sections. We won’t describe every one of the values, as some are really intuitive. Let’s have a look at just some of them:

queryPlanner.winningPlan.stage = “COLLSCAN”
This provides very important information about the winning plan: it means that MongoDB needs to do a collection scan. The query is not optimized, because all the documents must be read.

queryPlanner.winningPlan.rejectedPlans = []
It’s empty: there are no rejected plans. When the query needs to be executed with COLLSCAN, the only execution plan is the winning plan. We don’t have any indexes in the collection, apart from the one on _id, so there are no other execution plans.

executionStats.nReturned = 1883
The number of documents returned is 1883, the number of restaurants located in Manhattan.

executionStats.totalDocsExamined = 3772
The number of documents examined is exactly the number of documents in the collection. This was expected, because the query uses COLLSCAN.

executionStats.executionTimeMillis = 1
The execution time of the query. It’s just 1 millisecond. This might seem good, but remember that this is the time needed to scan just 3772 documents, a very small test collection. Think about what this time could be in the case of a collection with millions of documents!

How can we improve the query?

In this case it’s simple. Let’s try to create a single field index on borough, the only condition we have in the find(). Then let’s try to explain the same query again.

MongoDB > db.restaurants.createIndex( {borough: 1} )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
MongoDB > exp.find( { borough: "Manhattan"} )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"borough" : {
				"$eq" : "Manhattan"
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"borough" : 1
				},
				"indexName" : "borough_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"borough" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"borough" : [
						"[\"Manhattan\", \"Manhattan\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1883,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 1883,
		"totalDocsExamined" : 1883,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 1883,
			"executionTimeMillisEstimate" : 0,
			"works" : 1884,
			"advanced" : 1883,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 14,
			"restoreState" : 14,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 1883,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 1883,
				"executionTimeMillisEstimate" : 0,
				"works" : 1884,
				"advanced" : 1883,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 14,
				"restoreState" : 14,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"borough" : 1
				},
				"indexName" : "borough_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"borough" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"borough" : [
						"[\"Manhattan\", \"Manhattan\"]"
					]
				},
				"keysExamined" : 1883,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Interpreting the output

Now, the output is completely different. Let’s have a look at some of the most relevant values.

queryPlanner.winningPlan.inputStage.stage = “IXSCAN”
This is very important. IXSCAN means that MongoDB no longer needs to do a collection scan: an index can be used to find the documents.

queryPlanner.winningPlan.inputStage.indexName = “borough_1”
The name of the index used. This is the default name of an index: the name of the field plus _1 for ascending or _-1 for descending order.

queryPlanner.winningPlan.inputStage.direction = “forward”
MongoDB traverses the index in a forward direction.

executionStats.nReturned = 1883
The number of documents returned. Obviously this is the same as before.

executionStats.totalKeysExamined = 1883
The number of keys examined in the index.

executionStats.totalDocsExamined = 1883
Now the number of documents examined corresponds to the number of elements examined in the index.

We have optimized the query.

Example 2

Now we would like to examine a query that finds all the restaurants with Italian cuisine that received a grade score greater than 50.

MongoDB > var exp = db.restaurants.explain()
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"cuisine" : {
							"$eq" : "Italian"
						}
					},
					{
						"grades.score" : {
							"$gt" : 50
						}
					}
				]
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

We have a COLLSCAN again. Let’s try to improve the query by creating an index on the cuisine field.

MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > db.restaurants.createIndex({cuisine:1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 3,
	"ok" : 1
}
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"grades.score" : {
					"$gt" : 50
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 6,
		"executionTimeMillis" : 4,
		"totalKeysExamined" : 325,
		"totalDocsExamined" : 325,
		"executionStages" : {
			"stage" : "FETCH",
			"filter" : {
				"grades.score" : {
					"$gt" : 50
				}
			},
			"nReturned" : 6,
			"executionTimeMillisEstimate" : 0,
			"works" : 326,
			"advanced" : 6,
			"needTime" : 319,
			"needYield" : 0,
			"saveState" : 2,
			"restoreState" : 2,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 325,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 325,
				"executionTimeMillisEstimate" : 0,
				"works" : 326,
				"advanced" : 325,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 2,
				"restoreState" : 2,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					]
				},
				"keysExamined" : 325,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

The query has improved. The created index cuisine_1 is used, but we still have 325 documents examined with only 6 documents returned. Let’s see if we can do better by creating a compound index instead, using both of the fields in the condition: cuisine and grades.score.

MongoDB > db.restaurants.createIndex({cuisine:1, "grades.score":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 3,
	"numIndexesAfter" : 4,
	"ok" : 1
}
MongoDB > exp.find({$and: [  {"cuisine" : {$eq :"Italian"}},  {"grades.score" : {$gt : 50}}  ]  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.restaurants",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"cuisine" : {
						"$eq" : "Italian"
					}
				},
				{
					"grades.score" : {
						"$gt" : 50
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1,
					"grades.score" : 1
				},
				"indexName" : "cuisine_1_grades.score_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"grades.score" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					],
					"grades.score" : [
						"(50.0, inf.0]"
					]
				}
			}
		},
		"rejectedPlans" : [
			{
				"stage" : "FETCH",
				"filter" : {
					"grades.score" : {
						"$gt" : 50
					}
				},
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"cuisine" : 1
					},
					"indexName" : "cuisine_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"cuisine" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"cuisine" : [
							"[\"Italian\", \"Italian\"]"
						]
					}
				}
			}
		]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 6,
		"executionTimeMillis" : 1,
		"totalKeysExamined" : 7,
		"totalDocsExamined" : 6,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 6,
			"executionTimeMillisEstimate" : 0,
			"works" : 9,
			"advanced" : 6,
			"needTime" : 1,
			"needYield" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 6,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 6,
				"executionTimeMillisEstimate" : 0,
				"works" : 8,
				"advanced" : 6,
				"needTime" : 1,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"cuisine" : 1,
					"grades.score" : 1
				},
				"indexName" : "cuisine_1_grades.score_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"grades.score" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"cuisine" : [
						"[\"Italian\", \"Italian\"]"
					],
					"grades.score" : [
						"(50.0, inf.0]"
					]
				},
				"keysExamined" : 7,
				"seeks" : 1,
				"dupsTested" : 7,
				"dupsDropped" : 1,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Now, the winning plan uses the new compound index cuisine_1_grades.score_1 and we have only 6 documents examined. Please note also that we now have a rejected plan: the one that uses the single field index cuisine_1 created previously.

We have optimized the query.

Example 3

Let’s find out all the restaurants in Brooklyn that don’t serve “American” cuisine and achieved a grade of ‘A’. We want to see the results ordered by cuisine, descending.

At this point you should be a little familiar with the explain() output, so in the next box we truncate it for the sake of simplicity, leaving only the relevant parts.

MongoDB > var exp = db.restaurants.explain("executionStats")
MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"cuisine":-1})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"borough" : {
							"$eq" : "Brooklyn"
						}
					},
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1
				},
				"indexName" : "cuisine_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"cuisine" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"cuisine" : [
						"[MaxKey, \"American \")",
						"(\"American \", MinKey]"
					]
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 9,
		"totalKeysExamined" : 2518,
		"totalDocsExamined" : 2517,
...
...
	"serverInfo" : {
		"host" : "Admins-MBP",
		"port" : 27017,
		"version" : "3.6.4",
		"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
	},
	"ok" : 1
}

Looking at the winning plan, we see that we have an IXSCAN on the index cuisine_1. The important thing to notice here is the choice of the index on the cuisine field: it was chosen because we used sort({cuisine:-1}). There is no SORT stage, because the documents are already extracted using the index, and so they are already sorted. Notice, too, direction:”backward”; this is because we specified descending order in the query. If we try to execute a slightly different query, changing the sorting to name:1 instead of cuisine:-1, we’ll see a completely different winning plan.

MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"name":1})
...
		"winningPlan" : {
			"stage" : "SORT",
			"sortPattern" : {
				"name" : -1
			},
			"inputStage" : {
				"stage" : "SORT_KEY_GENERATOR",
				"inputStage" : {
					"stage" : "FETCH",
					"filter" : {
						"$and" : [
							{
								"grades.grade" : {
									"$eq" : "A"
								}
							},
							{
								"$nor" : [
									{
										"cuisine" : {
											"$eq" : "American "
										}
									}
								]
							}
						]
					},
					"inputStage" : {
						"stage" : "IXSCAN",
						"keyPattern" : {
							"borough" : 1
						},
						"indexName" : "borough_1",
						"isMultiKey" : false,
						"multiKeyPaths" : {
							"borough" : [ ]
						},
						"isUnique" : false,
						"isSparse" : false,
						"isPartial" : false,
						"indexVersion" : 2,
						"direction" : "forward",
						"indexBounds" : {
							"borough" : [
								"[\"Brooklyn\", \"Brooklyn\"]"
							]
						}
					}
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 13,
		"totalKeysExamined" : 684,
		"totalDocsExamined" : 684,
...
...

In this case, we have fewer documents examined, but since the cuisine_1 index cannot be used, a SORT stage is needed, and the index used to fetch the documents is borough_1. While MongoDB has examined fewer documents, the execution time is worse because of the extra stage used to sort the documents.

Let’s return now to the original query. We can also notice that the number of documents examined is still too high (2517) compared to the documents returned (493). That’s not optimal. Let’s see if we can further improve the query by adding another compound index on (cuisine, borough, grades.grade).

MongoDB > db.restaurants.createIndex({cuisine:1,borough:1,"grades.grade":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 4,
	"numIndexesAfter" : 5,
	"ok" : 1
}
MongoDB > exp.find( {"cuisine" : {$ne : "American "},
... "grades.grade" :"A",
... "borough": "Brooklyn"}).sort({"cuisine":-1})
...
...		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$nor" : [
					{
						"cuisine" : {
							"$eq" : "American "
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"cuisine" : 1,
					"borough" : 1,
					"grades.grade" : 1
				},
				"indexName" : "cuisine_1_borough_1_grades.grade_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"cuisine" : [ ],
					"borough" : [ ],
					"grades.grade" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "backward",
				"indexBounds" : {
					"cuisine" : [
						"[MaxKey, \"American \")",
						"(\"American \", MinKey]"
					],
					"borough" : [
						"[\"Brooklyn\", \"Brooklyn\"]"
					],
					"grades.grade" : [
						"[\"A\", \"A\"]"
					]
				}
			}
		},
...
...
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 493,
		"executionTimeMillis" : 6,
		"totalKeysExamined" : 591,
		"totalDocsExamined" : 493,
...
...

Now, MongoDB uses the new index and does not need the extra sorting stage. The number of documents examined is the same as the number of documents returned. Also, the execution time is better.

We have optimized the query.

Example 4

This is the final example. Let’s find out the restaurants where the grades array contains a grade of ‘A’ and a score of 9 for a specific date.

MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"),
... "grades.grade":"A" ,
... "grades.score" : 9})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"grades.date" : {
							"$eq" : ISODate("2014-08-11T00:00:00Z")
						}
					},
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					},
					{
						"grades.score" : {
							"$eq" : 9
						}
					}
				]
			},
			"direction" : "forward"
		},
...

A COLLSCAN again. In the query, we notice that all the conditions refer to embedded fields in an array object, so let’s try to create a multikey index. Let’s just create the index on the date field only and see what happens.

MongoDB > db.restaurants.createIndex({"grades.date":1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 5,
	"numIndexesAfter" : 6,
	"ok" : 1
}
MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"),
... "grades.grade":"A" ,
... "grades.score" : 9})
{
	"queryPlanner" : {
...
...
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"$and" : [
					{
						"grades.grade" : {
							"$eq" : "A"
						}
					},
					{
						"grades.score" : {
							"$eq" : 9
						}
					}
				]
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"grades.date" : 1
				},
				"indexName" : "grades.date_1",
				"isMultiKey" : true,
				"multiKeyPaths" : {
					"grades.date" : [
						"grades"
					]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"grades.date" : [
						"[new Date(1407715200000), new Date(1407715200000)]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 15,
		"executionTimeMillis" : 0,
		"totalKeysExamined" : 22,
		"totalDocsExamined" : 22,
...
...

MongoDB uses the index and the winning plan is already good enough. As an exercise, you can try to create a compound index including other embedded fields of the array, like {“grades.date”:1, “grades.grade”:1, “grades.score”:1}, and see what happens; a sketch follows below. You will probably see that the index we created on the date field alone is good enough. Enlarging the compound index will generate only rejected plans. This is because the date field is the most selective.
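
As a sketch of that exercise:

MongoDB > db.restaurants.createIndex({"grades.date": 1, "grades.grade": 1, "grades.score": 1})
MongoDB > exp.find({"grades.date": ISODate("2014-08-11T00:00:00Z"), "grades.grade": "A", "grades.score": 9})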

Hint: when dealing with compound indexes, remember that the order of the fields is important. The first field should be the most selective one, and the last should be the least selective. Or, in cases where you don’t need to put a lot of fields in the index, the most selective ones are probably all you need to improve the queries.

Conclusion

That’s all folks. In this two-part series we have seen the indexes available in MongoDB, and how to use explain() to investigate queries and find out how to improve their performance. We don’t expect that you’ll now know everything, but we hope this will be a good starting point for you to practice MongoDB query optimization using explain(). You’ll find plenty more information in the manual and on the internet about indexes and explain(), if you are motivated to find out more.

While you are here

If you found this article interesting, you might like to check out some of our other resources. For example, you might find this recorded webinar MongoDB Sharding 101 by my colleague Adamo Tonete to be useful. Or perhaps Tim Vaillancourt’s MongoDB backup and recovery field guide would provide some useful information. If you want to catch our webinars as they happen, please subscribe to receive notifications in good time.


Sep 4, 2018

MongoDB: index usage and MongoDB explain() (part 1)


In this two-part series, I’m going to explain explain. No, the repetition is not a pun or a typo. I’m talking about explain(), the most useful MongoDB feature for investigating a query. Here, I’ll introduce the index types available in MongoDB, their properties, and how to create and use them. In the next article we’ll see explain() in action on some real examples.

Using explain(), questions like these will find a proper answer:

  • Is the query using an index?
  • How is it using the index?
  • How many documents are scanned?
  • How many documents are returned?
  • For how many milliseconds does the query run?

And many others. The explain() command provides a lot of information.

If you are familiar with MySQL, then you probably know the EXPLAIN command. MongoDB has a similar feature, and the goal of using it is exactly the same: find out how we can improve a query in order to reduce the execution time as far as possible. Based on the information provided, we can decide to create a new index or to rewrite the query to take advantage of indexes that already exist, just as you would with MySQL.
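To give a first taste of the syntax (we’ll examine the output in detail in part 2), you append explain() to a query and MongoDB returns the query plan instead of the documents. For example, using the people collection we’ll define shortly, and the executionStats verbosity mode:

db.people.find( { age: { $gt: 25 } } ).explain("executionStats")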

Indexes supported by MongoDB

The concept of an index in MongoDB is the same as in relational databases. An index is generally a small structure, in comparison to the collection size, that provides a way to access documents more quickly. Without an index, the only way MongoDB can retrieve the documents is a collection scan: reading all the documents in the collection sequentially. This is exactly the same as a full table scan in MySQL. Another similarity to MySQL is that MongoDB indexes have a structure based on the well known B-Tree. To understand explain() you’ll need to understand these index structures.

Let’s have an overview of the index types available in MongoDB, and their features and properties. We focus in particular on the single field, compound and multikey index types: these are by far the most used and most useful in the majority of cases. We’ll also present the other types, but in the second part of this article series the explain() examples will use only single field and compound indexes.

Single Field Indexes

This is an index built on a single field of the documents. The entries could be a single value, such as a string or a number, but also could be an embedded document.

By default each collection has a single field index automatically created on the _id field, the primary key.

The index can be defined in ascending or descending order.

Let’s see some examples.

Assume we have a collection people containing the following type of document:

{
  "_id": 1,
  "person": { name: "John", surname: "Brown" },
  "age": 34,
  "city": "New York"
}

We can define, for example, a single field index on the age field.

db.people.createIndex( { age : 1} )

In this case we have created an ascending index. If we had wanted to create a descending index we would have used this syntax:

db.people.createIndex( {age : -1} )

With this kind of index we can improve all the queries that find documents with a condition on the age field, like the following:

db.people.find( { age : 20 } ) 
db.people.find( { name : "Antony", age : 30 } )
db.people.find( { age : { $gt : 25} } )

It’s interesting to highlight that the index can also be used to improve the sorting of the results. In this case, it doesn’t matter whether you have defined the index as ascending or descending: MongoDB can traverse the items in the index in both directions.

db.people.find().sort( { age: 1} )
db.people.find().sort( {age : -1} )

Both these queries use the index to retrieve all of the documents in the specified order.

The next query can use the index to retrieve the documents in the order specified by the sort.

db.people.find( { age : { $lt : 25 } } ).sort( { age : -1} )

Indexes on embedded documents

We can even define an index on an embedded document.

db.people.createIndex( { person: 1 } )

In this case, each item in the index is the embedded document as a whole. The index can be used when the condition in the query matches exactly the embedded document. This query can retrieve the document using the index:

db.people.find( { person : {name : "John", surname: "Brown" } } )

but the next two examples are not able to use the index, and MongoDB will do a collection scan:

db.people.find( { person : {surname : "Brown", name: "John" } } )
db.people.find( { "person.name": "John", "person.surname": "Brown" } )

It is more useful to create indexes on embedded fields rather than on whole embedded documents. To do this we can use the dot notation.

db.people.createIndex( { "person.name": 1} )
db.people.createIndex( { "person.surname": 1 } )

We have created two separate ascending indexes on the name and surname embedded fields. Now, queries like the following can rely on the new indexes to be resolved.

db.people.find( { "person.name": "John" } )
db.people.find( { "person.surname": "Brown, "person.name": "John" } )

Compound indexes

A compound index is an index on multiple fields. Using the same people collection we can create a compound index combining the city, age and person.surname fields.

db.people.createIndex( { city: 1, age: 1, "person.surname": 1 } )

In this case, we have created a compound index where the first entry is the value of the city field, the second is the value of the age field, and the third is the value of the person.surname field. All the fields here are defined in ascending order.

Queries such as the following can benefit from the index:

db.people.find( { city: "Miami", age: { $gt: 50 } } )
db.people.find( { city: "Boston" } )
db.people.find( { city: "Atlanta", age: {$lt: 25}, "person.surname": "Green" } )

The order in which we define the fields is important. In fact, only a left prefix of the index can be used to retrieve the documents. In the examples above, the index can be used because the fields in the find() always form a left prefix of the index definition. The following queries will not use the index because the city field is not specified in the query, and a collection scan is required to resolve them.

db.people.find( { age: 20 } )
db.people.find( { age: { $gt: 40 }, "person.surname": "Brown" } )
db.people.find( { "person.surname": "Green" }

Ascending or descending order matters more in the case of compound indexes. In fact, we need to pay attention when defining the direction of each field in the index, because not all queries can rely on it, in particular when using the sort() function.

Assume we have the compound index defined as follows:

db.people.createIndex( {city: 1, age: -1} )

When using sort(), the sort directions specified for the fields must match the pattern of the index definition or its exact inverse. Any other pattern will not be supported by the index.

So, the following queries will use the index for sorting:

db.people.find().sort( { city: 1, age: -1 } )
db.people.find().sort( { city: -1, age: 1 } )

The following queries will not use the index:

db.people.find( ).sort( { city: 1, age: 1} )
db.people.find( ).sort( { city: -1, age: -1} )

The following queries will use the index both for retrieving the documents and for sorting:

db.people.find( { city: "New York" } ).sort( city: 1, age: -1)
db.people.find( { city: "Miami", age:{ $gt: 50 } } ).sort( city: -1, age: 1)

They use a left prefix of the index, and the pattern of the sort() stage matches exactly the index pattern or its inverse.

Multikey indexes

This is the index type for arrays. When creating an index on an array field, MongoDB creates an index entry for every element of the array.

Let’s assume we have these documents:

{
   "_id": 1,
   "person": { name: "John", surname: "Brown" },
   "age": 34,
   "city": "New York",
   "hobbies": [ "music", "gardening", "skiing" ]
 }

The multikey index can be created as:

db.people.createIndex( { hobbies: 1} )

Queries such as these next examples will use the index:

db.people.find( { hobbies: "music" } )
db.people.find( { hobbies: "music", hobbies: "gardening" } )

Geospatial and text indexes

MongoDB also supports specialized indexes for geospatial data and for text searches. A full description is out of scope for this article, but it is worth mentioning them. We may look at these in a future post.
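Just to show the flavor of the creation syntax, here is a minimal sketch with hypothetical collection and field names:

db.places.createIndex( { location: "2dsphere" } )
db.articles.createIndex( { content: "text" } )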

Index options

TTL

The acronym stands for Time To Live. This is a special option that we can apply only to a single field index to permit the automatic deletion of documents after a certain time.

During index creation we can define an expiration time. After that time has passed, all the documents older than the expiration threshold are removed from the collection. This kind of feature is very useful when we are dealing with data that doesn’t need to persist in the database. A good example of this is session data.

Let’s see how to define the TTL option on the sessionlog collection.

db.sessionlog.createIndex( { "lastUpdateTime": 1 }, { expireAfterSeconds: 1800 } )

In this case, MongoDB will automatically drop documents from the collection once half an hour (1800 seconds) has passed since the time stored in the lastUpdateTime field.

MongoDB runs a background process every 60 seconds to drop the expired documents. This means that the actual deletion of a document can happen with a short delay, not precisely after expireAfterSeconds.

There are some restrictions:

  • as already mentioned only single field indexes can have the TTL option
  • the _id single field index cannot support the TTL option
  • the indexed field must be a date type
  • a capped collection cannot have a TTL index

In a replica set, documents are deleted only on the primary node: the background process runs only on the primary, and the deletions are then replicated to the secondaries like any other write.
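One more practical note: if you need to change the expiration time later, you don’t have to drop and recreate the index. The collMod command can update it in place. A sketch, reusing the sessionlog example above:

db.runCommand( {
   collMod: "sessionlog",
   index: { keyPattern: { lastUpdateTime: 1 }, expireAfterSeconds: 3600 }
} )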

Partial indexes

A partial index is an index that contains only a subset of the values based on a filter rule. They are useful in cases where:

  • the index size can be reduced
  • we want to index the most relevant and used values in the query conditions
  • we want to index the most selective values of a field

Here’s how to create a partial index using the clause partialFilterExpression.

db.people.createIndex(
   { "city": 1, "person.surname": 1 },
   { partialFilterExpression: { age : { $lt: 30 } } }
)

We have created a compound index on city and person.surname, but only for the documents with age less than 30.

In order for the partial index to be used, the query must contain a condition on the age field that falls within the filter expression. For example:

db.people.find( { city: "New Tork", age: { $eq: 20} } )

The following queries cannot use the partial index because the results would be incomplete. In these examples, MongoDB will use a collection scan:

db.people.find( { city: "Miami" } )
db.people.find( { city: "Orlando", age : { $gt: 25 } } )

Sparse indexes

Sparse indexes are a subset of partial indexes. A sparse index only contains elements for the documents that have the indexed field, even if it is null.

Since MongoDB is a schemaless database, the documents in a collection can have different fields, so an indexed field may not be present in some of them.

To create such an index use the sparse option:

db.people.createIndex( { city: 1 }, { sparse: true } )

In this case we are assuming there could be documents in the collection with the field city missing.

Regular indexes (without the sparse option) contain an entry for every document in the collection, with a null value stored for the documents that don’t contain the indexed field.

Sparse indexes are based on the existence of a field in the documents, and are useful to reduce the size of the index. Since their functionality is a subset of what partial indexes offer, if you have to choose between the two you should prefer partial indexes.

Defining a partial index with the following filter is the same as having a sparse index:

db.people.createIndex(
  { city: 1 },
  { partialFilterExpression: { city: { $exists: true }  } }
)

Unique indexes

MongoDB can create an index as unique. An index defined this way cannot contain duplicate entries.

To create such an index use the unique option.

db.people.createIndex( { city: 1 }, { unique: true } )

Uniqueness can be defined for compound indexes too.

db.people.createIndex( { city: 1, "person.surname": 1 }, { unique: true } )

By default the index on _id is automatically created as unique.

You cannot create a unique index if the collection already contains documents with duplicate values for the proposed index key.

As we know, missing fields are inserted in the index with a null value, even in the case of a unique index. Consequently, only one document with a missing indexed field is permitted, because of the uniqueness constraint on that null entry. Our suggestion is to try to avoid missing fields when creating a unique index.
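If you cannot avoid missing fields, one common workaround is to combine the unique option with a partial filter, so that documents lacking the field are excluded from the index and therefore from the uniqueness check. A sketch, using a hypothetical email field:

db.people.createIndex(
   { email: 1 },
   { unique: true, partialFilterExpression: { email: { $exists: true } } }
)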

A couple of useful commands

Here are a couple of useful commands when dealing with indexes.

List of the indexes in a collection

Use the getIndexes() method.

MongoDB > db.restaurants.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "test.restaurants"
	},
	{
		"v" : 2,
		"key" : {
			"borough" : 1
		},
		"name" : "borough_1",
		"ns" : "test.restaurants"
	},
	{
		"v" : 2,
		"key" : {
			"cuisine" : 1
		},
		"name" : "cuisine_1",
		"ns" : "test.restaurants"
	},
	{
		"v" : 2,
		"key" : {
			"cuisine" : 1,
			"grades.score" : 1
		},
		"name" : "cuisine_1_grades.score_1",
		"ns" : "test.restaurants"
	}
]

In this sample collection we have 4 indexes, including the _id.

Drop an existing index

Use the dropIndex() method passing the name of the index to be dropped.

MongoDB > db.restaurants.dropIndex("cuisine_1_grades.score_1")
{ "nIndexesWas" : 4, "ok" : 1 }

Conclusions

In the first part of this two part series, I’ve described the indexes available in MongoDB, and how you can create and use them. We’ve also discussed some of their main features. In the next part, we’ll focus on the explain() method, and using some examples we’ll show how we can investigate queries and, if possible, how to optimize them.

You won’t have to wait long! Come back on Thursday to read about MongoDB explain().

While you are here

If you found this article interesting, you might like to check out some of our other resources. For example, you might find this recorded webinar MongoDB Sharding 101 by my colleague Adamo Tonete to be useful. Or perhaps Tim Vaillancourt’s MongoDB backup and recovery field guide would provide some useful information. If you want to catch our webinars as they happen, please subscribe to receive notifications in good time.

The post MongoDB: index usage and MongoDB explain() (part 1) appeared first on Percona Database Performance Blog.
