May
13
2019
--

PostgreSQL Minor Versions Released May 9, 2019

PostgreSQL Logo

Usually, the PostgreSQL Community releases minor patches on the Thursday of the second week of the second month of each quarter. This can vary depending on the nature of the fixes. For example, if the fixes include critical fixes as a postgres community response to security vulnerabilities or bugs that might affect data integrity. The previous minor release happened on February 14, 2019. In this case, the fysnc failure issue was corrected, along with some other enhancements and fixes.

The latest minor release, published on May 9, 2019, has some security fixes and bug fixes for these PostgreSQL Major versions.

PostgreSQL 11 (11.3)
PostgreSQL 10 (10.8)
PostgreSQL 9.6 (9.6.13)
PostgreSQL 9.5 (9.5.17)
PostgreSQL 9.4 (9.4.22)

Let’s take a look at some of the security fixes in this release.

Security Fixes

CVE-2019-10130 (Applicable to PostgreSQL 9.5, 9.6, 10 and 11 versions)

When you

ANALYZE

a table in PostgreSQL, statistics of all the database objects are stored in

pg_statistic

. The query planner uses this statistical data is and it may contain some sensitive data, for example, min and max values of a column. Some of the planner’s selectivity estimators apply user-defined operators to values found in

pg_statistic

. There was a security fix :

CVE-2017-7484

in the past, that restricted a leaky user-defined operator from disclosing some of the entries of a data column.

Starting from PostgreSQL 9.5, tables in PostgreSQL not only have SQL-standard privilege system but also row security policies. To keep it short and simple, you can restrict a user so that they can only access specific rows of a table. We call this RLS (Row-Level Security).

CVE-2019-10130

is about restricting a user who has SQL permissions to read a column but is forbidden to read some rows due to RLS policy from discovering the restricted rows through a leaky operator. Through this patch, leaky operators to statistical data are only allowed when there is no relevant

RLS

policy. We shall see more about

RLS

in our future blog posts.

CVE-2019-10129 (Applicable to PostgreSQL 11 only)

This fixes the situation where a user could access arbitrary back end memory with correctly crafted DDLs. This is only applicable to PostgreSQL 11 version and no other major release has been impacted.

Fixes roundup

There have been a good number of fixes around partitioning in both PostgreSQL 10 and 11 versions that included a fix to behaviour of an UPDATE or an INSERT on certain scenarios, failure while ALTER INDEX .. ATTACH PARTITION, tuple routing in multi-level partitioned tables that have dropped attributes. etc. .

We should also mention a couple of interesting bugs addressed in this new release.

Bug # 15631 is about the catalog corruption when a temporary table on commit drop is created with an identity column. Due to this issue, PostgreSQL 11.1 would not allow the creation of temporary tables, and you might have observed this error message prior to this release:

# create temporary table foo ( bar int generated by default as identity ) on
# commit drop;
# CREATE TABLE
# \q
# psql postgres postgres
# create temporary table a (b varchar);
ERROR: could not open relation with OID 16388
LINE 1: create temporary table a (b varchar);

Another bug that got fixed is Bug # 15734 which is about a

Walsender

process crashing when executing

SHOW ALL

using replication protocol.

Some more fixes include:

1. Fix

CREATE VIEW

to allow zero-column views.
2. Add missing support for the

CREATE TABLE IF NOT EXISTS .. AS EXECUTE

  .. statement.
3. Fix incompatibility of GIN-index WAL records that were introduced in 11.2, 10.7, 9.6.12, 9.5.16, and 9.4.21 that affected replica servers running these versions reading in changes to GIN indexes from primary servers of older versions.
4. Make

pg_verify_checksums

verify that the data directory it’s pointed at is of the right PostgreSQL version.
5. Fixes for

postgres_fdw

where an

UPDATE

could lead to incorrect results or a crash.
6. Several memory leak fixes as well as fixes to management of dynamic shared memory.

The postgres community have fixed more than just the bug fixes and enhancements we’ve highlighted. We would recommend you to go through the release notes, which you can access by clicking the appropriate release in the beginning of this blog post. Let us know if you have any specific questions you’d like us to address.

Applying updates

We always recommend that you keep your PostgreSQL databases updated to the latest minor versions. Be aware that applying a minor release might need a restart after updating the new binaries.

Here is the sequence of steps you should follow to upgrade to the latest minor versions after thorough testing :

1. Shutdown the PostgreSQL database server
2. Install the updated binaries
3. Restart your PostgreSQL database server

Most of the time, you can choose to update the minor versions in a rolling fashion in a master-slave (replication) setup, because it avoids downtime for both reads and writes simultaneously. PostgreSQL logoFor a rolling style update, you could perform the update on one server after another. However, the best method that we’d almost always recommend is – shutdown, update and restart all instances at once.

If you are currently running your databases on PostgreSQL 9.3.x or earlier, we recommend that you to prepare a plan to upgrade your PostgreSQL databases to a supported version ASAP. We’ve previously published some blog posts about the various options for upgrading your PostgreSQL databases to a supported major version, and you can read our post archive here. Please subscribe to our blog posts so that you can see more interesting topics around PostgreSQL.

And don’t forget that at Percona Live in Austin, May 28-30 2019, we’ll have two days of PostgreSQL content in a postgres dedicated track. Please see all our PostgreSQL talks here.

Oct
18
2018
--

PostgreSQL 11! Our First Take On The New Release

slonik_with_black_text

PostgreSQL logoYou may be aware that the new major version of PostgreSQL has been released today. PostgreSQL 11 is going to be one of the most vibrant releases in recent times. It incorporates many features found in proprietary, industry-leading database systems, further qualifying PostgreSQL as a strong open source alternative.

Without further ado, let’s have a look at some killer features in this new release.

Just In Time (JIT) Compilation of SQL Statements

This is a cutting edge feature in PostgreSQL: SQL statements can get compiled into native code for execution. It’s well know how much Google V8 JIT revolutionized the JavaScript language. JIT in PostgreSQL 11 supports accelerating two important factors—expression evaluation and tuple deforming during query execution—and helps CPU bound queries perform faster. Hopefully this is a new era in the SQL world.

Parallel B-tree Index build

This could be the most sought after feature by DBAs, especially those migrating large databases from other database systems to PostgreSQL. Gone are the days when a lot of time was spent on building indexes during data migration. Index maintenance (rebuild) for very large tables can now make an effective use of multiple cores in the server by parallelizing the operation, taking considerably less time to complete.

Lightweight and super fast ALTER TABLE for NOT NULL column with DEFAULT values

In the process of continuous enhancement and adding new features, we see several application developments that involve schema changes to the database. Most such changes include adding new columns to a table. This can be a nightmare if a new column needs to be added to a large table with a default value and a NOT NULL constraint. This is because an ALTER statement can hold a write lock on the table for a long period. It can also involve excessive IO due to table rewrite. PostgreSQL 11 addresses this issue by ensuring that the column addition with a default value and a NOT NULL constraint avoids a table rewrite.  

Stored procedures with transaction control

PostgreSQL 11 includes stored procedures. What really existed in PostgreSQL so far was functions. The lack of native stored procedures in PostgreSQL made the database code for migrations from other databases complex. They often required extensive manual work from experts. Since stored procedures might include transaction blocks with BEGIN, COMMIT, and ROLLBACK, it was necessary to apply workarounds to meet this requirement in past PostgreSQL versions, but not anymore.

Load cache upon crash or restart – pg_prewarm

Memory is becoming cheaper and faster, year over year. The latest generation of servers is commonly available with several hundreds of GBs of RAM, making it easy to employ large caches (shared_buffers) in PostgreSQL. Until today, you might have used pg_prewarm to warm up the cache manually (or automatically at server start). PostgreSQL 11 now includes a background worker thread that will take care of that for you, recording the contents of the shared_buffers—in fact, the “address” of those—to a file autoprewarm.blocks. Upon crash recovery or normal server restart, two such threads work in the background, reloading those blocks into the cache.

Hash Partition

Until PostgreSQL 9.6 we used table inheritance for partitioning a table. PostgreSQL 10 came up with declarative partitioning, using two of the three most common partitioning methods: list and range. And now, PostgreSQL 11 has introduced the missing piece: hash partitioning.

Advanced partitioning features that were always on demand

There were a lot of new features committed to the partitioning space in PostgreSQL 11. It now allows us to attach an index to a given partition even though it won’t behave as a global index.

Also, row updates now automatically move rows to new partitions (if necessary) based on the updated fields. During query processing, the optimizer may now simply skip “unwanted” partitions from the execution plan, which greatly simplifies the work to be done. Previously, it had to convey all the partitions, even if the target data was to be found in just a subset of them.

We will discuss these new features in detail in a future blog post.

Tables can have default partitions

Until PostgreSQL 10, if a table did not have a default partition, PostgreSQL had to reject a row when the row being inserted did not satisfy any of the existing partitions definitions. That changes with the introduction of default partitions in PostgreSQL 11.

Parallel hash join

Most of the SQLs with equi-joins do hash joins in the background. There is a great opportunity to speed up performance if we can leverage the power of hardware by spinning off multiple parallel workers. PostgreSQL 11 now allows hash joins to be performed in parallel.

Write-Ahead Log (WAL) improvements

Historically, PostgreSQL had a default WAL segment of 16MB and we had to recompile PostgreSQL in order to operate with WAL segments of a different size. Now it is possible to change the WAL size during the initialization of the data directory (initdb) or while resetting WALs using pg_resetwal by means of the parameter –wal-segsize = <wal_segment_size>.

Add extensions to convert JSONB data to/from PL/Perl and PL/Python

Python as a programming language continues to gain popularity. It is always among the top 5 in the TIOBE Index. One of the greatest features of PostgreSQL is that you can write stored procedures and functions in most of the popular programming languages, including Python (with PL/Python). Now it is also possible to transform JSONB (Binary JSON) data type to/from PL/Python. This feature was later made available for PL/Perl too. It can be a great add-on for organizations using PostgreSQL as a document store.

Command line improvements in psql: autocomplete and quit/exit

psql has always been friendly to first time PostgreSQL users through the various options like autocomplete and shortcuts. There’s an exception though: users may find it difficult to understand how to effectively quit from psql, and often attempt to use non-existing quit and exit commands. Eventually, they find \q or ctrl + D, but not without frustrating themselves first. Fortunately, that shouldn’t happen anymore: among many recent fixes and improvements to psql is the addition of the intuitive quit and exit commands to safely leave this popular command line client.

Improved statistics

PostgreSQL 10 introduced the new statement CREATE STATISTICS to collect more statistics information about columns from tables. This has been further improved in PostgreSQL 11. Previously, while collecting optimizer statistics, most-common-values (MCV) were chosen based on their significance compared to all columns. But now, MCVs are chosen based on their significance compared to non-MCV values.

The new features of PostgreSQL 11 are not limited to the ones mentioned above. We will be looking further into most of them in future blog posts. We invite you to leave a comment below and let us know if there is any particular feature you would be interested in knowing more about.

Jun
25
2018
--

Webinar Tues 6/26: MariaDB Server 10.3

MariaDB 10.3 Webinar

MariaDB 10.3 WebinarPlease join Percona’s Chief Evangelist, Colin Charles on Tuesday, June 26th, 2018, as he presents MariaDB Server 10.3 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4).

 

MariaDB Server 10.3 is out. It has some interesting features around system versioned tables, Oracle compatibility, column compression, an integrated SPIDER engine, as well as MyRocks. Learn about what’s new, how you can use it, and how it is different from MySQL.

Register Now

Colin Charles

Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team of MariaDB Server in 2009, and had worked at MySQL since 2005, and been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and OpenOffice.org projects. He’s well known within open source communities in APAC, and has spoken at many conferences. Experienced technologist, well known in the open source world for work that spans nearly two decades within the community. Pays attention to emerging technologies from an integration standpoint. Prolific speaker at many industry-wide conferences delivering talks and tutorials with ease. Interests: application development, systems administration, database development, migration, Web-based technologies. Considered expert in Linux and Mac OS X usage/administration/roll-out’s. Specialties: MariaDB, MySQL, Linux, Open Source, Community, speaking & writing to technical audiences as well as business stakeholders.

The post Webinar Tues 6/26: MariaDB Server 10.3 appeared first on Percona Database Performance Blog.

Dec
06
2017
--

MongoDB 3.6 Community Is Here!

MongoDB 3.6 Community

MongoDB 3.6 CommunityBy now you surely know MongoDB 3.6 Community became generally available on Dec 5, 2017. Of course, this is great news: it has some big ticket items that we are all excited about! But I want to also talk about my general thoughts on this release.

It is always a good idea for your internal teams to study and consider new versions. This is crucial for understanding if and when you should start using it. After deciding to use it, there is the question of if you want your developers using the new features (or are they not suitably implemented yet to be used)?

So what is in MongoDB 3.6 Community? Check it out:

  • Sessions
  • Change Streams
  • Retryable Writes
  • Security Improvement
  • Major love for Arrays in Aggregation
  • A better balancer
  • JSON Schema Validation
  • Better Time management
  • Compass is Community
  • Major WiredTiger internal overhaul

As you can see, this is an extensive list. But there are 1400+ implemented Jira tickets just on the server itself (not even in the WiredTigers project).

To that end, I thought we should break my review into a few areas. We will have blog posts out soon covering these areas in more depth. This blog is more about my thoughts on the topics above.

Expected blogs (we will link to them as they become available):

  • Change Streams –  Nov 11 2017
  • Sessions
  • Transactions and new functions
  • Aggregation improvements
  • Security Controls to use ASAP
  • Other changes from diagnostics to Validation

Today let’s quickly recap the above areas.

Sessions

We will have a blog on this (it has some history). This move has been long-awaited by anyone using MongoDB before 2.4. There were connection changes in that release that made it complicated for load balancers due to the inability to “re-attach” to the same session.  If you were not careful in 2.4+, you could easily use a load-balancer and have very odd behavior: from broken to invisibly incomplete getMores (big queries).

Sessions aim is to change this. Now, the client drivers know about the internal session to the database used for reads and writes. Better yet, MongoDB tracks these sessions so even if an election occurs, when your drive fails over so will the session. For anyone who’s applications handled fail-overs badly, this is an amazing improvement. Some of the other new features that make 3.6 a solid release require this new behavior.

Does this mean this solution is perfect and works everywhere? No! It, like newer features we have seen, leave MMAPv1 out in the cold due to its inability without major re-work to support logic that is so native to Wired Tiger. Talking with engine engineers, it’s clear that some of the logic behind the underlying database snapshots and rollbacks added here can cause issues (which we will talk about more in the transactions blog).

Change streams

As one of the most talked about (but most specialized) features, I can see its appeal in a very specific use case (but it is rather limited). However, I am getting ahead of myself! Let’s talk about what it is first and where it came from.

Before this feature, people streamed data out of MySQL and MongoDB into Elastic and Hadoop stacks. I mention MySQL, as this was the primer for the initial method MongoDB used. The tools would read the MySQL binlogs – typically saved off somewhere – and then apply those operations to some other technology. When they went to do the same thing in MongoDB, there was a big issue: if writes are not sent to the majority of the nodes, it can cause a rollback. In fact, such rollbacks were not uncommon. The default was w:1 (meaning the primary only needed to have the write), which resulted in data existing in Elastic that had been removed from MongoDB. Hopefully, everyone can see the issue here, and why a better solution was needed than just reading the oplog.

Enter

$changeStream

, which in the shell has a helper called

.watch()

 . This is a method that uses a multi-node consistent read to ensure the data is on the majority of nodes before the command returns the data in a tailable cursor. For this use case, this is amazing as it allows the data replicating tool much more assurance that data is not going to vanish. 

$changeStream

 is not without limits: if we have 10k collections and we want to watch them all, this is 10k separate operations and cursors. This puts a good deal of strain on the systems, so MongoDB Inc. suggests you do this on no more than 1000 namespaces at a time to prevent overhead issues.

Sadly it is not currently possible to take a mongod-wide snapshot to support this under the hood, as this is done on each namespace to implement the snapshot directly inside WiredTiger’s engine. So for anyone with a very large collection count, this will not be your silver bullet yet. This also means streams between collections and databases are not guaranteed to be in sync. This could be an issue for someone even with a smaller number of namespaces that expect this. Please don’t get me wrong: it’s a step in the correct direction, but it falls short.

I had very high hopes for this to simplify backups in MongoDB. Percona Labs’s GitHub has a tool called MongoDB-Consistent-Backup, which tails multiple oplogs to get a consistent sharded backup without the need to pay for MongoDB’s backup service or use the complicated design that is Ops Manager (when you host it yourself). Due to the inability to do a system-wide change stream, this type of tool still needs to use the oplog. If you are not using

w:majority

  it could trigger a failure if you have an election or if a rollback occurs. Still, I have hopes this will be something that can be considered in the future to make things better for everyone.

Retryable writes

Unlike change streams, this feature is much more helpful to the general MongoDB audience. I am very excited for it. If you have not watched this video, please do right now! Samantha does a good job explaining the issue and solution in detail. However, for now just know there has been a problem that where a write that has an issue (network, app shutdown, DB shutdown, election), you had no way to know if the write failed or not. This unknown situation was terrible for a developer, and they would not know if they needed to run the command again or not. This is especially true if you have an ordering system and you’re trying to remove stock from your inventory system. Sessions, as discussed before, allowed the client to reconnect to a broken connection and keep getting results to know what happened or didn’t. To me, this is the second best feature of 3.6. Only Security is more important to me personally.

Security improvement

In speaking of security, there is one change that the security community wanted (which I don’t think is that big of a deal). For years now, the MongoDB packaging for all OSs (and even the Percona Server for MongoDB packing) by default would limit the bindIP setting to localhost. This was to prevent unintended situations where you had a database open to the public. With 3.6 now the binaries also default to this. So, yes, it will help some. But I would (and have) argued that when you install a database from binaries or source, you are taking more ownership of its setup compared to using Docker, Yum or Apt.

The other major move forward, however, is something I have been waiting for since 2010. Seriously, I am not joking! It offers the ability to limit users to specific CIDR or IP address ranges. Please note MySQL has had this since at least 1998. I can’t recall if it’s just always been there, so let’s say two decades.

This is also known as “authenticationRestriction” and it’s an array you can put into the user document when creating a document. The manual describes it as:

The authentication restrictions the server enforces on the created user. Specifies a list of IP addresses and CIDR ranges from which the user is allowed to connect to the server or from which the server can accept users.

I can not overstate how huge this is. MongoDB Inc. did a fantastic job on it. Not only does it support the classic client address matching, it supports an array of these with matching on the server IP/host also. This means supporting multiple IP segments with a single user is very easy. By extension, I could see a future where you could even limit some actions by range – allowing dev/load test to drop collections, but production apps would not be allowed to. While they should have separate users, I regularly see clients who have one password everywhere. That extension would save them from unplanned outages due to human errors of connecting to the wrong thing.

We will have a whole blog talking about these changes, their importance and using them in practice. Yes, security is that important!

Major love for array and more in Aggregation

This one is a bit easier to summarize. Arrays and dates get some much-needed aggregation love in particular. I could list all the new operators here, but I feel it’s better served in a follow-up blog where we talk about each operator and how to get the most of it. I will say my favorite new option is the $hint. Finally, I can try to control the work plan a bit if it’s making bad decisions, which sometimes happens in any technology.

A better balancer

Like many other areas, there was a good deal that went into balancer improvements. However, there are a few key things that continue the work of 3.4’s parallel balancing improvements.

Some of it makes a good deal of sense for anyone in a support role, such as FTDC now also existing in mongos’. If you do not know what this is, basically MongoDB collects some core metrics and state data and puts it into binary files in dbPath for engineers at companies like Percona and MongoDB Inc. to review. That is not to say you can’t use this data also. However, think of it as a package of performance information if a bug happens. Other diagnostic type improvements include moveChunk, which provides data about what happened when it runs in its output. Previously you could get this data from the config.changeLog or config.actionLog collections in the config servers. Obviously, more and more people are learning MongoDB’s internals and this should be made more available to them.

Having talked about diagnostic items, let’s move more into the operations wheelhouse. The single biggest frustration about sharding and replica-sets is the sensitivity to time variations that cause odd issues, even when using ntpd. To this point, as of 3.6 there is now a logical clock in MongoDB. For the geekier in the crowd, this was implemented using a Lamport Clock (great video of them). For the less geeky, think of it as a cluster-wide clock preventing some of the typical issues related to varying system times. In fact, if you look closer at the oplog record format in 3.6 there is a new wt field for tracking this. Having done that, the team at MongoDB Inc. considered what other things were an issue. At times like elections of the config servers, meta refreshes did not try enough times and could cause a mongos to stop working or fail. Those days are gone! Now it will check three times as much, for a total of ten attempts before giving up. This makes the system much more resilient.

A final change that is still somewhat invisible to the user but helps make dropping collections more stable, is that they remove the issue MongoDB had about dropping and recreating sharded collections. Your namespaces look as they always have. Behind the scenes, however, they have UUID’s attached to them so that if foo.bar drops and gets recreated, it would be a different UUID. This allows for less-blocking drops. Additionally, it prevents confusion in a distributed system if we are talking about the current or old collection.

JSON Schema validation

Some non-developers might not know much about something called JSON Schema. It allows you to set rules on schema design more efficiently and strictly than MongoDB’s internal rules. With 3.6, you can use this directly. Read more about it here. Some key points:

  • Describes your existing data format
  • Clear, human- and machine-readable documentation
  • Complete structural validation, useful for:
    • Automated testing
    • Validating client-submitted data
You can even make it reject when certain fields are missing. As for MySQL DBAs, you might ask why this is a big deal? You could always have a DBA define a schema in an RDBMS, and the point of MongoDB was to be flexible. That’s a fair and correct view. However, the big point of using it is you could apply this in production, not in development. This gives developers the freedom to move quickly, but provides operational teams with control methods to understand when mistakes or bugs are present before production is adversely affected. Taking a step back, its all about bridging the control and freedom ravines to ensure both camps are happy and able to do their jobs efficiently.

Compass is Community

If you have never used Compass, you might think this isn’t that great. You could use things like RoboMongo and such. You absolutely could, but Compass can do visualization as well as CRUD operations. It’s also a very fluid experience that everyone should know is available for use. This is especially true for QA teams who want to review how often some types of data are present, or a business analyst who needs to understand in two seconds what your demographics are.

Major WiredTiger internal overhaul

There is so much here that I recommend any engineer-minded person take a look at this deck, presented by one of the great minds behind WiredTiger. It does a fantastic job explaining all the reasons behind some of the 3.2 and 3.4 scaling issues MongoDB had on WiredTiger. Of particular interest is why it tended to have worse and worse performance as you added more collections and indexes. It then goes into how they fixed those issues:

  • Some key points on what they did
  • Made Evictions Smarts, as they are not collection uniform
  • Improved assumption around the handle cache
  • Made Checkpoints better in all ways
  • Aggressively cleaned up old handles

I hope this provides a peek into the good, bad, and ugly in MongoDB 3.6 Community! Please check back as we publish more in-depth blogs on how these features work in practice, and how to best use them.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com