Oct 19, 2020

Announcing Percona Distribution for PostgreSQL 13

Percona Distribution PostgreSQL

Percona is pleased to announce the release of Percona Distribution for PostgreSQL 13. This release is available for immediate download and includes all of the latest features of the PostgreSQL 13 Core Distribution.

Reduce Resource Usage

This release includes enhancements to existing features that deliver a more streamlined and optimized environment, helping you reduce overall resource usage. Parallel vacuum helps minimize bloat by letting the vacuum process run more quickly and efficiently, improving both storage use and performance. Deduplication of B-tree indexes reduces the amount of storage needed for indexes. Queries also use resources more efficiently thanks to additional partitioning enhancements, which means each query processes less data and frees up resources.

Improve Response Times

With enhancements that deliver more efficient use of indexes and smarter sorting capabilities, you can expect better overall performance and improved response times. Incremental sort avoids re-sorting data that is already sorted, delivering query results faster. Enhanced partitionwise joins break down a join between partitioned tables into joins between their partitions, so smaller data volumes are processed more quickly. Improvements to GiST, SP-GiST, and GIN indexes provide overall performance gains and speed up query processing. There are also some new PostgreSQL commands and authentication changes.

Additionally, we are including a technical preview of pg_stat_monitor, a custom extension written by Percona.  This extension gathers and aggregates query performance data, enabling better and faster query analysis. It can be used alone, but its capabilities are best used when combined with the latest release of Percona Monitoring and Management. This enables you to easily analyze your PostgreSQL queries, using the pg_stat_monitor metrics, to quickly identify and remedy issues with scaling, bottlenecks, and potential outages.
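
As a rough illustration of how you might pull aggregated query statistics out of pg_stat_monitor from application code, here is a minimal sketch using Python and psycopg2. The connection parameters are placeholders, and the column names are assumptions based on the extension's similarity to pg_stat_statements, so check the pg_stat_monitor documentation for your version.

import psycopg2

# Assumes pg_stat_monitor is installed and listed in shared_preload_libraries.
conn = psycopg2.connect(host="localhost", user="postgres", dbname="postgres")
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_stat_monitor")
    # Column names (query, calls) are assumed; they may differ in the technical preview.
    cur.execute("SELECT query, calls FROM pg_stat_monitor ORDER BY calls DESC LIMIT 5")
    for query, calls in cur.fetchall():
        print(calls, query)
conn.close()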

Percona is also planning to extend our Managed Services offerings to include PostgreSQL early next year. This means that you will be able to have your MySQL, MongoDB, PostgreSQL, and MariaDB databases all managed by a single source. As always, we continue to provide support for these open-source database products and offer professional consulting and training services.

For more details on PostgreSQL 13, check out the release notes from PostgreSQL. To learn more about Percona Distribution for PostgreSQL, check out our release notes.

To provide feedback on the technical preview of pg_stat_monitor or its integration with Percona Monitoring and Management, visit the Percona Community Forum. Bugs should be submitted through Jira.

Download Percona Distribution for PostgreSQL

Download Percona Monitoring and Management

Oct 19, 2020

5 Things Developers Should Know Before Deploying MongoDB

Developers Should Know Before Deploying MongoDB

MongoDB is one of the most popular databases and one of the easiest NoSQL databases to set up. Oftentimes, developers want a quick environment to test out an idea for an application, or to figure out a good data model for their data, without waiting for their operations team to spin up the infrastructure. What can sometimes happen is that these quick, one-off instances grow, and before you know it that little test DB is your production DB supporting your new app. For anyone who finds themselves in this situation, I encourage you to check out our Percona blogs, as we have lots of great information for those both new and experienced with MongoDB. Don’t let the ease of installing MongoDB fool you into a false sense of security; there are things you need to consider as a developer before deploying MongoDB. Here are five things developers should know before deploying MongoDB in production.

1) Enable Authentication and Authorization

Security is of utmost importance to your database. While the days when security was disabled by default for MongoDB are gone, it’s still easy to start MongoDB without security. Without security and with your database bound to a public IP, anyone can connect to your database and steal your data. By adding a few important security options to your configuration file, you can ensure that your data is protected. You can also configure MongoDB to use native LDAP or Kerberos for authentication. Setting up authentication and authorization is one of the simplest ways to keep your MongoDB database secure. The most important configuration option is turning on authorization, which enables users and roles and requires clients to authenticate and hold the proper roles to access your data.

security:
  authorization: enabled
  keyFile: /path/to/our.keyfile
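
Once authorization is enabled, you will need an administrative user to manage the deployment. The following is a minimal sketch using PyMongo; the user name, password, and roles are examples only, so substitute your own:

from pymongo import MongoClient

# Connect over localhost while no users exist yet (the localhost exception),
# or authenticate as an existing administrative user.
client = MongoClient('mongodb://localhost:27017/')
client.admin.command(
    'createUser', 'admin_user',                        # example user name
    pwd='use-a-strong-password-here',                  # example password
    roles=[{'role': 'userAdminAnyDatabase', 'db': 'admin'}],
)

# Subsequent connections must authenticate:
client = MongoClient('mongodb://admin_user:use-a-strong-password-here@localhost:27017/?authSource=admin')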

 

2) Connect to a Replica Set/Multiple Mongos, Not Individual Nodes

MongoDB’s drivers all support connecting directly to a standalone node, a replica set, or a mongos for sharded clusters. Sometimes your database starts off with one specific node that is always your primary, and it’s easy to set your connection string to connect only to that one node. But what happens when that node goes down? If you don’t have a highly available connection string in your application configuration, you’re missing out on a key advantage of MongoDB replica sets: the driver connects to the primary no matter which node currently holds that role. All of MongoDB’s supported language drivers support the MongoDB URI connection string format and implement the failover logic. Here are some examples of connection strings for PyMongo, MongoDB’s Python driver: a standalone connection string, a replica set connection string, and an SRV record connection string. If you are able to set up SRV DNS records, you can standardize your connection string to point to a single address without needing to worry about the underlying infrastructure changing.

Standalone Connection String:

client = MongoClient('mongodb://hostabc.example.com:27017/?authSource=admin')

 

Replica Set Connection String:

client = MongoClient('mongodb://hostabc.example.com:27017,hostdef:27017,hostxyz.example.com/?replicaSet=foo&authSource=admin')

 

SRV Connection String:

client = MongoClient('mongodb+srv://host.example.com/')

Post-script for clusters: If you’re just starting out, you’re usually not setting up a sharded cluster. But if it is a cluster, then instead of using a replica set connection you will connect to a mongos node. To get automatic failover in the event of a mongos node being restarted (or otherwise being down), start mongos on multiple hosts and put them, comma-separated, in your connection string’s host list. As with replica sets, you can use SRV records for these too.
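
For example, a multi-mongos connection string might look like the following (the host names are placeholders):

client = MongoClient('mongodb://mongos1.example.com:27017,mongos2.example.com:27017/?authSource=admin')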

3) Sharding Can Help Performance But Isn’t Always a Silver Bullet

Sharding is how MongoDB handles the partitioning of data. It is used to distribute load across more replica sets for a variety of reasons, such as write performance, low-latency geographical writes, and archiving data to shards that use slower and cheaper storage. These sharding approaches help keep your working set in memory because they lower the amount of data each shard has to deal with.

As previously mentioned, sharding can also be used to reduce latency by separating your shards by geographic region; a common example is having a US-based shard, an EU-based shard, and a shard in Asia, where the data is kept local to its origin. Although it is not the only application for shard zoning, “geo-sharding” like this is a common one. This approach can also help applications comply with the various data regulations that are becoming more important and more strict throughout the world.

While sharding can often help write performance, that sometimes comes at the expense of read performance. An easy example of poor read performance would be a query that needs to find all orders regardless of their origin. This find would need to be sent to the US shard, the EU shard, and the shard in Asia, with all the network latency that comes with reading from the non-local regions, and the returned records would then need to be sorted on the mongos query router before being returned to the client. This give and take should inform how you choose a shard key and how you weigh its impact on your typical query patterns.
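
As a rough sketch of how a region-prefixed shard key might be set up with PyMongo, assuming a hypothetical shop.orders collection (zone configuration for true geo-sharding is a separate step):

from pymongo import MongoClient

client = MongoClient('mongodb://mongos1.example.com:27017/?authSource=admin')
client.admin.command('enableSharding', 'shop')
# A key prefixed with region keeps region-scoped queries targeted to one shard;
# queries that omit region fan out to every shard (scatter-gather).
client.admin.command('shardCollection', 'shop.orders', key={'region': 1, 'order_id': 1})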

4) Replication ≠ Backups

MongoDB replication, while powerful and easy to set up, is not a substitute for a good backup strategy. Some might think that their replica set members in a DR data center will be sufficient to keep them up in a data loss scenario. While a replica set member in a DR center will surely help you in a disaster recovery situation, it will not help you if you accidentally drop a database or a collection in production, as that delete will quickly be replicated to your secondary in your DR data center.

Another common misconception is that delayed replica set members keep you safe. Delayed members still rely on you finding the issue you want to restore from before it gets applied to your delayed member. Are your processes so rock-solid that you can guarantee you’ll find the issue before it reaches your delayed member?

Backups are just as important with MongoDB as they are with any other database. There are tools like mongodump, mongoexport, Percona Backup for MongoDB, and Ops Manager (Enterprise Edition only) that support point-in-time recovery, oplog backups, hot backups, and full and incremental backups. Backups can be run from any node in your replica set; the best practice is to run your backup from a secondary node so you don’t put unnecessary pressure on your primary. In addition to the above methods, you can also take snapshots of your data, as long as you pause writes to the node you’re snapshotting by freezing the file system to ensure a consistent snapshot of your MongoDB database.

5) Schemaless is a Myth, Schemas Still Matter

MongoDB was originally touted as a schemaless database, which was attractive to developers who had long struggled to update and maintain their schemas in relational databases. But schemas succeeded for good reasons in the early days of databases, and while MongoDB gives you the flexibility to skip designing your schema up front and create it on the fly, this often leads to poor-performing schema designs and anti-patterns. There are lots of stories in the wild of users not enforcing any structure on their MongoDB data models and running into various performance problems as their schemas became unwieldy. Today, MongoDB supports JSON Schema and schema validation. These features allow you to apply as much or as little structure to your schemas as is needed, so you keep the flexibility of MongoDB’s looser schema structure while still enforcing the rules that will keep your application performing well and your data model consistent.
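
For example, here is a minimal sketch of creating a collection with a $jsonSchema validator using PyMongo; the database name, collection name, and fields are hypothetical:

from pymongo import MongoClient

client = MongoClient('mongodb://hostabc.example.com:27017/?authSource=admin')
db = client['appdb']  # hypothetical database
# Every inserted document must have a string "name"; "age", if present, must be a non-negative int.
db.create_collection('users', validator={
    '$jsonSchema': {
        'bsonType': 'object',
        'required': ['name'],
        'properties': {
            'name': {'bsonType': 'string'},
            'age': {'bsonType': 'int', 'minimum': 0},
        },
    }
})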

Another area affected by poor schema design in MongoDB is the aggregation framework. The aggregation framework lets you run more analytical query patterns such as sorting and grouping, along with useful operations such as unwinding arrays, joining collections, and a whole lot more. Without a good schema, these sorts of queries can suffer from very poor performance.
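
As a small sketch of what such a pipeline looks like in PyMongo, reusing the db handle from the validation sketch above and assuming user documents shaped like the one-to-few example below (an array of phone subdocuments), this counts phone numbers by type:

pipeline = [
    {'$unwind': '$phone'},                                      # one document per phone entry
    {'$group': {'_id': '$phone.type', 'count': {'$sum': 1}}},   # count entries per type
    {'$sort': {'count': -1}},
]
for row in db.users.aggregate(pipeline):
    print(row)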

MongoDB also became popular partly because it avoided joins; joins can be expensive, and avoiding them helped MongoDB run quite fast. Though MongoDB has since added $lookup to support left outer joins, embedded documents remain the typical alternative. This approach comes with its pros and cons. As with relational databases, embedding documents essentially creates a one-to-N relationship, which is covered in greater detail in this blog. In MongoDB, the value of N matters: if it’s one-to-few (2-10) or one-to-many (10-1000), embedding can still be a good schema design as long as your indexes support your queries. When you get to one-to-tons (10,000+), you need to consider things like MongoDB’s 16 MB per-document limit, or use references to the parent document instead.

Examples of each of these approaches:

One-to-Few, consider having multiple phone numbers for a user:

{  "_id" : ObjectId("1234567890"),
  "name" :  "John Doe",
  "phone" : [     
     { "type" : "mobile", "number" : "+1-585-555-5555" }, 
     { "type" : "work", "number" : "+1-585-555-1111"}  
            ]
}

One-to-Many, consider a parts list for a product with multiple items:

{ "_id" : ObjectId("123"),
  "Item" : "Widget",
  "Price" : 100
}

{  "_id" : ObjectId("0123456789"),
   "manufacturer" : "Percona",
   "catalog_number" : 123456,
   "parts" : [
      { "item": ObjectId("123") },
      { "item": ObjectId("456") },
      { "item": ObjectId("789") },
       ...
              ]
}

One-to-Tons, consider a social network type application:

{  "_id" : ObjectId("123"),
   "username" : "Jane Doe" 
}

{  "_id" : ObjectId("456"),
   "username" : "Eve DBA"
 }

{  "_id" : ObjectId("9876543210"),
   "username" : "Percona",
   "followers" : [     
                     ObjectId("123"),
                     ObjectId("456"),
                     ObjectId("789"),
                    ...  
                 ]
}
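
When you do need to resolve such references, $lookup can join them back in a single aggregation. Here is a minimal sketch with PyMongo, assuming all of these user documents live in one hypothetical users collection:

pipeline = [
    {'$lookup': {
        'from': 'users',             # collection holding the referenced user documents
        'localField': 'followers',   # array of ObjectIds on the parent document
        'foreignField': '_id',
        'as': 'follower_docs',
    }},
]
for user in db.users.aggregate(pipeline):
    print(user.get('username'), len(user.get('follower_docs', [])))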

 

Bonus Topic: Transactions

MongoDB has supported multi-document transactions since MongoDB 4.0 (replica sets) and MongoDB 4.2 (sharded clusters). Transactions in MongoDB work much like they do in relational databases: either all actions in the transaction succeed or they all fail. Here’s an example of a transaction in the MongoDB shell:

rs1:PRIMARY> session = db.getMongo().startSession()
rs1:PRIMARY> session.startTransaction()
rs1:PRIMARY> session.getDatabase("percona").test.insert({today : new Date()})
WriteResult({ "nInserted" : 1 })
rs1:PRIMARY> session.getDatabase("percona").test.insert({some_value : "abc"})
WriteResult({ "nInserted" : 1 }) 
rs1:PRIMARY> session.commitTransaction()

Transactions can be quite powerful if they are truly needed for your application, but be aware of the performance implications, as everything done inside a transaction is held until the whole transaction succeeds or fails.
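
The same transaction can be written with PyMongo. This is a rough sketch using the driver's callback API (PyMongo 3.9+), reusing the replica set connection string from the earlier examples; the database and collection names match the shell example:

from datetime import datetime
from pymongo import MongoClient

client = MongoClient('mongodb://hostabc.example.com:27017,hostdef:27017,hostxyz.example.com/?replicaSet=foo&authSource=admin')

def insert_docs(session):
    coll = client.percona.test
    coll.insert_one({'today': datetime.utcnow()}, session=session)
    coll.insert_one({'some_value': 'abc'}, session=session)

with client.start_session() as session:
    # with_transaction starts the transaction, retries on transient errors, and commits.
    session.with_transaction(insert_docs)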

Takeaways:

While MongoDB is easy to get started with and has a low barrier to entry, just like any other database there are some key things that you, as a developer, should consider before deploying it. We’ve covered enabling authentication and authorization to ensure you have a secure application and don’t leak data. We’ve highlighted using highly available connection strings, whether to your replica set, a list of mongos nodes, or via SRV records, to ensure you’re always connecting to the appropriate nodes. We’ve looked at the balancing act of choosing a shard key, considering the impact on both reads and writes and understanding the tradeoffs you are making. The importance of backups, and why you should not rely on replication as a backup method, was also covered. Finally, we covered the fact that schemas still matter with MongoDB, but you retain the flexibility to decide how rigid they are. We hope this gives you a better idea of the things to consider when deploying MongoDB for your next application. Thanks for reading!

Oct 13, 2020

Armory nabs $40M Series C as commercial biz on top of open-source Spinnaker project takes off

As companies continue to shift more quickly to the cloud, pushed by the pandemic, startups like Armory that work in the cloud-native space are seeing an uptick in interest. Armory is a company built to be a commercial layer on top of the open-source continuous delivery project Spinnaker. Today, it announced a $40 million Series C.

B Capital led the round, with help from new investors Lead Edge Capital and Marc Benioff along with previous investors Insight Partners, Crosslink Capital, Bain Capital Ventures, Mango Capital, Y Combinator and Javelin Venture Partners. Today’s investment brings the total raised to more than $82 million.

“Spinnaker is an open-source project that came out of Netflix and Google, and it is a very sophisticated multi-cloud and software delivery platform,” company co-founder and CEO Daniel R. Odio told TechCrunch.

Odio points out that this project has the backing of industry leaders, including the three leading public cloud infrastructure vendors Amazon, Microsoft and Google, as well as other cloud players like CloudFoundry and HashiCorp. “The fact that there is a lot of open-source community support for this project means that it is becoming the new standard for cloud-native software delivery,” he said.

In the days before the notion of continuous delivery, companies moved forward slowly, releasing large updates over months or years. As software moved to the cloud, this approach no longer made sense and companies began delivering updates more incrementally, adding features when they were ready. Adding a continuous delivery layer helped facilitate this move.

As Odio describes it, Armory extends the Spinnaker project to help implement complex use cases at large organizations, including around compliance and governance and security. It is also in the early stages of implementing a SaaS version of the solution, which should be available next year.

While he didn’t want to discuss customer numbers, he mentioned JPMorgan Chase and Autodesk as customers, along with less specific allusions to “a Fortune Five technology company, a Fortune 20 Bank, a Fortune 50 retailer and a Fortune 100 technology company.”

The company currently has 75 employees, but Odio says business has been booming and he plans to double the team in the next year. As he does, he says that he is deeply committed to diversity and inclusion.

“There’s actually a really big difference between diversity and inclusion, and there’s a great Vernā Myers quote that diversity is being asked to the party and inclusion is being asked to dance, and so it’s actually important for us not only to focus on diversity, but also focus on inclusion because that’s how we win. By having a heterogeneous company, we will outperform a homogeneous company,” he said.

While the company has moved to remote work during COVID, Odio says they intend to remain that way, even after the current crisis is over. “Now obviously COVID has been a real challenge for the world, including us. We’ve gone to a fully remote-first model, and we are going to stay remote-first even after COVID. And it’s really important for us to be taking care of our people, so there’s a lot of human empathy here,” he said.

But at the same time, he sees COVID opening up businesses to move to the cloud and that represents an opportunity for his business, one that he will focus on with new capital at his disposal. “In terms of the business opportunity, we exist to help power the transformation that these enterprises are undergoing right now, and there’s a lot of urgency for us to execute on our vision and mission because there is a lot of demand for this right now,” he said.

Oct 6, 2020

Various Ways to Perform Schema Upgrades with Percona XtraDB Cluster

Perform Schema Upgrades with Percona XtraDB Cluster

Schema changes are one of the big challenges in Galera replication, so everyone who uses Percona XtraDB Cluster (PXC)/Galera clusters should understand how schema change operations work. In this blog, I am going to explain the operation and impact of the various schema change methods used in PXC/Galera clusters.

  • Schema changes with “wsrep_OSU_method = TOI”
  • Schema changes with “wsrep_OSU_method = RSU”
  • Schema changes with “ONLINE ALGORITHMS”
  • Schema changes with “pt-osc”
  • Schema changes with “gh-ost”

For testing:

  • I have configured a 3-node Percona XtraDB Cluster (8.0.19).
  • I am executing a read/write load using sysbench.
mysql> select @@wsrep_cluster_address\G
*************************** 1. row ***************************
@@wsrep_cluster_address: gcomm://pxc81,pxc82,pxc83
1 row in set (0.00 sec)

mysql> select @@version, @@version_comment\G
*************************** 1. row ***************************
        @@version: 8.0.19-10
@@version_comment: Percona XtraDB Cluster (GPL), Release rel10, Revision 727f180, WSREP version 26.4.3
1 row in set (0.00 sec)

What is the Impact of Schema Changes in Clusters?

  • By default (TOI), all the nodes in the cluster will be paused during the ALTER process, because the ALTER needs to be replicated on all the nodes. If the ALTER is big, it will affect performance and could cause downtime.
  • Rollback is not possible on schema upgrade. 
  • You can’t kill the ALTER query immediately during the operation, so your application may need to wait until the ALTER completes. 
mysql> pager grep alter
PAGER set to 'grep alter'

mysql> show processlist;
| 19 | root            | localhost | schema_changes | Query   |   18 | altering table           | alter table sbtest1 add index idx_c(c) |         0 |             0 |
7 rows in set (0.00 sec)

mysql> kill 19;
ERROR 1095 (HY000): You are not owner of thread 19

  • MDLs (metadata locks) are set only on the node executing the DDL, not across all the nodes in the cluster, so you need additional control over this. 

Schema Changes with “wsrep_OSU_method = TOI”

TOI: Total Order Isolation

  • TOI is the default method ( wsrep_OSU_method = TOI ) for schema changes.
  • DDL statements are processed in the same order with regard to other transactions in each node. 
  • The full cluster will be blocked/locked during the DDL operation. 
  • This guarantees data consistency.
mysql> select @@wsrep_OSU_method\G
*************************** 1. row ***************************
@@wsrep_OSU_method: TOI
1 row in set (0.00 sec)

Example:

I am going to run the below ALTER on “pxc81”.

alter table sbtest1 add index idx_c(c)

After initiating the ALTER on pxc81, my processlist shows that the COMMIT and UPDATE (from sysbench) statements are paused and only the ALTER is in progress. The COMMIT and UPDATE will resume once the ALTER is completed.

| 17 | root            | localhost | schema_changes | Execute |      15 | closing tables                           | COMMIT                                 |         0 |             0 |
| 17 | root            | localhost | schema_changes | Execute |      15 | updating                                 | UPDATE sbtest1 SET c='91668836759-30934071579-18064439108-53267873872-79461377960-32104006456-143369 |         0 |             1 |
| 24 | root            | localhost | schema_changes | Query   |      15 | altering table                           | alter table sbtest1 add index idx_c(c) |         0 |             0 |

But the SELECT statement can still be run with “wsrep_sync_wait != 1”. With “wsrep_sync_wait = 1” the SELECT needs the causality checks from the other nodes, so it will fail.

SELECT with “wsrep_sync_wait=1”

|  1 | system user     |           | schema_changes | Query   |     0 | altering table         | alter table sbtest1 add index idx_c(c) |         0 |             0 |
| 15 | root            | localhost | schema_changes | Query   |    40 | starting               | select * from sbtest1 where id=1       |         0 |             0 |

mysql> select * from sbtest1 where id=1;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

TOI can be the right choice for quick operations.

  • CREATE STATEMENTS
  • RENAME INDEX
  • RENAME TABLE
  • DROP INDEX
  • ALGORITHM=INSTANT

Schema Changes with “wsrep_OSU_method = RSU”

RSU – Rolling Schema Upgrade

  • In this method, DDL statements are not replicated across the cluster nodes; you need to execute the DDL individually on each node.
  • The node that is executing the DDL will desync from the cluster group. The other nodes in the cluster remain operational and keep receiving application connections.
  • Once the node finishes the DDL, it will start applying the missing writesets.
  • In this method, the important thing is that WRITEs should not be performed on that particular table until the schema upgrade completes on all the nodes. Users should be very clear on this, because a failure will break the cluster and the data may be unrecoverable. 
  • The Gcache should be large enough to store the writesets.

Example:

At pxc82, I am going to execute the ALTER.

Session 1: (setting up the value to RSU – session-level) 

mysql> set wsrep_OSU_method=RSU;
Query OK, 0 rows affected (0.09 sec)

Session 2: (checking the node status)

mysql> show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_recv_queue    | 0      |
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
2 rows in set (0.00 sec)

Session 1: (executing the ALTER )

mysql> alter table sbtest1 add index idx_c(c);

Session 2: (checking again the node status )

Here the node went to Donor/Desynced state once the ALTER started. You can see the queue also keeps increasing.

mysql> nopager;  show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';
PAGER set to stdout
+---------------------------+----------------+
| Variable_name             | Value          |
+---------------------------+----------------+
| wsrep_local_recv_queue    | 2053           |
| wsrep_local_state_comment | Donor/Desynced |
+---------------------------+----------------+
2 rows in set (0.21 sec)

Session 1: (ALTER completed)

mysql> alter table sbtest1 add index idx_c(c);
Query OK, 0 rows affected (2 min 6.52 sec)
Records: 0  Duplicates: 0  Warnings: 0

Session 2: (Node synced to cluster)

mysql> show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_recv_queue    | 0      |
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
2 rows in set (0.00 sec)

This step needs to be executed in pxc81 and pxc83 as well. After completing on all nodes, we are good to allow the WRITEs for that table. 

The RSU method is not truly disruption-free, as there are a few bugs reported regarding RSU. Users should be very clear and careful about executing RSU for schema updates:

https://jira.percona.com/browse/PXC-2620

https://jira.percona.com/browse/PXC-2293

https://jira.percona.com/browse/PXC-1980

Schema Changes with “ONLINE ALGORITHMS”

So far, we have three ALTER algorithms:

  • INPLACE
  • COPY
  • INSTANT

With TOI:

“ALGORITHM = INPLACE / COPY” still pauses the cluster during the operation. Galera doesn’t allow transactions when an ALTER TABLE statement is run. So if you are using TOI, any ALTER TABLE will block all transactions on all nodes.

| 17 | root            | localhost | schema_changes | Execute |      12 | closing tables                           | COMMIT                                                               |         0 |             0 |
| 18 | root            | localhost | schema_changes | Execute |      12 | closing tables                           | COMMIT                                                               |         0 |             0 |
| 32 | root            | localhost | schema_changes | Query   |      13 | altering table                           | alter table sbtest1 add index idx_c(c), algorithm=inplace,

“ALGORITHM=INSTANT” is supported and faster in TOI.

mysql> alter table sbtest1 add column test_Ins int , algorithm=instant;
Query OK, 0 rows affected (0.24 sec)
Records: 0  Duplicates: 0  Warnings: 0

With RSU:

“ALGORITHM = INPLACE/COPY” is still not beneficial on RSU. It pauses the Galera replication and takes the node to Desync. 

mysql> show processlist;
| 62 | root            | localhost | schema_changes | Query   |    51 | altering table                       | alter table sbtest1 add index idx_c(c), algorithm=inplace, lock=none |         0 |             0 |
5 rows in set (0.06 sec)

mysql> nopager;  show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';
PAGER set to stdout
+---------------------------+----------------+
| Variable_name             | Value          |
+---------------------------+----------------+
| wsrep_local_recv_queue    | 7335           |
| wsrep_local_state_comment | Donor/Desynced |
+---------------------------+----------------+
2 rows in set (0.03 sec)

“ALGORITHM=INSTANT” is supported and faster in RSU. But, still, you can use TOI to avoid the additional work.

mysql> alter table sbtest1 add column test_Inss int , algorithm=instant;
Query OK, 0 rows affected (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select @@wsrep_OSU_method;
+--------------------+
| @@wsrep_OSU_method |
+--------------------+
| RSU                |
+--------------------+
1 row in set (0.02 sec)

I would suggest using “ALGORITHM=INSTANT” with TOI wherever you can, but make sure you are on MySQL 8.0 or later. Unfortunately, “ALGORITHM=INSTANT” currently only supports adding new columns. 

Schema Changes with “pt-osc”

pt-osc: pt-online-schema-change (from Percona Toolkit)

Personally, I like this approach very much and use it mostly in production environments. pt-osc provides non-blocking schema upgrades on all nodes in one shot. It should be used with the TOI method. The action flow looks like this:

  • Create a new table “_tablename_new” with the required modification.
  • Create triggers on the original table so that rows modified during the copy (INSERT/UPDATE/DELETE) are also applied to the new table.
  • Copy the records from the original table to the new table in chunks.
  • Once the copy is completed, swap the tables (original → _old, _new → original) and drop the triggers and the old table. Direct DDLs (RENAME TABLE, DROP TABLE) are used for this operation (wsrep_OSU_method=TOI). 

For the below ALTER, 

alter table schema_changes.sbtest1 add index idx_test_Ins(test_Ins);

Pt-osc flow in SQL syntax:

Creating new table...
CREATE TABLE `schema_changes`.`_sbtest1_new` (
  `id` int NOT NULL AUTO_INCREMENT,
  `k` int NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  `test_Ins` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=20400335 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Created new table schema_changes._sbtest1_new OK.
Altering new table...
ALTER TABLE `schema_changes`.`_sbtest1_new` add index idx_test_Ins(test_Ins)
Altered `schema_changes`.`_sbtest1_new` OK.
Not creating triggers because this is a dry run.
Not copying rows because this is a dry run.
INSERT LOW_PRIORITY IGNORE INTO `schema_changes`.`_sbtest1_new` (`id`, `k`, `c`, `pad`, `test_ins`) SELECT `id`, `k`, `c`, `pad`, `test_ins` FROM `schema_changes`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 9253 copy nibble*/
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `schema_changes`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) ORDER BY `id` LIMIT ?, 2 /*next chunk boundary*/
Not swapping tables because this is a dry run.
Not dropping old table because this is a dry run.
Not dropping triggers because this is a dry run.
DROP TRIGGER IF EXISTS `schema_changes`.`pt_osc_schema_changes_sbtest1_del`
DROP TRIGGER IF EXISTS `schema_changes`.`pt_osc_schema_changes_sbtest1_upd`
DROP TRIGGER IF EXISTS `schema_changes`.`pt_osc_schema_changes_sbtest1_ins`
2020-09-30T08:31:17 Dropping new table...
DROP TABLE IF EXISTS `schema_changes`.`_sbtest1_new`;
2020-09-30T08:31:17 Dropped new table OK.

pt-osc provides several options to make the operation effective. You can control the connections, active threads, load, chunk size, etc.

For Galera, we have the option “--max-flow-ctl”. This option checks the average time the cluster spent pausing for flow control (FC) and makes the tool pause if it goes over the percentage indicated in the option. By default, the tool does not check FC.

[root@pxc81 log]# less /bin/pt-online-schema-change  | grep -i pausing
         print STDERR "Pausing because PXC Flow Control is active\n";
         print STDERR "Pausing because "

To make schema changes on tables with FOREIGN KEYs, I would suggest using “--alter-foreign-keys-method=rebuild_constraints”. This helps maintain the consistency of the schema and its relations. In this approach, before dropping the old table, pt-osc runs ALTER on all the child tables to drop the existing FK and re-add new FK constraints that point to the columns of the new table. Again, adding and dropping the FOREIGN KEYs will be direct ALTERs using TOI.

Schema changes with “gh-ost”

gh-ost takes a similar approach to pt-osc. It also performs non-blocking ALTERs on all cluster nodes in one shot. The main difference is that gh-ost is triggerless: it uses the binary log to track the changes. So you need the following settings to perform a gh-ost operation. 

log-bin=sakthi-bin
binlog-format=ROW
log-slave-updates=ON

The flow will be like,

  • Creates the gh-ost shadow table with the required modifications.
  • Copies the records from the original table to the new table in chunks.
  • Applies the live changes by reading the DML events from the binary logs.
  • Once the binary log events are applied, it swaps the tables (original → _old, gh-ost → original) and drops the old table.

Example:

[root@pxc81 schema_changes]# gh-ost --alter="add index idx_test_Inss(test_Ins)" --database=schema_changes --table=sbtest1 --user=root --password=Jesus@7sakthI --allow-on-master --execute
[2020/09/30 09:40:56] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql 127.0.0.1 3306 root    false false <nil> false UTC true 0 0s 0s 0 false}
[2020/09/30 09:40:56] [info] binlogsyncer.go:354 begin to sync binlog from position (binlog.000027, 196850993)
[2020/09/30 09:40:56] [info] binlogsyncer.go:203 register slave for master server 127.0.0.1:3306
[2020/09/30 09:40:56] [info] binlogsyncer.go:723 rotate to (binlog.000027, 196850993)
# Migrating `schema_changes`.`sbtest1`; Ghost table is `schema_changes`.`_sbtest1_gho`
# Migrating pxc81:3306; inspecting pxc81:3306; executing on pxc81
# Migration started at Wed Sep 30 09:40:56 +0000 2020
# chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle 
# Serving on unix socket: /tmp/gh-ost.schema_changes.sbtest1.sock
Copy: 0/6563240 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: binlog.000027:196853401; Lag: 0.02s, State: migrating; ETA: N/A
Copy: 0/6563240 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: binlog.000027:196858195; Lag: 0.01s, State: migrating; ETA: N/A
Copy: 22000/6563240 0.3%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: binlog.000027:201067135; Lag: 0.01s, State: migrating; ETA: 9m58s

.......

Copy: 5682000/6563240 86.6%; Applied: 0; Backlog: 0/1000; Time: 16m10s(total), 16m10s(copy); streamer: binlog.000028:213168607; Lag: 0.01s, State: migrating; ETA: 2m30s
Copy: 6563000/6563240 100.0%; Applied: 0; Backlog: 0/1000; Time: 20m20s(total), 20m20s(copy); streamer: binlog.000028:382677405; Lag: 0.01s, State: migrating; ETA: 0s

gh-ost also provides several options to make the operation effective. You can control the connections, active threads, load, chunk size, etc.

But unfortunately, the “--max-flow-ctl” option is not available in gh-ost. 

Conclusion

So, finally, I would say,

  • Always use the direct ALTER with TOI for the metadata changes and INSTANT ALTERs.
  • Use pt-online-schema-change with TOI and use the optimal flow control thresholds for InnoDB tables.
  • Schedule pt-online-schema-change operation in off-peak hours for FOREIGN KEY tables.
  • If you use RSU, never forget that you need to execute the ALTER on all nodes individually and you should block WRITEs to that particular table. Make sure your Gcache size is large enough to hold the writesets.
  • If you are concerned about triggers, you can use gh-ost for the ALTERs.

Oct 5, 2020

Grab Your Percona Swag – For Free!

Percona Swag

Would you like to get the latest in Percona gear 100% free, shipped to you anywhere in the world? Maybe that sounds too good to be true, but it’s true!  It’s easy and takes as little as 20 minutes to earn your swag. Here are some examples of the swag items you can claim:

Percona Swag

So what’s the catch? Percona software products are now listed on four online software directories, but our listings are too new to have accumulated many user reviews. We need reviews!

So our offer is simple. You write one review, you pick one Percona swag item. You write two reviews, you pick two. Seven reviews, pick seven pieces of swag, our limit. But you must post your reviews by November 15, 2020!

Any meaningful review earns swag, be it positive, negative, or mixed. Write whatever you believe; just write something! There’s no swag for a review that gives a rating but says nothing at all or nothing meaningful, so make those reviews count!

Here’s where to post reviews:

  • Percona Monitoring and Management: Capterra, G2, TrustRadius, SourceForge
  • Percona Server for MySQL: Capterra, G2, TrustRadius, SourceForge
  • Percona XtraDB Cluster: Capterra, G2, TrustRadius, SourceForge
  • Percona XtraBackup: Capterra, G2, TrustRadius, SourceForge
  • Percona Distribution for PostgreSQL: G2, TrustRadius, SourceForge
  • Percona Backup for MongoDB: G2, TrustRadius, SourceForge
  • Percona Server for MongoDB: TrustRadius, SourceForge
  • Percona Kubernetes Operator for Percona XtraDB Cluster: G2, TrustRadius, SourceForge

You can review several different products and post them on one site, or you can write one product review and post it on multiple sites. Or post any combination of reviews, up to a maximum of seven. The more reviews you post, the more swag is delivered to your home address for free, courtesy of Percona.

To claim your swag, write to <community-team@percona.com>.  Include:

  • Links to each review you posted.
  • Your postal mailing address.
  • Your phone number (for delivery use only, never for marketing)

For t-shirt orders, also state:

  • Color (White, Black, or Blue)
  • Size (Small, Medium, Large, or Extra Large)

It’s that simple! Start writing now!

Oct 1, 2020

PostgreSQL 13 New Feature: dropdb --force

postgresql dropdb --force

There have been many big features added to PostgreSQL 13, like parallel vacuum, deduplication of B-tree indexes, etc., and a complete list can be found in the PostgreSQL 13 release notes. Along with the big features, there are also small ones, including dropdb --force.

dropdb --force

A new command-line option has been added to the dropdb command, and a corresponding FORCE option has been added to DROP DATABASE. When you drop a database using dropdb -f/--force, or DROP DATABASE ... WITH (FORCE), PostgreSQL terminates all existing connections to that database before dropping it.

In the first terminal, create a database named test and connect to it.

vagrant@vagrant:~$ createdb test;
vagrant@vagrant:~$ psql test
psql (13.0)
Type "help" for help.

In the second terminal, try to drop the test database and you will get the error message that the test database is being used by another user.

vagrant@vagrant:/usr/local/pgsql.13/bin$ psql postgres
psql (13.0)
Type "help" for help.
postgres=# drop database test;
ERROR:  database "test" is being accessed by other users
DETAIL:  There is 1 other session using the database.

Now try the same command with the FORCE option. You will see that the database is dropped successfully.

postgres=# drop database test WITH ( FORCE );
DROP DATABASE

Note: you can also use the command line dropdb test -f.

The session on the first terminal will be terminated.

test=# \d
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> 
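
If you are scripting this from an application, here is a minimal sketch using Python and psycopg2 (the host and credentials are placeholders). Note that DROP DATABASE cannot run inside a transaction block, so autocommit is required:

import psycopg2

conn = psycopg2.connect(host="localhost", user="postgres", password="mypassword", dbname="postgres")
conn.autocommit = True  # DROP DATABASE cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute("DROP DATABASE test WITH (FORCE)")
conn.close()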

Looking for more info on other PostgreSQL 13 changes? Check out Ibrar’s previous post, Postgresql_fdw Authentication Changes in PostgreSQL 13!

Oct 1, 2020

Webinar October 12: Percona Server for MongoDB – A Solid and Compatible Alternative to MongoDB

Percona Server MongoDB Italian Webinar

This webinar will be delivered in Italian.

MongoDB is the most widely used document-oriented database, but some of its most requested features are available only in the Enterprise version. Percona Server for MongoDB (PSMDB) is an alternative to MongoDB: an open source, solid, and 100% compatible server. In this webinar, delivered in Italian, you will learn about PSMDB, and our Italian financial customer, Fabrick, will discuss its experience with Percona Support Services.

Please join Corrado Pandiani, Sr. Consultant at Percona, and Maurizio La Plagia, Head of IT PFM & Smart Banking at Fabrick, on Monday, October 12, 2020, at 9 am EDT / 3 pm CET for the webinar “Percona Server for MongoDB: A Solid and Compatible Alternative to MongoDB“.

Register for Webinar

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

Oct 1, 2020

Altinity grabs $4M seed to build cloud version of ClickHouse open-source data warehouse

Earlier this month, cloud data warehouse Snowflake turned heads when it debuted on the stock market. Today, Altinity, the commercial company behind the open-source ClickHouse data warehouse, announced a $4 million seed round from Accel along with a new cloud service, Altinity.Cloud.

“Fundamentally, the company started out as an open-source services bureau offering support, training and [custom] engineering features into ClickHouse. And what we’re doing now with this investment from Accel is we’re extending it to offer a cloud platform in addition to the other things that we already have,” CEO Robert Hodges told TechCrunch.

As the company describes it, “Altinity.Cloud offers immediate access to production-ready ClickHouse clusters with expert enterprise support during every aspect of the application life cycle.” It also helps with application design and implementation and production assistance, in essence combining the consulting side of the house with the cloud service.

The company was launched in 2017 by CTO Alexander Zaitsev, who was one of the early adopters of ClickHouse. Up until now the startup has been bootstrapped with revenue from the services business.

Hodges came on board last year after a stint at VMware because he saw a company with tremendous potential, and his background in cloud services made him a good person to lead the company as it built the cloud product and moved into its next phase.

ClickHouse at its core is a relational database that can run in the cloud or on-prem with big improvements in performance, Hodges says. And he says that developers are enamored with it because you can start a project on a laptop and scale it up from there.

“We’re very simple to operate, just a single binary. You can start from a Docker image. You can run it anywhere, literally anywhere that Linux runs, from an Intel NUC all the way up to clusters with hundreds of nodes,” Hodges explained.

The investment from Accel should help them finish building the cloud product, which has been in private beta since July, while helping them build a sales and marketing operation to help sell it to the target enterprise market. The startup currently has 27 people, with plans to hire 15 more.

Hodges says that he wants to build a diverse and inclusive company, something he says the tech industry in general has failed at achieving. He believes that one of the reasons for that is the requirement of a computer science degree, which he says has created “a gate for women and people of color,” and he thinks by hiring people with more diverse backgrounds, you can build a more diverse company.

“So one of the things that’s high up on my list is to get back to a more equitable and diverse population of people working on this thing,” he said.

Over time, the company sees the cloud business overtaking the consulting arm in terms of revenue, but that aspect of the business will always have a role in the revenue mix because this is complex by its nature, even with a cloud service.

“Customers can’t just do it entirely by having a push-button interface. They will actually need humans that work with them, and help them understand how to frame problems, help them understand how to build applications that take care of that […] And then finally, help them deal with problems that naturally arise when you’re when you’re in production,” he said.

Sep 29, 2020

PostgreSQL Configuration Changes You Need to Make Post-Installation

PostgreSQL Configuration Changes

So you’ve installed postgres onto your machine, and you want to start working with it.

How?

The key to understanding the post-installation procedure is to realize that it “depends”.

  • It “depends” on the OS, i.e., Microsoft Windows vs. Linux.
  • It “depends” on the flavor of Linux, i.e., Debian vs. RedHat.
  • It “depends” on whether it’s a package install or built from source code.

Let’s start by working with the most basic steps common to all installs and we’ll break it down further from there.

A successfully installed postgres, no matter the version, is characterized by the following:

  1. a newly created datacluster is present
  2. a configuration file pg_hba.conf is to be edited
  3. a configuration file postgresql.conf is to be edited

There are other configuration files but we’ll work with these.

For the purposes of discussion let’s further assume you’ve started up the cluster and postgres is running on the host. Here’s an example of what you can see when you run a utility, such as netstat,  that reports the network connections:

$netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address   Foreign Address  State     PID/Program name
tcp     0      0    127.0.0.1:5432  0.0.0.0:*        LISTEN     27929/postgres

The first thing to look at is the “Local Address”. Notice how it says 127.0.0.1:5432. That means the server is currently listening on the localhost address on port 5432, but you want 0.0.0.0:5432, otherwise remote connections cannot be accepted. With an editor, open up the file pg_hba.conf and look at the “default” rules. Keep in mind that the configuration file can be located in one of several locations; we’ll cover that later.

ATTENTION: Setting the Address (CIDR) to 0.0.0.0 is for connectivity purposes only. As soon as you know everything works you should restrict this to as few permitted connections as possible. This is not something you should do on a production machine.

The actual “rules” per line can vary from one type of postgres installation to another. The good news is that RedHat/Centos look alike and all Debian/Ubuntu have their own similar styles too. The relevant settings are at the bottom of the file as all else above is commented documentation.

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
# IPv6 local connections:
host    all             all             ::1/128                 peer
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            md5
host    replication     all             ::1/128                 md5

Look at the first line, where TYPE is “local”. So long as you can log in locally via UNIX domain sockets and sudo as the superuser (postgres is the default), you can access your service without a password.

METHOD should be peer, but if it uses something else, like md5, you’ll need to change the string. Alternatively, if you feel particularly trustful of the other user accounts on the host, you can use the METHOD trust, permitting free access to all locally logged-in UNIX accounts.

# ATTENTION:
# the service must be reloaded for any edits to pg_hba.conf to take effect
#
$sudo su - postgres
$psql -c "select 'hello world' as greetings"
greetings
-------------
hello world

Looking at the second line, one sees that TYPE is IPv4. This rule, as well as the rule for TYPE IPv6, prevents localhost logins unless one knows the password:

$psql -h localhost -c "select 'hello world' as greetings"
Password for user postgres:

So let’s fix this by assigning a password to the ROLE postgres, logging in via UNIX domain sockets since we already permit logins with METHOD peer:

--
-- example invocation, change the password to something real
--
ALTER ROLE postgres WITH PASSWORD 'mypassword';

TIP: edits to pg_hba.conf require the service to reload the file, i.e. SIGHUP

Now that we have connectivity for localhost connections (we’re using an IPv4 socket for this example), we can proceed to address remote connections.

You’re going to need to add another rule which should be placed after the localhost rule:

host all all 0.0.0.0/0 md5

And here’s a line you can write for IPV6:

host all all ::0/0 md5

TIP: The demonstrated example rules let everybody connect to the host. A knowledge of CIDR is key to enforcing network security.

Keeping in mind that your system will be unique, here’s what the pg_hba.conf should start to look like:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
host    all             all             0.0.0.0/0               md5
# IPv6 local connections:
host    all             all             ::1/128                 md5
host    all             all             ::0/0                   md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            md5
host    replication     all             ::1/128                 md5

You’re almost there!

Now that you’ve added a password to the superuser and updated the configuration file pg_hba.conf, it’s time to visit another configuration file postgresql.conf.

Locate the file and edit the runtime parameter listen_addresses. The default setting prohibits remote connections. Resetting the value either to a NIC’s IP address or to the wildcard '*' will make it accessible.

TIP: As postgres, execute the following in a psql session in order to locate your configuration files.

select distinct sourcefile from pg_settings;

For those people feeling fancy, one can bind the postgres service to more than one IP address as a comma-separated list:

listen_addresses = '*'
#listen_addresses = 'localhost' # what IP address(es) to listen on;
                                # comma-separated list of addresses;
                                # defaults to 'localhost'; use '*' for all
                                # (change requires restart)

An alternate method of updating runtime parameters is the following SQL statement:

postgres=# ALTER SYSTEM SET listen_addresses = '*';
ALTER SYSTEM

The final step, restarting the service, is where we start splitting hairs again:

  • RedHat distributions require the datacluster to be manually created before the service can be administered.
  • Debian distributions of PostgreSQL, including Ubuntu, automatically create and start up the datacluster.

Systemd:

  • Redhat/Centos:
    /usr/pgsql-12/bin/postgresql-12-setup initdb
    systemctl start|stop postgresql-12
  • Debian/Ubuntu:
    systemctl restart postgresql

Debian-derived Linux distributions include a collection of command-line utilities to administer the PostgreSQL service:

# example CLI
#
pg_ctlcluster
Usage: /usr/bin/pg_ctlcluster <version> <cluster> <action> [-- <pg_ctl options>]
# restarting postgres version 12 on a Debian derived distribution
pg_ctlcluster 12 main restart

After a successful service restart you should get something similar to the following:

Active Internet connections (only servers)
Proto Recv-Q Send-Q  Local Address    Foreign Address  State   PID/Program name
tcp     0      0      0.0.0.0:5432    0.0.0.0:*        LISTEN  27929/postgres

And finally, the remote connectivity test:

#
# THE REMOTE LOGIN
#
psql 'host=myhost user=postgres password=mypassword' -c "select 'hello world' as greetings"
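
The same connectivity check can be made from application code. Here is a minimal sketch with Python and psycopg2, using the same placeholder host and password as the psql example above:

import psycopg2

conn = psycopg2.connect(host="myhost", user="postgres", password="mypassword", dbname="postgres")
with conn.cursor() as cur:
    cur.execute("select 'hello world' as greetings")
    print(cur.fetchone()[0])
conn.close()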

Then there’s replication, but that’s another blog altogether.

That’s it for now!


Our white paper “Why Choose PostgreSQL?” looks at the features and benefits of PostgreSQL and presents some practical usage examples. We also examine how PostgreSQL can be useful for companies looking to migrate from Oracle.

Download PDF

Sep 24, 2020

Webinar October 14: Percona, AWS, & ScienceLogic – Converting DBaaS to a Fully Managed Solution

Converting DBaaS to a Fully Managed Solution

Database-as-a-service (DBaaS) can be thought of as a platform used to manage an organization’s database environment(s). One of the most well-known DBaaS platforms is Amazon Aurora on AWS.

In this webinar, Ananias Tsalouchidis, Senior MySQL DBA at Percona,  will discuss how Percona can convert your DBaaS to a fully-managed solution ensuring that Aurora is properly optimized for better application performance, creating proper architecture and design, achieving better monitoring and troubleshooting, and performing other mission-critical platform operations.

We’ll be joined by Richard Chart, Chief Scientist at ScienceLogic, who will discuss the hands-on implications of working with Percona and utilizing our technology to support their growing needs. In addition, we’ll be joined by technical experts Vijay Karumajji and Aditya Samant, MySQL Specialist Solutions Architects from AWS, who’ll touch base on the many benefits of Aurora.

Please join Ananias Tsalouchidis, Richard Chart, Vijay Karumajji, and Aditya Samant on Wednesday, October 14, 2020, at 1:00 pm EDT for the webinar “Percona, AWS, & ScienceLogic – Converting DBaaS to a Fully Managed Solution“.

Register for Webinar

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.
