Feb
20
2023
--

PostgreSQL Migration From DigitalOcean DBaaS to DigitalOcean Droplet

PostgreSQL Migration

Recently, one of our customers approached us with a unique challenge: they needed to migrate their entire PostgreSQL cluster from DigitalOcean’s Database as a Service (DBaaS) to a DigitalOcean Droplet. The reason for their migration from DBaaS to Droplets was to lower their cost. This task proved to be quite challenging, as DigitalOcean’s documentation clearly states that “We do not currently support migrating databases from clusters inside of DigitalOcean to other clusters inside of DigitalOcean.”

In short, we had to migrate the database as per the client’s request, and we gave them two options:

1. pg_dump

2. Logical replication

The pg_dump method requires downtime as we must take the dump and restore it on the new server. Logical replication keeps the source database operational while transferring data to the target database. Once we reach the desired state, we can cut over to the target database.

To migrate using the logical replication method, all tables that need to be replicated must have a primary key or unique key.
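One way to find such tables up front is to query the system catalogs for user tables that have neither a primary key nor a unique constraint. This is a sketch, not a query from the original migration:

SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (SELECT 1 FROM pg_constraint con
                  WHERE con.conrelid = c.oid AND con.contype IN ('p', 'u'));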

Prerequisites for migration

To migrate the existing database, we need to ensure logical replication is enabled on the source database, have the source database’s connection credentials, and disable or update any firewalls between the databases.

Have superuser permissions: To prepare a database for migration and to carry out the migration, we need superuser permissions on the source database.

Make the database publicly accessible: To migrate a database, the source database’s hostname or IP address must be accessible from the public internet. Public connection information for DigitalOcean databases is in the database’s Connection Details in the control panel.

Allow remote connections: First, verify that the database allows all remote connections. This is determined by the database’s listen_addresses variable, which allows all remote connections when its value is set to *. To check its current value, run the following query in the PostgreSQL (psql) terminal:

SHOW listen_addresses;
If enabled, the command line returns:
listen_addresses
-----------
*
(1 row)

If the output is different, allow remote connections in your database by running the following query (this change takes effect only after a server restart):

ALTER SYSTEM SET listen_addresses = '*';

We must also change the local IPv4 connection rules to allow all incoming IPs. To do this, find the configuration file pg_hba.conf with the following query:

SHOW hba_file;

Open pg_hba.conf in your text editor, such as nano: nano pg_hba.conf

Under IPv4 local connections, find and replace the IP address with 0.0.0.0/0, which allows all IPv4 addresses:

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# IPv4 local connections:
host    all             all             0.0.0.0/0               md5
# IPv6 local connections:
host    all             all             ::/0                    md5
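After saving pg_hba.conf, the new rules must be loaded before they take effect. A minimal way to do that from the psql terminal, assuming superuser access, is:

SELECT pg_reload_conf();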

Enable logical replication:

Most cloud database providers have logical replication enabled by default. Logical replication may not be enabled if you are migrating a database from an on-premises server. If your database is not set up for logical replication, the migration process will not work because the migration can only move your schemas, not your data.

To verify that logical replication has been enabled, run the following query in the PostgreSQL (psql) terminal:

show wal_level;
If enabled, the output returns:
wal_level
-----------
logical
(1 row)
If the output is different, enable logical replication in your database by setting wal_level to logical (this change also requires a server restart):
ALTER SYSTEM SET wal_level = logical;

Change max replication slots:

After enabling logical replication, we need to verify that the database’s max_replication_slots value is equal to or greater than the number of databases in the PostgreSQL server. To check the current value, run the following query in the PostgreSQL (psql) terminal:

show max_replication_slots;

The output returns:

max_replication_slots
-----------

(1 row)

If it is smaller than the number of databases in the PostgreSQL server, adjust it by running the following query, where use_your_number is the number of databases in the server:

ALTER SYSTEM SET max_replication_slots = use_your_number;

And restart the server.
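The migration summary later in this post also adjusts max_wal_senders and max_logical_replication_workers, which likewise take effect only after a restart. A sketch with illustrative values, not the customer’s actual settings:

ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_logical_replication_workers = 4;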

Challenges we faced during migration

There are some challenges when implementing logical replication for tables that do not have a primary key. There are two ways to handle such tables: use a unique key as the replica identity, or set the replica identity to FULL.

The unique key approach follows a similar set of steps to the primary key approach and works in much the same way. Here, instead of the primary key, the unique key identifies rows for updates and deletes.

Caveats

  • Logical replication does not support DELETE/UPDATE on tables without a replica identity.
  • A unique index cannot be used as the replica identity if its columns allow NULLs.
  • Setting REPLICA IDENTITY to FULL:
  • When no appropriate index is available for the replica identity, we may set the replica identity to FULL. In this case, all the table’s columns collectively act as the row identifier.
  • Because every column of each changed row is logged, this generates a large amount of WAL.
  • This may be slower than replication based on a primary key or unique index.

Things to consider

We need to set the replica identity (for example, to FULL) for the tables that are logically replicated using only a UNIQUE key, as otherwise DELETE/UPDATE won’t be supported.
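A minimal sketch of the two options, using a hypothetical table and unique index (not the customer’s actual objects):

-- Use an existing unique index built on NOT NULL columns:
ALTER TABLE public.orders REPLICA IDENTITY USING INDEX orders_order_no_key;

-- Or fall back to FULL when no suitable index exists (every column of each changed row is logged):
ALTER TABLE public.orders REPLICA IDENTITY FULL;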

After the data is synced from the DBaaS fork to the new Droplet VM, we need to use pg_dump and pg_restore for the sequences. A question arises here: why do we need to dump the sequences, and why can’t we replicate them via logical replication?

Logical replication is designed to track WAL changes and report current states and values to subscribers. Replicating a sequence would be contradictory because the current sequence value does not necessarily equal the value stored in the WAL, so sequence data is not replicated. To remedy this, the PostgreSQL documentation suggests manually copying over the sequence values or using a utility such as pg_dump to do the copying; a minimal SQL sketch follows the list below.

  • Dump the sequences from the DBaaS DB fork
  • Stop the DBaaS DB fork
  • Restore the sequences on the new droplet
  • Disable the logical subscriptions
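One way to capture the sequence values as SQL on the source, as a sketch (assuming PostgreSQL 10 or later, where the pg_sequences view exists; this is not the exact command used in the migration):

SELECT format('SELECT setval(%L, %s, true);',
              schemaname || '.' || sequencename, last_value)
FROM pg_sequences
WHERE last_value IS NOT NULL;

The generated statements (for example, SELECT setval('public.orders_id_seq', 123456, true); for a hypothetical sequence) are then run on the destination before enabling writes there.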

Below is a short summary of the process that was followed to migrate the environment:

Source: DigitalOcean DBaaS
Destination: DigitalOcean Droplet
Process:

  • The client has chosen migration via a logical replication process to reduce downtime.
  • On the target VM, we installed Percona Distribution for PostgreSQL 13.7.
  • Dumped the roles from the source (DBaaS) and restored them on the destination.
  • Listed the tables that don’t have a PK and informed the client.
  • The client added a PK for some tables and a UNIQUE key for others.
  • Installed on the VM the extensions that were present on the source cluster.
  • Dumped only the schema from the source, i.e., DBaaS.
  • Restored the schema on the destination, i.e., the Droplet.
  • Adjusted the logical replication-related parameters on the source and destination, like max_replication_slots, max_logical_replication_workers, and max_wal_senders.
  • Configured logical replication by creating the publication and subscription between the source and destination (see the sketch after this list).
  • Once the destination was in sync, disabled the subscription.
  • Dumped the sequences from the source and restored them on the destination.
  • Adjusted listen_addresses and the pg_hba.conf file on the destination.
  • Dropped the subscription on the destination.
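For reference, a hypothetical sketch of that publication/subscription setup; the object names and connection details below are placeholders, not the customer’s actual configuration:

-- On the source (DBaaS) database:
CREATE PUBLICATION mig_pub FOR ALL TABLES;

-- On the destination (Droplet) database, after restoring the schema:
CREATE SUBSCRIPTION mig_sub
  CONNECTION 'host=source-host port=25060 dbname=appdb user=doadmin password=*** sslmode=require'
  PUBLICATION mig_pub;

-- Monitor the initial copy and streaming state of each table:
SELECT srrelid::regclass, srsubstate FROM pg_subscription_rel;

-- After the cutover:
ALTER SUBSCRIPTION mig_sub DISABLE;
DROP SUBSCRIPTION mig_sub;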

Conclusion

As we all know, PostgreSQL is an open source, object-relational database built with a focus on extensibility, data integrity, and speed. Its concurrency support makes it fully ACID-compliant. We achieved the goal of migrating the customer’s data from DBaaS to Droplets by using one of PostgreSQL’s great features, logical replication, and by dumping the sequences from the source and restoring them on the destination.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Jun
30
2021
--

Device42 introduces multi-cloud migration analysis and recommendation tool

In 2020 lots of workloads shifted to the cloud due to the pandemic, but that doesn’t mean that figuring out how to migrate those workloads got any easier. Device42, a startup that helps companies understand their infrastructure, has a new product that is designed to analyze your infrastructure and make recommendations about the most cost-effective way to migrate to the cloud.

Raj Jalan, CEO and co-founder, says that the tool uses machine learning to help discover the best configuration, and supports four cloud vendors including AWS, Microsoft, Google and Oracle plus VMware running on AWS.

“The [new tool] that’s coming out is a multi-cloud migration and recommendation [engine]. Basically, with machine learning what we have done is in addition to our discovery tool […] is we can constantly update based on your existing utilization of your resources, what it is going to cost you to run these resources across each of these multiple clouds,” Jalan explained.

This capability builds on the company’s core competency, which is providing a map of resources wherever they exist along with the dependencies that exist across the infrastructure, something that’s extremely hard for organizations to understand. “Our focus is on hybrid IT discovery and dependency mapping, [whether the] infrastructure is on prem, in colocation facilities or in the cloud,” he said.

That helps Device42 customers see how all of the different pieces of infrastructure including applications work together. “You can’t find a tool that does everything together, and also gives you a very deep discovery where you can go from the physical layer all the way to the logical layer, and see things like, ‘this is where my storage sits on this web server…’,” Jalan said.

It’s important to note that this isn’t about managing resources or making any changes to allocation. It’s about understanding your entire infrastructure wherever it lives and how the different parts fit together, while the newest piece finds the most cost-effective way to migrate it to the cloud from its current location.

The company has been around since 2012 and has around 100 employees. It has raised around $38 million, including a $34 million Series A in 2019. It hasn’t required a ton of outside investment, as Jalan reports they are cash flow positive with “decent growth.”

Mar
18
2020
--

Big opening for startups that help move entrenched on-prem workloads to the cloud

AWS CEO Andy Jassy showed signs of frustration at his AWS re:Invent keynote address in December.

Customers weren’t moving to the cloud nearly fast enough for his taste, and he prodded them to move along. Some of their hesitation, as Jassy pointed out, was due to institutional inertia, but some of it also was due to a technology problem related to getting entrenched, on-prem workloads to the cloud.

When a challenge of this magnitude presents itself and you have the head of the world’s largest cloud infrastructure vendor imploring customers to move faster, you can be sure any number of players will start paying attention.

Sure enough, cloud infrastructure vendors have developed new migration solutions to help break that big data logjam. Large consulting firms like Accenture and Deloitte are also happy to help your company deal with migration issues, but this opportunity also offers a big opening for startups aiming to solve the hard problems associated with moving certain workloads to the cloud.

Think about problems like getting data off of a mainframe and into the cloud or moving an on-prem data warehouse. We spoke to a number of experts to figure out where this migration market is going and if the future looks bright for cloud-migration startups.

Cloud-migration blues

It’s hard to nail down exactly the percentage of workloads that have been moved to the cloud at this point, but most experts agree there’s still a great deal of growth ahead. Some of the more optimistic projections have pegged it at around 20%, with the U.S. far ahead of the rest of the world.

Mar
05
2020
--

Etsy’s 2-year migration to the cloud brought flexibility to the online marketplace

Founded in 2005, Etsy was born before cloud infrastructure was even a thing.

As the company expanded, it managed all of its operations in the same way startups did in those days — using private data centers. But a couple of years ago, the online marketplace for crafts and vintage items decided to modernize and began its journey to the cloud.

That decision coincided with the arrival of CTO Mike Fisher in July 2017. He was originally brought in as a consultant to look at the impact of running data centers on Etsy’s ability to innovate. As you might expect, he concluded that it was having an adverse impact and began a process that would lead to him being hired to lead a long-term migration to the cloud.

That process concluded last month. This is the story of how a company born in data centers made the switch to the cloud, and the lessons it offers.

Stuck in a hardware refresh loop

When Fisher walked through the door, Etsy operated out of private data centers. It was not even taking advantage of a virtualization layer to maximize the capacity of each machine. The approach meant IT spent an inordinate amount of time on resource planning.

Feb
20
2019
--

Why Daimler moved its big data platform to the cloud

Like virtually every big enterprise company, a few years ago, the German auto giant Daimler decided to invest in its own on-premises data centers. And while those aren’t going away anytime soon, the company today announced that it has successfully moved its on-premises big data platform to Microsoft’s Azure cloud. This new platform, which the company calls eXtollo, is Daimler’s first major service to run outside of its own data centers, though it’ll probably not be the last.

As Daimler’s head of its corporate center of excellence for advanced analytics and big data Guido Vetter told me, the company started getting interested in big data about five years ago. “We invested in technology — the classical way, on-premise — and got a couple of people on it. And we were investigating what we could do with data because data is transforming our whole business as well,” he said.

By 2016, the size of the organization had grown to the point where a more formal structure was needed to enable the company to handle its data at a global scale. At the time, the buzz phrase was “data lakes” and the company started building its own in order to build out its analytics capacities.

Electric lineup, Daimler AG

“Sooner or later, we hit the limits as it’s not our core business to run these big environments,” Vetter said. “Flexibility and scalability are what you need for AI and advanced analytics and our whole operations are not set up for that. Our backend operations are set up for keeping a plant running and keeping everything safe and secure.” But in this new world of enterprise IT, companies need to be able to be flexible and experiment — and, if necessary, throw out failed experiments quickly.

So about a year and a half ago, Vetter’s team started the eXtollo project to bring all the company’s activities around advanced analytics, big data and artificial intelligence into the Azure Cloud, and just over two weeks ago, the team shut down its last on-premises servers after slowly turning on its solutions in Microsoft’s data centers in Europe, the U.S. and Asia. All in all, the actual transition between the on-premises data centers and the Azure cloud took about nine months. That may not seem fast, but for an enterprise project like this, that’s about as fast as it gets (and for a while, it fed all new data into both its on-premises data lake and Azure).

If you work for a startup, then all of this probably doesn’t seem like a big deal, but for a more traditional enterprise like Daimler, even just giving up control over the physical hardware where your data resides was a major culture change and something that took quite a bit of convincing. In the end, the solution came down to encryption.

“We needed the means to secure the data in the Microsoft data center with our own means that ensure that only we have access to the raw data and work with the data,” explained Vetter. In the end, the company decided to use the Azure Key Vault to manage and rotate its encryption keys. Indeed, Vetter noted that knowing that the company had full control over its own data was what allowed this project to move forward.

Vetter tells me the company obviously looked at Microsoft’s competitors as well, but he noted that his team didn’t find a compelling offer from other vendors in terms of functionality and the security features that it needed.

Today, Daimler’s big data unit uses tools like HDInsight and Azure Databricks, which cover more than 90 percent of the company’s current use cases. In the future, Vetter also wants to make it easier for less experienced users to use self-service tools to launch AI and analytics services.

While cost is often a factor that counts against the cloud, because renting server capacity isn’t cheap, Vetter argues that this move will actually save the company money and that storage costs, especially, are going to be cheaper in the cloud than in its on-premises data center (and chances are that Daimler, given its size and prestige as a customer, isn’t exactly paying the same rack rate that others are paying for the Azure services).

As with so many big data AI projects, predictions are the focus of much of what Daimler is doing. That may mean looking at a car’s data and error code and helping the technician diagnose an issue or doing predictive maintenance on a commercial vehicle. Interestingly, the company isn’t currently bringing to the cloud any of its own IoT data from its plants. That’s all managed in the company’s on-premises data centers because it wants to avoid the risk of having to shut down a plant because its tools lost the connection to a data center, for example.

Jan
08
2019
--

Amazon reportedly acquired Israeli disaster recovery service CloudEndure for around $200M

Amazon has reportedly acquired Israeli disaster recovery startup CloudEndure. Neither company has responded to our request for confirmation, but we have heard from multiple sources that the deal has happened. While some outlets have been reporting the deal was worth $250 million, we are hearing it’s closer to $200 million.

The company provides disaster recovery for cloud customers. You may be thinking that disaster recovery is precisely why we put our trust in cloud vendors. If something goes wrong, it’s the vendor’s problem — and you would be right to make this assumption, but nothing is simple. If you have a hybrid or multi-cloud scenario, you need to have ways to recover your data in the event of a disaster like weather, a cyberattack or political issue.

That’s where a company like CloudEndure comes into play. It can help you recover and get back up and running in another place, no matter where your data lives, by providing continuous backup and migration between clouds and private data centers. While CloudEndure currently works with AWS, Azure and Google Cloud Platform, it’s not clear if Amazon would continue to support these other vendors.

The company was backed by Dell Technologies Capital, Infosys and Magma Venture Partners, among others. Ray Wang, founder and principal analyst at Constellation Research, says Infosys recently divested its part of the deal and that might have precipitated the sale. “So much information is sitting in the cloud that you need backups and regions to make sure you have seamless recovery in the event of a disaster,” Wang told TechCrunch.

While he isn’t clear what Amazon will do with the company, he says it will test just how open it is. “If you have multi-cloud and want your on-prem data backed up, or if you have backup on one cloud like AWS and want it on Google or Azure, you could do this today with CloudEndure,” he said. “That’s why I’m curious if they’ll keep supporting Azure or GCP,” he added.

CloudEndure was founded in 2012 and has raised just over $18 million. Its most recent investment came in 2016 when it raised $6 million, led by Infosys and Magma.

May
09
2018
--

Google to acquire cloud migration startup Velostrata

Google announced today it was going to acquire Israeli cloud migration startup, Velostrata. The companies did not share the purchase price.

Velostrata helps companies migrate from on-premises datacenters to the cloud, a common requirement today as companies try to shift more workloads to the cloud. It’s not always a simple matter though to transfer those legacy applications, and that’s where Velostrata could help Google Cloud customers.

As I wrote in 2014 about their debut, the startup figured out a way to decouple storage and compute and that had wide usage and appeal. “The company has a sophisticated hybrid cloud solution that decouples storage from compute resources, leaving the storage in place on-premises while running a virtual machine in the cloud,” I wrote at the time.

But more than that, in a hybrid world where customer applications and data can live in the public cloud or on prem (or a combination), Velostrata gives them control to move and adapt the workloads as needed and prepare them for delivery on cloud virtual machines.

“This means [customers] can easily and quickly migrate virtual machine-based workloads like large databases, enterprise applications, DevOps, and large batch processing to and from the cloud,” Eyal Manor, VP of engineering at Google Cloud, wrote in the blog post announcing the acquisition.

This of course takes Velostrata from being a general purpose cloud migration tool to one tuned specifically for Google Cloud in the future, but one that gives Google a valuable tool in its battle to gain cloud marketshare.

In the past, Google Cloud head Diane Greene has talked about the business opportunities they have seen in simply “lifting and shifting” data loads to the cloud. This acquisition gives them a key service to help customers who want to do that with the Google Cloud.

Velostrata was founded in 2014. It has raised over $31 million from investors including Intel Capital and Norwest Venture Partners.

Nov
28
2017
--

VMware expands AWS partnership with new migration and disaster recovery tools

Remember how VMware was supposed to be disrupted by AWS? Somewhere along the way it made a smart move. Instead of fighting the popular cloud platform, it decided to make it easier for IT to use its products on AWS. Today, at the opening of the AWS re:Invent customer conference, it announced plans to expand that partnership with some new migration and disaster recovery services. As Mark…

Dec
09
2015
--

CloudEndure Disaster Recovery Service Secures $7 Million Investment

Disasters can take many forms, from weather events to database corruptions. CloudEndure, a cloud-based disaster recovery service, announced a $7 million investment today led by Indian consulting firm Infosys and previous investor Magma Venture Partners. Today’s investment brings the total to just over $12 million. At first blush, Infosys may seem like an odd partner, a traditional…
