Jan
06
2021
--

Google Cloud Platform: MySQL at Scale with Reliable HA Webinar Q&A

MySQL at Scale with Reliable HA

Back in November, we had a chance to present the webinar “Google Cloud Platform: MySQL at Scale with Reliable HA.” We discussed different approaches to hosting MySQL on Google Cloud Platform, along with the pros and cons of the available options. This webinar was recorded and can be viewed here at any time. We had several great questions, which we would like to address here, elaborating on the answers given during the webinar.


Q: What is your view on Cloud SQL High Availability in Google Cloud?

A: Google Cloud SQL provides High Availability through regional instances. If your Cloud SQL database is regional, it means that there’s a standby instance in another zone within the same region. Both instances (primary and standby) are kept synced through synchronous replication on the persistent disk level. Thanks to this approach, in case of an unexpected failover, no data is lost. The biggest disadvantage of this approach is that you have to pay for standby resources even though you can’t use the standby instance for any traffic, which means you double your costs with no performance benefits. Failover typically takes more than 30 seconds.

To sum up, High Availability in Google Cloud SQL is reliable but can be expensive, and failover is not always fast enough for critical applications.

 

Q: How would one migrate from Google Cloud SQL to AWS RDS?

A: If you can afford downtime, the easiest way to migrate is to stop the write workload on the Cloud SQL instance, take a logical backup (mysqldump or mydumper), restore it on AWS RDS, and then move the entire workload to AWS RDS. In most cases, though, that is not enough, and the situation is more complex when you want to migrate with no (or minimal) downtime.

To avoid downtime, you need to establish replication between your Cloud SQL instance (source) and your RDS instance (replica). Cloud SQL can be used as a source instance for external replicas, as described in this documentation. You can take a logical backup from a running Cloud SQL instance (e.g., using mydumper), restore it to RDS, and establish replication between Cloud SQL and RDS. Using an external source for RDS is described here. It’s typically a good idea to use a VPN connection between the two clouds to ensure your connection is secure and the database is not exposed to the public internet. Once replication is established, the steps are as follows:

  • Stop write traffic on Google Cloud SQL instance
  • Wait for replication to catch up (sync all binlogs)
  • Make RDS instance writable and stop Cloud SQL -> RDS replication
  • Move write traffic to the RDS instance
  • Decommission your Cloud SQL instance
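
The “wait for replication to catch up” step above can be checked programmatically before you make RDS writable. The following is a minimal sketch in Python, assuming you have already fetched one row of SHOW SLAVE STATUS from the RDS replica as a dictionary. How you fetch it and the helper name replica_caught_up are illustrative assumptions, not something covered in the webinar. The check only makes sense after write traffic on the Cloud SQL source has been stopped.

```python
from typing import Any, Mapping


def replica_caught_up(status: Mapping[str, Any]) -> bool:
    """Return True when a SHOW SLAVE STATUS row indicates the replica has
    applied everything it has received from the source.

    `status` is assumed to be one row of SHOW SLAVE STATUS as a dict,
    e.g. fetched with a dictionary cursor from your MySQL client library.
    """
    # Both replication threads must be running.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False

    # Seconds_Behind_Master is NULL (None) when replication is not running.
    lag = status.get("Seconds_Behind_Master")
    if lag is None or int(lag) != 0:
        return False

    # The SQL thread must have executed up to the binlog file and position
    # that the IO thread has read from the source.
    read_at = (status.get("Master_Log_File"), status.get("Read_Master_Log_Pos"))
    applied_at = (
        status.get("Relay_Master_Log_File"),
        status.get("Exec_Master_Log_Pos"),
    )
    return None not in read_at and read_at == applied_at
```

A result of True here only says the replica has applied what it received; it is still worth comparing the replica's position against the binlog position reported on the Cloud SQL side before decommissioning anything.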

AWS DMS service can also be used as an intermediary in this operation.

 

Q: Is replication possible cross-cloud, e.g., Google Cloud SQL to AWS RDS, AWS RDS to Google Cloud SQL? If GCP is down, will RDS act as a primary and vice versa?

A: In general, replication between clouds is possible (see the previous question). Both Google Cloud SQL and AWS RDS can act as source and replica, including external instances as a part of your replication topology. High-availability solutions, though, are in both cases very specific to the cloud provider’s implementation, and they can’t cooperate. So it’s not possible to fail over automatically from RDS to GCP or vice versa. For such setups, we would recommend a custom installation on Google Compute Engine and AWS EC2 instances, with Percona Managed Database Services if you don’t want to manage such a complex setup on your own.



Q: How did you calculate IOPS and throughput for the storage options?

A: We did not calculate the presented values in any way. Those are taken directly from Google Cloud Platform Documentation.


Q: How does GCP achieve synchronous replication?

A: Synchronous replication is possible only between the primary and its respective standby instance; it’s impossible to have synchronous replication between the primary and your read replicas. Each instance has its own persistent disk. Those disks are kept in sync, so replication happens on the storage layer, not the database layer. No implementation details about how this works are available.


Q: Could you explain how to keep the primary instance available and writable during the maintenance window?

A: It’s not possible to guarantee the primary instance’s availability. Remember that even if you choose a maintenance window during which you can accept downtime, it may or may not be followed (it’s just a preference). Maintenance events can happen at any point in time if they’re critical, and they may not be finished during the assigned window. If your application can’t accept that, we recommend designing a highly available solution instead, e.g., with Percona XtraDB Cluster on Google Compute Engine instances. Such a solution won’t have these maintenance window problems.

Dec
02
2020
--

Google acquires Actifio to step into the area of data management and business continuity

In the same week that Amazon is holding its big AWS confab, Google is also announcing a move to raise its own enterprise game with Google Cloud. Today the company announced that it is acquiring Actifio, a data management company that helps companies with data continuity to be better prepared in the event of a security breach or other need for disaster recovery. The deal squares Google up as a competitor against the likes of Rubrik, another big player in data continuity.

The terms of the deal were not disclosed in the announcement; we’re looking and will update as we learn more. Notably, when the company was valued at over $1 billion in a funding round back in 2014, it had said it was preparing for an IPO (which never happened). PitchBook data estimated its value at $1.3 billion in 2018, but earlier this year it appeared to be raising money at about a 60% discount to its recent valuation, according to data provided to us by Prime Unicorn Index.

The company was also involved in a patent infringement suit against Rubrik, which it also filed earlier this year.

It had raised around $461 million, with investors including Andreessen Horowitz, TCV, Tiger, 83 North, and more.

With Actifio, Google is moving into what is one of the key investment areas for enterprises in recent years. The growth of increasingly sophisticated security breaches, coupled with stronger data protection regulation, has given a new priority to the task of holding and using business data more responsibly, and business continuity is a cornerstone of that.

Google describes the startup as a “leader in backup and disaster recovery” providing virtual copies of data that can be managed and updated for storage, testing, and more. The fact that it covers data in a number of environments (including SAP HANA, Oracle, Microsoft SQL Server, PostgreSQL, and MySQL, virtual machines (VMs) in VMware, Hyper-V, physical servers, and of course Google Compute Engine) means that it also gives Google a strong play to work with companies in hybrid and multi-vendor environments rather than just all-Google shops.

“We know that customers have many options when it comes to cloud solutions, including backup and DR, and the acquisition of Actifio will help us to better serve enterprises as they deploy and manage business-critical workloads, including in hybrid scenarios,” writes Brad Calder, VP, engineering, in the blog post. “In addition, we are committed to supporting our backup and DR technology and channel partner ecosystem, providing customers with a variety of options so they can choose the solution that best fits their needs.”

The company will join Google Cloud.

“We’re excited to join Google Cloud and build on the success we’ve had as partners over the past four years,” said Ash Ashutosh, CEO at Actifio, in a statement. “Backup and recovery is essential to enterprise cloud adoption and, together with Google Cloud, we are well-positioned to serve the needs of data-driven customers across industries.”

Oct
22
2020
--

Using Volume Snapshot/Clone in Kubernetes

Volume snapshot and clone Kubernetes

One of the most exciting storage-related features in Kubernetes is volume snapshot and clone. It allows you to take a snapshot of a data volume and later clone it into a new volume, which opens up a variety of possibilities, like instant backups or testing upgrades. This feature also brings Kubernetes deployments closer to cloud providers, which allow you to get volume snapshots with one click.

A word of caution: for a database, it still might be required to apply fsfreeze and FLUSH TABLES WITH READ LOCK or LOCK BINLOG FOR BACKUP.

It is much easier in MySQL 8 now because, as with atomic DDL, MySQL 8 should provide crash-safe, consistent snapshots without additional locking.

Let’s review how we can use this feature with Google Cloud Kubernetes Engine and Percona Kubernetes Operator for XtraDB Cluster.

First, the snapshot feature is still beta, so it is not available by default. You need to use GKE version 1.14 or later and you need to have the following enabled in your GKE: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver#enabling_on_a_new_cluster.

It is done by enabling “Compute Engine persistent disk CSI Driver“.

Now we need to create a Cluster using storageClassName: standard-rwo for PersistentVolumeClaims. So the relevant part in the resource definition looks like this:

persistentVolumeClaim:
        storageClassName: standard-rwo
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 11Gi

Let’s assume we have cluster1 running:

NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          49m
cluster1-haproxy-1                                 2/2     Running   0          48m
cluster1-haproxy-2                                 2/2     Running   0          48m
cluster1-pxc-0                                     1/1     Running   0          50m
cluster1-pxc-1                                     1/1     Running   0          48m
cluster1-pxc-2                                     1/1     Running   0          47m
percona-xtradb-cluster-operator-79d786dcfb-btkw2   1/1     Running   0          5h34m

And we want to clone a cluster into a new cluster, provisioning with the same dataset. Of course, it can be done using backup into a new volume, but snapshot and clone allow for achieving this much easier. There are still some additional required steps, I will list them as a Cheat Sheet.

1. Create VolumeSnapshotClass (I am not sure why this one is not present by default)

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
        name: onesc
driver: pd.csi.storage.gke.io
deletionPolicy: Delete

2. Create snapshot

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: snapshot-for-newcluster
spec:
  volumeSnapshotClassName: onesc
  source:
    persistentVolumeClaimName: datadir-cluster1-pxc-0

3. Clone into a new volume

Here I should note that we need to follow the volume naming convention used by Percona XtraDB Cluster Operator, which is:

datadir-<CLUSTERNAME>-pxc-0

Where CLUSTERNAME is the name used when we create the cluster. So now we can clone the snapshot into a volume named:

datadir-newcluster-pxc-0

Where newcluster is the name of the new cluster.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-newcluster-pxc-0
spec:
  dataSource:
    name: snapshot-for-newcluster
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 11Gi

Important: the storageClassName, accessModes, and storage size in the volume spec should match the original volume.
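
One way to avoid a mismatch is to derive the clone’s claim from the original claim instead of typing the values by hand. Below is a minimal sketch in Python, assuming the original claim has been exported as JSON (for example with kubectl get pvc datadir-cluster1-pxc-0 -o json) and loaded into a dictionary. The function name clone_claim_manifest and the inline sample data are hypothetical and only illustrate the idea; the field values mirror the manifests shown above.

```python
import json
from typing import Any, Dict


def clone_claim_manifest(
    original_pvc: Dict[str, Any], snapshot_name: str, new_cluster: str
) -> Dict[str, Any]:
    """Build a PVC manifest that restores `snapshot_name` into the datadir
    volume of `new_cluster`, copying the spec fields that must match."""
    spec = original_pvc["spec"]
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        # Naming convention from the post: datadir-<CLUSTERNAME>-pxc-0
        "metadata": {"name": f"datadir-{new_cluster}-pxc-0"},
        "spec": {
            "dataSource": {
                "name": snapshot_name,
                "kind": "VolumeSnapshot",
                "apiGroup": "snapshot.storage.k8s.io",
            },
            # Copied from the original claim so they cannot drift.
            "storageClassName": spec["storageClassName"],
            "accessModes": list(spec["accessModes"]),
            "resources": {
                "requests": {"storage": spec["resources"]["requests"]["storage"]}
            },
        },
    }


# Sample data standing in for the exported original claim (assumption).
original = {
    "spec": {
        "storageClassName": "standard-rwo",
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "11Gi"}},
    }
}
manifest = clone_claim_manifest(original, "snapshot-for-newcluster", "newcluster")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML.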

After the volume claim is created, we can start newcluster. However, there is still a caveat; we need to use:

forceUnsafeBootstrap: true

Otherwise, Percona XtraDB Cluster will decide that the data from the snapshot was not taken after a clean shutdown (which is true) and will refuse to start.

There is still a limitation to this approach, which you may find inconvenient: the volume can be cloned only within the same namespace, so it can’t be easily transferred from the PRODUCTION namespace into the QA namespace.

It can still be done, but it requires some extra steps and Kubernetes admin privileges. I will show how in a following blog post.

Sep
08
2020
--

Google Cloud launches its Business Application Platform based on Apigee and AppSheet

Unlike some of its competitors, Google Cloud has recently started emphasizing how its large lineup of different services can be combined to solve common business problems. Instead of trying to sell individual services, Google is focusing on solutions and the latest effort here is what it calls its Business Application Platform, which combines the API management capabilities of Apigee with the no-code application development platform of AppSheet, which Google acquired earlier this year.

As part of this process, Google is also launching a number of new features for both services today. The company is launching the beta of a new API Gateway, built on top of the open-source Envoy project, for example. This is a fully managed service that is meant to make it easier for developers to secure and manage their API across Google’s cloud computing services and serverless offerings like Cloud Functions and Cloud Run. The new gateway, which has been in alpha for a while now, offers all the standard features you’d expect, including authentication, key validation and rate limiting.

As for its low-code service AppSheet, the Google Cloud team is now making it easier to bring in data from third-party applications thanks to the general availability of Apigee as a data source for the service. AppSheet already supported standard sources like MySQL, Salesforce and G Suite, but this new feature adds a lot of flexibility to the service.

With more data comes more complexity, so AppSheet is also launching new tools for automating processes inside the service today, thanks to the early access launch of AppSheet Automation. Like the rest of AppSheet, the promise here is that developers won’t have to write any code. Instead, AppSheet Automation provides a visual interface, that, according to Google, “provides contextual suggestions based on natural language inputs.” 

“We are confident the new category of business application platforms will help empower both technical and line of business developers with the core ability to create and extend applications, build and automate workflows, and connect and modernize applications,” Google notes in today’s announcement. And indeed, this looks like a smart way to combine the no-code environment of AppSheet with the power of Apigee.

Sep
01
2020
--

Google Cloud lets businesses create their own text-to-speech voices

Google launched a few updates to its Contact Center AI product today, but the most interesting one is probably the beta of its new Custom Voice service, which will let brands create their own text-to-speech voices to best represent their own brands.

Maybe your company has a well-known spokesperson for example, but it would be pretty arduous to have them record every sentence in an automated response system or bring them back to the studio whenever you launch a new product or procedure. With Custom Voice, businesses can bring in their voice talent to the studio and have them record a script provided by Google. The company will then take those recordings and train its speech models based on them.

As of now, this seems to be a somewhat manual task on Google’s side. Training and evaluating the model will take “several weeks,” the company says and Google itself will conduct its own tests of the trained model before sending it back to the business that commissioned the model. After that, the business must follow Google’s own testing process to evaluate the results and sign off on it.

For now, these custom voices are still in beta and only American English is supported so far.

It’s also worth noting that Google’s review process is meant to ensure that the result is aligned with its internal AI Principles, which it released back in 2018.

Like with similar projects, I would expect that this lengthy process of creating custom voices for these contact center solutions will become mainstream quickly. While it will just be a gimmick for some brands (remember those custom voices for stand-alone GPS systems back in the day?), it will allow the more forward-thinking brands to distinguish their own contact center experiences from those of the competition. Nobody likes calling customer support, but a more thoughtful experience that doesn’t make you think you’re talking to a random phone tree may just help alleviate some of the stress at least.

Aug
25
2020
--

Google Cloud Anthos update brings support for on-prem, bare metal

When Google announced Anthos last year at Google Cloud Next, it was a pretty big deal. Here was a cloud company releasing a product that purported to help you move your applications between cloud companies like AWS and Azure — GCP’s competitors — because it’s what customers demanded.

Google tapped into genuine anxiety that tech leaders at customer companies are having over vendor lock-in in the cloud. Back in the client-server days, most of these folks got locked into a tech stack where they were at the mercy of the vendor. It’s something companies desperately want to avoid this go-round.

With Anthos, Google claimed you could take an application, package it in a container and then move it freely between clouds without having to rewrite it for the underlying infrastructure. It was and remains a compelling idea.

This year, the company is updating the product to include a couple of specialty workloads that didn’t get into version 1.0 last year. For starters, many customers aren’t just multi-cloud, meaning they have workloads on various infrastructure cloud vendors, they are also hybrid. That means they still have workloads on-prem in their own data centers, as well as in the cloud, and Google wanted to provide a way to include these workloads in Anthos.

Pali Bhat, VP of product and design at Google Cloud, says they have heard customers still have plenty of applications on premises and they want a way to package them as containerized, cloud-native workloads.

“They do want to be able to bring all of the benefits of cloud to both their own data centers, but also to any cloud they choose to use. And what Anthos enables them to do is go on this journey of modernization and digital transformation and be able to take advantage of it by writing once and running it anywhere, and that’s a really cool vision,” Bhat said.

And while some companies have made the move from on prem to the cloud, they still want the comfort of working on bare metal where they are the only tenant. The cloud typically offers a multi-tenant environment where users share space on servers, but bare metal gives a customer the benefits of being in the cloud with the ability to control their own destiny as they do on prem.

Customers were asking for Anthos to support bare metal, so Google gave the people what they wanted and is releasing a beta of Anthos for bare metal this week, which Bhat says provides the answer for companies looking to have the benefits of Anthos at the edge.

“[The bare metal support] lets customers run Anthos […] at edge locations without using any hypervisor. So this is a huge benefit for customers who are looking to minimize unnecessary overhead and unlock new use cases, especially both in the cloud and on the edge,” Bhat said.

Anthos is part of a broader cloud modernization platform that Google Cloud is offering customers that includes GKE (the Kubernetes engine), Cloud Functions (the serverless offering) and Cloud Run (container run time platform). Bhat says this set of products taps into a couple of trends they are seeing with customers. First of all, as we move deeper into the pandemic, companies are looking for ways to cut costs while making a faster push to the cloud. The second is taking advantage of that push by becoming more agile and innovative.

It seems to be working. Bhat reports that in Q2, the company has seen a lot of interest. “One of the things in Q2 of 2020 that we’ve seen is that just Q2, over 100,000 companies used our application modernization platform and services,” he said.

Jul
07
2020
--

Nvidia’s Ampere GPUs come to Google Cloud

Nvidia today announced that its new Ampere-based data center GPUs, the A100 Tensor Core GPUs, are now available in alpha on Google Cloud. As the name implies, these GPUs were designed for AI workloads, as well as data analytics and high-performance computing solutions.

The A100 promises a significant performance improvement over previous generations. Nvidia says the A100 can boost training and inference performance by over 20x compared to its predecessors (though you’ll mostly see 6x or 7x improvements in most benchmarks) and tops out at about 19.5 TFLOPs in single-precision performance and 156 TFLOPs for Tensor Float 32 workloads.


“Google Cloud customers often look to us to provide the latest hardware and software services to help them drive innovation on AI and scientific computing workloads,” said Manish Sainani, Director of Product Management at Google Cloud, in today’s announcement. “With our new A2 VM family, we are proud to be the first major cloud provider to market Nvidia A100 GPUs, just as we were with Nvidia’s T4 GPUs. We are excited to see what our customers will do with these new capabilities.”

Google Cloud users can get access to instances with up to 16 of these A100 GPUs, for a total of 640GB of GPU memory and 1.3TB of system memory.

Jun
16
2020
--

Google Cloud launches Filestore High Scale, a new storage tier for high-performance computing workloads

Google Cloud today announced the launch of Filestore High Scale, a new storage option — and tier of Google’s existing Filestore service — for workloads that can benefit from access to a distributed high-performance storage option.

With Filestore High Scale, which is based on technology Google acquired when it bought Elastifile in 2019, users can deploy shared file systems with hundreds of thousands of IOPS, 10s of GB/s of throughput and at a scale of 100s of TBs.

“Virtual screening allows us to computationally screen billions of small molecules against a target protein in order to discover potential treatments and therapies much faster than traditional experimental testing methods,” says Christoph Gorgulla, a postdoctoral research fellow at Harvard Medical School’s Wagner Lab, which has already put the new service through its paces. “As researchers, we hardly have the time to invest in learning how to set up and manage a needlessly complicated file system cluster, or to constantly monitor the health of our storage system. We needed a file system that could handle the load generated concurrently by thousands of clients, which have hundreds of thousands of vCPUs.”

The standard Google Cloud Filestore service already supports some of these use cases, but the company notes that it specifically built Filestore High Scale for high-performance computing (HPC) workloads. In today’s announcement, the company specifically focuses on biotech use cases around COVID-19. Filestore High Scale is meant to support tens of thousands of concurrent clients, which isn’t necessarily a standard use case, but developers who need this kind of power can now get it in Google Cloud.

In addition to High Scale, Google also today announced that all Filestore tiers now offer beta support for NFS IP-based access controls, an important new feature for those companies that have advanced security requirements on top of their need for a high-performance, fully managed file storage service.

May
14
2020
--

Google makes it easier to migrate VMware environments to its cloud

Google Cloud today announced the next step in its partnership with VMware: the Google Cloud VMware Engine. This fully managed service provides businesses with a full VMware Cloud Foundation stack on Google Cloud to help businesses easily migrate their existing VMware-based environments to Google’s infrastructure. Cloud Foundation is VMware’s stack for hybrid and private cloud deployments.

Given Google Cloud’s focus on enterprise customers, it’s no surprise that the company continues to bet on partnerships with the likes of VMware to attract more of these companies’ workloads. Less than a year ago, Google announced that VMware Cloud Foundation would come to Google Cloud and that it would start supporting VMware workloads. Then, last November, Google Cloud acquired CloudSimple, a company that specialized in running VMware environments and that Google had already partnered with for its original VMware deployments. The company describes today’s announcement as the third step in this journey.

VMware Engine provides users with all of the standard Cloud Foundation components: vSphere, vCenter, vSAN, NSX-T and HCX. With this, Google Cloud General Manager June Yang notes in today’s announcement, businesses can quickly stand up their own software-defined data center in the Google Cloud.

“Google Cloud VMware Engine is designed to minimize your operational burden, so you can focus on your business,” she notes. “We take care of the lifecycle of the VMware software stack and manage all related infrastructure and upgrades. Customers can continue to leverage IT management tools and third-party services consistent with their on-premises environment.”

Google is also working with third-party providers like NetApp, Veeam, Zerto, Cohesity and Dell Technologies to ensure that their solutions work on Google’s platform, too.

“As customers look to simplify their cloud migration journey, we’re committed to build cloud services to help customers benefit from the increased agility and efficiency of running VMware workloads on Google Cloud,” said Bob Black, Dell Technologies Global Lead Alliance Principal at Deloitte Consulting. “By combining Google Cloud’s technology and Deloitte’s business transformation experience, we can enable our joint customers to accelerate their cloud migration, unify operations, and benefit from innovative Google Cloud services as they look to modernize applications.”

May
12
2020
--

Microsoft partners with Redis Labs to improve its Azure Cache for Redis

For a few years now, Microsoft has offered Azure Cache for Redis, a fully managed caching solution built on top of the open-source Redis project. Today, it is expanding this service by adding Redis Enterprise, Redis Labs’ commercial offering, to its platform. It’s doing so in partnership with Redis Labs, and while Microsoft will offer some basic support for the service, Redis Labs will handle most of the software support itself.

Julia Liuson, Microsoft’s corporate VP of its developer tools division, told me that the company wants to be seen as a partner to open-source companies like Redis Labs, which was among the first companies to change its license to prevent cloud vendors from commercializing and repackaging their free code without contributing back to the community. Last year, Redis Labs partnered with Google Cloud to bring its own fully managed service to its platform and so maybe it’s no surprise that we are now seeing Microsoft make a similar move.

Liuson tells me that with this new tier for Azure Cache for Redis, users will get a single bill and native Azure management, as well as the option to deploy natively on SSD flash storage. The native Azure integration should also make it easier for developers on Azure to integrate Redis Enterprise into their applications.

It’s also worth noting that Microsoft will support Redis Labs’ own Redis modules, including RediSearch, a Redis-powered search engine, as well as RedisBloom and RedisTimeSeries, which provide support for new datatypes in Redis.

“For years, developers have utilized the speed and throughput of Redis to produce unbeatable responsiveness and scale in their applications,” says Liuson. “We’ve seen tremendous adoption of Azure Cache for Redis, our managed solution built on open source Redis, as Azure customers have leveraged Redis performance as a distributed cache, session store, and message broker. The incorporation of the Redis Labs Redis Enterprise technology extends the range of use cases in which developers can utilize Redis, while providing enhanced operational resiliency and security.”
