Feb 10, 2023
--

Storage Autoscaling With Percona Operator for MongoDB

Previously, deploying and maintaining a database usually meant many burdensome chores and repetitive tasks to ensure proper functioning. In the cloud era, however, developers and operations engineers have fully embraced automation tools that make their jobs significantly easier. Of those tools, Kubernetes operators are certainly among the most prominent, as they are often used as a building block for DBaaS.

ViewBlock, a blockchain explorer, uses the Percona Operator for MongoDB to store critical data. Today, along with their team, we will see how pvc-autoresizer can automate storage scaling for MongoDB clusters on Kubernetes.

Reasoning and goal

Nobody enjoys waking up in the middle of the night from disk usage alerts or, worse, a downed cluster due to a lack of free space on a volume that wasn’t properly set up to send proper warnings to its stakeholders.

Our goal is to automate storage scaling when our disk reaches a certain threshold of use and simultaneously reduce the amount of alert noise related to that. Specifically for our Operator, the volumes used by replicaset nodes are defined by their Persistent Volume Claims (PVCs), which are our targets to enable autoscaling.

Let’s go

Prerequisites

pvc-autoresizer currently supports Kubernetes versions 1.23 – 1.25. In addition to pvc-autoresizer itself, which you can install through its Helm chart, Prometheus also needs to be running in your cluster to provide the resizer with the metrics it needs to determine if a PVC needs to be scaled.

Your CSI driver needs to support Volume Expansion and NodeGetVolumeStats, and the Storage Class used by your PVCs also requires the Volume Expansion feature to be active.

In our lab we will use AWS EKS with a standard storage class.

In action

You first need to add the following annotation to your Storage Class:

resize.topolvm.io/enabled=true

From that point on, your only requirement is to properly annotate the PVCs created by the Operator.

Assuming a simple 3-node Percona Server for MongoDB cluster without sharding and nothing else in your current namespace, you should be able to run the following commands to allow for automatic storage resizing of all the PVCs.

kubectl annotate pvc --all resize.topolvm.io/storage_limit="100Gi"
kubectl annotate pvc --all resize.topolvm.io/increase="20Gi"
kubectl annotate pvc --all resize.topolvm.io/threshold="20%"
kubectl annotate pvc --all resize.topolvm.io/enabled="true"

This configuration tells pvc-autoresizer to increase each PVC by 20Gi when only 20% of its space is left, up to a maximum size of 100Gi, beyond which the PVC will not grow automatically.
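To make the rule concrete, here is a minimal shell sketch that models the decision for a single PVC. It is an illustration only, not pvc-autoresizer's own implementation: the function name next_size_gi is hypothetical, sizes are whole Gi values, and treating a volume with exactly 20% free as "no resize yet" is an assumption about the boundary case.

```shell
# Simplified model of the resize rule; not pvc-autoresizer's own code.
# Arguments (whole numbers): capacity_gi used_gi threshold_pct increase_gi limit_gi
next_size_gi() {
  local capacity=$1 used=$2 threshold_pct=$3 increase=$4 limit=$5
  local free=$(( capacity - used ))

  # Assumption: no resize while free space is at or above the threshold.
  if [ "$(( free * 100 ))" -ge "$(( capacity * threshold_pct ))" ]; then
    echo "$capacity"
    return
  fi

  # Grow by the fixed increase, but never past the storage limit.
  local target=$(( capacity + increase ))
  if [ "$target" -gt "$limit" ]; then
    target=$limit
  fi
  echo "$target"
}

next_size_gi 50 35 20 20 100   # 30% free: stays at 50
next_size_gi 50 41 20 20 100   # 18% free: grows to 70
next_size_gi 90 80 20 20 100   # about 11% free: capped at 100
```

The last call shows why some alerting is still useful: once a volume sits at the storage limit, nothing grows it any further.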

Note: If you use version 1.14.0 of the Operator that’s scheduled for release this month, it will be possible to annotate the PVCs directly from your CR configuration file, for example:

spec:
  replsets:
  - name: rs0
    volumeSpec:
      persistentVolumeClaim:
        annotations:
          resize.topolvm.io/storage_limit: 100Gi
          resize.topolvm.io/increase: 20Gi
          resize.topolvm.io/threshold: 20%
          resize.topolvm.io/enabled: "true"

Limitations

Manual downscaling

It is possible to increase the storage automatically, but decreasing it is a manual process requiring replacing the volumes entirely. For example, AWS EBS volumes cannot be downsized and you’d need to delete the volumes before creating new ones with a smaller size. In Change Storage Class on Kubernetes on the Fly, we described how to change the storage class on the fly — a similar process will apply to decrease the size of the volumes.

Percentage threshold

As our increase thresholds are specified in percentages, the space available upon resize will inherently grow along with the size of the volume. When dealing with “big” disks, say 1TB, it is advised to set a low threshold to avoid triggering a rescale when a lot of space is still available.
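The effect is easy to see with a little arithmetic. The shell sketch below prints how much space is still free at the moment a percentage threshold triggers a resize. The helper name free_at_trigger_gi is hypothetical, and a 1TB disk is approximated as 1000Gi to keep the numbers round.

```shell
# Free space (in whole Gi) remaining when a percentage threshold is reached.
# Arguments: capacity_gi threshold_pct
free_at_trigger_gi() {
  echo $(( $1 * $2 / 100 ))
}

# The same 20% threshold leaves very different amounts of headroom.
for size in 100 500 1000; do
  echo "${size}Gi volume, 20% threshold: resize triggers with $(free_at_trigger_gi "$size" 20)Gi free"
done

# A lower threshold on the large volume wastes far less space.
echo "1000Gi volume, 5% threshold: resize triggers with $(free_at_trigger_gi 1000 5)Gi free"
```

On the 1000Gi volume, a 20% threshold triggers a resize with 200Gi still unused, while 5% brings that down to 50Gi.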

Scaling quotas

Cloud providers, for example, AWS, have scaling quotas for the volumes. For EBS, you can resize a volume once every six hours. If you ingest more data than the increase amount you set in less than that time, the next resize will fail. It is essential to consider your ingest rate and disk growth so that you can set the appropriate autoresizer configurations that suit your needs. Failure to do so might result in unwanted alerts and the need to transfer data to new PVCs, which you can learn about in Percona Operator Volume Expansion Without Downtime.
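A quick sanity check along these lines can be sketched in shell. It compares a chosen increase against what a steady ingest rate would write during the cooldown window. The function name increase_covers_cooldown and the ingest rates are hypothetical, and the check is deliberately conservative: it ignores the free space that still remains when the threshold is hit.

```shell
# Succeeds when one increase is at least as large as the data written
# during the provider's resize cooldown (six hours by default, as for EBS).
# Arguments: increase_gi ingest_gi_per_hour [cooldown_hours]
increase_covers_cooldown() {
  local increase_gi=$1 ingest_gi_per_hour=$2 cooldown_hours=${3:-6}
  [ "$increase_gi" -ge "$(( ingest_gi_per_hour * cooldown_hours ))" ]
}

# Hypothetical ingest rates, checked against the 20Gi increase used earlier.
for rate in 2 5; do
  if increase_covers_cooldown 20 "$rate"; then
    echo "20Gi increase covers six hours at ${rate}Gi/hour"
  else
    echo "20Gi increase is too small at ${rate}Gi/hour; need at least $(( rate * 6 ))Gi"
  fi
done
```

If the check fails for your expected peak ingest rate, a larger increase value is the simplest adjustment.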

Statefulset and custom resource synchronization

When you provision new nodes or recreate one, their PVC will get bootstrapped with the volume request it gets from the immutable StatefulSet created at the initialization of your cluster, not with its latest size. You will need to set the appropriate annotations for those new PVCs to ensure that pvc-autoresizer scales them enough to allow for the replication to have enough space to proceed and that it doesn’t need to scale more than what your cloud provider permits. It is generally recommended to make sure the storage specs in StatefulSets and Custom Resources are in sync with real volumes.

Conclusion

And there you have it, a Percona Operator for MongoDB configuration that automatically scales its storage based on your needs!

It is still advised to have, at the very least, a minimal alerting setup in case you’re close to hitting the storage limit specified in the annotations since pvc-autoresizer requires it to be set.

ViewBlock is a blockchain-agnostic explorer allowing anyone to inspect blocks, transactions, address history, advanced statistics & much more.

Percona Operator for MongoDB automates deployment and management of replica sets and sharded clusters on Kubernetes.

Feb 17, 2021
--

The Most Important Skills for an SRE, DBRE, or DBA

I have talked extensively about the DBA’s evolving role and how many DBAs and operations professionals are now becoming SREs (site reliability engineers) or DBREs (database reliability engineers). Often, databases get blamed as the bottleneck for application slowdowns and issues, so DBAs have had to develop the skills needed to chase problems up and down the stack over the years. This full-stack approach to hunting out problems has resulted in many former DBAs and Sysadmins successfully taking on the role of an SRE/DBRE.

The question is, then, what are the most critical skills for this important role?

I personally have interviewed thousands of technical candidates over the last 10 years and have hired hundreds in various roles here at Percona. I often get asked about the most critical skill for the next generation of technical engineers, SREs, or DBREs. The answer has been consistent for me over my career – I want engineers, SREs, etc., with good problem-solving skills and the ability to think outside the box. I am not looking for book knowledge or a detailed understanding of every concept; I want people who can see something new and…

  1. Are curious enough to ask “Why?” and want to know the answer.
  2. Will dig into the ambiguous, want to learn, and can learn the why.
  3. Can solve the issue, answer the question, and share that knowledge effectively.

From a technical perspective, while it is wonderful to have a great depth of knowledge, I generally am not looking for the experts’ expert. Rather, I look for people who are smart, passionate, and who learn quickly. I am not alone in this. I asked the question of those involved in hiring technical people here at Percona.

Peter Zaitsev (CEO) said the number one skill he is looking for is this: “Attitude and Ability to do independent research and find information to solve the problem at hand.” For many like Peter, having an encyclopedic knowledge of how things work or the right commands to use is secondary to solving problems never seen before. Many problems and issues that come up you cannot specifically train for. The unique nature of the workload, size, and way too many external factors often offer unique challenges to even the most experienced SRE. Peter added: “So many people now have this ‘I have not been trained on this’ attitude instead of doing some basic googling for the answer.” Indeed, there is a lot of information out there, and while searching for an answer quickly during an outage or performance event may seem like a no-brainer, more than half the people I have interviewed don’t even think about it. Thinking on your feet, reacting quickly, and restoring service can save companies millions of dollars of lost revenue and business in an outage.

Marco Tusa (MySQL Tech Lead) echoed Peter’s sentiment by saying that there are two important skills for him. One of these is the ability to learn what they don’t know. “This is because no matter what, often the best one on tech knowledge won’t know some important stuff. The will to learn is the key.” Lenz Grimmer (Sr Director of Server Engineering) could not have agreed more, adding: “I’m seeking talent that is open-minded about acquiring new skills. So fast learners with the right sense of humility and the right attitude.”

Teamwork Makes the Dream Work…

Attitude and humility are critical in building an effective team (especially in a remote team). This was the second trait Marco looks for. He added that he also looks closely at a candidate’s fit with the team and whether they will be a team player. The “no jerks” or “no soloist prima donna” mottos are very important. You have to be willing to share what you learned and look for help from your teammates.

This is the same thing Jay Janssen (Director of IT) said when asked about the number one thing he looks for: “Humility comes to mind — smart and humble is a good combination. While kind of cliche, it’s generally true.” We are all looking to hire smart people, but smart people who are jerks or flaunt how smart they are generally don’t operate well in a team environment. You want someone who is smart but does not make other people feel small or insignificant.

Sanja Bonic (Head of Percona’s Open Source Program Office) also values teamwork and makes sure she tries to understand how people handle positive and negative interactions as a team. Sanja, who has previously led Engineering teams at OpenShift and now works with Percona’s community, asks people in interviews about “their best and worst experiences in teams they’ve previously worked with. This usually shows what people are paying attention to, and you’ll quickly get a hint of what people attribute value to.”

While you need people to work and learn independently, you equally need them to function as a team. Remember, to ensure the uptime, availability, and performance of an entire application spanning potentially hundreds or thousands of nodes, you need to use all the resources at your disposal when things go wrong. Having teammates whom you trust, who can help, and who can augment your knowledge is very important. You can’t do it all alone, so having the ability to “team-up” and work with others is a must.

“The strength of the team is each individual member. The strength of each member is the team.” ~ Phil Jackson

Sharing is Caring…

The ability of smart people to effectively share their knowledge and have good, meaningful conversations is also critical in this role. Vadim Tkachenko (CTO) said he is looking for “People who have a brain and can have a meaningful conversation.” He went on to say he is looking for people who “Can speak well about previous relevant experiences.” This ability to share goes a long way internally to increase the collaborative spirit within the team. But this is not merely about speaking a single language; rather, it’s being able to talk about the technologies and match the expectations of your audience (or teammates).

Tate Mcdaniel (DBA Manager) says this is the number one thing he looks for when hiring people. His approach, in his words – “I ask questions about contentious and complicated things, and I look for answers that explain the complexity in ways a layperson can understand while giving pros/cons judiciously.” Taking the complex, explaining it, and educating others is of critical importance.

It is why Peter, Vadim, Jay, Marco, Tate, Lenz, and I all said we look online at what people have written, where they have talked at conferences, what code they may have written, and other traces of their public persona before interviewing someone.

When I asked Lenz Grimmer if he looked at a candidate’s online persona, he said: “Absolutely, that’s one of the beauties of hiring in the open-source ecosystem. A public track record of contributions in various forms tells me much more than a CV. Forum and mailing list contributions, YouTube videos, all of which help get a better understanding of the candidate.”

One Person is an Island…

I personally highly value people’s willingness to share their insights, knowledge, and sometimes struggles. This is especially critical in the open-source space. I mentioned that no one person could manage a complex environment alone. Training and educating team members and others in the community is critical. The willingness to share and educate via online blogs, articles, and technical talks is, in my opinion, essential to the SRE/DBRE community as a whole.

So what do we see as the must-have skills?

  1. Problem-solving skills, the ability to troubleshoot unique and challenging problems.
  2. The passion and desire to learn, research, and acquire skills quickly.
  3. Humility and the ability to be a “team player” – No jerks allowed!
  4. The ability and passion for sharing their knowledge and educating others.

What do you think? Did we miss any?

Mar 20, 2019
--

Blameless emerges from stealth with $20M investment to help companies transition to SRE

Site Reliability Engineering (SRE) is an extension of DevOps designed for more complex environments. The problem is that this type of approach is difficult to implement and has usually only been in reach of large companies, requiring custom software. Blameless, a Bay Area startup, wants to put it within reach of everyone. It emerged from stealth today with an SRE platform for the masses and around $20 million in funding.

For starters, the company announced two rounds of funding with $3.6 million in seed money last April and a $16.5 million Series A investment more recently in January. Investors included Accel, Lightspeed Venture Partners and others.

Company co-founder and CEO Ashar Rizqi knows first-hand just how difficult it is to implement an SRE system. He built custom systems for Box and Mulesoft before launching Blameless two years ago. He and his co-founder COO Lyon Wong saw a gap in the market where companies who wanted to implement SRE were being limited because of a lack of tooling and decided to build it themselves.

Rizqi says SRE changes the way you work and interact and Blameless gives structure to that change. “It changes the way you communicate, prioritize and work, but we’re adding data and metrics to support that shift,” he said.

As companies move to containers and continuous delivery models, it brings a level of complexity to managing the developers, who are working to maintain the delivery schedule, and operations, who must make sure the latest builds get out with a minimum of bugs. It’s not easy to manage, especially given the speed involved.

Over time, the bugs build up and the blame circulates around the DevOps team as they surface. The company name reflects the idea that its platform should remove blame from the equation by providing the tooling to get deeper visibility into all aspects of the delivery model.

At that point, companies can understand more clearly the kinds of compromises they need to make to get products out the door, rather than randomly building up this technical debt over time. This is exacerbated by the fact that companies are building their software from a variety of sources, whether open source or API services, and it’s hard to know the impact that external code is having on your product.

“Technical debt is accelerating as there is greater reliability on micro services. It’s a black box. You don’t own all the lines of code you are executing,” Rizqi explained. His company’s solution is designed to help with that problem.

The company currently has 23 employees and 20 customers including DigitalOcean and Home Depot.

Aug 17, 2018
--

This Week in Data with Colin Charles 49: MongoDB Conference Opportunities and Serverless Aurora MySQL

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Beyond the MongoDB content that will be at Percona Live Europe 2018, there is also a bit of an agenda for MongoDB Europe 2018, happening on November 8 in London—a day after Percona Live in Frankfurt. I expect you’ll see a diverse set of MongoDB content at Percona Live.

The Percona Live Europe Call for Papers closes TODAY! (Friday August 17, 2018)

From Amazon, there have been some good MySQL changes. You now have access to time delayed replication as a strategy for your High Availability and disaster recovery. This works with versions 5.7.22, 5.6.40 and later. It is worth noting that this isn’t documented as working for MariaDB (yet?). It arrived in MariaDB Server in 10.2.3.

Another MySQL change from Amazon? Aurora Serverless MySQL is now generally available. You can build and run applications without thinking about instances: previously, the database function was not all that focused on serverless. This on-demand auto-scaling serverless Aurora should be fun to use. Only Aurora MySQL 5.6 is supported at the moment and also, be aware that this is not available in all regions yet (e.g. Singapore).

Releases

  • pgmetrics is described as an open-source, zero-dependency, single-binary tool that can collect a lot of information and statistics from a running PostgreSQL server and display it in easy-to-read text format or export it as JSON for scripting.
  • PostgreSQL 10.5, 9.6.10, 9.5.14, 9.4.19, 9.3.24, and 11 Beta 3 fix two security vulnerabilities, which may inspire an upgrade.

Industry Updates

  • Martin Arrieta (LinkedIn) is now a Site Reliability Engineer at Fastly. Formerly of Pythian and Percona.
  • Ivan Zoratti (LinkedIn) is now Director of Product Management at Neo4j. He was previously on founding teams, was the CTO of MariaDB Corporation (then SkySQL), and is a long time MySQL veteran.

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 

The post This Week in Data with Colin Charles 49: MongoDB Conference Opportunities and Serverless Aurora MySQL appeared first on Percona Database Performance Blog.

Oct 11, 2017
--

Webinar Thursday, October 12, 2017: MongoDB Readiness from an SRE and Ops Viewpoint

Join Percona’s MongoDB Practice Manager David Murphy on Thursday, October 12, 2017, at 10:00 am PDT / 1:00 pm EDT (UTC-7) as he discusses MongoDB Readiness from an SRE and Ops Viewpoint.

Operations teams (SRE, PE, DevOps, etc.) are being asked to take a more active role in database provisioning and scaling. Much of the MongoDB material available online is from one, two, three or even five years ago (or more). Finding useful content online that reflects the current state of MongoDB maturity and stability can be challenging with all this outdated material around, especially when MongoDB is massively different than it was even in the 2.X series.

This webinar will cut through the noise and provide the 2017 state of MongoDB. You can expect to leave knowing more about how it behaves, when to use it and how it handles things like high availability and backup and recovery.

We will also review both the good and bad history of MongoDB, and talk about why you need to know how something works today (not how it worked in 2010) in this fast-paced environment. You will leave knowing MongoDB’s current maturity, a high-level view of how it works today and what your risk/benefit charts should look like when considering using it.

Key ops areas covered:

  • MongoDB architecture
  • High availability
  • Ansible and MongoDB
  • Cloud provisioning
  • Effective monitoring solutions
  • How to make sure you have consistency
  • Top five ops challenges and their solutions
  • How to think about multiple regions with MongoDB

Register for the webinar here.

David Murphy, MongoDB Practice Manager

David is the Practice Manager for MongoDB @ Percona. He joined Percona in Oct 2015; before that, he had been deep in both the MySQL and MongoDB database communities for some time. Other passions include DevOps, tool building and security.
