Why Database Schema Optimization Matters

Schema Optimization

If you have been around MySQL for any length of time, you are probably aware that choosing the correct data types and optimizing your schema are actually important tasks.  A few years back at Percona Live 2016, I gave an introductory talk on schema review and optimization. Here’s the video:


I was thinking about that talk in the context of some of my current clients.  Though I had worked on extremely large database deployments during my earlier tenure at Percona, it was often more of an outlier.  Now, working as a Technical Account Manager with our largest clients, it is much more common.

The Fundamental Problem

I’d like to expand my thoughts on the “choosing the smallest data type you can” principle from my 2016 slides through the lens of a few of my 2019 clients.  I gave an example of two copies of the same table (a simple 4 column, 3 index table with ~4 million rows), one using a bigint for the primary key and one using a regular unsigned int for the primary key:

[root@sample-host plive_2016]# ls -alh action*ibd
-rw-rw---- 1 mysql mysql 908M Apr 7 16:22 action_bigint.ibd 
-rw-rw---- 1 mysql mysql 636M Apr 7 16:23 action.ibd

In this example, there was almost a 30% space savings associated with using the smaller data type.  Obviously, at the scale of 1GB of space, this is trivial. One comment I made during the talk references the adage “storage is cheap”.  While this can be true at a certain scale, I’m seeing this thinking break down more frequently with my largest clients.

The Problem Magnified at Scale

As an illustration, one of my clients is running roughly 10,000 Percona Server instances in their sharded deployment.  These servers are running on bare metal with above-average hardware (for performance concerns). While that sounds like a lot of servers (and it definitely is), you also have to take into consideration other operational concerns such as backups.  For the sake of some easier math, let’s assume the following:

  • 5 servers/shard
  • 500G data directory
  • 5 backups of each shard (various time ranges such as daily, weekly, monthly)

Using those numbers as an estimate, one would be looking at roughly the following for space:

  • ~4.7 petabytes storage for running instances (SSD)
  • ~6 petabytes storage for backups (HDD)

The Business Impact on Large Deployments

Suddenly, at that scale, the 30% space that seemed trivial in my example seems a bit more important.  Let’s run some numbers based on the current $/TB pricing of SSD and spinning HDD with the 30% reduction in space:

  • SSD Savings ~ $140,000
    • $100/TB Cost
    • 30% of 4.7PB = 1.4PB
  • HDD Savings ~ $46,000
    • $25/TB Cost
    • 30% of 6PB = 1.8PB

Saving 3 petabytes of storage would reduce the raw storage costs by nearly $200,000.  On top of the actual capital costs of 3PB of storage, you have to consider server count, power, and maintenance (among others) which would increase this cost significantly.  Clearly, this is just a theoretical example of the potential impact a small mistake like picking the wrong data type can have on the overall infrastructure cost at scale. Generally, by the time a company reaches this scale, these sorts of optimizations have already been made and we need to start looking deeper and more creatively at other alternatives.

While this is definitely an oversimplification of calculating storage costs based on raw hardware prices, it does beg the question: even though it may not seem important now, why not ensure your schema is optimized before it becomes an expensive problem to fix!

For a more holistic view of how optimizations such as this one can save you money, check out our Database Management Savings Calculator or reach out to us for a more thorough review.


Want to be a superhero? Join the Database Performance Team!

database performance team

database performance team

Admit it, you’ve always wanted to fight danger, win battles and save the day! Who doesn’t? Do you want to be a superhero? Percona can show you how!

We don’t have any radioactive spiders or billionaire gadgets, but we do have our own team of superheroes dedicated to protecting your database performance: The Database Performance Team!

The Database Performance Team is comprised of our services experts, who work tirelessly every day to guarantee the performance of your database. Percona’s database services are some of our most valuable customer resources – besides the software itself. Whether it’s support, consulting, technical account managers, or remote DBAs, our support team is made up of superheroes that make sure your database is running at peak performance.

We want you to join us in the fight against poor performance. Join our Database Performance Team crew as part of the Database Street Team!

We’ll be introducing the members of our super group in the coming weeks. As we introduce the Database Performance Team (the “characters” below), we want you! We’ll be offering “missions” for you to complete: challenges, puzzles, or actions that get you prizes for success!

Your first mission: guess the identities of our secret team before we reveal them!

Mystery Character 1

Mystery Character 2

Mystery Character 3

Mystery Character 4

Character 5

Mystery-1 Mystery-2 Mystery-3 Mystery-4 Mystery-5
Hint: Hint: Hint: Hint: Hint:
Funny, friendly, quick-witted, supporting, fast and courteous – but still able to get the job done with amazing competence. Computer-like smarts, instant recall, a counselor, able to understand a problem and the solution quickly. Technical, with clairvoyant foresight, with the knowledge and statistics to account for all issues, manages problems before they happen. Remotely all-seeing, a director, good at multi-tasking, adapts-on-the-fly, cool in a crisis. Insanely strong, can’t be stopped, hard to knock down, the product of rigorous testing, unlimited endurance.
Who am I? Who am I? Who am I? Who am I? Who am I?

Follow @Percona on Twitter and use the hashtag #DatabasePerformanceTeam to cast your guess on who any mystery character is. Correctly guess any of their names or roles, and the lucky winner gets their choice of our mystery T-shirt in either men’s or women’s style.

Stay tuned, as we reveal the identities of the Database Performance Team over the coming weeks! Respond with your guess in the comments below.

Join the ranks of the Database Street Team! Fun games, cool prizes – more info is coming soon!

Some facts:*

Gartner has estimated the average cost of downtime at $5,000 per minute!

Join The Database Performance Team today!



*Source: Global Cloud-based Database Market 2014-2018

Powered by WordPress | Theme: Aeros 2.0 by