Mar
28
2016
--

MySQL 5.7 primary key lookup results: is it really faster?

This blog examines MySQL 5.7’s primary key lookup performance, and determines whether MySQL 5.7 is really faster than earlier versions.

MySQL 5.7 was released some time ago, and now that the dust has settled it’s a good time to review its performance improvements.

I’m not doing this just to satisfy my own curiosity! Many customers still running MySQL 5.6 (or even MySQL 5.5) ask: “How much performance gain can we expect by switching to 5.7? Or will it actually be a performance hit, especially after Peter’s report here: https://www.percona.com/blog/2013/02/18/is-mysql-5-6-slower-than-mysql-5-5/?”

To determine the answer, we’ll look at some statistics. There are a variety of workloads to consider, and we will start with the simplest one: MySQL primary key lookups for data that fits into memory. This workload does not involve transactions and is fully CPU-bound.
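
In practice, this workload boils down to firing single-row point selects as fast as possible. A representative query (using sysbench’s default schema; the statement below is illustrative, not copied from my scripts) looks like this:

SELECT c FROM sbtest1 WHERE id = 4821;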

The full results, scripts and configurations can be found on our GitHub page.

For this test, my server is a 56-logical-thread system (2 sockets / 14 cores each / 2 hyper-threads each) powered by “Intel(R) Xeon(R) E5-2683 v3 @ 2.00GHz” CPUs.

These are the primary results:

Up to 20 threads, MySQL 5.5 clearly outperforms MySQL 5.7. After 20 threads, however, it hits scalability issues and throughput starts to suffer. MySQL 5.6 is a different story – it outperforms 5.7 up to 120 threads. After 120 threads, MySQL 5.7 again scales much better, and it maintains throughput all the way to 1,000 threads.

The above results are from a setup where the client and server run on the same machine. To verify them, I also ran the test on a configuration where the client and server are located on different machines, connected via a 10Gb network.

Here are the results from that setup:

In this case, we pushed more load onto the server (since the client does not share resources with MySQL), and we can see that MySQL 5.7 outperforms MySQL 5.6 after 68 threads (with MySQL 5.6 showing scalability problems even sooner).

There is another way to improve MySQL 5.6 results on large numbers of threads: good old innodb-thread-concurrency. Let’s see the MySQL 5.6 results after setting innodb-thread-concurrency=64:

We can see that innodb-thread-concurrency improves the MySQL 5.6 results once we get into hundreds of threads.
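
For reference, a minimal my.cnf sketch for this setting (the value 64 matches the test above; the right value depends on your hardware and workload, and it can also be changed at runtime with SET GLOBAL):

[mysqld]
# cap the number of threads executing inside InnoDB at once;
# the default of 0 means no limit
innodb_thread_concurrency = 64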

While investigating ways to improve overall throughput, I found that disabling PERFORMANCE_SCHEMA at MySQL startup is a good option: the numbers got noticeably better. Below are the numbers for 5.6 and 5.7 with PERFORMANCE_SCHEMA disabled.

For MySQL 5.6:

For MySQL 5.7:

For MySQL 5.7, PERFORMANCE_SCHEMA’s overhead is quite visible.
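
If you want to repeat this, PERFORMANCE_SCHEMA can only be switched on or off at server startup; a minimal my.cnf sketch (the equivalent command-line option is --skip-performance-schema):

[mysqld]
# compiled in, but disabled at startup; cannot be toggled at runtime
performance_schema = OFF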

Conclusions

I can say that Oracle clearly did a good job with MySQL 5.7, but they focused on primary key lookups – they wanted to be able to report 1.6M QPS.

I was not able to get to 1.6M; the best I could achieve was 470K QPS (with PERFORMANCE_SCHEMA disabled). Full disclosure: I used sysbench 0.5 with Lua scripts and no prepared statements for this test. Oracle used the older sysbench 0.4 (with prepared statements), and their system had 144 logical threads.
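
For context, here is the general shape of such a run. This is a hedged sketch – the script path, table size and thread count are illustrative assumptions; the exact scripts and configurations are on the GitHub page linked above:

sysbench --test=tests/db/select.lua \
  --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
  --oltp-table-size=10000000 \
  --num-threads=64 --max-time=300 --max-requests=0 \
  --db-ps-mode=disable run

The --db-ps-mode=disable flag is what turns off prepared statements.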

MySQL 5.7, however, continues the tradition of slower performance at low thread counts: MySQL 5.6 was slower than MySQL 5.5, and MySQL 5.7 is slower than MySQL 5.6.

PRIMARY KEY lookups aren’t the only workload type – there are many other cases, some much more interesting! I will show the performance metrics for other workloads in upcoming posts.

Mar
28
2016
--

NTT to buy Dell’s services division for $3.05 billion

You may know Dell as a computer and server maker, but Dell also operates a substantial IT services division — at least it did until today. NTT Data, the IT services company of NTT, is acquiring Dell Services for $3.05 billion.
The main reason why Dell sold off its division is that the company needs cash, and quickly. When Dell acquired EMC for $67… Read More

Mar
27
2016
--

Dell reportedly near closing the sale of its IT services unit to Japan’s NTT Data

Dell is reportedly finalizing the sale of its IT consulting unit to NTT Data for $3.5 billion. According to Reuters, the deal, which will help Dell pay down debt from its $67 billion acquisition of EMC Corp., may officially be announced today. Read More

Mar
24
2016
--

Percona How To: Field Names and Document Size in MongoDB

In this blog post, we’ll discuss how shorter field names impact performance and document size in MongoDB.

The MongoDB Manual Developer Notes state:

Shortening field names reduces expressiveness and does not provide considerable benefit for larger documents and where document overhead is not of significant concern. Shorter field names do not lessen the size of indexes, because indexes have a predefined structure. In general, it is not necessary to use short field names.

This is a pretty one-sided statement, and we should be careful not to fall into this trap. At first glance, you might think “Oh that makes sense due to compression!” However, compression is only one part of the story. When we consider the size of a single document, we need to consider several things:

  • Size of the data in the application memory
  • Size over the network
  • Size in the replication log
  • Size in memory in the cache
  • Amount of data being sent to the compressor
  • Size on disk*
  • Size in the journal files*

As you can see, this is a pretty expansive list, and it only covers field naming – we haven’t even gotten to using the right data types for the values yet.

Further, only the last two items in the list (“*” starred) represent any part of the system that has compression (to date). Put another way, the conversation about compression only covers about 25% of the discussion about field names. MongoDB Inc’s comment is trying to sidestep nearly 75% of the rest of the conversation.

To ensure an even debate, I want to break size down into two major areas: Field Optimization and Value Optimization. They both touch on all of the areas listed above except for sorting, which is only about value optimization.

Field Optimization

When we talk about field optimization, we are purely considering using smaller field names. This might seem obvious, but when your database field names become object properties in your application code, developers want them to be expressive (i.e., longer and more space-intensive).

Consider the following:

locations = [];
for (i = 1; i <= 1000; i++) {
   locations.push({ longitude: 28.2211, latitude: 128.2828 })
}
devices = [];
for (i = 1; i <= 10; i++) {
   devices.push({
       name: "iphone6",
       last_ping: ISODate(),
       version: 8.1,
       security_pass: true,
       last_10_locations: locations.slice(10, 20)
   })
}
x = {
   _id: ObjectId(),
   first_name: "David",
   last_name: "Murphy",
   birthdate: "Aug 16 2080",
   address: "123 nowhere drive Nonya, TX, USA , 78701",
   phone_number1: "512-555-5555",
   phone_number2: "512-555-5556",
   known_locations: locations,
   last_checkin: ISODate(),
   devices: devices
}
> Object.bsonsize(x)
54879

Seems pretty standard, but wow! That’s 54.8k per document! Now let’s consider another format:

locations2 = [];
for (i = 1; i <= 1000; i++) {
   locations2.push({ lon: 28.2211, lat: 128.2828 })
}
devices2 = [];
for (i = 1; i <= 10; i++) {
   devices2.push({
       n: "iphone6",
       lp: ISODate(),
       v: 8.1,
       sp: true,
       l10: locations.slice(10, 20)
   })
}
y = {
   _id: ObjectId(),
   fn: "David",
   ln: "Murphy",
   bd: "Aug 16 2080",
   a: "123 nowhere drive Nonya, TX, USA , 78701",
   pn1: "512-555-5555",
   pn2: "512-555-5556",
   kl: locations2,
   lc: ISODate(),
   d: devices2
}
> Object.bsonsize(y)
41392
> Object.bsonsize(y)/Object.bsonsize(x)
0.754241148708978

This minor change reduces the document size by 25%, without changing any actual data. I know you can already see things like kl or l10 and are wondering, “What the heck is that?!” This is where some clever tricks in the application code come in.

You can make a mapping collection in MongoDB, or keep the mapping in your application code – so that in the code, self.l10 is exposed as self.last_10_locations. Some people go so far as to use constants – for example, defining LAST_10_LOCATIONS and writing self.l10 = self.get_value(LAST_10_LOCATIONS) – to keep the stored field names short while the code stays readable.
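
As a sketch of the idea (the map and the expand function here are hypothetical, purely for illustration), the application-side mapping can be as simple as:

// hypothetical short-to-long field map kept in application code
var FIELD_MAP = {
    fn:  "first_name",
    ln:  "last_name",
    kl:  "known_locations",
    l10: "last_10_locations"
};

// expand a document fetched from MongoDB back into expressive names
function expand(doc) {
    var out = {};
    for (var key in doc) {
        out[FIELD_MAP[key] || key] = doc[key];
    }
    return out;
}

The documents stay small on the wire and in the cache, while the rest of the code only ever sees the long names.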

Value Optimization

Using the same example, let’s now assume we want to optimize the values rather than the field names. We know we will always pull a user by their _id, or by the most recent people to check in. To help with this, let us assume “x” is still our main document:

locations = [];
for (i = 1; i <= 1000; i++) {
   locations.push({ longitude: 28.2211, latitude: 128.2828 })
}
devices = [];
for (i = 1; i <= 10; i++) {
   devices.push({
       name: "iphone6",
       last_ping: ISODate(),
       version: 8.1,
       security_pass: true,
       last_10_locations: locations.slice(10, 20)
   })
}
x = {
   _id: ObjectId(),
   first_name: "David",
   last_name: "Murphy",
   birthdate: "Aug 16 2080",
   address: "123 nowhere drive Nonya, TX, USA , 78701",
   phone_number1: "512-555-5555",
   phone_number2: "512-555-5556",
   known_locations: locations,
   last_checkin: ISODate(),
   devices: devices
}
> Object.bsonsize(x)
54879

But now, instead of optimizing field names, we want to optimize the values:

locations = [];
for (i = 1; i <= 1000; i++) {
   locations.push({ longitude: 28.2211, latitude: 128.2828 })
}
devices = [];
for (i = 1; i <= 10; i++) {
   devices.push({
       name: "iphone6",
       last_ping: ISODate(),
       version: 8.1,
       security_pass: true,
       last_10_locations: locations.slice(10, 20)
   })
}
z = {
   _id: ObjectId(),
   first_name: "David",
   last_name: "Murphy",
   birthdate: ISODate("2080-08-16T00:00:00Z"),
   address: "123 nowhere drive Nonya, TX, USA , 78701",
   phone_number1: 5125555555,
   phone_number2: 5125555556,
   known_locations: locations,
   last_checkin: ISODate(),
   devices: devices
}
> Object.bsonsize(z)
54853

In this example, we changed the phone numbers to integers and used the date type for the dates (as was already done in the devices documents). The savings were much smaller than before, coming in at only 26 bytes, but this can have a significant impact once multiplied across many fields and documents. If we had started this example with the floats quoted as strings, as many people do, we would see more of a difference. Always watch out for numbers and dates stored as strings: they almost always waste space.
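
You can see the effect in isolation in the mongo shell. A string value costs a 4-byte length prefix plus the bytes of the string, while a number is stored as a fixed 8-byte double:

> Object.bsonsize({ p: "512-555-5555" })
25
> Object.bsonsize({ p: 5125555555 })
16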

When you combine both sets of savings, the optimized document comes to roughly 41392 − 26 = 41366 bytes, versus the original 54879 bytes:

54879 − 41366 = 13513 bytes saved

That’s right: roughly 25% less data in memory, on the network, and for the application to parse with its CPU! Easy wins to reduce your resource needs and make the COO happier.

Mar
24
2016
--

The top 8 startups from Y Combinator Winter ’16 Demo Day 2

Supersonic planes, food harvesting robots and instant HIV diagnosis were some of the ideas that wowed us on Demo Day 2 for Y Combinator’s Winter 2016 batch. We queried investors and TechCrunch’s writers to come up with our top 8 picks from all 59 startups that presented. Check out our coverage of the 60 YC startups from Tuesday, plus our 7 favorites. Additional reporting by Megan… Read More

Mar
24
2016
--

Want to be a superhero? Join the Database Performance Team!


Admit it, you’ve always wanted to fight danger, win battles and save the day! Who doesn’t? Do you want to be a superhero? Percona can show you how!

We don’t have any radioactive spiders or billionaire gadgets, but we do have our own team of superheroes dedicated to protecting your database performance: The Database Performance Team!

The Database Performance Team comprises our services experts, who work tirelessly every day to guarantee the performance of your database. Percona’s database services are some of our most valuable customer resources – besides the software itself. Whether it’s support, consulting, technical account managers, or remote DBAs, this team of superheroes makes sure your database is running at peak performance.

We want you to join us in the fight against poor performance. Join our Database Performance Team crew as part of the Database Street Team!

We’ll be introducing the members of our super group (the “characters” below) in the coming weeks. As we reveal the Database Performance Team, we’ll be offering “missions” for you to complete: challenges, puzzles, or actions that earn you prizes for success!

Your first mission: guess the identities of our secret team before we reveal them!

Mystery Character 1 – Hint: Funny, friendly, quick-witted, supporting, fast and courteous – but still able to get the job done with amazing competence. Who am I?

Mystery Character 2 – Hint: Computer-like smarts, instant recall, a counselor, able to understand a problem and the solution quickly. Who am I?

Mystery Character 3 – Hint: Technical, with clairvoyant foresight, with the knowledge and statistics to account for all issues, manages problems before they happen. Who am I?

Mystery Character 4 – Hint: Remotely all-seeing, a director, good at multi-tasking, adapts on the fly, cool in a crisis. Who am I?

Mystery Character 5 – Hint: Insanely strong, can’t be stopped, hard to knock down, the product of rigorous testing, unlimited endurance. Who am I?

Follow @Percona on Twitter and use the hashtag #DatabasePerformanceTeam to cast your guess for any mystery character. Correctly guess any of their names or roles, and the lucky winner gets their choice of our mystery T-shirt in either men’s or women’s styles.

Stay tuned, as we reveal the identities of the Database Performance Team over the coming weeks! Respond with your guess in the comments below.

Join the ranks of the Database Street Team! Fun games, cool prizes – more info is coming soon!

Some facts:*

Gartner has estimated the average cost of downtime at $5,000 per minute!

Join The Database Performance Team today!


*Source: Global Cloud-based Database Market 2014-2018

Mar
24
2016
--

Newcomer Galactic Exchange can spin up a Hadoop cluster in five minutes

A new company with a cool name, Galactic Exchange, came out of stealth today with a great idea. It claims it can spin up a Hadoop cluster for you in five minutes, ready to go. That’s no small feat if it works as advertised, and it greatly simplifies what has traditionally been a process fraught with complexity.
The new product, called ClusterGX, is being released in beta this week… Read More

Mar
24
2016
--

Verdigris takes $9M to power its AI energy consumption analytics b2b startup

We hear a lot about the Internet of Things on the consumer side – the oft-trotted-out example of the “smart” refrigerator that tells consumers when they’ve run out of milk, and so on. But the more serious potential for IoT — and the potentially seriously big wins — is likely to be on the enterprise side, where connected sensors can be deployed to automate at scale. Read More

Mar
23
2016
--

Here are the 59 startups that demoed at Y Combinator Winter ’16 Demo Day 2

“Food, housing, healthcare, transportation. Life essentials made better and more affordable.” These are the types of startups that partner Paul Buchheit said were demoing today at Y Combinator’s Winter 2016 Demo Day 2. Yesterday, we covered the first 60 startups from the batch, and picked our 7 favorites. Plus, check out our picks for the top 8 startups from these 59.… Read More

Mar
23
2016
--

Percona Live featured talk with Stewart Smith: Why Would I Run MySQL or MariaDB on POWER Anyway?

Welcome to the next installment of our talks with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers who will be at this year’s conference, as well as discuss their technologies and outlooks. Make sure to read to the end to get a special Percona Live registration bonus!

In this installment, we’ll meet Stewart Smith, OPAL Architect at IBM. His talk, Why Would I Run MySQL or MariaDB on POWER Anyway?, presents the technical reasons why you might want to consider the POWER architecture for MySQL/MariaDB production workloads.

I got a chance to discuss Stewart’s talk with him:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Stewart: While at the AUUG (Australian Unix Users Group) Conference in Melbourne way back in 2004, I mentioned to Arjen Lentz (MySQL employee #25) that I was starting to look for my next adventure. Shortly after, at lunchtime, instead of going to eat lunch, Arjen handed me his cell phone, and on the other end of the line was Brian Aker, then Director of Architecture at MySQL AB. That was my first interview.

So in 2004, I started working on MySQL Cluster (NDB). Four years later, having encountered pretty much all of the warts in the MySQL Server while working on MySQL Cluster, I became the second person to commit code to Drizzle – a fork of MySQL where we tried to rethink everything and reshape the code base to be a) modern, efficient and scalable on modern multi-core machines, and b) designed for large scale web applications. Luckily – instead of firing us all for forking the product in secret – Sun moved us to the CTO group and we got to work on Drizzle full time.

From 2011 to 2014 I was Director of Server Development at Percona where I continued my focus on iterative change and automated QA.

All of these were amazing learning experiences and I’m really proud of what we achieved during these times, the impact of which is still being felt in the database world.

Ultimately, though, it was time for a change – and in my role at IBM as OPAL Architect, I work on the OpenPower Abstraction Layer (OPAL), the completely Open Source firmware for OpenPOWER systems. Of course, at some point, somebody discovered I knew something about MySQL, and I managed to (as a side project) port MySQL to POWER and get a world record of 1 million queries per second. This led to MariaDB being officially supported on POWER8 processors and IBM investing more time in the reliability and performance of MySQL on POWER.

So while my day job is no longer database internals, I stick my head in occasionally as there’s still some part of me that enjoys it.

Percona: Your talk is going to be on “Why would I run MySQL or MariaDB on POWER anyway?” So does that mean you’re firmly on the side of increasing HW power before optimization to achieve performance? If so, why? And for what workload types?

Stewart: We’ve always been scaling both up and out with MySQL. It used to be that scaling up was going from a single processor with a single core to two processors. Now, a cell phone with fewer than four cores is low end.

Of course, POWER CPUs have more benefits than just more cores and threads. There’s a long history of building POWER CPUs to be reliable, with an ethos of *never* acting on bad data, so there’s lots of error checking throughout the CPU itself in addition to ECC memory.

So while POWER can bring raw computing performance to the table, it also brings reliability – and our density with virtual machines can be really interesting.

Percona: Virtualization, SDN, cloud deployments – all of these use distributed resources. How does this affect the use of MySQL/MariaDB on POWER? And how can these types of setups affect application performance – positively and negatively?

Stewart: We’re lucky on POWER in that our hardware has been designed to last a great many years with the idea of partitioning it out to multiple operating systems.

The big advantage for cloud deployments is isolation between tenants, and if we can do this with minimal or zero performance impact, that’s a win for everyone. A challenge to cloud environments is always IO. Databases love lots of low latency IOPs and too often, adding virtualization adds latency, reducing density (tenants per physical machine).

Percona: What do you see as an issue that we, the open source database community, need to be on top of with regard to white box development? What keeps you up at night with regard to the future of white box deployments?

Stewart: I think there are a few big challenges to address for today’s hardware, the main one being scaling to the number of CPU cores/threads we have now, as well as to the number of IOPs we have now. These are, however, not new problems – they’re ones that MySQL and InnoDB have been struggling with for over a decade.

Other open source databases (e.g., MongoDB) have re-learned the lesson the hard way: big global locks don’t scale. With storage backends such as TokuDB and WiredTiger coming to Mongo, it has the opportunity to become an interesting player.

Non-volatile memory has the opportunity to change things more than cheap, high-performance SSDs have. When the unit of persistence is a byte, and not a 512/4096-byte block or a multi-kilobyte/megabyte erase block, things get *different*. Engines such as MyRocks may fare a lot better than more traditional engines like InnoDB – but more than likely there are new designs yet to exist.

I think the biggest challenge is going to be creating a vibrant and innovative development community around an SQL front-end – an open source project (rather than an open source product) where new ideas and experimentation can flourish.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Stewart: I’m really looking forward to some of the internals talks. I’m a deeply technical person and I want the deep dive into the interesting details of how things work. Of special interest are the MyRocks and TokuDB sessions, as well as how InnoDB is evolving.

I’ll likely poke my head into several sessions around managing MySQL deployments in order to keep up to date on how people are deploying and using relational databases today.

These days, I’m the old gray-beard of the MySQL world inside IBM, where I try to stay mostly advisory while focusing on my primary role, which is open source firmware for POWER.

Also, over the years working on databases, I’ve made a great many friends who I don’t get to see often enough. I’m really looking forward to catching up with friends and former colleagues – this is one of the things I treasure about this community.

You can read more about POWER and Stewart’s thoughts at his personal blog.

To see Stewart’s talk, register for Percona Live Data Performance Conference 2016. Use the code “FeaturedTalk” and receive $100 off the current registration price!

The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.
