Jun
22
2018
--

This Week in Data with Colin Charles 43: Polyglots, Security and DataOps.Barcelona

Colin Charles

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

This is a short working week for me due to a family emergency. It caused me to skip speaking at DataOps.Barcelona and miss hanging out with the awesome of speakers and attendees. This is the first time I’ve missed a scheduled talk, and I received many messages about my absence. I am sure we will all meet again soon.

One of the talks I was planning to give at DataOps.Barcelona will be available as a Percona webinar next week: Securing Your Database Servers from External Attacks on Thursday, June 28, 2018, at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4). I am also giving a MariaDB 10.3 overview on Tuesday, June 26, 2018, at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4). I will “virtually” see you there.

If you haven’t already read Werner Vogel’s post A one size fits all database doesn’t fit anyone, I highly recommend it. It is true there is no “one size fits all” solution when it comes to databases. This is why Percona has made “the polyglot world” a theme. It’s why Amazon offers different database flavors: relational (Aurora for MySQL/PostgreSQL, RDS for MySQL/PostgreSQL/MariaDB Server), key-value (DynamoDB), document (DynamoDB), graph (Neptune), in-memory (ElastiCache for Redis & Memcached), search (Elasticsearch service). The article has a plethora of use cases, from AirBnB using Aurora, to Snapchat Stories and Tinder using DynamoDB, to Thomson Reuters using Neptune, down to McDonald’s using ElastiCache and Expedia using Elasticsearch. This kind of detail, and customer use case, is great.

There are plenty more stories and anecdotes in the post, and it validates why Percona is focused not just on MySQL, but also MariaDB, MongoDB, PostgreSQL and polyglot solutions. From a MySQL lens, it’s also worth noting that not one storage engine fits every use case. Facebook famously migrated a lot of their workload from InnoDB to MyRocks, and it is exciting to see Mark Callaghan stating that there are already three big workloads on MyRocks in production, with another two coming soon.

Releases

  • MariaDB 10.1.34 – including fixes for InnoDB defragmentation and full text search (MDEV-15824). This was from the WebScaleSQL tree, ported by KakaoTalk to MariaDB Server.
  • Percona XtraDB Cluster 5.6.40-26.25 – now with Percona Server for MySQL 5.6.40, including a new variable to configure rolling schema upgrade (RSU) wait for active commit connection timeouts.
  • Are you using the MariaDB Connector/C, Connector/J or Connector/ODBC? A slew of updates abound.

Link List

Industry Updates

Upcoming appearances

  • OSCON – Portland, Oregon, USA – July 16-19 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 43: Polyglots, Security and DataOps.Barcelona appeared first on Percona Database Performance Blog.

Jun
20
2018
--

Is Serverless Just a New Word for Cloud Based?

serverless architecture

serverless architectureServerless is a new buzzword in the database industry. Even though it gets tossed around often, there is some confusion about what it really means and how it really works. Serverless architectures rely on third-party Backend as a Service (BaaS) services. They can also include custom code that is run in managed, ephemeral containers on a Functions as a Service (FaaS) platform. In comparison to traditional Platform as a Service (PaaS) server architecture, where you pay a predetermined sum for your instances, serverless applications benefit from reduced costs of operations and lower complexity. They are also considered to be more agile, allowing for reduced engineering efforts.

In reality, there are still servers in a serverless architecture: they are just being used, managed, and maintained outside of the application. But isn’t that a lot like what cloud providers, such as Amazon RDS, Google Cloud, and Microsoft Azure, are already offering? Well, yes, but with several caveats.

When you use any of the aforementioned platforms, you still need to provision the types of instances that you plan to use and define how those platforms will act. For example, will it run MySQL, MongoDB, PostgreSQL, or some other tool? With serverless, these decisions are no longer needed. Instead, you simply consume resources from a shared resource pool, using whatever application suits your needs at that time. In addition, in a serverless world, you are only charged for the time that you use the server instead of being charged whether you use it a lot or a little (or not at all).

Remember When You Joined That Gym?

How many of us have purchased a gym membership at some point in our life? Oftentimes, you walk in with the best of intentions and happily enroll in a monthly plan. “For only $29.95 per month, you can use all of the resources of the gym as much as you want.” But, many of us have purchased such a membership and found that our visits to the gym dwindle over time, leaving us paying the same monthly fee for less usage.

Traditional Database as a Service (DBaaS) offerings are similar to your gym membership: you sign up, select your service options, and start using them right away. There are certainly cases of companies using those services consistently, just like there are gym members who show up faithfully month after month. But there are also companies who spin up database instances for a specific purpose, use the database instance for some amount of time, and then slowly find that they are accessing that instance less and less. However, the fees for the instance, much like the fees for your gym membership, keep getting charged.

What if we had a “pay as you go” gym plan? Well, some of those certainly exist. Serverless architecture is somewhat like this plan: you only pay for the resources when you use them, and you only pay for your specific usage. This would be like charging $5 for access to the weight room and $3 for access to the swimming pool, each time you use one or the other. The one big difference with serverless architecture for databases is that you still need to have your data stored somewhere in the environment and made available to you as needed. This would be like renting a gym locker to store your workout gear so that didn’t have to bring it back and forth each time you visited.

Obviously, you will pay for that storage, whether it is your data or your workout gear, but the storage fees are going to be less than your standard membership. The big advantage is that you have what you need when you need it, and you can access the necessary resources to use whatever you are storing.

With a serverless architecture, you store your data securely on low cost storage devices and access as needed. The resources required to process that data are available on an on demand basis. So, your charges are likely to be lower since you are paying a low fee for data storage and a usage fee on resources. This can work great for companies that do not need 24x7x365 access to their data since they are only paying for the services when they are using them. It’s also ideal for developers, who may find that they spend far more time working on their application code than testing it against the database. Instead of paying for the database resources while the data is just sitting there doing nothing, you now pay to store the data and incur the database associated fees at use time.

Benefits and Risks of Going Serverless

One of the biggest possible benefits of going with a serverless architecture is that you save money and hassle. Money can be saved since you only pay for the resources when you use them. Hassle is reduced since you don’t need to worry about the hardware on which your application runs. These can be big wins for a company, but you need to be aware of some pitfalls.

First, serverless can save you money, but there is no guarantee that it will save you money.

Consider 2 different people who have the exact same cell phone – maybe it’s your dad and your teenage daughter. These 2 users probably have very different patterns of usage: your dad uses the phone sporadically (if at all!) and your teenage daughter seems to have her phone physically attached to her. These 2 people would benefit from different service plans with their provider. For your dad, a basic plan that allows some usage (similar to the base cost of storage in our serverless database) with charges for usage above that cap would probably suffice. However, such a plan for your teenage daughter would probably spiral out of control and incur very high usage fees. For her, an unlimited plan makes sense. What is a great fit for one user is a poor fit for another, and the same is true when comparing serverless and DBaaS options.

The good news is that serverless architectures and DBaaS options, like Amazon RDS, Microsoft Azure, and Google Cloud, reduce a lot of the hassle of owning and managing servers. You no longer need to be concerned about Mean Time Between Failures, power and cooling issues, or many of the other headaches that come with maintaining your hardware. However, this can also have a negative consequence.

The challenge of enforced updates

About the only thing that is consistent about software in today’s world is that it is constantly changing. New versions are released with new features that may or may not be important to you. When a serverless provider decides to implement a new version or patch of their backend, there may be some downstream issues for you to manage. It is always important to test any new updates, but now some of the decisions about how and when to upgrade may be out of your control. Proper notification from the provider gives you a window of time for testing, but they are probably going to flip the switch regardless of whether or not you have completed all of your test cycles. This is true of both serverless and DBaaS options.

A risk of vendor lock-in

A common mantra in the software world is that we want to avoid vendor lock-in. Of course, from the provider’s side, they want to avoid customer churn, so we often find ourselves on opposite sides of the same issue. Moving to a new platform or provider becomes more complex as you cede more aspects of server management to the host. This means that serverless can cause deep lock-in since your application is designed to work with the environment as your provider has configured it. If you choose to move to a different provider, you need to extract your application and your data from the current provider and probably need to rework it to fit the requirements of the new provider.

The challenge of client-side optimization

Another consideration is that optimizations of server-side configurations must necessarily be more generic compared to those you might make to self-hosted servers. Optimization can no longer be done at the server level for your specific application and use; instead, you now rely on a smarter client to perform your necessary optimizations. This requires a skill set that may not exist with some developers: the ability to tune applications client-side.

Conclusion

Serverless is not going away. In fact, it is likely to grow as people come to a better understanding and comfort level with it. You need to be able to make an informed decision regarding whether serverless is right for you. Careful consideration of the pros and cons is imperative for making a solid determination. Understanding your usage patterns, user expectations, development capabilities, and a lot more will help to guide that decision.

In a future post, I’ll review the architectural differences between on-premises, PaaS, DBaaS and serverless database environments.

 

The post Is Serverless Just a New Word for Cloud Based? appeared first on Percona Database Performance Blog.

Jun
15
2018
--

Tuning PostgreSQL for sysbench-tpcc

PostgreSQL benchmark

tuning PostgreSQL performance for sysbench-tpccPercona has a long tradition of performance investigation and benchmarking. Peter Zaitsev, CEO and Vadim Tkachenko, CTO, led their crew into a series of experiments with MySQL in this space. The discussion that always follows on the results achieved is well known and praised even by the PostgreSQL community. So when Avi joined the team and settled at Percona just enough to get acquainted with my colleagues, sure enough one of the first questions they asked him was: “did you know sysbench-tpcc also works with PostgreSQL now ?!“. 

sysbench

sysbench is “a scriptable multi-threaded benchmark tool based on LuaJIT (…) most frequently used for database benchmarks“, created and maintained by Alexey Kopytov. It’s been around for a long time now and has been a main reference for MySQL benchmarking since its inception. One of the favorites of Netflix’ Brendan Gregg, we now know. You may remember Sveta Smirnova and Alexander Korotkov’s report on their experiments in Millions of Queries per Second: PostgreSQL and MySQL’s Peaceful Battle at Today’s Demanding Workloads here. In fact, that post may serve as a nice prelude for the tests we want to show you today. It provides a good starting point as a MySQL vs PostgreSQL performance comparison.

The idea behind Sveta and Alexander’s experiments was “to provide an honest comparison for the two popular RDBMSs“, MySQL and PostgreSQL, using “the same tool, under the same challenging workloads and using the same configuration parameters (where possible)“. Since neither pgbench nor sysbench would work effectively with MySQL and PostgreSQL for both writes and reads they attempted to port pgbench‘s workload as a sysbench benchmark. 

sysbench-tpcc

More recently, Vadim came up with an implementation of the famous TPC-C workload benchmark for sysbench, sysbench-tpcc. He has since published a series of tests using Percona Server and MySQL, and worked to make it compatible with PostgreSQL too. For real now, hence the request that awaited us.

Our goal this time was less ambitious than Sveta and Alexander’s. We wanted to show you how we setup PostgreSQL to perform optimally for sysbench-tpcc, highlighting the settings we tuned the most to accomplish this. We ran our tests on the same box used by Vadim in his recent experiments with Percona Server for MySQL and MySQL.

A valid benchmark – benchmark rules

Before we present our results we shall note there are several ways to speed up database performance. You may for example disable full_page_writes, which would make a server crash unsafe, and use a minimalistic wal_level mode, which would block replication capability. These would speed things up but at the expense of reliability, making the server inappropriate for production usage.

For our benchmarks, we made sure we had all the necessary parameters in place to satisfy the following:

  1. ACID Compliance
  2. Point-in-time-recovery
  3. WALs usable by Replica/Slave for Replication
  4. Crash Recovery
  5. Frequent Checkpointing to reduce time for Crash Recovery
  6. Autovacuum

When we initially prepared sysbench-tpcc with PostgreSQL 10.3 the database size was 118 GB. By the time we completed the test, i.e. after 36000 seconds, the DB size had grown up to 335 GB. We have a total of “only” 256 GB of memory available in this server, however, based on the observations from pg_stat_database, pg_statio_user_tables and pg_statio_user_indexes 99.7% of the blocks were always in-memory:

postgres=# select ((blks_hit)*100.00)/(blks_hit+blks_read) AS “perc_mem_hit” from pg_stat_database where datname like ‘sbtest’;
   perc_mem_hit
---------------------
99.7267224322546
(1 row)

Hence, we consider it to be an in-memory workload with the whole active data set in RAM. In this post we explain how we tuned our PostgreSQL Instance for an in-memory workload, as was the case here.

Preparing the database before running sysbench

In order to run a sysbench-tpcc, we must first prepare the database to load some data. In our case, as mentioned above, this initial step resulted in a 118 GB database:

postgres=# select datname, pg_size_pretty(pg_database_size(datname)) as "DB_Size" from pg_stat_database where datname = 'sbtest';
 datname | DB_Size
---------+---------
 sbtest  | 118 GB
(1 row)

This may change depending on the arguments used. Here is the actual command we used to prepare the PostgreSQL Database for sysbench-tpcc:

$ ./tpcc.lua --pgsql-user=postgres --pgsql-db=sbtest --time=120 --threads=56 --report-interval=1 --tables=10 --scale=100 --use_fk=0  --trx_level=RC --db-driver=pgsql prepare

While we were loading the data, we wanted to see if we could speed-up the process. Here’s the customized PostgreSQL settings we used, some of them directly targeted to accelerate the data load:

shared_buffers = 192GB
maintenance_work_mem = '20GB'
wal_level = 'minimal'
autovacuum = 'OFF'
wal_compression = 'ON'
max_wal_size = '20GB'
checkpoint_timeout = '1h'
checkpoint_completion_target = '0.9'
random_page_cost = 1
max_wal_senders = 0
full_page_writes = ON
synchronous_commit = ON

We’ll discuss most of these parameters in the sections that follow, but we would like to highlight two of them here. We increased maintenance_work_mem to speed  up index creation and max_wal_size to delay checkpointing further, but not too much — this is a write-intensive phase after all. Using these parameters it took us 33 minutes to complete the prepare stage compared with 55 minutes when using the default parameters. 

If you are not concerned about crash recovery or ACID, you could turn off full_page_writes, fsync and synchrnous_commit. That would speed up the data load much more. 

Running a manual VACUUM ANALYZE after sysbench-tpcc’s initial prepare stage

Once we had prepared the database, as it is a newly created DB Instance, we ran a manual VACUUM ANALYZE on the database (in parallel jobs) using the command below. We employed all the 56 vCPUs available in the server since there was nothing else running in the machine:

$ /usr/lib/postgresql/10/bin/vacuumdb -j 56 -d sbtest -z

Having run a vacuum for the entire database we restarted PostgreSQL and cleared the OS cache before executing the benchmark in “run” mode. We repeated this process after each round.

First attempt with sysbench-tpcc

When we ran sysbench-tpcc for the first time, we observed a resulting TPS of 1978.48 for PostgreSQL with the server not properly tuned, running with default settings. We used the following command to run sysbench-tpcc for PostgreSQL for 10 hours (or 36000 seconds) for all rounds:

./tpcc.lua --pgsql-user=postgres --pgsql-db=sbtest --time=36000 --threads=56 --report-interval=1 --tables=10 --scale=100 --use_fk=0  --trx_level=RC --pgsql-password=oracle --db-driver=pgsql run

PostgreSQL performance tuning of parameters for sysbench-tpcc (crash safe)

After getting an initial idea of how PostgreSQL performed with the default settings and the actual demands of the sysbench-tpcc workload, we began making progressive adjustments in the settings, observing how they impacted the server’s performance. After several rounds we came up with the following list of parameters (all of these satisfy ACID properties):

shared_buffers = '192GB'
work_mem = '4MB'
random_page_cost = '1'
maintenance_work_mem = '2GB'
wal_level = 'replica'
max_wal_senders = '3'
synchronous_commit = 'on'
seq_page_cost = '1'
max_wal_size = '100GB'
checkpoint_timeout = '1h'
synchronous_commit = 'on'
checkpoint_completion_target = '0.9'
autovacuum_vacuum_scale_factor = '0.4'
effective_cache_size = '200GB'
min_wal_size = '1GB'
bgwriter_lru_maxpages = '1000'
bgwriter_lru_multiplier = '10.0'
logging_collector = 'ON'
wal_compression = 'ON'
log_checkpoints = 'ON'
archive_mode = 'ON'
full_page_writes = 'ON'
fsync = 'ON'

Let’s discuss our reasoning behind the tuning of the most important settings:

shared_buffers

Defines the amount of memory PostgreSQL uses for shared memory buffers. It’s arguably its most important setting, often compared (for better or worse) to MySQL’s innodb_buffer_pool_size. The biggest difference, if we dare to compare shared_buffers to the Buffer Pool, is that InnoDB bypasses the OS cache to directly access (read and write) data in the underlying storage subsystem whereas PostgreSQL do not.

Does this mean PostgreSQL does “double caching” by first loading data from disk into the OS cache to then make a copy of these pages into the shared_buffers area? Yes.

Does this “double caching” makes PostgreSQL inferior to InnoDB and MySQL in terms of memory management? No. We’ll discuss why that is the case in a follow up blog post. For now it suffice to say the actual performance depends on the workload (mix of reads and writes), the size of the “hot data” (the portion of the dataset that is most accessed and modified) and how often checkpointing takes place.

How we chose the setting for shared_buffers to optimize PostgreSQL performance

Due to these factors, the documented suggested formula of setting shared_buffers to 25% of RAM or the magic number of “8GB” is hardly ideal. What seems to be good reasoning, though, is this:

  • If you can fit the whole of your “hot data” in memory, then dedicating most of your memory to shared_buffers pays off nicely, making PostgreSQL behave as close to an in-memory database as possible.
  • If the size of your “hot data” surpasses the amount of memory you have available in the server, then you’re probably better off working with a much smaller shared_buffers area and relying more on the OS cache.

For this benchmark, considering the options we used, we found that dedicating 75% of all the available memory to shared_buffers is ideal. It is enough to fit the entire “hot data” and still leave sufficient memory for the OS to operate, handle connections and everything else.

work_mem

This setting defines the amount of memory that can be used by each query (not session) for internal sort operations (such as ORDER BY and DISTINCT), and hash tables (such as when doing hash-based aggregation). Beyond this, PostgreSQL moves the data into temporary disk files. The challenge is usually finding a good balance here. We want to avoid the use of temporary disk files, which slow down query completion and in turn may cause contention. But we don’t want to over-commit memory, which could even lead to OOM; working with high values for work_mem may be destructive when it is not really needed.

We analyzed the workload produced by sysbench-tpcc and found with some surprise that work_mem doesn’t play a role here, considering the queries that were executed. So we kept the default value of 4MB. Please note that this is seldom the case in production workloads, so it is important to always keep an eye on that parameter.

random_page_cost

This setting stipulates the cost that a non-sequentially-fetched disk page would have, and directly affects the query planner’s decisions. Going with a conservative value is particularly important when using high latency storage, such as spinning disks. This wasn’t our case, hence we could afford to equalize random_page_cost to seq_page_cost. So, we set this parameter to 1 as well, down from the default value of 4.

wal_level, max_wal_senders and archive_mode

To set up streaming replication wal_level needs to be set to at least “replica” and archive_mode must be enabled. This means the amount of WAL data produced increases significantly compared to when using default settings for these parameters, which in turn impacts IO. However, we considered these with a production environment in mind.

wal_compression

For this workload, we observed total WALs produced of size 3359 GB with wal_compression disabled and 1962 GB with wal_compression. We enabled wal_compression to reduce IO — the amount (and, most importantly, the rate) of WAL files being written to disk — at the expense of some additional CPU cycles. This proved to be very effective in our case as we had a surplus of CPU available.

checkpoint_timeout, checkpoint_completion_target and max_wal_size

We set the checkpoint_timeout to 1 hour and checkpoint_completion_target to 0.9. This means a checkpoint is forced every 1 hour and it has 90% of the time before the next checkpoint to spread the writes. However, a checkpoint is also forced when max_wal_size of WAL’s have been generated. With these parameters for a sysbench-tpcc workload, we saw that there were 3 to 4 checkpoints every 1 hour. This is especially because of the amount of WALs being generated.

In production environments we would always recommend you perform a manual CHECKPOINT before shutting down PostgreSQL in order to allow for a faster restart (recovery) time. In this context, issuing a manual CHECKPOINT took us between 1 and 2 minutes, after which we were able to restart PostgreSQL in just about 4 seconds. Please note that in our testing environment, taking time to restart PostgreSQL was not a concern, so working with this checkpoint rate benefited us. However, if you cannot afford a couple of minutes for crash recovery it is always suggested to force checkpointing to take place more often, even at the cost of some degraded performance.

full_page_writes, fsync and synchronous_commit

We set all of these parameters to ON to satisfy ACID properties.

autovacuum

We enabled autovacuum and other vacuum settings to ensure vacuum is being performed in the backend. We will discuss the importance of maintaining autovacuum enabled in a production environment, as well as the danger of doing otherwise, in a separate post. 

Amount of WAL’s (Transaction Logs) generated after 10 hours of sysbench-tpcc

Before we start to discuss the numbers it is important to highlight that we enabled wal_compression before starting sysbench. As we mentioned above, the amount of WALs generated with wal_compression set to OFF was more than twice the amount of WALs generated when having compression enabled. We observed that enabling wal_compression resulted in an increase in TPS of 21%. No wonder, the production of WALs has an important impact on IO: so much so that it is very common to find PostgreSQL servers with a dedicated storage for WALs only. Thus, it is important to highlight the fact wal_compression may benefit write-intensive workloads by sparing IO at the expense of additional CPU usage.

To find out the total amount of WALs generated after 10 Hours, we took note at the WAL offset from before we started the test and after the test completed:

WAL Offset before starting the sysbench-tpcc ? 2C/860000D0
WAL Offset after 10 hours of sysbench-tpcc   ? 217/14A49C50

and subtracted one from the other using pg_wal_lsn_diff, as follows:

postgres=# SELECT pg_size_pretty(pg_wal_lsn_diff('217/14A49C50','2C/860000D0'));
pg_size_pretty
----------------
1962 GB
(1 row)

1962 GB of WALs is a fairly big amount of transaction logs produced over 10 hours, considering we had enabled wal_compression .

We contemplated making use of a separate disk to store WALs to find out by how much more a dedicated storage for transaction logs would benefit overall performance. However, we wanted to keep using the same hardware Vadim had used for his previous tests, so decided against this.

Crash unsafe parameters

Setting full_page_writes, fsync and synchronous_commit to OFF may speed up the performance but it is always crash unsafe unless we have enough backup in place to consider these needs. For example, if you are using a COW FileSystem with Journaling, you may be fine with full_page_writes set to OFF. This may not be true 100% of the time though.

However, we still want to share the results with the crash unsafe parameters mentioned in the paragraph above as a reference.

Results after 10 Hours of sysbench-tpcc for PostgreSQL with default, crash safe and crash unsafe parameters

Here are the final numbers we obtained after running sysbench-tpcc for 10 hours considering each of the scenarios above:

Parameters

TPS

Default / Untuned

1978.48

Tuned (crash safe)

5736.66

Tuned (crash unsafe)

7881.72

Did we expect to get these numbers? Yes and no.

Certainly we expected a properly tuned server would outperform one running with default settings considerably but we can’t say we expected it to be almost three times better (2.899). With PostgreSQL making use of the OS cache it is not always the case that tuning shared_buffers in particular will make such a dramatic difference. By comparison, tuning MySQL’s InnoDB Buffer Pool almost always makes a difference. For PostgreSQL high performance it depends on the workload. In this case for sysbench-tpcc benchmarks, tuning shared_buffers definitely makes a difference.

On the other hand experiencing an additional order of magnitude faster (4x), when using crash unsafe settings, was not much of a surprise.

Here’s an alternative view of the results of our PostgreSQL insert performance tuning benchmarks:

Sysbench-TPCC with PostgreSQL

What did you think about this experiment? Please let us know in the comments section below and let’s get the conversation going.

Hardware spec

  • Supermicro server:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: SAMSUNG  SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Ubuntu 16.04.4, kernel 4.13.0-36-generic
  • PostgreSQL: version 10.3
  • sysbench-tpcc: https://github.com/Percona-Lab/sysbench-tpcc

The post Tuning PostgreSQL for sysbench-tpcc appeared first on Percona Database Performance Blog.

Jun
01
2018
--

This Week in Data with Colin Charles 40: a Peak at Blockchain, Lots of MariaDB News, then Back on the Road

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Shortly after the last dispatch, I jetted off for a spot of vacation (which really meant I was checking out the hype behind Blockchain with a database developer lens at the Blockchain Week NYC), and then some customer visits in Seoul, which explains the short hiatus. Here’s to making this more regular as the summer approaches.

I am about to embark on a fairly long trip, covering a few upcoming appearances: Lisbon for the Percona Engineering meeting, SouthEastLinuxFest in Charlotte, the Open Source Data Centre Conference in Berlin and then the DataOps Barcelona event. I have some discount codes: 50% discount for OSDC with the code OSDC_FOR_FRIENDS, and 50% discount for DataOps Barcelona with the code dataopsbcn50. Expect this column to reflect my travels over the next few weeks.

There has been a lot of news on the MariaDB front: MariaDB 10.3.7 went stable/GA! You might have noticed more fanfare around the release name MariaDB TX 3.0, but the reality is you can still get this download from your usual MariaDB Foundation site. It is worth noting that the MariaDB Foundation 2017 financials have also been released. Some may have noticed a couple months back there was a press release titled Report “State of the Open-Source DBMS Market, 2018” by Gartner Includes Pricing Comparison With MariaDB. This led to a Gartner report on the State of the Open-Source DBMS Market, 2018; although the report has since been pulled. Hopefully we see it surface again.

In the meantime, please do try out MariaDB 10.3.7 and it would be great to hear feedback. I also have an upcoming Percona webinar on MariaDB Server 10.3 on June 26 2018 — when the sign up link appears, I will be sure to include it here.

Well written, and something worth discussing: Should Red Hat Buy or Build a Database?. The Twitter discussion is also worth looking at.

Releases

Link List

Upcoming appearances

Feedback

I look forward to receiving feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

The post This Week in Data with Colin Charles 40: a Peak at Blockchain, Lots of MariaDB News, then Back on the Road appeared first on Percona Database Performance Blog.

May
04
2018
--

Percona Live 2018 Community Report

So, after a whirlwind few days, Percona Live 2018 has been and gone. There was a great energy about the conference, and it was fantastic to meet so many open source database enthusiasts and supporters. A few things that I experienced:

  • Your great willingness to share knowledge. It was a fantastic place to learn for those who have experience from a different field of technology. Almost everyone seemed to be very open and generous with their time.
  • The “superstars” from our industry are not so scary. They are as willing to be open and generous with their experience and views as any of the other attendees, and equally as interested in making new discoveries.
  • There aren’t many times you can sit down to a (community) dinner, to share food and anecdotes with people from USA, UK, Germany and Armenia at the same time. I thoroughly enjoyed the company, and wish there were more opportunities for similar encounters. Thanks to Pythian for setting that up.
  • My Percona colleagues are wonderful, committed human beings with more than a passing interest in music – the Percona Sessions have got to happen…
  • That you can run a very long way in a day between the Santa Clara Convention Center and the Hyatt Regency Hotel.

I had very many positive conversations with delegates. You offered any criticisms along with a suggestion of how we should tweak things for the better. Our community is a creative, generous, problem-solving machine, though I shouldn’t be surprised at that.

So, with only a few more duties to complete, I’d like to thank you for your company. For those that did not make it to this year’s event, I hope that you might be persuaded to join us in the future — either at Percona Live Europe 2018 or at Percona Live 2019.

Packt Prizes

Our media sponsor, Packt, generously provided us with three free ebooks and two free instruction videos as prizes for delegates:

  1. Mastering MongoDB 3.x
  2. MySQL 8 Cookbook
  3. MongoDB Administrator’s Guide
  4. Elastic Databases and Data Processing with AWS [Video]
  5. AWS Administration – Database, Networking, and Beyond [Video]

There are another 10 titles for which we can offer delegates a 50% discount: you should have received your emails. Thanks are due again to Packt.

Community Blog

While I have your attention, I’d like to let you know about the forthcoming Percona community blog. Having been some time in the planning, this is starting really soon, and is like a year-round, online, Percona Live. We already have some keen writers for this, but if you would be interested in creating content (whether written, podcast or webcast) for the community blog, then please get in touch. The brief is very wide — as long as your submission is relevant to the open source database community then it would be welcome.

Finally, I would like to invite feedback on how to make the event shine even brighter — please drop me an email if you have suggestions or ideas. Meanwhile, I hope you enjoy these photographs of the MySQL Community Awards Winners, presented at PL18. You can read more about this community initiative.

Perhaps you’ll be able to join us in Frankfurt in November? Time to start thinking about those submissions for the call for papers!

Or perhaps next year at Percona Live Open Source Database Conference in 2019 – wherever it may be!







Photographs: Randy Tunnell Photography

The post Percona Live 2018 Community Report appeared first on Percona Database Performance Blog.

Apr
25
2018
--

Google Cloud expands its bet on managed database services

Google announced a number of updates to its cloud-based database services today. For the most part, we’re not talking about any groundbreaking new products here, but all of these updates address specific pain points that enterprises suffer when they move to the cloud.

As Google Director of Product Management Dominic Preuss told me ahead of today’s announcements, Google long saw itself as a thought leader in the database space. For the longest time, though, that thought leadership was all about things like the Bigtable paper and didn’t really manifest itself in the form of products. Projects like the globally distributed Cloud Spanner database are now allowing Google Cloud to put its stamp on this market.

Preuss also noted that many of Google’s enterprise users often start with lifting and shifting their existing workloads to the cloud. Once they have done that, though, they are also looking to launch new applications in the cloud — and at that point, they typically want managed services that free them from having to do the grunt work of managing their own infrastructure.

Today’s announcements mostly fit into this mold of offering enterprises the kind of managed database services they are asking for.

The first of these is the beta launch of Cloud Memorystore for Redis, a fully managed in-memory data store for users who need in-memory caching for capacity buffering and similar use cases.

Google is also launching a new feature for Cloud Bigtable, the company’s NoSQL database service for big data workloads. Bigtable now features regional replication (or at least it will, once this has rolled out to all users within the next week or so). The general idea here is to give enterprises that previously used Cassandra for their on-premises workloads an alternative in the Google Cloud portfolio, and these cross-zone replications increase the availability and durability of the data they store in the service.

With this update, Google is also making Cloud SQL for PostgreSQL generally available with a 99.95 percent SLA, and it’s adding commit timestamps to Cloud Spanner.

What’s next for Google’s database portfolio? Unsurprisingly, Preuss wouldn’t say, but he did note that the company wants to help enterprises move as many of their workloads to the cloud as they can — and for the most part, that means managed services.

Apr
23
2018
--

Check Out the Percona Live 2018 Live Stream!

Percona Live 2018 Live Stream

Announcing the Percona Live 2018 live stream.

This year at Percona Live Open Source Database Conference 2018 we are live streaming the Keynote Talks on Day 1 and 2.

Percona is streaming the keynotes on Tuesday, April 24, 2018, and Wednesday, April 25, 2018 beginning at 9 AM PDT (both days). The keynote speakers include people from VividCortex, Upwork, Oracle, Netflix and many more. The keynote panels feature a cloud discussion and a cool technologies showcase.

Use the live stream link if you don’t want to miss a keynote, but can’t be at the main stage. The link for the live stream is:

The list of keynote talks and speakers for each day is:

Day 1

Day 2

The post Check Out the Percona Live 2018 Live Stream! appeared first on Percona Database Performance Blog.

Apr
12
2018
--

Percona Live 2018 Featured Talk: Containerizing Databases at New Relic (What We Learned) with Joshua Galbraith and Bryant Vinisky

Joshua Bryant Percona Live 2018 New Relic

Percona Live 2018 Featured TalkWelcome to another interview blog for the rapidly-approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Joshua Galbraith, Senior Software Engineer and Bryant Vinisky, Site Reliability Engineer at New Relic. Their session talk is titled Containerizing Databases at New Relic: What We Learned. There are many trade-offs when containerizing databases, and there are many open source support options for database containers such as Kubernetes and Apache Mesos. In our conversation, we discussed what containers can bring to a database environment:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Joshua: My name is Joshua Galbraith, and I’m a Senior Software Engineer and Technical Product Manager on the Database Engineering team at New Relic. I’ve been working with open-source databases for the past ten years. I started my software engineering career by writing scripts to load several years of high-resolution geophysical sensor data from flat files into a Postgres database. I’ve been writing software that interacts with databases and tools to make operating them easier ever since. I’ve run MySQL, MongoDB, ElasticSearch, Cassandra and Redis in a variety of production environments. A little over three years ago, I co-founded a DBaaS startup and built a container-based platform for Apache Cassandra. I’ve been containerizing databases ever since — most recently by working on a project called Megabase at New Relic.

Joshua Bryant Percona Live 2018 New RelicBryant: Hello, my name is Bryant Vinisky. I currently work on the Database Engineering team at New Relic as a Senior Site Reliability Engineer. My professional experience with databases and the open source ecosystem started over eight years ago when I joined the engineering team at NWEA and helped roll out a SaaS platform for delivering adaptive assessments to students over the web. That platform was backed by a number of different database and related technologies — namely PostgreSQL, MongoDB and Redis.

Though in the beginning, my role didn’t formally involve databases, they were clearly a very important part of the greater system. Armed with an unquenchable curiosity, I was never shy about crossing boundaries into DB land. On the development side, much of the tooling and side projects I worked on over the years frequently touched databases for a storage backend. As the assessment platform scaled out to support hundreds of thousands of concurrent users, my role shifted to a reliability focus, often involving pre-release load testing of the major system components as well as doing follow up for problems that occurred on the production system. Much of the work involved dealing with the databases as a scaling bottleneck.

Containers came into the picture for me over two years ago when I moved to New Relic, where they had been using containers for stateless services for years and largely skipped over VMs. It wasn’t long before we started exploring containers as a solution for stateful database services with our Megabase project, an area where I’ve spent a lot of my time since

Your tutorial is titled “Containerizing Databases at New Relic: What We Learned”. How did you decide on containers as your database solution?

Joshua/Bryant: At New Relic, we had already built a Container Fabric for our stateless services. We knew we needed to provide databases to our internal teams in a way that was fast, cost-efficient and repeatable. Using containers would get us most of the way towards those goals, but we didn’t have a proven pattern that we could follow to reach them. We heard about other companies succeeding in building similar solutions, and we knew that we were moving away from virtual machines. Containers seemed to fit the bill.

What are the benefits and drawbacks of containerizing databases?

Joshua/Bryant: Containers are great for packaging and deployment. They allow us to deploy a known version of configuration, code and environment in a deterministic way. Unfortunately, containers are not perfect mechanisms for resource isolation, especially when large amounts of persistent storage are required. The challenge of containerizing databases is to make the right trade-offs between portability and performance, complexity and ease-of-operation given the specific context and goals of your team and organization.

Were specific database software (MySQL, Redis, PostgreSQL) easier to containerize? Why?

Joshua/Bryant: In general, the databases that are easiest to containerize are the databases that manage their own state and make themselves effectively stateless from the point of the scheduler and container orchestration framework. Databases that provide cluster membership, fault-tolerance and data replication are easier to run with a traditional orchestration framework. Databases that do not do these things on their own require additional “sidecar” services and/or application-specific frameworks to operate reliably.

Why should people attend your talk? What do you hope people will take away from it?

Joshua/Bryant: As a team, we’ve learned a lot of lessons about what to do, and not to do, when running stateful applications in containers. We’d like to pass that knowledge on to our audience. We also want to send people away with enough information to make the right decisions about whether or not to containerize the databases they are responsible for, and how to do it in a way that is successful given their own goals and context. At the very least, we hope it will be entertaining — and maybe we’ll learn some things too.

What are you looking forward to at Percona Live (besides your talk)?

Joshua/Bryant: We’re very excited to hear about open-source tools like Vitess and ProxySQL. I’m also looking forward to hearing about the latest monitoring and performance analysis tools. I always love deep-dives into specific database problems faced by a company, and I love talking to people in the expo hall. I always come away from Percona Live events a little more excited about my day-to-day work and the future of open-source databases.

Want to find out more about this Percona Live 2018 featured talk, and containerizing stateful databases? Register for Percona Live 2018, and see Joshua and Bryant’s session talk Containerizing Databases at New Relic: What We Learned. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

The post Percona Live 2018 Featured Talk: Containerizing Databases at New Relic (What We Learned) with Joshua Galbraith and Bryant Vinisky appeared first on Percona Database Performance Blog.

Apr
11
2018
--

Calling All Polyglots: Percona Live 2018 Keynote Schedule Now Available!

Percona Live 2018 Keynotes

Percona Live 2018 KeynotesWe’ve posted the Percona Live 2018 keynote addresses for the seventh annual Percona Live Open Source Database Conference 2018, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, CA. 

This year’s keynotes explore topics ranging from how cloud and open source database adoption accelerates business growth, to leading-edge emerging technologies, to the importance of MySQL 8.0, to the growing popularity of PostgreSQL.

We’re excited by the great lineup of speakers, including our friends at Alibaba Cloud, Grafana, Microsoft, Oracle, Upwork and VividCortex, the innovative leaders on the Cool Technologies panel, and Brendan Gregg from Netflix, who will discuss how to get the most out of your database on a Linux OS, using his experiences at Netflix to highlight examples.  

With the theme of “Championing Open Source Databases,” the conference will feature multiple tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Once again, Percona will be offering a low-cost database 101 track for beginning users who want to learn how to use and operate open source databases.

The Percona Live 2018 keynotes include:

Tuesday, April 24, 2018

  • Open Source for the Modern Business – Peter Zaitsev of Percona will discuss how open source database adoption continues to grow in enterprise organizations, the expectations and definitions of what constitutes success continue to change. A single technology for everything is no longer an option; welcome to the polyglot world. The talk will include several compelling open source projects and trends of interest to the open source database community and will be followed by a round of lightning talks taking a closer look at some of those projects.
  • Cool Technologies Showcase – Four industry leaders will introduce key emerging industry developments. Andy Pavlo of Carnegie Mellon University will discuss the requirements for enabling autonomous database optimizations. Nikolay Samokhvalov of PostgreSQL.org will discuss new PostgreSQL tools. Sugu Sougoumarane of PlanetScale Data will explore how Vitess became a high-performance, scalable and available MySQL clustering cloud solution in line with today’s NewSQL storage systems. Shuhao Wu of Shopify explains how to use Ghostferry as a data migration tool for incompatible cloud platforms.
  • State of the Dolphin 8.0 – Tomas Ulin of Oracle will discuss the focus, strategy, investments and innovations that are evolving MySQL to power next-generation web, mobile, cloud and embedded applications – and why MySQL 8.0 is the most significant MySQL release in its history.
  • Linux Performance 2018 – Brendan Gregg of Netflix will summarize recent performance features to help users get the most out of their Linux systems, whether they are databases or application servers. Topics include the KPTI patches for Meltdown, eBPF for performance observability, Kyber for disk I/O scheduling, BBR for TCP congestion control, and more.

Wednesday, April 25, 2018

  • Panel Discussion: Database Evolution in the Cloud – An expert panel of industry leaders, including Lixun Peng of Alibaba, Sunil Kamath of Microsoft, and Baron Schwartz of VividCortex, will discuss the rapid changes occurring with databases deployed in the cloud and what that means for the future of databases, management and monitoring and the role of the DBA and developer.
  • Future Perfect: The New Shape of the Data Tier – Baron Schwartz of VividCortex will discuss the impact of macro trends such as cloud computing, microservices, containerization, and serverless applications. He will explore where these trends are headed, touching on topics such as whether we are about to see basic administrative tasks become more automated, the role of open source and free software, and whether databases as we know them today are headed for extinction.
  • MongoDB at Upwork – Scott Simpson of Upwork, the largest freelancing website for connecting clients and freelancers, will discuss how MongoDB is used at Upwork, how the company chose the database, and how Percona helps make the company successful.

We will also present the Percona Live 2018 Community Awards and Lightning Talks on Monday, April 23, 2018, during the Opening Night Reception. Don’t miss the first day of tutorials and Opening Night Reception!

Register for the conference on the Percona Live Open Source Database Conference 2018 website.

Sponsorships

Limited Sponsorship opportunities for Percona Live 2018 Open Source Database Conference are still available, and offer the opportunity to interact with the DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. Contact live@percona.com for sponsorship details.

  • Diamond Sponsors – Percona, VividCortex
  • Platinum – Alibaba Cloud, Microsoft
  • Gold Sponsors – Facebook, Grafana
  • Bronze Sponsors – Altinity, BlazingDB, Box, Dynimize, ObjectRocket, Pingcap, Shannon Systems, SolarWinds, TimescaleDB, TwinDB, Yelp
  • Contributing Sponsors – cPanel, Github, Google Cloud, NaviCat
  • Media Sponsors – Database Trends & Applications, Datanami, EnterpriseTech, HPCWire, ODBMS.org, Packt

The post Calling All Polyglots: Percona Live 2018 Keynote Schedule Now Available! appeared first on Percona Database Performance Blog.

Apr
05
2018
--

Percona Live Europe 2018 – Save the Date!

Percona Live Europe 2018

Percona Live Europe 2018We’ve been searching for a great venue for Percona Live Europe 2018, and I am thrilled to announce we’ll be hosting it in Frankfurt, Germany! Please block November 5-7, 2018 on your calendar now and plan to join us at the Radisson Blu Frankfurt for the premier open source database conference.

We’re in the final days of organizing for the Percona Live 2018 in Santa Clara. You can still purchase tickets for an amazing lineup of keynote speakers, tutorials and sessions. We have ten tracks, including MySQL, MongoDB, Cloud, PostgreSQL, Containers and Automation, Monitoring and Ops, and Database Security. Major areas of focus at the conference will include:

  • Database operations and automation at scale, featuring speakers from Facebook, Slack, Github and more
  • Databases in the cloud – how database-as-a-service (DBaaS) is changing the DB landscape, featuring speakers from AWS, Microsoft, Alibaba and more
  • Security and compliance – how GDPR and other government regulations are changing the way we manage databases, featuring speakers from Fastly, Facebook, Pythian, Percona and more
  • Bridging the gap between developers and DBAs – finding common ground, featuring speakers from Square, Oracle, Percona and more

The Call for Papers for Percona Live Europe will open soon. We look forward to seeing you in Santa Clara!

The post Percona Live Europe 2018 – Save the Date! appeared first on Percona Database Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com