Hyperscale operators are defined as enormous companies like Amazon, Apple, Facebook and Google that need to provide computing on a massive scale. You would think that there would be a limited number of this type of highly specialized data center, but recent research from Synergy Research found that 2017 was actually a breakout year for new hyperscale data centers across the world — with… Read More
Hyperscale data centers reached over 390 worldwide in 2017
AWS showed no signs of slowing down in 2017
AWS had a successful year by any measure. The company continued to behave like a startup, with the kind of energy and momentum to invest in new areas not usually seen in an incumbent with a significant market share lead. How good a year was it? According to numbers from Synergy Research, the company remains the category leader by far with around 35 percent market share. Microsoft sits well behind… Read More
MongoDB 3.6 Sorting Changes and Useful New Features
In this blog, we will cover the new MongoDB 3.6 sorting changes and other useful new features.
The new MongoDB 3.6 GA release can help you build applications and run them on MongoDB. Here we will cover areas where there are changes to server behavior that may affect how your application uses MongoDB.
Sorting Changes
The most significant behavior change from an application perspective is sorting on arrays in find and aggregate queries. It is important to review this new behavior in a test environment when upgrading to MongoDB 3.6, especially if the exact order of sorted arrays in find results is crucial.
In MongoDB 3.6, a sort field containing an array is ordered with the lowest-valued element of the array first for ascending sorts and the highest-valued element of the array first for descending sorts. Before 3.6, the sort used the lowest-valued array element that matched the query.
Example: a collection has the following documents:
{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }
And we perform this sort operation:
db.test.find({a: {$gte: 0}}).sort({a: 1});
In MongoDB 3.6, the sort operation no longer takes the query predicate into account when determining its sort key. The operation returns the documents in the following order:
{ "_id" : 1, "a" : [ 5, -4 ] } { "_id" : 0, "a" : [ -3, -2, 2, 3 ] }
Prior to 3.6, the result would be:
{ _id: 0, a: [-3, -2, 2, 3] }
{ _id: 1, a: [ 5, -4 ] }
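If your application depends on the old ordering, one possible workaround (a sketch of my own, not something prescribed by the MongoDB docs, so test it against your data) is to compute the pre-3.6 sort key explicitly in an aggregation pipeline, using $filter to keep only the matching array elements and $min to pick the lowest match. The ‘_sortKey’ helper field name here is hypothetical:

test1:PRIMARY> db.test.aggregate([
  { $match: { a: { $gte: 0 } } },
  // build the old-style sort key: the lowest array element matching the predicate
  { $addFields: { _sortKey: { $min: {
      $filter: { input: "$a", as: "el", cond: { $gte: [ "$$el", 0 ] } }
  } } } },
  { $sort: { _sortKey: 1 } },
  // drop the helper field from the final output
  { $project: { _sortKey: 0 } }
])

For the example collection above, this returns _id: 0 before _id: 1, matching the pre-3.6 order.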
More on this change here: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#array-sort-behavior and https://docs.mongodb.com/manual/release-notes/3.6-compatibility/#find-method-sorting.
Wire Protocol Compression
In MongoDB 3.4, wire protocol compression using the snappy algorithm was added to the server and mongo shell, but it was disabled by default. In MongoDB 3.6, wire protocol compression is enabled by default, and the zlib algorithm was added as an optional compressor, with snappy remaining the default. We recommend snappy unless a higher level of compression (with added cost) is needed.
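As a minimal sketch of how this is configured (option names are from the MongoDB server documentation; the hostname is illustrative):

# mongod configuration file (YAML): advertise both compressors
net:
  compression:
    compressors: snappy,zlib

# mongo shell: request compression when connecting
$ mongo --networkMessageCompressors "snappy" --host mongodb1.example.com

The compressor actually used is negotiated between the client and server from the lists both sides support.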
It’s important to note that MongoDB tools do not yet support wire protocol compression. This includes mongodump, mongorestore, mongoreplay, etc. As these tools are generally used to move a lot of data, compression would offer significant benefits when running them over a non-localhost network.
I created this MongoDB ticket earlier this year to add wire protocol compression support to these tools: https://jira.mongodb.org/browse/TOOLS-1668. Please watch and vote for this improvement if this feature is important to you.
$jsonSchema Schema Validation
A big driver for using MongoDB is its “schema-less”, document-based data model. A drawback to this flexibility is that it can sometimes result in incomplete or incorrect data in the database, due to the lack of input checking and sanitization.
The relatively unknown “Schema Validation” feature was introduced in MongoDB 3.2 to address this risk. It allowed a user to define which fields are required and which field values are acceptable using a simple $or array condition, known as a “validator”.
In MongoDB 3.6, a much friendlier $jsonSchema format was introduced as a “validator” in Schema Validation. On top of that, the ability to query for documents matching a defined $jsonSchema was introduced!
Below is an example of creating a collection named “test” with a required field “x” that must have the bsonType “number”:
test1:PRIMARY> db.createCollection("test", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["x"],
      properties: {
        x: {
          bsonType: "number",
          description: "field 'x' must be a number"
        }
      }
    }
  }
})
{ "ok" : 1, "operationTime" : Timestamp(1513090298, 1) }
Now when I insert a document that does not meet this criterion (“x” must be a number), I get an error:
test1:PRIMARY> db.test.insert({ x: "abc" }) WriteResult({ "nInserted" : 0, "writeError" : { "code" : 121, "errmsg" : "Document failed validation" } })
Of course, if my document matches the schema, my insert will succeed:
test1:PRIMARY> db.test.insert({ x: 1 })
WriteResult({ "nInserted" : 1 })
To demonstrate $jsonSchema further, let’s perform a .find() query that returns documents matching my defined schema:
test1:PRIMARY> db.test.find({
  $jsonSchema: {
    bsonType: "object",
    required: ["x"],
    properties: {
      x: { bsonType: "number", description: "must be a number" }
    }
  }
})
{ "_id" : ObjectId("5a2fecfd6feb229a6aae374d"), "x" : 1 }
As we can see here, combining the “schema-less” document model of MongoDB with Schema Validation is very powerful! Now we can be sure our documents are complete and correct, while still offering developers an extreme amount of flexibility.
If data correctness is important to your application, I suggest you implement a Schema Validator at the very start of your application development, as implementing validation after data has been inserted is not straightforward.
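If you do need to retrofit validation onto an existing collection, the collMod command can attach a validator after the fact. Below is a sketch (the schema shown is illustrative; “moderate” is a real validationLevel option):

test1:PRIMARY> db.runCommand({
  collMod: "test",
  validator: { $jsonSchema: { bsonType: "object", required: ["x"] } },
  // "moderate" validates inserts and updates to documents that already
  // satisfy the schema, leaving pre-existing invalid documents untouched
  validationLevel: "moderate"
})

You would still need to find and fix existing non-conforming documents separately.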
More on $jsonSchema can be found here: https://docs.mongodb.com/manual/core/schema-validation/#json-schema
DNS SRV Connection
DNS-based seedlists for connections are a very cool addition to MongoDB 3.6. They allow the server, mongo shell and client drivers (that support the new feature) to use a DNS SRV record to gather a list of MongoDB hosts to connect to. This saves administrators from having to change seed host lists on several servers (usually in an application config) when the host topology changes.
DNS-based seedlist connection strings begin with “mongodb+srv://” and use a single hostname that is resolved as a DNS SRV record.
An example:
mongodb+srv://server.example.com/
This would cause a DNS query for the SRV record ‘_mongodb._tcp.server.example.com’.
On the DNS server, we set the full list of MongoDB hosts that should be returned in this DNS SRV record query. Here is an example DNS response this feature requires:
Record                             TTL    Class   Priority  Weight  Port   Target
_mongodb._tcp.server.example.com.  86400  IN SRV  0         5       27317  mongodb1.example.com.
_mongodb._tcp.server.example.com.  86400  IN SRV  0         5       27017  mongodb2.example.com.
In the above example, the hosts ‘mongodb1.example.com’ and ‘mongodb2.example.com’ would be used to connect to the database. If we decide to change the list of hosts, only the DNS SRV record needs to be updated. Neat!
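For a quick sketch of usage, connecting from the 3.6 mongo shell with a seedlist URI looks like this (the hostname is from the example above; the database name ‘test’ is illustrative):

$ mongo "mongodb+srv://server.example.com/test"

The shell resolves the SRV record and connects to the hosts it returns.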
More on this new feature here: https://docs.mongodb.com/manual/reference/connection-string/#connections-dns-seedlist
dropDatabase Wait for Majority
In 3.6, the behavior of ‘dropDatabase’ was changed to wait for a majority of replica set members to drop the database before returning success. This is a great step in the right direction for improving data integrity/correctness.
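Relatedly, in 3.6 the dropDatabase command accepts an explicit write concern. A brief sketch (the database name ‘olddata’ and the wtimeout value are illustrative):

test1:PRIMARY> db.getSiblingDB("olddata").runCommand({
  dropDatabase: 1,
  // explicit write concern; 3.6 already waits for a majority by default
  writeConcern: { w: "majority", wtimeout: 5000 }
})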
More on this change here: https://docs.mongodb.com/manual/reference/command/dropDatabase/#behavior
FTDC for mongos
On mongod instances, the FTDC (full-time diagnostic data capture) feature outputs .bson files to a directory named ‘diagnostics.data’ in the database path (the server dbPath variable). These files are useful for diagnostics, understanding crashes, etc.
On mongos the new FTDC support outputs the .bson files to ‘mongos.diagnostic.data’ beside the mongos log file. You can change the output path for FTDC files with the server parameter diagnosticDataCollectionDirectoryPath.
FTDC output files must be decoded to be read. The GitHub project ‘ftdc-utils’ is a great tool for reading these specially-formatted files, see more about this tool here: https://github.com/10gen/ftdc-utils.
Here is an example of how to decode the FTDC output files. We can follow the same process for mongod as well:
$ cd /path/to/mongos/mongos.diagnostic.data
$ ftdc decode metrics.2017-12-12T14-44-36Z-00000 -o output
This decodes the FTDC metrics to the file ‘output’.
listDatabases Filters
In MongoDB 3.6, you can now filter the output of the ‘listDatabases’ server command. A ‘nameOnly’ boolean option was also added to output only database names, without additional detail.
Filtering of the output is controlled by the new ‘listDatabases’ option ‘filter’. The ‘filter’ value must be a match-document using any combination of these available fields:
- name
- sizeOnDisk
- empty
- shards
An example filtering by “name” equal to “tim”:
test1:PRIMARY> db.adminCommand({ listDatabases: 1, filter: { name: "tim" } })
{
  "databases" : [
    {
      "name" : "tim",
      "sizeOnDisk" : 8192,
      "empty" : false
    }
  ],
  "totalSize" : 8192,
  "ok" : 1,
  "operationTime" : Timestamp(1513100396, 1)
}
Here, I am filtering on ‘sizeOnDisk’ to find databases larger than 30,000 bytes:
test1:PRIMARY> db.adminCommand({ listDatabases: 1, filter: { sizeOnDisk: { $gt: 30000 } } })
{
  "databases" : [
    { "name" : "admin", "sizeOnDisk" : 32768, "empty" : false },
    { "name" : "local", "sizeOnDisk" : 233472, "empty" : false },
    { "name" : "test", "sizeOnDisk" : 32768, "empty" : false },
    { "name" : "tim", "sizeOnDisk" : 32768, "empty" : false }
  ],
  "totalSize" : 331776,
  "ok" : 1,
  "operationTime" : Timestamp(1513100566, 2)
}
This can be really useful for reducing the size of the ‘listDatabases’ result.
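The ‘nameOnly’ option mentioned above is used the same way; a quick sketch (output omitted here, it lists just the database names):

test1:PRIMARY> db.adminCommand({ listDatabases: 1, nameOnly: true })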
More on this here: https://docs.mongodb.com/manual/reference/command/listDatabases/#dbcmd.listDatabases
Arbiter priority: 0
MongoDB 3.6 changed the arbiter replica set priority to 0 (zero). As an arbiter’s priority is never considered in elections, this is a more correct value. You’ll notice your replica set configuration is automatically updated when upgrading to MongoDB 3.6.
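To confirm this after an upgrade, you can inspect the replica set configuration from the shell; a small sketch:

test1:PRIMARY> rs.conf().members.forEach(function(m) {
  // arbiters should now report priority 0
  if (m.arbiterOnly) printjson({ host: m.host, priority: m.priority });
})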
More on this change here: https://docs.mongodb.com/manual/tutorial/adjust-replica-set-member-priority/#considerations
More on MongoDB 3.6
There are many more changes in this release. It’s important to review these resources below before any upgrade. We always strongly recommend testing functionality in a non-production environment!
Check out David Murphy’s blog post on MongoDB 3.6 sessions.
Release Notes: https://docs.mongodb.com/manual/release-notes/3.6/
Compatibility Changes: https://docs.mongodb.com/manual/release-notes/3.6-compatibility/
Conclusion
It is really exciting to see that with each recent major release, the MongoDB project is (impressively) tackling both usability and new features, while significantly hardening existing functionality.
Give your deployment, developers and operations engineers the gift of these new features and optimizations this holiday season/new year! Best wishes in 2018!
This Week in Data with Colin Charles 20: cPanel changes strategy, Percona Live CFP extended
Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.
I think the biggest news from last week was from cPanel – if you haven’t already read the post, please do – on Being a Good Open Source Community Member: Why we hesitated on MySQL 5.7. cPanel had anticipated MariaDB being the eventual replacement for MySQL, based on moves by Red Hat, Wikipedia and Google. The perceived advantages were transparency around security disclosures, and the added features/improvements. Today though, “MySQL now consistently matches or outpaces MariaDB when it comes to development and releases, which in turn is increasing the demand on us for providing those upgraded versions of MySQL by our users.” And maybe a little more telling, “when MariaDB 10.2 became stable in May 2017 it included many features found in MySQL 5.7. However, MySQL reached stable nearly 18 months earlier in October 2015.” (emphasis mine).
So cPanel is going forth and supporting MySQL 5.7. They will continue supporting MariaDB Server for the foreseeable future. This really is cPanel ensuring they are responsive to users: “The people using and building database-driven applications are doing so with MySQL in mind, and are hesitant to add support for MariaDB. Responding to our community’s desires is one of the most important things to us, and this is something that we are hearing asked for from our community consistently.”
I, of course, think this is a great move. Users deserve choice. And MySQL has features that are sometimes still not included in MariaDB Server. Have you seen the Complete list of new features in MySQL 5.7? Or my high-level response to a MariaDB Corporation white paper?
I can only hope to see more people think pragmatically like cPanel. Ubuntu as a Linux distribution still does – you get MySQL 5.7 as a default (very unlike upstream Debian, which ships MariaDB Server nowadays). I used to be a proponent of MariaDB Server being everywhere, when it was community-developed, feature-enhanced, and backward-compatible. However, the moment it stopped being a branch and became a true fork is the moment where trouble began for users. I think it was still marginally fine with 10.0, and maybe even 10.1, but the ability to maintain feature parity with enhanced features has long gone. Short of a rebase? But then… what would be different from the already popular branch of MySQL called Percona Server for MySQL?
While there are wins and support from cloud vendors, like Amazon AWS RDS and Microsoft Azure, you’ll notice that they offer both MySQL and MariaDB Server. Google Cloud SQL notably only offers MySQL. IBM may be a sponsor of the MariaDB Foundation, but I don’t see their services like Compose offering anything other than MySQL (with group replication, no less!). Platinum member Alibaba Cloud offers MySQL and PostgreSQL. However, Tencent seems to suggest that MariaDB is coming soon? Naturally, one interesting statistic to watch would be user uptake.
Events
From an events standpoint, the Percona Live 2018 Call for Papers has been extended to January 12, 2018. We expect an early announcement of maybe ten talks in the week of January 5. Please submit to the CFP. Have you got your tickets yet? Nab them during our Percona Live 2018 super saver registration when they are the best price!
FOSDEM has got Sveta and myself speaking in the MySQL and Friends DevRoom, but we also have good news in the sense that Peter Zaitsev is also going to be at FOSDEM – speaking in the main track. We’ll also have plenty of schwag at the stand.
I think it’s important to take note of the updates to Percona bug tracking: yes, it’s Jira all the way. It would be good for everyone to also start looking at how the sausage is made.
Dgraph, a “distributed fast graph database”, just raised $3m and released 1.0. Have you used it?
On a lighter note, there seems to be a tweet going around by many, so I thought I’d share it here. Merry Christmas and Happy Holidays.
He’s making a database
He’s sorting it twice
SELECT * FROM girls_boys
WHERE behaviour
= “nice”
SQL Claus is coming to town!
Releases
- Percona Monitoring and Management 1.5.3
- ProxySQL 1.4.4 – some interesting features include bandwidth throttling, limit connections to backends, monitoring replication lag, and more.
- MariaDB Server 10.2 is in CentOS 6 and CentOS 7 via Software Collections.
Link List
- Good presentation on multi-user Presto usage.
- You’re probably dabbling with containers by now, so note that Google released container-diff, a tool for quickly comparing container images. Check it out on GitHub.
- James Governor (RedMonk), writes On AWS and Pivotal, opinions and overlaps.
- That time Larry Ellison allegedly tried to have a professor fired for benchmarking Oracle
- Apache Cassandra users, should you use incremental repair?
Upcoming appearances
- FOSDEM 2018 – Brussels, Belgium – February 3-4 2018
- SCALE16x – Pasadena, California, USA – March 8-11 2018
Feedback
I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.
Three P’s of a Successful Black Friday: Percona, Pepper Media Holding, and PMM
As we close out the holiday season, let’s look at some data that tells us how to guarantee a successful Black Friday (from a database perspective).
There are certain peak times of the year where companies worldwide hold their breath in the hope that their databases do not become overloaded or unresponsive. A large percentage of yearly profits are achieved in a matter of hours during peak events. It is critical that the database environment remains online and responsive. According to a recent survey, users will not wait more than 2.5 seconds for a site to load before navigating elsewhere. Percona has partnered with many clients over the years to ensure success during these critical events. Our goal is always to provide our clients with the most responsive, stable open-source database environments in order to meet their business needs.
First Stop: Germany
In this blog post, we are going to take a closer look at what happened during Black Friday for a high-demand, high-traffic, business-critical application. Pepper Media Holding runs global deals sites where users post and vote on top deals on products in real-time. To give you a better idea of what the user sees, there is a screenshot below from mydealz.de, Pepper Media Holding’s German branch.
As you can imagine, Black Friday results in a huge spike in traffic and user contribution. In order to ensure success during these crucial times, Pepper Media Holding utilizes Percona’s fully managed service offering. Percona’s Managed Services team has become an extension of Pepper Media Holding’s team by helping plan, prepare, and implement MySQL best-practices across their entire database environment.
Pepper Media Holding and Percona thought it would be interesting to reflect on Black Friday 2017 and how we worked together to flourish under huge spikes in query volume and user connections.
Below is a graph of MySQL query volume for Germany servers supporting the mydealz.de front-end. This graph is taken from Percona’s Managed Service Team’s installation of Percona Monitoring and Management (PMM), which they use to monitor Pepper Media’s environment.
As expected, MySQL query volume peaked shortly before and during midnight local time. It spiked again early in the morning as users were waking up, then traffic waned throughout the day. The most interesting data point is the spike from 5 AM to 9 AM, which saw an 800% increase from the post-midnight dip. The sustained two-day traffic surge was on average a 200% increase when compared to normal, day-to-day query traffic hitting the database.
For more statistics on how mydealz.de fared from a front-end and user perspective, visit Pepper Media Holding’s newsroom, where Pepper Media has given a breakdown of various statistics related to website traffic during Black Friday.
Next Stop: United Kingdom
Another popular Pepper Media Holding branch is in the United Kingdom – better known as HotUKDeals. HotUKDeals hosts user-aggregated and voted-on deals for UK users. This is the busiest Pepper Media Holding database environment on average. Below is a screenshot of the user interface.
The below graphs are from our Managed Service Team’s Percona Monitoring and Management installation and representative of the UK servers supporting the HotUKDeals website traffic.
The first graph we are taking a look at is MySQL Replication Delay. As you can see, the initial midnight wave of Black Friday deals caused a negligible replica delay. The Percona Monitoring and Management MySQL Replication Delay graph is based on seconds_behind_master, which is an integer value only. This means the delay was somewhere between 0 and 1 most of the time. Only once did it go between 1 and 2 over the entire course of the Black Friday traffic.
The graphs below highlight the MySQL traffic seen on the UK servers during the Black Friday spike. One interesting note is the gradual lead-up to the midnight Black Friday spike. It looks like Black Friday is overstepping its boundaries into Gray Thursday. The traffic spikes here mimic the ones we saw in Germany: an initial spike at midnight on Black Friday, and then another as shoppers wake up for their day. The UK servers saw a 361% spike in traffic on the morning of Black Friday.
MySQL connections also saw an expected and significant spike during this time. Neglecting to consider the max_connections system parameter during an event rush might result in “ERROR 1040 (08004): Too many connections.” However, our CEO, Peter Zaitsev, cautions against absent-mindedly setting this parameter at an unreachable level just to avoid this error. In a blog post, he explains best practices for this scenario.
The MySQL query graph below shows a 400% spike in MySQL queries during the peak Black Friday morning traffic rush. The average number of queries hitting the database over this two day period is significantly higher than normal – approximately 183%.
Conclusion
Percona reported no emergencies during the Black Friday period for its Managed Service customers – including Pepper Media Holding. We saw similarly high traffic spikes among our customers during this 2017 Black Friday season. I hope that this run-down of a few PMM graphs taken during Pepper Media Holding’s Black Friday traffic period was informative and interesting. Special thanks to Pepper Media Holding for working with us to create this blog post.
Note: Check out our Pepper Media case study on how Percona helps them manage their database environment.
If you would like to further explore the graphs and statistics that Percona Monitoring and Management has to offer, we have a live demo available at https://pmmdemo.percona.com. To discuss how Percona Managed Services can help your database thrive during event-based traffic spikes (and all year round), please call us at +1-888-316-9775 (USA), +44 203 608 6727 (Europe), or have us contact you.
Percona Live 2018 Call for Papers Deadline Extended to January 12, 2018
Percona is extending the Percona Live 2018 call for papers deadline to January 12, 2018!
Percona’s gift to you this holiday season is the gift of time – submit your speaking topics right up until January 12, 2018!
As the year winds down, we have received many requests to extend the Percona Live Open Source Database Conference 2018 call for papers. Since many speakers wanted to submit during the week they’re planning vacations (from Christmas until New Year’s Day), we realized that December 22 was too soon.
If you haven’t submitted already, please consider doing so. Speaking at Percona Live is a great way to talk about what you’re doing, build up your personal and company brands, and get collaborators to your project. If selected, all speakers receive a full complimentary conference pass.
Percona Live 2018 is the destination to share, learn and explore all pertinent topics related to open source databases. The theme for Percona Live 2018 is “Championing Open Source Databases,” with topics on MySQL, MongoDB and other open source databases, including time series databases, PostgreSQL and RocksDB. Session tracks include Developers, Operations, and Business/Case Studies.
Remember, just like last year, we aren’t looking for just MySQL-ecosystem–related talks (that includes MariaDB Server and Percona Server for MySQL). We are actively looking for talks around MongoDB, as well as other open source databases (so this is where you can add PostgreSQL, time series databases, graph databases, etc.). That also involves complementary technologies, such as the increasing importance of the cloud and container solutions such as Kubernetes.
Talk about your journey to open source. Describe the technical and business values of moving to or using open source databases. How did you convince your company to make the move? Was there tangible ROI? Share your case studies, best practices and technical knowledge with an engaged audience of open source peers.
We are looking for breakout sessions (25 or 50 minutes long), tutorials (3 hours or 6 hours long), and lightning talks and birds of a feather sessions. Submit as many topics as you think you can deliver well.
The conference itself features one day of tutorials and two days of talks. There will also be exciting keynote talks. Don’t forget that registration is now open, and our Super Saver tickets are the best price you can get (Super Saver tickets are on sale until January 7, 2018).
If your company is interested in sponsoring the conference, please take a look at the sponsorship prospectus.
All in all, submit away, and remember that the Percona Live 2018 call for papers deadline is January 12, 2018. We look forward to seeing you at the conference, April 23-25, 2018, in Santa Clara.
Updates to Percona Bug Tracking
We’re completing our move of Percona bug tracking into JIRA, and the drop-dead date is December 28, 2017.
For some time now, Percona has maintained both the legacy Launchpad bug tracking system and a JIRA bug tracking system for some of the newer products. The time has come to consolidate everything into the JIRA bug tracking system.
Assuming everything goes according to schedule, on December 28, 2017, we will copy all bug reports in Launchpad into the appropriate JIRA projects (with the appropriate issue state). Each new JIRA issue will link to the original Launchpad issue, and a link to the new JIRA issue will be added to the original Launchpad issue. Once this is done, we will turn off editing on the Launchpad projects.
Q&A
Which Launchpad projects are affected?
- https://launchpad.net/percona-server moves to https://jira.percona.com/projects/PS/issues
- https://launchpad.net/percona-xtradb-cluster moves to https://jira.percona.com/projects/PXC/issues
- https://launchpad.net/percona-xtrabackup moves to https://jira.percona.com/projects/PXB/issues
- https://launchpad.net/percona-toolkit moves to https://jira.percona.com/projects/PT/issues
Why are you copying all closed issues from Launchpad?
Copying all Launchpad issues to JIRA makes JIRA the one place to search for previously reported issues, instead of having to search for old issues in Launchpad and new issues in JIRA.
What should I do now to prepare?
Go to https://jira.percona.com/ and create an account.
Thanks for reporting bugs, and post any questions in the comments section.
Google Ventures invests in Ripcord’s paper digitizing robotics service
The $40 million Ripcord Series B we reported back in August has just been bumped up to $65 million, courtesy of additional funding led by Google Ventures. The added funding comes as the record-keeping company continues to grow at a healthy clip. The Bay Area-based service added 100 jobs this year, and is on track to add another 150 in 2018. The service came out of stealth in March of this… Read More
WeWork’s Powered By We product is central to 2018 growth strategy
WeWork had a big year in 2017. The seven-year-old company opened 90 new locations, doubling its global membership, and expanded into new cities in Latin America, Asia and Australia, and Europe and Israel. It is reportedly valued at $20 billion. While 2018 holds more of the same — WeWork plans to launch 1 million square feet of new space each month next year — the company also plans… Read More
SendBird raises another $16M to help developers add chat functions to a service
If you go to any company’s website these days, you’re probably starting to see some chat functionality more and more often — and for good reason, as it’s a quick way for those companies to get in touch with their potential customers. And SendBird, which launched in February this year out of Y Combinator, has tried to quietly begin eating up this space by giving… Read More