Nov 19, 2020

Amazon S3 Storage Lens gives IT visibility into complex S3 usage

As your S3 storage requirements grow, it gets harder to understand exactly what you have, and that is especially true when your storage crosses multiple regions. This has broad implications for administrators, who have been forced to build their own solutions to get that missing visibility. AWS changed that this week when it announced a new product called Amazon S3 Storage Lens, a way to understand highly complex S3 storage environments.

The tool provides analytics that help you understand what’s happening across your S3 object storage installations, and to take action when needed. As the company described the new service in a blog post: “This is the first cloud storage analytics solution to give you organization-wide visibility into object storage, with point-in-time metrics and trend lines as well as actionable recommendations.”

Amazon S3 Storage Lens console (Image Credits: Amazon)

The idea is to present a set of 29 metrics in a dashboard that help you “discover anomalies, identify cost efficiencies and apply data protection best practices,” according to the company. IT administrators get a view of their entire storage landscape and can drill down into specific instances when necessary, such as when a problem requires attention. The product comes with a default dashboard out of the box, but admins can also create their own customized dashboards and even export S3 Storage Lens data to other Amazon tools.

For companies with complex storage requirements, meaning thousands or even tens of thousands of S3 storage instances, that have had to kludge together their own ways to understand what’s happening across those systems, this provides a single view across it all.

S3 Storage Lens is now available in all AWS regions, according to the company.

Sep 19, 2019

Quilt Data launches from stealth with free portal to access petabytes of public data

Quilt Data’s founders, Kevin Moore and Aneesh Karve, have been hard at work for the last four years building a platform to search for data quickly across vast repositories on AWS S3 storage. The idea is to give data scientists a way to find data in S3 buckets, then package that data in forms that a business can use. Today, the company launched out of stealth with a free data search portal that not only proves what they can do, but also provides valuable access to 3.7 petabytes of public data across 23 S3 repositories.

The public data repository includes publicly available Amazon review data along with satellite images and other high-value public information. The product works like any search engine, where you enter a query, but instead of searching the web or an enterprise repository, it finds the results in S3 storage on AWS.

The results include not only the data you are looking for, but also the information surrounding the data, such as Jupyter notebooks, the standard workspace that data scientists use to build machine learning models. Data scientists can then use these as the basis for building their own models.

The public data, which includes more than 10 billion objects, is a resource that data scientists should greatly appreciate, but Quilt Data is offering access to it out of more than pure altruism. It’s doing so because it wants to show what the platform is capable of, and in the process it hopes to get companies to use the commercial version of the product.

Quilt Data search results with data about the data found (Image: Quilt Data)

Customers can try Quilt Data for free or subscribe to the product in the Amazon Marketplace. The company charges a flat rate of $550 per month for each S3 bucket. It also offers an enterprise version, with priority support, custom features, and education and onboarding, for $999 per month for each S3 bucket.

The company was founded in 2015 and was a member of the Y Combinator Summer 2017 cohort. It has received $4.2 million in seed money so far from Y Combinator, Vertex Ventures, Fuel Capital and Streamlined Ventures, along with other unnamed investors.

Nov 7, 2017

New tools could help prevent Amazon S3 data leaks

If you search for Amazon S3 breaches caused by customer error, such as leaving data unencrypted, you’ll see a long list that includes a DoD contractor, Verizon (the owner of this publication) and Accenture, among the more high-profile examples. Today, AWS announced a new set of five tools designed to protect customers from themselves and ensure (to the extent possible) that the data in S3… Read More

Sep 19, 2017

Minio scores $20 million Series A to build a neutral object storage layer

Minio has a plan to become the neutral object storage layer, while still maintaining Amazon S3 object storage compatibility. That may seem like an odd strategy, but as Anand Babu Periasamy, co-founder and CEO of Minio, points out, there is a clear market need.
By building a solution that enables customers to store data across a variety of solutions including S3, he believes he is giving… Read More

Mar 1, 2017

The day Amazon S3 storage stood still

Jeff Bezos, CEO of Amazon.

By now you’ve probably heard that Amazon’s S3 storage service went down in its Northern Virginia datacenter for the better part of four hours yesterday, and took parts of a bunch of prominent websites and services with it. It’s worth noting that as of this morning, the Amazon dashboard was showing everything operating normally. While yesterday’s outage was a big deal… Read More

Nov 22, 2016

AWS drops its storage prices and launches new cold storage retrieval options

Perito Moreno Glacier

Amazon Web Services (AWS) today announced a significant price drop for some of its storage services. It also launched a few new features for developers who want to use its Glacier cold storage service. The new prices that most developers will likely care about are those for S3, AWS’ main cloud storage service. Instead of six pricing tiers, S3 will now use three: 0-50 TB;… Read More

Mar 13, 2013

MySQL Backup tools used by Percona Remote DBA for MySQL

As part of the Percona Remote DBA for MySQL service, we recognize that reliable backups are one of the most important things we can bring to the table. In my experience handling emergencies, the single worst thing that can happen is finding out you don’t have backups available when some sort of data loss or catastrophic event occurs.

With our Remote DBA service we can take care of backups for you; what follows are some of the internals of our implementation.

What kind of outages can happen?

  • Someone runs UPDATE or DELETE and forgets the WHERE clause, or the filters weren’t quite right
  • The application had a bug causing data to be removed or overwritten
  • A table (or entire schema) was dropped accidentally
  • Your InnoDB table was corrupt and MySQL shut down
  • Your server or RAID controller crashes and all data is lost on that server
  • A disk failed, and the RAID array does not recover
  • You run into an InnoDB corruption bug that propagates via replication (not common, but it does happen)
  • You lose your entire SAN and all your DB servers were located there. Let’s hope your backups are somewhere else!
  • You lose a PSU or network switch in your datacenter and some or all of your servers go down in that location
  • Your entire datacenter loses power and the generators do not start, which happens more often than you might think

What tools do we use in Remote DBA?

We have these major components:
  1. Percona XtraBackup for MySQL for binary backups
  2. mydumper for logical backups
  3. mysqlbinlog 5.6
  4. Amazon S3
  5. monitoring for all the above

Philosophy on backups

  • It is a good idea to schedule both logical and binary backups. They each have their use cases and add redundancy to your backups. If there is an issue with one backup tool, it is unlikely to affect the other (a scheduling sketch follows this list).
  • Store your backups on more than one server.
  • In addition to local copies, store backups offsite. Look at the cost of S3 or S3+Glacier; it’s worth the peace of mind!
  • Test your backups, and if you have a test environment, load them there periodically. You can also spin up an EC2 instance to load your backups onto. In addition, rolling forward 24 hours of binlogs against a restored backup is a good test.
  • Store your binlogs off your primary server so you can perform point in time recovery.
  • Store your binlogs offsite for disaster recovery scenarios.
  • Run pt-table-checksum periodically (e.g. once a month) to make sure your servers’ data stays consistent. Checksumming is important, as backups are typically pulled off a slave and it is vital that the slave has the same data as the master.
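
To make the first point concrete, here is a minimal scheduling sketch. The cron entries, wrapper script names, paths and times are hypothetical placeholders, not our actual configuration; the point is simply that the binary and logical backups run on independent schedules.

  # /etc/cron.d/mysql-backups (hypothetical)
  # Nightly binary backup with Percona XtraBackup, weekly logical backup with mydumper.
  30 1 * * *   root  /usr/local/bin/run-xtrabackup.sh  >> /var/log/backup-binary.log  2>&1
  30 3 * * 0   root  /usr/local/bin/run-mydumper.sh    >> /var/log/backup-logical.log 2>&1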

How do we use these components to give our customers reliable backups?

Think about the 10 example outages listed above. Each tool has its strong points given the conditions.

Percona XtraBackup for MySQL for binary backups.

Strong Points:
  • It can restore an entire server very fast. Often the limiting factor in how fast a backup can be restored to another server is how fast you can transfer data over your network. If you have a 1Gb network and 1TB of data, it can take a while.
  • It can compress the DB on the fly
  • It can back up a server at approximately the maximum rate the server allows, given its I/O system
  • It can typically execute a backup with little to no major impact on the server. For example, in XtraBackup 2.0.5+, the time taken for “FLUSH TABLES WITH READ LOCK” is normally under 1 second.
Tips/Tricks:
  • If you have a lot of non-transactional tables (i.e. MyISAM), use the --rsync option. This rsyncs a copy of all the .frm files and all the .MYD/.MYI files, then does a second rsync while under a global lock. This means that where you might have been locked for hours when you had many non-transactional tables, you can now be locked for under a second. Even with InnoDB-only databases this can greatly cut down on the lock time by pre-syncing the .frm files.
  • Enable --slave-info when backing up from a slave so you know your position in the master’s binary logs
  • The --compress option compresses on the fly, using qpress under the hood (a command-line sketch follows this list).
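
A minimal sketch of how these options fit together on the command line, assuming the innobackupex wrapper that shipped with XtraBackup in this era; the user, password variable and backup directory are placeholders:

  # Binary backup taken from a slave: pre-sync .frm/MyISAM files to shorten the
  # global lock (--rsync), record the master's binlog position (--slave-info),
  # and compress on the fly with qpress (--compress).
  innobackupex --rsync --slave-info --compress \
      --user=backup --password="$BACKUP_PW" /backups/xtrabackup/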
When do we typically use xtrabackup restores:
  • Setting up new slaves
  • When we lose an entire server due to hardware failure, corruption, etc
  • When the majority of data on the server was lost, e.g. there is one primary schema and that schema was dropped. Basically, whenever restoring a binary backup would take less time than loading a logical backup.

Restoring your data from backup is another topic. Piecing together data after accidental data loss is one of Percona’s specialties, and there are many different techniques depending on the scenario. I will go through some of these in detail in a future blog post.

Mydumper for logical backups

Strong Points:
  • Very fast for logical backups – compared to mysqldump
  • Consistent backups between MyISAM and InnoDB tables. The global read lock is only held until the MyISAM tables are dumped.
    • We are researching how we could further improve lock times here when non-transactional tables are used
  • Almost no locking, if you are not using MyISAM tables
  • Built in compression
  • Each table is dumped to a separate file. This is very important to make restoring single tables easy. You can quickly restore a single table, instead of restoring your entire backup just to find a tiny table you need. This is actually the most common type of restore needed, so it’s important to make this operation as painless as possible.
  • Compressed mydumper backups are typically 3x-5x smaller than compressed XtraBackup backups
  • We typically upload mydumper backups to S3 rather than XtraBackup backups, given the time needed to upload/download. It depends on the available bandwidth, though, and should be factored into your restore time.

Problems:

  • You can’t rely on mydumper to dump schemas. It does not handle views/triggers/procedures, etc. Run it with --no-schemas; use mysqldump for the schemas instead and rely on mydumper for data only.
  • You will have to compile it yourself as binary packages aren’t distributed
  • Be careful with importing a dump from a server running in a different timezone. We have a fix here.
Details on how we dump schemas (a sketch of this loop follows the list):
  • loop through each DB
    • write out ALTER DATABASE DEFAULT CHARACTER SET <charset> to the schema file, putting in the current charset
    • mysqldump … -d -R --skip-triggers, out to the schema file
    • create a schema-post file that has the triggers: mysqldump … -d -t
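A rough sketch of that loop in shell, keeping the mysqldump flags from the list above. The system-schema filter, file names, and the assumption that connection options come from ~/.my.cnf are ours for illustration, not part of the original procedure:

  # Dump structure, views and routines per database (no triggers), plus a
  # schema-post file containing only the triggers.
  for db in $(mysql -N -e "SHOW DATABASES" | grep -vE '^(information_schema|performance_schema|mysql)$'); do
      charset=$(mysql -N -e "SELECT DEFAULT_CHARACTER_SET_NAME
                             FROM information_schema.SCHEMATA
                             WHERE SCHEMA_NAME='$db'")
      echo "CREATE DATABASE IF NOT EXISTS \`$db\`;" >> schema.sql      # our addition, so the file loads standalone
      echo "ALTER DATABASE \`$db\` DEFAULT CHARACTER SET $charset;" >> schema.sql
      echo "USE \`$db\`;" >> schema.sql
      mysqldump -d -R --skip-triggers "$db" >> schema.sql              # tables, views, routines
      echo "USE \`$db\`;" >> schema-post.sql
      mysqldump -d -t "$db" >> schema-post.sql                         # triggers only
  done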
How to restore mydumper data (a sketch follows the list):
  • Load the schema file
  • Run myloader --threads=x
  • Load the schema-post file
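A minimal sketch of those three steps; the file names, backup directory and thread count are placeholders, and --overwrite-tables is our addition so the restore can be re-run:

  mysql < schema.sql                               # 1. schemas, views, routines
  myloader --directory=/backups/mydumper/latest \
           --threads=4 --overwrite-tables          # 2. data, loaded in parallel
  mysql < schema-post.sql                          # 3. triggers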
I will get into specifics on the tips/tricks to restore data in a future blog post.
Tips/Tricks:
  • Run with --kill-long-queries to avoid nasty problems with “FLUSH TABLES WITH READ LOCK”
  • --compress compresses each table file and should typically be enabled by default. The time needed to uncompress is not a limiting factor on restore time when done inline (a full command sketch follows this list).
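
Putting the mydumper options from this section together, a data-only backup might look like the following sketch; the thread count and output directory are placeholders, and option names are as spelled in the mydumper versions of this era:

  # Data-only logical backup: kill long-running queries that would block
  # FLUSH TABLES WITH READ LOCK, compress each table file, and skip schemas
  # (those come from mysqldump as described above).
  mydumper --no-schemas --compress --kill-long-queries \
           --threads=4 --outputdir=/backups/mydumper/$(date +%F)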
When do we typically use mydumper restores:
  • Restoring a single file
  • Restoring a single schema or rolling forward a single schema to a point in time
  • Restoring data while automatically replicating out to all slaves

mysqlbinlog 5.6

Last year, Percona IT director Tamas Kozak had a great blog post that showed how mysqlbinlog in 5.6 could be used. With mysqlbinlog 5.6, you can now pull binary logs in real time to another server using “mysqlbinlog … --read-from-remote-server --raw --stop-never”.

  • Useful to mirror the binlogs on the master to a second server.
  • Allows you to roll forward backups even after losing the master
  • Very useful for disaster recovery.
  • You can have your backups in S3 and mysqlbinlog --stop-never running on a small EC2 instance. This allows for a very low-cost disaster recovery plan that ensures you will not lose data even in the worst-case scenarios.
  • It takes very few resources to run, can run almost anywhere with disk space, and writes out binlog files sequentially.
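For example, a concrete invocation might look like the sketch below; the host, user and starting binlog file are placeholders, and the command is run from the directory where the raw binlog copies should be written:

  mysqlbinlog --read-from-remote-server --host=master.example.com \
              --user=repl --password --raw --stop-never mysql-bin.000001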
Tips/Tricks (how we run this; a small wrapper sketch follows this list):
  • Ensure it stays running, restart it if it appears to be hanging
  • Verify the file is the same on master and slave
  • Re-transfer files that are partially transferred
  • Compress the files after successful transfer
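
A minimal wrapper along the lines of the first and last tips; $BINLOG_CMD stands for the mysqlbinlog invocation shown earlier, and the restart delay and file patterns are placeholders (verification and re-transfer of partial files are not shown):

  # Keep the binlog stream running, reconnecting if it exits.
  while true; do
      $BINLOG_CMD
      sleep 30    # back off briefly, then reconnect
  done &

  # From cron: compress finished binlogs, leaving the newest (still-growing) file alone.
  ls mysql-bin.[0-9]* | grep -v '\.gz$' | sort | head -n -1 | xargs -r gzip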

Amazon S3 for MySQL

I discuss S3 here, but other cloud-based storage can be used as well; S3 is just the most popular option in this category and is in wide use.
Details:
  • s3cmd: we have been using the version from GitHub, mostly for multi-part upload support. This prevents us from having to split files up before uploading to S3.
  • There is a released alpha version of it here
  • You can now set bucket lifecycle properties so data over X days is archived to Glacier and data over Y days is removed. This is a very convenient feature and allows you to cost-effectively store long-term backups with little additional work (see the sketch after this list).
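As a sketch of such a lifecycle rule, here is one way to express “archive after X days, delete after Y days” using the AWS CLI rather than s3cmd (a swapped-in tool, shown purely for illustration); the bucket name, prefix and day counts are placeholders:

  # Contents of lifecycle.json (placeholder rule: Glacier after 30 days, delete after 365 days):
  {
    "Rules": [
      {
        "ID": "archive-then-expire-backups",
        "Filter": { "Prefix": "backups/" },
        "Status": "Enabled",
        "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ],
        "Expiration": { "Days": 365 }
      }
    ]
  }

  # Apply the rule to the bucket:
  aws s3api put-bucket-lifecycle-configuration \
      --bucket my-backup-bucket --lifecycle-configuration file://lifecycle.json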
Tips/Tricks:
  • --add-header=x-amz-server-side-encryption:AES256 uses the server-side encryption feature, which helps with some types of compliance. We also have the capability to encrypt all files with GPG prior to upload via a separate script
  • Set use_https = True, especially if your data is not encrypted before transfer (an upload example follows this list)
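
As an illustration of both tips, an upload might look like the following sketch; the file name, bucket and path are placeholders:

  # Upload with S3 server-side encryption requested via a custom header.
  s3cmd put --add-header=x-amz-server-side-encryption:AES256 \
      backup-2013-03-13.tar.gz s3://my-backup-bucket/xtrabackup/

  # And in ~/.s3cfg, force TLS for all transfers:
  #   use_https = True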

Monitoring

  • Monitoring is the most important piece that ties all of these processes together. We employ NSCA Nagios alerts for all of the backup processes (a minimal send_nsca sketch follows this list).
  • freshness_threshold should be set so that if your NSCA check hasn’t reported in within a certain period, it will alert you. For example, if you back up once a day, a good threshold could be 36 hours.
  • For our mysqlbinlog processes, we send NSCA check-ins every 30 seconds and alert when nothing has been received for anywhere from 15 minutes to 1 hour
  • If backups throw an error and are aborted, we send a critical alert immediately to be investigated
  • The number one cause of backup alerts is problems with “FLUSH TABLES WITH READ LOCK”, namely when a SELECT blocks the flush from completing and queues all requests behind it. Our current solution is a guardian process that runs during a backup and kills any query that stalls the flush. We are also researching other ways to improve this in the future.
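
A minimal sketch of the passive check-in after a successful backup run; the host name, service name, Nagios server and config path are placeholders (send_nsca expects tab-separated host, service, return code and plugin output):

  printf "db1\tmysql-backup\t0\txtrabackup finished OK\n" \
      | send_nsca -H nagios.example.com -c /etc/nagios/send_nsca.cfg

  # On the Nagios side, check_freshness is enabled for the service and
  # freshness_threshold is set to 129600 seconds (36 hours) per the tip above.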

Other details on Percona Remote DBA for MySQL backup systems for future posts

  • Detailed strategies for different types of restores
  • Strategies on retention: dailies, weeklies, and long-term backups
  • Decompressing Percona XtraBackup for MySQL backups in parallel, using all of your available resources
  • Downloading from S3 in parallel
  • Parallel encryption/decryption
  • Hardlinking of backups. Since both our mydumper and XtraBackup backups are separated by file, files that don’t change can easily be hardlinked, typically saving 20-80% of space locally

