In this blog post, I will cover a few of the dump and data-loading utilities offered by MySQL Shell.
What is MySQL Shell?
It is a robust client and code editor for MySQL. In addition to its APIs for working with MySQL, MySQL Shell offers scripting capabilities for JavaScript and Python. It provides an AdminAPI to manage MySQL servers and create InnoDB Cluster and ReplicaSet deployments. It supports the X DevAPI to work with both SQL and NoSQL interfaces via the X Protocol. Finally, it provides utilities that make working with MySQL in JavaScript and Python modes easier.
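If you have never used MySQL Shell before, getting a feel for its modes takes only a minute. Below is a minimal sketch (prompts are abbreviated and the connection URI is just an example): connect with the mysqlsh client, then switch between SQL, Python, and JavaScript modes with the built-in \sql, \py, and \js commands.

shell> mysqlsh --uri root@localhost
MySQL JS > \sql
MySQL SQL > \py
MySQL Py > \js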
Now, let’s jump into some practical stuff and see what these tools offer us.
Dump Utility
The dump utility comprises three tools: util.dumpInstance(), util.dumpSchemas(), and util.dumpTables(). Let’s discuss them one by one.
1. “util.dumpInstance()” – This was introduced in MySQL Shell 8.0.21 and exports all databases excluding the information_schema, mysql, ndbinfo, performance_schema, and sys schemas. The backup can be stored locally or written to remote (cloud) storage directly from MySQL Shell.
Performing a complete instance dump with “util.dumpInstance()”:
a) Log in to the database via MySQL Shell.
shell> mysqlsh --uri root@localhost
b) Prior to running the actual backup, we will first perform a dry run to check for any potential problems or compatibility issues.
mysql shell > util.dumpInstance("/home/backup/", {dryRun: true})
Output:
dumping.
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Writing DDL - done
Starting data dump
0% (0 rows / ~1.60M rows), 0.00 rows/s, 0.00 B/s uncompressed, 0.00 B/s compressed
Since we didn’t get any issues or warnings here, we are good to proceed with the actual backup.
c) Running the full instance backup and saving it to the target location.
mysql shell > util.dumpInstance("/home/backup/")
Output:
dumping.
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
101% (1.63M rows / ~1.60M rows), 114.56K rows/s, 22.13 MB/s uncompressed, 10.09 MB/s compressed
Dump duration: 00:00:15s
Total duration: 00:00:15s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 20.01 MB/s
Average compressed throughput: 9.12 MB/s
We have successfully finished the backup. By default, the utility uses four threads and applies the zstd compression algorithm to the dump files.
d) Running the instance dump with more advanced options.
mysql shell > util.dumpInstance("/home/backup/", {threads: 8, maxRate: "100M", consistent: true, chunking: true, bytesPerChunk: "64M", compression: "zstd"})
Output:
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Running data dump using 8 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
101% (1.63M rows / ~1.60M rows), 468.47K rows/s, 84.22 MB/s uncompressed, 38.41 MB/s compressed
Dump duration: 00:00:03s
Total duration: 00:00:03s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 85.24 MB/s
Average compressed throughput: 38.84 MB/s
There are various options available to control and enhance the dump process.
“dryRun”: Lists the compatibility issues and what would be dumped, without actually dumping any data.
“consistent”: Provides a consistent backup by automatically acquiring a global read lock. Enabled by default.
“chunking”: Splits each table’s data into a number of files. Enabled by default.
“bytesPerChunk”: Sets the approximate number of bytes to be written to each chunk file.
“threads”: The number of parallel threads used to dump data from the MySQL instance. The default is 4.
“maxRate”: The maximum data read throughput per thread during the dump, in bytes per second.
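As a quick illustration of how these options can be combined, the sketch below throttles the same instance dump so it is gentler on a busy server; the thread count, rate, and chunk size shown are only example values, not recommendations.

mysql shell > util.dumpInstance("/home/backup/", {threads: 2, maxRate: "50M", bytesPerChunk: "128M", compression: "zstd"})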
2. “util.dumpSchemas()” – This was introduced in MySQL Shell 8.0.21 and supports the export of selected databases, either locally or remotely.
Performing a database dump with “util.dumpSchemas()”. Here, we will back up only a specific database (“sbtest”).
mysql shell > util.dumpSchemas(["sbtest"], "/home/backup/sbtest/", {threads: 2})
Output:
dumping.
Acquiring global read lock
Global read lock acquired
Initializing - done
1 schemas will be dumped and within them 16 tables, 0 views.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Running data dump using 2 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
104% (1.63M rows / ~1.56M rows), 424.56K rows/s, 81.96 MB/s uncompressed, 37.35 MB/s compressed
Dump duration: 00:00:03s
Total duration: 00:00:03s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 81.82 MB/s
Average compressed throughput: 37.28 MB/s
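Note that util.dumpSchemas() takes a list of schemas, so several databases can be exported in a single call. A minimal sketch, assuming a second schema named “employees” exists and using a placeholder target directory:

mysql shell > util.dumpSchemas(["sbtest", "employees"], "/home/backup/multi/", {threads: 4})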
3. “util.dumpTables()” – The table dump utility “util.dumpTables()” was introduced in MySQL Shell 8.0.22 and exports selected tables from a schema.
Performing a selective table dump. Here, we dump only the “sbtest2” table from the “sbtest” database.
mysql shell > util.dumpTables("sbtest", ["sbtest2"], "/home/backup/sbtest/", {threads: 4})
Output:
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
101% (101.89K rows / ~100.58K rows), 0.00 rows/s, 0.00 B/s uncompressed, 0.00 B/s compressed
Dump duration: 00:00:00s
Total duration: 00:00:00s
Schemas dumped: 1
Tables dumped: 1
Uncompressed data size: 19.61 MB
Compressed data size: 8.93 MB
Compression ratio: 2.2
Rows written: 101892
Bytes written: 8.93 MB
Average uncompressed throughput: 19.61 MB/s
Average compressed throughput: 8.93 MB/s
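The table list works the same way, so several tables from one schema can be dumped together. A minimal sketch, assuming the other sysbench tables (“sbtest1”, “sbtest3”) exist and using a placeholder target directory:

mysql shell > util.dumpTables("sbtest", ["sbtest1", "sbtest2", "sbtest3"], "/home/backup/tables/", {threads: 4})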
Data Load Utility
So far, we’ve seen how to take a dump using the various methods. We will now demonstrate how to restore them; a single load utility handles all of these dump types.
The utility “util.loadDump()” for loading dumps was introduced in MySQL Shell 8.0.21. It enables parallel loading of tables or table chunks during the data-loading process.
Load the dump files using util.loadDump().
a) Running a dry run to validate any compatibility issues or errors prior to the actual restoration.
mysql shell > util.loadDump("/home/backup/", {dryRun: true})
Output:
Loading DDL and Data from '/home/backup/' using 4 threads.
Opening dump...
dryRun enabled, no changes will be made.
Target is MySQL 5.7.38-41-57-log. Dump was produced from MySQL 5.7.38-41-57-log
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
Executing common postamble SQL
0% (0 bytes / 313.55 MB), 0.00 B/s, 16 / 16 tables done
Recreating indexes - done
No data loaded.
0 warnings were reported during the load.
Note: no data is loaded; the dry run only prints information about any warnings or errors that would occur during the restoration process.
b) Running the data restoration process.
mysql shell > util.loadDump("/home/backup/", {progressFile: "/home/backuplog.json", threads: 4, backgroundThreads: 4, maxBytesPerTransaction: "4096"})
Output:
Loading DDL and Data from '/home/backup/' using 4 threads.
Opening dump...
Target is MySQL 5.7.38-41-57-log. Dump was produced from MySQL 5.7.38-41-57-log
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
Executing common postamble SQL
100% (313.55 MB / 313.55 MB), 3.03 MB/s, 16 / 16 tables done
Recreating indexes - done
16 chunks (1.63M rows, 313.55 MB) for 16 tables in 1 schemas were loaded in 2 min 43 sec (avg throughput 1.92 MB/s)
0 warnings were reported during the load.
Here, we have successfully completed the restoration process.
There are various options available to control and enhance the data loading process.
“threads”: The number of parallel threads used to load data into the target MySQL instance. The default is 4.
“backgroundThreads”: The number of threads in the background thread pool used to fetch the contents of the dump files.
“maxBytesPerTransaction”: The maximum number of bytes that can be loaded from a data file with a single LOAD DATA statement.
“loadIndexes”: When set to false, secondary indexes are not built during the import; you must build them afterwards. This can come in handy if you are loading the DDL files and data files separately and want to modify the table structure after the DDL files have been loaded.
“deferTableIndexes”: Postpones the creation of secondary indexes until after each table’s data has been loaded, which may speed up the load. Setting it to “off” means all indexes are created while the table is loaded.
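For example, to postpone secondary index creation until each table’s data has been loaded, a restore could be sketched as below (the option values are illustrative); alternatively, loadIndexes: false would skip index creation entirely so that the indexes can be built manually later.

mysql shell > util.loadDump("/home/backup/", {threads: 4, deferTableIndexes: "all"})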
Taking backups on cloud storage (GCP and AWS S3)
MySQL Shell also provides the capability to store dumps remotely in S3-compatible cloud storage such as Amazon S3 and Google Cloud Storage. Here, we’ll walk through how to do it with Google Cloud Storage and with an Amazon S3 environment.
Let’s start by configuring it for Google Cloud Storage.
Prerequisites
- A storage bucket should exist in Google Cloud
- Access keys and secrets defined in a credentials file (“/home/credentials”)
[gcb]
aws_access_key_id=xxxxxxxx
aws_secret_access_key=xxxxxxxx
a) Running a full backup and storing it in the Google Cloud Storage bucket “ajpxctest”.
mysql shell> util.dumpInstance("sbtest",{s3BucketName: "ajpxctest",s3EndpointOverride: "https://storage.googleapis.com", s3CredentialsFile:"/home/credentials", s3Profile: "gcb", threads: 4})
Output:
dumping.
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
104% (1.63M rows / ~1.56M rows), 172.12K rows/s, 34.60 MB/s uncompressed, 15.75 MB/s compressed
Dump duration: 00:00:12s
Total duration: 00:00:14s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 25.21 MB/s
Average compressed throughput: 11.49 MB/s
b) Then, using the same details, we will restore the data from the cloud bucket into the local database.
mysql shell> util.loadDump("sbtest",{s3BucketName: "ajpxctest", s3EndpointOverride: "https://storage.googleapis.com",s3CredentialsFile:"/home/credentials", s3Profile: "gcb", threads: 4})
Output:
Loading DDL and Data from AWS S3 bucket=ajpxctest, prefix='sbtest' using 4 threads.
Opening dump...
Target is MySQL 5.7.38-41-57-log. Dump was produced from MySQL 5.7.38-41-57-log
Fetching dump data from remote location...
Listing files - done
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
1 thds loading / 100% (313.55 MB / 313.55 MB), 6.23 MB/s, 14 / 16 tables done
Recreating indexes - done
Executing common postamble SQL
16 chunks (1.63M rows, 313.55 MB) for 16 tables in 1 schemas were loaded in 1 min 8 sec (avg throughput 7.41 MB/s)
0 warnings were reported during the load.
S3 options:
“s3BucketName”: The name of the S3 (or S3-compatible) bucket where the dump is stored.
“s3EndpointOverride”: The endpoint URL to use instead of the default AWS S3 endpoint (here, the Google Cloud Storage endpoint).
“s3CredentialsFile”: The path to the credentials file containing the access key and secret.
“s3Profile”: The name of the profile within the credentials file to use.
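These S3 options work with the other dump utilities as well, so a single schema could be sent to the same bucket. A minimal sketch, where the output prefix “sbtest-only” is just an example name:

mysql shell> util.dumpSchemas(["sbtest"], "sbtest-only", {s3BucketName: "ajpxctest", s3EndpointOverride: "https://storage.googleapis.com", s3CredentialsFile: "/home/credentials", s3Profile: "gcb", threads: 4})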
Let’s configure the same for Amazon S3.
Prerequisites
- An AWS S3 bucket should exist
- The AWS CLI should be configured
###~/.aws/credentials###
[default]
aws_access_key_id=xxxxxxx
aws_secret_access_key=xxxxxxx
###~/.aws/config###
[default]
region = us-east-1
a) Now, let’s perform the instance backup, which stores the dump files in the “ajtestbkp” S3 bucket.
mysql shell> util.dumpInstance("sbtest", {s3BucketName: "ajtestbkp", threads: 4})
Output:
Acquiring global read lock
Global read lock acquired
Initializing - done
1 out of 5 schemas will be dumped and within them 16 tables, 0 views.
3 out of 5 users will be dumped.
Gathering information - done
All transactions have been started
Global read lock has been released
Writing global DDL files
Writing users DDL
Running data dump using 4 threads.
NOTE: Progress information uses estimated values and may not be accurate.
Writing schema metadata - done
Writing DDL - done
Writing table metadata - done
Starting data dump
101% (1.63M rows / ~1.61M rows), 287.53K rows/s, 53.81 MB/s uncompressed, 24.51 MB/s compressed
Dump duration: 00:00:06s
Total duration: 00:00:07s
Schemas dumped: 1
Tables dumped: 16
Uncompressed data size: 313.55 MB
Compressed data size: 142.87 MB
Compression ratio: 2.2
Rows written: 1629368
Bytes written: 142.87 MB
Average uncompressed throughput: 49.93 MB/s
Average compressed throughput: 22.75 MB/s
b) Now, let’s restore the dump from S3 into our local database.
mysql shell> util.loadDump("sbtest",{s3BucketName: "ajtestbkp", threads: 4})
Output:
Loading DDL and Data from AWS S3 bucket=ajtestbkp, prefix='sbtest' using 4 threads.
Opening dump...
Target is MySQL 5.7.38-41-57-log. Dump was produced from MySQL 5.7.38-41-57-log
Fetching dump data from remote location...
Listing files - done
Scanning metadata - done
Checking for pre-existing objects...
Executing common preamble SQL
Executing DDL - done
Executing view DDL - done
Starting data load
1 thds loading / 100% (313.55 MB / 313.55 MB), 8.80 MB/s, 15 / 16 tables done
Recreating indexes - done
Executing common postamble SQL
16 chunks (1.63M rows, 313.55 MB) for 16 tables in 1 schemas were loaded in 38 sec (avg throughput 9.14 MB/s)
0 warnings were reported during the load.
Summary
These utilities are a great addition to MySQL Shell and are very useful for backing up and restoring large tables, thanks to the various control options. We now have better control over the dump/load process by setting options such as threads, maxRate, and maxBytesPerTransaction. The dump utilities support MySQL 5.6, 5.7, and 8.0 as the source; for data loading, however, the target database should be either MySQL 5.7 or 8.0.