Percona Toolkit 3.6.0 has been released on June 12, 2024. The most important updates in this version are: The possibility to resume pt-online-schema-change if it is interrupted. eu-stack support in pt-pmp that significantly improves this tool’s performance and decreases the load it causes on production servers. New tool pt-eustack-resolver Packages for Ubuntu 24.04 (Noble Numbat) […]
12
2017
Gh-ost benchmark against pt-online-schema-change performance
In this blog post, I will run a gh-ost benchmark against the performance of pt-online-schema-change.
When gh-ost came out, I was very excited. As MySQL ROW replication became commonplace, you could use it to track changes instead of triggers. This practice is cleaner and safer compared to Percona Toolkit’s pt-online-schema-change. Since gh-ost doesn’t need triggers, I assumed it would generate lower overhead and work faster. I frequently called it “pt-online-schema-change on steroids” in my talks. Finally, I’ve found some time to check my theoretical claims with some benchmarks.
DISCLAIMER: These benchmarks correspond to one specific ALTER TABLE on the table of one specific structure and hardware configuration. I have not set up a broad set of tests. If you have other results – please comment!
Benchmark Setup Details
- pt-online-schema-change from Percona Toolkit 3.0.3
- gh-ost 1.0.36
- Percona Server 5.7.18 on Ubuntu 16.04 LTS
- Hardware: 28CPU cores/56 Threads. 128GB Memory. Samsung 960 Pro 512GB
- Sysbench 1.0.7
Prepare the table by running:
sysbench --threads=40 --rate=0 --report-interval=1 --percentile=99 --events=0 --time=0 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 prepare
The table size is about 3GB (completely fitting to innodb_buffer_pool).
Run the benchmark in “full ACID” mode with:
- sync_binlog=1
- innodb_flush_log_at_trx_commit=1
- innodb_doublewrite=1
This is important as this workload is heavily commit-bound, and extensively relies on group commit.
This is the pt-online-schema-change command to alter table:
time pt-online-schema-change --execute --alter "ADD COLUMN c1 INT" D=sbtest,t=sbtest1
This the gh-ost command to alter table:
time ./gh-ost --user="sbtest" --password="sbtest" --host=localhost --allow-on-master --database="sbtest" --table="sbtest1" --alter="ADD COLUMN c1 INT" --execute
Tests Details
For each test the old sysbench table was dropped and a new one prepared. I tested alter table in three different cases:
- When nothing else was running (“Idle Load”)
- When the system handled about 2% of load it can handle at full capacity (“Light Background Load”)
- When the system handled about 40% of the possible load, with sysbench injected about 25% of the transactions/sec the system could handle at full load (“Heavy Background Load”)
I measured the alter table completion times for all cases, as well as the overhead generated by the alter (in other words, how much peak throughput is reduced by running alter table through the tools).
Idle Load
For the Idle Load test, pt-online-schema-change completed nearly twice as fast as gh-ost. This was a big surprise for me. I haven’t looked into the reasons or details yet, though I can see most of the CPU usage for gh-ost is on the MySQL server side. Perhaps the differences relate to the SQL used to perform non-blocking alter tables.
Light Background Load
I generated the Light Background Load by running the sysbench command below. It corresponds to a roughly 4% load, as the system can handle some 2500 transactions/sec at this concurrency under full load. Adjust the
--rate
value to scale it for your system.
time sysbench --threads=40 --rate=100 --report-interval=1 --percentile=99 --events=0 --time=0 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 run
The numbers changed (as expected), but pt-online-schema-change is still approximately twice as fast as gh-ost.
What is really interesting in this case is how a relatively light background load affects the process completion time. It took both pt-online-schema-change and gh-ost about 2.7x times longer to finish!
Heavy Background Load
I generated the Heavy Background Load running the sysbench command below. It corresponds to a roughly 40% load, as the system can handle some 2500 transactions/sec at this concurrency under full load. Adjust
--rate
value to scale it for your system.
time sysbench --threads=40 --rate=1000 --report-interval=1 --percentile=99 --events=0 --time=0 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 run
What happened in this case? When the load gets higher, gh-ost can’t keep up with binary log processing, and just never finishes at all. While this may be surprising at first, it makes sense if you think more about how these tools work. pt-online-schema-change uses triggers, and while they have a lot of limitations and overhead they can execute in parallel. gh-ost, on the other hand, processes the binary log in a single thread and might not be able to keep up.
In MySQL 5.6 we didn’t have parallel replication, which applies writes to the same table in parallel. For that version the gh-ost limitation probably isn’t as big a deal, as such a heavy load would also cause replication lag. MySQL 5.7 has parallel replication. This makes it much easier to quickly replicate workloads that are too heavy for gh-ost to handle.
I should note that the workload being simulated in this benchmark is a rather extreme case. The table being altered by gh-ost here is at the same time handling a background load so high it can’t be replicated in a single thread.
Future versions of gh-ost could improve this issue by applying binlog events in parallel, similar to what MySQL replicas do.
An excerpt from the gh-ost log shows how it is totally backed up trying to apply the binary log:
root@rocky:/tmp# time ./gh-ost --user="sbtest" --password="sbtest" --host=localhost --allow-on-master --database="sbtest" --table="sbtest1" --alter="ADD COLUMN c1 INT" --execute 2017/06/25 19:16:05 binlogsyncer.go:75: [info] create BinlogSyncer with config &{99999 mysql localhost 3306 sbtest sbtest false false <nil>} 2017/06/25 19:16:05 binlogsyncer.go:241: [info] begin to sync binlog from position (rocky-bin.000018, 640881773) 2017/06/25 19:16:05 binlogsyncer.go:134: [info] register slave for master server localhost:3306 2017/06/25 19:16:05 binlogsyncer.go:568: [info] rotate to (rocky-bin.000018, 640881773) 2017-06-25 19:16:05 ERROR parsing time "" as "2006-01-02T15:04:05.999999999Z07:00": cannot parse "" as "2006" # Migrating `sbtest`.`sbtest1`; Ghost table is `sbtest`.`_sbtest1_gho` # Migrating rocky:3306; inspecting rocky:3306; executing on rocky # Migration started at Sun Jun 25 19:16:05 -0400 2017 # chunk-size: 1000; max-lag-millis: 1500ms; max-load: ; critical-load: ; nice-ratio: 0.000000 # throttle-additional-flag-file: /tmp/gh-ost.throttle # Serving on unix socket: /tmp/gh-ost.sbtest.sbtest1.sock Copy: 0/9872432 0.0%; Applied: 0; Backlog: 0/100; Time: 0s(total), 0s(copy); streamer: rocky-bin.000018:641578191; State: migrating; ETA: N/A Copy: 0/9872432 0.0%; Applied: 0; Backlog: 100/100; Time: 1s(total), 1s(copy); streamer: rocky-bin.000018:641626699; State: migrating; ETA: N/A Copy: 0/9872432 0.0%; Applied: 640; Backlog: 100/100; Time: 2s(total), 2s(copy); streamer: rocky-bin.000018:641896215; State: migrating; ETA: N/A Copy: 0/9872432 0.0%; Applied: 1310; Backlog: 100/100; Time: 3s(total), 3s(copy); streamer: rocky-bin.000018:642178659; State: migrating; ETA: N/A Copy: 0/9872432 0.0%; Applied: 1920; Backlog: 100/100; Time: 4s(total), 4s(copy); streamer: rocky-bin.000018:642436043; State: migrating; ETA: N/A Copy: 0/9872432 0.0%; Applied: 2600; Backlog: 100/100; Time: 5s(total), 5s(copy); streamer: rocky-bin.000018:642722777; State: ... Copy: 0/9872432 0.0%; Applied: 120240; Backlog: 100/100; Time: 3m0s(total), 3m0s(copy); streamer: rocky-bin.000018:694142377; State: migrating; ETA: N/A Copy: 0/9872432 0.0%; Applied: 140330; Backlog: 100/100; Time: 3m30s(total), 3m30s(copy); streamer: rocky-bin.000018:702948219; State: migrating; ETA: N/A Copy: 0/9872432 0.0%; Applied: 160450; Backlog: 100/100; Time: 4m0s(total), 4m0s(copy); streamer: rocky-bin.000018:711775662; State: migrating; ETA: N/A Copy: 0/9872432 0.0%; Applied: 180600; Backlog: 100/100; Time: 4m30s(total), 4m30s(copy); streamer: rocky-bin.000018:720626338; State: migrating; ETA: N/A Copy: 0/9872432 0.0%; Applied: 200770; Backlog: 100/100; Time: 5m0s(total), 5m0s(copy); streamer: rocky-bin.000018:729509960; State: migrating; ETA: N/A
Online Schema Change Performance Impact
For this test I started the alter table, waited 60 seconds and then ran sysbench at full speed for five minutes. Then I measured how much the performance was impacted by running the tool:
sysbench --threads=40 --rate=0 --report-interval=1 --percentile=99 --events=0 --time=300 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 run
As we can see, gh-ost has negligible overhead in this case. pt-online-schema-change on the other hand, had peformance reduced by 12%. It is worth noting though that pt-online-schema-change still makes progress in this case (though slowly), while gh-ost would never complete.
If anything, I was surprised at how little impact the pt-online-schema-change run had on sysbench performance.
It’s important to note that in this case we only measured the overhead for the “copy” stage of the online schema change. Another thing you should worry about is the impact to performance during “table rotation” (which I have not measured).
Summary
While gh-ost introduces a number of design advantages, and gives better results in some situation, I wouldn’t call it always superior the tried and true pt-online-schema-change. At least in some cases, pt-online-schema-change offers better performance than gh-ost and completes a schema change when gh-ost is unable to keep up. Consider trying out both tools and see what works best in your situation.