Changing the data type of a column in a big table is almost always a pain. It can block all reads and writes on the table because of the ACCESS EXCLUSIVE lock acquired by the ALTER TABLE statement that changes the data type. In this blog post, we will look at one way of performing such an operation with minimal impact and minimal locking. It works for any data type; let’s look at an Integer to Big Integer change as an example.
Example use case
Let’s say we have a table with many columns. One of them is of type Integer (INT, 4 bytes), and its values have been growing incrementally over time. After a while, we start getting ERROR: integer out of range messages in the log, which means the value we are trying to insert is bigger than the integer limit (max value 2147483647). The easy choice would be to change its type to Big Integer (BIGINT, 8 bytes). Here is how we can accomplish this almost “online.”
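To make the failure mode concrete, this is the kind of statement that triggers the error; a minimal illustration, with hypothetical column names standing in for the real table:

-- order_id is a 4-byte integer, so anything above 2147483647 is rejected
INSERT INTO big_table (id, order_id) VALUES (1, 2147483648);
-- ERROR:  integer out of range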
First, we will need to create a new column with the new data type. The new column will be created empty, so it should be done in milliseconds.
ALTER TABLE big_table ADD COLUMN order_id_tmp bigint;
To ensure it does not block other sessions for too long while waiting for its lock, we can write a small SQL script that times the command out if it takes too long to complete:
cat <<EOF > alter.sql
SET statement_timeout = 100;
ALTER TABLE big_table ADD COLUMN order_id_tmp bigint;
EOF

while true
do
    date
    psql -qX -v ON_ERROR_STOP=1 -f alter.sql && break
    sleep 1
done
statement_timeout will terminate the ALTER TABLE command if it runs for more than 100 milliseconds, and because ON_ERROR_STOP was added to the psql parameters, the psql process will then exit with an error. Therefore, “break” will not be executed after a timeout: the && operator requires the exit code of the first command to be 0 (success) before executing the next one, so the loop simply retries until the ALTER TABLE succeeds.
Next, we will have to create a function and a trigger that will copy the value of order_id from newly inserted and updated rows to order_id_tmp.
CREATE FUNCTION order_id_bigint() RETURNS TRIGGER AS
$BODY$
BEGIN
    NEW.order_id_tmp = NEW.order_id;
    RETURN NEW;
END
$BODY$ LANGUAGE PLPGSQL;

CREATE TRIGGER order_id_update_or_insert
    BEFORE INSERT OR UPDATE ON big_table
    FOR EACH ROW EXECUTE PROCEDURE order_id_bigint();
And we are ready to populate the new column with the data. For bigger tables, we can create a support table to help track progress. It should contain the primary key of big_table (in this case, the “id” column) for every row where the new bigint column is still empty.
CREATE TABLE public.temp_order_id_bigint (
    id bigint NOT NULL
);

CREATE INDEX id_sorted ON public.temp_order_id_bigint USING btree (id ASC);

INSERT INTO public.temp_order_id_bigint
SELECT id FROM big_table
WHERE order_id IS NOT NULL AND order_id_tmp IS NULL;
We will populate the data in chunks, as Postgres does not cope well with massive updates because of how the MVCC (Multi-Version Concurrency Control) mechanism and autovacuum work. A single UPDATE would create a new version of every tuple at once, causing huge bloat and potentially doubling the table in size.
START TRANSACTION;

UPDATE public.big_table
SET order_id_tmp = order_id
WHERE id IN (
    SELECT id FROM public.temp_order_id_bigint
    ORDER BY id ASC
    LIMIT 5000
);

DELETE FROM public.temp_order_id_bigint
WHERE id IN (
    SELECT id FROM public.temp_order_id_bigint
    ORDER BY id ASC
    LIMIT 5000
);

COMMIT;
Run the above transaction in a loop until the support table, temp_order_id_bigint, is empty.
Use the LIMIT that works best for you; depending on row size, you may achieve better results with lower or higher values. In my test case, 5000 was optimal, and each batch of 5,000 rows finished in less than 300 ms.
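Here is one way to drive that from the shell, in the same spirit as the earlier alter.sql loop. This is only a sketch: the batch.sql file (containing the transaction above) and the emptiness check are my own additions, not part of the original procedure.

# Save the transaction above as batch.sql, then run it
# until the support table has been drained.
while true
do
    rows=$(psql -qXAt -c "SELECT count(*) FROM public.temp_order_id_bigint")
    [ "$rows" -eq 0 ] && break
    psql -qX -v ON_ERROR_STOP=1 -f batch.sql || break
done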
Once the data is populated, we are ready to perform the column switchover. We can use the same while-loop approach as before so that the lock acquisition does not wait and block other sessions for too long.
BEGIN;

LOCK TABLE big_table IN SHARE ROW EXCLUSIVE MODE;

DROP TRIGGER order_id_update_or_insert ON big_table;

ALTER TABLE big_table RENAME COLUMN order_id TO order_id_old;
ALTER TABLE big_table RENAME COLUMN order_id_tmp TO order_id;

COMMIT;
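Wrapped in the retry loop, the switchover could look like the sketch below; switch.sql is a hypothetical file name, and the statement_timeout value is the same 100 ms used earlier:

cat <<EOF > switch.sql
-- Give up after 100 ms instead of blocking other sessions;
-- a timeout aborts the transaction, and the loop retries.
SET statement_timeout = 100;
BEGIN;
LOCK TABLE big_table IN SHARE ROW EXCLUSIVE MODE;
DROP TRIGGER order_id_update_or_insert ON big_table;
ALTER TABLE big_table RENAME COLUMN order_id TO order_id_old;
ALTER TABLE big_table RENAME COLUMN order_id_tmp TO order_id;
COMMIT;
EOF

while true
do
    date
    psql -qX -v ON_ERROR_STOP=1 -f switch.sql && break
    sleep 1
done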
Et voilà, we just changed the column from Integer to Big Integer with less than one second of locking. The whole operation takes much longer than simply altering the data type of the existing column because of all the preparation work, data population, and so on, but we avoided the downtime or maintenance window that the standard approach would require.
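As an optional final cleanup, once the new column has been verified, the leftover objects can be removed. This step is my addition rather than part of the procedure above, and dropping the old column is irreversible, so double-check first:

-- The trigger was already dropped during the switchover
DROP FUNCTION order_id_bigint();
DROP TABLE public.temp_order_id_bigint;
ALTER TABLE big_table DROP COLUMN order_id_old;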
Enjoy!