The tech preview version of the pg_tde extension for PostgreSQL 16 was released on March 28th, with many improvements compared to our previous MVP release. In addition to lots of bug fixes, the new release supports multi-tenancy and key rotation with a simplified configuration mechanism. Note that the Tech Preview release is not recommended for […]
19
2023
Using the Transparent Data Encryption Extension PG_TDE With PostgreSQL
PG_TDE (PostgreSQL Transparent Data Encryption) is an experimental transparent data encryption access method for PostgreSQL 16, available on GitHub as an extension for vanilla PostgreSQL. Since our previous blog post, Adding Transparent Data Encryption to PostgreSQL with pg_tde: Please Test, it received new features such as TOAST data encryption and HashiCorp Vault support for key storage. Today, we’d like to share more details about the extension, including information about its internal architecture.
Deploying PostgreSQL Transparent Data Encryption
PG_TDE is available as an easy-to-use Docker image with everything set up and ready to be used, Ubuntu 22.04 deb package, or as source code on github. The simplest way to run it for testing is to use the docker container:
docker run --name pg-tde -e POSTGRES_PASSWORD=mysecretpassword -d perconalab/postgres-tde-ext
This command starts a container with the extension loaded and already enabled for the Postgres database. Using this, we can create two tables, one encrypted and one using the default access method:
docker run -it --rm postgres psql -h 172.17.0.2 -U postgres CREATE TABLE test_enc (id SERIAL, t VARCHAR(32), PRIMARY KEY(id)) USING pg_tde; INSERT INTO test_enc (t) VALUES ('foobar'), ('barfoo'); CREATE TABLE test_basic (id SERIAL, t VARCHAR(32), PRIMARY KEY(id)); INSERT INTO test_basic (t) VALUES ('foobar'), ('barfoo'); SELECT * FROM test_enc; id | t ----+-------- 1 | fooba 2 | barfoo ----+-------- 1 | foobar 2 | barfoo (2 rows) SELECT to_regclass('test_enc')::oid; -- 16397 SELECT to_regclass('test_basic')::oid; -- 16404
Then, we can use the docker cp command to retrieve and check the data files from the container:
mkdir /tmp/tdedata docker cp pg-tde:/var/lib/postgresql /tmp/tdedata ls -l /tmp/tdedata/postgresql/data/base/5/16404* -rw------- 1 pg pg 8192 dec 11 21:49 /tmp/tdedata/postgresql/data/base/5/16404 ls -l /tmp/tdedata/postgresql/data/base/5/16397* -rw------- 1 pg pg 8192 dec 11 21:49 /tmp/tdedata/postgresql/data/base/5/16397 -rw------- 1 pg pg 304 dec 11 21:47 /tmp/tdedata/postgresql/data/base/5/16397.tde strings /tmp/tdedata/postgresql/data/base/5/16404 barfoo foobar strings /tmp/tdedata/postgresql/data/base/5/16397 # no output
The above commands show us that:
- The strings “foobar” and “barfoo”, which are rows in the table, are clearly visible in the unencrypted table but can’t be found as-is in the encrypted table.
- Compared to the basic unencrypted table, the encrypted table has one additional fork ending with the “.tde” extension, which contains encryption metadata.
How data is encrypted
Going one step further, we can run hexdiff on the two datafiles and look for differences:
hexdiff /tmp/tdedata/postgresql/data/base/5/16397 /tmp/tdedata/postgresql/data/base/5/16404 Visuel HexDiff v 0.0.53 by tTh 2007 dec 7bits 8000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8016 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8032 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8048 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8064 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8096 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8112 ed 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8128 02 00 02 00 02 09 18 00 3f 72 99 0c ad 72 c4 96 ?r r 8144 af 2d 1c 00 00 00 00 00 ed 02 00 00 00 00 00 00 - 8160 00 00 00 00 00 00 00 00 01 00 02 00 02 09 18 00 8176 ae 7b da 47 b6 4c 69 e6 0a e7 cd 00 00 00 00 00 { G Li 8192 8208 8224 ** /tmp/tdedata/postgresql/data/base/5/16397 8192 8000 97% 8000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8016 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8032 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8048 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8064 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8096 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8112 ef 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8128 02 00 02 00 02 08 18 00 02 00 00 00 0f 62 61 72 bar 8144 66 6f 6f 00 00 00 00 00 ef 02 00 00 00 00 00 00 foo 8160 00 00 00 00 00 00 00 00 01 00 02 00 02 08 18 00 8176 01 00 00 00 0f 66 6f 6f 62 61 72 00 00 00 00 00 foobar 8192 8208 8224 /tmp/tdedata/postgresql/data/base/5/16404 8192 8000 97%
This command shows that the data files have only two differences: the page checksum is different, and later parts of the tuples are different too – shown in the above output.
But it’s also important to note that not everything in the tuple structure is different: only the actual tuple data gets encrypted, and the tuple headers remain unencrypted. As these fields do not contain sensitive data, this is not a security issue and helps with the performance and simplicity of the extension. Our PG_TDE implementation is based on the original heap engine of PostgreSQL, with as few modifications as possible to encrypt the contents of the tables.
For this reason, compared to the commonly used page-level encryption of other database servers, where entire pages are encrypted and decrypted during disk IO, our approach decrypts/encrypts tuples (table rows) when they are accessed or written.
Database keys and encryption algorithms
To encrypt or decrypt any row or part of any row within a table, the encryption in PG_TDE needs the ability to work on any length of data at any offset without changing anything before or after the specific row. To accomplish this, PG_TDE uses the AES-CTR encryption algorithm. To ensure that the extension never reuses the same (key, offset) pairs, every table has its own encryption key. This, with some additional information, is what gets stored in the .tde file:
hexdump -C /tmp/tdedata/postgresql/data/base/5/16397.tde 00000000 6d 61 73 74 65 72 2d 6b 65 79 00 00 00 00 00 00 |master-key......| 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000100 01 00 00 00 00 00 00 00 54 af ea f2 21 f0 80 78 |........T...!..x| 00000110 69 f7 bc e5 19 b7 5f 60 00 00 00 00 00 00 00 00 |i....._`........| 00000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000130
The above output shows that the TDE file itself is encrypted using a different encryption key, named “master-key”, and then contains the encryption information for the table itself, encrypted using the previously mentioned key. This key is stored in an external keyring, possibly at a separate location, to increase security. Pg_tde currently supports a local file-based key storage intended only for development and storage using Hashicorp Vault. In the future, we plan to add support for other services, too, for example, the KMIP protocol.
Providing feedback on PG_TDE
Percona wants to make PG_TDE your choice of TDE encryption, and for that, we need as many people testing and providing feedback as possible.
Follow the directions above or on the Github repository. Please let us know what you like and what you dislike about PG_TDE, and let Percona know about any issues you discover. Tell us what additional tooling about PG_TDE you would like to have, as well as any other feedback.
This is open source software, and the old adage about having many eyeballs on the code to ensure its quality is applicable here, as Percona wants your input.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.