Aug 10, 2021

PostgreSQL PL/Java – A How-To, Part 2: How to Manipulate and Return Tuples


In the first part of this series, we discussed how to install PL/Java and create a simple class; there we ran a SELECT and returned a single row with one column of formatted text. Now it's time to expand on that and see how to return multiple tuples.
A little disclaimer here: I'm not going to comment much on the Java code because this is not intended to be a Java tutorial. The examples are for educational purposes only and are not intended to be high performance or production-ready!

Returning a Table Structure

This first example will show how we can select and return a table from a PL/Java function. We’ll keep using the table “customer” here and the probing SQL will be:
SELECT * FROM customer LIMIT 10;

I will create a new Java class here called “CustomerResultSet” and the initial code is:
package com.percona.blog.pljava;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.postgresql.pljava.ResultSetHandle;

public class CustomerResultSet implements ResultSetHandle {
	private Connection conn;
	private PreparedStatement stmt;
	private final String m_url = "jdbc:default:connection";
	private final String sql = "SELECT * FROM customer LIMIT 10";

	public CustomerResultSet() throws SQLException {
		conn = DriverManager.getConnection(m_url);
		stmt = conn.prepareStatement(sql);
	}

	@Override
	public void close() throws SQLException {
		stmt.close();
		conn.close();
	}

	@Override
	public ResultSet getResultSet() throws SQLException {
		return stmt.executeQuery();
	}

	public static ResultSetHandle getCustomerLimit10() throws SQLException {
		return new CustomerResultSet();
	}
}

Note that we are implementing the org.postgresql.pljava.ResultSetHandle interface provided by PL/Java. We need it because we are returning a complex object, and the ResultSetHandle interface is appropriate when we don't need to manipulate the returned tuples.

Now that we are using PL/Java objects, we need to tell the compiler where to find those references, and for this first example we need the pljava-api JAR, which in my case is pljava-api-1.6.2.jar. If you remember from the first post, I compiled PL/Java myself, so my API JAR file is located at "~/pljava-1_6_2/pljava-api/target/pljava-api-1.6.2.jar" and the compilation commands will be:

javac -cp "$HOME/pljava-1_6_2/pljava-api/target/pljava-api-1.6.2.jar" com/percona/blog/pljava/CustomerResultSet.java
jar -c -f /app/pg12/lib/pljavaPart2.jar com/percona/blog/pljava/CustomerResultSet.class

With my new JAR file created, I can then install it into Postgres and create the function “getCustomerLimit10()“:

SELECT sqlj.install_jar( 'file:///app/pg12/lib/pljavaPart2.jar', 'pljavaPart2', true );
SELECT sqlj.set_classpath( 'public', 'pljavaPart2' );
CREATE OR REPLACE FUNCTION getCustomerLimit10() RETURNS SETOF customer AS 'com.percona.blog.pljava.CustomerResultSet.getCustomerLimit10' LANGUAGE java;

The result of the function call is:
test=# SELECT * FROM getCustomerLimit10();
 customer_id | store_id | first_name | last_name |                email                | address_id | activebool | create_date |     last_update     | active 
-------------+----------+------------+-----------+-------------------------------------+------------+------------+-------------+---------------------+--------
           1 |        1 | MARY       | SMITH     | MARY.SMITH@sakilacustomer.org       |          5 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           2 |        1 | PATRICIA   | JOHNSON   | PATRICIA.JOHNSON@sakilacustomer.org |          6 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           3 |        1 | LINDA      | WILLIAMS  | LINDA.WILLIAMS@sakilacustomer.org   |          7 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           4 |        2 | BARBARA    | JONES     | BARBARA.JONES@sakilacustomer.org    |          8 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           5 |        1 | ELIZABETH  | BROWN     | ELIZABETH.BROWN@sakilacustomer.org  |          9 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           6 |        2 | JENNIFER   | DAVIS     | JENNIFER.DAVIS@sakilacustomer.org   |         10 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           7 |        1 | MARIA      | MILLER    | MARIA.MILLER@sakilacustomer.org     |         11 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           8 |        2 | SUSAN      | WILSON    | SUSAN.WILSON@sakilacustomer.org     |         12 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           9 |        2 | MARGARET   | MOORE     | MARGARET.MOORE@sakilacustomer.org   |         13 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
          10 |        1 | DOROTHY    | TAYLOR    | DOROTHY.TAYLOR@sakilacustomer.org   |         14 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
(10 rows)

test=#

Manipulating the Result Before Returning

Returning the result of a plain SQL statement has its uses, such as visibility/permission control, but we usually need to manipulate the results of a query before returning them, and to do that we can implement the "org.postgresql.pljava.ResultSetProvider" interface.
I will implement a simple method to anonymize sensitive data with a hash function in the following example. I'll also create a helper class to deal with the hash and cryptographic functions to keep the CustomerHash class clean:
/**
 * Crypto helper class that will contain all hashing and cryptographic functions
 */
package com.percona.blog.pljava;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Crypto {
	MessageDigest digest;
	
	public Crypto() throws NoSuchAlgorithmException {
		digest = MessageDigest.getInstance("SHA-256");
	}
	
	public String bytesToHex(byte[] hash) {
		StringBuilder hexString = new StringBuilder(2 * hash.length);
		for (int i = 0; i < hash.length; i++) {
			String hex = Integer.toHexString(0xff & hash[i]);
			if (hex.length() == 1) {
				hexString.append('0');
			}
			hexString.append(hex);
		}
		return hexString.toString();
	}
	
	public String encode(String data, int min, int max) {
		double salt = Math.random();
		int sbstring = (int) ((Math.random() * ((max - min) + 1)) + min);

		return bytesToHex(digest.digest((data + salt).getBytes(StandardCharsets.UTF_8))).substring(0, sbstring);
	}
}

/**
 * CustomerHash class
 */
package com.percona.blog.pljava;

import java.security.NoSuchAlgorithmException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.postgresql.pljava.ResultSetProvider;

public class CustomerHash implements ResultSetProvider {
	private final Connection conn;
	private final PreparedStatement stmt;
	private final ResultSet rs;
	private final Crypto crypto;
	
	private final String m_url = "jdbc:default:connection";

	public CustomerHash(int id) throws SQLException, NoSuchAlgorithmException {
		String query;
		
		crypto = new Crypto();
		query = "SELECT * FROM customer WHERE customer_id = ?";
		conn = DriverManager.getConnection(m_url);
		stmt = conn.prepareStatement(query);
		stmt.setInt(1, id);
		rs = stmt.executeQuery();
	}
	
	@Override
	public void close() throws SQLException {
		stmt.close();
		conn.close();
	}
	
	@Override
	public boolean assignRowValues(ResultSet receiver, int currentRow) throws SQLException {		
		if (!rs.next())
			return false;
		
		try {
			receiver.updateInt(1, rs.getInt("customer_id"));
			receiver.updateInt(2, rs.getInt("store_id"));
			receiver.updateString(3, crypto.encode(rs.getString("first_name"), 5, 45));
			receiver.updateString(4, crypto.encode(rs.getString("last_name"), 5, 45));
			receiver.updateString(5, crypto.encode(rs.getString("email"), 5, 41) + "@mail.com");
			receiver.updateInt(6, rs.getInt("address_id"));
			receiver.updateBoolean(7, rs.getBoolean("activebool"));
			receiver.updateDate(8, rs.getDate("create_date"));
			receiver.updateTimestamp(9, rs.getTimestamp("last_update"));
			receiver.updateInt(10, rs.getInt("active"));
			
		} catch (Exception e) {
			Logger.getAnonymousLogger().log(Level.SEVERE, e.getMessage());
		}
		return true;
	}
	
	public static ResultSetProvider getCustomerAnonymized(int id) throws SQLException, NoSuchAlgorithmException {
		return new CustomerHash(id);
	}

}

The number of classes is increasing, so instead of naming them one by one, let's just use the "*.java" wildcard to compile the classes and "*.class" to create the JAR:
javac -cp "$HOME/pljava-1_6_2/pljava-api/target/pljava-api-1.6.2.jar" com/percona/blog/pljava/*.java
jar -c -f /app/pg12/lib/pljavaPart2.jar com/percona/blog/pljava/*.class

Remember that every time we change our JAR file we need to also reload it into Postgres. Check the next example and you’ll see that I’m reloading the JAR file, creating and testing our new function/method:
test=# SELECT sqlj.replace_jar( 'file:///app/pg12/lib/pljavaPart2.jar', 'pljavaPart2', true );
 replace_jar 
-------------
 
(1 row)

test=# CREATE OR REPLACE FUNCTION getCustomerAnonymized(int) RETURNS SETOF customer AS 'com.percona.blog.pljava.CustomerHash.getCustomerAnonymized' LANGUAGE java;
CREATE FUNCTION

test=# SELECT * FROM getCustomerAnonymized(9);
 customer_id | store_id |     first_name      |              last_name              |                  email                  | address_id | activebool | create_date |     last_update     | ac
tive 
-------------+----------+---------------------+-------------------------------------+-----------------------------------------+------------+------------+-------------+---------------------+---
-----
           9 |        2 | 72e2616ef0075e81929 | 3559c00ee546ae0062460c8faa4f24960f1 | 24854ed40ed42b57f077cb1cfaf916@mail.com |         13 | t          | 2006-02-14  | 2006-02-15 09:57:20 |   
   1
(1 row)

test=#

Great! We now have a method to anonymize data!

Triggers

The last topic of this second part will be about "triggers", and to make it a bit more interesting we will create a trigger to encrypt the sensitive data in our table. The anonymization using the hash function in the previous example is great, but what happens if someone gains unauthorized access to the database? The data is stored in plain text!
To keep this example as small as possible, I won't bother with securing the keys; we will do that in part three of this series, when we'll use Java to access external resources and use Vault to secure our keys, so stay tuned!
Ok, the first thing we need to do is create the key pair we need to encrypt/decrypt our data. I'll use "OpenSSL" to create the keys and store them in a table named "keys"!
openssl genrsa -out keypair.pem 2048
openssl pkcs8 -topk8 -nocrypt -in keypair.pem -outform PEM -out private.pem
openssl rsa -in keypair.pem -outform PEM -pubout -out public.pem

Now that we have the keys, we need to sanitize the key files, removing the header and footer from both the private and public keys as well as all line breaks, or else our Java code will complain:
echo -n "CREATE TABLE keys(id int primary key, priv varchar not null, pub varchar not null); INSERT INTO keys VALUES(1, '" > keys.sql
sed '1d;$d' private.pem | tr -d '\n' >> keys.sql
echo -n "', '" >> keys.sql
sed '1d;$d' public.pem | tr -d '\n' >> keys.sql
echo -n "');" >> keys.sql

psql test < keys.sql

It will look like this when sanitized:
CREATE TABLE keys(id int primary key, priv varchar not null, pub varchar not null); INSERT INTO keys VALUES(1, 'MIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQCiAA4BE64JZpXwIGfsUanyL//drIcFZ1cmiCW6zWOxc6nL8AQ33MPyQup8g/ociJFGn/eEEYOvRMV2pVNo3qB3VQU4WHRWkq22x7mRfuhHmAnAJA3dic5fiJ1aCQgo7tEqlGPc0WqL+jMUXh6Wmktq1kDZagUGJRorw0f5Iaj60PycbGtgKaKDc4VHepkN1jl0rhpJBzjBehuvB88LLXJ/cHsMOp3q569jLsHtqymCA2wP68ldtfKtOowPW9togIUmgWY0Z2lWlefrlzmT2g3L/oYbPUxCmptOAMFD8NajdA518ohZAC8SPfUsD4CwL89oPrMZlX4RkTuc5UvBHiKrAgMBAAECggEAcJl5ImZ7YS1cqjrcAPYCGcQjJAD3GFpryOx4zQ5VbNHoA0ggpnNb/tdkBIf3ID4MO/qUH8fMr9YtKfpfr1SOVGNT7YYN1t68v36zDN4YtSqIHHTy7jkKqHxcYmhEs67K072wa5tjY0fUmSOSPzufj/K7wGJge5TuS9y/+fnbafkdfW/yz3X2YXL6T/jfjqI4h+P7Nhh5hlpD1KZfEWTAY5B5tBoLc4xaTIB8FTLclVWw3CAW8h60EwUAkyxCSbrP2I1FCrWsV6hJGy8U+hUQJUpyDdum9ZC1oAVewRrCkSH0ZP1XaQifDZoRv/1N7cCbQqaLJaVk4rzVOEv0DoCEAQKBgQDOMPMm2ioEredCx0hfmHWWayGA5as7VPDSzv1QH3g4AdjZFf2YxctXGNJNMpfqVvFiQCWxp9NpjUPpODRbmR2J+7tYgx3B445zDeXdBH2JTKhUgNTHjL6DrM6FTI3yaSsSJ77L0mDcFQ42nfWtfqkZd5lYfwiVC0eL86bp408+YQKBgQDJIks6RqVlDbHerIIqT1ToN+CQ+BCjx/Z6sk4BFIKBB8sU8VyVkZlvQpFHvT06oE/1NbyiQ3nVufGrm0kwMqx7MXGiA670E1Q+Q/mQ12uRByUlgd+LW4mp1Y6tln1lpP5pVqUOC/jtnXYQmEReU4Ye24E4AZhFU23J+oYoh3XEiwKBgEJFaWFrbWXjnxjPhGt1TRXziOks6ERBoMWg0boW40TdEx1y+/dGW3y69ZzqTfl7yEmT5ImdL04VoWYsMmfeZqgayLRCMCZJRVeld+P5tX+Tq+a9Iaahjfo0aIxfdqAbPUSwkZphG9Cg09iqHHSO6TrOPfM7oT6GSZCp11QFQ0sBAoGAeABi+8D8mx8hmWY5Pv8X/HiiHjwyyVTbpPbO/Wv8NPmuW69per9k2PHRdgjdCCZvrjBCfFlfznljS+yZLQ1+xP2J+4zRDESgBYpO0vED94JY0lj7Q8z4hICq4Lyh0kwvki+kyI2yFirVLy950wFoSu7R2NVywSH2pgQ3mOTBCeMCgYBL5KIRf1qwsCYaCggPls4pWKMjfxxO915h26/aaniEYaTNnhXRSRwkVOWoGHoUKfrqQdrvj/y5lgezn7mZM0CvnB6ZkGwDXxpcIYUnhR1Lnp3HNSqfigg+WjQASVCKuq3YUri3p+KQkrpED/O3B4FJW2Q4IReEuREEsKNkeH96ew==', 'MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAogAOAROuCWaV8CBn7FGp8i//3ayHBWdXJoglus1jsXOpy/AEN9zD8kLqfIP6HIiRRp/3hBGDr0TFdqVTaN6gd1UFOFh0VpKttse5kX7oR5gJwCQN3YnOX4idWgkIKO7RKpRj3NFqi/ozFF4elppLatZA2WoFBiUaK8NH+SGo+tD8nGxrYCmig3OFR3qZDdY5dK4aSQc4wXobrwfPCy1yf3B7DDqd6uevYy7B7aspggNsD+vJXbXyrTqMD1vbaICFJoFmNGdpVpXn65c5k9oNy/6GGz1MQpqbTgDBQ/DWo3QOdfKIWQAvEj31LA+AsC/PaD6zGZV+EZE7nOVLwR4iqwIDAQAB');

Once the table is populated, we have both the private and public keys stored. Now it's time to create our Java classes. I'll reuse the "Crypto" class for the cryptographic functions and create a new class to hold our trigger functions. I'll only show the relevant part of the Crypto class here, but you can find all the code described in this post on my GitHub page[1], including Part One (and Part Three when it's released). Let's get to the code:
/**
 * This is the relevant part of the Crypto class that will encrypt and decrypt our data using the certificates we generated above.
 */
	public static PublicKey getPublicKey(String base64PublicKey) {
		PublicKey publicKey = null;
		try {
			X509EncodedKeySpec keySpec = new X509EncodedKeySpec(Base64.getDecoder().decode(base64PublicKey.getBytes()));
			KeyFactory keyFactory = KeyFactory.getInstance("RSA");
			publicKey = keyFactory.generatePublic(keySpec);
			return publicKey;
		} catch (NoSuchAlgorithmException e) {
			e.printStackTrace();
		} catch (InvalidKeySpecException e) {
			e.printStackTrace();
		}
		return publicKey;
	}

	public static PrivateKey getPrivateKey(String base64PrivateKey) {
		PrivateKey privateKey = null;
		PKCS8EncodedKeySpec keySpec = new PKCS8EncodedKeySpec(Base64.getDecoder().decode(base64PrivateKey.getBytes()));
		KeyFactory keyFactory = null;
		try {
			keyFactory = KeyFactory.getInstance("RSA");
		} catch (NoSuchAlgorithmException e) {
			e.printStackTrace();
		}
		try {
			privateKey = keyFactory.generatePrivate(keySpec);
		} catch (InvalidKeySpecException e) {
			e.printStackTrace();
		}
		return privateKey;
	}

	public static String encrypt(String data, PublicKey publicKey) throws BadPaddingException, IllegalBlockSizeException,
			InvalidKeyException, NoSuchPaddingException, NoSuchAlgorithmException {
		Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
		cipher.init(Cipher.ENCRYPT_MODE, publicKey);
		return Base64.getEncoder().encodeToString(cipher.doFinal(data.getBytes()));
	}
	
	public static String decrypt(String data, PrivateKey privateKey) throws NoSuchPaddingException,
			NoSuchAlgorithmException, InvalidKeyException, BadPaddingException, IllegalBlockSizeException {
		Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
		cipher.init(Cipher.DECRYPT_MODE, privateKey);
		return new String(cipher.doFinal(Base64.getDecoder().decode(data)));
	}

Now we can implement the class with both functions – the trigger function to encrypt and a function to decrypt when we need to SELECT the data:
package com.percona.blog.pljava;

import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.crypto.BadPaddingException;
import javax.crypto.IllegalBlockSizeException;
import javax.crypto.NoSuchPaddingException;

import org.postgresql.pljava.ResultSetProvider;
import org.postgresql.pljava.TriggerData;

public class CustomerCrypto implements ResultSetProvider {
	private final String m_url = "jdbc:default:connection";
	private final Connection conn;
	private PreparedStatement stmt;
	private ResultSet rs;

	//
	private PrivateKey privateKey;
	private PublicKey publicKey;

	public CustomerCrypto() throws SQLException, NoSuchAlgorithmException {
		String query;

		query = "SELECT * FROM keys WHERE id = 1";
		conn = DriverManager.getConnection(m_url);
		stmt = conn.prepareStatement(query);
		rs = stmt.executeQuery();
		if (!rs.next())
			throw new SQLException("Keys not found!");

		privateKey = Crypto.getPrivateKey(rs.getString("priv"));
		publicKey = Crypto.getPublicKey(rs.getString("pub"));
	}
	
	public void processQuery(int id) throws SQLException, NoSuchAlgorithmException {
		String query;
		query = "SELECT * FROM customer WHERE customer_id = ?";
		stmt = conn.prepareStatement(query);
		stmt.setInt(1, id);
		rs = stmt.executeQuery();
	}

	@Override
	public void close() throws SQLException {
		stmt.close();
		conn.close();
	}

	public static int getLineNumber() {
		return Thread.currentThread().getStackTrace()[2].getLineNumber();
	}

	@Override
	public boolean assignRowValues(ResultSet receiver, int currentRow) throws SQLException {
		if (!rs.next())
			return false;

		try {
			receiver.updateInt(1, rs.getInt("customer_id"));
			receiver.updateInt(2, rs.getInt("store_id"));
			receiver.updateString(3, Crypto.decrypt(rs.getString("first_name"), this.privateKey));
			receiver.updateString(4, Crypto.decrypt(rs.getString("last_name"), this.privateKey));
			receiver.updateString(5, Crypto.decrypt(rs.getString("email"), this.privateKey));
			receiver.updateInt(6, rs.getInt("address_id"));
			receiver.updateBoolean(7, rs.getBoolean("activebool"));
			receiver.updateDate(8, rs.getDate("create_date"));
			receiver.updateTimestamp(9, rs.getTimestamp("last_update"));
			receiver.updateInt(10, rs.getInt("active"));

		} catch (Exception e) {
			Logger.getAnonymousLogger().log(Level.SEVERE, e.getMessage());
		}
		return true;
	}
	
	private void encryptData(TriggerData td) throws InvalidKeyException, BadPaddingException, IllegalBlockSizeException, NoSuchPaddingException, NoSuchAlgorithmException, SQLException {
		ResultSet _new = td.getNew();
		
		_new.updateString("first_name", Crypto.encrypt(_new.getString("first_name"), this.publicKey));
		_new.updateString("last_name", Crypto.encrypt(_new.getString("last_name"), this.publicKey));
		_new.updateString("email", Crypto.encrypt(_new.getString("email"), this.publicKey));
	}
	
	public static void customerBeforeInsertUpdate(TriggerData td) throws SQLException, InvalidKeyException, BadPaddingException, IllegalBlockSizeException, NoSuchPaddingException, NoSuchAlgorithmException {
		CustomerCrypto ret = new CustomerCrypto();
		ret.encryptData(td);
	}

	public static ResultSetProvider getCustomerCrypto(int id) throws SQLException, NoSuchAlgorithmException {
		CustomerCrypto ret = new CustomerCrypto();
		ret.processQuery(id);
		
		return ret;
	}

}

The relevant parts of the code above are the "customerBeforeInsertUpdate" and "encryptData" methods, the former being the static method the database will call. PL/Java expects to find a static method with a "void (TriggerData)" signature; it creates a "CustomerCrypto" object and calls its "encryptData" method to do the job. The "encryptData" method retrieves the result set of the "NEW" row that is passed through the "TriggerData" object and replaces the sensitive values with their encrypted versions. We need to fire the trigger on the "BEFORE" event because the data has to be encrypted before it is persisted.
Another important method is "getCustomerCrypto". We need to be able to read the data decrypted, and this method helps us do that. Here we use the same technique as in the previous example: we implement the "ResultSetProvider" interface and manipulate the data before returning the result set. Take a closer look at the "assignRowValues" method and you'll see that we are decrypting the data there with the "Crypto.decrypt" method!
Ok, time to compile the code and check if it really works:
javac -cp "$HOME/pljava-1_6_2/pljava-api/target/pljava-api-1.6.2.jar" com/percona/blog/pljava/*.java
jar -c -f /app/pg12/lib/pljavaPart2.jar com/percona/blog/pljava/*.class

And create the database objects:
SELECT sqlj.replace_jar( 'file:///app/pg12/lib/pljavaPart2.jar', 'pljavaPart2', true );

CREATE FUNCTION customerBeforeInsertUpdate()
			RETURNS trigger
			AS 'com.percona.blog.pljava.CustomerCrypto.customerBeforeInsertUpdate'
			LANGUAGE java;

CREATE TRIGGER tg_customerBeforeInsertUpdate
			BEFORE INSERT OR UPDATE ON customer
			FOR EACH ROW
			EXECUTE PROCEDURE customerBeforeInsertUpdate();

At this point our data isn't encrypted yet, but we can fix that with a no-op UPDATE, and the trigger will do its magic:
test=# SELECT * FROM customer LIMIT 3;
 customer_id | store_id | first_name | last_name |                email                | address_id | activebool | create_date |     last_update     | active 
-------------+----------+------------+-----------+-------------------------------------+------------+------------+-------------+---------------------+--------
           1 |        1 | MARY       | SMITH     | MARY.SMITH@sakilacustomer.org       |          5 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           2 |        1 | PATRICIA   | JOHNSON   | PATRICIA.JOHNSON@sakilacustomer.org |          6 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
           3 |        1 | LINDA      | WILLIAMS  | LINDA.WILLIAMS@sakilacustomer.org   |          7 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
(3 rows)

test=# UPDATE customer SET first_name = first_name, last_name = last_name, email = email;
UPDATE 599

test=# SELECT * FROM customer LIMIT 3;
 customer_id | store_id |                                                                                                                                                                       
 first_name                                                                                                                                                                        |            
                                                                                                                                                            last_name                           
                                                                                                                                              |                                                 
                                                                                                                         email                                                                  
                                                                                                         | address_id | activebool | create_date |        last_update         | active 
-------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------+------------+------------+-------------+----------------------------+--------
           3 |        1 | DT8oXb0VQvtFSIOv61zZfmUrpjWWGoeyl8D8tQl7naCLT31WlJn3U+uILYqUedSOdDSO17QdQKwChWG+DrcvYEih9RYyEPR2ja9deN4cn+vwHt/v09HDfmwrsJUt0UgP/fp78hCxkJDAV50KkMUsA23aeH5HRn9nCHOH0P
AcuId+7acCgwvU9YP8Sx2KVeVnLaBrzpeRLrsmAczQLUAXilXfdFC8uT2APBfwx1So2eCe+kSOsjcu1yTwlsa95Dnfu/N++Zm1D4knKUrAuNm5svTHjIz+B4HKXFMPz/Yk7KYF6ThB6OshIomyRvSEtKu5torfdwAvT3tsgP2DLWiKgQ== | H0YRoi10z36
tnNSXpBs/oYfMQRbAhfUYLIcE885Dhxmy2mbuhecCCqPcye5/++MhUwmEQG2pBgfsqWHLOnAgbqjaG3O0reipVysYK7cMysX1w5RVINsyD5H3vCqgnHESfdRuhW3b00InkR2qCtBYX1QJ1tKJZz89D2AOjoTq5jTum00vcLT06h6ZxVh1RKLNAuGpY9qN57m
/9a4JZuff9poYjw2PPQ6kTxhtbFl3bw+B3sJUcLFuFMYUoAAHsVETQRAerH1ncG9Uxi+xQjUtTVBqZdjvED+eydetH7vsnjBuYDtXD9XAn14qmALx5NfvwpU5jfpMOPOM4xP1BRVA2Q== | DpWBfhhii4LRPxZ9XJy8xoNne+qm051wD5Gd9AMHc+oIhx/B
ln6H+lcAM3625rKN1Vw/lG6VkQo2EnoZz/bhFtULvAOAUiBxerBDbYe0lYWqI50NxnFJbkexMkjDSiuoglh3ilRBn6Z+WGLc7FfEprOd1+tULW2gcwLI68uoEhfSY7INQZuGXfOUMAM4roB2fWvEfylL1ShbiGTRjX7KGXQbXLJtm7xel8J2VhdCecXzxzY2
Mtnu3EXGNpFy9atTXxE/fI0C5AX/u2FDZiOHz9xV7sB3atcqAeXwJB0smpBnPbwI3BN+ptzsWnhyFNNS+ol4QayMDgFhi/tp2+lCAQ== |          7 | t          | 2006-02-14  | 2021-08-08 19:10:29.337653 |      1
           4 |        2 | jo3zKr6lJ5zMN5f3/BPgENhs9CdzDu7F/uFxAf6uO9MAKc7X+++ipk/OBbvvA8vpaJ7DTgph9LshRvpYIqPMwS6IubkScDOSRDoLBNW9z2oMF3dB46R86LK0pTEVVrGaddjnPzaAEh7Uwzy3LncC1y7pJqGKW1b3RGUE8n
4SgstMo/4tRNUe/AUcPn9FXkCoc5jFvn8gPwVoTRKwhYu0oTco2gNKZs1rmFdmkmobGaMjZuhWyWG2PO1oXIVPkpgILmD42yAMDxWkS4DVvgJChXJRukjBzfOitsQwHapjqPqf/q3mfBaQzNUwROcSfGBe6KlI5yfjWU309KRKaCYWNQ== | MMhPovG/N3k
Xjou6kS9V7QtEvlA5NS8/n62KVRVVGEnsh5bhwEhBZxlK72AQv8e4niATWPEiJJU6i7Z08NkU5FWNIvuWwlGTdEEW+kK7XQXib6cNAdnmo4RH72SWhDxEp3tMwwoZif2932H8WDEbNdP6bCP69ekBA7Z+nGtXaeh+H9BAaLf1e6XunBj2YN7zs4sFWB2Kxs2
IugLWd9g9677BWzUeJIzpJfVLro4HYlzASh9AMKb8wPRU0LlEpxtqUdejj7IY5M1hVhDTCCLSQjSqJscqzG1pYQ04W7KNdGwWxJPMjvyPC2K4H+HQuW0IWVjvFpyYd/5q1eIQX+vjdw== | oF4nyIjtVtWuydg6QiSg3BDadWe48nhbBEZSLrR5GVigA768
E3n1npN6sdstzG7bRVnphtfwIZwQ3MUFURWCbUCe0VqioNgVXFhiRvr3DAw2AH64ss/h65B2U5whAygnq4kiy5JvPD0z0omtfs9913QeoO+ilvPVLEc0q3n0jD9ZQlkNVfHSytx1NY86gWnESquTVhkVQ55QDV8GY70YLX9V6nU7ldu+zpNLmf2+rfpxqbRC
i16jnHGDcTT7CKeq+AxbiJDeaaAmSPpxTZsrX4sXFW4rpNtSmOyuyHZziy8rkN8xSpyhvrmxjC7EYe4bn6L/+hay108Wn0BSFYe2ow== |          8 | t          | 2006-02-14  | 2021-08-08 19:10:29.337653 |      1
<...>
(3 rows)

test=#

Awesome, our data is now encrypted! What about the "decrypt" part of the class? Let's check it out:
test=# CREATE OR REPLACE FUNCTION getCustomerCrypto(int) RETURNS SETOF customer AS 'com.percona.blog.pljava.CustomerCrypto.getCustomerCrypto' LANGUAGE java;
CREATE FUNCTION

test=# SELECT * FROM getCustomerCrypto(10);
 customer_id | store_id | first_name | last_name |               email               | address_id | activebool | create_date |     last_update     | active 
-------------+----------+------------+-----------+-----------------------------------+------------+------------+-------------+---------------------+--------
          10 |        1 | DOROTHY    | TAYLOR    | DOROTHY.TAYLOR@sakilacustomer.org |         14 | t          | 2006-02-14  | 2006-02-15 09:57:20 |      1
(1 row)

test=#

Worked like a charm! Here we finish part two, and at this point we are able to query and manipulate objects inside our database. The next and last article of this series will cover external calls, and we'll see how to use external resources from PL/Java. Don't miss it!

[1] https://github.com/elchinoo/blogs/tree/main/pljava

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Jul 27, 2017

The Ultimate Guide to MySQL Partitions

This blog was originally published in July 2017 and was updated in August 2023.

Questions about MySQL partitioning come up pretty often around here, so let's see what we can do to clear things up.

So, What is MySQL Partitioning?

Partitioning is a way in which a database (MySQL in this case) splits its actual data into separate physical tables that are still treated as a single table by the SQL layer.

When partitioning in MySQL, it’s a good idea to find a natural partition key. You want to ensure that table lookups go to the correct partition or group of partitions. This means that all SELECT, UPDATE, and DELETE statements should include that column in the WHERE clause. Otherwise, the storage engine does a scatter-gather and queries ALL partitions in a UNION that is not executed concurrently.

Generally, you must add the partition key into the primary key along with the auto-increment, i.e., PRIMARY KEY (part_id, id). If you don’t have well-designed and small columns for this composite primary key, it could enlarge all of your secondary indexes.
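
To make that concrete, here is a minimal sketch using a hypothetical "orders" table (not one of the tables discussed in this post): the composite primary key leads with the partition key, and an extra index on the auto-increment column keeps InnoDB happy when that column is not the leftmost one in the primary key:
CREATE TABLE orders (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    created_year SMALLINT NOT NULL,
    customer_id  INT NOT NULL,
    total        DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (created_year, id),  -- partition key first, then the auto-increment
    KEY (id)                         -- InnoDB needs the auto-increment column leading some index
)
PARTITION BY RANGE (created_year) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Lookups that include the partition key touch only one partition:
SELECT * FROM orders WHERE created_year = 2023 AND customer_id = 42;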

Want to ensure a successful MySQL 5.7 to 8.0 Upgrade? Watch this on-demand webinar from Percona to learn how!

 

What are the Different Types of MySQL Partitions?

Horizontal and vertical MySQL partitions are techniques used to divide tables into smaller sections for improved performance and management. Horizontal partitioning splits a table into smaller tables with identical columns but distinct rows, while vertical partitioning divides a table into separate tables with related columns. Various partition types, like RANGE, LIST, HASH, and KEY, are used for specific needs, from range-based data to custom criteria, to ensure efficient data handling and the optimization of queries. Note: vertical partitioning is not natively supported in MySQL 8.0.

RANGE partitioning

RANGE partitioning in MySQL is a data partitioning technique where a large table is divided into smaller partitions based on a specified range of column values like dates or numeric intervals. Each partition holds data that falls within a specific range, optimizing data handling and query speed.
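
As a sketch, a hypothetical "sales" table partitioned by year could look like this (the partitioning column must be part of the primary key):
CREATE TABLE sales (
    sale_id   BIGINT NOT NULL,
    sale_date DATE NOT NULL,
    amount    DECIMAL(10,2),
    PRIMARY KEY (sale_id, sale_date)
)
PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);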

HASH partitioning

HASH partitioning in MySQL divides a table into partitions based on the hash value of a designated column’s contents. Unlike range or list partitioning, where you manually designate the partition for specific column values, hash partitioning automatically assigns values to partitions based on hashing. This method distributes data evenly across partitions to achieve balanced storage and optimal query performance.
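
For example (a sketch with a hypothetical "visits" table), rows are spread over four partitions based on the hash of an integer column:
CREATE TABLE visits (
    visit_id   BIGINT NOT NULL,
    store_id   INT NOT NULL,
    visited_at DATETIME,
    PRIMARY KEY (visit_id, store_id)
)
PARTITION BY HASH (store_id)
PARTITIONS 4;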

LIST partitioning

LIST partitioning in MySQL shares similarities with range partitioning. As with range partitioning, each partition is explicitly defined, but in list partitioning, partitions are created and assigned based on including a column value in predefined value lists rather than the contiguous ranges of values used in range partitioning.
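
A minimal sketch, assuming a hypothetical "stores" table where each partition holds an explicit set of region codes:
CREATE TABLE stores (
    store_id  INT NOT NULL,
    region_id INT NOT NULL,
    name      VARCHAR(50),
    PRIMARY KEY (store_id, region_id)
)
PARTITION BY LIST (region_id) (
    PARTITION p_north VALUES IN (1, 2, 3),
    PARTITION p_south VALUES IN (4, 5),
    PARTITION p_other VALUES IN (6, 7, 8, 9)
);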

COLUMNS partitioning

COLUMNS partitioning in MySQL comes in two variants, RANGE COLUMNS and LIST COLUMNS. They work like their RANGE and LIST counterparts, but the partitions are defined directly on the values of one or more columns rather than on an integer expression, and those columns may be of integer, string, DATE, or DATETIME types. This approach is helpful when the natural partitioning key is not an integer or when ranges need to span multiple columns.
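
For instance, RANGE COLUMNS lets us partition directly on a DATE column without wrapping it in YEAR() or TO_DAYS(); this is only a sketch with a hypothetical "events" table:
CREATE TABLE events (
    event_id   BIGINT NOT NULL,
    event_date DATE NOT NULL,
    payload    VARCHAR(255),
    PRIMARY KEY (event_id, event_date)
)
PARTITION BY RANGE COLUMNS (event_date) (
    PARTITION p2022 VALUES LESS THAN ('2023-01-01'),
    PARTITION p2023 VALUES LESS THAN ('2024-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);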

KEY partitioning

KEY partitioning is similar to HASH partitioning, except that you only specify one or more columns to be evaluated and the MySQL server provides its own hashing function. These columns can contain values other than integers, since the hashing function supplied by MySQL guarantees an integer result regardless of the column data type.
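
As a sketch (hypothetical "sessions" table), KEY partitioning happily accepts a string column because MySQL applies its own hashing:
CREATE TABLE sessions (
    session_id CHAR(36) NOT NULL,  -- e.g., a UUID string
    user_id    INT NOT NULL,
    started_at DATETIME,
    PRIMARY KEY (session_id)
)
PARTITION BY KEY (session_id)
PARTITIONS 8;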

MySQL Partitioning in Version 5.7

MySQL version 5.7 introduced various features and capabilities for partitioning, enhancing its utility in managing large datasets. It enabled dividing large tables into smaller, manageable segments based on defined criteria. This facilitates improved data organization, query optimization, and maintenance.

In version 5.7, MySQL partitioning supports multiple partitioning types, including RANGE, LIST, HASH, KEY, and COLUMNS. Each type caters to different data distribution needs. 

Using partitioning in a MySQL 5.7 environment offers several practical benefits. It significantly improves query performance by reducing the amount of data scanned during queries, which is especially helpful when dealing with large tables. Partition pruning, a feature in MySQL 5.7, ensures that only relevant partitions are accessed, further enhancing query efficiency. Additionally, partitioning aids in maintenance tasks like archiving and purging old data, as operations can be performed on individual partitions instead of the entire table.

Are you ready for MySQL 5.7 EOL? Percona can help. 


 

MySQL Partitioning in MySQL 8.0

MySQL 8.0 brought substantial advancements and enhancements to partitioning, significantly elevating its capabilities. This version introduces key features and optimizations that address limitations from previous iterations.

One major enhancement is the support for subpartitioning. MySQL 8.0 allows you to create subpartitions within existing partitions, providing an additional level of data segmentation. This feature facilitates even more precise data organization and management, allowing for complex use cases involving multi-level data hierarchies.
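
A minimal sketch of what that can look like, using a hypothetical time-series "measurements" table where each yearly RANGE partition is further split by HASH:
CREATE TABLE measurements (
    sensor_id INT NOT NULL,
    taken_at  DATE NOT NULL,
    value     DOUBLE,
    PRIMARY KEY (sensor_id, taken_at)
)
PARTITION BY RANGE (YEAR(taken_at))
SUBPARTITION BY HASH (sensor_id)
SUBPARTITIONS 4 (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);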

Additionally, MySQL 8.0 introduces automatic list partitioning, simplifying partition creation by enabling the database to determine the partition automatically based on the values inserted.

This version also notably integrates native backing for range and list partitioning of spatial indexes, amplifying geospatial query speed for substantial datasets. Enhancements to the query optimizer improve partition pruning for both single-level and subpartitioned tables, leading to improved query performance.

To sum it up, MySQL 8.0 significantly advances partitioning with features like subpartitioning, automatic list partitioning, and improved query optimization. These enhancements address limitations from previous versions, allowing for more complex data organization, streamlined management, and optimized query performance.

Upgrading MySQL to 8.0? Check out this blog to learn how to avoid disaster!

What are the Benefits of MySQL Partitions?

MySQL partitioning offers several advantages in terms of query performance and maintenance:

Enhanced Query Performance: Partitioning improves query performance by minimizing the amount of data scanned during queries. As the data is distributed into smaller partitions, the database engine only needs to scan relevant partitions, leading to faster query responses.

Optimized Resource Utilization: Partitioning enables parallelism in query execution across partitions. This means that multiple partitions can be processed simultaneously, making better use of available hardware resources and further enhancing query performance.

Data Retention and Deletion: Partitioning simplifies archiving or deleting old data by targeting specific partitions, which strengthens data retention policies (see the maintenance sketch after this list).

Reduced Overhead: Partitioning can significantly reduce the overhead of managing large tables. For example, when inserting or deleting data, the database engine only needs to modify the relevant partitions, which can be much faster than performing these operations on the entire table.

Streamlined Maintenance: Partitioning simplifies maintenance operations. For example, you can perform maintenance tasks like index rebuilds, statistics updates, or data archiving on specific partitions rather than the entire table, minimizing downtime and optimizing resource utilization.

Data Lifecycle Management: Partitioning supports efficient data lifecycle management. Old or infrequently accessed data can be stored in separate partitions or even archived, allowing for better control over data retention and optimization of storage resources.

Enhanced Scalability: Partitioning enhances the database’s ability to scale, as data can be distributed across different storage devices.
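
To illustrate the maintenance side, here is a hedged sketch against the hypothetical "sales" table from the RANGE example above: dropping an old partition discards its rows almost instantly (no row-by-row DELETE), and the catch-all partition can be reorganized to make room for a new year:
-- Remove last year's data in one metadata-level operation:
ALTER TABLE sales DROP PARTITION p2021;

-- Split the MAXVALUE partition so new data gets its own partition:
ALTER TABLE sales REORGANIZE PARTITION pmax INTO (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);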

In summary, MySQL partitioning brings substantial advantages to both query performance and maintenance. It improves data retrieval speed, enhances resource utilization, streamlines maintenance operations, optimizes storage management, and reduces overheads associated with large tables. These benefits collectively contribute to a more efficient database environment.

What are the Challenges and Limitations of MySQL Partitions?

While there are lots of positives about using MySQL partitioning, there can also be challenges and limitations that users should be aware of:

Query Optimization Complexity: Although partitioning can enhance query performance, it requires queries to be designed with partitioning key considerations in mind. Inappropriately designed queries may not fully utilize partitioning benefits, leading to poor performance.

Limited Key Choices: Not all columns are suitable for partitioning keys. Choosing a proper partitioning key is crucial, and inappropriate selections can result in uneven data distribution across partitions, impacting performance.

Suboptimal Partitioning Strategies: Choosing the wrong partitioning strategy or key can lead to performance degradation. For instance, using partitioning on a table with a small number of rows may not provide significant benefits and can even worsen performance due to increased complexity.

Limited Parallelism: While partitioning allows for parallel processing, there might be limitations on how many partitions can be processed concurrently based on hardware resources, potentially impacting query performance.

Data Skewing: In some scenarios, data might not be uniformly distributed across partitions, causing “data skew.” This can lead to uneven resource utilization and slower performance for certain partitions.

Replication and Backup Issues: MySQL partitioning might impact the way data replication and backups are performed. Special considerations are needed to ensure these processes still work seamlessly after partitioning.

So, while MySQL partitioning does offer advantages, it also brings challenges and limitations related to complexity, maintenance, query optimization, and performance. Careful planning and continuous monitoring are crucial to facing these challenges and achieving optimal performance.

Performance Optimization with MySQL Partitioning

MySQL partitioning enhances query performance by enabling the database to focus on relevant data partitions during queries. This reduces the amount of data that needs to be scanned, resulting in faster data retrieval. For example, when querying a large table for specific date ranges, partitioning allows the engine to scan only relevant partitions containing data within those ranges.

Query execution plans are positively impacted by partitioning. The query optimizer recognizes partitioning schemes and generates execution plans that use partition pruning. This means the optimizer can skip unnecessary partitions, resulting in optimized query plans that use fewer resources and execute more quickly.

Partitioning influences indexing strategies by narrowing the scope of indexing. Instead of indexing the entire table, partitioning allows for more focused indexing. This minimizes index size and boosts efficiency, leading to faster query performance.

In scenarios where partitioning aligns with natural data distribution, such as time-series data or geographical regions, query execution time is significantly reduced. Queries that involve specific partitions can bypass irrelevant data; for instance, when searching for transactions within a certain date range, partitioning enables the database to search only the relevant partition.
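
As a quick illustration (again using the hypothetical "sales" table from the RANGE example above), the partitions column of EXPLAIN shows whether pruning kicked in:
EXPLAIN SELECT SUM(amount)
FROM sales
WHERE sale_date BETWEEN '2022-01-01' AND '2022-12-31';
-- When pruning applies, the "partitions" column should list only p2022
-- instead of every partition in the table.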

Best Practices for Implementing MySQL Partitioning

With these best practices, you can ensure that your MySQL partitioning setup is efficient, well-maintained, and improves database performance.

Choose the Correct Partition Key: Select a partition key that aligns with your data distribution and query patterns. Common choices include time-based or range-based values.

Monitor Query Performance: Continuously monitor query performance after partitioning. Use tools like EXPLAIN to assess query execution plans.

Watch for Bloat: Over time, partitions can accumulate large amounts of data, leading to slow queries; a quick way to check partition sizes is sketched after this list.

Proper Indexing: Partitioned tables benefit from proper indexing. Ensure that the chosen partition key is part of the primary or unique key. Additionally, consider indexing frequently queried columns to improve performance further.

Regular Maintenance: Perform routine maintenance tasks, such as purging old data from partitions, optimizing indexes, and rebuilding partitions.

Backup and Restore: As we mentioned earlier, partitioning can impact backup and restore strategies. Ensure your backup and restore procedures account for partitioned data to prevent data loss and ensure reliable recovery.

Test, Test, and Test Again: Before implementing partitioning in production, thoroughly test it in a controlled environment. This helps identify potential issues and fine-tune the partitioning strategy.

Documentation: Always be documenting! Be sure to include your partitioning strategy, why certain partition keys are used, and your maintenance procedures.

Talk to experts: If you’re new to partitioning or dealing with complex scenarios, consider consulting with experts.
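
For the bloat and maintenance points above, a simple query against information_schema gives per-partition row and size estimates (a sketch; "sales" is the hypothetical table used earlier, and InnoDB row counts are approximate):
SELECT PARTITION_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'sales';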

Choosing the Right Partitioning Strategy

Selecting the appropriate partitioning strategy in MySQL involves carefully considering various factors, including:

Understanding your data’s nature and distribution. For range-based data, consider range partitioning, while list partitioning is suitable for discrete values. Hash partitioning evenly distributes data.

Analyzing query patterns to align the strategy with how data is accessed. Time-based queries benefit from range partitioning, while hash partitioning suits equally accessed values.

Matching the strategy to your database requirements. For archiving historical data, consider range-based on time. High-write workloads might favor hash or key partitioning for load balancing.

Watching for changes in data patterns. As data grows, a previously effective strategy might become less optimal. Periodically review and adjust.

Any partitioning strategy should improve query performance, not lead to suboptimal queries — test and benchmark strategies before implementation.

Ensuring the strategy aligns with maintenance operations. For example, rebuilding large partitions might often impact uptime. Select a strategy that minimizes disruptions.

Continuously monitoring query performance after implementation. Be ready to adjust your strategy as needed.

Evaluating how your chosen strategy accommodates future growth, as some strategies scale better with larger datasets.

Choosing the right partitioning strategy is pivotal to database performance. By aligning the strategy with your data’s characteristics and specific requirements, you ensure that your MySQL partitioning delivers the desired results.

Elevate your MySQL database management: Get started with Percona Distribution for MySQL.

Migrating to MySQL 8.0 and 5.7 EOL Support with Percona

Proper MySQL partitioning optimizes databases by splitting large tables into smaller parts, enhancing query speed and data management while reducing overhead and making maintenance easier. But, users need to understand that careful planning, monitoring, and testing are vital to avoid any potential performance declines due to improper setup.

Looking to upgrade to MySQL 8.0 or stay on 5.7? Percona will support you either way.

 

