Aug
24
2018
--

PostgreSQL Accessing MySQL as a Data Source Using mysql_fdw

PostgreSQL foreign tables in MySQL

PostgreSQL foreign tables in MySQLThere are many organizations where front/web-facing applications use MySQL and back end processing uses PostgreSQL®. Any system integration between these applications generally involves the replication—or duplication—of data from system to system. We recently blogged about pg_chameleon which can be used replicate data from MySQL® to PostgreSQL. mysql_fdw can play a key role in eliminating the problem of replicating/duplicating data. In order to eliminate maintaining the same data physically in both postgres and MySQL, we can use mysql_fdw. This allows PostgreSQL to access MySQL tables and to use them as if they are local tables in PostgreSQL. mysql_fdw can be used, too, with Percona Server for MySQL, our drop-in replacement for MySQL.

This post is to showcase how easy it is to set that up and get them working together. We will address a few points that we skipped while discussing about FDWs in general in our previous post

Preparing MySQL for fdw connectivity

On the MySQL server side, we need to set up a user to allow for access to MySQL from the PostgreSQL server side. We recommend Percona Server for MySQL if you are setting it up for the first time.

mysql> create user 'fdw_user'@'%' identified by 'Secret!123';

This user needs to have privileges on the tables which are to be presented as foreign tables in PostgreSQL.

mysql> grant select,insert,update,delete on EMP to fdw_user@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> grant select,insert,update,delete on DEPT to fdw_user@'%';
Query OK, 0 rows affected (0.00 sec)

Installing mysql_fdw on PostgreSQL server

Under the hood, MySQL FDW (mysql_fdw) facilitates the use of PostgreSQL server as a client for MySQL Server, which means it can then fetch data from the MySQL database as a client. Obviously, mysql_fdw uses MySQL client libraries. Nowadays, many Linux distributions are packaged with MariaDB® libraries. This works well enough for mysql_fdw to function. If we install mysql_fdw from the PGDG repo, then mariadb-devel.x86_64 packages will be installed alongside other development packages. To switch to Percona packages as client libraries, you need to have the Percona development packages too.

sudo yum install Percona-Server-devel-57-5.7.22-22.1.el7.x86_64.rpm

Now we should be able to install the mysql_fdw from PGDG repository:

sudo yum install mysql_fdw_10.x86_64

Connect to the PostgreSQL server where we are going to create the foreign table, and using the command line tool, create mysql_fdw extension:

postgres=# create extension mysql_fdw;
CREATE EXTENSION

Create a server definition to point to the MySQL server running on a host machine by specifying the hostname and port:

postgres=# CREATE SERVER mysql_svr  FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host 'hr',port '3306');
CREATE SERVER

Now we can create a user mapping. This maps the database user in PostgreSQL to the user on the remote server (MySQL). While creating the user mapping, we need to specify the user credentials for the MySQL server as shown below. For this demonstration, we are using PUBLIC user in PostgreSQL. However, we could use a specific user as an alternative.

postgres=# CREATE USER MAPPING FOR PUBLIC SERVER mysql_svr OPTIONS (username 'fdw_user',password 'Secret!123');
CREATE USER MAPPING

Import schema objects

Once we complete the user mapping, we can import the foreign schema.

postgres=# IMPORT FOREIGN SCHEMA hrdb FROM SERVER mysql_svr INTO public;

Or we have the option to import only selected tables from the foreign schema.

postgres=# IMPORT FOREIGN SCHEMA hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO public;

This statement says that the tables “EMP” and “DEPT” from the foreign schema named “hrdb” in mysql_server need to be imported into the  public schema of the PostgreSQL database.

FDWs in PostgreSQL allow us to import the tables to any schema in postgres.

Let’s create a schema in postgres:

postgres=# create schema hrdb;
postgres=# IMPORT FOREIGN SCHEMA hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO hrdb;

Suppose we need the foreign table to be part of multiple schemas of PostgreSQL. Yes, it is possible.

postgres=# create schema payroll;
CREATE SCHEMA
postgres=# create schema finance;
CREATE SCHEMA
postgres=# create schema sales;
CREATE SCHEMA
postgres=# IMPORT FOREIGN SCHEMA  hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO payroll;
IMPORT FOREIGN SCHEMA
postgres=# IMPORT FOREIGN SCHEMA  hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO finance;
IMPORT FOREIGN SCHEMA
postgres=# IMPORT FOREIGN SCHEMA  hrdb limit to ("EMP","DEPT") FROM SERVER mysql_svr INTO sales;
IMPORT FOREIGN SCHEMA

You might be wondering if there’s a benefit to doing this. Yes, since in a multi-tenant environment, it allows us to centralize many of the master/lookup tables. These can even sit in a remote server, and that can be MySQL as well!.

IMPORTANT: PostgreSQL extensions are database specific. So if you have more than one database inside a PostgreSQL instance/cluster, you have to create a separate fdw extension, foreign server definition and user mapping.

Foreign tables with a subset of columns

Another important property of foreign tables is that you can have a subset of columns if you are not planning to issue DMLs on the remote table. For example MySQL’s famous sample database Sakila contains a table “film” with the following definition

CREATE TABLE `film` (
`film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`description` text,
`release_year` year(4) DEFAULT NULL,
`language_id` tinyint(3) unsigned NOT NULL,
`original_language_id` tinyint(3) unsigned DEFAULT NULL,
`rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
`rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
`length` smallint(5) unsigned DEFAULT NULL,
`replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
`rating` enum('G','PG','PG-13','R','NC-17') DEFAULT 'G',
`special_features` set('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`film_id`),
KEY `idx_title` (`title`),
KEY `idx_fk_language_id` (`language_id`),
KEY `idx_fk_original_language_id` (`original_language_id`),
CONSTRAINT `fk_film_language` FOREIGN KEY (`language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE,
CONSTRAINT `fk_film_language_original` FOREIGN KEY (`original_language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8

Imagine that we don’t need all of these fields to be available to the PostgreSQL database and its application. In such cases, we can create a foreign table with only the necessary columns in the PostgreSQL side. For example:

CREATE FOREIGN TABLE film (
film_id smallint NOT NULL,
title varchar(255) NOT NULL,
) SERVER mysql_svr OPTIONS (dbname 'sakila', table_name 'film');

The challenges of incompatible syntax and datatypes

There are many syntactical differences between MySQL and PostgreSQL. Consequently, you may need to manually intervene to create foreign tables. For example, MySQL tables accepts definition of enumerations in place, whereas PostgreSQL expects enumeration types to be defined before creating the table like this:

CREATE TYPE rating_t AS enum('G','PG','PG-13','R','NC-17');

Many such things are not handled perfectly. So it is better to specify them as a text datatype. The same applies to the set datatype.

CREATE FOREIGN TABLE film (
film_id smallint NOT NULL,
title varchar(255) NOT NULL,
rating text,
special_features text
) SERVER mysql_svr OPTIONS (dbname 'sakila', table_name 'film');

I’m used to receiving scepticism from people about treating enum and set as text. Well, please don’t forget that we are not storing them in PostgreSQL, the text datatype is just a method for handling input and output from the table. The data is pulled and pushed from the foreign server, which is MySQL, and this converts these texts into the corresponding enumeration before storing them.

IMPORTANT : mysql_fdw has the capability to do data type conversion (casting) automatically behind the scenes when a user fires DML against foreign tables.

Generally, DML against a remote MySQL database from the PostgreSQL side can be quite challenging because of the architecture differences. These impose restrictions, such as the first column of the foreign table must be unique. We will cover these in more depth in a future post.

Handling views on the MySQL side

While foreign tables are not limited tables on the MySQL side, a view can also be mapped as a foreign table. Let’s create a view in the MySQL database.

mysql> create view v_film as select film_id,title,description,release_year from film;

PostgreSQL can treat this view as a foreign table:

postgres=# CREATE FOREIGN TABLE v_film (
film_id smallint,
title varchar(255) NOT NULL,
description text,
release_year smallint ) SERVER mysql_svr OPTIONS (dbname 'sakila', table_name 'v_film');
CREATE FOREIGN TABLE

Views on the top of foreign table on PostgreSQL

PostgreSQL allows us to create views on the top of foreign tables. This might even be pointing to a view on the remote MySQL server. Let’s try creating a view using the newly created foreign table v_film.

postgres=# create view v2_film as select film_id,title from v_film;
postgres=# explain verbose select * from v2_film;
QUERY PLAN
--------------------------------------------------------------------------
Foreign Scan on public.v_film  (cost=10.00..1010.00 rows=1000 width=518)
Output: v_film.film_id, v_film.title
Local server startup cost: 10
Remote query: SELECT `film_id`, `title` FROM `sakila`.`v_film`
(4 rows)

Materializing the foreign tables (Materialized Views)

One of the key features mysql_fdw implements is the ability to support persistent connections. After query execution, the connection to the remote MySQL database is not dropped. Instead it retains the connection for the next query from the same session. Nevertheless, in some situations, there will be concerns about continuously streaming data from the source database (MySQL) to the destination (PostgreSQL). If you have a frequent need to access data from foreign tables, you could consider the option of materializing the data locally. It is possible to create a materialized view on the top of the foreign table.

postgres=# CREATE MATERIALIZED VIEW mv_film as select * from film;
SELECT 1000

Whenever required, we can just refresh the materialized view.

postgres=# REFRESH MATERIALIZED VIEW mv_film;
REFRESH MATERIALIZED VIEW

Automated Cleanup

One of the features I love about the FDW framework is its ability to clean up foreign tables in a single shot. This is very useful when we setup foreign table for a temporary purpose, like data migration. At the very top level, we can drop the extension, PostgreSQL will walk through the dependencies and drop those too.

postgres=# drop extension mysql_fdw cascade;
NOTICE:  drop cascades to 12 other objects
DETAIL:  drop cascades to server mysql_svr
drop cascades to user mapping for public on server mysql_svr
drop cascades to foreign table "DEPT"
drop cascades to foreign table "EMP"
drop cascades to foreign table hrdb."DEPT"
drop cascades to foreign table hrdb."EMP"
drop cascades to foreign table payroll."DEPT"
drop cascades to foreign table payroll."EMP"
drop cascades to foreign table finance."DEPT"
drop cascades to foreign table finance."EMP"
drop cascades to foreign table sales."DEPT"
drop cascades to foreign table sales."EMP"
DROP EXTENSION
postgres=#

Conclusion

I should concede that the features offered by mysql_fdw are far fewer compared to postgres_fdw. Many of the features are not yet implemented, including column renaming. But the good news is that the key developer and maintainer of mysql_fdw is here with Percona! Hopefully, we will be able to put more effort into implementing some of the missing features. Even so, we can see here that the features implemented so far are powerful enough to support system integration. We can really make the two sing together!

Percona’s support for PostgreSQL

As part of our commitment to being unbiased champions of the open source database eco-system, Percona offers support for PostgreSQL – you can read more about that here.

The post PostgreSQL Accessing MySQL as a Data Source Using mysql_fdw appeared first on Percona Database Performance Blog.

Aug
21
2018
--

Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw

Foreign data wrapper FDWs with PostgreSQL postgres_fdw

Foreign data wrapper FDWs with PostgreSQL postgres_fdwThere are a few features in PostgreSQL that are very compelling, and that I rarely see in other RDBMSs. Some of these features are the driving force behind the growing popularity of PostgreSQL. This blog post is about one of my favourite features: FDW (Foreign Data Wrapper). As the name indicates, this feature allows a PostgreSQL database to treat tables in a remote PostgreSQL database as locally available tables.

The history of FDW began when SQL/MED came out as part of the ANSI SQL standard specification in 2003. MED stands for “Management of External Data”. By definition, “external data” is the data that the DBMS is able to access but does not manage. There are two parts for this specification:

  1. Foreign Table : this is about how to access external data sources and present them as relational tables.
  2. Datalink : this extends the functionality of database systems to include control over external files without the need to store their contents directly in the database, such as LOBs. A column of a table could directly refer a file.

PostgreSQL’s FDW capabilities addresses foreign tables only. It was introduced in PostgreSQL 9.1 and has been receiving improvements ever since.

Today there are a variety of FDWs which allow PostgreSQL to talk to most of the data sources we can think of. However, most FDWs are independent open source projects implemented as Postgres Extensions, and not officially supported by the PostgreSQL Global Development Group.

postgres_fdw

In this blog post we will take a closer look at the postgres_fdw which can be considered as the “reference implementation” for other FDW development efforts, and showcases its capabilities. This is the one FDW which comes with PostgreSQL source as a contrib extension module. The only other FDW which is part of PostgreSQL source tree is file_fdw.

Let’s look into postgres_fdw with a use case. In many organizations, there could be multiple systems catering to different functionalities/departments. For example, while an HR database may be holding the employee information the finance and payroll systems may need to access that same data. A common—but bad—solution for this is to duplicate the data in both systems. Data duplication often leads to problems, starting with data maintenance and accuracy. A smarter option, to avoid duplication while providing access to foreign databases to only the required data, is through FDWs.

Installation postgres_fdw

The Postgres Development Group (PGDG) offers PostgreSQL packages for all major Linux distributions. postgres_fdw itself is provided as a module that is usually included in the contrib package. In the example below we install such package for PostgreSQL 10 running on Red Hat/CentOS:

$ sudo yum install postgresql10-contrib.x86_64

Steps to setup

Let’s consider two PostgreSQL Instances, source instance and a destination instance

  • source is the remote postgres server from where the tables are accessed by the destination database server as foreign tables.
  • destination is another postgres server where the foreign tables are created which is referring tables in source database server.

We are going to use these definitions of source and destination in the rest of the post. Let’s assume that current application connects to destination database using a user app_user.

Step 1 : Create a user on the source

Create a user in the source server using the following syntax. This user account will be used by the destination server to access the source tables

postgres=# CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';

Step 2 : Create test tables (optional)

Let’s create a test table in the source server and insert a few records.

postgres=> create table employee (id int, first_name varchar(20), last_name varchar(20));
CREATE TABLE
postgres=# insert into employee values (1,'jobin','augustine'),(2,'avinash','vallarapu'),(3,'fernando','camargos');
INSERT 0 3

Step 3 : Grant privileges to user in the source

Give appropriate privileges to the fdw_user on the source table. Always try to limit the scope of privilege to minimum to improve security.
An example syntax is as following :

postgres=# GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE employee TO fdw_user;
GRANT

Step 4 : Modify ACL in pg_hba.conf

We need to ensure that the proper authentication is setup for accessing source server from destination server.
Add an entry into  pg_hba.conf as shown below, preferably at the beginning of the file.

host    all all     destination_server_ip/32          md5

Step 5 : Test connectivity and privileges on source

Before proceeding further, It is a good idea to make sure that we are able to connect to the source machine from this destination machine using the newly created database user (fdw_user).

In order to validate, on the destination server, use psql to connect to the source server:

$ psql -h hr -U fdw_user postgres

You could even validate all privileges on the tables which are to be presented as foreign tables using this connection.

Step 6 : Create postgres_fdw extension on the destination

Connect to destination server, and create the postgres_fdw extension in the destination database from where you wish to access the tables of source server. You must be a superuser to create the extension.

No postgres_fdw extension is needed on the source server.

postgres=# create extension postgres_fdw;
CREATE EXTENSION

Validate if the extension is created using \dx. Following is an example validation log.

postgres=# \dx postgres_fdw
                            List of installed extensions
    Name    | Version | Schema |                    Description
--------------+---------+--------+----------------------------------------------------
postgres_fdw | 1.0     | public | foreign-data wrapper for remote PostgreSQL servers
(1 row)

Step 7: Grant privileges to user in the destination

Always better to limit the scope of the server definition to an application user. If a regular user needs to define a server, that user needs to have USAGE permission on the foreign data wrapper. Superuser can grant the privilege

postgres=# grant usage on FOREIGN DATA WRAPPER postgres_fdw to app_user;
GRANT

Alternatively, superuser (postgres) can create a server definition and then grant USAGE permission on that server definition to the application user like this:

postgres=# GRANT USAGE ON FOREIGN SERVER hr TO app_user;
GRANT

Step 8: Create a server definition

Now we can create a server definition. This foreign server is created using the connection details of the source server running on host “hr”. Let’s name the foreign server as itself as “hr”

postgres=> CREATE SERVER hr
 FOREIGN DATA WRAPPER postgres_fdw
 OPTIONS (dbname 'postgres', host 'hr', port '5432');
CREATE SERVER

Step 9: Create user mapping from destination user to source user

Create a mapping on the destination side for destination user (app_user) to remote source user (fdw_user)

postgres=> CREATE USER MAPPING for app_user
SERVER hr
OPTIONS (user 'fdw_user', password 'secret');
CREATE USER MAPPING

Step 10 : Create foreign table definition on the destination

Create a foreign table in the destination server with the same structure as the source table, but with OPTIONS specifying schema_name and table_name

postgres=# CREATE FOREIGN TABLE employee
(id int, first_name character varying(20), last_name character varying(20))
SERVER hr
OPTIONS (schema_name 'public', table_name 'employee');

Step 11 : Test foreign table

Validate whether we can query the foreign table we just created in the destination server.

postgres=> select * from employee;
id | first_name | last_name
----+------------+-----------
1 | jobin | augustine
2 | avinash | vallarapu
3 | fernando | camargos
(3 rows)

As we can see from the above example, data is been accessed from the source database.

Now you might be thinking: “creating foreign tables one by one like this on the destination server is painful. Is it possible to do it automatically?“. The answer is yes – there is an option to import a full schema.

On the destination server, you can use the following syntax to import a schema.

postgres=# IMPORT FOREIGN SCHEMA "public" FROM SERVER hr INTO public;

If you wish to choose a certain list of tables for import, you can use the following syntax.

postgres=# IMPORT FOREIGN SCHEMA "public" limit to (employee) FROM SERVER hr INTO public;

In the above example, it will import the definition of only one table (employee).

Advantages of foreign tables

The main use case of the foreign tables is to make the data available to systems without actually duplicating/replicating it. There are even simple implementations of sharding using FDW, because data in the other shards can be made available for queries though FDWs.

A person coming from an Oracle-like background might think: “I can get data from a remote database table using simple DBLinks so what is the difference?“. The main difference is that FDW will maintain the meta-data/table definition about the foreign table locally. This results in better decisions compared to sending a simple SELECT * FROM <TABLE> to pull all results. We are going to see some of these advantages.

Note : In the following section always pay special attention on those lines starting with “Remote SQL:”

Query optimization

Since the definition of the foreign table is held locally, all query optimizations are made for remote executions too. Let’s consider a slightly more complex example where we have EMP (employee) and DEPT (department) tables in the HR database and SALGRADE (salary grade) table in the finance database. Suppose we want to know how many employees there are with a particular salary grade:

SELECT COUNT(*)
FROM EMP
JOIN SALGRADE ON EMP.SAL > SALGRADE.LOSAL AND EMP.SAL < SALGRADE.HISAL
WHERE SALGRADE.GRADE = 4;

Let’s see how PostgreSQL handles this:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=805.44..805.45 rows=1 width=8)
   Output: count(*)
   ->  Nested Loop  (cost=100.00..798.33 rows=2844 width=0)
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) ->  Foreign Scan on public.emp  (cost=100.00..186.80 rows=2560 width=8)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno
               Remote SQL: SELECT sal FROM public.emp
         ->  Materialize  (cost=0.00..35.55 rows=10 width=8)
               Output: salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=8)
                     Output: salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 4)

Please pay special attention for the line reading :

Remote SQL: SELECT sal FROM public.emp

It knows that only the sal column need to be fetched from the remote database.
If we change the count(*) to ename (Employee Name) column, the remote SQL changes like:

Remote SQL: SELECT ename, sal FROM public.emp

PostgreSQL tries to pull only the absolutely necessary data from the remote server.

Writable foreign tables

At the beginning, foreign tables were just readable. But, with time, the community introduced writable foreign tables functionality in PostgreSQL. Let us consider the following situation where management wants to give a salary increase of 10% to grade 3 employees:

UPDATE emp
SET    sal = sal * 1.1
FROM   salgrade
WHERE  emp.sal > salgrade.losal
AND emp.sal < salgrade.hisal
AND salgrade.grade = 3;

In this case, we are updating data on a remote table using a join condition with a local table. As we can see in the explain plan, an UPDATE statement is more complex because it involves 2 steps. First it needs to fetch the data from the remote table to complete the join operation. Then, it updates the rows in the foreign table.

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on public.emp  (cost=100.00..300.71 rows=669 width=118)
   Remote SQL: UPDATE public.emp SET sal = $2 WHERE ctid = $1
   ->  Nested Loop  (cost=100.00..300.71 rows=669 width=118)
         Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, (emp.sal * '1.1'::double precision), emp.comm, emp.deptno, emp.ctid, salgrade.ctid
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) ->  Foreign Scan on public.emp  (cost=100.00..128.06 rows=602 width=112)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno, emp.ctid
               Remote SQL: SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno, ctid FROM public.emp FOR UPDATE
         ->  Materialize  (cost=0.00..35.55 rows=10 width=14)
               Output: salgrade.ctid, salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=14)
                     Output: salgrade.ctid, salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 3)

Operator and function pushdown

PostgreSQL 9.5 release included the capability to assess and decide on the safety of pushing a function execution to remote server. Built-in functions are good candidates for this:

SELECT avg(sal)
FROM EMP
WHERE EMP.SAL > (SELECT LOSAL FROM SALGRADE WHERE GRADE = 4);

This statement results in the following query plan

QUERY PLAN
---------------------------------------------------------------------------
 Foreign Scan  (cost=137.63..186.06 rows=1 width=8)
   Output: (avg(emp.sal))
   Relations: Aggregate on (public.emp)
   Remote SQL: SELECT avg(sal) FROM public.emp WHERE ((sal > $1::integer))
   InitPlan 1 (returns $0)
     ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=4)
           Output: salgrade.losal
           Filter: (salgrade.grade = 4)

If the planner finds that the majority of records needs to be fetched from a remote server, it may not push the function execution to the remote server. For example:

SELECT avg(sal)
FROM EMP
JOIN SALGRADE ON EMP.SAL > SALGRADE.LOSAL AND EMP.SAL < SALGRADE.HISAL
WHERE SALGRADE.GRADE = 4;

In this case, the planner decides to do the function execution on the local server:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=805.44..805.45 rows=1 width=8)
   Output: avg(emp.sal)
   ->  Nested Loop  (cost=100.00..798.33 rows=2844 width=8)
         Output: emp.sal
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) ->  Foreign Scan on public.emp  (cost=100.00..186.80 rows=2560 width=8)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno
               Remote SQL: SELECT sal FROM public.emp
         ->  Materialize  (cost=0.00..35.55 rows=10 width=8)
               Output: salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=8)
                     Output: salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 4)
(13 rows)

A great improvement in PostgreSQL 9.6 is that the function does’t need to be even a built-in function. If a user defined function or operator is immutable it becomes a good candidate for being executed in the remote server.

Join push down

In many cases, it is worth pushing down the entire join operations to the remote server in such a way only the results need to be fetched to the local server. PostgreSQL handles this switching intelligently. Here’s an example:

postgres=# EXPLAIN VERBOSE SELECT COUNT(*)
FROM EMP JOIN  DEPT ON EMP.deptno = DEPT.deptno AND DEPT.deptno=10;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=100.56..194.84 rows=1 width=8)
   Output: (count(*))
   Relations: Aggregate on ((public.emp) INNER JOIN (public.dept))
   Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))
(4 rows)

Predicate push down

There are two options when executing a query against a foreign table:

  1. Fetch the data locally and apply the predicates like filtering condition locally.
  2. Send the filtering condition to the remote server and have it applied there.

The latter will can be the best option in many cases.

If you consider the previous example, we can see that  the predicate specification like “DEPT.deptno=10;” is pushed down to the remote server through foreign tables and applied there separately like this:

Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))

PostgreSQL not only pushed the predicate, it also rewrote the query we sent to avoid one extra AND condition.

Aggregate push down

Just like predicate push down, here PostgreSQL also considers 2 options:

  1.  Execute the aggregates on the remote server and pull the result back to the local server
  2. Do the aggregate calculations on the local database instance after collecting all required data from remote database

We’ve already seen an aggregate pushdown example as part of the function pushdown, since we’ve used an aggregate function for that example. Here’s another simple example:

postgres=# explain verbose select deptno,count(*) from emp group by deptno;
                            QUERY PLAN
------------------------------------------------------------------
 Foreign Scan  (cost=114.62..159.88 rows=200 width=12)
   Output: deptno, (count(*))
   Relations: Aggregate on (public.emp)
   Remote SQL: SELECT deptno, count(*) FROM public.emp GROUP BY 1
(4 rows)

In this case, all of the aggregate calculation happens on the remote server.

Triggers and Check constraints on Foreign tables

We have seen that foreign tables can be writable. PostgreSQL provides features to implement check constraints and triggers on the foreign table as well. This allows us to have powerful capabilities in the local database. For example, all validations and auditing can take place on the local server. The remote DMLs can be audited separately, or a different logic can be applied for local and remote triggers and constraint validations.

Conclusion

FDWs in PostgreSQL, postgres_fdw in particular, provides very powerful and useful features by which, in many cases, we can avoid the complex duplicating and replicating of data. It provides a mechanism for ACID compliant transactions between two database systems. postgres_fdw works as a reference implementation for the development of other fdw implementations. In the coming days we will be covering some of these.

More articles you might enjoy:

If you found this article useful, why not take a look at some of our other posts on PostgreSQL?

 

The post Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw appeared first on Percona Database Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com