Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw

Foreign data wrapper FDWs with PostgreSQL postgres_fdw

Foreign data wrapper FDWs with PostgreSQL postgres_fdwThere are a few features in PostgreSQL that are very compelling, and that I rarely see in other RDBMSs. Some of these features are the driving force behind the growing popularity of PostgreSQL. This blog post is about one of my favourite features: FDW (Foreign Data Wrapper). As the name indicates, this feature allows a PostgreSQL database to treat tables in a remote PostgreSQL database as locally available tables.

The history of FDW began when SQL/MED came out as part of the ANSI SQL standard specification in 2003. MED stands for “Management of External Data”. By definition, “external data” is the data that the DBMS is able to access but does not manage. There are two parts for this specification:

  1. Foreign Table : this is about how to access external data sources and present them as relational tables.
  2. Datalink : this extends the functionality of database systems to include control over external files without the need to store their contents directly in the database, such as LOBs. A column of a table could directly refer a file.

PostgreSQL’s FDW capabilities addresses foreign tables only. It was introduced in PostgreSQL 9.1 and has been receiving improvements ever since.

Today there are a variety of FDWs which allow PostgreSQL to talk to most of the data sources we can think of. However, most FDWs are independent open source projects implemented as Postgres Extensions, and not officially supported by the PostgreSQL Global Development Group.


In this blog post we will take a closer look at the postgres_fdw which can be considered as the “reference implementation” for other FDW development efforts, and showcases its capabilities. This is the one FDW which comes with PostgreSQL source as a contrib extension module. The only other FDW which is part of PostgreSQL source tree is file_fdw.

Let’s look into postgres_fdw with a use case. In many organizations, there could be multiple systems catering to different functionalities/departments. For example, while an HR database may be holding the employee information the finance and payroll systems may need to access that same data. A common—but bad—solution for this is to duplicate the data in both systems. Data duplication often leads to problems, starting with data maintenance and accuracy. A smarter option, to avoid duplication while providing access to foreign databases to only the required data, is through FDWs.

Installation postgres_fdw

The Postgres Development Group (PGDG) offers PostgreSQL packages for all major Linux distributions. postgres_fdw itself is provided as a module that is usually included in the contrib package. In the example below we install such package for PostgreSQL 10 running on Red Hat/CentOS:

$ sudo yum install postgresql10-contrib.x86_64

Steps to setup

Let’s consider two PostgreSQL Instances, source instance and a destination instance

  • source is the remote postgres server from where the tables are accessed by the destination database server as foreign tables.
  • destination is another postgres server where the foreign tables are created which is referring tables in source database server.

We are going to use these definitions of source and destination in the rest of the post. Let’s assume that current application connects to destination database using a user app_user.

Step 1 : Create a user on the source

Create a user in the source server using the following syntax. This user account will be used by the destination server to access the source tables

postgres=# CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';

Step 2 : Create test tables (optional)

Let’s create a test table in the source server and insert a few records.

postgres=> create table employee (id int, first_name varchar(20), last_name varchar(20));
postgres=# insert into employee values (1,'jobin','augustine'),(2,'avinash','vallarapu'),(3,'fernando','camargos');

Step 3 : Grant privileges to user in the source

Give appropriate privileges to the fdw_user on the source table. Always try to limit the scope of privilege to minimum to improve security.
An example syntax is as following :

postgres=# GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE employee TO fdw_user;

Step 4 : Modify ACL in pg_hba.conf

We need to ensure that the proper authentication is setup for accessing source server from destination server.
Add an entry into  pg_hba.conf as shown below, preferably at the beginning of the file.

host    all all     destination_server_ip/32          md5

Step 5 : Test connectivity and privileges on source

Before proceeding further, It is a good idea to make sure that we are able to connect to the source machine from this destination machine using the newly created database user (fdw_user).

In order to validate, on the destination server, use psql to connect to the source server:

$ psql -h hr -U fdw_user postgres

You could even validate all privileges on the tables which are to be presented as foreign tables using this connection.

Step 6 : Create postgres_fdw extension on the destination

Connect to destination server, and create the postgres_fdw extension in the destination database from where you wish to access the tables of source server. You must be a superuser to create the extension.

No postgres_fdw extension is needed on the source server.

postgres=# create extension postgres_fdw;

Validate if the extension is created using \dx. Following is an example validation log.

postgres=# \dx postgres_fdw
                            List of installed extensions
    Name    | Version | Schema |                    Description
postgres_fdw | 1.0     | public | foreign-data wrapper for remote PostgreSQL servers
(1 row)

Step 7: Grant privileges to user in the destination

Always better to limit the scope of the server definition to an application user. If a regular user needs to define a server, that user needs to have USAGE permission on the foreign data wrapper. Superuser can grant the privilege

postgres=# grant usage on FOREIGN DATA WRAPPER postgres_fdw to app_user;

Alternatively, superuser (postgres) can create a server definition and then grant USAGE permission on that server definition to the application user like this:

postgres=# GRANT USAGE ON FOREIGN SERVER hr TO app_user;

Step 8: Create a server definition

Now we can create a server definition. This foreign server is created using the connection details of the source server running on host “hr”. Let’s name the foreign server as itself as “hr”

postgres=> CREATE SERVER hr
 OPTIONS (dbname 'postgres', host 'hr', port '5432');

Step 9: Create user mapping from destination user to source user

Create a mapping on the destination side for destination user (app_user) to remote source user (fdw_user)

postgres=> CREATE USER MAPPING for app_user
OPTIONS (user 'fdw_user', password 'secret');

Step 10 : Create foreign table definition on the destination

Create a foreign table in the destination server with the same structure as the source table, but with OPTIONS specifying schema_name and table_name

postgres=# CREATE FOREIGN TABLE employee
(id int, first_name character varying(20), last_name character varying(20))
OPTIONS (schema_name 'public', table_name 'employee');

Step 11 : Test foreign table

Validate whether we can query the foreign table we just created in the destination server.

postgres=> select * from employee;
id | first_name | last_name
1 | jobin | augustine
2 | avinash | vallarapu
3 | fernando | camargos
(3 rows)

As we can see from the above example, data is been accessed from the source database.

Now you might be thinking: “creating foreign tables one by one like this on the destination server is painful. Is it possible to do it automatically?“. The answer is yes – there is an option to import a full schema.

On the destination server, you can use the following syntax to import a schema.

postgres=# IMPORT FOREIGN SCHEMA "public" FROM SERVER hr INTO public;

If you wish to choose a certain list of tables for import, you can use the following syntax.

postgres=# IMPORT FOREIGN SCHEMA "public" limit to (employee) FROM SERVER hr INTO public;

In the above example, it will import the definition of only one table (employee).

Advantages of foreign tables

The main use case of the foreign tables is to make the data available to systems without actually duplicating/replicating it. There are even simple implementations of sharding using FDW, because data in the other shards can be made available for queries though FDWs.

A person coming from an Oracle-like background might think: “I can get data from a remote database table using simple DBLinks so what is the difference?“. The main difference is that FDW will maintain the meta-data/table definition about the foreign table locally. This results in better decisions compared to sending a simple SELECT * FROM <TABLE> to pull all results. We are going to see some of these advantages.

Note : In the following section always pay special attention on those lines starting with “Remote SQL:”

Query optimization

Since the definition of the foreign table is held locally, all query optimizations are made for remote executions too. Let’s consider a slightly more complex example where we have EMP (employee) and DEPT (department) tables in the HR database and SALGRADE (salary grade) table in the finance database. Suppose we want to know how many employees there are with a particular salary grade:


Let’s see how PostgreSQL handles this:

 Aggregate  (cost=805.44..805.45 rows=1 width=8)
   Output: count(*)
   ->  Nested Loop  (cost=100.00..798.33 rows=2844 width=0)
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) ->  Foreign Scan on public.emp  (cost=100.00..186.80 rows=2560 width=8)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno
               Remote SQL: SELECT sal FROM public.emp
         ->  Materialize  (cost=0.00..35.55 rows=10 width=8)
               Output: salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=8)
                     Output: salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 4)

Please pay special attention for the line reading :

Remote SQL: SELECT sal FROM public.emp

It knows that only the sal column need to be fetched from the remote database.
If we change the count(*) to ename (Employee Name) column, the remote SQL changes like:

Remote SQL: SELECT ename, sal FROM public.emp

PostgreSQL tries to pull only the absolutely necessary data from the remote server.

Writable foreign tables

At the beginning, foreign tables were just readable. But, with time, the community introduced writable foreign tables functionality in PostgreSQL. Let us consider the following situation where management wants to give a salary increase of 10% to grade 3 employees:

SET    sal = sal * 1.1
FROM   salgrade
WHERE  emp.sal > salgrade.losal
AND emp.sal < salgrade.hisal
AND salgrade.grade = 3;

In this case, we are updating data on a remote table using a join condition with a local table. As we can see in the explain plan, an UPDATE statement is more complex because it involves 2 steps. First it needs to fetch the data from the remote table to complete the join operation. Then, it updates the rows in the foreign table.

 Update on public.emp  (cost=100.00..300.71 rows=669 width=118)
   Remote SQL: UPDATE public.emp SET sal = $2 WHERE ctid = $1
   ->  Nested Loop  (cost=100.00..300.71 rows=669 width=118)
         Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, (emp.sal * '1.1'::double precision), emp.comm, emp.deptno, emp.ctid, salgrade.ctid
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) ->  Foreign Scan on public.emp  (cost=100.00..128.06 rows=602 width=112)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno, emp.ctid
               Remote SQL: SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno, ctid FROM public.emp FOR UPDATE
         ->  Materialize  (cost=0.00..35.55 rows=10 width=14)
               Output: salgrade.ctid, salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=14)
                     Output: salgrade.ctid, salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 3)

Operator and function pushdown

PostgreSQL 9.5 release included the capability to assess and decide on the safety of pushing a function execution to remote server. Built-in functions are good candidates for this:

SELECT avg(sal)

This statement results in the following query plan

 Foreign Scan  (cost=137.63..186.06 rows=1 width=8)
   Output: (avg(emp.sal))
   Relations: Aggregate on (public.emp)
   Remote SQL: SELECT avg(sal) FROM public.emp WHERE ((sal > $1::integer))
   InitPlan 1 (returns $0)
     ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=4)
           Output: salgrade.losal
           Filter: (salgrade.grade = 4)

If the planner finds that the majority of records needs to be fetched from a remote server, it may not push the function execution to the remote server. For example:

SELECT avg(sal)

In this case, the planner decides to do the function execution on the local server:

 Aggregate  (cost=805.44..805.45 rows=1 width=8)
   Output: avg(emp.sal)
   ->  Nested Loop  (cost=100.00..798.33 rows=2844 width=8)
         Output: emp.sal
         Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision)) ->  Foreign Scan on public.emp  (cost=100.00..186.80 rows=2560 width=8)
               Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno
               Remote SQL: SELECT sal FROM public.emp
         ->  Materialize  (cost=0.00..35.55 rows=10 width=8)
               Output: salgrade.losal, salgrade.hisal
               ->  Seq Scan on public.salgrade  (cost=0.00..35.50 rows=10 width=8)
                     Output: salgrade.losal, salgrade.hisal
                     Filter: (salgrade.grade = 4)
(13 rows)

A great improvement in PostgreSQL 9.6 is that the function does’t need to be even a built-in function. If a user defined function or operator is immutable it becomes a good candidate for being executed in the remote server.

Join push down

In many cases, it is worth pushing down the entire join operations to the remote server in such a way only the results need to be fetched to the local server. PostgreSQL handles this switching intelligently. Here’s an example:

FROM EMP JOIN  DEPT ON EMP.deptno = DEPT.deptno AND DEPT.deptno=10;

 Foreign Scan  (cost=100.56..194.84 rows=1 width=8)
   Output: (count(*))
   Relations: Aggregate on ((public.emp) INNER JOIN (public.dept))
   Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))
(4 rows)

Predicate push down

There are two options when executing a query against a foreign table:

  1. Fetch the data locally and apply the predicates like filtering condition locally.
  2. Send the filtering condition to the remote server and have it applied there.

The latter will can be the best option in many cases.

If you consider the previous example, we can see that  the predicate specification like “DEPT.deptno=10;” is pushed down to the remote server through foreign tables and applied there separately like this:

Remote SQL: SELECT count(*) FROM (public.emp r1 INNER JOIN public.dept r2 ON (((r2.deptno = 10)) AND ((r1.deptno = 10))))

PostgreSQL not only pushed the predicate, it also rewrote the query we sent to avoid one extra AND condition.

Aggregate push down

Just like predicate push down, here PostgreSQL also considers 2 options:

  1.  Execute the aggregates on the remote server and pull the result back to the local server
  2. Do the aggregate calculations on the local database instance after collecting all required data from remote database

We’ve already seen an aggregate pushdown example as part of the function pushdown, since we’ve used an aggregate function for that example. Here’s another simple example:

postgres=# explain verbose select deptno,count(*) from emp group by deptno;
                            QUERY PLAN
 Foreign Scan  (cost=114.62..159.88 rows=200 width=12)
   Output: deptno, (count(*))
   Relations: Aggregate on (public.emp)
   Remote SQL: SELECT deptno, count(*) FROM public.emp GROUP BY 1
(4 rows)

In this case, all of the aggregate calculation happens on the remote server.

Triggers and Check constraints on Foreign tables

We have seen that foreign tables can be writable. PostgreSQL provides features to implement check constraints and triggers on the foreign table as well. This allows us to have powerful capabilities in the local database. For example, all validations and auditing can take place on the local server. The remote DMLs can be audited separately, or a different logic can be applied for local and remote triggers and constraint validations.


FDWs in PostgreSQL, postgres_fdw in particular, provides very powerful and useful features by which, in many cases, we can avoid the complex duplicating and replicating of data. It provides a mechanism for ACID compliant transactions between two database systems. postgres_fdw works as a reference implementation for the development of other fdw implementations. In the coming days we will be covering some of these.

More articles you might enjoy:

If you found this article useful, why not take a look at some of our other posts on PostgreSQL?


The post Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw appeared first on Percona Database Performance Blog.


Talla builds a smarter customer knowledge base

Talla is taking aim at the customer service industry with its latest release, an AI-infused knowledge base. Today, the company released version 2.0 of the Talla Intelligent Knowledge Base.

The company also announced that Paula Long, most recently CEO at Data Gravity, has joined the company as SVP of engineering.

This tool combines customer content with automation, chatbots and machine learning. It’s designed to help teams who work directly with customers get at the information they need faster and the machine learning element should allow it to improve over time.

You can deploy the product as a widget on your website to give customers direct access to the information, but Rob May, company founder and CEO says the most common use case involves helping sales, customer service and customer success teams get access to the most relevant and current information, whether that’s maintenance or pricing.

The information can get into the knowledge base in several ways. First of all you can enter elements like product pages and FAQs directly in the Talla product as with any knowledge base. Secondly if an employee asks a question and there isn’t an adequate answer, it exposes the gaps in information.

Talla Knowledge Base gap list. Screenshot: Talla

“It really shows you the unknown unknowns in your business. What are the questions people are asking that you didn’t realize you don’t have content for or you don’t have answers for. And so that allows you to write new content and better content,” May explained.

Finally, the company can import information into the knowledge base from Salesforce, ServiceNow, Jira or wherever it happens to live, and that can be added to a new page or incorporated into existing page as appropriate.

Employees interact with the system by asking a bot questions and it supplies the answers if one exists. It works with Slack, Microsoft Teams or Talla Chat.

Talla bot in action in Talla Chat. Screenshot: Talla

Customer service remains a major pain point for many companies. It is the direct link to customers when they are having issues. A single bad experience can taint a person’s view of a brand, and chances are when a customer is unhappy they let their friends know on social media, making an isolated incident much bigger. Having quicker access to more accurate information could help limit negative experiences.

Today’s announcement builds on an earlier version of the product that took aim at IT help desks. Talla found customers kept asking for a solution that provided similar functionality with customer-facing information and they have tuned it for that.

May launched Talla in 2015 after selling his former startup Backupify to Datto in 2014. The company, which is based near Boston, has raised $12.3 million.

-- promises standardized open source IoT device security

IoT devices currently lack a standard way of applying security. It leaves consumers, whether business or individuals, left to wonder if their devices are secure and up-to-date., a company that launched today, wants to change that by offering a standard way to secure devices and deliver updates over the air.

“Our mission is solving the problem of IoT and embedded space where there is no standardized core platform like Android for phones,” CEO George Grey explained.

What Foundries has created is an open and secure solution that saves everyone from creating their own and reinventing the wheel every time. Grey says Foundries’ approach is not only secure, it provides a long-term solution to the device update problem by providing a way to deliver updates over the air in an automated manner on any device from tiny sensors to smart thermostats to autonomous cars.

He says this approach will allow manufacturers to apply security patches in a similar way that Apple applies regular updates to iOS. “Manufacturers can continuously make sure their devices can be updated with the latest software to fix security flaws or Zero Day flaws,” he said.

The company offers two solutions, depending on the size and complexity of your device. The Zephyr RTOS microPlatform is designed for smaller, less complex devices. For those that are more complex, Foundries offers a version of Linux called the Linux OE microPlatform.


Grey claims that these platforms free manufacturers to build secure devices without having to hire a team of security experts. But he says the real beauty of the product is that the more people who use it, the more secure it will get, as more and more test it against their products in a virtuous cycle.

You may be wondering how they can make money in this model, but they do it by charging a flat fee of $10,000 per year for Zephyr RTOS and $25,000 per year for Linux OE. These are one-time prices and apply by the product, regardless of how many units get sold and there is no lock-in, according to Grey. Companies are free to back out any time. “If you want to stop subscribing you take over maintenance and you still have access to everything up to the point,. You just have to arrange maintenance yourself,” he said.

There is also a hobbyist and education package for $10 a month.

The company spun off from research at Linaro, an organization that promotes development on top of ARM chips.

To be successful, needs to build a broad community of manufacturers. Today’s launch is the first step in that journey. If it eventually takes off, it has the potential to provide a consistent way of securing and updating IoT devices, a move which would certainly be welcome.


Semmle, startup that makes code searchable, hauls in $21M Series B

Semmle, a startup that originally spun out of research at Oxford, announced a $21 million Series B investment today led by Accel Partners. It marked the second time Accel has led an investment in the company.

Work-Bench also participated in the round. Today’s investment brings the total to $31 million.

Semmle has warranted this kind of interest by taking a unique approach to finding vulnerabilities in code. “The key idea behind our technology is to treat code as data and treat analysis problems as simple queries against a database. What this allows you to do is very easily encode domain expertise, security expertise or any other kinds of specialist knowledge in such a way it can be easily and automatically applied to large amounts of code,” Pavel Avgustinov, Semmle co-founder and VP of platform engineering told TechCrunch.

Screenshot: Semmle

Once you create the right query, you can continuously run it against your code to prevent the same mistakes from entering the code base on subsequent builds. The key here is building the queries and the company has a couple of ways to deal with that.

They can work with customers to help them create queries, although in the long run that is not a sustainable way of working. Instead, they share queries, and encourage customers to share them with the community.

“What we find is that the great tech companies we work with have the best security teams in the world, and they are giving back what they created on the Semmle platform with other users in an open source fashion. There is a GitHub repository where we publish queries, but Microsoft and Google are doing the same thing,” Oege de Moor, company CEO and co-founder explained.

In fact, the Semmle solution is freely available to open source programmers to use with their applications, and the company currently analyzes every commit of almost 80,000 open source projects. Open source developers can run shared queries against their code or create their own.

They also have a paid version with customers like Microsoft, Google, Credit Suisse, NASA and Nasdaq. They have relied mostly on these strategic partners up until now. With today’s investment they plan to build out their sales and marketing departments to expand their customer base into a wider enterprise market.

The company spun out of research at Oxford University in 2006. They are now based in San Francisco with 60 employees, a number that should go up with this investment. They received an $8 million Series A in 2014 and $2 million seed round in 2011.


Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost

AWS EC2 MySQL cost savings

AWS EC2 MySQL cost savingsIf you are using large EBS GP2 volumes for MySQL (i.e. 10TB+) on AWS EC2, you can increase performance and save a significant amount of money by moving to local SSD (NVMe) instance storage. Interested? Then read on for a more detailed examination of how to achieve cost-benefits and increase performance from this implementation.

EBS vs Local instance store

We have heard from customers that large EBS GP2 volumes can be affected by short term outages—IO “stalls” where no IO is going in or out for a couple of minutes. This can happen, especially, in the largest AWS region us-east-1. Statistically, with so many disks in disk arrays (which back EBS volumes) we can expect frequent disk failures. If we allocate a very large EBS GP2 volume, i.e. 10Tb+, hitting such failure events can be common.

In the case of MySQL/InnoDB, such an IO “stall” will be obvious, particularly with the highly loaded system where MySQL needs to do physical IO. During the stall, you will see all write queries are waiting, or “hang”.  Some of the writes may error out with “Error 1030” (MySQL error code 1030 (ER_GET_ERRNO): Got error %d from storage engine). There is nothing MySQL can do here – if the IO subsystem is not available, it will need to wait for it.

The good news is: many of the newer EC2 instances (i.e. i3, m5d, etc) have local SSD disks attached (NVMe). Those disks are local to the physical server and should not suffer from the EBS issues described above. Using local disks can be a very good solution:

  1. They are faster, as they are local to the server, and do not suffer from the EBS issues
  2. They are much cheaper compared to large EBS volumes.

Please note, however, that local storage does not guarantee persistence. More about this below.

Another potential option will be to use IO1 volumes with provisional IOPS. However, it will be significantly more expensive for the large volumes and high traffic.

A look at costs

To estimate the costs, I’ve used the AWS simple monthly calculator. Estimated costs are based on 1 year reserved instances. Let’s imagine we will need to use 14TB volume (to store ~10Tb of MySQL data including binary logs). The pricing estimates will look like this:

r4.4xlarge, 122GB RAM, 16 vCPUs + EBS, 14TB volume (this is what we are presumably using now)

Amazon EC2 Service (US East (N. Virginia)) $ 1890.56 / month
Compute: $ 490.56
EBS Volumes: $1400.00

Local storage price estimate:
i3.4xlarge, 122GB RAM, 16 vCPUs, 3800 GiB disk (2 x 1900 NVMe SSD)

Amazon EC2 Service (US East (N. Virginia)) $ 627.21 / month
Compute: $ 625.61

i3.8xlarge, 244GB RAM, 32 vCPUs, 7600 GiB disk (4 x 1900 NVMe SSD)

Amazon EC2 Service (US East (N. Virginia)) $1252.82 / month
Compute: $ 1251.22

As we can see, even if we switch to i3.8xlarge and get 2x more RAM and 2x more virtual CPUs, faster storage, 10 gigabit network we can still pay 1.5x less per box what we are presumably paying now. Include replication, then that’s paying 1.5x less per each of the replication servers.

But wait … there is a catch.

How to migrate to local storage from EBS

Well, we have some challenges here to migrate from EBS to local instance NVMe storage.

  1. Wait, we are storing ~10Tb and i3.8xlarge have 7600 GiB disk. The answer is simple: compression (see below)
  2. Wait, but the local storage is ephemeral, if we loose the box we will loose our data – that is unacceptable.  The answer is also simple: replication (see below)
  3. Wait, but we use EBS snapshots for backups. That answer is simple too: we can still use EBS (and use snapshots) on 1 of the replication slave (see below)


To fit i3.8xlarge we only need 2x compression. This can be done with InnoDB row compression (row_format=compressed) or InnoDB page compression, which requires sparse file and hole punching support. However, InnoDB compression may be slower and will only compress ibd files—it does not compress binary logs, frm files, etc.


Another option: use the ZFS filesystem. ZFS will compress all files, including binary logs and frm. That can be very helpful if we use a “schema per customer” or “table per customer” approach and need to store 100K – 200K tables in a single MySQL instance. If the data is compressible, or new tables were provisioned without much data in those, ZFS can give a significant disk savings.

I’ve used ZFS (followed Yves blog post, Hands-On Look at ZFS with MySQL). Here are the results of data compression with ZFS (this is real data, not a generated data):

# du -sh --apparent-size /mysqldata/mysql/data
8.6T	/mysqldata/mysql/data
# du -sh /mysqldata/mysql/data
3.2T	/mysqldata/mysql/data

Compression ratio:

# zfs get all | grep -i compress
mysqldata/mysql/data  compressratio         2.42x                  -
mysqldata/mysql/data  compression           gzip                   inherited from mysqldata/mysql
mysqldata/mysql/data  refcompressratio      2.42x                  -
mysqldata/mysql/log   compressratio         3.75x                  -
mysqldata/mysql/log   compression           gzip                   inherited from mysqldata/mysql
mysqldata/mysql/log   refcompressratio      3.75x                  -

As we can see, the original 8.6Tb of data was compressed to 3.2Tb, the compression ratio for MySQL tables is 2.42x, for binary logs 3.75x. That will definitely fit i3.8xlarge.

(For another test, I’ve generated 40 million tables spread across multiple schemas (databases). I’ve added some data only to one schema, leaving others blank. For that test I achieved ~10x compression ratio.)

Conclusion: ZFS can provide you with very good compression ratio, will allow you to use different EC2 instances on AWS, and save you a substantial amount of money. Although compression is not free performance-wise, and ZFS can be slower for some workloads, using local NVMe storage can compensate.

You can find some performance testing for ZFS on linux in this blog post: About ZFS Performance. Some benchmarks comparing EBS and local NVMe SSD storage (i3 instances) can be found in this blog post: Percona XtraDB Cluster on Amazon GP2 Volumes


Another option for compression would be using the MyRocks storage engine in Percona Server for MySQL, which provides compression.

Replication and using local volumes

As the local instance storage is ephemeral we need redundancy: we can use MySQL replication or Percona XtraDB cluster (PXC). In addition, we can use one replication slave—or we can attach a replication slave to PXC—and have it use EBS storage.

Local storage is not durable. If you stop the instance and then start it again, the local storage will probably disappear. (Though reboot is an exception, you can reboot the instance and the local storage will be fine.) In addition if the local storage disappears we will have to recreate MySQL local storage partition (for ZFS, i.e. zpool create or for EXT4/XFS, i.e. mkfs)

For example, using MySQL replication:

master - local storage (AZ 1, i.e. 1a)
-> slave1 - local storage (AZ 2, i.e. 1b)
-> slave2 - ebs storage (AZ 3, i.e. 1c)
   (other replication slaves if needed with local storage - optional)

MySQL Master AZ 1a, Local storage

Then we can use slave2 for ebs snapshots (if needed). This slave will be more expensive (as it is using EBS) but it can also be used to either serve production traffic (i.e. we can place smaller amount of traffic) or for other purposes (for example analytical queries, etc).

For Percona XtraDB cluster (PXC) we can just use 3 nodes, 1 in each AZ. PXC uses auto-provisioning with SST if the new node comes back blank. For MySQL replication we need some additional things:

  1. Failover from master to a slave if the master will go down. This can be done with MHA or Orchestrator
  2. Ability to clone slave. This can be done with Xtrabackup or ZFS snapshots (if using ZFS)
  3. Ability to setup a new MySQL local storage partition (for ZFS, i.e. zpool create or for EXT4/XFS, i.e. mkfs)

Other options

Here are some totally different options we could consider:

  1. Use IO1 volumes (as discussed). That can be way more expensive.
  2. Use local storage and MyRocks storage engine. However, switching to another storage engine is another bigger project and requires lots of testing
  3. Switch to AWS Aurora. That can be even more expensive for this particular case; and switching to aurora can be another big project by itself.


  1. Using EC2 i3 instances with local NVMe storage can increase performance and save money. There are some limitations: local storage is ephemeral and will disappear if the node has stopped. Reboot is fine.
  2. ZFS filesystem with compression enabled can decrease the storage requirements so that a MySQL instance will fit into local storage. Another option for compression could be to use InnoDB compression (row_format=compressed).

That may not work for everyone as it requires additional changes to the existing server provisioning: failover from master to a slave, ability to clone replication slaves (or use PXC), ability to setup a new MySQL local storage partition, using compression.

The post Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost appeared first on Percona Database Performance Blog.


Webinar Tues 8/21: MariaDB 10.3 vs. MySQL 8.0

MariaDB 10.3 vs MySQL 8.0

MariaDB 10.3 vs MySQL 8.0Please join Percona’s Chief Evangelist, Colin Charles on Tuesday, August 21st, 2018, as he presents MariaDB 10.3 vs. MySQL 8.0 at 7:00 AM PDT (UTC-7) / 10:00 PM EDT (UTC-4).

Are they syntactically similar? Where do these two languages differ? Why would I use one over the other?

MariaDB 10.3 is on the path of gradually diverging from MySQL 8.0. One obvious example is the internal data dictionary currently under development for MySQL 8.0. This is a major change to the way metadata is stored and used within the server: MariaDB doesn’t have an equivalent feature. Implementing this feature could mark the end of datafile-level compatibility between MySQL and MariaDB.

There are also non-technical differences between MySQL 8.0 and MariaDB 10.4, including:

  • Licensing: MySQL offers their code as open-source under the GPL, and provides the option of non-GPL commercial distribution in the form of MySQL Enterprise. MariaDB can only use the GPL because their work is derived from the MySQL source code under the terms of that license.
  • Support services: Oracle provides technical support, training, certification and consulting for MySQL, while MariaDB has their own support services. Some people prefer working with smaller companies, as traditionally it affords them more leverage as a customer.
  • Community contributions: MariaDB touts the fact that they accept more community contributions than Oracle. Part of the reason for this disparity is that developers like to contribute features, bug fixes and other code without a lot of paperwork overhead (and they complain about the Oracle Contributor Agreement). However, MariaDB has its own MariaDB Contributor Agreement — which more or less serves the same purpose.

Colin will take a look at some of the differences between MariaDB 10.3 and MySQL 8.0 and help answer some of the common questions our Database Performance Experts get about the two databases.

Register Now

The post Webinar Tues 8/21: MariaDB 10.3 vs. MySQL 8.0 appeared first on Percona Database Performance Blog.


Distributed teams are rewriting the rules of office(less) politics

When we think about designing our dream home, we don’t think of having a thousand roommates in the same room with no doors or walls. Yet in today’s workplace where we spend most of our day, the purveyors of corporate office design insist that tearing down walls and bringing more people closer together in the same physical space will help foster better collaboration while dissolving the friction of traditional hierarchy and office politics.

But what happens when there is no office at all?

This is the reality for Jason Fried, Founder and CEO of Basecamp, and Matt Mullenweg, Founder and CEO of Automattic (makers of WordPress), who both run teams that are 100% distributed across six continents and many time zones. Fried and Mullenweg are the founding fathers of a movement that has inspired at least a dozen other companies to follow suit, including Zapier, Github, and Buffer. Both have either written a book, or have had a book written about them on the topic.

For all of the discussions about how to hire, fire, coordinate, motivate, and retain remote teams though, what is strangely missing is a discussion about how office politics changes when there is no office at all. To that end, I wanted to seek out the experience of these companies and ask: does remote work propagate, mitigate, or change the experience of office politics? What tactics are startups using to combat office politics, and are any of them effective?

“Can we take a step back here?”

Office politics is best described by a simple example. There is a project, with its goals, metrics, and timeline, and then there’s who gets to decide how it’s run, who gets to work on it, and who gets credit for it. The process for deciding this is a messy human one. While we all want to believe that these decisions are merit-based, data-driven, and objective, we all know the reality is very different. As a flood of research shows, they come with the baggage of human bias in perceptions, heuristics, and privilege.

Office politics is the internal maneuvering and positioning to shape these biases and perceptions to achieve a goal or influence a decision. When incentives are aligned, these goals point in same direction as the company. When they don’t, dysfunction ensues.

Perhaps this sounds too Darwinian, but it is a natural and inevitable outcome of being part of any organization where humans make the decisions. There is your work, and then there’s the management of your coworker’s and boss’s perception of your work.

There is no section in your employee handbook that will tell you how to navigate office politics. These are the tacit, unofficial rules that aren’t documented. This could include reworking your wardrobe to match your boss’s style (if you don’t believe me, ask how many people at Facebook own a pair of Nike Frees). Or making time to go to weekly happy hour not because you want to, but because it’s what you were told you needed to do to get ahead.

One of my favorite memes about workplace culture is Sarah Cooper’s “10 Tricks to Appear Smart in Meetings,” which includes…

  • Encouraging everyone to “take a step back” and ask “what problem are we really trying to solve”
  • Nodding continuously while appearing to take notes
  • Stepping out to take an “important phone call”
  • Jumping out of your seat to draw a Venn diagram on the whiteboard

Sarah Cooper, The Cooper Review

These cues and signals used in physical workplaces to shape and influence perceptions do not map onto the remote workplace, which gives us a unique opportunity to study how office politics can be different through the lens of the officeless.

Friends without benefits

For employees, the analogy that coworkers are like family is true in one sense — they are the roommates that we never got to choose. Learning to work together is difficult enough, but the physical office layers on the additional challenge of learning to live together. Contrast this with remote workplaces, which Mullenweg of Automattic believes helps alleviate the “cohabitation annoyances” that come with sharing the same space, allowing employees to focus on how to best work with each other, versus how their neighbor “talks too loud on the phone, listens to bad music, or eats smelly food.”

Additionally, remote workplaces free us of the tyranny of the tacit expectations and norms that might not have anything to do with work itself. At an investment bank, everyone knows that analysts come in before the managing director does, and leave after they do. This signals that you’re working hard.

Basecamp’s Fried calls this the “presence prison,” the need to be constantly aware of where your coworkers are and what they are doing at all times, both physically and virtually. And he’s waging a crusade against it, even to the point of removing the green dot on Basecamp’s product. “As a general rule, nobody at Basecamp really knows where anyone else is at any given moment. Are they working? Dunno. Are they taking a break? Dunno. Are they at lunch? Dunno. Are they picking up their kid from school? Dunno. Don’t care.”

There is credible basis for this practice. A study of factory workers by Harvard Business School showed that workers were 10% to 15% more productive when managers weren’t watching. This increase was attributed to giving workers the space and freedom to experiment with different approaches before explaining to managers, versus the control group which tended to follow prescribed instructions under the leery watch of their managers.

Remote workplaces experience a similar phenomenon, but by coincidence. “Working hard” can’t be observed physically so it has to be explained, documented, measured, and shared across the company. Cultural norms are not left to chance, or steered by fear or pressure, which should give individuals the autonomy to focus on the work itself, versus how their work is perceived.

Lastly, while physical workplaces can be the source of meaningful friendships and community, recent research by the Wharton School of Business is just beginning to unravel the complexities behind workplace friendships, which can be fraught with tensions from obligations, reciprocity and allegiances. When conflicts arise, you need to choose between what’s best for the company, and what’s best for your relationship with that person or group. You’re not going to help Bob because your best friend Sally used to date him and he was a dick. Or you’re willing to do anything for Jim because he coaches your kid’s soccer team, and vouched for you to get that promotion.

In remote workplaces, you don’t share the same neighborhood, your kids don’t go to the same school, and you don’t have to worry about which coworkers to invite to dinner parties. Your physical/personal and work communities don’t overlap, which means you (and your company) unintentionally avoid many of the hazards of toxic workplace relationships.

On the other hand, these same relationships can be important to overall employee engagement and well-being. This is evidenced by one of the findings in Buffer’s 2018 State of Remote Work Report, which surveyed over 1900 remote workers around the world. It found that next to collaborating and communicating, loneliness was the biggest struggle for remote workers.

Graph by Buffer (State of Remote Work 2018)

So while you may be able to feel like your own boss and avoid playing office politics in your home office, ultimately being alone may be more challenging than putting on a pair of pants and going to work.

Feature, not a bug?

Physical offices can have workers butting heads with each other. Image by UpperCut Images via Getty Images.

For organizations, the single biggest difference between remote and physical teams is the greater dependence on writing to establish the permanence and portability of organizational culture, norms and habits. Writing is different than speaking because it forces concision, deliberation, and structure, and this impacts how politics plays out in remote teams.

Writing changes the politics of meetings. Every Friday, Zapier employees send out a bulletin with: (1) things I said I’d do this week and their results, (2) other issues that came up, (3) things I’m doing next week. Everyone spends the first 10 minutes of the meeting in silence reading everyone’s updates.

Remote teams practice this context setting out of necessity, but it also provides positive auxiliary benefits of “hearing” from everyone around the table, and not letting meetings default to the loudest or most senior in the room. This practice can be adopted by companies with physical workplaces as well (in fact, Zapier CEO Wade Foster borrowed this from Amazon), but it takes discipline and leadership to change behavior, particularly when it is much easier for everyone to just show up like they’re used to.

Writing changes the politics of information sharing and transparency. At Basecamp, there are no all-hands or town hall meetings. All updates, decisions, and subsequent discussions are posted publicly to the entire company. For companies, this is pretty bold. It’s like having a Facebook wall with all your friends chiming in on your questionable decisions of the distant past that you can’t erase. But the beauty is that there is now a body of written decisions and discussions that serves as a rich and permanent artifact of institutional knowledge, accessible to anyone in the company. Documenting major decisions in writing depoliticizes access to information.

Remote workplaces are not without their challenges. Even though communication can be asynchronous through writing, leadership is not. Maintaining an apolitical culture (or any culture) requires a real-time feedback loop of not only what is said, but what is done, and how it’s done. Leaders lead by example in how they speak, act, and make decisions. This is much harder in a remote setting.

A designer from WordPress notes the interpersonal challenges of leading a remote team. “I can’t always see my teammates’ faces when I deliver instructions, feedback, or design criticism. I can’t always tell how they feel. It’s difficult to know if someone is having a bad day or a bad week.”

Zapier’s Foster is also well aware of these challenges in interpersonal dynamics. In fact, he has written a 200-page manifesto on how to run remote teams, where he has an entire section devoted to coaching teammates on how to meet each other for the first time. “Because we’re wired to look for threats in any new situation… try to limit phone or video calls to 15 minutes.” Or “listen without interrupting or sharing your own stories.” And to “ask short, open ended questions.” For anyone looking for a grade school refresher on how to make new friends, Wade Foster is the Dale Carnegie of the remote workforce.

To office, or not to office

What we learn from companies like Basecamp, Automattic, and Zapier is that closer proximity is not the antidote for office politics, and certainly not the quick fix for a healthy, productive culture.

Maintaining a healthy culture takes work, with deliberate processes and planning. Remote teams have to work harder to design and maintain these processes because they don’t have the luxury of assuming shared context through a physical workspace.

The result is a wealth of new ideas for a healthier, less political culture — being thoughtful about when to bring people together, and when to give people their time apart (ending the presence prison), or when to speak, and when to read and write (to democratize meetings). It seems that remote teams have largely succeeded in turning a bug into a feature. For any company still considering tearing down those office walls and doors, it’s time to pay attention to the lessons of the officeless.


Percona Server for MySQL 5.6.41-84.1 Is Now Available

Percona Server for MySQL 5.6

Percona Server for MySQL 5.6Percona announces the release of Percona Server for MySQL 5.6.41-84.1 on August 17, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.6.41, including all the bug fixes in it. Percona Server for MySQL 5.6.41-84.1 is now the current GA release in the 5.6 series. All of Percona’s software is open-source and free.

Bugs Fixed
  • A simple SELECT query on a table with CHARSET=euckr COLLATE=euckr_bin could return different results each time it was executed. Bug fixed #4513 (upstream 91091).
  • Percona Server 5.6.39-83.1 could crash when altering an InnoDB table that has a full-text search index defined. Bug fixed #3851 (upstream 68987).
Other Bugs Fixed
  • #3325 “online upgrade GTID cause binlog damage in high write QPS situation”
  • #3976 “Errors in MTR tests main.variables-big, main.information_schema-big, innodb.innodb_bug14676111”
  • #4506 “Backport fixes from 8.0 for InnoDB memcached Plugin”

Find the release notes for Percona Server for MySQL 5.6.41-84.1 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.6.41-84.1 Is Now Available appeared first on Percona Database Performance Blog.


Percona Server for MySQL 5.5.61-38.13 Is Now Available

Percona Server for MySQL

Percona Server for MySQL 5.6Percona announces the release of Percona Server for MySQL 5.5.61-38.13 on August 17, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.5.61, including all the bug fixes in it. Percona Server for MySQL 5.5.61-38.13 is now the current GA release in the 5.5 series. All of Percona’s software is open-source and free.

Bugs Fixed
  • The --innodb-optimize-keys option of the mysqldump utility fails when a column name is used as a prefix of a column which has the AUTO_INCREMENT attribute. Bug fixed #4524.
Other Bugs Fixed
  • #4566 “stack-use-after-scope in reinit_io_cache()” (upstream 91603)
  • #4581 “stack-use-after-scope in _db_enter_() / mysql_select_db()” (upstream 91604)
  • #4600 “stack-use-after-scope in _db_enter_() / get_upgrade_info_file_name()” (upstream 91617)
  • #3976 “Errors in MTR tests main.variables-big, main.information_schema-big, innodb.innodb_bug14676111”

Find the release notes for Percona Server for MySQL 5.5.61-38.13 in our online documentation. Report bugs in the Jira bug tracker.

The post Percona Server for MySQL 5.5.61-38.13 Is Now Available appeared first on Percona Database Performance Blog.


Incentivai launches to simulate how hackers break blockchains

Cryptocurrency projects can crash and burn if developers don’t predict how humans will abuse their blockchains. Once a decentralized digital economy is released into the wild and the coins start to fly, it’s tough to implement fixes to the smart contracts that govern them. That’s why Incentivai is coming out of stealth today with its artificial intelligence simulations that test not just for security holes, but for how greedy or illogical humans can crater a blockchain community. Crypto developers can use Incentivai’s service to fix their systems before they go live.

“There are many ways to check the code of a smart contract, but there’s no way to make sure the economy you’ve created works as expected,” says Incentivai’s solo founder Piotr Grudzie?. “I came up with the idea to build a simulation with machine learning agents that behave like humans so you can look into the future and see what your system is likely to behave like.”

Incentivai will graduate from Y Combinator next week and already has a few customers. They can either pay Incentivai to audit their project and produce a report, or they can host the AI simulation tool like a software-as-a-service. The first deployments of blockchains it’s checked will go out in a few months, and the startup has released some case studies to prove its worth.

“People do theoretical work or logic to prove that under certain conditions, this is the optimal strategy for the user. But users are not rational. There’s lots of unpredictable behavior that’s difficult to model,” Grudzie? explains. Incentivai explores those illogical trading strategies so developers don’t have to tear out their hair trying to imagine them.

Protecting crypto from the human x-factor

There’s no rewind button in the blockchain world. The immutable and irreversible qualities of this decentralized technology prevent inventors from meddling with it once in use, for better or worse. If developers don’t foresee how users could make false claims and bribe others to approve them, or take other actions to screw over the system, they might not be able to thwart the attack. But given the right open-ended incentives (hence the startup’s name), AI agents will try everything they can to earn the most money, exposing the conceptual flaws in the project’s architecture.

“The strategy is the same as what DeepMind does with AlphaGo, testing different strategies,” Grudzie? explains. He developed his AI chops earning a masters at Cambridge before working on natural language processing research for Microsoft.

Here’s how Incentivai works. First a developer writes the smart contracts they want to test for a product like selling insurance on the blockchain. Incentivai tells its AI agents what to optimize for and lays out all the possible actions they could take. The agents can have different identities, like a hacker trying to grab as much money as they can, a faker filing false claims or a speculator that cares about maximizing coin price while ignoring its functionality.

Incentivai then tweaks these agents to make them more or less risk averse, or care more or less about whether they disrupt the blockchain system in its totality. The startup monitors the agents and pulls out insights about how to change the system.

For example, Incentivai might learn that uneven token distribution leads to pump and dump schemes, so the developer should more evenly divide tokens and give fewer to early users. Or it might find that an insurance product where users vote on what claims should be approved needs to increase its bond price that voters pay for verifying a false claim so that it’s not profitable for voters to take bribes from fraudsters.

Grudzie? has done some predictions about his own startup too. He thinks that if the use of decentralized apps rises, there will be a lot of startups trying to copy his approach to security services. He says there are already some doing token engineering audits, incentive design and consultancy, but he hasn’t seen anyone else with a functional simulation product that’s produced case studies. “As the industry matures, I think we’ll see more and more complex economic systems that need this.”

Powered by WordPress | Theme: Aeros 2.0 by