Nov
22
2019
--

UUIDs are Popular, but Bad for Performance — Let’s Discuss


If you do a quick web search about UUIDs and MySQL, you’ll get a fair number of results.

So, does a well-covered topic like this one need any more attention? Apparently, yes. Even though most posts warn people against the use of UUIDs, they are still very popular. This popularity comes from the fact that these values can easily be generated by remote devices, with a very low probability of collision. With this post, my goal is to summarize what has already been written by others and, hopefully, bring in a few new ideas.

What are UUIDs?

UUID stands for Universally Unique IDentifier and is defined in RFC 4122. It is a 128-bit number, normally written in hexadecimal and split by dashes into five groups. A typical UUID value looks like this:

yves@laptop:~$ uuidgen 
83fda883-86d9-4913-9729-91f20973fa52

There are officially five UUID versions, 1 through 5, but the most common are the time-based ones (version 1 or version 2) and the purely random one (version 4). The time-based UUIDs encode the number of 100ns intervals since October 15, 1582 in 60 bits (7.5 bytes), which is split in a “time-low”-“time-mid”-“time-hi” fashion. The missing 4 bits are the version number, used as a prefix to the time-hi field.  This yields the 64 bits of the first 3 groups. The last 2 groups are the clock sequence, a value incremented every time the clock is modified, and a host unique identifier. Most of the time, the MAC address of the main network interface of the host is used as the unique identifier.

There are important points to consider when you use time-based UUID values:

  • It is possible to determine the approximate time when the value was generated from the first 3 fields
  • There are many repetitive fields between consecutive UUID values
  • The first field, “time-low”, rolls over every 429s
  • The MySQL UUID function produces version 1 values

Here’s an example using the “uuidgen” Unix tool to generate time-based values:

yves@laptop:~$ for i in $(seq 1 500); do echo "$(date +%s): $(uuidgen -t)"; sleep 1; done
1573656803: 572e4122-0625-11ea-9f44-8c16456798f1
1573656804: 57c8019a-0625-11ea-9f44-8c16456798f1
1573656805: 586202b8-0625-11ea-9f44-8c16456798f1
...
1573657085: ff86e090-0625-11ea-9f44-8c16456798f1
1573657086: 0020a216-0626-11ea-9f44-8c16456798f1
...
1573657232: 56b943b2-0626-11ea-9f44-8c16456798f1
1573657233: 57534782-0626-11ea-9f44-8c16456798f1
1573657234: 57ed593a-0626-11ea-9f44-8c16456798f1
...

The first field rolls over (at t=1573657086) and the second field is incremented. It takes about 429s for the first field to cycle back to similar values. The third field changes only about once a year. The last field is static on a given host; on my laptop, the MAC address is used:

yves@laptop:~$ ifconfig | grep ether | grep 8c
    	ether 8c:16:45:67:98:f1  txqueuelen 1000  (Ethernet)
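
As an illustration, the first value generated above (572e4122-0625-11ea-9f44-8c16456798f1) decomposes as follows:

  • 572e4122 – time-low
  • 0625 – time-mid
  • 11ea – the version “1” followed by time-hi
  • 9f44 – clock sequence
  • 8c16456798f1 – node identifier, here the MAC address 8c:16:45:67:98:f1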

The other frequently seen UUID version is 4, the purely random one. By default, the Unix “uuidgen” tool produces UUID version 4 values:

yves@laptop:~$ for i in $(seq 1 3); do uuidgen; done
6102ef39-c3f4-4977-80d4-742d15eefe66
14d6e343-028d-48a3-9ec6-77f1b703dc8f
ac9c7139-34a1-48cf-86cf-a2c823689a91

The only fixed parts are the version, the “4” at the beginning of the 3rd field, and the two variant bits at the start of the 4th field. All the other 122 bits are random.

What is so Wrong with UUID Values?

In order to appreciate the impact of using UUID values as a primary key, it is important to review how InnoDB organizes the data. InnoDB stores the rows of a table in the b-tree of the primary key. In database terminology, we call this a clustered index. The clustered index orders the rows automatically by the primary key.

When you insert a new row with a random primary key value, InnoDB has to find the page where the row belongs, load it in the buffer pool if it is not already there, insert the row and then, eventually, flush the page back to disk. With purely random values and large tables, all b-tree leaf pages are equally likely to receive the new row; there are no hot pages. Rows inserted out of primary key order cause page splits, leading to a low fill factor. For tables much larger than the buffer pool, an insert will very likely need to read a table page from disk. The page in the buffer pool where the new row has been inserted will then be dirty. The odds that the page will receive a second row before it needs to be flushed to disk are very low. Most of the time, every insert will cause two IOPs – one read and one write. The first major impact is on the rate of IOPs, and it is a major limiting factor for scalability.

The only way to get decent performance is thus to use storage with low latency and high endurance, and that’s where you’ll find the second major performance impact. With a clustered index, the secondary indexes use the primary key values as pointers. While the leaves of the b-tree of the primary key store rows, the leaves of the b-tree of a secondary index store primary key values.

Let’s assume a table of 1B rows having UUID values as primary key and five secondary indexes. If you read the previous paragraph, you know the primary key values are stored six times for each row. That means a total of 6B char(36) values, representing 216 GB. That is just the tip of the iceberg, as tables normally have foreign keys, explicit or not, pointing to other tables. When the schema is based on UUID values, all these columns, and the indexes supporting them, are char(36). I recently analyzed a UUID-based schema and found that about 70 percent of the storage was used for these values.

As if that’s not enough, there’s a third important impact of using UUID values. Integer values are compared up to 8 bytes at a time by the CPU, but UUID values are compared character by character. Databases are rarely CPU bound, but this nevertheless adds to the latency of queries. If you are not convinced, look at this performance comparison between integer and string comparisons:

mysql> select benchmark(100000000,2=3);
+--------------------------+
| benchmark(100000000,2=3) |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (0.96 sec)

mysql> select benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='df878007-80da-11e9-93dd-00163e000003');
+----------------------------------------------------------------------------------------------------+
| benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='df878007-80da-11e9-93dd-00163e000003') |
+----------------------------------------------------------------------------------------------------+
|                                                                                                  0 |
+----------------------------------------------------------------------------------------------------+
1 row in set (27.67 sec)

Of course, the above example is a worst-case scenario, but it at least gives a sense of the scale of the issue. Comparing integers is about 28 times faster. Even when the difference appears early in the strings, the comparison is still about 2.5 times slower:

mysql> select benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='ef878007-80da-11e9-93dd-00163e000003');
+----------------------------------------------------------------------------------------------------+
| benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='ef878007-80da-11e9-93dd-00163e000003') |
+----------------------------------------------------------------------------------------------------+
|                                                                                                  0 |
+----------------------------------------------------------------------------------------------------+
1 row in set (2.45 sec)

Let’s explore a few solutions to address those issues.

Size of the Values

The default representation for UUID, hash, and token values is often the hexadecimal notation. With a cardinality, the number of possible values, of only 16 per character, it is far from efficient. What about using another representation like base64, or even straight binary? How much do we save? How is the performance affected?

Let’s begin with the base64 notation. Each base64 character encodes 6 bits, so it takes 4 characters to represent 3 bytes of actual value. A UUID value consists of 16 bytes of data; since dividing 16 by 3 leaves a remainder of 1, the base64 encoding adds ‘=’ padding at the end:

mysql> select to_base64(unhex(replace(uuid(),'-','')));
+------------------------------------------+
| to_base64(unhex(replace(uuid(),'-',''))) |
+------------------------------------------+
| clJ4xvczEeml1FJUAJ7+Fg==                 |
+------------------------------------------+
1 row in set (0.00 sec)

If the length of the encoded entity is known, as it is for a UUID, we can remove the ‘==’ padding, as it is just dead weight. A UUID encoded in base64 thus has a length of 22 characters.
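
For instance, the padding can be stripped and added back with the built-in string functions; a quick sketch, reusing the value generated above:

-- The base64 form of a UUID is always 24 characters, the last two being '=' padding
SELECT left(to_base64(unhex(replace(uuid(),'-',''))), 22) AS uuid_base64;

-- To decode, append the '==' padding before calling from_base64()
SELECT hex(from_base64(concat('clJ4xvczEeml1FJUAJ7+Fg','=='))) AS uuid_hex;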

The next logical step is to store the value directly in binary format. This is the most compact format, but displaying the values in the mysql client is less convenient.
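
A minimal sketch of the conversions involved; note that MySQL 8.0 also provides the UUID_TO_BIN() and BIN_TO_UUID() functions for exactly this purpose:

SET @u := uuid();

-- Store: strip the dashes and convert the 32 hexadecimal characters to 16 bytes
SELECT unhex(replace(@u,'-','')) AS uuid_bin;

-- Display: convert the 16 bytes back to hexadecimal for readability
SELECT lower(hex(unhex(replace(@u,'-','')))) AS uuid_hex;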

So, how’s the size impacting performance? To illustrate the impact, I inserted random UUID values in a table with the following definition…

CREATE TABLE `data_uuid` (
  `id` char(36) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

… for the default hexadecimal representation. For base64, the ‘id’ column is defined as char(22), while binary(16) is used for the binary example. The database server has a buffer pool size of 128MB and its IOPs are limited to 500. The insertions are done from a single thread.

Insertion rates for tables using different representation for UUID values

In all cases, the insertion rate is at first CPU bound, but as soon as the table is larger than the buffer pool, the insertion rapidly becomes IO bound. This is expected and shouldn’t surprise anyone. The use of a smaller representation for the UUID values just allows more rows to fit in the buffer pool, but in the long run, it doesn’t really help the performance, as the random insertion order dominates. If you are using random UUID values as primary keys, your performance is limited by the amount of memory you can afford.

Option 1: Saving IOPs with Pseudo-Random Order

As we have seen, the most important issue is the random nature of the values. A new row may end up in any of the table leaf pages. So unless the whole table is loaded in the buffer pool, it means a read IOP and eventually a write IOP. My colleague David Ducos gave a nice solution to this problem but some customers do not want to allow for the possibility of extracting information from the UUID values, like, for example, the generation timestamp.

What if we just somewhat reduce the randomness of the values so that a prefix of a few bytes stays constant for a time interval? During that interval, only a fraction of the whole table, corresponding to the cardinality of the prefix, needs to be in memory to save the read IOPs. This would also increase the likelihood that a page receives a second write before being flushed to disk, thus reducing the write load. Let’s consider the following UUID generation function:

drop function if exists f_new_uuid; 
delimiter ;;
CREATE DEFINER=`root`@`%` FUNCTION `f_new_uuid`() RETURNS char(36)
    NOT DETERMINISTIC
BEGIN
    DECLARE cNewUUID char(36);
    DECLARE cMd5Val char(32);


    set cMd5Val = md5(concat(rand(),now(6)));
    set cNewUUID = concat(left(md5(concat(year(now()),week(now()))),4),left(cMd5Val,4),'-',
        mid(cMd5Val,5,4),'-4',mid(cMd5Val,9,3),'-',mid(cMd5Val,13,4),'-',mid(cMd5Val,17,12));

    RETURN cNewUUID;
END;;
delimiter ;

The first four characters of the UUID value come from the MD5 hash of the concatenation of the current year and week number. This value is, of course, static over a week. The remainder of the UUID value comes from the MD5 of a random value and the current time at a precision of 1us. The third field is prefixed with a “4” to indicate it is a version 4 UUID. There are 65536 possible prefixes so, during a week, only 1/65536 of the table rows need to be in memory to avoid a read IOP upon insertion. That’s much easier to manage: a 1TB table needs only about 16MB in the buffer pool to support the inserts.
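
To see the shared prefix, the expression used inside the function can be run on its own, and a few consecutive calls to the function should all start with the same four characters; this is purely illustrative:

-- The 4-character prefix shared by every value generated during the current week
SELECT left(md5(concat(year(now()), week(now()))), 4) AS weekly_prefix;

-- Consecutive values from the function above; the first 4 characters of each should match
SELECT f_new_uuid() AS u
UNION ALL SELECT f_new_uuid()
UNION ALL SELECT f_new_uuid();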

Option 2: Mapping UUIDs to Integers

Even if you use pseudo-ordered UUID values stored using binary(16), it is still a very large data type which will inflate the size of the dataset. Remember the primary key values are used as pointers in the secondary indexes by InnoDB. What if we store all the UUID values of a schema in a mapping table? The mapping table will be defined as:

CREATE TABLE `uuid_to_id` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `uuid` char(36) NOT NULL,
  `uuid_hash` int(10) unsigned GENERATED ALWAYS AS (crc32(`uuid`)) STORED NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_hash` (`uuid_hash`)
) ENGINE=InnoDB AUTO_INCREMENT=2590857 DEFAULT CHARSET=latin1;

It is important to notice the uuid_to_id table does not enforce the uniqueness of uuid. The idx_hash index acts a bit like a bloom filter. We’ll know for sure a UUID value is not present in the table when there is no matching hash value but we’ll have to validate with the stored UUID value when there is a matching hash. To help us here, let’s create a SQL function:

DELIMITER ;;
CREATE DEFINER=`root`@`%` FUNCTION `f_uuid_to_id`(pUUID char(36)) RETURNS int(10) unsigned
    DETERMINISTIC
BEGIN
        DECLARE iID int unsigned;
        DECLARE iOUT int unsigned;

        select get_lock('uuid_lock',10) INTO iOUT;

        SELECT id INTO iID
        FROM uuid_to_id WHERE uuid_hash = crc32(pUUID) and uuid = pUUID;

        IF iID IS NOT NULL THEN
            select release_lock('uuid_lock') INTO iOUT;
            SIGNAL SQLSTATE '23000'
                SET MESSAGE_TEXT = 'Duplicate entry', MYSQL_ERRNO = 1062;
        ELSE
            insert into uuid_to_id (uuid) values (pUUID);
            select release_lock('uuid_lock') INTO iOUT;
            set iID = last_insert_id();
        END IF;

        RETURN iID;
END ;;
DELIMITER ;

The function checks whether the UUID value passed already exists in the uuid_to_id table. If it does, the function raises a duplicate entry error, mimicking a unique key violation; otherwise, it inserts the UUID value and returns the last_insert_id(). To protect against the concurrent submission of the same UUID value, I added a database lock. The database lock limits the scalability of the solution; if your application cannot submit the same request twice within a very short time frame, the lock could be removed. I also have another version of the function with no lock calls, using a small dedup table where recent rows are kept for only a few seconds. See my GitHub if you are interested.
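
As a usage sketch, an entity table can then store the integer returned by the function instead of a char(36) column; the customers table and its columns below are made-up names for illustration:

CREATE TABLE customers (
  id int unsigned NOT NULL,
  name varchar(64) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Register a new entity identified externally by a UUID
INSERT INTO customers (id, name)
VALUES (f_uuid_to_id('83fda883-86d9-4913-9729-91f20973fa52'), 'Alice');

-- To resolve an already-registered UUID later, query the mapping table directly
SELECT id FROM uuid_to_id
WHERE uuid_hash = crc32('83fda883-86d9-4913-9729-91f20973fa52')
  AND uuid = '83fda883-86d9-4913-9729-91f20973fa52';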

Results for the Alternate Approaches

Now, let’s have a look at the insertion rates using these alternate approaches.

Insertion on tables using UUID values as primary keys, alternative solutions

The pseudo-order results are great. Here, I modified the algorithm to keep the UUID prefix constant for one minute instead of one week in order to better fit the test environment. Even though the pseudo-order solution performs well, keep in mind that it still bloats the schema, and overall the performance gains may not be that great.

The mapping to integer values, although the insert rates are lower due to the additional DMLs required, decouples the schema from the UUID values. The tables now use integers as primary keys. This mapping removes nearly all the scalability concerns of using UUID values. Still, even on a small VM with limited CPU and IOPs, the UUID mapping technique yields nearly 4000 inserts/s. Put into context, this means 14M rows per hour, 345M rows per day and 126B rows per year. Such rates likely fit most requirements. The only growth-limiting factor is the size of the hash index: once the hash index becomes too large to fit in the buffer pool, performance will start to decrease.

Other Options than UUID Values?

Of course, there are other possibilities to generate unique IDs.  The method used by the MySQL function UUID_SHORT() is interesting. A remote device like a smartphone could use the UTC time instead of the server uptime. Here’s a proposal:

(Seconds since January 1st 1970) << 32
+ (lower 2 bytes of the wifi MAC address) << 16
+ 16_bits_unsigned_int++;

The 16-bit counter should be initialized to a random value and allowed to roll over. The odds of two devices producing the same ID are very small: they would have to generate an ID at approximately the same time, share the same lower two bytes of the MAC address, and have their 16-bit counters at the same value.
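
Below is the arithmetic expressed as a MySQL expression, only to illustrate the layout of the resulting 64-bit value; on an actual device this would be computed client-side, and the '98f1' MAC bytes (from the laptop example earlier) and the counter seed are just illustrative:

SET @counter := floor(rand() * 65536);          -- 16-bit counter, random initial value

SELECT (unix_timestamp() << 32)                 -- seconds since January 1st 1970
     + (conv('98f1', 16, 10) << 16)             -- lower 2 bytes of the MAC address
     + ((@counter := @counter + 1) & 0xffff)    -- incrementing 16-bit counter
       AS device_unique_id;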

Notes

All the data related to this post can be found on my GitHub.

Nov
15
2019
--

Don’t Use MongoDB Profiling Level 1


TLDR: It is not profiling level 1 itself that is the problem; it’s a gotcha with the optional ‘slowms’ argument that causes users to accidentally enable verbose logging and fill their disk with log files.

In MongoDB, there are two ways to see, with per-operation detail, which operations were executed and how long they took.

  • Profiling. Saves the operation details to the capped collection system.profile. You access this information through a MongoDB connection.
  • Log lines of the “COMMAND” component type in the mongod log files (also mongos v4.0+ log files). You have to access the files as a Unix or Windows user and work with them as text.

Profiling is low-cost, but it is limited to keeping only a small snapshot. It has three levels: 0 (off), 1 (slow operations only), and 2 (all operations).

The (log file) logger also has levels of its own, but there is no ‘off’. Even at level 0, it prints any slow operation in a “COMMAND” log line. ‘Slow’ is defined by the configured slowOpThresholdMs option (originally “slowMS”). That is 100ms by default, which is a good default in my opinion.

Usability problem

The log-or-not and profile-or-not decisions are unrelated systems from the user’s point of view, but they share the same global variable for ‘slow’: serverGlobalParams.slowMS.

A typical post-mortem goes like this:

  • Someone uses db.setProfilingLevel(1/*slowOp*/, 0/*ms*/) to start profiling all operations (DON’T DO THIS – use db.setProfilingLevel(2/*all*/) instead).
  • The logger starts writing every command to the log files.
  • They execute db.setProfilingLevel(0/*off*/) to turn the profiler off.
  • The logger continues writing every command to the log files because slowMS is still 0ms.
  • The DBA gets paged after hours because the disk filled up with log files, and thinks ‘oh, I should have set up log rotation on that server; I’ll do it in the morning’.
  • The DBA gets woken up in the small hours of the morning because the new primary node has also crashed due to a full disk.

So that’s the advice: Until MongoDB is enhanced to have separate slow-op threshold options for the profiler, never use db.setProfilingLevel(1, …). Even if you know the gotcha, someone learning over your shoulder won’t see it.

What to do instead:

  • Use only db.setProfilingLevel(0 /*off*/) <-> db.setProfilingLevel(2 /*all*/) and don’t touch slowms
    • There is still a valid use case for using the profiler at level 1, but unless you are prepared to manage the logger’s interdependence on the slowMS value, don’t go there.
  • If you want the logger to print “COMMAND” lines for every command, including fast ones, use db.setLogLevel(1, "command"). And run db.setLogLevel(-1, "command") to stop it again. ‘Use log level 1, not profile level 1’ could almost be the catchphrase of this article.
    (db.setLogLevel(1) + db.setLogLevel(0) is an alternative to the above, but is an older, blunter method.)
  • If you want to set the sampleRate, you can do that without changing the level (or ‘slowms’) with the following command: db.setProfilingLevel(db.getProfilingStatus().was, {"sampleRate": <new float value>})
  • If you want a different threshold for slowMS permanently, use the slowOpThresholdMs option in the config file (see the sketch just below this list), but you can also change it dynamically as in the sampleRate instructions above.
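
A minimal sketch of the relevant mongod config file section, assuming the YAML configuration format; the 200ms value is just an example:

operationProfiling:
  slowOpThresholdMs: 200     # the shared 'slow' threshold used by both the logger and the profiler
  slowOpSampleRate: 1.0      # fraction of slow operations logged/profiled (MongoDB 4.0+)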

Some other gotchas to do with profiling and logging:

  • The logger level is global; the slowOpThresholdMs (a.k.a. slowMS) value is global; but the profiling level is per-database namespace.
  • All of the above are local to the mongod (or mongos) node the commands are run on / the config file is set for. If you run them on a primary, they do not change the secondaries; if you run them on a mongos, they do not change the shard or configsvr replica set nodes.
  • mongos nodes only provide a subset of these diagnostic features. They have no collections of their own, so, for starters, they cannot make a system.profile collection.

For reference, here is the relevant logic in the MongoDB server source (the "Blog:" comment is an annotation):
bool shouldDBProfile(bool shouldSample = true) {
    // Profile level 2 should override any sample rate or slowms settings.
    if (_dbprofile >= 2)
        return true;

    if (!shouldSample || _dbprofile <= 0)
        return false;

      /* Blog: and by elimination if _dbprofile == 1: */
    return elapsedTimeExcludingPauses() >= Milliseconds{serverGlobalParams.slowMS};
}

bool CurOp::completeAndLogOperation(OperationContext* opCtx,
                                    logger::LogComponent component,
                                    boost::optional<size_t> responseLength,
                                    boost::optional<long long> slowMsOverride,
                                    bool forceLog) {
    // Log the operation if it is eligible according to the current slowMS and sampleRate settings.
    const bool shouldLogOp = (forceLog || shouldLog(component, logger::LogSeverity::Debug(1)));
    const long long slowMs = slowMsOverride.value_or(serverGlobalParams.slowMS);
    ...
    ...
    const bool shouldSample =
        client->getPrng().nextCanonicalDouble() < serverGlobalParams.sampleRate;

    if (shouldLogOp || (shouldSample && _debug.executionTimeMicros > slowMs * 1000LL)) {
        auto lockerInfo = opCtx->lockState()->getLockerInfo(_lockStatsBase);

    ...
    // Return 'true' if this operation should also be added to the profiler.
    return shouldDBProfile(shouldSample);
}

 

Nov
11
2019
--

Prepare Your Databases for High Traffic on Black Friday


It’s November, so we all know what that means: it’s peak shopping season, and no date is bigger than Black Friday. But how will your database handle all that new, relentless traffic? Not only does your database have to handle traffic without slowing down, but web servers can sometimes see such sudden traffic as an attack, meaning your site(s) could go down completely.

Every year there are news stories about websites that were unresponsive or disappeared completely right at the exact moment when potential shoppers were coming online. Don’t let this be you! To ensure your databases are prepared for a high traffic event, download our Database Disaster Prevention Checklist and read what we advise you do before, during, and after the big day. Let’s get started.

What You Need To Know About High Traffic Events

  1. Your users expect instant response and immediate feedback from your application. If you don’t give it to them, your competition will.
  2. Database slowdowns for you can be perceived as downtime for your customers, and they will lose confidence in your ability.
  3. A few seconds of downtime costs you not only direct revenue but also lost future business. Use our Cost of Database Downtime Calculator to see just how much downtime could cost you.
  4. When you fail over a slow system, the slowness follows it to the new system. If the slowness alleviates, it may be because your customers went somewhere else while you were down.
  5. Time is finite; you can’t get more of it. You have to make the most of it. It can be much more valuable than money.

Before the Event

  1. Set up monitoring and tooling before the event. If you can’t see what’s going on, how can you measure your success? Percona Monitoring and Management can help with that.
  2. Load test your applications and test how you will scale under normal and peak loads. The best time to find bottlenecks is before the event; finding them during it is costly and impactful.
  3. Test failover and understand how quickly you can recover. The worst time to find out your failover doesn’t work is during the busiest day of the year.
  4. Put a code and configuration freeze in place in advance of the event. It’s really hard to ensure performance if the application is growing and evolving.
  5. Get a second opinion, double-check, and don’t assume. A large portion of your business is tied to this event. Trust but verify things are ready; it’s worth it. Most big outages are caused by easily overlooked things.
  6. Check your backups. Make sure you have reliable and consistent backups of your databases. This will make it easier to restore a crashed database. Percona XtraBackup for MySQL can help.

During the Event

  1. Realize that failing over to another system is the absolute last resort.  Failing over a busy system moves the traffic to a new server. Additionally, most systems are slower when traffic is added as it takes time to warm up the cache.
  2. Have the right people standing by to monitor, tweak, and fix issues before they get out of hand. Too many issues are prolonged by waiting to get the right people in the room to fix issues. Many larger companies make this an all-hands event.
  3. Don’t lose sight of the goal! Getting back up and running and allowing customers access to the site is your goal, not developing a permanent fix in the heat of the moment. The time to make an impact is finite and, equally important, people under pressure are more prone to making mistakes. An easier temporary fix to get you through the day can work in your favor, as long as you don’t forget to make the permanent one eventually.
  4. Don’t make the problem worse. Some activities can cause cascading slowness, so know the impact of changes before you make them. We get a lot of calls from people who had good intentions of fixing a minor issue but created a much bigger problem. The road to hell is paved with good intentions.
  5. Collect and store the data needed to analyze and improve for the next event. Often when things don’t go right, issues end up being transient and difficult to understand after the fact. Get the data you need, when things are going well or not.

After the Event

  1. Analyze and understand your traffic and usage. Use this data to plan, enhance, and tweak your strategy for future events.
  2. Don’t leave the quick fixes in place and just forget about them; they could escalate at the worst time. Take the time and expense to fix them during slow periods.
  3. Learn from your mistakes and build a plan to mitigate problems and risks in the future.
  4. Update your systems to the latest builds and security fixes. Take advantage of the slower load and catch up after a freeze/blackout period around the event.
  5. Don’t fall into complacency. Congratulations on surviving this year, but each application and user base is a living, breathing entity, and what worked last year may not work this year. You have to analyze, plan, and review on a regular basis.

Conclusion

Now you’re better prepared for the database high-traffic days coming your way soon. For more information, learn how to ensure peak database performance for your event and download our Database Disaster Prevention Checklist, or contact us today. Percona’s experts can maximize your application performance with our open source database support, managed services or consulting for MySQL, MariaDB, MongoDB, PostgreSQL in on-premises and cloud environments.


Nov
08
2019
--

Configure HAProxy with PostgreSQL Using Built-in pgsql-check


We discussed one of the traditional ways to configure HAProxy with PostgreSQL in our previous blog about HAProxy using xinetd. There, we briefly mentioned the limitation of HAProxy’s built-in pgsql-check health check option: it lacks the features to detect and differentiate between a Primary and a Hot Standby. It tries to establish a connection to the database instance, and if the connection request progresses, the check is considered successful; there is no provision to check the current role (Primary or Standby).

So the question remains:

  1. Is HAProxy’s built-in pgsql-check completely useless, since it cannot distinguish between a Primary and a hot standby (a standby that accepts reads) in an HA setup?
  2. Is there a way to tweak pgsql-check so that it can distinguish between a Primary and Hot-standby?

This blog post discusses what is possible using pgsql-check and how to achieve that.

Note: This blog demonstrates the concept. Integration with a specific HA framework/script is left to the user, because there are a large number of HA solutions for PostgreSQL and these concepts are equally applicable to all of them.

Concept

When a client initiates a connection to PostgreSQL, the first stage of the check is whether the connection is acceptable per the rules specified in pg_hba.conf. That stage must complete before proceeding to the next stage, the specific authentication mechanism.

pgsql-check is designed to check this first stage (pg_hba.conf) and return success if it is passed, because a positive response from the server is a litmus test for whether the instance is up and capable of accepting connections. It doesn’t have to complete the authentication: pgsql-check abandons the connection after this check, before completing the initial handshake, and PostgreSQL terminates it.

A connection request will be rejected straight away if a pg_hba.conf rule says to “reject” it. For example, a pg_hba.conf entry like

host    postgres    pmm_user    192.168.80.20/32    reject

tells PostgreSQL to reject connections from IP 192.168.80.20 as pmm_user to the postgres database.

We can use this logic of modifying the pg_hba.conf entries as part of the failover/switchover procedure. Such automation is easily achievable with a callback script if you are using any kind of automation for failover or switchover. Alternatively, we can have a small script that checks the database instance status on each node and maintains the pg_hba.conf entries accordingly.

In a nutshell, HAProxy’s routing decisions get updated automatically, and connections are routed according to the pg_hba.conf entries, which are maintained as part of the HA scripts/solution or the switchover/failover procedure.

Demonstration Setup

I have one primary and two hot-standby servers (pg0, pg1, pg2) and an application server (app). We shall create two users: one for detecting the Primary (read-write) connection and another for detecting the standby (read-only) connection.

postgres=# create user primaryuser with password 'primaryuser';
CREATE ROLE
postgres=# create user standbyuser with password 'standbyuser';
CREATE ROLE

Now we need pg_hba.conf entries such that connection requests for these users will be taken forward for authentication.

host    primaryuser    primaryuser    192.168.50.0/24    md5
host    standbyuser    standbyuser    192.168.50.0/24    md5

Note that the username and the database name are kept the same because, by default, the client connects to a database named after the user. This helps with the immediate rejection of the connection, which is what we want, rather than a later error such as:

database "xyz" does not exist

Keep in mind that NO databases named “primaryuser” or “standbyuser” exist in this PostgreSQL cluster, so these users won’t really be able to connect to any database even when the connection is not rejected. This is added security for the whole setup.

We should reload the configuration:

postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)

It is a good idea to verify the client connection from the application server to this user.

$ psql -h pg0 -U primaryuser
Password for user primaryuser:
psql: FATAL: database "primaryuser" does not exist

Here we can see that the connection request is taken forward for authentication, so it prompts for the password, but the connection is finally rejected because there is no database named “primaryuser”. This is sufficient for the HAProxy configuration.

We need to have the same setup for all the nodes of the PostgreSQL cluster because any node can be promoted to primary or demoted to standby.

Preparing HAProxy

We are going to have two ports open in haproxy for connection.

  1. Port 5000 for Primary Connections (Read-Write)
  2. Port 5001 for Standby Connections (Read-Only)

Here is the sample haproxy configuration (/etc/haproxy/haproxy.cfg) I used in the demonstration setup

global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen pgReadWrite
    bind *:5000
    option pgsql-check user primaryuser
    default-server inter 3s fall 3
    server pg0 pg0:5432 check port 5432
    server pg1 pg1:5432 check port 5432
    server pg2 pg2:5432 check port 5432

listen pgReadOnly
    bind *:5001
    option pgsql-check user standbyuser
    default-server inter 3s fall 3
    server pg0 pg0:5432 check port 5432
    server pg1 pg1:5432 check port 5432
    server pg2 pg2:5432 check port 5432

Note: HAProxy installation and the default configuration file location might change based on the OS. On Red Hat clones, installation is generally as simple as running: $ sudo yum install haproxy

As we can see in the configuration, the check uses option pgsql-check with user primaryuser for the “pgReadWrite” listener and user standbyuser for the “pgReadOnly” listener, which are intended for Primary and Standby connections respectively. All three nodes of the cluster are candidates for both types of connections.

Once we have the configuration ready, the haproxy service can be started up.

$ sudo systemctl start haproxy

At this stage, all nodes will be listed as candidates for both read-write and read-only connections, shown with a green background on the HAProxy stats page.

This is not what we want to achieve.

Integration of pg_hba.conf with failover / switchover procedure

All HA solutions for failover/switchover have provisions for housekeeping, which update configuration files like recovery.conf, and the capability to call custom (callback) scripts. In this case, it is the responsibility of the same failover/switchover procedure to make the changes to pg_hba.conf. The change should be such that pg_hba.conf rejects the standbyuser connection on the primary node and the primaryuser connection on the standby servers. For this demonstration, I am directly modifying the authentication method to “reject”.

On Primary:

$ sed -i 's/\(host\s*standbyuser\s*standbyuser.*\) md5/\1 reject/g' $PGDATA/pg_hba.conf

So that the line will change to:

host    standbyuser    standbyuser    192.168.50.0/24    reject

On Standby:

$ sed -i 's/\(host\s*primaryuser\s*primaryuser.*\) md5/\1 reject/g' $PGDATA/pg_hba.conf

So that the line will change to:

host    primaryuser    primaryuser    192.168.50.0/24    reject

After reloading the configuration, HAProxy’s routing table is updated with the right information.

This is what we need to have.

Testing

Testing of this HAProxy setup can be done from the machine (app) where HAProxy is currently configured.

Connections to port 5000 will be routed to Primary:

$ psql -h localhost -U postgres -p 5000 -c "select pg_is_in_recovery()"
Password for user postgres:
pg_is_in_recovery
-------------------
f
(1 row)

Connections to port 5001 will be routed to one of the standbys:

$ psql -h localhost -U postgres -p 5001 -c "select pg_is_in_recovery()"
Password for user postgres:
pg_is_in_recovery
-------------------
t
(1 row)

So, with the help of a custom callback script that modifies pg_hba.conf, HAProxy can maintain its routing table and redirect connections accordingly.
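
Here is a minimal sketch of what such a callback script could look like; the role detection method, file paths, and user names are assumptions that would need to be adapted to the HA framework in use:

#!/bin/bash
# Hypothetical callback sketch: adjust pg_hba.conf to match the local node's current
# role so that HAProxy's pgsql-check routes connections correctly.
HBA="$PGDATA/pg_hba.conf"

# 'f' means primary, 't' means hot standby (assumes the local instance is reachable)
ROLE=$(psql -h localhost -U postgres -At -c "select pg_is_in_recovery()")

if [ "$ROLE" = "f" ]; then
    # Primary: accept primaryuser, reject standbyuser
    sed -i 's/\(host\s*primaryuser\s*primaryuser.*\) reject/\1 md5/g' "$HBA"
    sed -i 's/\(host\s*standbyuser\s*standbyuser.*\) md5/\1 reject/g' "$HBA"
else
    # Standby: accept standbyuser, reject primaryuser
    sed -i 's/\(host\s*standbyuser\s*standbyuser.*\) reject/\1 md5/g' "$HBA"
    sed -i 's/\(host\s*primaryuser\s*primaryuser.*\) md5/\1 reject/g' "$HBA"
fi

# Make PostgreSQL pick up the pg_hba.conf change
psql -h localhost -U postgres -c "select pg_reload_conf()" > /dev/null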

Advantages and Disadvantages of pgsql-check

The advantage of the built-in pgsql-check is obvious: we don’t need any extra components, and the setup is very straightforward.

On the downside, the modification of pg_hba.conf is key, and we must make sure it gets updated as part of failover and switchover.

As described in the concept section, pgsql-check abandons the connection after the check, which PostgreSQL logs as:

LOG: could not receive data from client: Connection reset by peer

Moreover, all the connection requests rejected by the pg_hba.conf rule will be logged as well:

FATAL: pg_hba.conf rejects connection for host "192.168.50.40", user "primaryuser", database "primaryuser", SSL off

The PostgreSQL log file will contain a lot of such messages, which you can ignore, but this is not ideal if you expect the log files to be clean.

Special note: there are some recent improvements to HAProxy, like this commit, which improve how pgsql-check disconnects. So messages like

LOG: could not receive data from client: Connection reset by peer

may not appear in the logs. This was tested and confirmed with HAProxy version 2.0.8. However, at the time of writing, that version is not available in most repositories, and you may have to build it from source.

Nov
01
2019
--

Use MySQL Without a Password (And Still Be Secure)


Some say that the best password is the one you don’t have to remember. That’s possible with MySQL, thanks to the auth_socket plugin and its MariaDB version unix_socket.

Neither of these plugins is new, and some words have been written about auth_socket on this blog before, for example: how to change passwords in MySQL 5.7 when using plugin: auth_socket. But while reviewing what’s new with MariaDB 10.4, I saw that unix_socket now comes installed by default and is one of the authentication methods (one of them because, in MariaDB 10.4, a single user can have more than one authentication plugin, as explained in the Authentication from MariaDB 10.4 document).

As already mentioned, this is not news; even when one installs MySQL using the .deb packages maintained by the Debian team, the root user is created so that it uses socket authentication. This is true for both MySQL and MariaDB:

root@app:~# apt-cache show mysql-server-5.7 | grep -i maintainers
Original-Maintainer: Debian MySQL Maintainers <pkg-mysql-maint@lists.alioth.debian.org>
Original-Maintainer: Debian MySQL Maintainers <pkg-mysql-maint@lists.alioth.debian.org>

Using the Debian packages of MySQL, the root is authenticated as follows:

root@app:~# whoami
root
root@app:~# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.27-0ubuntu0.16.04.1 (Ubuntu)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select user, host, plugin, authentication_string from mysql.user where user = 'root';
+------+-----------+-------------+-----------------------+
| user | host      | plugin      | authentication_string |
+------+-----------+-------------+-----------------------+
| root | localhost | auth_socket |                       |
+------+-----------+-------------+-----------------------+
1 row in set (0.01 sec)

Same for the MariaDB .deb package:

10.0.38-MariaDB-0ubuntu0.16.04.1 Ubuntu 16.04

MariaDB [(none)]> show grants;
+------------------------------------------------------------------------------------------------+
| Grants for root@localhost                                                                      |
+------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED VIA unix_socket WITH GRANT OPTION |
| GRANT PROXY ON ''@'%' TO 'root'@'localhost' WITH GRANT OPTION                                  |
+------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

For Percona Server, the .deb packages from the official Percona repo also set the root user’s authentication to auth_socket. Here is an example with Percona Server for MySQL 8.0.16-7 on Ubuntu 16.04:

root@app:~# whoami
root
root@app:~# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 8.0.16-7 Percona Server (GPL), Release '7', Revision '613e312'

Copyright (c) 2009-2019 Percona LLC and/or its affiliates
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select user, host, plugin, authentication_string from mysql.user where user ='root';
+------+-----------+-------------+-----------------------+
| user | host      | plugin      | authentication_string |
+------+-----------+-------------+-----------------------+
| root | localhost | auth_socket |                       |
+------+-----------+-------------+-----------------------+
1 row in set (0.00 sec)

So, what’s the magic? The plugin checks that the Linux user matches the MySQL user using the SO_PEERCRED socket option to obtain information about the user running the client program. Thus, the plugin can be used only on systems that support the SO_PEERCRED option, such as Linux. The SO_PEERCRED socket option allows retrieving the uid of the process that is connected to the socket. It is then able to get the user name associated with that uid.

Here’s an example with the user “vagrant”:

vagrant@mysql1:~$ whoami
vagrant
vagrant@mysql1:~$ mysql
ERROR 1698 (28000): Access denied for user 'vagrant'@'localhost'

Since no user “vagrant” exists in MySQL, the access is denied. Let’s create the user and try again:

MariaDB [(none)]> GRANT ALL PRIVILEGES ON *.* TO 'vagrant'@'localhost' IDENTIFIED VIA unix_socket;
Query OK, 0 rows affected (0.00 sec)

vagrant@mysql1:~$ mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 45
Server version: 10.0.38-MariaDB-0ubuntu0.16.04.1 Ubuntu 16.04
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show grants;
+---------------------------------------------------------------------------------+
| Grants for vagrant@localhost                                                    |
+---------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'vagrant'@'localhost' IDENTIFIED VIA unix_socket |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Success!

Now, what about on a non-Debian distro, where this is not the default? Let’s try it with Percona Server for MySQL 8 installed on CentOS 7:

mysql> show variables like '%version%comment';
+-----------------+---------------------------------------------------+
| Variable_name   | Value                                   |
+-----------------+---------------------------------------------------+
| version_comment | Percona Server (GPL), Release 7, Revision 613e312 |
+-----------------+---------------------------------------------------+
1 row in set (0.01 sec)

mysql> CREATE USER 'percona'@'localhost' IDENTIFIED WITH auth_socket;
ERROR 1524 (HY000): Plugin 'auth_socket' is not loaded

Failed. What is missing? The plugin is not loaded:

mysql> pager grep socket
PAGER set to 'grep socket'
mysql> show plugins;
47 rows in set (0.00 sec)

Let’s add the plugin at runtime:

mysql> nopager
PAGER set to stdout
mysql> INSTALL PLUGIN auth_socket SONAME 'auth_socket.so';
Query OK, 0 rows affected (0.00 sec)

mysql> pager grep socket; show plugins;
PAGER set to 'grep socket'
| auth_socket                     | ACTIVE | AUTHENTICATION | auth_socket.so | GPL     |
48 rows in set (0.00 sec)

We got all we need now. Let’s try again:

mysql> CREATE USER 'percona'@'localhost' IDENTIFIED WITH auth_socket;
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO 'percona'@'localhost';
Query OK, 0 rows affected (0.01 sec)

And now we can log in as the OS user “percona”.

[percona@ip-192-168-1-111 ~]$ whoami
percona
[percona@ip-192-168-1-111 ~]$ mysql -upercona
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 19
Server version: 8.0.16-7 Percona Server (GPL), Release 7, Revision 613e312


Copyright (c) 2009-2019 Percona LLC and/or its affiliates
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.


Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


mysql> select user, host, plugin, authentication_string from mysql.user where user ='percona';
+---------+-----------+-------------+-----------------------+
| user    | host      | plugin      | authentication_string |
+---------+-----------+-------------+-----------------------+
| percona | localhost | auth_socket |                       |
+---------+-----------+-------------+-----------------------+
1 row in set (0.00 sec)

Success again!

Question: Can I log in as the MySQL user percona from another OS user?

[percona@ip-192-168-1-111 ~]$ logout
[root@ip-192-168-1-111 ~]# mysql -upercona
ERROR 1698 (28000): Access denied for user 'percona'@'localhost'

No, you can’t.

Conclusion

MySQL is flexible enough in several aspects, one being the authentication methods. As we see in this post, one can achieve access without passwords by relying on OS users. This is helpful in several scenarios, but just to mention one: when migrating from RDS/Aurora to regular MySQL and using IAM Database Authentication to keep getting access without using passwords.

Oct
31
2019
--

PostgreSQL Application Connection Failover Using HAProxy with xinetd


Recently, we published a blog about a very simple application failover using libpq features, which could be the simplest of all automatic application connection routing methods. In this blog post, we are discussing how a proxy server using HAProxy can be used for connection routing, which is a well-known technique with very wide deployment. There are multiple ways HAProxy can be configured with PostgreSQL, which we shall cover in upcoming blogs, but configuring a xinetd service to respond to HTTP requests on individual nodes of a database cluster is one of the most traditional approaches.

On HAProxy

HAProxy may be the most popular connection routing and load balancing software available. Along with PostgreSQL, it is used across different types of high availability clusters. HAProxy, as the name indicates, works as a proxy for TCP (Layer 4) and HTTP (Layer 7), but it also has load balancing features. The TCP proxying feature allows us to use it for PostgreSQL database connections. There are three objectives of connection routing for a PostgreSQL cluster:

  1. Route read-write load to the Master
  2. Route read-only load to the Slaves
  3. Load balance across multiple slaves

HAProxy maintains an internal routing table. In this blog, we are going to take a look at the most traditional approach to configuring HAProxy with PostgreSQL. This approach is independent of the underlying clustering software and can be used even with the traditional PostgreSQL built-in replication feature, without any clustering or automation solutions.

In this generic configuration, we won’t use any special software or capabilities offered by clustering frameworks. This requires us to have 3 components:

  1. A simple shell script to check the status of the PostgreSQL instance running on the local machine.
  2. A xinetd service daemonizer.
  3. HAProxy, which maintains the routing mechanism.

Concept:

HAProxy has a built-in check for PostgreSQL, option pgsql-check (documentation is available here). This is good enough for a basic Primary failover, but the lack of features to detect and differentiate between the Primary and Hot Standby nodes makes it less useful.

Meanwhile, HAProxy with xinetd gives us the ability to see which node is the Master and which is a hot standby, and to redirect connections appropriately. We will write about the built-in pgsql-check in upcoming blog posts and explain how to make use of it effectively.

Xinetd (Extended Internet Service Daemon) is a super-server daemon. It can listen for requests on custom ports and respond by executing custom logic. In this case, we have a custom script to check the status of the database. The script writes an HTTP header with a status code, and different status codes represent the status of the database instance: 200 if the PostgreSQL instance is a Primary, 206 if it is a Hot Standby, and 503 if the status cannot be verified.

Every database server needs to have a xinetd service running on a port for status checks of the PostgreSQL instance running on it. Generally, port 23267 is used for this purpose, but we can use any port of our choice. This service uses a custom-developed shell script to report the three different statuses of a PostgreSQL instance:

  1. Primary database
  2. Standby database
  3. Unable to connect to PostgreSQL – Indication of PostgreSQL down

Since the status check is available through a port exposed by xinetd, HAProxy can send a request to that port and understand the status from the response.

Installation and Configuration

First, we need a script that can check the status of a PostgreSQL instance. It is quite simple: the shell script invokes the psql utility and executes the pg_is_in_recovery() function of PostgreSQL. Based on the result, it can tell whether the node is a master or a slave, or whether it failed to connect.

A sample script is here:

#!/bin/bash
# This script checks if a postgres server running on localhost is healthy. It will return:
# "HTTP/1.x 200 OK\r" (if the instance is a primary)
# "HTTP/1.x 206 OK\r" (if the instance is a hot standby)
# - OR -
# "HTTP/1.x 503 Service Unavailable\r" (if the status cannot be verified)
# The purpose of this script is to make haproxy capable of monitoring postgres properly.
# It is recommended that a low-privileged postgres user is created to be used by this script.
# For eg. create user healthchkusr login password 'hc321';
 
PGBIN=/usr/pgsql-10/bin
PGSQL_HOST="localhost"
PGSQL_PORT="5432"
PGSQL_DATABASE="postgres"
PGSQL_USERNAME="postgres"
export PGPASSWORD="passwd"
TMP_FILE="/tmp/pgsqlchk.out"
ERR_FILE="/tmp/pgsqlchk.err"
 
 
# We perform a simple query whose result tells us the role of the instance

VALUE=`$PGBIN/psql -t -A -h $PGSQL_HOST -p $PGSQL_PORT -U $PGSQL_USERNAME -d $PGSQL_DATABASE -c "select pg_is_in_recovery()" 2> /dev/null`
# Check the output: 't' means hot standby, 'f' means primary, anything else means
# the status could not be verified.


if [ "$VALUE" = "t" ]
then
    /bin/echo -e "HTTP/1.1 206 OK\r\n"
    /bin/echo -e "Content-Type: text/plain\r\n"
    /bin/echo -e "\r\n"
    /bin/echo "Standby"
    /bin/echo -e "\r\n"
elif [ "$VALUE" = "f" ]
then
    /bin/echo -e "HTTP/1.1 200 OK\r\n"
    /bin/echo -e "Content-Type: text/plain\r\n"
    /bin/echo -e "\r\n"
    /bin/echo "Primary"
    /bin/echo -e "\r\n"
else
    /bin/echo -e "HTTP/1.1 503 Service Unavailable\r\n"
    /bin/echo -e "Content-Type: text/plain\r\n"
    /bin/echo -e "\r\n"
    /bin/echo "DB Down"
    /bin/echo -e "\r\n"
fi

Instead of password-based authentication, any password-less authentication method can be used; for example, a pg_hba.conf rule like the one shown below.
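
For example, a pg_hba.conf entry along these lines (healthchkusr is the low-privileged user suggested in the script’s comments; adjust the address and user to your environment) would let the health check connect locally without a password:

host    postgres    healthchkusr    127.0.0.1/32    trust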

It is good practice to keep the script in the /opt folder; just make sure it has execute permission:

$ sudo chmod 755 /opt/pgsqlchk

Now we can install xinetd on the server. Optionally, we can install a telnet client so that we can test the functionality.

$ sudo yum install -y xinetd telnet

Now let us create a xinetd definition/configuration.

$ sudo vi /etc/xinetd.d/pgsqlchk

Add the service definition below to that file:

service pgsqlchk
{
        flags           = REUSE
        socket_type     = stream
        port            = 23267
        wait            = no
        user            = nobody
        server          = /opt/pgsqlchk
        log_on_failure  += USERID
        disable         = no
        only_from       = 0.0.0.0/0
        per_source      = UNLIMITED
}

Add the pgsqlchk service to /etc/services.

$ sudo bash -c 'echo "pgsqlchk 23267/tcp # pgsqlchk" >> /etc/services'

Now the xinetd service can be started.

$ sudo systemctl start xinetd
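
Optionally, the health check can be verified at this point with the telnet client installed earlier (pg0 is a node from this demonstration setup); the service should answer immediately with the HTTP status line and either "Primary", "Standby", or "DB Down":

$ telnet pg0 23267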

Configuring HAProxy to use xinetd

We need to have HAProxy installed on the server:

$ sudo yum install -y haproxy

Create or modify the HAProxy configuration. Open /etc/haproxy/haproxy.cfg using a text editor.

$ sudo vi /etc/haproxy/haproxy.cfg

A sample HAProxy configuration file is given below:

global
    maxconn 100
 
defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s
 
listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /
 
listen ReadWrite
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg0 pg0:5432 maxconn 100 check port 23267
    server pg1 pg1:5432 maxconn 100 check port 23267
 
listen ReadOnly
    bind *:5001
    option httpchk
    http-check expect status 206
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg0 pg0:5432 maxconn 100 check port 23267
    server pg1 pg1:5432 maxconn 100 check port 23267

As per the above configuration, the key points to note are

  • HAProxy is configured to use TCP mode
  • HAProxy service will start listening to port 5000 and 5001
  • Port 5000 is for Read-Write connections and 5001 is for Read-Only connections
  • Status check is done using http-check feature on port 23267
  • Both server pg0 and pg1 are candidates for both Read-write and Read-only connections
  • Based on the http-check result and the status returned, HAProxy decides the current role of each node

Now everything is set for starting the HAProxy service.

$ sudo systemctl start haproxy

Verification and Testing

As per HAProxy configuration, we should be able to access the port 5000 for a read-write connection.

$ psql -h localhost -p 5000 -U postgres
Password for user postgres:
psql (9.6.5)
Type "help" for help.

postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)

For read-only connection, we should be able to access the port 5001:

$ psql -h localhost -p 5001 -U postgres
Password for user postgres:
psql (9.6.5)
Type "help" for help.

postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)

Conclusion

This is a very generic way of configuring HAProxy with a PostgreSQL cluster, and it is not limited to any particular cluster topology. The health check is done by a custom shell script, and its result is made available through the xinetd service. HAProxy uses this information to maintain its routing table and redirect connections to the appropriate node in the cluster.

Oct
07
2019
--

Centralization Vs. Decentralization of DBA Teams


As a Technical Account Manager (TAM), I have seen many of our clients adopt a decentralized DBA Team.  In many cases, this is an effort to better align the DBA Team with the Development Teams.  This is an admirable and logical goal.  As often happens, though, you trade one set of challenges for another.

Centralized DBA Teams

First, let’s talk about the challenges of a centralized DBA Team.  Here, the DBAs are all on a single team which is likely separated by platform.  So, you often have a MySQL Team, Oracle Team, SQL Server Team, etc.  These teams usually report up to one manager and although they act somewhat independently by platform, there is also some level of standardization in documentation, reporting structure, procedure, etc.  There are a number of benefits to this approach and a few challenges.  One benefit is having standardization across the whole of the company for how a given technology is documented, deployed, managed, etc.  Of course, this can be a limiting factor as well since there is little opportunity for customization.  Everything tends to become “vanilla” in nature and everything looks alike; from a positive perspective, this is consistency.

Consistency can be very useful.  When new members are added to the team, all servers are essentially the same.  It really doesn’t matter what the application is; a new team member can become proficient very quickly across the whole of the infrastructure.  Lessons learned on one set of servers translate very well to another set of servers supporting a completely different application.

As noted above, however, this is also a challenge.  What happens when a particular application needs something different?  Breaking the norm is the antithesis of consistency, and such requests are often met with resistance from team leadership.

Another challenge is that the DBAs often support so many different technologies that they find it hard to fully understand each application and how it works.  There are simply too many applications to become intimately aware of each one.  In this case, DBAs tend to be reactive rather than proactive advisers to the development teams.  This is quite common in larger enterprises that have many diverse applications.

Decentralized DBA Teams

To combat this, many enterprises have adopted the decentralized model, breaking the DBA Team up into smaller teams aligned closely with the development teams.  This makes sense in many ways, since the DBAs will be laser-focused on fewer applications and will work much more closely with the developers to deliver an improved solution.

So, what is the issue with this approach?  There are always trade-offs with any approach.  If there were one clear winner, everyone would just use it.  One of the largest challenges I have seen as a TAM with decentralization has been the lack of standardization.  Each DBA team acts virtually independently from every other team.  Problems that once were solved once and for all are suddenly being faced in parallel by multiple teams.  As a result, teams are often “re-inventing the wheel” each time they are confronted by a challenge that may have already been resolved by another team.  Without strong internal communication, teams are wasting time looking for solutions that have already been found.

This is one of the most fulfilling aspects of my role as a TAM.  I am in the unique position of often meeting with multiple teams and socializing these solutions across teams.  I am often asked in meetings with my clients whether I have seen this issue with another team or even another client.  If the answer is affirmative, the next question is obviously about how they resolved it.  Experience wins out here and provides a significant improvement in the time to resolution of the issue.

Another challenge I see pertains to consistency.  In decentralized teams, DBAs will sometimes move from team to team as demand for resources changes.  In such cases, new team members require significant time to get up to speed on systems as consistency has been compromised due to each team doing things differently.  Installations may be in different directories or folders on the servers, documentation may be better or worse, and so on.  With no centralized oversight of standards, moving to a new team can slow the process of getting the DBA up to speed.

Communication is Key

Whether you decide to keep a centralized DBA Team or to decentralize it, some level of consistency and communication is critical.  If you choose to decentralize the DBA Team, be sure someone is acting as a centralized resource, such as a TAM, who is looking for patterns of issues and working to find proven solutions.

Oct
04
2019
--

Percona XtraDB Cluster 8.0 (experimental release) : SST Improvements

xtradb sst improvements

Starting with the experimental release of Percona XtraDB Cluster 8.0, we have made changes to the SST process to make it more robust and easier to use.

  • mysqldump and rsync are no longer supported SST methods.

    Support for mysqldump was deprecated starting with PXC 5.7 and has now been completely removed.

    MySQL 8.0 introduced a new Redo Log format that limited the use of rsync while upgrading from PXC 5.7 to 8.0. In addition, the new Galera-4 also introduced changes that further limit the use of rsync.

    The only supported SST method is xtrabackup-v2.

  • A separate Percona XtraBackup installation is no longer required.

    The required Percona XtraBackup (PXB) binaries are now shipped as part of PXC 8.0; however, they are not installed for general use. So, if you want to use PXB outside of an SST, you will have to install PXB separately.

  • SST logging now uses MySQL error logging

    Previously, the SST script would write directly to the error log file. Now, the SST script uses MySQL error logging. A side effect of this change is that the SST logs are not immediately visible. This is due to the logging subsystem being initialized after the SST has completed.

  • The wsrep_sst_auth variable has been removed.

    PXC 8.0 now creates an internal user (mysql.pxc.sst.user) with a random password for use by PXB to take the backup. The cleartext of the password is not saved and the user is deleted after the SST has completed.

    (This feature is still in development and may change before PXC 8.0 GA)

  • PXC SST auto-upgrade

    When PXC 8.0 detects that the SST came from a lower version, mysql_upgrade is automatically invoked. Also, “RESET SLAVE ALL” is run on the new node if needed. This is invoked when receiving an SST from either PXC 5.7 or PXC 8.0.

    (This feature is still in development and may change before PXC 8.0 GA)

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Oct
03
2019
--

Percona XtraDB Cluster 8.0 New Feature: wsrep_sst_auth Removal

Experimental Binary XtraDB 8.0

The problem

In PXC 5.6 and 5.7, when using xtrabackup-v2 as the SST method, the DBA must create a user with the appropriate privileges for use by Percona XtraBackup (PXB). The username and password of this backup user are specified in the wsrep_sst_auth variable.

This is a problem because the username and password are stored in plaintext, which requires that the configuration file be secured.

The PXC 8.0 solution

(This feature is still under development and may change before PXC 8.0 GA)

Because wsrep_sst_auth is only needed on the donor side to take a backup, PXC 8.0 uses an internal user (created specifically for use by PXC) with a randomly generated password. Since this user is only needed on the donor, the plaintext password is not needed on the joiner node.

This password consists of 32 characters generated at random. A new password is generated for each SST request. The plaintext of the password is never saved and never leaves the node. The username/password is sent to the SST script via unnamed pipes (stdin).

New PXC internal user accounts

mysql.pxc.internal.session

The mysql.pxc.internal.session user account provides the appropriate security context to create and set up the other PXC accounts. This account has a limited set of privileges, just enough to create the mysql.pxc.sst.user account.

This account is locked and cannot be used to log in (the password field will not allow login).

mysql.pxc.sst.user

The mysql.pxc.sst.user is used by XtraBackup to perform the backup. This account has the full set of privileges needed by XtraBackup. It is created for an SST and is dropped at the end of the SST and also when the PXC node is shut down. The creation/provisioning of this user account is not written to the binlog and is not replicated to other nodes. However, this account is sent with the backup to the joiner node, so the joiner node also has to drop this user after the SST has finished.

mysql.pxc.sst.role

The mysql.pxc.sst.role is the MySQL role that provides the privileges needed for XtraBackup. This allows for easy addition/removal of privileges needed for an SST.

The experimental release of PXC is based on MySQL 8.0.15, and we have not implemented the role-based support due to issues found with MySQL 8.0.15. This will be revisited in future versions of PXC 8.0.
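
To see which of these internal accounts exist on a running node, you can query the mysql.user table (keep in mind that mysql.pxc.sst.user only exists while an SST is in progress). This is just an illustrative query:

SELECT user, host, account_locked FROM mysql.user WHERE user LIKE 'mysql.pxc.%';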

Program flow

  1. DONOR node receives SST request from the JOINER
  2. DONOR node generates a random password and creates the internal SST user
    SET SESSION sql_log_bin = OFF;
    DROP USER IF EXISTS 'mysql.pxc.sst.user'@localhost;
    CREATE USER 'mysql.pxc.sst.user'@localhost IDENTIFIED WITH 'mysql_native_password' BY 'XXXXXXXX' ACCOUNT LOCK;
    GRANT 'mysql.pxc.sst.role'@localhost TO 'mysql.pxc.sst.user'@localhost;
    SET DEFAULT ROLE 'mysql.pxc.sst.role'@localhost to 'mysql.pxc.sst.user'@localhost;
    ALTER USER 'mysql.pxc.sst.user'@localhost ACCOUNT UNLOCK;

    The code that uses roles is not used in the current release due to issues with MySQL 8.0.15. Currently, we create the user with all the needed permissions granted explicitly.

  3. Launch the SST script (passing the username/password via stdin)
  4. SST uses the username/password to perform the backup
  5. SST script exits
  6. The DONOR node drops the user.
  7. The JOINER node receives the backup and drops the user. Note that the JOINER node also contains the internal SST user!

As a precaution, the user is also dropped when the server is shutdown.

Oct
01
2019
--

Experimental Binary of Percona XtraDB Cluster 8.0

Experimental Binary XtraDB 8.0

Percona is happy to announce the first experimental binary of Percona XtraDB Cluster 8.0 on October 1, 2019. This is a major step for tuning Percona XtraDB Cluster to be more cloud- and user-friendly. This release combines the updated and feature-rich Galera 4 with substantial improvements made by our development team.

Improvements and New Features

Galera 4, included in Percona XtraDB Cluster 8.0, has many new features. Here is a list of the most essential improvements:

  • Streaming replication supports large transactions
  • New synchronization functions (wsrep_last_seen_gtid, wsrep_last_written_gtid, wsrep_sync_wait_upto_gtid) allow applications to coordinate actions across nodes (a usage sketch follows this list)
  • More granular and improved error logging. wsrep_debug is now a multi-valued variable to assist in controlling the logging, and logging messages have been significantly improved.
  • Some DML and DDL errors on a replicating node can either be ignored or suppressed. Use the wsrep_ignore_apply_errors variable to configure.
  • Multiple system tables help you find out more about the state of the cluster
  • The wsrep infrastructure of Galera 4 is more robust than that of Galera 3. It features faster code execution, better state handling, improved predictability, and improved error handling.
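
As a rough sketch of how the synchronization functions listed above could be used for a read-your-writes pattern (function names are taken from the list above; exact signatures may change before GA, and the GTID value is a placeholder):

-- On the node that performed the write, capture the GTID of the last write:
SELECT WSREP_LAST_WRITTEN_GTID();

-- On another node, wait (up to 5 seconds here) until that GTID has been applied,
-- then run the dependent read:
SELECT WSREP_SYNC_WAIT_UPTO_GTID('<gtid-from-previous-step>', 5);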

Percona XtraDB Cluster 8.0 has been reworked in order to improve security and reliability as well as to provide more information about your cluster:

  • There is no need to create a backup user or maintain the credentials in plain text (a security flaw). An internal SST user with a random password is created for making the backup, and this user is discarded immediately once the backup is done.
  • Percona XtraDB Cluster 8.0 now automatically launches the upgrade as needed (even for minor releases). This avoids manual intervention and simplifies the operation in the cloud.
  • SST (State Snapshot Transfer) can roll back or fix an unwanted action. It is no longer just “a copy only block” but a smart operation that makes the best use of the copy phase.
  • Additional visibility statistics are introduced in order to obtain more information about Galera internal objects. This enables easy tracking of the state of execution and flow control.

Installation

You can only install this release from a tarball; it cannot, therefore, be installed through a package management system such as apt or yum. Note that this release is not ready for use in any production environment.

Percona XtraDB Cluster 8.0 is based on MySQL 8.0.15 and the new Galera 4 library.

Please be aware that this release will not be supported in the future, and as such, neither the upgrade to this release nor the downgrade from higher versions is supported.

This release is also packaged with Percona XtraBackup 8.0.5. All Percona software is open-source and free.

In order to experiment with Percona XtraDB Cluster 8.0 in your environment, download and unpack the tarball for your platform.
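
For example (the tarball name below is only a placeholder; substitute the file you downloaded for your platform):

$ tar xzf Percona-XtraDB-Cluster_8.0.x_Linux.x86_64.tar.gz
$ cd Percona-XtraDB-Cluster_8.0.x_Linux.x86_64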

Note

Be sure to check your system and make sure that the packages Percona XtraDB Cluster 8.0 depends on are installed.

For Debian or Ubuntu:

$ sudo apt-get install -y \
socat libdbd-mysql-perl \
rsync libaio1 libc6 libcurl3 libev4 libgcc1 libgcrypt20 \
libgpg-error0 libssl1.1 libstdc++6 zlib1g libatomic1

For Red Hat Enterprise Linux or CentOS:

$ sudo yum install -y openssl socat  \
procps-ng chkconfig procps-ng coreutils shadow-utils \
grep libaio libev libcurl perl-DBD-MySQL perl-Digest-MD5 \
libgcc rsync libstdc++ libgcrypt libgpg-error zlib glibc openssl-libs

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!
