Jan 10, 2022

MySQL 8.0 Functional Indexes


Working with hundreds of different customers, I often face similar problems around running queries. One very common problem when trying to optimize a database environment is index usage. A query that cannot use an index is usually long-running, consuming more memory or triggering more disk IOPS.

A very common case is when a query has a filter condition on a column that is wrapped in some kind of functional expression: an index on that column cannot be used.

Starting from MySQL 8.0.13, functional indexes are supported. In this article, I'm going to show what they are and how they work.

The Well-Known Problem

As already mentioned, a very common index usage problem arises when you have a filter condition against one or more columns involved in some kind of functional expression.

Let’s see a simple example.

You have a table called products containing the details of your products, including a create_time TIMESTAMP column. If you would like to calculate the average price of your products for a specific month, you could do the following:

mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10;
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+

The query returns the right value, but take a look at the EXPLAIN:

mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 99015
     filtered: 100.00
        Extra: Using where


The query triggers a full scan of the table. Let’s create an index on create_time and check again:

mysql> ALTER TABLE products ADD INDEX(create_time);
Query OK, 0 rows affected (0.71 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 99015
     filtered: 100.00
        Extra: Using where

A full scan again. The index we just created is not effective: whenever an indexed column is wrapped inside a function, the index cannot be used.

The traditional workaround is to rewrite the query so that the indexed column is isolated from the function.

Let’s test the following equivalent query:

mysql> SELECT AVG(price) FROM products WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01';
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+

mysql> EXPLAIN SELECT AVG(price) FROM products WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: range
possible_keys: create_time
          key: create_time
      key_len: 5
          ref: NULL
         rows: 182
     filtered: 100.00
        Extra: Using index condition

Cool, now the index is used. Rewriting the query was the typical suggestion.

Quite a simple solution, but changing the application code is not always possible, for many valid reasons. So, what to do then?

MySQL 8.0 Functional Indexes

Starting from version 8.0.13, MySQL supports functional indexes. Instead of indexing a simple column, you can create the index on the result of any function applied to a column or multiple columns.

Long story short, now you can do the following:

mysql> ALTER TABLE products ADD INDEX((MONTH(create_time)));
Query OK, 0 rows affected (0.74 sec)
Records: 0  Duplicates: 0  Warnings: 0

Be aware of the double parentheses: the syntax requires the expression to be enclosed within its own parentheses to distinguish it from a plain column or column prefix.

Indeed the following returns an error:

mysql> ALTER TABLE products ADD INDEX(MONTH(create_time));
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'create_time))' at line 1

Let's now check our original query again and see what happens in the EXPLAIN:

mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10;
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+


mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ref
possible_keys: functional_index
          key: functional_index
      key_len: 5
          ref: const
         rows: 182
     filtered: 100.00
        Extra: NULL

The query no longer does a full scan and runs faster. The functional_index has been used, with only 182 rows examined. Awesome.

Thanks to the functional index, we are no longer forced to rewrite the query.

Which Functional Indexes are Permitted

We have seen an example involving a simple function applied to a column, but you can create more complex functional indexes.

A functional index may contain any kind of expression, not only a single function. The following patterns are valid functional indexes:

INDEX( ( col1 + col2 ) )
INDEX( ( FUNC(col1) + col2 - col3 ) )

You can use ASC or DESC as well:

INDEX( ( MONTH(col1) ) DESC )

You can have multiple functional parts, each one included in parentheses:

INDEX( ( col1 + col2 ), ( FUNC(col2) ) )

You can mix functional with nonfunctional parts:

INDEX( (FUNC(col1)), col2, (col2 + col3), col4 )
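For instance, a hedged sketch against the products table used earlier (the index name is arbitrary and chosen only for illustration), mixing a functional key part with a plain column:

ALTER TABLE products ADD INDEX idx_month_price ((MONTH(create_time)), price);

Note that the functional part still needs its own set of parentheses, while the plain column does not.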

There are also limitations you should be aware of:

  • A functional key part cannot consist of just a column name. The following, for example, is not permitted:
    INDEX( (col1), (col2) )
  • The primary key cannot include a functional key part
  • A foreign key cannot include a functional key part
  • SPATIAL and FULLTEXT indexes cannot include functional key parts
  • A functional key part cannot refer to a column prefix

Finally, remember that a functional index is useful only for queries that use the exact same expression. An index created on plain (nonfunctional) key parts, by contrast, can serve many different queries.

For example, the following conditions cannot rely on the functional index we have created:

WHERE YEAR(create_time) = 2019

WHERE create_time > '2019-10-01'

WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01'

WHERE MONTH(create_time+INTERVAL 1 YEAR)

All these will trigger a full scan.
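If you also needed to filter by year, for example, a separate functional index matching that exact expression would be required. A hedged sketch, purely illustrative and not part of the original test:

ALTER TABLE products ADD INDEX ((YEAR(create_time)));

After that, a WHERE YEAR(create_time) = 2019 filter could use the new index, while the range conditions on create_time would still need a plain index on that column.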

Functional Index Internals

The functional indexes are implemented as hidden virtual generated columns. For this reason, you can emulate the same behavior even on MySQL 5.7 by explicitly creating the virtual column. We can test this, starting by dropping the indexes we have created so far.

mysql> SHOW CREATE TABLE products\G
*************************** 1. row ***************************
       Table: products
Create Table: CREATE TABLE `products` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `description` longtext,
  `price` decimal(8,2) DEFAULT NULL,
  `create_time` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `create_time` (`create_time`),
  KEY `functional_index` ((month(`create_time`)))
) ENGINE=InnoDB AUTO_INCREMENT=149960 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci


mysql> ALTER TABLE products DROP INDEX `create_time`, DROP INDEX `functional_index`;
Query OK, 0 rows affected (0.03 sec)

We can try now to create the virtual generated column:

mysql> ALTER TABLE products ADD COLUMN create_month TINYINT GENERATED ALWAYS AS (MONTH(create_time)) VIRTUAL;
Query OK, 0 rows affected (0.04 sec)

Create the index on the virtual column:

mysql> ALTER TABLE products ADD INDEX(create_month);
Query OK, 0 rows affected (0.55 sec)


mysql> SHOW CREATE TABLE products\G
*************************** 1. row ***************************
       Table: products
Create Table: CREATE TABLE `products` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `description` longtext,
  `price` decimal(8,2) DEFAULT NULL,
  `create_time` timestamp NULL DEFAULT NULL,
  `create_month` tinyint GENERATED ALWAYS AS (month(`create_time`)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `create_month` (`create_month`)
) ENGINE=InnoDB AUTO_INCREMENT=149960 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci


We can now try our original query. We expect to see the same behavior as the functional index.

mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10;
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+

mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ref
possible_keys: create_month
          key: create_month
      key_len: 2
          ref: const
         rows: 182
     filtered: 100.00
        Extra: NULL


Indeed, the behavior is the same. The index on the virtual column can be used and the query is optimized.

The good news is that you can use this workaround to emulate a functional index even on 5.7, getting the same benefits. The advantage of MySQL 8.0 is that it is completely transparent; there is no need to create the virtual column yourself.

Since the functional index is implemented as a hidden virtual column, no additional space is needed for the data; only the index itself adds to the size of the table.

By the way, this is the same technique used for creating indexes on JSON documents’ fields.
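As a hedged illustration (the orders table, details column, and JSON path are hypothetical, not part of this post's example), a functional index on a JSON attribute typically needs a CAST so the expression has an indexable type:

ALTER TABLE orders ADD INDEX ((CAST(details->>'$.status' AS CHAR(20)) COLLATE utf8mb4_bin));

The COLLATE clause matches the collation returned by the ->> operator so the optimizer can actually use the index; under the hood, this again materializes a hidden virtual column with the extracted value.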

Conclusion

Functional index support is an interesting improvement in MySQL 8.0. Some queries that previously required rewriting to be optimized no longer need it. Just remember that only queries using the exact same filter expression can rely on the functional index; for other search patterns you need to create additional regular or functional indexes.

The same feature can be implemented on MySQL 5.7 with the explicit creation of a virtual generated column and the index.

For more detailed information, read the following page:

https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-functional-key-parts

Jan 10, 2022

Baby Racing Car Seat From Delta Children

You read it right! Delta Children, the company known and loved for their eye-catching baby chairs and accessories, has created something for the little ones who were born with the speed in their veins.

Delta Children Sit N Play Portable Activity Seat for Babies is what you might call a sporty addition to the baby gear family. It is safe, comfortable, sturdy, and perfectly adjustable to any surface.

About Delta Children Portable Activity Chair

The Delta Children Sit N’ Play Portable Activity Seat will help your little one sit, interact, and play at home and on the road. The sturdy upright seat allows your baby to enjoy and interact with the world completely, while the beautiful design will look great in any room. The portable play seat is easy to fold for storage and take along, while the non-skid bottom will keep it secure on nearly any surface.

Your little one will love playing with the engaging race car-themed toys that help increase gross motor skills, and you’ll love how easy it is to clean–just remove the seat pad and pop it in the washing machine. The rest of the activity seat features water-and-stain-resistant fabric that’s easy to wipe clean!

This infant floor seat is perfect for traveling because of its innovative zippered design and convenient carry handle, which unzips to fold flat quickly.

Why choose a baby racing car seat?

Keeping your baby content in one spot is one of the most challenging tasks when it comes to caring for them. Babies love to move around and explore; they’re constantly crawling everywhere, looking for something they can pick up in their little hands, something that will bring them joy and entertainment.

This is where the Delta Children Sit N’ Play Portable Activity Seat comes in to help you. This product will have your child content on any surface, whether it’s at home or outside, on the porch, for example. You can place it anywhere, and your baby will be thrilled playing with the interactive toys that are included.

It is also vital to consider kids’ interests as soon as possible. You don’t want to wait until they’re older and have become bored with the toys you chose for them during their infantile stage. Let them play with what they enjoy, let them be kids while they still can, and once they get a bit older, things will start getting complicated because of what is expected from them in terms of behavior and maturity.

Quality kids’ chairs at your reach

Besides being a practical solution for your child’s entertainment needs, the Delta Children Sit N’ Play Portable Activity Seat is also very affordable. Now you can have a play seat for your infant that won’t break the bank as it costs just as much as other products on the market today.

Delta Children products were mentioned on ComfyBummy numerous times. You can, for example, see reviews for their amazing kids’ Frozen chairs or explore our guide to the Delta Children’s products.

You’ll never go wrong having Delta Children around. Their products are some of the most durable ones you’ll find, not only when it comes to kids’ furniture but also in terms of toys. Their quality is extremely high while their prices are fair enough that everyone can buy their products. You can find them on Amazon, where you can browse their many items and choose the one you like most.


Jan 07, 2022

Configure wiredTiger cacheSize Inside Percona Distribution for MongoDB Kubernetes Operator


Nowadays we are seeing a lot of customers starting to use our Percona Distribution for MongoDB Kubernetes Operator. The Percona Kubernetes Operators are based on best practices for configuring a Percona Server for MongoDB replica set or sharded cluster. One of the main tuning knobs in MongoDB is the WiredTiger cache: its setting defines the cache used by the storage engine, and we can size it based on our load.

In this blog post, we will see how to define the containers' memory resources and set the WiredTiger cache for the shard replica set to improve the performance of the sharded cluster.

The Necessity of WT cache

The parameter storage.wiredTiger.engineConfig.cacheSizeGB limits the size of the WiredTiger internal cache. The operating system will use the available free memory for filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and file system cache. To accommodate the additional consumers of RAM, you may have to set WiredTiger’s internal cache size properly.

Starting from MongoDB 3.4, the default WiredTiger internal cache size is the larger of either:

50% of (RAM - 1 GB), or 256 MB.

For example, on a system with a total of 4GB of RAM the WiredTiger cache will use 1.5GB of RAM (0.5 * (4 GB – 1 GB) = 1.5 GB). Conversely, a system with a total of 1.25 GB of RAM will allocate 256 MB to the WiredTiger cache because that is more than half of the total RAM minus one gigabyte (0.5 * (1.25 GB – 1 GB) = 128 MB < 256 MB).

WT cacheSize in Kubernetes Operator

The MongoDB WiredTiger cache size can be tuned with the parameter storage.wiredTiger.engineConfig.cacheSizeRatio, whose default value is 0.5. As explained above, if the memory limit allocated to the container is too low, the WT cache is set to 256MB; otherwise it is calculated as per the formula.

Prior to PSMDB Operator 1.9.0, cacheSizeRatio could be tuned under the sharding section of the cr.yaml file. This is deprecated from v1.9.0 and unavailable from v1.12.0 onwards, so you have to use the cacheSizeRatio parameter available under the replsets configuration instead. The main thing to check before changing the cacheSize is that the memory limit allocated in the resources section is large enough for the cache you are requesting, i.e. the section below limiting the memory:

     resources:
       limits:
         cpu: "300m"
         memory: "0.5G"
       requests:
         cpu: "300m"
         memory: "0.5G"


https://github.com/percona/percona-server-mongodb-operator/blob/main/pkg/psmdb/container.go#L307

This is the source code that calculates the WiredTiger cache size from mongod.storage.wiredTiger.engineConfig.cacheSizeRatio:

// In normal situations WiredTiger does this default-sizing correctly but under Docker
// containers WiredTiger fails to detect the memory limit of the Docker container. We
// explicitly set the WiredTiger cache size to fix this.
//
// https://docs.mongodb.com/manual/reference/configuration-options/#storage.wiredTiger.engineConfig.cacheSizeGB//

func getWiredTigerCacheSizeGB(resourceList corev1.ResourceList, cacheRatio float64, subtract1GB bool) float64 {
 maxMemory := resourceList[corev1.ResourceMemory]
 var size float64
 if subtract1GB {
  size = math.Floor(cacheRatio * float64(maxMemory.Value()-gigaByte))
 } else {
  size = math.Floor(cacheRatio * float64(maxMemory.Value()))
 }
 sizeGB := size / float64(gigaByte)
 if sizeGB < minWiredTigerCacheSizeGB {
  sizeGB = minWiredTigerCacheSizeGB
 }
 return sizeGB
}


Changing the cacheSizeRatio

For this test, we deployed the PSMDB Operator on GCP. You can refer to the steps here: https://www.percona.com/doc/kubernetes-operator-for-psmongodb/gke.html. With the latest operator, v1.11.0, the sharded cluster was started with a shard replica set and a config server replica set, along with the mongos pods.

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-cluster-name-cfg-0 2/2 Running 0 4m9s
my-cluster-name-cfg-1 2/2 Running 0 2m55s
my-cluster-name-cfg-2 2/2 Running 1 111s
my-cluster-name-mongos-758f9fb44-d4hnh 1/1 Running 0 99s
my-cluster-name-mongos-758f9fb44-d5wfm 1/1 Running 0 99s
my-cluster-name-mongos-758f9fb44-wmvkx 1/1 Running 0 99s
my-cluster-name-rs0-0 2/2 Running 0 4m7s
my-cluster-name-rs0-1 2/2 Running 0 2m55s
my-cluster-name-rs0-2 2/2 Running 0 117s
percona-server-mongodb-operator-58c459565b-fc6k8 1/1 Running 0 5m45s

Now log in to the shard and check the default memory allocated to the container and to the mongod instance. Below, the host has 15GB of memory, but the memory limit for this container is only 476MB:

rs0:PRIMARY> db.hostInfo()
{
"system" : {
"currentTime" : ISODate("2021-12-30T07:16:59.441Z"),
"hostname" : "my-cluster-name-rs0-0",
"cpuAddrSize" : 64,
"memSizeMB" : NumberLong(15006),
"memLimitMB" : NumberLong(476),
"numCores" : 4,
"cpuArch" : "x86_64",
"numaEnabled" : false
},
"os" : {
"type" : "Linux",
"name" : "Red Hat Enterprise Linux release 8.4 (Ootpa)",
"version" : "Kernel 5.4.144+"
},
"extra" : {
"versionString" : "Linux version 5.4.144+ (builder@7d732a1aec13) (Chromium OS 12.0_pre408248_p20201125-r7 clang version 12.0.0 (/var/tmp/portage/sys-devel/llvm-12.0_pre408248_p20201125-r7/work/llvm-12.0_pre408248_p20201125/clang f402e682d0ef5598eeffc9a21a691b03e602ff58)) #1 SMP Sat Sep 25 09:56:01 PDT 2021",
"libcVersion" : "2.28",
"kernelVersion" : "5.4.144+",
"cpuFrequencyMHz" : "2000.164",
"cpuFeatures" : "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities",
"pageSize" : NumberLong(4096),
"numPages" : 3841723,
"maxOpenFiles" : 1048576,
"physicalCores" : 2,
"mountInfo" : [
..
..

The WiredTiger cache size allocated on the shard, in MB, is as follows:

rs0:PRIMARY> db.serverStatus().wiredTiger.cache["maximum bytes configured"]/1024/1024
256

A cache size of 256MB is too low for a real environment, so let's see how to tune both the memory limit and the WT cacheSize. You can use the cacheSizeRatio parameter to set the WT cache ratio (out of 1) and the resources memory limit to set the memory allocated to the container. To do this, edit the cr.yaml file under the deploy directory of the operator. From PSMDB Operator v1.9.0, editing the cacheSizeRatio parameter under the mongod section is deprecated, so set the WT cache ratio with the cacheSizeRatio parameter under the "replsets" section, and set the memory with the resources limits. Here we set 3G for the container and a cacheSizeRatio of 0.8 (80%):

deploy/cr.yaml:58

46 configuration: |
47 # operationProfiling:
48 # mode: slowOp
49 # systemLog:
50 # verbosity: 1
51 storage:
52 engine: wiredTiger
53 # inMemory:
54 # engineConfig:
55 # inMemorySizeRatio: 0.9
56 wiredTiger:
57 engineConfig:
58 cacheSizeRatio: 0.8


deploy/cr.yaml:229-232:

226 resources:
227 limits:
228 cpu: "300m"
229 memory: "3G"
230 requests:
231 cpu: "300m"
232 memory: "3G"


Apply the new cr.yaml:

# kubectl apply -f deploy/cr.yaml
perconaservermongodb.psmdb.percona.com/my-cluster-name configured

The shard pods are re-allocated and you can check the progress as follows:

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-cluster-name-cfg-0 2/2 Running 0 36m
my-cluster-name-cfg-1 2/2 Running 0 35m
my-cluster-name-cfg-2 2/2 Running 1 34m
my-cluster-name-mongos-758f9fb44-d4hnh 1/1 Running 0 34m
my-cluster-name-mongos-758f9fb44-d5wfm 1/1 Running 0 34m
my-cluster-name-mongos-758f9fb44-wmvkx 1/1 Running 0 34m
my-cluster-name-rs0-0 0/2 Init:0/1 0 13s
my-cluster-name-rs0-1 2/2 Running 0 60s
my-cluster-name-rs0-2 2/2 Running 0 8m33s
percona-server-mongodb-operator-58c459565b-fc6k8 1/1 Running 0 38m

Now check the new settings of WT cache as follows:

rs0:PRIMARY> db.hostInfo().system
{
"currentTime" : ISODate("2021-12-30T08:37:38.790Z"),
"hostname" : "my-cluster-name-rs0-1",
"cpuAddrSize" : 64,
"memSizeMB" : NumberLong(15006),
"memLimitMB" : NumberLong(2861),
"numCores" : 4,
"cpuArch" : "x86_64",
"numaEnabled" : false
}
rs0:PRIMARY> 
rs0:PRIMARY> 
rs0:PRIMARY> db.serverStatus().wiredTiger.cache["maximum bytes configured"]/1024/1024
1474

Here, the memory available for the WT cache is calculated roughly as follows (the memory limit should be more than 1GB, otherwise 256MB is allocated by default):

(Memory limit - 1GB) * cacheSizeRatio

(2861MB - 1024MB) * 0.8 ≈ 1470MB

which roughly matches the 1474MB reported above (the small difference comes from the operator rounding the value it passes to --wiredTigerCacheSizeGB).


NOTE:

Up to PSMDB Operator v1.10.0, the operator applies a cacheSizeRatio change only if resources.limits.cpu is also set. This is a bug, and it was fixed in v1.11.0 – see https://jira.percona.com/browse/K8SPSMDB-603. So if you are on an older version, don't be surprised, and make sure resources.limits.cpu is set as well.

https://github.com/percona/percona-server-mongodb-operator/blob/v1.10.0/pkg/psmdb/container.go#L194

if limit, ok := resources.Limits[corev1.ResourceCPU]; ok && !limit.IsZero() {
args = append(args, fmt.Sprintf(
"--wiredTigerCacheSizeGB=%.2f",
getWiredTigerCacheSizeGB(resources.Limits, replset.Storage.WiredTiger.EngineConfig.CacheSizeRatio, true),
))
}

From v1.11.0:
https://github.com/percona/percona-server-mongodb-operator/blob/v1.11.0/pkg/psmdb/container.go#L194

if limit, ok := resources.Limits[corev1.ResourceMemory]; ok && !limit.IsZero() {
    args = append(args, fmt.Sprintf(
       "--wiredTigerCacheSizeGB=%.2f",
       getWiredTigerCacheSizeGB(resources.Limits, replset.Storage.WiredTiger.EngineConfig.CacheSizeRatio, true),
))
}


Conclusion

Based on the application load, you will need to set the WT cacheSize accordingly for better performance. You can use the methods above to tune the cache size for the shard replica set in the PSMDB Operator.

Reference Links:

https://www.percona.com/doc/kubernetes-operator-for-psmongodb/operator.html

https://www.percona.com/doc/kubernetes-operator-for-psmongodb/gke.html

https://www.percona.com/doc/kubernetes-operator-for-psmongodb/operator.html#mongod-storage-wiredtiger-engineconfig-cachesizeratio

MongoDB 101: How to Tune Your MongoDB Configuration After Upgrading to More Memory

Jan 05, 2022

In Application and Database Design, Small Things Can Have a Big Impact


With modern application design, systems are becoming more diverse and varied, with more components than ever before. Developers are often forced to become master chefs, adding ingredients from dozens of different technologies and blending them together to create something tasty and amazing. But with so many different ingredients, it is often difficult to understand how the individual ingredients interact with each other. The more diverse the application, the more likely it is that some seemingly insignificant combination of technology may cause cascading effects.

Many people I talk to have hundreds if not thousands of different libraries, APIs, components, and services making up the systems they support. In this type of environment, it is very difficult to know what small thing could add up to something much bigger. Look at some of the more recent big cloud or application outages: they often have their root cause in something rather small.

Street Fight: Python 3.10 -vs- 3.9.7

Let me give you an example of something I ran across recently. I noticed that two different nodes running the same hardware and application code were performing drastically differently from one another. The app server was running close to 100% CPU on one server while the other was around 40%. Each had the same workload and the same database, etc. It turned out that one server was using Python 3.10.0 and the other was running 3.9.7. Combined with the MySQL Connector/Python, this led to almost a 50% reduction in database throughput (the 3.10.0 release saw a regression). However, this performance change was not seen in my PostgreSQL testing or when testing the mysqlclient connector. It happened only when running the pure Python version of the MySQL connector. See the results below:

Python 3.10 -vs- 3.9.7

Note: This workload was using a warmed buffer pool, with the workload running for over 24 hours before the test. I cycled the application server when changing either the version of Python or the MySQL library. These tests are repeatable regardless of the length of the run. All data fits into memory here. I am not trying to make an authoritative statement on a specific technology, merely pointing out the complexity of layers of technology.

Looking at this purely from the user perspective, this particular benchmark simulates certain types of users.  Let’s look at the number of users who could complete their actions per second.  I will also add another pure python MySQL driver to the mix.

Note: There appears to be a significant regression in 3.10. The application server was significantly busier when using Python 3.10 and one of the pure python drivers than when running the same test in 3.9 or earlier.

The main difference between the MySQL Connector and mysqlclient is that mysqlclient uses the C libmysqlclient library. Oddly, the official MySQL Connector says it should switch between the pure Python and C versions if available, but I was not seeing that behavior (so I have to look into it). This resulted in the page load time for the app going from 0.03 seconds in Python 3.9.7 up to 0.05 seconds in Python 3.10.0. However, the key thing I wanted to highlight is that sometimes seemingly small or insignificant changes can lead to a drastic difference in performance and stability. You could be running along fine for months or even years before something upgrades to something that you would not think would drastically impact performance.

Can This Be Fixed With Better Testing?

You may think this is a poster child for testing before upgrading components, and while that is a requirement, it won’t necessarily prevent this type of issue. While technology combinations, upgrades, and releases can often have some odd side effects, oftentimes issues they introduce remain hidden and don’t manifest until some outside influence pops up. Note: These generally happen at the worst time possible, like during a significant marketing campaign, event, etc. The most common way I see nasty bugs get exposed is not through a release or in testing but often with a change in workload. Workload changes can hide or raise bottlenecks at the worst times. Let’s take a look at the above combination of Python version and different connectors with a different workload:

Here the combination of reporting and read/write workload push all the app nodes and database nodes to the redline. These look fairly similar in terms of performance, but the workload is hiding the above issues I mentioned. A system pushed to the red will behave differently than it will in the real world. If you ended up testing upgrading python on your app servers to 3.10.0 by pushing your systems to the max, you may see the above small regression as within acceptable limits. In reality, however, the upgrade could net you seeing a 50% decrease in throughput when moved to production.

Do We Care?

Depending on how your application is built, many people won't notice the above-mentioned performance regression after the upgrade happens. First, most people do not run their servers even close to 100% load, so adding more load on the boxes may not immediately impact their users' performance. Adding 0.02 seconds to a page load may be imperceptible unless the system is under heavy load (which would increase that load time further). The practical impact is that you reach the point where you need to either add more nodes or upgrade your instances sooner.

Second, scaling application nodes automatically is almost a requirement in most modern cloud-native environments. Reaching a point where you need to add more nodes and more processing power will come with increases in users on your application, so it is easily explained away.

Suppose users won't immediately notice and the system will automatically expand as needed (preventing you from knowing or getting involved). In that case, do we, or should we, care about adding more nodes or servers? Adding nodes may be cheap, but it is not free.

First, there is a direct cost to you. Take your hosting costs for your application servers and double them in the case above. What is that? $10K, $100K, $1M a year? That is money that is wasted. Look no further than the recent news lamenting the ever-increasing costs of the cloud.

Second, there is a bigger cost that comes with complexity. Observability is such a huge topic because everything we end up doing in modern environments is done in mass. The more nodes and servers you have the more potential exists for issues or one node behaving badly. While our goal is to create a system where everything is replaceable and can be torn down and rebuilt to overcome problems, this is often not the reality. Instead, we end up replicating bad code, underlying bottlenecks, and making a single problem a problem at 100x the scale.

We need to care. Application workload is a living entity that grows, shrinks, and expands with users. While modern systems need to scale quickly up and down to meet the demand, that should not preclude us from having to look out for hidden issues and bottlenecks. It is vitally important that we understand our applications workloads and look for deviations in the normal patterns.  We need to ask why something changed and dig in to find the answer.  Just because we can build automation to mitigate the problem does not mean we should get complacent and lazy about fixing and optimizing our systems.

Jan 05, 2022

Comparing AMD EPYC Performance with Intel Xeon in GCP


Recently we were asked to check the performance of the new family of AMD EPYC processors when using MySQL in Google Cloud Virtual Machines. This was motivated by a user running MySQL on the N1 machine family who was considering an upgrade to the N2D generation, given the potential cost savings of the new AMD family.

The idea behind the analysis is to do a side-by-side comparison of performance considering some factors: 

  • EPYC processors have demonstrated better performance in purely CPU-based operations according to published benchmarks. 
  • EPYC platform has lower costs compared to the Intel Xeon platform. 

The goal of this analysis is to check whether the cost reduction from upgrading from N1 to N2D is worth the change without suffering performance problems, and possibly even allows reducing the machine size from the current 64 cores (N1 n1-highmem-64 – Intel Haswell) to either 64 N2D cores (n2d-highmem-64 – AMD Rome) or even 48 cores (n2d-highmem-48 – AMD Rome). To provide some extra context, we included N2 (the new generation of Intel machines) in the analysis.

In order to do a purely CPU performance comparison we created 4 different VMs:

NAME: n1-64
MACHINE_TYPE: n1-highmem-64
Intel Haswell – Xeon 2.30GHz
*This VM corresponds to the type we currently use in production.

NAME: n2-64
MACHINE_TYPE: n2-highmem-64
Intel Cascade Lake – Xeon 2.80GHz

NAME: n2d-48
MACHINE_TYPE: n2d-highmem-48
AMD Epyc Rome – 2.25Ghz

NAME: n2d-64
MACHINE_TYPE: n2d-highmem-64
AMD Epyc Rome – 2.25Ghz

For the analysis, we used MySQL Community Server 5.7.35-log and this is the basic configuration:

[mysqld]
datadir   = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
log-error   = /var/lib/mysql/mysqld.err
pid-file = /var/run/mysqld/mysqld.pid
server_id                       = 100
log_bin
binlog_format                   = ROW
sync_binlog                     = 1000
expire_logs_days                = 2
skip_name_resolve

innodb_buffer_pool_size         = 350G
innodb_buffer_pool_instances    = 32
innodb_concurrency_tickets      = 5000
innodb_thread_concurrency       = 128
innodb_write_io_threads         = 16
innodb_read_io_threads          = 16
innodb_flush_log_at_trx_commit  = 1
innodb_flush_method             = O_DIRECT
innodb_log_file_size            = 8G
innodb_file_per_table           = 1
innodb_autoinc_lock_mode        = 2
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_load_at_startup  = 1

table_open_cache                = 5000
thread_cache_size               = 2000
query_cache_size                = 0
query_cache_type                = 0

In all cases, we attached a 1TB balanced persistent disk so we would get enough IO performance for the tests. We wanted to normalize all the specs so we could focus on CPU performance, so don't pay too much attention to the potential for improving IO performance.

The analysis is based on the sysbench OLTP read-only workload with an in-memory dataset; the reason for this is that we want to generate traffic that saturates the CPU while not being affected by IO or memory.

The benchmark approach was also simple: we executed RO OLTP work for 16, 32, 64, 128, and 256 threads, with a one-minute wait between runs. Scripts and results from the tests can be found here.

Let's jump into the analysis. These are the numbers of queries the instances are capable of running:

MySQL Queries

The maximum TPS by instance type and number of threads:

Threads/Instance   N1-64   N2-64   N2D-48   N2D-64
16                 164k    230k    144k     155k
32                 265k    347k    252k     268k
64                 415k    598k    345k     439k
128                398k    591k    335k     444k
256                381k    554k    328k     433k

Some observations: 

  • In all cases we reached the maximum TPS at 64 threads; this is somewhat expected, since up to that point we are not forcing extra CPU context switches.
  • Roughly, we get a maximum of 598k TPS on n2-highmem-64 and 444k TPS on n2d-highmem-64, the bigger instance types. The Intel-based architecture outperforms AMD by about 35%.
  • Maximum TPS is reached at 64 threads, which is expected considering the number of CPU threads we can use in parallel.
  • While n1-highmem-64 (Intel Xeon) and n2d-highmem-48 (AMD EPYC) seem to start suffering performance issues when the number of threads exceeds the number of cores, the bigger instances running with 64 cores are able to sustain the throughput a bit better; they only start to be impacted when we reach 4x the number of CPU cores.

Let’s have a look at the CPU utilization on each node:

CPU utilization on each node

Additional observations: 

  • n1-highmem-64 and n2d-highmem-48 reach 100% utilization with 64 threads running.
  • With 64 threads running, n2-highmem-64 reaches 100% utilization while n2d-highmem-64 is still below that. Intel provides better throughput overall, probably due to its faster CPU clock (2.8GHz vs. 2.25GHz).
  • For 128 and 256 threads, all CPUs show similar utilization.

For the sake of the analysis, these are the estimated costs of each of the machines used (at the moment of writing this post):
n1-highmem-64 $2,035.49/month = $0.000785297/second
n2-highmem-64 $2,549.39/month = $0.000983561/second
n2d-highmem-48 $1,698.54/month = $0.000655301/second
n2d-highmem-64 $2,231.06/month = $0.000860748/second

At peak TPS, the costs above roughly translate to:
n1-highmem-64 costs are $0.0000000019/trx
n2-highmem-64 costs are $0.0000000016/trx
n2d-highmem-48 costs are $0.0000000019/trx
n2d-highmem-64 costs are $0.0000000019/trx

Conclusions

While this is not an exhaustive analysis of all the implications of CPU performance for MySQL workloads, it gives us a very good understanding of cost versus performance.

  • The n1 family, currently used in production, shows very similar performance to the n2d family (AMD) when running with the same number of cores. This changes a lot when we move to the n2 family (Intel), which outperforms all other instances.
  • While the cost cut from moving to n2d-highmem-48 would represent ~$4k/year, the performance penalty is close to 20%.
  • Comparing the cost per transaction at peak load, we can see that n2-64 and n2d-64 are pretty much the same, but n2-64 gives us 35% more throughput; this is definitely something to consider if we plan to squeeze the CPU power.
  • If the plan is to move to the new generation, then n2d-highmem-64 is a very good choice to balance performance and costs, but n2-highmem-64 will give much better performance per dollar spent.

Jan 04, 2022

Percona Server for MySQL Encryption Options and Choices


Security will always be a main focal point when it comes to a company's data. A common question I get from clients is, "how do I enable encryption?" Like every good consulting answer, it depends on what you are trying to encrypt. This post is a high-level summary of the different options available for encryption in Percona Server for MySQL.

Different certifications require different levels of encryption. For example, PCI requires encryption of data both at rest and in transit. Here are the main facets of encryption for MySQL:

  • Data at Rest
    • Full disk encryption (at the OS level)
    • Transparent Data Encryption – TDE
    • Column/field-level encryption
  • Data in Transit
    • TLS Connections

Data at Rest

Data at rest is frequently the most asked about part of encryption. Data at rest encryption has multiple components, but at the core is simply ensuring that the data is encrypted at some level when stored. Here are the primary ways we can look at the encryption of data at rest.

Full Disk Encryption (FDE)

This is the easiest and most portable method of encrypting data at rest. When using full disk encryption, the main goal is to protect the hard drives in the event they are compromised. If a disk is removed from the server or the server is removed from a rack, the disk isn’t readable without the encryption key.

This can be managed in different ways, but the infrastructure team generally handles it. Frequently, enterprises already have disk encryption as part of the infrastructure stack. This makes FDE a relatively easy option for data at rest encryption. It also has the advantage of being portable. Regardless of which database technology you use, the encryption is managed at the server level.

The main disadvantage of FDE is that when the server is running, and the disk is mounted, all data is readable. It offers no protection against an attack on a running server once mounted.

Transparent Data Encryption (TDE)

Moving up the chain, the next option for data at rest encryption is Transparent Data Encryption (TDE). In contrast to FDE, this method encrypts the actual InnoDB data and log files. The main difference with database TDE is that the encryption is managed through the database, not at the server level. With this approach, the data and log files are encrypted on disk by the database. As data is read by MySQL/queries, the encrypted pages are read from disk and decrypted to be loaded into InnoDB’s buffer pool for execution.

For this method, the encryption keys are managed either through local files or a remote KMS (such as Hashicorp Vault) with the keyring_plugin. While this approach helps prevent any OS user from simply copying data files, the decrypted data does reside in memory which could be susceptible to a clever hacker. We must rely on OS-level memory protections for further assurance. It also adds a level of complexity for key management and backups that is now shifted to the DBA team.
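As a rough, hedged sketch of what TDE looks like in practice once a keyring plugin (file- or Vault-based) is loaded, encryption is toggled per table; the table name below is just an illustration:

ALTER TABLE customers ENCRYPTION='Y';

SELECT TABLE_SCHEMA, TABLE_NAME, CREATE_OPTIONS
  FROM information_schema.TABLES
 WHERE CREATE_OPTIONS LIKE '%ENCRYPTION%';

The second statement is a common way to list which tables are currently flagged as encrypted.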

Column Level Encryption

While the prior methods of at-rest encryption can help to meet various compliance requirements, both are limited when it comes to a running system. In either case, if a running system is compromised, the data stored is fully readable. Column level encryption works to protect the data in a running system without a key. Without a key, the data in the encrypted column is unreadable.

While this method protects selected data in a running system, it often requires application-level changes. Inserts are done with a specific encryption function (AES_ENCRYPT in MySQL, for example). To read the data, AES_DECRYPT with the specified key is required. The main risk with this approach is sending the plaintext values as part of the query. This can be sniffed if not using TLS or potentially leaked through log files. The better approach is to encrypt the data in the application BEFORE sending it to MySQL to ensure no plaintext is ever passed between systems.
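A minimal, hedged sketch of the function-based approach described above (table, column, and key names are illustrative; in a real deployment the key would come from the application or a secrets manager, not a string literal in the query):

-- Write: store the ciphertext in a binary column
INSERT INTO customers (id, ssn_enc)
VALUES (1, AES_ENCRYPT('123-45-6789', 'example_key'));

-- Read: decrypt with the same key
SELECT CAST(AES_DECRYPT(ssn_enc, 'example_key') AS CHAR) AS ssn
  FROM customers
 WHERE id = 1;

The encrypted column (ssn_enc here) would typically be VARBINARY or BLOB, since AES_ENCRYPT returns binary data.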

In some cases, you can use a shared key for the entire application. Other approaches would be to use an envelope method and store a unique key alongside each encrypted value (protected by a separate master key).

Either way, it is important to understand one of the primary downsides to this approach – indexes and sort order can and will be impacted. For example, if you are encrypting the SSN number, you won’t be able to sort by SSN within MySQL. You would be able to look up a row using the SSN number but would need to pass the encrypted value.

Data in Transit

Now that we've discussed the different types of data-at-rest encryption, it is important to encrypt traffic to and from the database. Connecting to the server via TLS ensures that any sensitive data sent to or from the server is encrypted. This can prevent data from leaking over the wire or via man-in-the-middle attacks.

This is a straightforward way to secure communication, and when combined with some at-rest encryption, serves to check a few more boxes towards various compliances.
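As a hedged example of how this can be enforced and verified at the database level (the account name is illustrative):

-- Require TLS for a specific account
ALTER USER 'app'@'%' REQUIRE SSL;

-- From a client session, confirm the connection is actually encrypted
SHOW SESSION STATUS LIKE 'Ssl_cipher';

A non-empty Ssl_cipher value in the client session confirms the current connection is using TLS.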

Summary

Overall, there are several aspects of encryption in MySQL. This makes it possible to meet many common compliance requirements for different types of regulations. Security is a critical piece of the database tier, and these discussions are needed across teams in an organization. Ensuring that security, infrastructure, and the database team are on the same page is essential, especially during the design phase. Let our Professional Services team help you implement the approach that is best suited for your requirements – we are here to help!


Jan 04, 2022

Deuter Kid Comfort Pro – Child Carrier You Can Trust

When you’re out for a hike with your kids, the most important thing about choosing the right child carrier is to make sure it’s going to be comfortable not only for yourself but also for your child. I was more than happy to test a child carrier from a reputable German company Deuter which has been designing and building top-quality backpacks and child carriers since 1938.

The popularity of Deuter products does not come by accident. Their backpacks and carriers are stylish, comfortable, practical, and durable. The customers love the great mix of reliable materials, functional design, and good looks. Deuter uses quality products and modern designs to meet all of your child’s comfort needs.

This product has a high rating of 4.8 stars with over 200 reviews on Amazon. There is no doubt that this child carrier is one of the best available on the market at the moment, but is it worth every cent you pay for it? It certainly comes packed with features, so let's look at them now.

Deuter Kid Comfort Pro Features

  • Aircomfort back system
  • The large VariFlex ECL hip fins are energy-efficient and can be adjusted for maximum comfort.
  • With VariSlide back-length with the wide-ranging adjustment, you may adjust the child carrier to fit either parent comfortably.
  • The Pull-Forward system construction makes it simple to adjust the hip belt even when you’re carrying a lot of weight.
  • The height-adjustable child’s seat has a variable cushion width to promote a healthy sitting posture.
  • Integrated safety harness – keeps kids safe and sound while you’re biking
  • 3 outer pockets for snacks, toys, and more
  • The durable aluminum frame with a sturdy kickstand is tip-resistant, which is very useful when loading the child.
  • Permanently integrated sunroof and a mirror for a rearview. Child Carrier comes with a backpack that may be used separately.

Deuter Kid Comfort Pro child carrier is suitable for children from 8 months old up to 45 pounds or 4 years of age. It has a ventilated back system that allows air to circulate, which is excellent news for your child’s comfort as not many offer this feature. It also comes with lumbar support and elastic on the hip belt, making it exceptionally comfortable for you, especially when carrying your child for more extended periods.

This child carrier comes with a sunshade, and we all know how precious this can be when you take your kid out, and the elements (or other people) get in their eyes. The sun can also heat up quite quickly, and it will make them hot and fussy, so if you plan on taking your child out for a long time, make sure you take something along to protect them.

It’s also really lightweight (8 lbs 5 oz), even considering all its features! It can be easily stored into the boot of your car or just about anywhere else when you aren’t using it while still having access to everything you need while out and about with your child.

It also comes with a rain cover to protect your child when the weather is not in your favor. This rain cover is operated by just one zip, which you will find on the front of the carrier. You can see how much easier this makes it for when you need it. Don’t get caught in the elements when you have your child with you, no matter where you are!

If you plan on going on a hike or anything else of the sort with your child, this carrier is comfortable enough for it. The shoulder straps come with adjustable load lifters to get the right comfort level no matter what you are doing.

Pros of the Deuter Kid Comfort Pro

  1. The Deuter’s padded back system ensures your child is both comfortable and safe during a hike, as it distributes their weight equally on your shoulders and your back. This helps to reduce pressure points that can cause pain and put you off from doing the activities you love.
  2. The unique child harness protects your child as it’s designed to keep them securely in their seat, right throughout even the roughest trails or any unexpected accidents that might happen along the way! This is especially valuable if your little one has fallen asleep on the track, as the harness will help to keep them seated in comfort against your back.
  3. The additional safety features of the Deuter Kid Comfort Pro will put your mind at ease as it comes with a fully adjustable footrest and the 5-point harness that is padded and reinforced for extra sturdiness and durability. The footwell can be adjusted depending on your child’s size and allows them to participate in the hike instead of just getting carried along.
  4. This child carrier is perfect for both short and tall parents as it has several height adjustments so that even if you are very short or very tall, this unit will work well with you!
  5. The AirComfort back system allows you to customize the unit to fit you as it uses a mesh system that promotes airflow and ensures your child’s back is well ventilated as they sit against your spine.
  6. It is pretty roomy, which means your trip with this carrier won’t be cramped at all! You will easily go on hikes of up to several days and still ensure your child’s comfort.
  7. This is a lightweight carrier, which means you will have no trouble carrying it along with the rest of your gear without feeling too weighed down! It weighs just over 8 pounds, so you won’t feel the added weight even if you are trekking for hours on end.
  8. The Deuter Kid Comfort Pro is made of high-quality materials that are tear-resistant and provide just the right amount of flexibility as you take your child on a hike along a trail.
  9. This Deuter product is entirely free of per- and polyfluorinated chemicals. As a result, it lowers the amount of environmentally harmful chemicals that pollute the environment and endanger human health. PVC is used for various applications, including rain protection since it has dirt- and grease-repellent characteristics. Instead, Deuter employs DWR (Durable Water Repellency) impregnation, which is non-toxic to people and the environment.

Cons of the Deuter Kid Comfort Pro

  1. Some parents have mentioned that the seat is not entirely flat, which could be uncomfortable for your little one. This is, of course, entirely up to the child’s preference, but it is worth keeping that in mind.
  2. Some parents have complained that the sunshade is flimsy and not quite big enough, so you might want to opt for another child carrier if you are hiking in extreme heat conditions.
  3. This carrier is definitely on the pricier end of the scale. Still, if you want a durable and genuinely reliable hiking carrier, then this is something we highly recommend you invest in! It will not only give your child a comfortable ride, but it will also last for years and years if taken care of properly.

Conclusion – is Deuter Kid Comfort Pro worth it?

Yes, we do highly recommend the Deuter Kid Comfort Pro! This is one of the top-rated children's carriers on Amazon, and while it is pricier than most, it certainly delivers. It offers unparalleled comfort for your child along with extreme durability, making it perfect for long treks in any terrain or climate condition. While the sunshade is a little on the light side, that can be remedied by using an umbrella with this unit, as it does have multiple adjustable harnesses and footwells to accommodate you and your child's unique needs.

The high-quality product design ensures maximum safety for your child as well as a truly comfortable ride with a very sturdy back system and a well-padded seat that is made with mesh for airflow, keeping your child cool throughout the entire hiking adventure. Overall, this unit is excellent for hikers of any level who want to ensure their child’s safety and comfort while enjoying nature from a new perspective.

The reviews for this product on Amazon.com are very encouraging, with many shoppers seeing value in the Deuter Kid Comfort Pro for their families.


Jan 04, 2022

Taking a Look at BTRFS for MySQL


Following my post MySQL/ZFS Performance Update, a few people have suggested I should take a look at BTRFS ("butter-FS", "b-tree FS") with MySQL. BTRFS is a filesystem with an architecture and a set of features similar to ZFS, released under a GPL license. It is a copy-on-write (CoW) filesystem supporting snapshots, RAID, and data compression. These are compelling features for a database server, so let's have a look.

Many years ago, in 2012, Vadim wrote a blog post about BTRFS and the results were disappointing. Needless to say, a lot of work and effort has been invested in BTRFS since 2012. So, this post will examine the BTRFS version that comes with the latest Ubuntu LTS, 20.04. It is not bleeding edge, but it is likely the most recent release and Linux kernel I see in production environments. Ubuntu 20.04 LTS is based on the Linux kernel 5.4.0.

Test Environment

Doing benchmarks is not my core duty at Percona; first and foremost, I am a consultant working with our customers. I didn't want to have a cloud-based instance running mostly idle for months. Instead, I used a KVM instance in my personal lab and made sure the host could easily provide the required resources. The test instance has the following characteristics:

  • 4 CPU
  • 4 GB of RAM
  • 50 GB of storage space throttled at 500 IOPS and 8MB/s

The storage bandwidth limitation is to mimic the AWS EBS behavior capping the IO size to 16KB. With KVM, I used the following iotune section:

<iotune>
     <total_iops_sec>500</total_iops_sec>
     <total_bytes_sec>8192000</total_bytes_sec>
  </iotune>

These IO limitations are very important to consider in respect to the following results and analysis.

For these benchmarks, I use Percona Server for MySQL 8.0.22-13 and unless stated otherwise, the relevant configuration variables are:

[mysqld]
skip-log-bin
datadir = /var/lib/mysql/data
innodb_log_group_home_dir = /var/lib/mysql/log
innodb_buffer_pool_chunk_size=32M
innodb_buffer_pool_size=2G
innodb_lru_scan_depth = 256 # given the bp size, makes more sense
innodb_log_file_size=500M
innodb_flush_neighbors = 0
innodb_fast_shutdown = 2 # skip flushing
innodb_flush_method = O_DIRECT # ZFS uses fsync
performance_schema = off

Benchmark Procedure

For this post, I used the sysbench implementation of the TPCC. The table parameter was set to 10 and the size parameter was set to 20. Those settings yielded a MySQL dataset size of approximately 22GB uncompressed.
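For reference, a hedged way to double-check the logical dataset size from within MySQL (the schema name sbtest is an assumption, the usual sysbench default, not something stated in this post):

SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
  FROM information_schema.TABLES
 WHERE table_schema = 'sbtest'
 GROUP BY table_schema;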

All benchmarks used eight threads and lasted two hours. For simplicity, I am only reporting here the total number of events over the duration of the benchmark. TPCC results usually report only one type of event, New order transactions. Keep this in mind if you intend to compare my results with other TPCC benchmarks.

Finally, the dataset is refreshed for every run either by restoring a tar archive or by creating the dataset using the sysbench prepare option.

BTRFS

BTRFS was created and mounted using:

mkfs.btrfs /dev/vdb -f
mount -t btrfs /dev/vdb /var/lib/mysql -o compress-force=zstd:1,noatime,autodefrag
mkdir /var/lib/mysql/data; mkdir /var/lib/mysql/log;
chown -R mysql:mysql /var/lib/mysql

ZFS

ZFS was created and configured using:

zpool create -f bench /dev/vdb
zfs set compression=lz4 atime=off logbias=throughput bench
zfs create -o mountpoint=/var/lib/mysql/data -o recordsize=16k -o primarycache=metadata bench/data
zfs create -o mountpoint=/var/lib/mysql/log bench/log
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max
echo 0 > /sys/module/zfs/parameters/zfs_arc_min

Since the ZFS file cache, the ARC, is compressed, an attempt was made with 3GB of ARC and only 256MB of buffer pool. I called this configuration “ARC bias”.

ext4

mkfs.ext4 -F /dev/vdb
mount -t ext4 /dev/vdb /var/lib/mysql -o noatime,norelatime

Results

TPCC events

The performance results are presented below. I must admit my surprise at the low results of BTRFS. Over the two-hour period, BTRFS wasn’t able to reach 20k events. The “btrfs tar” configuration restored the dataset from a tar file instead of doing a “prepare” using the database. This helped BTRFS somewhat (see the filesystem size results below) but it was clearly insufficient to really make a difference. I really wonder if it is a misconfiguration on my part; let me know in the comments if you think I made a mistake.

The ZFS performance is more than three times higher, reaching almost 67k events. By squeezing more data in the ARC, the ARC bias configuration even managed to execute more events than ext4, about 97k versus 94k.

TPCC performance comparison, ext4, BTRFS and ZFS

Filesystem Size

Performance, although an important aspect, is not the only consideration behind the decision to use a filesystem like BTRFS or ZFS. The impact of data compression on the filesystem size is also important. The resulting filesystem sizes are presented below.

TPCC dataset size comparison, ext4, BTRFS and ZFS

The uncompressed TPCC dataset size with ext4 is 21GB. Surprisingly, when the dataset is created by the database, with small random disk operations, BTRFS appears to not compress the data at all. This is in stark contrast to the behavior when the dataset is restored from a tar archive. The restore of the tar archive causes large sequential disk operations which trigger compression. The resulting filesystem size is 4.2GB, a fifth of the original size.

Although this is a great compression ratio, the fact that normal database operations yield no compression with BTRFS is really problematic. Using ZFS with lz4 compression, the filesystem size is 6GB. Also, the ZFS compression ratio is not significantly affected by the method of restoring the data.
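
If you want to verify the compression results on your own systems, something along these lines should do it; compsize ships as a separate package and the paths are the ones used above:

# BTRFS: compression statistics for the MySQL data directory
compsize /var/lib/mysql/data
# ZFS: compression ratio plus logical vs. physical space used
zfs get compressratio,used,logicalused bench/data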

The performance issues of BTRFS have also been observed by Phoronix; their SQLite and PostgreSQL results are pretty significant and in line with the results presented in this post. It seems that BTRFS is not optimized for the small random IO type of operations requested by database engines. Phoronix recently published an update for kernel 5.14, and the situation may have improved. The BTRFS random IO write results still seem to be quite low.

Conclusion

Although I have been pleased with the ease of installation and configuration of BTRFS, a database workload seems to be far from optimal for it. BTRFS struggles with small random IO operations and doesn’t compress small blocks. So, until these shortcomings are addressed, I will not consider BTRFS a prime contender for database workloads.

Jan
03
2022
--

Talking Drupal #328 – 2021 in Review

Today we are talking about the year in Review.

TalkingDrupal.com/328

Topics

  • John – Office light
  • Stephen – New favorite podcast
  • Nic – Finally found a graphics card
  • FLDC
  • TD 2021
    • Stephen stepped back from weekly recordings
    • Changes at TD
    • Division of labor
    • Production Schedule
    • Keynote at GovCon
  • Drupal 8 EoL
  • Drupal 7 EoL approaching
  • Virtual Conferences
  • Tech changes
    • Stephen
      • Split keyboard
      • Synology
      • SBCs and microcontrollers
    • John
      • Linux
      • Alexa
      • Fire Tablets
    • Nic
      • TV Shows
  • Favorite moments in 2021
    • Stephen
      • Kids growing up
    • John
      • Changing Jobs
      • Kids growing up
    • Nic
      • Kitako growing up
      • Cooking
  • Thoughts for 2022
    • Stephen
      • New year, new start
    • John
      • In person events
      • More contribution
    • Nic
      • Take stock of life

Resources

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan John Picozzi – www.epam.com @johnpicozzi Stephen Cross – @stephencross

Jan
03
2022
--

MongoDB – Converting Replica Set to Standalone

One of the reasons for using MongoDB is its simplicity in scaling its structure, either as a Replica Set (scale-up) or as a Sharded Cluster (scale-out).

Although a bit tricky in Sharded environments, the reverse process is also feasible, and there is more than one way to scale back down.

So far, so good. But if, for some reason, that MongoDB project which started as a Replica Set no longer requires such a structure and a Standalone server now meets the requirements:

  • Can you shrink from Replica Set to a Standalone server? 
    • If possible, what is the procedure to achieve this configuration?

This blog post will walk through the necessary steps to convert a Replica Set into a Standalone server.

Observations:

Before starting the process, it is important to mention that running a Standalone configuration on production servers is not advised, as your entire availability will rely on a single point of failure.

Another observation: this activity requires downtime, because you will need to remove the replication parameter, which cannot be done at runtime.

Last note: to avoid any data changes during the procedure, it’s recommended to stop writes from the application. You will also need to adjust the connection string at the end of the process for the new configuration.

How To:

The testing environment is a standard Replica Set configuration with three nodes, as seen in the following replication status output:

[
	{
		"_id" : 0,
		"name" : "node1:27017",
		"stateStr" : "PRIMARY"
	},
	{
		"_id" : 1,
		"name" : "node2:27017",
		"stateStr" : "SECONDARY"
	},
	{
		"_id" : 2,
		"name" : "node3:27017",
		"stateStr" : "SECONDARY"
	}
]

  • The PRIMARY is our reliable data source; thus, it will be the Standalone server at the end of the process.

There is no need to trigger an election unless you have a preferred node to become the Standalone. If that’s your case and that node is not the PRIMARY yet, please make sure to promote it to PRIMARY before starting the procedure, as sketched below.
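
For example, here is a sketch of promoting a preferred node by raising its priority; the host name, the members[1] index for node2, and the absence of authentication options are assumptions:

# Raise node2's priority so it wins the next election, then ask the current
# PRIMARY to step down (the second command may report a closed connection).
mongosh "mongodb://node1:27017" --eval 'cfg = rs.conf(); cfg.members[1].priority = 2; rs.reconfig(cfg)'
mongosh "mongodb://node1:27017" --eval 'rs.stepDown()'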

That said, let’s get our hands dirty!

1) Removing the secondaries:

  • From the PRIMARY node, let’s remove nodes 2 and 3:
rs0:PRIMARY> rs.remove("node2:27017")
{
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1640109346, 1),
		"signature" : {
			"hash" : BinData(0,"663Y5u3GkZkQ0Ci7EU8IYUldVIM="),
			"keyId" : NumberLong("7044208418021703684")
		}
	},
	"operationTime" : Timestamp(1640109346, 1)
}

rs0:PRIMARY> rs.remove("node3:27017")
{
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1640109380, 1),
		"signature" : {
			"hash" : BinData(0,"RReAzLZmHWfzVcdnLoGJ/uz04Vo="),
			"keyId" : NumberLong("7044208418021703684")
		}
	},
	"operationTime" : Timestamp(1640109380, 1)
}

Please note: although my Replica Set comprises three nodes, at this step you must leave only the PRIMARY in the replication configuration. If you have more nodes, the process is the same.

Then, the expected status should be:

rs0:PRIMARY> rs.status().members
[
	{
		"_id" : 0,
		"name" : "node1:27017",
		"health" : 1,
		"state" : 1,
		"stateStr" : "PRIMARY",
		"uptime" : 1988,
		"optime" : {
			"ts" : Timestamp(1640109560, 1),
			"t" : NumberLong(1)
		},
		"optimeDate" : ISODate("2021-12-21T17:59:20Z"),
		"syncSourceHost" : "",
		"syncSourceId" : -1,
		"infoMessage" : "",
		"electionTime" : Timestamp(1640107580, 2),
		"electionDate" : ISODate("2021-12-21T17:26:20Z"),
		"configVersion" : 5,
		"configTerm" : 1,
		"self" : true,
		"lastHeartbeatMessage" : ""
	}
]

 

2) Changing the configuration:

  • Even though node2 and node3 are no longer part of the replication, let’s stop their mongod processes while keeping their data safe, as it can be used in a recovery scenario if necessary:
node2-shell> systemctl stop mongod
node3-shell> systemctl stop mongod

  • Then, you can remove the replication parameter from the PRIMARY and restart it. But first, in the configuration file:
    • Please remove or comment out the lines:
#replication:
#  replSetName: "rs0"

If you start MongoDB via the command line, please remove the --replSet option instead, as sketched below.
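
As an illustration, with assumed paths and port, the command-line change amounts to dropping that single option:

# Before: started as a replica set member
#   mongod --replSet rs0 --dbpath /var/lib/mongo --port 27017
# After: started as a Standalone
#   mongod --dbpath /var/lib/mongo --port 27017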

  • Once changed, you can restart the mongod on PRIMARY:
node1-shell> systemctl restart mongod

3) Removing replication objects:

  • Once connected to the former PRIMARY, now a Standalone, MongoDB will warn you with the following message:
2021-12-21T18:23:52.056+00:00: Document(s) exist in 'system.replset', but started without --replSet. Database contents may appear inconsistent with the writes that were visible when this node was running as part of a replica set. Restart with --replSet unless you are doing maintenance and no other clients are connected. The TTL collection monitor will not start because of this. For more info see http://dochub.mongodb.org/core/ttlcollections

 

That is because the node was working under a Replica Set configuration, and although it’s a Standalone now, the old replication structure still exists.

In more detail, that structure resides in the local database, which holds all the necessary replication data, as well as data about the instance itself:

mongo> use local
switched to db local

mongo> show collections
oplog.rs
replset.election
replset.minvalid
startup_log
system.replset

 

To remove that warning, you should remove the old Replica Set object on local.system.replset.

local.system.replset holds the replica set’s configuration object as its single document. To view the object’s configuration information, issue rs.conf() from mongosh. You can also query this collection directly, as shown below.
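
For instance, a quick way to look at the leftover document on the Standalone (authentication options omitted):

# Print the old replica set configuration document still stored in "local"
mongosh --quiet --eval 'printjson(db.getSiblingDB("local").system.replset.findOne())'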

However, it’s important to mention the local.system.replset is a system object, and there is no built-in role that you can use to remove objects –  Unless you create a custom role that grants anyAction on anyResource to the user, Or you grant the internal role __system to a user.


Please note: both of those options give access to every resource in the system and are intended for internal use. Do not use these permissions other than in exceptional circumstances.

For this process, we will create a new, temporary user with the __system privilege, which we can drop after the action.

3. a) Creating the temporary user:

mongo> db.getSiblingDB("admin").createUser({user: "dba_system", "pwd": "sekret","roles" : [{ "db" : "admin", "role" : "__system" }]});

Confirmation output:

Successfully added user:{
   "user":"dba_system",
   "roles":[
      {
         "db":"admin",
         "role":"__system"
      }
   ]
}

3. b) Removing the object:
  1. First, connect with the created user.
  2. Then:
mongo> use local;
mongo> db.system.replset.remove({})

Confirmation output:

WriteResult({ "nRemoved" : 1 })

 

3. c) Dropping the temporary user:
  1. Disconnect from that system user.
  2. Then:
mongo> db.getSiblingDB("admin").dropUser("dba_system")

 

4) Restart the service:

shell> systemctl restart mongod

 

Once reconnected, you will not see that warning again, and the server is ready!

 

5) Connection advice:

  • Please make sure to adjust the connection string in the application and clients, pointing only to the Standalone server, as sketched below.
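
A sketch of that change, assuming an application that previously connected using the replica set name and a hypothetical database called mydb:

# Before: replica set connection string
#   mongodb://node1:27017,node2:27017,node3:27017/mydb?replicaSet=rs0
# After: Standalone connection string, pointing only to the remaining node
#   mongodb://node1:27017/mydb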

 

Conclusion

The conversion process, in general, is simple and should not cause problems. But it is essential to highlight that we strongly recommend testing this procedure in a lower environment first. And if you plan to use it in production, please bear in mind the observations mentioned above.

And if you have any questions, feel free to contact us in the comment section below!

 

See you!

 
