Sep
27
2022
--

Give Me Some Latitude… and Longitude

Geo locations in MySQL

Geo locations are a cornerstone of modern applications. Whether you’re a food delivery business or a family photographer, knowing the closest “something” to you or your clients can be a great feature.

In our ‘Scaling and Optimization’ training class for MySQL, one of the things we discuss is column types. The spatial types are only mentioned in passing, as less than 0.5% of MySQL users know of their existence (that’s a wild guess, with no factual basis). In this post, we briefly discuss the POINT type and how it can be used to calculate distances to the closest public park.

Import the data

To start off, we need a few tables and some data. The first table will hold the mapping between the zip code and its associated latitude/longitude. GeoNames has this data under the Creative Commons v3 license, available here for most countries. The data files are CSV and the readme.txt explains the various columns. We are only interested in a few of the columns.

CREATE TABLE usazips (
 id int unsigned NOT NULL AUTO_INCREMENT,
 zipCode int unsigned NOT NULL COMMENT 'All USA postal codes are integers',
 state varchar(20) NOT NULL,
 placeName varchar(200) NOT NULL,
 placeDesc varchar(100) DEFAULT NULL,
 latLong point NOT NULL /*!80003 SRID 4326 */,
 PRIMARY KEY (id),
 SPATIAL KEY (latLong));

There are a couple of things about this schema that are not typical in MySQL.

Firstly, the latLong column type is POINT which can store an X and Y coordinate. By default, these coordinates could be on any plane. They could represent latitude and longitude, but they could also be centimeters up and down on the surface of your desk, or yards left and right of the biggest tree at your local park. How do you know which? You need a Spatial Reference System. Thankfully, MySQL comes preloaded with about 5,100 of these systems (See INFORMATION_SCHEMA.ST_SPATIAL_REFERENCE_SYSTEMS). Each SRS has an associated Spatial Reference ID (SRID).

The SQL comment on the POINT column above ties the data in this column with a specific SRID, 4326. We can see in the I_S table noted earlier, this SRID maps to the ‘World Geodetic System 1984’ (aka WGS84) which is the same SRS used in the GeoNames dataset. By having our column and the dataset aligned, we won’t need to do any geo-transformation later on.
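If you want to confirm this mapping yourself, a quick query against the information schema does it (a small sketch; only two columns are shown):

SELECT SRS_ID, SRS_NAME
FROM INFORMATION_SCHEMA.ST_SPATIAL_REFERENCE_SYSTEMS
WHERE SRS_ID = 4326;
-- Returns one row whose SRS_NAME is 'WGS 84'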

The second thing to note in the schema is the use of a SPATIAL index on the latLong column. A SPATIAL index in MySQL is created using R-Trees which are geospatial-specific data structures.

Let’s load this data into the table. There are 12 fields in the CSV but we only care about six of them. The LOAD DATA command allows you to associate each CSV field, positionally, with either the column name in the table or a user variable. The first field in our CSV is the country code, which we don’t care about, so we assign that field to the @dummy variable. The second field is the zip code, and we want that to go directly into the table-column so we specify the column name. We do the same for the third, fifth, sixth, and ninth fields.

The 10th and 11th CSV fields are the latitude and longitude. We need to convert those two VARCHAR fields into a POINT. We can apply some SQL transformations using the SET command which can reference user variables assigned earlier. These two fields are assigned to @lat and @lon. The remaining fields all go into @dummy since they are not used.

LOAD DATA INFILE '/var/lib/mysql-files/US.txt'
INTO TABLE usazips
FIELDS TERMINATED BY '\t' (@dummy, zipCode, placeName, @dummy, state, placeDesc, @dummy, @dummy, @dummy, @lat, @lon, @dummy)
SET id = NULL, latLong = ST_PointFromText(CONCAT('POINT(', @lat, ' ', @lon, ')'), 4326);

Unfortunately, the POINT() function in MySQL always returns an SRID of 0. Since we specified our column to be a specific SRID, a direct import will fail.

mysql> LOAD ... SET latLong = POINT(@lat, @lon);
ERROR 3643 (HY000): The SRID of the geometry does not match the SRID of the column 'latLong'.
The SRID of the geometry is 0, but the SRID of the column is 4326. Consider changing the SRID
of the geometry or the SRID property of the column.

Instead, we must “go the long route” by creating a string representation of a POINT object in the Well-Known-Text format, which MySQL can parse into a Point column type with an associated SRID.

mysql> LOAD ... SET latLong = ST_PointFromText(CONCAT('POINT(', @lat, ' ', @lon, ')'), 4326);

Yes, this was very confusing to me as well:

“POINT(123 56)” <— WKT format (a string)
POINT(123, 56) <— MySQL POINT() function (returns geometry data)
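A quick way to see the difference is to check the SRID each form carries (a small sketch, using the same coordinates as the verification row below):

-- POINT() builds a geometry with SRID 0 (a flat, unitless plane):
SELECT ST_SRID(POINT(31.8468, -98.4017));                          -- returns 0
-- ST_PointFromText() parses WKT and attaches the SRID we ask for:
SELECT ST_SRID(ST_PointFromText('POINT(31.8468 -98.4017)', 4326)); -- returns 4326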

Here’s a quick verification of our data with an additional column showing the lat-long binary data being converted back into WKT format.

mysql> SELECT *, ST_AsText(latLong) FROM usazips WHERE zipCode = 76455;
+-------+---------+-------+-----------+-----------+------------------------------------------------------+-------------------------+
| id    | zipCode | state | placeName | placeDesc | latLong                                              | ST_AsText(latLong)      |
+-------+---------+-------+-----------+-----------+------------------------------------------------------+-------------------------+
| 34292 |   76455 | TX    | Gustine   | Comanche  | 0xE6100000010100000068B3EA73B59958C0B84082E2C7D83F40 | POINT(31.8468 -98.4017) |
+-------+---------+-------+-----------+-----------+------------------------------------------------------+-------------------------+
1 row in set (0.04 sec)

More data to load

Now that we have all this zip code + latitude and longitude data, we next need the locations of where we want to find our distances. This could be a list of grocery stores or coffee shops. In this example, we will use a list of public parks in San Antonio, TX. Thankfully, San Antonio has all of this data openly available. I downloaded the ‘Park Boundaries’ dataset in GeoJSON format since the CSV did not contain any latitude/longitude coordinates.

CREATE TABLE parks (
 parkId int unsigned NOT NULL PRIMARY KEY,
 parkName varchar(100) NOT NULL,
 parkLocation point NOT NULL /*!80003 SRID 4326 */);

The GeoJSON data is one giant JSON string with all of the needed data nested in various JSON objects and arrays. Instead of writing some Python script to ETL the data, I decided to try out the JSON import feature of the MySQL Shell and first import the JSON directly into a temporary table.

$ mysqlsh appUser@127.0.0.1/world --import Park_Boundaries.geojson tempParks
Importing from file "Park_Boundaries.geojson" to collection `world`.`tempParks` in MySQL Server at 127.0.0.1:33060
..1..1
Processed 3.29 MB in 1 document in 0.3478 sec (1.00 document/s)
Total successfully imported documents 1 (1.00 document/s)

The MySQL Shell utility created a new table called ‘tempParks’, in the ‘world’ database with the following schema. One row was inserted, which was the entire dataset as one JSON object.

CREATE TABLE `tempParks` (
 `doc` json DEFAULT NULL,
 `_id` varbinary(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
 `_json_schema` json GENERATED ALWAYS AS (_utf8mb4'{"type":"object"}') VIRTUAL,
 PRIMARY KEY (`_id`),
 CONSTRAINT `$val_strict_75A2A431C77036365C11677C92B55F4B307FB335` CHECK (json_schema_valid(`_json_schema`,`doc`)) /*!80016 NOT ENFORCED */
);

From here, I extracted the needed information from within the JSON and inserted it into the final table, shown above. This SQL uses the JSON_TABLE function, new in MySQL 8, to create a result set from nested JSON data (strangely, there were a couple of duplicate parks in the dataset, which is why you see the ON DUPLICATE modifier). Note that this dataset stores longitude at index 0 and latitude at index 1.

INSERT INTO parks
  SELECT k.parkID, k.parkName, ST_PointFromText(CONCAT('POINT(', parkLat, ' ', parkLon, ')'), 4326)
  FROM tempParks, JSON_TABLE(
    tempParks.doc,
    "$.features[*]" COLUMNS (
      parkID int PATH "$.properties.ParkID",
      parkName varchar(150) PATH "$.properties.ParkName",
      parkLat varchar(20) PATH "$.geometry.coordinates[0][0][0][1]",
      parkLon varchar(20) PATH "$.geometry.coordinates[0][0][0][0]"
    )
  ) AS k ON DUPLICATE KEY UPDATE parkId = k.parkID;

Query OK, 391 rows affected (0.11 sec)
Records: 392  Duplicates: 0  Warnings: 0

What’s the closest park?

Now that all the information is finally loaded, let’s find the nearest park to where you might live, based on your zip code.

First, get the reference location (as MySQL POINT data) for the zip code in question:

mysql> SELECT latLong INTO @zipLocation FROM usazips WHERE zipCode = 78218;

Then, calculate the distance between our reference zip code and all of the park locations. Show the five closest (not including any school parks):

mysql> SELECT p.parkName, ST_AsText(p.parkLocation) AS location, ST_Distance_Sphere(@zipLocation, p.parkLocation) AS metersAway
FROM parks p WHERE parkName NOT LIKE '%School%' ORDER BY metersAway LIMIT 5;
+-----------------------+----------------------------------------------+--------------------+
| parkName              | location                                     | metersAway         |
+-----------------------+----------------------------------------------+--------------------+
| Robert L B Tobin Park | POINT(29.501796110000043 -98.42111988)       |   1817.72881969296 |
| Wilshire Terrace Park | POINT(29.486397887000063 -98.41776879999996) | 1830.8541086553364 |
| James Park            | POINT(29.48168424800008 -98.41725954999998)  |  2171.268012559491 |
| Perrin Homestead Park | POINT(29.524477369000067 -98.41258342199997) |  3198.082796589456 |
| Tobin Library Park    | POINT(29.510899863000077 -98.43343147299998) |  3314.044809806559 |
+-----------------------+----------------------------------------------+--------------------+
5 rows in set (0.01 sec)

The closest park is 1,817.7 meters away! Time for a picnic!

Conclusion

Geo locations in MySQL are not difficult. The hardest part was just finding the data and ETL’ing it into some tables. MySQL’s native column types and support for the OpenGIS standards make working with the data quite easy.

Several things to note:

1) The reference location for your zip code may be several hundred or even a thousand meters away from where you actually are. Because of this, the park right behind your house might not show up as the closest one, because another park is closer to the reference point.

2) A SPATIAL index was not added to the parks table because it would be of no use here: the distance function must be executed against all parks (i.e., all rows) to find the closest one, so the ‘metersAway’ query always performs a full table scan.

3) All USA postal codes are integers. You can expand this to your country by simply changing the column type.

Sep
26
2022
--

Talking Drupal #366 – Schema.org Blueprints Module

Today we are talking about The Schema.org Blueprints Module with Jacob Rockowitz.

www.talkingDrupal.com/366

Topics

  • What is Schema.org
  • What is the Schema.org first approach
  • Is this just for SEO
  • What is the Schema.org Blueprints Module
  • What are the goals
  • How does this module benefit your project
  • Can you give us a demo
  • Why Paragraphs
  • How do you handle schema you do not need
  • How do you handle missing schema
  • Have any sites used this
  • What is your maintenance approach
  • Roadmap
  • Recommendations
  • Contributing to Schema.org

Resources

Guests

Jacob Rockowitz – www.jrockowitz.com @jrockowitz

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi
Tim Lehnen – @hestenet

MOTW

Flex Field – Defines a new “FlexField” field type that lets you create simple inline multiple-value fields without having to use entity references.

Sep
26
2022
--

MySQL in Microservices Environments


The microservice architecture is not a new pattern but has become very popular lately, mainly for two reasons: cloud computing and containers. That combo helped increase adoption by tackling the two main concerns of every infrastructure: cost reduction and infrastructure management.

However, all that beauty hides a dark truth:

The hardest part of microservices is the data layer

And that is especially true when it comes to classic relational databases like MySQL. Let’s figure out why that is.

MySQL and the microservice

Following the same two pillars of microservices (cloud computing and containers), what can one do with that in the MySQL space? What do cloud computing and containers bring to the table?

Cloud computing

The magic of the cloud is that it allows you to be cost savvy by letting you easily SCALE UP/SCALE DOWN the size of your instances. No more wasted money on big servers that are underutilized most of the time. What’s the catch? It’s gotta be fast. Quick scale up to be ready to deal with traffic and quick scale down to cut costs when traffic is low. 

Containers

The magic of containers is that one can slice hardware to the resource requirements. The catch here is that containers were traditionally used on stateless applications. Disposable containers.

Relational databases in general, and MySQL in particular, are not fast to scale and are stateful. However, they can be adapted to the cloud and used for the data layer of microservices.

The Scale Cube


The book “The Art of Scalability” by Abbott and Fisher describes a really useful, three-dimensional scalability model: the Scale Cube. In this model, scaling an application by running clones behind a load balancer is known as X-axis scaling. The other two kinds of scaling are Y-axis scaling and Z-axis scaling. The microservice architecture is an application of Y-axis scaling: it defines an architecture that structures the application as a set of loosely coupled, collaborating services.

  • X-Axis: Horizontal Duplication and Cloning of services and data (READ REPLICAS)
  • Y-Axis: Functional Decomposition and Segmentation – (MULTI-TENANT)
  • Z-Axis: Service and Data Partitioning along Customer Boundaries – (SHARDING)

In microservices, each service has its own database in order to be decoupled from other services. In other words: a service’s transactions only involve its database; data is private and accessible only via the microservice API.

It’s natural that the first approach to divide the data is by using the multi-tenant pattern:

Actually, before trying multi-tenant, one can use a tables-per-service model, where each service owns a set of tables that must only be accessed by that service. But with that “soft” division, the temptation to skip the API and access other services’ tables directly is huge.

Schema-per-service, where each service has a database schema that is private to that service, is appealing since it makes ownership clearer. It is easy to create a user per database, with specific grants to limit database access.
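As a rough illustration (all names here are hypothetical), a dedicated schema and user per service could be set up like this:

-- One schema and one user per microservice; orders_svc/orders_user are illustrative names.
CREATE DATABASE orders_svc;
CREATE USER 'orders_user'@'10.0.0.0/255.0.0.0' IDENTIFIED BY 'use-a-strong-password';
-- Grant access to this service's schema only, nothing else.
GRANT SELECT, INSERT, UPDATE, DELETE ON orders_svc.* TO 'orders_user'@'10.0.0.0/255.0.0.0';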

This “shared database” pattern, however, comes with some drawbacks:

  • Single hardware: a failure in your database will hurt all the microservices
  • Resource-intensive tasks related to a particular database will impact the other databases (think of DDLs)
  • Shared resources: disk latency, IOPS, and bandwidth need to be shared, as well as other resources like CPU and network bandwidth

The alternative is to go “Database per service”.

Database per service

Share nothing. Cleaner logical separation. Isolated issues. Services are loosely coupled. In fact, this opens the door for microservices to use a database that best suits their needs, like a graph db, a document-oriented database, etc. But as with everything, this also comes with drawbacks:

  • The most obvious: cost. More instances to deploy
  • The most critical: distributed transactions. As we mentioned before, microservices collaborate with each other, and that means transactions span several services.

The simplistic approach is to use a two-phase commit implementation. But that solution is just an open invitation to a huge number of locking issues. It just doesn’t scale. So what are the alternatives?

  • Implementing transactions that span services: The Saga pattern
  • Implementing queries that span services: API composition or Command Query Responsibility Segregation (CQRS)

A saga is a sequence of local transactions. Each local transaction updates the database and publishes messages or events that trigger the next local transaction in the saga. If a local transaction fails for whatever reason, then the saga executes a series of compensating transactions that undo the changes made by the previous transactions. More on Saga here: https://microservices.io/patterns/data/saga.html

An API composition is just a composer that invokes queries on each microservice and then performs an in-memory join of the results:
https://microservices.io/patterns/data/api-composition.html

CQRS means keeping one or more materialized views that contain data from multiple services. This avoids the need to do joins on the query side: https://microservices.io/patterns/data/cqrs.html

What do all these alternatives have in common? They are handled at the API level: it becomes the developer’s responsibility to implement and maintain them. The data layer continues to be data, not information.

Make it cloud

There are means for your MySQL to be cloud-native: Easy to scale up and down fast; running on containers, a lot of containers; orchestrated with Kubernetes; with all the pros of Kubernetes (health checks, I’m looking at you).

Percona Operator for MySQL based on Percona XtraDB Cluster

A Kubernetes Operator is a special type of controller introduced to simplify complex deployments. Operators provide full application lifecycle automation and make use of Kubernetes primitives to build and manage your application.

Percona Operator for MySQL

Our blog post “Introduction to Percona Operator for MySQL Based on Percona XtraDB Cluster” covers an overview of the operator and its benefits. However, it’s worth mentioning what makes it cloud native:

  • It takes advantage of cloud computing, scaling up and down
  • Runs on containers
  • Is orchestrated by the cloud orchestrator itself: Kubernetes

Under the hood is a Percona XtraDB Cluster running on Pods. It is easy to scale out (increase the number of nodes: https://www.percona.com/doc/kubernetes-operator-for-pxc/scaling.html) and can be scaled up by giving more resources to the Pod definition (without downtime).

Give it a try at https://www.percona.com/doc/kubernetes-operator-for-pxc/index.html and unleash the power of the cloud on MySQL.

Sep
23
2022
--

Keep Your Data Safe with Percona


September was and is an extremely fruitful month (especially for the black-hat hackers) for news about data leaks and breaches:

  1. Uber suffers computer system breach, alerts authorities
  2. GTA 6 source code and videos leaked after Rockstar Games hack
  3. Revolut breach: personal and banking data exposed

In this blog post, we want to remind you how to keep your data safe when running your favorite open source databases.

Network exposure

Search engines like Shodan make it easy to find publicly available databases; over 3.6 million MySQL servers were found exposed on the Internet.

The best practice here is to run database servers in an isolated private network, separated even from the rest of your corporate network. In this case, you have a low risk of exposure even if a server is misconfigured.

If for some reason you run your database on a server in a public network, you can still limit network exposure:

  • Bind your server to the localhost or private IP address of the server

For example, for MySQL, use the bind-address option in your my.cnf:

bind-address = 192.168.0.123

  • Configure your firewall to block access through a public network interface on the operating system

Users and passwords

To complement the network exposure story, ensure that your users cannot connect from just any IP address. Taking MySQL as an example, the following GRANT command allows connections from a specific private network only:

GRANT ALL ON db1.* TO 'perconaAdmin'@'192.168.0.0/255.255.0.0';

MySQL also has an auth_socket plugin that controls connections to the database through Unix sockets. Read more in this blog post: Use MySQL Without a Password (and Still be Secure).
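As a hedged sketch (the account name is hypothetical, and the plugin may need to be loaded first), a local-only account that authenticates through the Unix socket looks like this:

-- Load the plugin if it is not already available (some distribution packages enable it by default).
INSTALL PLUGIN auth_socket SONAME 'auth_socket.so';
-- The account can only connect locally, as the matching OS user, so there is no password to steal.
CREATE USER 'appadmin'@'localhost' IDENTIFIED WITH auth_socket;
GRANT ALL ON db1.* TO 'appadmin'@'localhost';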

Minimize the risk and do not use default usernames and passwords. SecLists is a good example of bad password choices: MySQL, PostgreSQL, and a misc list. Percona Platform provides users with Advisors (read more below) that preemptively check for misconfigured grants, weak passwords, and more.

So now we agree that a strong password is a must. Did you know that you can enforce it? This Percona post talks about Improving MySQL Password Security with Validation Plugin, which performs such enforcement.
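For example, on MySQL 8.0 the password validation component can be enabled with something like the following (a sketch; the policy values are illustrative):

-- Enable the password validation component.
INSTALL COMPONENT 'file://component_validate_password';
-- Illustrative policy: require at least 12 characters and mixed character classes.
SET GLOBAL validate_password.policy = 'MEDIUM';
SET GLOBAL validate_password.length = 12;
-- From now on, statements like CREATE USER ... IDENTIFIED BY 'abc' are rejected.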

A strong password is set, great! To make your system even more resilient to security risks, it is recommended to have a password rotation policy. This policy can be executed manually, but it can also be automated through various integrations, like LDAP, KMIP, HashiCorp Vault, and many more. For example, this document describes how Percona Server for MongoDB can work with LDAP.

Encryption

There are two types of encryption to consider for databases, and ideally you will use both of them:

  1. Transport encryption – secures the traffic between client and server and between cluster nodes
  2. Data-at-rest encryption (or Transparent Data Encryption – TDE) – encrypts the data on disk to prevent unauthorized access

Transport

With an unencrypted connection between the client and the server, someone with access to the network could watch all your traffic and steal credentials and sensitive data. We recommend enabling network encryption by default; several Percona blog posts highlight the details.
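One hedged example of enforcing this on the MySQL side is to require TLS for an account and then verify that the session is actually encrypted (the account name is hypothetical):

-- Require an encrypted connection for this account; plaintext connections are refused.
ALTER USER 'appuser'@'%' REQUIRE SSL;
-- From a client session, confirm that TLS is in use (a non-empty cipher means it is):
SHOW SESSION STATUS LIKE 'Ssl_cipher';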

Data-at-rest

Someone could get access to the physical disk or network block storage and read the data. To mitigate this risk, you can encrypt the data on disk. It can be done at the file system level, at the block storage level, or with the database storage engine itself.

Tools like fscrypt or the built-in encryption in ZFS can help with file system encryption. Public clouds provide built-in encryption for their network storage solutions (e.g., AWS EBS, GCP). Private storage solutions, like Ceph, also come with support for data-at-rest encryption at the block level.

Percona takes security seriously, which is why we recommend enabling data-at-rest encryption by default, especially for production workloads. Percona Server for MySQL and Percona Server for MongoDB provide you with a wide variety of options to perform TDE at the database level.

Preventive measures

Mistakes and misconfiguration can happen, and it would be cool if there were a mechanism to alert you about issues before it is too late. Guess what – we have it!

Percona Monitoring and Management (PMM) comes with Advisors, which are checks that identify potential security threats, vulnerabilities, data loss or data corruption, and other issues. Advisors are the software representation of Percona’s years of expertise in database security and performance.

By connecting PMM to Percona Platform, users get more sophisticated Advisors for free, whereas our paid customers get even deeper database checks, which uncover various misconfiguration and non-compliance gems.

Learn more about Percona Platform with PMM on our website and check if your databases are secured and fine-tuned right away.

If you still believe you need more help, please let us know through our Community Forums or contact the Percona team directly.

Sep
23
2022
--

Scaling MySQL – A Good Problem to Have


When you develop an application, you expect success, and success often comes with growth problems. These problems show up especially in the area of data storage, where the stateful parts are not as easy to scale as the stateless parts of the application.

There are several stages of approaching database scalability:

  1. Configuration and query optimization. This step can help a lot, and I would recommend a recent book by Daniel Nichter “Efficient MySQL Performance: Best Practices and Techniques”, which goes into this topic.
  2. If #1 is done and you continue to push the limits of your database, the next step is to improve the hardware: adding extra memory, improving storage throughput (regular SSD, NVMe storage layer, etc.), or increasing the size of the cloud instances (this is what I call “optimization by credit card”). This typically should help, but only to a certain limit. And there is only so much memory or storage you can push into a server before it becomes very expensive very quickly, never mind the physical limits of how much memory you can fit into a single server.
  3. Step 3 is distributing the workload across multiple servers when the limit of a single server is reached. When the workload is read-intensive, this can typically be solved by regular MySQL source->replica replication; however, if you need to scale writes, it becomes quite complicated.

I would like to provide an excerpt from the Square Cash Engineering blog, which describes their experience (Sharding Cash | Square Corner Blog (squareup.com)):

“Cash was growing tremendously and struggled to stay up during peak traffic. We were running through the classic playbook for scaling out: caching, moving out historical data, replica reads, buying expensive hardware. But it wasn’t enough. Each one of these things bought us time, but we were also growing really fast and we needed a solution that would let us scale out infinitely. We had one final item left in the playbook.”

Before jumping to the solution Square Cash is using, I should mention the traditional approach to sharding (the name we give to distributing a MySQL workload across multiple smaller servers): sharding at the application level, where the application decides which server to use to execute a query. However, there is a constantly increasing need to have the sharding logic separated from the application, so developers can focus on creating business outcomes rather than solving database workload distribution problems again and again. Therefore there is a need to move the “distribute the workload” logic from the application level to a middleware or even to the database level.

There is already an available middleware that helps with these problems (and this is what Square Cash is using): Vitess (Vitess | A database clustering system for horizontal scaling of MySQL)

Vitess is open source software that works like a proxy and helps to distribute a MySQL workload across multiple servers (for more details about the Vitess architecture, see Sugu Sougoumarane’s presentation: 2019-sugu-highload.pdf (vitess.io) and our introduction: Introduction to Vitess on Kubernetes for MySQL – Part I of III – Percona Database Performance Blog).

Originally, Vitess was developed to scale YouTube, although I’ve heard recent rumors that Google is migrating YouTube to a different backend. Later, it was adopted not only by Square Cash but also by companies like Slack, Pinterest, GitHub, and HubSpot. They are all looking to solve MySQL scalability problems.

I wrote this blog post to collect your opinions:

  • Have you tried scale-out solutions, and can you share your experience?
  • Would you be interested in hearing more about scale-out from Percona, like basic usage, setup, and general guidance?
  • Would you be interested in Percona supporting scale-out products?

Please share your thoughts with us in this survey.

Sep
21
2022
--

Contribute to Open Source With Percona and Hacktoberfest


October is close, and it is time for Hacktoberfest! If you love open source as much as we do, this event was created for you. It is a unique opportunity to be a part of a great open source community. And you get not only a memorable experience but also some fantastic prizes! You can even have a tree planted in your name – isn’t that amazing?

The aim of Hacktoberfest is to provide support to open source projects and teams maintaining them. Your contribution can be a helping hand to them. And it is a unique opportunity for you to sharpen your development skills and give back to the project you love.

This year, Percona is participating in this fabulous event with its open source software. You can contribute to 12 repositories with Percona Monitoring and Management (PMM), Percona Kubernetes Operators, pg_stat_monitor, and other projects! In addition to official Hacktoberfest prizes, you can receive a piece of Percona swag: a T-shirt or a hat.

Follow these simple steps to start:

  1. Register on the Hacktoberfest website anytime between September 26 and October 31, 2022.
  2. Look through the participating projects and choose your favorites.
  3. Contribute to their code with PR/MRs during the period from October 1 to October 31. You should have at least four pull/merge requests accepted that meet the Hacktoberfest requirements.
  4. Get your prizes!

Follow the link to find all Percona GitHub repositories participating in the event. You can easily find them by searching the tag “hacktoberfest”.

We also encourage you to look through some issues that may not be that hard to complete. They may give you an idea of where to start.

You can find them:

Experienced contributors can check all Jira issues to find a challenging task.

Did you know that not only code contributions are welcome? Contributing to open source projects is not only for those who code. This time, Hacktoberfest is making a point of boosting low-code and non-code contributions. Designers, testers, content creators, translators… Even if you have little technical experience or none at all, you can have a seat at the table and use this opportunity to boost open source with your professional skills.

If you made a non-code contribution (for example, wrote an article or recorded a demo), create a pull request to our Percona Community repository with links to your work to participate in Hacktoberfest. Add your contribution to our list of blog posts and videos, and follow the process of submitting changes to documentation. Other types of contributions (custom dashboards, configuration files, etc.) can be added to our developer artifacts collection.

Also, Hacktoberfest has prepared some useful resources for those taking their first steps in the open source world.

The Percona team will review your PRs continuously and give you feedback. We would love to advise you and provide you with all the support that you might need. Ask any questions in the Discord chat channel, and may the open source power be with you!

Sep
21
2022
--

MongoDB 6.0: Should You Upgrade Now?


MongoDB is a cross-platform, document-oriented NoSQL database. It was developed as an answer to the growing need for easy-to-use yet very performant, scalable, and content-agnostic storage. MongoDB has been widely adopted by engineers across application development, anywhere from banking to social media. Unfortunately, after MongoDB Inc.’s IPO in 2017, the company chose an aggressive path of monetization, changing the license to SSPL (which is a license model that’s bad for you) and promoting Atlas (MongoDB’s Database as a Service (DBaaS) solution) even over the costly MongoDB Enterprise. The company put the community’s needs way behind catering to high-end enterprise customers, leaving the MongoDB community stranded.

While limited by the SSPL license (not recognized by the Open Source Initiative (OSI) as open source), Percona, known for its deep commitment to open source software, chose to support the stranded MongoDB community by:

  • Providing Percona Server for MongoDB (PSMDB) – a source-available MongoDB drop-in replacement database based on MongoDB Community Edition (CE) yet adding enterprise features developed on top of that by Percona.
  • Delivering a freely available open source product for MongoDB backup and recovery: Percona Backup for MongoDB (PBM) works with PSMDB as well as (to some extent) with MongoDB Inc.’s MongoDB Community and Enterprise editions.
  • Packaging PSMDB and PBM in Percona Distribution for MongoDB: an easy-to-deploy complete solution for MongoDB.
  • Providing Percona Monitoring and Management (PMM), which can be used as an open source, multi-database alternative to MongoDB Ops Manager.
  • Developing a MongoDB Prometheus exporter, free for anyone needing insights into how their MongoDB instance is doing, that is widely used in the industry by both open source communities and enterprise APM tools (from Grafana, through Ansible, to Dynatrace).
  • Delivering Percona Operator for MongoDB, a complete solution for containerized environments of your Percona Server for MongoDB, containing the necessary Kubernetes settings to maintain a consistent PSMDB instance (if you’re unsure whether running MongoDB in Kubernetes is for you, check out the pros and cons of such solutions).

I think it’s fair to say that the list is impressive. Individuals, organizations, and even enterprises benefit from the fact that the software Percona provides is free and open.

A bittersweet edition

Seeing all the critical bugs that MongoDB 5.0 has introduced, it feels as if its release has been rushed, allowing for half-baked features to go GA. Looking at numerous critical problems that could result in data corruption, you could still argue that it’s a natural state of things in IT development, to quote Albert Einstein:

a person who never made a mistake never tried anything new.

A fair point, but following the story by The Register, it is not an argument I’d use here. The “accelerated release cadence” introduced by MongoDB Inc. assumes that major improvements are released in “rapid releases” available only to Atlas (DBaaS) customers. Neither MongoDB Community nor even MongoDB Enterprise customers will get a taste of those improvements in 5.0.x, even though they get to taste all the instabilities, limitations, and bugs available with 5.0.

Of course, MongoDB Inc. will argue that the Rapid Releases are for bleeding-edge adopters and that they include new features, whereas all the issues are fixed in the bug-fix, patchset releases. From my experience, it is not only bug fixes that solve user issues. Think of the release cycle and the situations where, due to deadlines, some features are released in the major version in a limited scope. It does sound all too familiar, right? That is not so bad in itself, as (with semantic versioning) minor versions would fill in the missing capabilities, lift the limitations, and make the often go-to-market spotlight features of the major version complete. Not in this case, though – at least not if you are a “second-class citizen” user of the Community or Enterprise edition. Rapid Releases are what semantic versioning calls minor releases, meaning you have to live with the limitations and lacking features until the next major release, settling for the bug-fix patches only for now.

Consider that MongoDB 5.0 introduced very appealing capabilities like time-series collections and resharding, which allows automatically changing a collection’s shard key. Choosing a good shard key during application design, when the initial sharding takes place, is often challenging, and a poorly designed shard key means everything to MongoDB’s performance. Until now, changing it required a manual and cumbersome process. Even taking into account the downsides of resharding, like the performance and storage overhead during the process, it is still a very tempting feature that in many situations could be a game changer. Unfortunately, with the lack of trust in MongoDB 5.0 and the new release cadence not having the community’s back, the community often simply cannot benefit from it.

Percona waited a long time for 5.0 to feel stable enough to release. It was not until MongoDB 5.0.6 CE was released, almost half a year after 5.0.0, that Percona decided it was safe enough for our users and customers. This sort of third-party overwatch is an invaluable asset that open source brings. With companies like Percona standing behind a software release, you get the added benefit of extra verification of your software “for free”.

End-of-life strategy

Looking at the previous chapter, it is not that surprising that the adoption of the 5.0 releases is not as impressive as one could expect. As this blog post is being written, the telemetry data gathered by Percona shows:

  • MongoDB 4.4 = 47%
  • MongoDB 4.2 = 17%
  • MongoDB 5.0 = 15%
  • MongoDB 4.0 = 13%

With the end-of-life calendar for MongoDB 4.x looking as follows:

  • 4.0 EOL April 2022
  • 4.2 EOL April 2023
  • 4.4 EOL April 2024

Add in the apparent lack of trust in 5.0, and we see a growing trend toward adopting MongoDB 4.4, which gives some “breathing space” until its EOL.

That’s a fair strategy that makes sense, but it limits the value you are getting. What if there was another way that could allow you to get some more benefits?

Here comes Percona Server for MongoDB 6.0

With the introduction of MongoDB 6.0, users got the long-awaited improvements and usability fixes to MongoDB 5.0 that only Atlas customers could taste before. When the EOL of the older major versions forces users to upgrade, 6.0 could become their landing zone. This way, users can benefit from the more advanced features of the new version as well as a later EOL.

A quick look at the features that made it into MongoDB 6.0 shows a range of interesting ones, like:

  • Cluster to cluster sync
  • Queryable encryption
  • Time-series collections improvements
  • Analytics improvements
  • Change streams improvements
  • New aggregation operators
  • Improved search

Obviously, not all of these will make it to MongoDB Community Edition, since some are reserved for MongoDB Enterprise or even Atlas only.

Even without the features that are unavailable in the Community Edition, the 6.0 release, providing fixes over the unstable 5.0, is a large improvement that is worth considering in your long-term update strategy.

While updating your Community Edition, it’s worth considering migrating from MongoDB CE to Percona Server for MongoDB. This way you get all the benefits of MongoDB CE 6.0 plus the advantages that Percona brings to the release cycle for the community. With the upcoming release of Percona Server for MongoDB 6.0, as well as the freshly released Percona Backup for MongoDB 2.0 and the support of PBM in Percona Monitoring and Management, the solution becomes complete. With features like an in-memory engine, extensive data-at-rest encryption, hot backups, and LDAP and Kerberos integration on top of what MongoDB Community Edition already provides, PSMDB is a complete solution that Percona is committed to keeping open. Be on the lookout for the announcement of PSMDB 6.0 very soon!

What now?

We see over the years that companies change their licenses, becoming less open source while claiming to be more open in an obvious marketing play. At its core, Percona chooses to stay true to the open source philosophy.

Over the years, Percona experts have meticulously delivered increments of Percona Server for MongoDB based on the same upstream codebase as MongoDB Community Edition. As a drop-in replacement for MongoDB CE, it is the enterprise features like the following that PSMDB adds on top that make it so interesting:

  • in-memory storage engine,
  • KMIP support,
  • HashiCorp Vault integration,
  • data-at-rest encryption,
  • audit logging,
  • external LDAP authentication with SASL,
  • hot backups.

These enterprise-grade feature enhancements were added to Percona Server for MongoDB. This way the open source community could benefit from features previously reserved only for MongoDB Enterprise customers. With PSMDB 6.0, things are not going to change. Percona is on a mission to provide open database solutions to everyone and everywhere. With this in mind, we are open to your suggestions as to which features are the most important to you, our users. Reach out and let us know!

Learn more about Percona Server for MongoDB

Sep
20
2022
--

Percona Server for MySQL – Automatic Cloud Deployment with Terraform


We are looking to provide simplified ways to deploy Percona software in cloud environments, especially for more advanced scenarios like replication and multi-node clusters (in the case of Percona XtraDB Cluster).

For this, I propose trying out our new Terraform provider, which has the following capabilities:

  • Deploy in AWS or GCP Cloud. The provider will automatically create instances and deploy Percona Server for MySQL
  • Deploy either Percona Server for MySQL or Percona XtraDB Cluster
  • Choose instance size
  • Choose storage volume size, type, and IOPs
  • Multi-node async replication or multi-node Percona XtraDB Cluster
  • Customize MySQL configuration file
  • Capability to deploy MyRocks Engine

To get a better understanding, let’s review some examples. But before that, here is where you can obtain the provider:

The provider is available from Terraform registry:

https://registry.terraform.io/providers/Percona-Lab/percona/0.9.0

And source code at GitHub:

Percona-Lab/terraform-provider-percona: Terraform modules to deploy Percona Server and Percona XtraDB Cluster (github.com)

Keep in mind that this is still EXPERIMENTAL software and is not covered by Percona Support.

Examples:

As a single server is quite trivial, let’s jump to a more complex scenario: asynchronous replication between two nodes.

Here is our main.tf file:

# AWS provider configuration
provider "percona" {
  region  = "us-east-2"
  profile = "default"
  cloud   = "aws"
}
 
 
resource "percona_ps" "psrepl2" {
  instance_type            = "t3.micro" # for AWS
  key_pair_name            = "sshKey2"
  password                 = "password"
  replica_password         = "replicaPassword"
  cluster_size             = 2
  volume_size              = 30
}

And we apply with:

terraform apply
…
percona_ps.psrepl2: Still creating... [4m20s elapsed]
percona_ps.psrepl2: Still creating... [4m30s elapsed]
percona_ps.psrepl2: Still creating... [4m40s elapsed]
percona_ps.psrepl2: Creation complete after 4m47s [id=JZnEGllTfJIOqjgMyyRl]

So five minutes later we have the following:

  • We created and initialized two t3.micro AWS instances in the us-east-2 region
  • Each instance uses a 30GB volume
  • We can log in to the instances with an SSH key (which will be created if it did not exist before)
  • Each instance has the latest 8.0 version of Percona Server for MySQL installed
  • The instances are connected by SOURCE->Replica MySQL replication

Similarly, for deploying a 3 node Percona XtraDB Cluster, we use:

# AWS provider configuration
provider "percona" {
  region  = "us-east-2"
  profile = "default"
  cloud   = "aws"
}
 
 
resource "percona_pxc" "pxc3" {
  instance_type            = "t3.micro" # for AWS
  key_pair_name            = "sshKey2"
  password                 = "password"
  cluster_size             = 3
  volume_size              = 30
}

We can also show an example for Google Cloud:

provider "percona" {
  region  = "us-central1"
  zone    = "us-central1-a"
  project = "cto-lab-284219"
  cloud =   "gcp"
}
 
 
resource "percona_ps" "ps3" {
  instance_type            = "e2-highmem-2" 
  key_pair_name            = "sshKey2"
  password                 = "password"
  cluster_size             = 3
  volume_type              = "pd-ssd"
  volume_size              = 40
}

In this case, we will use more powerful instances with dedicated SSD volumes of 40GB each.

The script will deploy instances and install three Percona Server for MySQL servers connected in replication (one source and two replicas).

In conclusion:

We offer a Terraform provider that simplifies the deployment of Percona Server for MySQL and Percona XtraDB Cluster in cloud environments and offers various customizations for instances and replication configurations.

Try out our new Terraform provider

Sep
20
2022
--

Column-Level Encryption in MySQL


In a post written earlier this year, Percona Server for MySQL Encryption Options and Choices, I discussed some of the options around encryption in MySQL. Because it is such a complex topic, that post was meant to clarify and highlight various aspects of “encryption” at different levels. I recently had this topic come up again, specifically around column-level encryption and the various options, so I wanted to touch on it in more detail.

As of the current release of Percona Server for MySQL, there is no built-in way to define a single column as encrypted. Ideally, you could pass some metadata in a CREATE statement and this would just happen automatically, such as this:
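A purely hypothetical sketch of what that might look like (this syntax does not exist in MySQL; the ENCRYPTED attribute is imaginary):

-- HYPOTHETICAL syntax, shown only to illustrate the idea; MySQL does not support this.
CREATE TABLE customers (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  ssn  VARCHAR(11) NOT NULL ENCRYPTED   -- imaginary column attribute
);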

Unfortunately, this option isn’t available and we need to do some data manipulation at or prior to read/write time.

Built-in MySQL encryption functions

One of the most common approaches is to use the built-in MySQL encryption functions described here: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html.  The standard flows look roughly like this:
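A minimal sketch using AES_ENCRYPT()/AES_DECRYPT() (the table, column, and key names are illustrative):

-- Write path: encrypt the sensitive value with a key supplied by the application.
INSERT INTO customers (id, name, ssn_enc)
VALUES (1, 'Alice', AES_ENCRYPT('123-45-6789', 'my_secret_key'));

-- Read path: decrypt on the way out (returns NULL if the key is wrong).
SELECT id, name, CAST(AES_DECRYPT(ssn_enc, 'my_secret_key') AS CHAR) AS ssn
FROM customers
WHERE id = 1;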

This works perfectly fine and the data will be stored encrypted for that column.  If an unauthorized person were to gain access to the running table, they would be unable to read that column without the key.  The biggest concern with this approach is that the plaintext key and plaintext data are BOTH specified in the query.  This leads to potential leaks in log files (slow query, binary log, etc) as well as potential sniffing over non-secure networks (i.e. connections not using SSL).

Also, key storage can become more cumbersome.  If you plan to share the same key for the entire table, key rotation and management can become non-trivial.  General best practices recommend that you rotate keys at least once a year.  On large tables, this could be a massive undertaking.  Let’s look at the envelope encryption pattern as an alternative.

Envelope encryption

In contrast to using built-in encryption, envelope encryption uses the concept of individual data keys.  While there are many ways to approach this, my personal experience uses the AWS KMS service.  In KMS, you can create what is called a “Customer Master Key” (CMK).  This is great for encrypting small strings like passwords but is limited to encrypting strings up to 4KB.

A more flexible approach is to use individual data keys, encrypt the data locally in the application, and then store the encrypted data along with the encrypted data key.  This can be done at various levels of granularity – from the per-row level to the table level, to the database level.   Here is the general process for envelope encryption:
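A rough sketch of the storage side (the column names are illustrative; the KMS calls happen in the application, not in SQL):

-- 1) The application asks KMS to generate a data key and receives both a
--    plaintext key and an encrypted copy of that key.
-- 2) The application encrypts the sensitive value locally with the plaintext
--    key and then discards the plaintext key.
-- 3) Only ciphertext is ever sent to MySQL:
INSERT INTO customers (id, name, ssn_ciphertext, data_key_ciphertext)
VALUES (1, 'Alice', @encrypted_ssn, @encrypted_data_key);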

When you need to decrypt the data, you first decrypt the key and then use that to decrypt the data:
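Again as a sketch with illustrative names: the query only retrieves ciphertext, and the application asks KMS to decrypt the data key before decrypting the value locally.

-- Fetch the ciphertext and its encrypted data key together; decryption happens
-- in the application after KMS has decrypted the data key.
SELECT ssn_ciphertext, data_key_ciphertext
FROM customers
WHERE id = 1;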

Because both the data key and the data are encrypted, both are safe to store together.  This is one of the main benefits of this method.  You can have hundreds to millions of data keys independent of one another but protected by a single master key.  This allows you to deactivate all of the individual keys by disabling one single master key.  

It also simplifies key rotation – you can simply rotate the master key and start using the new key to generate new data keys. Note that this doesn’t re-encrypt the data, but it does allow you to follow the best practice of periodically rotating keys.  As long as you don’t delete any of the old keys, KMS can determine which key to use for decryption from the key itself and automatically decrypt your data (i.e. you don’t have to specify which key encrypted the data).  

I’ve included a link to a sample script in Python that shows this process in more detail.  While there isn’t actually an active database in the demo, the code prints out the INSERT and SELECT statements that could be run against a live server: https://github.com/mbenshoof/kms-envelope-demo

Challenges with column-level encryption

Searching

It should go without saying that introducing column-level encryption isn’t a trivial task.  One of the biggest challenges is reviewing how encrypted data is retrieved.  For example, if I store social security numbers encrypted individually, then how do I search for the user with an SSN of 123-45-6789?  If I use a shared key for the entire table/schema, then it is possible with proper indexing.  I just have to pass the encrypted value to the where clause and if it exists, it should be found.

However, on a per-row model where each row uses a unique key, this is no longer possible.  As I don’t know the key the value was encrypted with, I can’t search for the encrypted value in the table.  In cases like this, you might consider a one-way hash field that could be searched against.  For example, you could store the SHA256 hash of an SSN as an additional column for searching but then decrypt any other sensitive information.
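For instance (a sketch with illustrative names), the application can hash the plaintext before encrypting it and store the hash next to the ciphertext so that equality lookups can use an index:

-- One-way hash column for searching; the application computes SHA-256 of the
-- plaintext SSN before encrypting the value itself.
ALTER TABLE customers
  ADD COLUMN ssn_hash BINARY(32),
  ADD INDEX idx_ssn_hash (ssn_hash);

-- Look up a row by SSN without decrypting anything:
SELECT id, ssn_ciphertext, data_key_ciphertext
FROM customers
WHERE ssn_hash = UNHEX(SHA2('123-45-6789', 256));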

CPU and space overhead

The other challenge is adding additional write/read overhead to handle encryption and decryption.  While this may or may not be an issue depending on the use case, the extra CPU needed either on the application side or MySQL side could come into play.  You will need to consider the extra processing required and factor that in when planning the size of your resources.

Additionally, depending on the encryption libraries used, there can be additional space needed to store encrypted values.  In some cases, the encrypted value (when stored in base64 specifically) may end up requiring a higher storage footprint.  The space could be compounded if using an index on an additional hash column.  For small values (like SSN), the hash may be much larger than the actual data.  This can result in a much higher storage footprint when applied to millions of records. 

Wrapping up

Encryption and security are very important and complicated topics.  When considering column-level encryption in MySQL, you definitely have some options.  The easiest way would be to just leverage the built-in encryption functions in MySQL.  However, you can take things a step further and handle all encryption and decryption in your application.  

As is always the case with complicated topics like this, the choice and approach depend entirely on your requirements and use case.  Just know that with the flexibility of MySQL, there is most likely a design and approach that works for you!

Sep
19
2022
--

Talking Drupal #365 – Event Platform

Today we are talking about The Event Platform with Kaleem Clarkson & Martin Anderson-Clutz.

www.talkingDrupal.com/365

Topics

  • Where the idea originated
  • What are the main goals
  • What does it do out of the box
  • Is this a Conference Organizing Distribution (COD) Replacement
  • Why make a module and not a distribution
  • Does this follow the new recipe or starter kit concept
  • Which theme do you target
  • What is on the roadmap
  • Where can the event platform team use help
  • Anything else

Resources

Guests

Kaleem Clarkson – kaleemclarkson.com @kaleemclarkson
Martin Anderson-Clutz – @mandclu

Hosts

Nic Laflin – www.nLighteneddevelopment.com @nicxvan
John Picozzi – www.epam.com @johnpicozzi

MOTW

Smart Date – This module attempts to provide a more user-friendly date field by upgrading the functionality of core in a number of ways.
