Mar
29
2022
--

Migrating to utf8mb4: Things to Consider

The utf8mb4 character set is the new default as of MySQL 8.0, and this change neither affects existing data nor forces any upgrades.

Migration to utf8mb4 has many advantages including:

  • It can store more symbols, including emojis
  • It has new collations for Asian languages
  • It is faster than utf8mb3

Still, you may wonder how migration affects your existing data. This blog covers multiple aspects of it.

Storage Requirements

As the name suggests, one character in the utf8mb4 character set can take at most four bytes. This is more than utf8mb3, which needs at most three bytes per character, and more than many other MySQL character sets.

Fortunately, utf8mb3 is a subset of utf8mb4, and migration of existing data does not increase the size of the data stored on disk: each character takes as many bytes as needed. For example, any digit or letter in the Latin alphabet will require one byte. Characters from other alphabets can take up to four bytes. This can be verified with a simple test.

mysql> set names utf8mb4;
Query OK, 0 rows affected (0,00 sec)

mysql> CREATE TABLE charset_len( name VARCHAR(255), val CHAR(1) ) CHARACTER SET=utf8mb4;
Query OK, 0 rows affected (0,03 sec)

mysql> INSERT INTO charset_len VALUES('Latin A', 'A'),  ('Cyrillic А', 'А'), ('Korean ㉿', '㉿'), ('Dolphin 🐬', '🐬');
Query OK, 4 rows affected (0,02 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT name, val, HEX(val), BIT_LENGTH(val)/8 FROM charset_len;
+--------------+------+----------+-------------------+
| name         | val  | HEX(val) | BIT_LENGTH(val)/8 |
+--------------+------+----------+-------------------+
| Latin A      | A    | 41       |            1.0000 |
| Cyrillic А   | А    | D090     |            2.0000 |
| Korean ㉿    | ㉿    | E389BF   |            3.0000 |
| Dolphin 🐬   | 🐬    | F09F90AC |            4.0000 |
+--------------+------+----------+-------------------+
4 rows in set (0,00 sec)

As a result, existing data that uses at most three bytes per character does not change, and you gain the ability to store characters that require four-byte encoding.
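The byte counts in the test above follow directly from the UTF-8 encoding, so they can be reproduced outside MySQL. The following is a minimal Python sketch, not part of the original test; the helper name utf8_profile and the sample labels are made up for illustration.

```python
# Characters from the test table, written as escapes so the source stays ASCII-safe.
SAMPLES = {
    "Latin A": "A",
    "Cyrillic A": "\u0410",
    "Korean standard symbol": "\u327f",
    "Dolphin": "\U0001F42C",
}


def utf8_profile(ch):
    """Return (hex of the UTF-8 bytes, number of bytes) for one character."""
    encoded = ch.encode("utf-8")
    return encoded.hex().upper(), len(encoded)


for name, ch in SAMPLES.items():
    hex_bytes, size = utf8_profile(ch)
    print(f"{name:<24} {hex_bytes:<10} {size}")
```

The hex strings and sizes printed here should match the HEX(val) and BIT_LENGTH(val)/8 columns shown by MySQL.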

Maximum Length of the Column

Although the stored data does not change, MySQL calculates the maximum amount of data that a column can store from the maximum number of bytes per character. Some column size definitions that work fine for utf8mb3 therefore fail for utf8mb4. For example, you can have a table with this definition:

mysql> CREATE TABLE len_test(
    -> foo VARCHAR(16384)
    -> ) ENGINE=InnoDB CHARACTER SET utf8mb3;
Query OK, 0 rows affected, 1 warning (0,06 sec)

If you decide to convert this table to use the utf8mb4 character set, the operation will fail:

mysql> ALTER TABLE len_test CONVERT TO CHARACTER SET utf8mb4;
ERROR 1074 (42000): Column length too big for column 'foo' (max = 16383); use BLOB or TEXT instead

The reason is that a VARCHAR column can store at most 65,535 bytes. That is 21,845 characters for the utf8mb3 character set (65,535 / 3) and 16,383 characters for the utf8mb4 character set (65,535 / 4, rounded down).

Therefore, if you have columns that could contain more than 16383 characters, you will need to convert them to the TEXT or LONGTEXT data type.
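The limits above are plain integer division, which the following Python sketch reproduces. It is a simplified illustration: it uses only the 65,535-byte figure quoted in the text and ignores anything else that shares the row. The function names are hypothetical.

```python
VARCHAR_MAX_BYTES = 65535  # byte limit quoted in the text


def max_varchar_chars(max_bytes_per_char):
    """Largest declared VARCHAR length (in characters) that fits the byte limit."""
    return VARCHAR_MAX_BYTES // max_bytes_per_char


def fits_after_conversion(declared_chars, max_bytes_per_char=4):
    """True if a VARCHAR(declared_chars) column still fits after conversion."""
    return declared_chars <= max_varchar_chars(max_bytes_per_char)


print(max_varchar_chars(3))          # utf8mb3
print(max_varchar_chars(4))          # utf8mb4
print(fits_after_conversion(16384))  # the len_test.foo column from the example
```

For the len_test table, the check returns False, which corresponds to the ERROR 1074 raised by the ALTER TABLE statement.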

You can find all such columns if you run the query:

SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,
       CHARACTER_MAXIMUM_LENGTH, DATA_TYPE
FROM information_schema.columns
WHERE CHARACTER_MAXIMUM_LENGTH > 16383 AND
      DATA_TYPE NOT LIKE '%text%' AND 
      DATA_TYPE NOT LIKE '%blob%' AND
      TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema');

For example, in my test environment, it returns:

*************************** 1. row ***************************
TABLE_SCHEMA: test
TABLE_NAME: setup
COLUMN_NAME: value
CHARACTER_MAXIMUM_LENGTH: 20000
DATA_TYPE: varchar
1 row in set (0,02 sec)

 

Index Storage Requirement

MySQL does not know in advance which characters you will store in the column when you are creating indexes. Therefore, when it calculates the storage required for the index, it takes the maximum number of bytes per character for the chosen character set. As a result, you may hit the index storage limit when converting from another character set to utf8mb4. For InnoDB, the index key prefix length limit is 767 bytes for the REDUNDANT and COMPACT row formats, and 3072 bytes for the DYNAMIC and COMPRESSED row formats. See the MySQL Reference Manual for details.

That means you need to check if you have indexes that could grow to exceed these values before performing the update. You can do this with the following query:

WITH indexes AS (
     WITH tables AS  (
          SELECT SUBSTRING_INDEX(t.NAME, '/', 1) AS `database`, SUBSTRING_INDEX(t.NAME, '/', -1) AS `table`, i.NAME AS `index`, ROW_FORMAT
          FROM information_schema.INNODB_INDEXES i JOIN information_schema.INNODB_TABLES t USING(TABLE_ID)
    )
    SELECT `database`, `table`, `index`, ROW_FORMAT, GROUP_CONCAT(kcu.COLUMN_NAME) AS columns,
           SUM(c.CHARACTER_MAXIMUM_LENGTH) * 4 AS index_len_bytes
    FROM tables JOIN information_schema.KEY_COLUMN_USAGE kcu
         ON (`database` = TABLE_SCHEMA AND `table` = kcu.TABLE_NAME AND `index` = kcu.CONSTRAINT_NAME)
         JOIN information_schema.COLUMNS c ON (kcu.COLUMN_NAME = c.COLUMN_NAME AND `database` = c.TABLE_SCHEMA AND `table` = c.TABLE_NAME)
    WHERE c.CHARACTER_MAXIMUM_LENGTH IS NOT NULL
    GROUP BY `database`, `table`, `index`, ROW_FORMAT ORDER BY index_len_bytes
) SELECT * FROM indexes WHERE index_len_bytes >= 768;

Here is the result of running the query in my test environment:

+----------+--------------+---------+------------+------------+-----------------+
| database | table        | index   | ROW_FORMAT | columns    | index_len_bytes |
+----------+--------------+---------+------------+------------+-----------------+
| cookbook | hitcount     | PRIMARY | Dynamic    | path       |            1020 |
| cookbook | phrase       | PRIMARY | Dynamic    | phrase_val |            1020 |
| cookbook | ruby_session | PRIMARY | Dynamic    | session_id |            1020 |
+----------+--------------+---------+------------+------------+-----------------+
3 rows in set (0,04 sec)

Once you have identified such indexes, check the columns and adjust the table definition accordingly.
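The index_len_bytes column is the declared character length multiplied by four. The Python sketch below repeats that arithmetic and compares it with the row-format limits quoted earlier. It is an illustration only; the function names are hypothetical, and the limits are the ones stated in this post.

```python
UTF8MB4_MAX_BYTES = 4

# Index size limits per InnoDB row format, as quoted in the text.
INDEX_LIMIT_BYTES = {
    "REDUNDANT": 767,
    "COMPACT": 767,
    "DYNAMIC": 3072,
    "COMPRESSED": 3072,
}


def worst_case_index_bytes(column_lengths):
    """Bytes MySQL reserves for indexed character columns under utf8mb4."""
    return sum(column_lengths) * UTF8MB4_MAX_BYTES


def exceeds_limit(column_lengths, row_format):
    """True if the worst-case index size is over the limit for the row format."""
    limit = INDEX_LIMIT_BYTES[row_format.upper()]
    return worst_case_index_bytes(column_lengths) > limit


def max_indexable_chars(row_format):
    """Longest utf8mb4 column (in characters) that can be indexed in full."""
    return INDEX_LIMIT_BYTES[row_format.upper()] // UTF8MB4_MAX_BYTES


print(worst_case_index_bytes([255]))     # one VARCHAR(255) key column
print(exceeds_limit([255], "Dynamic"))
print(exceeds_limit([255], "Compact"))
print(max_indexable_chars("Compact"), max_indexable_chars("Dynamic"))
```

A single VARCHAR(255) key gives 1020 bytes, the value shown in the sample output. That is within the DYNAMIC limit but over the COMPACT limit, so the row format decides whether a flagged index actually needs a change.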

Note: The query uses common table expressions (CTEs), available as of MySQL 8.0. If you are still on version 5.7 or earlier, you will need to rewrite the query.

Temporary Tables

One more issue you can hit after converting to the utf8mb4 character set is an increased size of the implicit temporary tables that MySQL creates to resolve queries. Since utf8mb4 may need more bytes per character than other character sets, the column size of such implicit tables will also be bigger. To figure out if you are affected by this issue, watch the global status variable Created_tmp_disk_tables. If it starts increasing significantly after the migration, consider adding RAM to your machine and increasing the maximum size of in-memory temporary tables. Note that this issue could also be a symptom that some of your queries are poorly optimized.
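One way to watch this counter is to compare two snapshots of SHOW GLOBAL STATUS and look at the share of implicit temporary tables that went to disk. The Python sketch below assumes you have already collected the snapshots as dictionaries. The helper name and the sample numbers are invented for illustration; Created_tmp_tables is the companion status variable that counts all implicit temporary tables.

```python
def disk_tmp_table_share(before, after):
    """Share of implicit temporary tables created on disk between two snapshots."""
    created = after["Created_tmp_tables"] - before["Created_tmp_tables"]
    on_disk = after["Created_tmp_disk_tables"] - before["Created_tmp_disk_tables"]
    return on_disk / created if created else 0.0


# Hypothetical snapshots taken before and after a test window.
before = {"Created_tmp_tables": 100, "Created_tmp_disk_tables": 10}
after = {"Created_tmp_tables": 300, "Created_tmp_disk_tables": 60}

print(f"{disk_tmp_table_share(before, after):.0%} of new temporary tables went to disk")
```

Comparing this share for the same workload before and after the migration shows whether the conversion pushed more temporary tables to disk.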

Conclusion

Converting to the utf8mb4 character set brings you better performance, a larger range of characters (including emojis), and new collations (sorting rules). This conversion comes at almost no price, and it can be done smoothly.

Before you migrate, ensure that:

  • You converted all VARCHAR columns that could store more than 16383 characters to the TEXT or LONGTEXT data type
  • You adjusted index definitions that could take more than 767 bytes for the REDUNDANT and COMPACT row formats, or more than 3072 bytes for the DYNAMIC and COMPRESSED row formats, after migration
  • You optimized your queries so that they do not start using internal disk-based temporary tables

Feb
27
2019
--

Understanding How MySQL Collation and Charset Settings Impact Performance

This blog was originally published in February 2019 and was updated in September 2023.

Web applications rely on databases to run the internet, powering everything from e-commerce platforms to social media networks to streaming services. MySQL is one of the most popular database management systems, playing a pivotal role in the functionality and performance of web applications.

In today’s blog, I’ll take a look at MySQL collation and charset settings to shed light on how they impact the performance of web applications and how to use them to effectively communicate with your users.

Understanding Character Sets and Encoding in MySQL

Character sets and encoding in MySQL play a vital role in how data is stored and retrieved in a database. A character set is a collection of characters (letters, numbers, and symbols) in which each character has a unique representation. It defines how data is stored and how it is interpreted.

Character encoding refers to the method used to represent characters as binary data for storage and transmission. It specifies how characters are converted into binary code and vice-versa. 

The choice of character set and encoding impacts not only efficiency but also how the data appears to users.

How Character Sets Affect Data Storage and Retrieval

You can specify the character set for each column when you create a table, indicating the set of characters allowed in that column. This affects the type and range of characters that can be inserted into the column.

When data is inserted into the database, it is converted into the specified character set’s binary representation and stored accordingly. When retrieving data, MySQL converts the binary representation back into characters according to the character set and encoding rules. This conversion ensures that the data appears correctly to users and can be processed and displayed as intended.

An Example Illustrating Character Set Concepts

If you have a MySQL database for a multilingual website, you might use a UTF-8 character set (utf8mb4 in MySQL), which supports characters from various languages, including English, Chinese, Arabic, and others. Using UTF-8 encoding, you can store and retrieve data in these languages seamlessly, ensuring that text displays correctly for users worldwide.

However, if you use a character set that doesn’t support specific characters, such as storing Arabic text in a Chinese character set, there would be issues with the display of the data.
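The mismatch can be illustrated outside the database. In the Python sketch below, Python codecs stand in for MySQL character sets, which is an assumption for illustration only: MySQL itself may reject or replace unsupported characters depending on its settings. The helper name can_store is hypothetical.

```python
def can_store(text, codec):
    """True if every character of text has a representation in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


arabic = "\u0645\u0631\u062d\u0628\u0627"  # Arabic greeting ("marhaba")
chinese = "\u4e2d\u6587"                    # "Chinese" written in Chinese

print(can_store(arabic, "utf-8"))    # UTF-8 covers Arabic
print(can_store(chinese, "gb2312"))  # a Chinese character set covers Chinese
print(can_store(arabic, "gb2312"))   # but it has no Arabic letters
```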

MySQL Collation and its Relationship with Character Sets

Collation refers to a set of rules and conventions that dictate how character data is compared and sorted, playing a crucial role in determining the order in which data is retrieved from the database and how various string operations, such as searching and filtering, are performed. 

Collation is closely intertwined with character sets, defining how characters within a specific character set are ordered. To ensure consistent results, it’s important to select compatible collations for your chosen character sets. Incompatibilities between character sets and collations can lead to unexpected sorting and comparison outcomes, which could lead to issues in database operations and application functionality. 
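As a rough analogy only, the Python sketch below contrasts a byte-wise ordering (similar in spirit to a _bin collation) with a case-insensitive ordering (similar in spirit to a _ci collation). It does not use MySQL's collation rules, and the function names are made up for illustration.

```python
def binary_sort(words):
    """Order by code point, so upper-case letters sort before lower-case ones."""
    return sorted(words)


def ci_sort(words):
    """Order after case folding, so letter case does not affect the position."""
    return sorted(words, key=str.casefold)


def ci_equal(a, b):
    """Compare two strings while ignoring letter case."""
    return a.casefold() == b.casefold()


words = ["banana", "Apple", "Cherry", "apple"]
print(binary_sort(words))
print(ci_sort(words))
print("Apple" == "apple", ci_equal("Apple", "apple"))
```

The same data sorts and compares differently under the two rule sets, which is why the collation choice changes the results of ORDER BY, WHERE, and unique constraints.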

Choosing the Appropriate Character Set

When deciding on the character set for your data, several important considerations should guide your decision-making process. First and foremost, you should take into account the nature of your data and your target audience. Is your data primarily composed of Latin-based text, or do you expect a diverse audience that requires support for various languages? Additionally, it’s crucial to ensure that the character set you select aligns with the encoding used in your web application, ensuring seamless communication between your database and the application layer.

If your application serves a global audience, it’s often wise to opt for a Unicode character set like UTF-8. Unicode provides comprehensive support for various languages and scripts, making it an excellent choice for multilingual applications. However, if your application predominantly serves a single language or region, choosing a character set optimized for that specific context can lead to more efficient data storage and improved overall performance.

Dealing with the complexities of multilingual data may involve not only choosing the right character set but also ensuring that your database design, application code, and collation settings are configured to handle multilingual content effectively. It’s essential to plan for data input, storage, and retrieval in a manner that accommodates working with diverse character sets and languages while maintaining a seamless user experience.

Impact of Charset and Collation on Indexing Strategies

Charset and collation choices in MySQL can have a significant impact on indexing strategies. These choices influence how data is stored, sorted, and compared within the database, which directly affects how indexes function.

When it comes to indexing, one of the key considerations is the length of indexed values, especially for text columns. Different character sets have different storage requirements, with some requiring more bytes to represent certain characters. For example, utf8mb4, MySQL’s full UTF-8 character set, uses one to four bytes per character, while latin1 always uses one byte per character. This difference in storage size leads to variations in the size of index entries, which can affect efficiency.
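The size difference can be sketched in Python, with Python codecs standing in for the MySQL character sets (an assumption for illustration). The first helper measures the bytes a value actually needs; the second gives the worst case for a declared column length. Both names are hypothetical.

```python
def stored_bytes(text, codec):
    """Bytes needed to store text in the given encoding."""
    return len(text.encode(codec))


def worst_case_bytes(declared_chars, max_bytes_per_char):
    """Upper bound for a column declared with declared_chars characters."""
    return declared_chars * max_bytes_per_char


word = "caf\u00e9"  # four characters, one of them accented

print(stored_bytes(word, "latin-1"))  # one byte per character
print(stored_bytes(word, "utf-8"))    # the accented letter takes two bytes
print(worst_case_bytes(100, 1))       # VARCHAR(100) in latin1
print(worst_case_bytes(100, 4))       # VARCHAR(100) in utf8mb4
```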

Collation determines how string comparisons are executed, which affects how indexes are used in sorting and searching operations. For example, if your application requires case-insensitive searches, selecting a case-insensitive collation can be more efficient than using a case-sensitive one. However, it’s essential to note that different collations can have different performance implications. Some collations may be faster for sorting operations but slower for searching, while others may be optimized for specific languages or uses.

To illustrate, let’s look at a real-world example. Suppose you’re in the process of building a multilingual system that needs to support a diverse array of languages. In this scenario, opting for a UTF-8 character set is a logical choice, as it provides comprehensive language support, and you could employ a collation that facilitates case-insensitive searches, enhancing user-friendly information retrieval.

Alternatively, if you find yourself developing an application where space constraints are a concern, you may opt for a more space-efficient character set like Latin1. Here, it becomes crucial to select a collation that strikes a balance between sorting and searching performance, ensuring efficient data handling.

Ultimately, the influence of character sets and collations on your indexing strategies should meet the unique requirements and priorities of your application, ensuring optimal database performance and a positive user experience.

Charset and Collation Effects on Query Execution

Choosing the correct charset and collation settings in a MySQL database can significantly impact query execution time and overall database performance. When not appropriately configured, they can become bottlenecks in query execution. For example, if your database uses a character set that doesn’t match the character set of the data sent in a query, MySQL may need to perform character set conversions on the fly, resulting in slower query execution times.

In addition, collation settings affect string comparisons, which are common in queries involving WHERE clauses or JOIN operations. A collation that does not match your queries can lead to suboptimal performance. For instance, if a column has a case-insensitive collation but most of your queries need case-sensitive matching, each query has to override the collation (for example, with a COLLATE clause), which can prevent MySQL from using an index on that column.

To demonstrate the importance of optimizing charset and collation settings, here’s an example. Say you have a web application with a user database using a UTF-8 character set and a case-insensitive collation, and the application queries user data to validate login credentials. By switching to a UTF-8 character set with a case-sensitive collation, user authentication queries can become faster, as they no longer require case-insensitive comparisons. The size of the gain depends on the MySQL version and the collations involved, so measure it on your own workload.

Testing Read-Only CPU Intensive Workloads

Following my post MySQL 8 is not always faster than MySQL 5.7, this time I decided to test very simple read-only CPU-intensive workloads where all data fits in memory. In this workload, there are no I/O operations, only memory and CPU operations.

My Testing Setup

Environment specification

  • Release | Ubuntu 18.04 LTS (bionic)
  • Kernel | 4.15.0-20-generic
  • Processors | physical = 2, cores = 28, virtual = 56, hyperthreading = yes
  • Models | 56xIntel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz
  • Memory Total | 376.6G
  • Provider | packet.net x2.xlarge.x86 instance

I will test two workloads, sysbench oltp_read_only and oltp_point_select, with a varying number of threads:

sysbench oltp_read_only --mysql-ssl=off --report-interval=1 --time=300 --threads=$i --tables=10 --table-size=10000000 --mysql-user=root run

sysbench oltp_point_select --mysql-ssl=off --report-interval=1 --time=300 --threads=$i --tables=10 --table-size=10000000 --mysql-user=root run

The results for OLTP read-only (latin1 character set):

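Throughput per version; the ratio is MySQL 5.7.25 divided by MySQL 8.0.15.

+---------+--------------+--------------+------------------+
| threads | MySQL 5.7.25 | MySQL 8.0.15 | throughput ratio |
+---------+--------------+--------------+------------------+
|       1 |      1241.18 |       1114.4 |             1.11 |
|       4 |      4578.18 |      4106.69 |             1.11 |
|      16 |     15763.64 |     14303.54 |             1.10 |
|      24 |     21384.57 |     19472.89 |             1.10 |
|      32 |     25081.17 |     22897.04 |             1.10 |
|      48 |     32363.27 |     29600.26 |             1.09 |
|      64 |     39629.09 |     35585.88 |             1.11 |
|     128 |     38448.23 |     34718.42 |             1.11 |
|     256 |     36306.44 |     32798.12 |             1.11 |
+---------+--------------+--------------+------------------+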

The results for point_select (latin1 character set):

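Throughput per version; the ratio is MySQL 5.7.25 divided by MySQL 8.0.15.

+---------+--------------+--------------+------------------+
| threads | MySQL 5.7.25 | MySQL 8.0.15 | throughput ratio |
+---------+--------------+--------------+------------------+
|       1 |     31672.52 |     28344.25 |             1.12 |
|       4 |     110650.7 |     98296.46 |             1.13 |
|      16 |    390165.41 |    347026.49 |             1.12 |
|      24 |    534454.55 |    474024.56 |             1.13 |
|      32 |    620402.74 |    554524.73 |             1.12 |
|      48 |     806367.3 |    718350.87 |             1.12 |
|      64 |   1120586.03 |    972366.59 |             1.15 |
|     128 |   1108638.47 |    960015.17 |             1.15 |
|     256 |   1038166.63 |    891470.11 |             1.16 |
+---------+--------------+--------------+------------------+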

We can see that MySQL 5.7.25 delivers about 10% more throughput than MySQL 8.0.15 in the OLTP read-only workload, and 12-16% more in the point_select workload.

Although the difference is not necessarily significant, this is enough to reveal that MySQL 8.0.15 does not perform as well as MySQL 5.7.25 in the variety of workloads that I am testing.

However, it appears that the dynamic of the results will change if we use the utf8mb4 character set instead of latin1.

Let’s compare MySQL 5.7.25 latin1 vs. utf8mb4, as utf8mb4 is now the default CHARSET in MySQL 8.0.

But before we do that, let’s take a look also at COLLATION.

For utf8mb4, MySQL 5.7.25 uses utf8mb4_general_ci as the default collation. However, I read that to get proper sorting and comparison for Eastern European languages, you may want to use the utf8mb4_unicode_ci collation. For MySQL 8.0, the default collation is utf8mb4_0900_ai_ci.

So let’s compare each version latin1 vs utf8mb4 (with default collation). First 5.7:

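+---------+--------------------+----------+--------------+
| Threads | utf8mb4_general_ci |   latin1 | latin1 ratio |
+---------+--------------------+----------+--------------+
|       4 |            2957.99 |  4578.18 |         1.55 |
|      24 |           13792.55 | 21384.57 |         1.55 |
|      64 |           24516.99 | 39629.09 |         1.62 |
|     128 |           23977.07 | 38448.23 |         1.60 |
+---------+--------------------+----------+--------------+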

So here we can see that utf8mb4 in MySQL 5.7 is much slower than latin1: latin1 delivers 55-62% more throughput, which means utf8mb4 reaches only about 62-65% of the latin1 throughput.
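A throughput ratio can be read in two directions, and the two percentages are not the same number. The Python sketch below works through the 64-thread row of the MySQL 5.7 table; the function names are hypothetical.

```python
def throughput_ratio(fast, slow):
    """How many times the faster result exceeds the slower one."""
    return fast / slow


def percent_faster(fast, slow):
    """Extra throughput of the faster result, relative to the slower one."""
    return (fast / slow - 1) * 100


def percent_slower(fast, slow):
    """Throughput lost by the slower result, relative to the faster one."""
    return (1 - slow / fast) * 100


latin1, utf8mb4 = 39629.09, 24516.99  # 64 threads, MySQL 5.7

print(f"ratio:            {throughput_ratio(latin1, utf8mb4):.2f}")
print(f"latin1 is faster: {percent_faster(latin1, utf8mb4):.1f}%")
print(f"utf8mb4 is slower: {percent_slower(latin1, utf8mb4):.1f}%")
```

A ratio of 1.62 means latin1 is about 62% faster, while utf8mb4 is about 38% slower, so it matters which side a percentage is quoted from.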

And the same for MySQL 8.0.15

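+---------+------------------------------+----------+--------------+
| Threads | utf8mb4_0900_ai_ci (default) |   latin1 | latin1 ratio |
+---------+------------------------------+----------+--------------+
|       4 |                      3968.88 |  4106.69 |         1.03 |
|      24 |                     18446.19 | 19472.89 |         1.06 |
|      64 |                     32776.35 | 35585.88 |         1.09 |
|     128 |                     31301.75 | 34718.42 |         1.11 |
+---------+------------------------------+----------+--------------+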

For MySQL 8.0, the hit from utf8mb4 is much lower: latin1 is at most 11% faster.

Now let’s compare all collations for utf8mb4

For MySQL 5.7

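+---------+------------------------------+-------------+--------------------+------------------------+
| Threads | utf8mb4_general_ci (default) | utf8mb4_bin | utf8mb4_unicode_ci | utf8mb4_unicode_520_ci |
+---------+------------------------------+-------------+--------------------+------------------------+
|       4 |                      2957.99 |      3328.8 |            2157.61 |                1942.78 |
|      24 |                     13792.55 |    15857.29 |            9989.96 |                9095.17 |
|      64 |                     24516.99 |    28125.16 |           16207.26 |               14768.64 |
|     128 |                     23977.07 |    27410.94 |            15970.6 |                14560.6 |
+---------+------------------------------+-------------+--------------------+------------------------+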

If you plan to use utf8mb4_unicode_ci, you will get an even bigger performance hit (compared with utf8mb4_general_ci).

And for MySQL 8.0.15

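+---------+--------------------+-------------+--------------------+------------------------------+
| Threads | utf8mb4_general_ci | utf8mb4_bin | utf8mb4_unicode_ci | utf8mb4_0900_ai_ci (default) |
+---------+--------------------+-------------+--------------------+------------------------------+
|       4 |             3461.8 |     3628.01 |             3363.7 |                      3968.88 |
|      24 |           16327.45 |    17136.16 |           15740.83 |                     18446.19 |
|      64 |           28960.62 |    30390.29 |           27242.72 |                     32776.35 |
|     128 |           27967.25 |    29256.89 |           26489.83 |                     31301.75 |
+---------+--------------------+-------------+--------------------+------------------------------+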

So now let’s compare MySQL 8.0 vs MySQL 5.7 in utf8mb4 with default collations:

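+---------+------------------------------+------------------------------+-----------------+
| Threads | MySQL 8.0 utf8mb4_0900_ai_ci | MySQL 5.7 utf8mb4_general_ci | MySQL 8.0 ratio |
+---------+------------------------------+------------------------------+-----------------+
|       4 |                      3968.88 |                      2957.99 |            1.34 |
|      24 |                     18446.19 |                     13792.55 |            1.34 |
|      64 |                     32776.35 |                     24516.99 |            1.34 |
|     128 |                     31301.75 |                     23977.07 |            1.31 |
+---------+------------------------------+------------------------------+-----------------+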

So there we are. In this case, MySQL 8.0 is actually better than MySQL 5.7 by 31-34%.

After Testing Conclusions

There are several observations to make:

  • MySQL 5.7 outperforms MySQL 8.0 in latin1 charset
  • MySQL 8.0 outperforms MySQL 5.7 by a wide margin if we use utf8mb4 charset
  • Be aware that utf8mb4 is now the default in MySQL 8.0, while MySQL 5.7 has latin1 by default
  • When running comparisons between MySQL 8.0 and MySQL 5.7, be aware of what charset you are using, as it may affect the comparison a lot.

Best Practices for Charset and Collation Optimization

Optimizing charset and collation settings in MySQL involves careful planning and execution to ensure compatibility, efficiency, and long-term maintenance. It’s recommended to choose character sets and collations that align with your application’s needs, so consider the types of data your database will store and process, as well as the languages and character sets used in your application. For multilingual applications, UTF-8 is a popular choice due to its broad character support. When selecting collations, opt for those that match your application’s case sensitivity requirements to prevent unnecessary overhead.

Changing charset and collation settings on existing databases is trickier. Back up your data before making any modifications, conduct a thorough analysis of the database’s current state and the potential impact of changes, and test the changes in a controlled environment to ensure they won’t disrupt operations.

Also, you should regularly review and update your charset and collation settings as your application evolves and data requirements change. Robust monitoring solutions, like Percona Monitoring and Management, can track query performance, identify bottlenecks, and ensure that charset and collation settings continue to align with your application’s demands.

Choosing the appropriate character set and collation means weighing the nature of your data, your target audience, and your application’s needs. Well-chosen charset and collation settings improve query execution times, the user experience, and overall database performance.

Aug
07
2018
--

Replicating from MySQL 8.0 to MySQL 5.7

In this blog post, we’ll discuss how to set up replication from MySQL 8.0 to MySQL 5.7. There are some situations in which this configuration might help. For example, during a MySQL upgrade, replicating from a master on the newer version to a slave on the older version can serve as a rollback plan. Another example is upgrading a master-master replication topology.

Officially, replication is only supported between consecutive major MySQL versions, and only from a lower version master to a higher version slave. Here is an example of a supported scenario:

5.7 master –> 8.0 slave

while the opposite is not supported:

8.0 master –> 5.7 slave
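The support rule can be written down as a small check. The Python sketch below is a simplification for illustration: the list of release series and the helper name are assumptions, not an official compatibility matrix.

```python
# Release series in order; an assumed, simplified list for illustration.
RELEASE_SERIES = ["5.5", "5.6", "5.7", "8.0"]


def is_officially_supported(master, replica):
    """True if the replica runs the same series as the master or the next one."""
    step = RELEASE_SERIES.index(replica) - RELEASE_SERIES.index(master)
    return step in (0, 1)


print(is_officially_supported("5.7", "8.0"))  # lower master, higher slave
print(is_officially_supported("8.0", "5.7"))  # the setup built in this post
print(is_officially_supported("5.6", "8.0"))  # skips a series
```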

In this blog post, I’ll walk through how to overcome the initial problems and get replication working in this scenario. I’ll also show some errors that can halt the replication if a new feature from MySQL 8 is used.

Here is the initial setup that will be used to build the topology:

slave > select @@version;
+------------+
| @@version  |
+------------+
| 5.7.17-log |
+------------+
1 row in set (0.00 sec)

master > select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.12    |
+-----------+
1 row in set (0.00 sec)

First, before executing the CHANGE MASTER command, you need to modify the collation on the master server. Otherwise the replication will run into this error:

slave > show slave status\G
                   Last_Errno: 22
                   Last_Error: Error 'Character set '#255' is not a compiled character set and is not specified in the '/opt/percona_server/5.7.17/share/charsets/Index.xml' file' on query. Default database: 'mysql8_1'. Query: 'create database mysql8_1'

This is because the default character set and collation have changed in MySQL 8. According to the documentation:

The default value of the character_set_server and character_set_database system variables has changed from latin1 to utf8mb4.

The default value of the collation_server and collation_database system variables has changed from latin1_swedish_ci to utf8mb4_0900_ai_ci.

Let’s change the collation and the character set to utf8 on MySQL 8 (it is possible to use any option that exists in both versions):

# master my.cnf
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
collation-server=utf8_unicode_ci
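Picking "any option that exists in both versions" amounts to intersecting the collation lists of the two servers. The Python sketch below uses short, hand-picked samples rather than the full output of SHOW COLLATION, and the helper name is hypothetical.

```python
# Abbreviated samples; the real lists come from SHOW COLLATION on each server.
COLLATIONS_57 = {
    "latin1_swedish_ci",
    "utf8_general_ci",
    "utf8_unicode_ci",
    "utf8mb4_general_ci",
    "utf8mb4_unicode_ci",
}
COLLATIONS_80 = COLLATIONS_57 | {"utf8mb4_0900_ai_ci"}


def safe_for_both(collation):
    """True if master and slave both know the collation."""
    return collation in COLLATIONS_57 and collation in COLLATIONS_80


print(safe_for_both("utf8_unicode_ci"))     # used in the my.cnf above
print(safe_for_both("utf8mb4_0900_ai_ci"))  # MySQL 8.0 default, unknown to 5.7
```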

You need to restart MySQL 8 to apply the changes. After the restart, create a replication user that uses mysql_native_password. This is necessary because MySQL 8 changed the default authentication plugin to caching_sha2_password, which is not supported by MySQL 5.7. If you execute the CHANGE MASTER command with a user that uses the caching_sha2_password plugin, you will receive the error message below:

Last_IO_Errno: 2059
Last_IO_Error: error connecting to master 'root@127.0.0.1:19025' - retry-time: 60 retries: 1

To create a user using mysql_native_password:

master> CREATE USER 'replica_user'@'%' IDENTIFIED WITH mysql_native_password BY 'repli$cat';
master> GRANT REPLICATION SLAVE ON *.* TO 'replica_user'@'%';

Finally, we can proceed as usual to build the replication:

master > show master status\G
*************************** 1. row ***************************
File: mysql-bin.000007
Position: 155
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
slave > CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_USER='replica_user', MASTER_PASSWORD='repli$cat',MASTER_PORT=19025, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=155; start slave;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
# This procedure works with GTIDs too
slave > CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_USER='replica_user', MASTER_PASSWORD='repli$cat',MASTER_PORT=19025,MASTER_AUTO_POSITION = 1 ; start slave;

Checking the replication status:

slave > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: replica_user
Master_Port: 19025
Connect_Retry: 60
Master_Log_File: mysql-bin.000007
Read_Master_Log_Pos: 155
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 321
Relay_Master_Log_File: mysql-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 155
Relay_Log_Space: 524
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 100
Master_UUID: 00019025-1111-1111-1111-111111111111
Master_Info_File: /home/vinicius.grippa/sandboxes/rsandbox_5_7_17/master/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.01 sec)

Executing a quick test to check if the replication is working:

master > create database vinnie;
Query OK, 1 row affected (0.06 sec)

slave > show databases like 'vinnie';
+-------------------+
| Database (vinnie) |
+-------------------+
| vinnie            |
+-------------------+
1 row in set (0.00 sec)

Caveats

Any attempt to use a new feature from MySQL 8, such as roles, invisible indexes, or caching_sha2_password, will make the replication stop with an error:

master > alter user replica_user identified with caching_sha2_password by 'sekret';
Query OK, 0 rows affected (0.01 sec)

slave > show slave status\G
               Last_SQL_Errno: 1396
               Last_SQL_Error: Error 'Operation ALTER USER failed for 'replica_user'@'%'' on query. Default database: ''. Query: 'ALTER USER 'replica_user'@'%' IDENTIFIED WITH 'caching_sha2_password' AS '$A$005$H	MEDi\"gQ
                        wR{/I/VjlgBIUB08h1jIk4fBzV8kU1J2RTqeqMq8Q2aox0''

Summary

Replicating from MySQL 8 to MySQL 5.7 is possible. In some scenarios (especially upgrades), this might be helpful, but it is not advisable to keep a heterogeneous topology because it is prone to errors and incompatibilities in some cases.

The post Replicating from MySQL 8.0 to MySQL 5.7 appeared first on Percona Database Performance Blog.

Mar
28
2017
--

Troubleshooting Issues with MySQL Character Sets Q & A

In this blog, I will provide answers to the Q & A for the Troubleshooting Issues with MySQL Character Sets webinar.

First, I want to thank everybody for attending the March 9 MySQL character sets troubleshooting webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: We’ve had some issues converting tables from utf8 to utf8mb4. Our issue was that the collation we wanted to use, utf8mb4_unicode_520_ci, did not distinguish between spaces and ideographic (Japanese) spaces, so we were getting unique constraint violations for the varchar fields when two entries had the same text with different kinds of spaces. Have you seen this problem and is there a workaround? We were wondering if this was related to the mother-child character bug with this collation.

A: Unfortunately this issue exists for many languages. For example, in Russian you cannot distinguish “е” and “ё” if you use utf8 or utf8mb4. However, there is hope for Japanese: Oracle announced that they will implement new language-specific utf8mb4 collations in MySQL 8.0. I already see 21 new collations in my 8.0.0 installation.

mysql> show collation like '%0900%';
+----------------------------+---------+-----+---------+----------+---------+
| Collation                  | Charset | Id  | Default | Compiled | Sortlen |
+----------------------------+---------+-----+---------+----------+---------+
| utf8mb4_0900_ai_ci         | utf8mb4 | 255 |         | Yes      |       8 |
| utf8mb4_cs_0900_ai_ci      | utf8mb4 | 266 |         | Yes      |       8 |
| utf8mb4_da_0900_ai_ci      | utf8mb4 | 267 |         | Yes      |       8 |
| utf8mb4_de_pb_0900_ai_ci   | utf8mb4 | 256 |         | Yes      |       8 |
| utf8mb4_eo_0900_ai_ci      | utf8mb4 | 273 |         | Yes      |       8 |
| utf8mb4_es_0900_ai_ci      | utf8mb4 | 263 |         | Yes      |       8 |
| utf8mb4_es_trad_0900_ai_ci | utf8mb4 | 270 |         | Yes      |       8 |
| utf8mb4_et_0900_ai_ci      | utf8mb4 | 262 |         | Yes      |       8 |
| utf8mb4_hr_0900_ai_ci      | utf8mb4 | 275 |         | Yes      |       8 |
| utf8mb4_hu_0900_ai_ci      | utf8mb4 | 274 |         | Yes      |       8 |
| utf8mb4_is_0900_ai_ci      | utf8mb4 | 257 |         | Yes      |       8 |
| utf8mb4_la_0900_ai_ci      | utf8mb4 | 271 |         | Yes      |       8 |
| utf8mb4_lt_0900_ai_ci      | utf8mb4 | 268 |         | Yes      |       8 |
| utf8mb4_lv_0900_ai_ci      | utf8mb4 | 258 |         | Yes      |       8 |
| utf8mb4_pl_0900_ai_ci      | utf8mb4 | 261 |         | Yes      |       8 |
| utf8mb4_ro_0900_ai_ci      | utf8mb4 | 259 |         | Yes      |       8 |
| utf8mb4_sk_0900_ai_ci      | utf8mb4 | 269 |         | Yes      |       8 |
| utf8mb4_sl_0900_ai_ci      | utf8mb4 | 260 |         | Yes      |       8 |
| utf8mb4_sv_0900_ai_ci      | utf8mb4 | 264 |         | Yes      |       8 |
| utf8mb4_tr_0900_ai_ci      | utf8mb4 | 265 |         | Yes      |       8 |
| utf8mb4_vi_0900_ai_ci      | utf8mb4 | 277 |         | Yes      |       8 |
+----------------------------+---------+-----+---------+----------+---------+
21 rows in set (0,03 sec)

In 8.0.1 they promised new case-sensitive and Japanese collations. Please see this blog post for details. The note about the planned Japanese support is at the end.

Meanwhile, I can only suggest that you implement your own collation as described here. You may use the utf8_russian_ci collation from Bug #51976 as an example.

Although the user manual does not list utf8mb4 as a character set for which it’s possible to create new collations, you can actually do it. What you need to do is add a record about the character set utf8mb4 and the new collation into Index.xml, then restart the server.

<charset name="utf8mb4">
  <collation name="utf8mb4_russian_ci" id="1033">
    <rules>
      <reset>\u0415</reset><p>\u0451</p><t>\u0401</t>
    </rules>
  </collation>
</charset>

mysql> show collation like 'utf8mb4_russian_ci';
+--------------------+---------+------+---------+----------+---------+
| Collation          | Charset | Id   | Default | Compiled | Sortlen |
+--------------------+---------+------+---------+----------+---------+
| utf8mb4_russian_ci | utf8mb4 | 1033 |         |          |       8 |
+--------------------+---------+------+---------+----------+---------+
1 row in set (0,03 sec)
mysql> create table test_yo(gen varchar(100) CHARACTER SET utf8mb4, yo varchar(100) CHARACTER SET utf8mb4 collate utf8mb4_russian_ci) engine=innodb default character set=utf8mb4;
Query OK, 0 rows affected (0,25 sec)
mysql> set names utf8mb4;
Query OK, 0 rows affected (0,02 sec)
mysql> insert into test_yo values('??', '??'), ('???', '???'), ('????', '????');
Query OK, 3 rows affected (0,05 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> insert into test_yo values('??', '??'), ('???', '???'), ('????', '????');
Query OK, 3 rows affected (0,06 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> select * from test_yo order by gen;
+----------+----------+
| gen      | yo       |
+----------+----------+
| ??       | ??       |
| ??       | ??       |
| ????     | ????     |
| ????     | ????     |
| ???      | ???      |
| ???      | ???      |
+----------+----------+
6 rows in set (0,00 sec)
mysql> select * from test_yo order by yo;
+----------+----------+
| gen      | yo       |
+----------+----------+
| ??       | ??       |
| ??       | ??       |
| ???      | ???      |
| ???      | ???      |
| ????     | ????     |
| ????     | ????     |
+----------+----------+
6 rows in set (0,00 sec)
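
For the spaces from the question, a quick way to see how a particular collation treats the two characters is to compare them directly. The statement below is only a sketch: the strings are written as hex literals (20 is the ASCII space, E38080 is the UTF-8 encoding of the ideographic space U+3000), and utf8mb4_bin is included purely as a contrast, because a binary collation compares code points. I do not show the result for utf8mb4_unicode_520_ci here; it should be checked on your own server version.

```sql
-- Compare 'a b' (ASCII space) with 'a<U+3000>b' (ideographic space)
-- under two collations. 1 means "equal", 0 means "different".
SELECT
  _utf8mb4 X'612062' = _utf8mb4 X'61E3808062' COLLATE utf8mb4_unicode_520_ci AS equal_unicode_520,
  _utf8mb4 X'612062' = _utf8mb4 X'61E3808062' COLLATE utf8mb4_bin            AS equal_bin;
```

If a unique index must tell the two apart before a suitable collation exists, declaring that column with utf8mb4_bin is one possible option, at the cost of case-sensitive and accent-sensitive comparisons.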

Q: If utf8 data is received on a latin1 charset, it will be corrupted. Just want to confirm that you can reformat as utf8 and un-corrupt the data? Also, is there a time limit on how quickly this needs to be done?

A: It will be corrupted only if you store utf8 data in a latin1 column. For example, if you have a table defined as:

create table latin1(
  f1 varchar(100)
) engine=innodb default charset=latin1;

and then insert into it a utf8 word containing characters that do not exist in the latin1 character set:

mysql> set names utf8;
Query OK, 0 rows affected (0,00 sec)
mysql> set sql_mode='';
Query OK, 0 rows affected, 1 warning (0,00 sec)
mysql> insert into latin1 values('Sveta'), ('Света');
Query OK, 2 rows affected, 1 warning (0,04 sec)
Records: 2  Duplicates: 0  Warnings: 1

The utf8 data will be corrupted and can never be recovered, because every character that is missing from latin1 is stored as a question mark (0x3F):

mysql> select * from latin1;
+-------+
| f1    |
+-------+
| Sveta |
| ????? |
+-------+
2 rows in set (0,00 sec)
mysql> select f1, hex(f1) from latin1;
+-------+------------+
| f1    | hex(f1)    |
+-------+------------+
| Sveta | 5376657461 |
| ????? | 3F3F3F3F3F |
+-------+------------+
2 rows in set (0,01 sec)
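
Note that the example above switches off strict SQL mode before the insert. As a sketch of the opposite setup, assuming the same latin1 table and a session where STRICT_TRANS_TABLES is enabled (it is part of the default sql_mode in MySQL 5.7 and later), the same insert should be rejected with an "Incorrect string value" error instead of silently storing question marks:

```sql
SET NAMES utf8;
-- Enable strict mode for this session only.
SET SESSION sql_mode = 'STRICT_TRANS_TABLES';
-- Cyrillic letters cannot be represented in latin1, so in strict mode
-- this statement is expected to fail rather than store '?????'.
INSERT INTO latin1 VALUES ('Света');
-- Shows the error (or, in non-strict mode, the warning) raised by the conversion.
SHOW WARNINGS;
```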

However, if your data is stored in a utf8 column and you use latin1 for the connection, you will only get a corrupted result set. The data itself will be left untouched:

mysql> create table utf8(f1 varchar(100)) engine=innodb character set utf8;
Query OK, 0 rows affected (0,18 sec)
mysql> insert into utf8 values('Sveta'), ('Света');
Query OK, 2 rows affected (0,15 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql> set names latin1;
Query OK, 0 rows affected (0,00 sec)
mysql> select f1, hex(f1) from utf8;
+-------+----------------------+
| f1    | hex(f1)              |
+-------+----------------------+
| Sveta | 5376657461           |
| ????? | D0A1D0B2D0B5D182D0B0 |
+-------+----------------------+
2 rows in set (0,00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0,00 sec)
mysql> select f1, hex(f1) from utf8;
+------------+----------------------+
| f1         | hex(f1)              |
+------------+----------------------+
| Sveta      | 5376657461           |
| Света      | D0A1D0B2D0B5D182D0B0 |
+------------+----------------------+
2 rows in set (0,00 sec)
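
To tell which of the two situations you are in, it helps to look at both the connection settings and the column definitions. A minimal sketch, assuming the table names from the examples above and that they live in the current database:

```sql
-- Character sets used by the current connection and the server.
SHOW VARIABLES LIKE 'character_set_%';

-- Character set and collation declared for each string column.
SELECT table_name, column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name IN ('latin1', 'utf8');

-- Stored bytes: 3F marks a character already replaced by '?',
-- while multi-byte sequences such as D0A1 mean the data is intact.
SELECT f1, HEX(f1) FROM utf8;
```

If HEX() shows the expected multi-byte sequences, only the connection character set needs to change, and there is no time limit for doing so.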

Q: Can you discuss how charsets affect mysqldump? Specifically, how do we dump a database containing tables with different default charsets?

A: Yes. MySQL can successfully convert data that uses different character sets, so your only job is to specify the option --default-character-set for mysqldump. In this case, strings in any character set you use will be converted to the character set specified. For example, if you use cp1251 and latin1, you may set the option --default-character-set to cp1251, utf8, or utf8mb4. However, you cannot set it to latin1 because Cyrillic characters exist in the cp1251 character set, but do not exist in latin1.

The default value for mysqldump is utf8. You only need to change this default if you use values that are outside of the range supported by utf8 (for example, the smileys in utf8mb4).

Q: But if you use the --single-transaction option for mysqldump, you can only specify one character set in the default?

A: Yes, and this is OK: all data will be converted into this character set. Then, when you restore the dump, it will be converted back to the character set specified in the column definitions.

Q: I noticed that MySQL doesn’t support case-sensitive UTF-8 character sets. What do you recommend for implementing case-sensitive UTF-8, if it’s at all possible?

A: In the link I provided earlier, Oracle promises to implement case-sensitive collations for utf8mb4 in version 8.0.1. Before that happens, I recommend that you implement your own case-sensitive collation.
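
Until such collations are available, one commonly used stopgap is the binary collation utf8mb4_bin, which compares strings by code point and therefore distinguishes case. This is not the custom collation recommended above, only a sketch of an alternative; it is also accent-sensitive and does not sort according to language rules.

```sql
-- 'a' and 'A' are different code points, so the binary collation
-- treats them as different, while a _ci collation treats them as equal.
SELECT
  _utf8mb4'a' = _utf8mb4'A' COLLATE utf8mb4_bin        AS equal_bin,        -- 0
  _utf8mb4'a' = _utf8mb4'A' COLLATE utf8mb4_general_ci AS equal_general_ci; -- 1
```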

Q: How are tools like pt-table-checksum affected by charsets? Is it safe to use a 4-byte charset (like utf8mb4) as the default charset for all comparisons? Assuming our tables are a mix of latin1, utf8 and utf8mb4.

A: With this combination, you won’t have any issues: pt-table-checksum uses a complicated set of functions that joins columns and calculates a crc32 checksum on them. In your case, all data will be converted to utf8mb4 and no conflicts will happen.

However, if you use incompatible character sets in a single table, you may get the error "Illegal mix of collations for operation 'concat_ws'":

mysql> create table cp1251(f1 varchar(100) character set latin1, f2 varchar(100) character set cp1251) engine=innodb;
Query OK, 0 rows affected (0,32 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0,00 sec)
mysql> insert into cp1251 values('Sveta', 'Света');
Query OK, 1 row affected (0,07 sec)
sveta@Thinkie:~/build/mysql-8.0/mysql-test$ ~/build/percona-toolkit/bin/pt-table-checksum h=127.0.0.1,P=13000,u=root,D=test
Diffs cannot be detected because no slaves were found.  Please read the --recursion-method documentation for information.
03-18T03:51:58 Error executing EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*explain checksum table*/: DBD::mysql::st execute failed: Illegal mix of collations for operation 'concat_ws' [for Statement "EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*explain checksum table*/"] at /home/sveta/build/percona-toolkit/bin/pt-table-checksum line 11351.
03-18T03:51:58 Error checksumming table db1.cp1251: Error executing checksum query: DBD::mysql::st execute failed: Illegal mix of collations for operation 'concat_ws' [for Statement "REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*checksum table*/" with ParamValues: 0='db1', 1='cp1251', 2=1, 3=undef, 4=undef, 5=undef] at /home/sveta/build/percona-toolkit/bin/pt-table-checksum line 10741.
TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-18T03:51:58      2      0        0       1       0   0.003 db1.cp1251
03-18T03:51:58      0      0        2       1       0   0.167 db1.latin1
03-18T03:51:58      0      0        6       1       0   0.198 db1.test_yo
...

The tool continues working, and will process the rest of your tables. I reported this behavior as Bug #1674266.
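
Before running the tool, tables that mix character sets can be located through information_schema. The first query below is a sketch that simply lists every table whose string columns use more than one character set. Not every such mix causes the error (latin1 together with utf8mb4 does not, as noted above), so treat the result as a list of candidates. The second statement illustrates, for the example table only, how converting both columns to a common superset avoids the illegal mix in a hand-written query.

```sql
-- Tables whose string columns use more than one character set.
SELECT table_schema, table_name,
       GROUP_CONCAT(DISTINCT character_set_name
                    ORDER BY character_set_name) AS charsets
FROM information_schema.columns
WHERE character_set_name IS NOT NULL
  AND table_schema NOT IN ('mysql', 'information_schema',
                           'performance_schema', 'sys')
GROUP BY table_schema, table_name
HAVING COUNT(DISTINCT character_set_name) > 1;

-- Converting both columns to utf8mb4 before joining them
-- (table and column names are taken from the example above).
SELECT CONCAT_WS('#', CONVERT(f1 USING utf8mb4),
                      CONVERT(f2 USING utf8mb4)) AS joined
FROM cp1251;
```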

Thanks for attending the Troubleshooting Issues with MySQL Character Sets webinar.
