Jun
24
2022
--

Debug Symbols for Percona Server for MongoDB and MongoDB

Debug Symbols MongoDB

Debug Symbols MongoDBBoth Percona Server for MongoDB and vanilla MongoDB packages do not contain debug symbols by default. This is because the debug symbols package can be up to a 3GB download depending on the version and target platform. Fortunately, you only need debug symbols in those rare cases when you have got a serious enough issue, and you really want to debug it. So for most users, it is an absolutely reasonable decision to not download gigabytes of debug symbols by default.

This blog post provides pointers to where to get debug symbols for Percona Server for MongoDB or vanilla MongoDB.

Percona Server for MongoDB

Using the corresponding package manager

Percona provides debug symbols packages for Percona Server for MongoDB. It is recommended to install Percona packages from official Percona repositories using the corresponding tool for your system:

Installing debug symbols manually

You can also download packages from the Percona website and install them manually using dpkg or rpm.

To get debug symbols for Percona Server for MongoDB, go to the downloads page. Then look for the “Percona Server for MongoDB” section and click the “Download X.X Now” button corresponding to the version you are interested in.

On the new page, select the minor release version from the Version: dropdown list and the target platform from the Software: dropdown list. This will reveal a list of available packages for the selected platform. You can search for a dbg or debuginfo package (depending on the target platform) and download it.

In most cases, it is possible to download debug symbols as a special package or as part of the “All Packages” bundle. For example, on the Percona Server for MongoDB 5.0.98 page for Ubuntu 20.04 you can download either a separate percona-server-mongodb-dbg_5.0.98.focal_amd64.deb package or all packages bundle which contains debug symbols package: percona-server-mongodb-5.0.9 8-r15a95b4-focal-x86_64-bundle.tar

MongoDB Debug Symbols

There are no debug symbols packages provided by MongoDB Inc., but fortunately, it is possible to download binary tarballs containing debug symbols files from the non-advertised location: https://www.mongodb.org/dl/linux/x86_64 

Be careful – the above link opens a huge list containing thousands of tgz archives created since 2009. This is virtually the full MongoDB history, in a single directory.

The names of those files are talking for themselves: it’s a combination of architecture, platform, and MongoDB version.

For example, there are two files for MongoDB 5.0.9 on Ubuntu 20.04:

The first of those files are just server core binaries:

$ tar -tf mongodb-linux-x86_64-ubuntu2004-5.0.9.tgz
mongodb-linux-x86_64-ubuntu2004-5.0.9/LICENSE-Community.txt
mongodb-linux-x86_64-ubuntu2004-5.0.9/MPL-2
mongodb-linux-x86_64-ubuntu2004-5.0.9/README
mongodb-linux-x86_64-ubuntu2004-5.0.9/THIRD-PARTY-NOTICES
mongodb-linux-x86_64-ubuntu2004-5.0.9/bin/install_compass
mongodb-linux-x86_64-ubuntu2004-5.0.9/bin/mongo
mongodb-linux-x86_64-ubuntu2004-5.0.9/bin/mongod
mongodb-linux-x86_64-ubuntu2004-5.0.9/bin/mongos

The second file contains corresponding debug symbols files:

$ tar -tf mongodb-linux-x86_64-ubuntu2004-debugsymbols-5.0.9.tgz
mongodb-linux-x86_64-ubuntu2004-5.0.9/LICENSE-Community.txt
mongodb-linux-x86_64-ubuntu2004-5.0.9/MPL-2 
mongodb-linux-x86_64-ubuntu2004-5.0.9/README 
mongodb-linux-x86_64-ubuntu2004-5.0.9/THIRD-PARTY-NOTICES 
mongodb-linux-x86_64-ubuntu2004-5.0.9/bin/mongo.debug 
mongodb-linux-x86_64-ubuntu2004-5.0.9/bin/mongod.debug 
mongodb-linux-x86_64-ubuntu2004-5.0.9/bin/mongos.debug

Thus, you can use those symbol files with gdb to analyze the core dump file if you have one or to debug a running instance of MongoDB Community Edition.

Each xxxx.debug file is a debug symbols file for the corresponding xxxx binary. If you accidentally try to debug with a mismatched symbols file, gdb will politely inform you about that:

Reading symbols from ./mongod...
warning: the debug information found in "/home/igor/5.0.9/bin/mongod.debug" does not match "/home/igor/5.0.9/bin/mongod" (CRC mismatch).

(no debugging symbols found)...done.

This especially can happen if you upgrade the binaries package but not debug symbols.

Conclusion

It is a really rare case when you will need to debug Percona Server for MongoDB or MongoDB, but if you really need to I hope this debug symbols information will save a few minutes for you.

Happy debugging!

Jun
24
2020
--

Lightrun raises $4M for its continuous debugging and observability platform

Lightrun, a Tel Aviv-based startup that makes it easier for developers to debug their production code, today announced that it has raised a $4 million seed round led by Glilot Capital Partners, with participation from a number of engineering executives from several Fortune 500 firms.

The company was co-founded by Ilan Peleg (who, in a previous life, was a competitive 800m runner) and Leonid Blouvshtein, with Peleg taking the CEO role and Blouvshtein the CTO position.

The overall idea behind Lightrun is that it’s too hard for developers to debug their production code. “In today’s world, whenever a developer issues a new software version and deploys it into production, the only way to understand the application’s behavior is based on log lines or metrics which were defined during the development stage,” Peleg explained. “The thing is, that is simply not enough. We’ve all encountered cases of missing a very specific log line when trying to troubleshoot production issues, then having to release a new hotfix version in order to add this specific logline, or — alternatively — reproduce the bug locally to better understand the application’s behavior.”

Image Credits: Lightrun

With Lightrun, as the co-founders showed me in a demo, developers can easily add new logs and metrics to their code from their IDE and then receive real-time data from their real production or development environments. For that to work, they need to have the Lightrun agent installed, but the overhead here is generally low because the agent sits idle until it is needed. In the IDE, the experience isn’t all that different from setting a traditional breakpoint in a debugger — only that there is no break. Lightrun can also use existing logging tools like Datadog to pipe its logging data to them.

While the service’s agent is agnostic about the environment it runs in, the company currently only supports JVM languages. Blouvshtein noted that building JVM language support was likely harder than building support for other languages and the company plans to launch support for more languages in the future.

“We make a point of investing in technologies that transform big industries,” said Kobi Samboursky, founder and managing partner at Glilot Capital Partners . “Lightrun is spearheading Continuous Debugging and Continuous Observability, picking up where CI/CD ends, turning observability into a real-time process instead of the iterative process it is today. We’re confident that this will become DevOps and development best practices, enabling I&O leaders to react faster to production issues.”

For now, there is still a bit of an onboarding process to get started with Lightrun, though that’s generally a very short process, the team tells me. Over time, the company plans to make this a self-service process. At that point, Lightrun will likely also become more interesting to smaller teams and individual developers, though the company is mostly focused on enterprise users and, despite only really launching out of stealth today and offering limited language support, the company already has a number of paying customers, including major enterprises.

“Our strategy is based on two approaches: bottom-up and top-down. Bottom-up, we’re targeting developers, they are the end-users and we want to ensure they get a quality product they can trust to help them. We put a lot of effort into reaching out through the developer channels and communities, as well as enabling usage and getting feedback. […] Top-down approach, we are approaching R&D management like VP of R&D, R&D directors in bigger companies and then we show them how Lightrun saves company development resources and improves customer satisfaction.”

Unsurprisingly, the company, which currently has about a dozen employees, plans to use the new funding to add support for more languages and improve its service with new features, including support for tracing.

Sep
25
2018
--

Why Optimization derived_merge can Break Your Queries

MySQL optimizer bugs

MySQL optimizer bugsLately, I worked on several queries which started returning wrong results after upgrading MySQL Server to version 5.7 The reason for the failure was derived merge optimization which is one of the default

optimizer_switch

  options. Issues were solved, though at the price of performance, when we turned it

OFF

 . But, more importantly, we could not predict if any other query would start returning incorrect data, to allow us to fix the application before it was too late. Therefore I tried to find reasons why

derived_merge

  can fail.

Analyzing the problem

In the first run, we turned SQL Mode

ONLY_FULL_GROUP_BY

on, and this removed most of the problematic queries. That said, few of the queries that were successfully working with

ONLY_FULL_GROUP_BY

  were affected.

A quick search in the MySQL bugs database gave me a not-so-short list of open bugs:

At first glance, the reported queries do not follow any pattern, and we still cannot quickly identify which would break and which would not.

Then I took a second look by running all of the provided test cases in my environment and found that for four bugs, the optimizer rewrote the query. For three of the bugs, it rewrote in both 5.7 and 8.0, and one case it rewrote in 8.0 only.

The remaining three buggy queries (Bug #85117, Bug #91418, Bug #91878) have things in common. Let’s first look at them:

  1. Bug #85117
    select
        temp.sel
    from
        table1 t1
        left join (
            select *,1 as sel from table1_use t1u where t1u.`table1id`=1
        ) temp on temp.table1id = t1.id
    order by t1.value
  2. Bug #91418
    select
        TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID
    from
        TEST_SUB_PROBLEM TST
        join (
            select uuid() as DIST_UID, vw.*
            from (
                select DISTINCT BID, THING_NAME
                from TEST_SUB_PROBLEM
            ) vw
        ) vw2
    on vw2.BID = TST.BID;
  3. Bug #91878
    SELECT
        Virtual_Table.T_FP AS T_FP,
        (
            SELECT COUNT(Virtual_Table.T_FP)
            FROM t1 t
            WHERE t.f1 = Virtual_Table.T_FP
            AND Virtual_Table.T_FP = 731834939448428685
        ) AS Test_Value
    FROM (
        SELECT t.f1 AS T_FP, tv.f1 AS TV_FP
        FROM t1 AS t
        JOIN t2 AS tv
        ON t.f1 = tv.t1_f1
    ) AS Virtual_Table
    GROUP BY
        Virtual_Table.TV_FP
    HAVING
        Test_Value > 0;

Two of the queries use

DISTINCT

  or

GROUP BY

 , one uses

ORDER BY

  clause. The cases do not have not the same clause in common—which is what I’d expect to see—and so, surprisingly, these are not the cause of the failure. However, all three queries use generated values: a constant in the first one;

UUID()

  and

COUNT()

  functions in the second and third respectively. This similarity is something we need to investigate.

To find out why

derived_merge

  might work incorrectly for these queries we need to understand how this optimization works and why it was introduced.

The intent behind derived_merge

First I recommend checking the official MySQL User Reference Manual and MariaDB knowledge base. It is correct to use both manuals: even if low-level implementations vary, the high-level architecture and the purpose of this optimization are the same.

In short:

derived_merge

  is used for queries that have subqueries in the 

FROM

  clause,  also called “derived tables” and practically converts them into

JOIN

 queries. This optimization allows avoiding unnecessary materialization (creating internal temporary tables to hold results). Virtually this is the same thing as a manual rewrite of a query with a subquery into a query that has

JOIN

 clause(s) only. The only difference is that when we rewrite queries manually, we can compare the expected and actual result, then adjust the resulting query if needed. The MySQL optimizer has to do a correct rewrite at the first attempt. And sometimes this effort fails.

Let’s check why this happens for these particular queries, reported in the MySQL Bugs Database.

Case Study 1: a Query from Bug #85117

Original query

select
      temp.sel
from
    table1 t1
    left join (
         select *,1 as sel from table1_use t1u where t1u.`table1id`=1
    ) temp on temp.table1id = t1.id
order by t1.value

was rewritten to:

Note (Code 1003):
/* select#1 */
select 1 AS `sel`
    from
        `test`.`table1` `t1`
    left join
        (`test`.`table1_use` `t1u`)
    on(((`test`.`t1`.`id` = 1) and (`test`.`t1u`.`table1id` = 1)))
    where 1
    order by `test`.`t1`.`value`;

You can always find a query that the optimizer converts the original one to in the

SHOW WARNINGS

 output following

EXPLAIN [EXTENDED]

 for the query.

In this case, the original query asks to return all rows from the table

table1

 , but selects only the generated field from the subquery. The subquery selects the only row with

table1id=1

 .

Avoiding derived merge optimization is practically the same as joining table

table1

 with a table with one row. You can see how it works in this code snippet:

mysql> create temporary table temp as select *,1 as sel from table1_use t1u where t1u.`table1id`=1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0
mysql> select * from temp;
+----+----------+------+-----+
| id | table1id | uid  | sel |
+----+----------+------+-----+
|  1 |        1 |   99 |   1 |
+----+----------+------+-----+
1 row in set (0.00 sec)
mysql> select temp.sel from table1 t1 left join temp on temp.table1id = t1.id order by t1.value;
+------+
| sel  |
+------+
|    1 |
| NULL |
| NULL |
+------+
3 rows in set (0.00 sec)

However, when the optimizer uses derived-merge optimization, it completely ignores the fact that the resulting table has one row, and that the calculated value would be either

NULL

  or 1 depending if a row corresponding to

table1

  exists in the table. That it prints

select 1 AS `sel`

  in the

EXPLAIN

  output while uses

select NULL AS `sel`

  does not change anything: both are wrong. The correct query without a subquery should look like:

mysql> select if(`test`.`t1u`.`table1id`, 1, NULL) AS `sel`
    -> from `test`.`table1` `t1`
    -> left join (`test`.`table1_use` `t1u`)
    -> on(((`test`.`t1`.`id` = 1) and (`test`.`t1u`.`table1id` = 1)))
    -> where 1
    -> order by `test`.`t1`.`value`;
+------+
| sel  |
+------+
|    1 |
| NULL |
| NULL |
+------+
3 rows in set (0.00 sec)

This report is the easiest of the bugs we will discuss in this post, and is also fixed in MariaDB.

Case Study 2: a Query from Bug #91418

mysql> select * from TEST_SUB_PROBLEM;
+-----+--------+------------+---------------------+
| UID | BID    | THING_NAME | OTHER_IFO           |
+-----+--------+------------+---------------------+
|   1 | thing1 | name1      | look a chicken      |
|   2 | thing1 | name1      | look an airplane    |
|   3 | thing2 | name2      | look a mouse        |
|   4 | thing3 | name3      | look a taperecorder |
|   5 | thing3 | name3      | look an explosion   |
|   6 | thing4 | name4      | look at the stars   |
+-----+--------+------------+---------------------+
6 rows in set (0.00 sec)
mysql> select
    ->     TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID
    -> from
    ->     TEST_SUB_PROBLEM TST
    -> join (
    ->     select uuid() as DIST_UID, vw.*
    ->     from (
    ->         select DISTINCT BID, THING_NAME
    ->         from TEST_SUB_PROBLEM
    ->     ) vw
    -> ) vw2
    -> on vw2.BID = TST.BID;
+-----+--------+------------+---------------------+--------------------------------------+
| UID | BID    | THING_NAME | OTHER_IFO           | DIST_UID                             |
+-----+--------+------------+---------------------+--------------------------------------+
|   1 | thing1 | name1      | look a chicken      | e4c288fd-b29c-11e8-b0d7-0242673a86b2 |
|   2 | thing1 | name1      | look an airplane    | e4c28aef-b29c-11e8-b0d7-0242673a86b2 |
|   3 | thing2 | name2      | look a mouse        | e4c28c47-b29c-11e8-b0d7-0242673a86b2 |
|   4 | thing3 | name3      | look a taperecorder | e4c28d92-b29c-11e8-b0d7-0242673a86b2 |
|   5 | thing3 | name3      | look an explosion   | e4c28ed9-b29c-11e8-b0d7-0242673a86b2 |
|   6 | thing4 | name4      | look at the stars   | e4c29031-b29c-11e8-b0d7-0242673a86b2 |
+-----+--------+------------+---------------------+--------------------------------------+
6 rows in set (0.00 sec)

This query should create a unique

DIST_UID

  for each unique

BID

 name. But, instead, it generates a unique

ID

  for each row.

First, let’s split the query into a couple of queries using temporary tables, to confirm our assumption that it was written correctly in the first place:

mysql> create temporary table vw as select DISTINCT BID, THING_NAME from TEST_SUB_PROBLEM;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql> select * from vw;
+--------+------------+
| BID    | THING_NAME |
+--------+------------+
| thing1 | name1      |
| thing2 | name2      |
| thing3 | name3      |
| thing4 | name4      |
+--------+------------+
4 rows in set (0.00 sec)
mysql> create temporary table vw2 as select uuid() as DIST_UID, vw.* from vw;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0
mysql> select * from vw2;
+--------------------------------------+--------+------------+
| DIST_UID                             | BID    | THING_NAME |
+--------------------------------------+--------+------------+
| eb155f0e-b29d-11e8-b0d7-0242673a86b2 | thing1 | name1      |
| eb158c05-b29d-11e8-b0d7-0242673a86b2 | thing2 | name2      |
| eb159b28-b29d-11e8-b0d7-0242673a86b2 | thing3 | name3      |
| eb15a916-b29d-11e8-b0d7-0242673a86b2 | thing4 | name4      |
+--------------------------------------+--------+------------+
4 rows in set (0.00 sec)
mysql> select
    -> TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID
    -> from TEST_SUB_PROBLEM TST
    -> join vw2
    -> on vw2.BID = TST.BID;
+-----+--------+------------+---------------------+--------------------------------------+
| UID | BID    | THING_NAME | OTHER_IFO           | DIST_UID                             |
+-----+--------+------------+---------------------+--------------------------------------+
|   1 | thing1 | name1      | look a chicken      | eb155f0e-b29d-11e8-b0d7-0242673a86b2 |
|   2 | thing1 | name1      | look an airplane    | eb155f0e-b29d-11e8-b0d7-0242673a86b2 |
|   3 | thing2 | name2      | look a mouse        | eb158c05-b29d-11e8-b0d7-0242673a86b2 |
|   4 | thing3 | name3      | look a taperecorder | eb159b28-b29d-11e8-b0d7-0242673a86b2 |
|   5 | thing3 | name3      | look an explosion   | eb159b28-b29d-11e8-b0d7-0242673a86b2 |
|   6 | thing4 | name4      | look at the stars   | eb15a916-b29d-11e8-b0d7-0242673a86b2 |
+-----+--------+------------+---------------------+--------------------------------------+
6 rows in set (0.01 sec)
mysql> select distinct DIST_UID
    -> from (
    ->     select
    ->         TST.UID ,TST.BID ,TST.THING_NAME ,TST.OTHER_IFO ,vw2.DIST_UID
    ->     from TEST_SUB_PROBLEM TST
    ->     join vw2
    ->     on vw2.BID = TST.BID
    -> ) t;
+--------------------------------------+
| DIST_UID                             |
+--------------------------------------+
| eb155f0e-b29d-11e8-b0d7-0242673a86b2 |
| eb158c05-b29d-11e8-b0d7-0242673a86b2 |
| eb159b28-b29d-11e8-b0d7-0242673a86b2 |
| eb15a916-b29d-11e8-b0d7-0242673a86b2 |
+--------------------------------------+
4 rows in set (0.00 sec)

With temporary tables, we have precisely four unique

DIST_UID

  values unlike the six values that our original query returned.

Let’s check how the original query was rewritten:

Note (Code 1003):
/* select#1 */
select
    `test`.`TST`.`UID` AS `UID`,
    `test`.`TST`.`BID` AS `BID`,
    `test`.`TST`.`THING_NAME` AS `THING_NAME`,
    `test`.`TST`.`OTHER_IFO` AS `OTHER_IFO`,
    uuid() AS `DIST_UID`
from `test`.`TEST_SUB_PROBLEM` `TST`
join
    (/* select#3 */
    select
        distinct `test`.`TEST_SUB_PROBLEM`.`BID` AS `BID`,
        `test`.`TEST_SUB_PROBLEM`.`THING_NAME` AS `THING_NAME`
    from `test`.`TEST_SUB_PROBLEM`) `vw`
where (`vw`.`BID` = `test`.`TST`.`BID`)

You can see that the optimizer did not wholly remove the subquery here. Let’s run this modified query, and run a test with a temporary table one more time:

mysql> select
    ->     `test`.`TST`.`UID` AS `UID`,
    ->     `test`.`TST`.`BID` AS `BID`,
    ->     `test`.`TST`.`THING_NAME` AS `THING_NAME`,
    ->     `test`.`TST`.`OTHER_IFO` AS `OTHER_IFO`,
    ->     uuid() AS `DIST_UID`
    -> from
    ->     `test`.`TEST_SUB_PROBLEM` `TST`
    -> join
    -> (/* select#3 */
    ->     select
    ->         distinct `test`.`TEST_SUB_PROBLEM`.`BID` AS `BID`,
    ->         `test`.`TEST_SUB_PROBLEM`.`THING_NAME` AS `THING_NAME`
    ->     from
    ->         `test`.`TEST_SUB_PROBLEM`
    -> ) `vw`
    -> where (`vw`.`BID` = `test`.`TST`.`BID`)
    -> ;
+-----+--------+------------+---------------------+--------------------------------------+
| UID | BID    | THING_NAME | OTHER_IFO           | DIST_UID                             |
+-----+--------+------------+---------------------+--------------------------------------+
|   1 | thing1 | name1      | look a chicken      | 12c5f554-b29f-11e8-b0d7-0242673a86b2 |
|   2 | thing1 | name1      | look an airplane    | 12c5f73a-b29f-11e8-b0d7-0242673a86b2 |
|   3 | thing2 | name2      | look a mouse        | 12c5f894-b29f-11e8-b0d7-0242673a86b2 |
|   4 | thing3 | name3      | look a taperecorder | 12c5f9de-b29f-11e8-b0d7-0242673a86b2 |
|   5 | thing3 | name3      | look an explosion   | 12c5fb20-b29f-11e8-b0d7-0242673a86b2 |
|   6 | thing4 | name4      | look at the stars   | 12c5fc7d-b29f-11e8-b0d7-0242673a86b2 |
+-----+--------+------------+---------------------+--------------------------------------+
6 rows in set (0.01 sec)

This time the changed query result is no different to the one we received from the original one. Let’s manually replace the subquery with temporary tables, and check if it affects the result again.

mysql> create temporary table vw
    -> select
    ->     distinct `test`.`TEST_SUB_PROBLEM`.`BID` AS `BID`,
    ->     `test`.`TEST_SUB_PROBLEM`.`THING_NAME` AS `THING_NAME`
    -> from `test`.`TEST_SUB_PROBLEM`;
Query OK, 4 rows affected (0.01 sec)<br>Records: 4  Duplicates: 0  Warnings: 0
mysql> select * from vw;
+--------+------------+
| BID    | THING_NAME |
+--------+------------+
| thing1 | name1      |
| thing2 | name2      |
| thing3 | name3      |
| thing4 | name4      |
+--------+------------+
4 rows in set (0.00 sec)
mysql> select
    ->     `test`.`TST`.`UID` AS `UID`,
    ->     `test`.`TST`.`BID` AS `BID`,
    ->     `test`.`TST`.`THING_NAME` AS `THING_NAME`,
    ->     `test`.`TST`.`OTHER_IFO` AS `OTHER_IFO`,
    ->      uuid() AS `DIST_UID`
    -> from `test`.`TEST_SUB_PROBLEM` `TST`
    -> join vw where (`vw`.`BID` = `test`.`TST`.`BID`) ;
+-----+--------+------------+---------------------+--------------------------------------+
| UID | BID    | THING_NAME | OTHER_IFO           | DIST_UID                             |
+-----+--------+------------+---------------------+--------------------------------------+
|   1 | thing1 | name1      | look a chicken      | e11dbe61-b2a0-11e8-b0d7-0242673a86b2 |
|   2 | thing1 | name1      | look an airplane    | e11dc050-b2a0-11e8-b0d7-0242673a86b2 |
|   3 | thing2 | name2      | look a mouse        | e11dc1af-b2a0-11e8-b0d7-0242673a86b2 |
|   4 | thing3 | name3      | look a taperecorder | e11dc2be-b2a0-11e8-b0d7-0242673a86b2 |
|   5 | thing3 | name3      | look an explosion   | e11dc3a8-b2a0-11e8-b0d7-0242673a86b2 |
|   6 | thing4 | name4      | look at the stars   | e11dc4e9-b2a0-11e8-b0d7-0242673a86b2 |
+-----+--------+------------+---------------------+--------------------------------------+
6 rows in set (0.00 sec)

In this case, the temporary table contains the correct number of rows: 4, but the outer query calculates a 

UUID

  value for all rows in the table

TEST_SUB_PROBLEM

 . It does not take into account that the user initially asks for a unique

UUID

  for each unique

BID

  and not each unique

UID

 . Instead, it just moves a call of

UUID()

  function into the outer query, which creates a unique value for each row in the table

TEST_SUB_PROBLEM

 . It does not take into account that the temporary table contains only four rows. In this case, it would not be easy to build an effective query that generates distinct

UUID

  values for rows with different

BID

 ‘s and the same

UUID

  values for rows with the same

BID

 .

Case Study 3: a Query from Bug #91878

This query is supposed to calculate a number of rows based on complex conditions:

SELECT
Virtual_Table.T_FP AS T_FP,
(SELECT COUNT(Virtual_Table.T_FP) FROM t1 t WHERE t.f1 = Virtual_Table.T_FP AND Virtual_Table.T_FP = 731834939448428685) AS Test_Value
FROM
(SELECT t.f1 AS T_FP, tv.f1 AS TV_FP FROM t1 AS t JOIN t2 AS tv ON t.f1 = tv.t1_f1) AS Virtual_Table
GROUP BY Virtual_Table.TV_FP
HAVING Test_Value > 0;

However, it returns no rows when it should return 22 (check the bug report for the full test case).

mysql> SELECT Virtual_Table.T_FP AS T_FP,
    -> (
    ->     SELECT
    ->         COUNT(Virtual_Table.T_FP)
    ->     FROM t1 t
    ->     WHERE
    ->         t.f1 = Virtual_Table.T_FP
    ->     AND
    ->         Virtual_Table.T_FP = 731834939448428685
    -> ) AS Test_Value
    -> FROM (
    ->     SELECT
    ->         t.f1 AS T_FP, tv.f1 AS TV_FP
    ->     FROM t1 AS t
    ->     JOIN t2 AS tv
    ->     ON t.f1 = tv.t1_f1
    -> ) AS Virtual_Table
    -> GROUP BY Virtual_Table.TV_FP
    -> HAVING Test_Value > 0;
Empty set (1.28 sec)

To find out why this happens let’s perform a temporary table check first.

mysql> create temporary table Virtual_Table SELECT t.f1 AS T_FP, tv.f1 AS TV_FP FROM t1 AS t JOIN t2 AS tv ON t.f1 = tv.t1_f1;
Query OK, 18722 rows affected (2.12 sec)
Records: 18722  Duplicates: 0  Warnings: 0
mysql> SELECT Virtual_Table.T_FP AS T_FP,
    -> (SELECT COUNT(Virtual_Table.T_FP) FROM t1 t
    -> WHERE t.f1 = Virtual_Table.T_FP AND Virtual_Table.T_FP = 731834939448428685) AS Test_Value
    -> FROM  Virtual_Table GROUP BY Virtual_Table.TV_FP HAVING Test_Value > 0;
+--------------------+------------+
| T_FP               | Test_Value |
+--------------------+------------+
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
+--------------------+------------+
22 rows in set (1.62 sec)

The rewritten query returned the correct result, as we expected.

To identify why the original query fails, let’s check how the optimizer rewrote it:

Note (Code 1003):
/* select#1 */
select
    `test`.`t`.`f1` AS `T_FP`,
    (/* select#2 */
        select
            count(`test`.`t`.`f1`)
        from
            `test`.`t1` `t`
        where
            (('731834939448428685' = 731834939448428685)
        and (`test`.`t`.`f1` = 731834939448428685))
    ) AS `Test_Value`
    from
        `test`.`t1` `t`
    join
        `test`.`t2` `tv`
    where
        (`test`.`tv`.`t1_f1` = `test`.`t`.`f1`)
    group by `test`.`tv`.`f1`
    having (`Test_Value` > 0)

Interestingly, when I run this query on the original tables it returned all 18722 rows that exist in table

t2

 .

This output means that we cannot entirely rely on the 

EXPLAIN

  output. But still we can see the same symptoms:

  • Subquery uses a function to generate a value
  • Subquery in the
    FROM

      clause is converted into a 

    JOIN

    , and its values are accessible by an outer subquery

We also see that the query has

GROUP BY

  and

HAVING

  clauses, thus adding a complication.

The query is almost correct, but in this case, the optimizer mixed aliases: it uses the same alias in the internal query as in the external one. If you change the alias from

t

  to

t2

  in the subquery, the rewritten query starts returning correct results:

mysql> select
    ->     `test`.`t`.`f1` AS `T_FP`,
    -> (/* select#2 */
    ->     select
    ->         count(`test`.`t`.`f1`)
    ->     from
    ->         `test`.`t1` `t`
    ->     where (
    ->         ('731834939448428685' = 731834939448428685)
    ->     and
    ->         (`test`.`t`.`f1` = 731834939448428685)
    ->     )
    -> ) AS `Test_Value`
    -> from
    ->     `test`.`t1` `t`
    -> join
    ->     `test`.`t2` `tv`
    -> where
    ->     (`test`.`tv`.`t1_f1` = `test`.`t`.`f1`)
    -> group by `test`.`tv`.`f1`
    -> having (`Test_Value` > 0);
...
| 731834939454553991 |          1 |
| 731834939453739998 |          1 |
+--------------------+------------+
18722 rows in set (0.49 sec)
mysql> select
    ->     `test`.`t`.`f1` AS `T_FP`,
    -> (/* select#2 */
    ->     select
    ->         count(`test`.`t`.`f1`)
    ->     from
    ->         `test`.`t1` `t2`
    ->     where (
    ->         (t2.f1=t.f1)
    ->     and
    ->         (`test`.`t`.`f1` = 731834939448428685)
    ->     )
    -> ) AS `Test_Value`
    -> from
    ->     `test`.`t1` `t`
    -> join
    ->     `test`.`t2` `tv`
    -> where
    ->     (`test`.`tv`.`t1_f1` = `test`.`t`.`f1`)
    -> group by `test`.`tv`.`f1`
    -> having (`Test_Value` > 0);
+--------------------+------------+
| T_FP               | Test_Value |
+--------------------+------------+
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
| 731834939448428685 |          1 |
+--------------------+------------+
22 rows in set (1.82 sec)

While the calculated value is not the reason why this query returns incorrect results, it is similar to the previous examples because the optimizer does not take in account that the value of

`test`.`t`.`f1`

  in the outer query is not necessarily equal to 731834939448428685.

Is also interesting that neither Oracle nor PostgreSQL accept such a query, and instead complain of improper use of the 

GROUP BY

 clause. Meanwhile, MySQL accepts this query even with SQL Mode set to

ONLY_FULL_GROUP_BY

 . Reported as bug #92020.

Conclusion and recommendations

While

derived_merge

  is a very effective optimization, it can rewrite queries destructively. Safety measures when using this optimization are:

  1. Make sure that you use the latest version of MySQL/Percona/MariaDB servers which include all of the new bug fixes.
  2. Generated values for the subquery results either constant or returned values of functions is the red flag.
  3. Relaxing SQL Mode
    ONLY_FULL_GROUP_BY

      is always dangerous and should not be used together with

    derived_merge

    .

As a last resort, you can consider rewriting queries to

JOIN

  manually or turning

derived_merge

  optimization

OFF

 .

 

The post Why Optimization derived_merge can Break Your Queries appeared first on Percona Database Performance Blog.

Sep
18
2017
--

Percona Live Europe Featured Talks: Debugging with Logs (and Other Events) Featuring Charity Majors

Percona Live Europe 2017

Percona Live EuropeWelcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post is with Charity Majors, CEO/Cofounder of Honeycomb. Her talk is Debugging with Logs (and Other Events). Her presentation covers some of the lessons every engineer should know (and often learns the hard way): why good logging solutions are so expensive, why treating your logs as strings can be costly and dangerous, how logs can impact code efficiency and add/fix/change race conditions in your code. In our conversation, we discussed debugging your database environment:

Percona: How did you get into database technology? What do you love about it?

Charity: Oh dear, I don’t. I hate databases. Data is the scariest, hardest part of computing. The stakes are highest and the mistakes the most permanent. Data is where you can kill any company with the smallest number of errors. That’s why I always end up in charge of the databases – I just don’t trust anybody else with the power. (Also, I’m an adrenaline junkie who gets off on high stakes. I could gamble or I could do databases, and I know too much math to gamble.) Literally, nobody loves databases. If they tell you anything different, they are either lying to you or they’re nowhere near production.

I got into databases from operations. I’ve been on call since I was 17, over half my life. I am really stubborn, have an inflated sense of my own importance and like solving problems, so operations was a natural fit. I started diving on the databases grenades when I worked at Linden Lab and MySQL was repeatedly killing us. It seemed impossible, so I volunteered to own it. I’ve been doing that ever since.

Percona: You’re presenting a session called “Debugging with Logs (and Other Events)”. What is the importance of logs for databases and DBAs?

Charity: I mean, it’s not really about logs. I might change my title. It’s about understanding WTF is going on. Logs are one way of extracting events in a format that humans can understand. My startup is all about “what’s happening right now; what’s just happened?” Which is something we are pretty terrible at as an industry. Databases are just another big complex piece of software, and the only reason we have DBAs is because the tooling has historically been so bad that you had to specialize in this piece of software as your entire career.

The tooling is getting better. With the right tools, you don’t have to skulk around like a detective trying to model and predict what might be happening, as though it were a living organism. You can simply sum up the lock time being held, and show what actor is holding it. It’s extremely important that we move away from random samples and pre-aggregated metrics, toward dynamic sampling and rich events. That’s the only way you will ever truly understand what is happening under the hood in your database. That’s part of what my company was built to do.

Percona: How can logging be used in debugging to track down database issues? Can logging affect performance?

Charity: Of course logging can affect performance. For any high traffic website, you should really capture your logs (events) by streaming tcpdump over the wire. Most people know how to do only one thing with db logs: look for slow queries. But those slow queries can be actively misleading! A classic example is when somebody says “this query is getting slow” and they look at source control and the query hasn’t been modified in years. The query is getting slower either because the data volume is growing (or data shape is changing), or because reads can yield but writes can’t, and the write volume has grown to the point where reads are spending all their time waiting on the lock.

Yep, most db logs are terrible.

Percona: What do you want attendees to take away from your session? Why should they attend?

Charity: Lots of cynicism. Everything in computers is terrible, but especially so with data. Everything is a tradeoff, all you can hope to do is be aware of the tradeoffs you are making, and what costs you are incurring whenever you solve a given problem. Also, I hope people come away trembling at the thought of adding any more strings of logs to production. Structure your logs, people! Grep is not the answer to every single question! It’s 2017, nearly 2018, and unstructured logs do not belong anywhere near production.

Percona: What are you most looking forward to at Percona Live Europe 2017?

Charity: My coauthor Laine and I are going to be signing copies of our book Database Reliability Engineering and giving a short keynote on the changes in our field. I love the db community, miss seeing Mark Callaghan and all my friends from the MongoDB and MySQL world, and cannot wait to laugh at them while they cry into their whiskey about locks or concurrency or other similar nonsense. Yay!

Want to find out more about Charity and database debugging? Register for Percona Live Europe 2017, and see her talk Debugging with Logs (and Other Events). Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.

Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.

Dec
21
2016
--

Backtrace, a debugging startup led by former AppNexus engineers, raises $5M

Man coding on computer at night. Debugging startup Backtrace I/O was launched to solve a real problem that its founders faced when they were engineers at adtech company AppNexus — at least according to Backtrace CEO and co-founder Abel Mathew. Mathew told me Backtrace aims to “solve the process of debugging,” something that most companies tackle by “cobbling together very old, outdated solutions”… Read More

Jul
14
2015
--

MySQL QA Episode 6: Analyzing and Filtering

Welcome to MySQL QA Episode #6!

Today we will look into analyzing and filtering our QA run. We’ll use tools like pquery-prep-red.sh, pquery-clean-known.sh & pquery-results.sh

1. Quick re-cap and QA run setup
2. pquery-prep-red.sh
3. pquery-clean-known.sh
4. pquery-results.sh
5. Bonus: pquery reach – pquery-reach++.sh

We’ll also introduce the text_string.sh tool which extracts a most-specific text string from the error log in order to classify a bug/issue.

Full-screen viewing @ 720p resolution recommended

The post MySQL QA Episode 6: Analyzing and Filtering appeared first on MySQL Performance Blog.

Jul
12
2015
--

MySQL QA Episode 5: Preparing Your QA Run with pquery

Welcome to MySQL QA Episode #5! In this episode we’ll be setting up and running pquery for the first time… and I’ll also trigger some actual bugs (fun guaranteed)! I’ll also introduce you to mtr_to_sql.sh and pquery-run.sh.

pquery-run.sh (the main wrapper around pquery) is capable of generating 80-120 MySQL Server crashes – per hour! See how it all works in this episode…

Full-screen viewing @ 720p resolution recommended

The post MySQL QA Episode 5: Preparing Your QA Run with pquery appeared first on MySQL Performance Blog.

Jul
06
2015
--

MySQL QA Episode 3: How to use the debugging tool GDB

Welcome to MySQL QA Episode 3: “Debugging: GDB, Backtraces, Frames and Library Dependencies”

In this episode you’ll learn how to use debugging tool GDB. The following debugging topics are covered:

1. GDB Introduction
2. Backtrace, Stack trace
3. Frames
4. Commands & Logging
5. Variables
6. Library dependencies
7. c++filt
8. Handy references
– GDB Cheat sheet (page #2): https://goo.gl/rrmB9i
– From Crash to testcase: https://goo.gl/3aSvVW

Also expands on live debugging & more. In HD quality (set your player to 720p!)

The post MySQL QA Episode 3: How to use the debugging tool GDB appeared first on MySQL Performance Blog.

Jul
03
2009
0

Well-Intentioned Destruction

A friend at work forwarded me this article from 2007 titled Well-Intentioned Destruction.  Its a great article about maintaining legacy code and debugging a serious problem with seamingly random data deletes.

Written by in: Development,PHP,Web Development | Tags: ,

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com