Mar 29, 2017

Aiden closes $750,000 seed round in its quest to amplify marketers

Aiden, a London-based startup building a machine learning-powered personal assistant to save mobile marketers time and money, closed a $750,000 seed round today from Kima Ventures and a number of angels, including Nicolas Pinto, Pierre Valade and Jonathan Wolf. The team first demoed the capabilities of its service on the stage of TechCrunch Disrupt as a Battlefield finalist.

Mar 29, 2017

GE invests $2 million in Alchemist Accelerator to back industrial IoT startups

Alchemist Accelerator has raised $2 million from GE Digital to start a new program for industrial IoT startups. Stanford lecturer Timothy Chou, formerly President of Oracle On Demand, will chair Alchemist’s new IIoT accelerator along with GE Digital’s West Coast group. In the past, enterprise hardware and software startups were seen as capital intensive, with the challenge of…

Mar 28, 2017

Cloud Foundry launches its developer certification program

Cloud Foundry, a massive open source project that allows enterprises to host their own platform-as-a-service for running cloud applications in their own data center or in a public cloud, today announced the launch of its “Cloud Foundry Certified Developer” program. The Cloud Foundry Foundation calls this “the world’s largest cloud-native developer certification…

Mar 28, 2017

Industrious buys PivotDesk, raises $25M to be WeWork without startup bros

In the recruiting wars, a cool office is critical. Not just for scrappy startups, but for big businesses with regional HQs as well. So while WeWork signs questionable 20-year leases to provide desks for twenty-something engineers, Industrious is taking a classier, more conservative approach to coworking space.

Mar 28, 2017

Troubleshooting Issues with MySQL Character Sets Q & A

In this blog, I will provide answers to the Q & A for the Troubleshooting Issues with MySQL Character Sets webinar.

First, I want to thank everybody for attending the March 9 MySQL character sets troubleshooting webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: We’ve had some issues converting tables from utf8 to utf8mb4. Our issue was that the collation we wanted to use – utf8mb4_unicode_520_ci – did not distinguish between spaces and ideographic (Japanese) spaces, so we were getting unique constraint violations for the varchar fields when two entries had the same text with different kinds of spaces. Have you seen this problem and is there a workaround? We were wondering if this was related to the mother-child character bug with this collation.

A: Unfortunately this issue exists for many languages. For example, in Russian you cannot distinguish “е” and “ё” if you use utf8 or utf8mb4. However, there is hope for Japanese: Oracle announced that they will implement new language-specific utf8mb4 collations in MySQL 8.0. I already see 21 new collations in my 8.0.0 installation.

mysql> show collation like '%0900%';
+----------------------------+---------+-----+---------+----------+---------+
| Collation                  | Charset | Id  | Default | Compiled | Sortlen |
+----------------------------+---------+-----+---------+----------+---------+
| utf8mb4_0900_ai_ci         | utf8mb4 | 255 |         | Yes      |       8 |
| utf8mb4_cs_0900_ai_ci      | utf8mb4 | 266 |         | Yes      |       8 |
| utf8mb4_da_0900_ai_ci      | utf8mb4 | 267 |         | Yes      |       8 |
| utf8mb4_de_pb_0900_ai_ci   | utf8mb4 | 256 |         | Yes      |       8 |
| utf8mb4_eo_0900_ai_ci      | utf8mb4 | 273 |         | Yes      |       8 |
| utf8mb4_es_0900_ai_ci      | utf8mb4 | 263 |         | Yes      |       8 |
| utf8mb4_es_trad_0900_ai_ci | utf8mb4 | 270 |         | Yes      |       8 |
| utf8mb4_et_0900_ai_ci      | utf8mb4 | 262 |         | Yes      |       8 |
| utf8mb4_hr_0900_ai_ci      | utf8mb4 | 275 |         | Yes      |       8 |
| utf8mb4_hu_0900_ai_ci      | utf8mb4 | 274 |         | Yes      |       8 |
| utf8mb4_is_0900_ai_ci      | utf8mb4 | 257 |         | Yes      |       8 |
| utf8mb4_la_0900_ai_ci      | utf8mb4 | 271 |         | Yes      |       8 |
| utf8mb4_lt_0900_ai_ci      | utf8mb4 | 268 |         | Yes      |       8 |
| utf8mb4_lv_0900_ai_ci      | utf8mb4 | 258 |         | Yes      |       8 |
| utf8mb4_pl_0900_ai_ci      | utf8mb4 | 261 |         | Yes      |       8 |
| utf8mb4_ro_0900_ai_ci      | utf8mb4 | 259 |         | Yes      |       8 |
| utf8mb4_sk_0900_ai_ci      | utf8mb4 | 269 |         | Yes      |       8 |
| utf8mb4_sl_0900_ai_ci      | utf8mb4 | 260 |         | Yes      |       8 |
| utf8mb4_sv_0900_ai_ci      | utf8mb4 | 264 |         | Yes      |       8 |
| utf8mb4_tr_0900_ai_ci      | utf8mb4 | 265 |         | Yes      |       8 |
| utf8mb4_vi_0900_ai_ci      | utf8mb4 | 277 |         | Yes      |       8 |
+----------------------------+---------+-----+---------+----------+---------+
21 rows in set (0,03 sec)

In 8.0.1 they promised new case-sensitive and Japanese collations. Please see this blog post for details. The note about the planned Japanese support is at the end.

Meanwhile, I can only suggest that you implement your own collation as described here. You may use the utf8_russian_ci collation from Bug #51976 as an example.

Although the user manual does not list utf8mb4 as a character set for which it’s possible to create new collations, you can actually do it. What you need to do is add a record about the utf8mb4 character set and the new collation into Index.xml, then restart the server.

<charset name="utf8mb4">
  <collation name="utf8mb4_russian_ci" id="1033">
    <rules>
      <reset>\u0415</reset><p>\u0451</p><t>\u0401</t>
    </rules>
  </collation>
</charset>
mysql> show collation like 'utf8mb4_russian_ci';
+--------------------+---------+------+---------+----------+---------+
| Collation          | Charset | Id   | Default | Compiled | Sortlen |
+--------------------+---------+------+---------+----------+---------+
| utf8mb4_russian_ci | utf8mb4 | 1033 |         |          |       8 |
+--------------------+---------+------+---------+----------+---------+
1 row in set (0,03 sec)
mysql> create table test_yo(gen varchar(100) CHARACTER SET utf8mb4, yo varchar(100) CHARACTER SET utf8mb4 collate utf8mb4_russian_ci) engine=innodb default character set=utf8mb4;
Query OK, 0 rows affected (0,25 sec)
mysql> set names utf8mb4;
Query OK, 0 rows affected (0,02 sec)
mysql> insert into test_yo values('??', '??'), ('???', '???'), ('????', '????');
Query OK, 3 rows affected (0,05 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> insert into test_yo values('??', '??'), ('???', '???'), ('????', '????');
Query OK, 3 rows affected (0,06 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> select * from test_yo order by gen;
+----------+----------+
| gen      | yo       |
+----------+----------+
| ??       | ??       |
| ??       | ??       |
| ????     | ????     |
| ????     | ????     |
| ???      | ???      |
| ???      | ???      |
+----------+----------+
6 rows in set (0,00 sec)
mysql> select * from test_yo order by yo;
+----------+----------+
| gen      | yo       |
+----------+----------+
| ??       | ??       |
| ??       | ??       |
| ???      | ???      |
| ???      | ???      |
| ????     | ????     |
| ????     | ????     |
+----------+----------+
6 rows in set (0,00 sec)

Q: If receiving utf8 on a latin1 charset, it will be corrupted. Just want to confirm that you can reformat as utf8 and un-corrupt the data? Also, is there a time limit on how quickly this needs to be done?

A: It will be corrupted only if you store utf8 data in a latin1 column. For example, if you have a table defined as:

create table latin1(
  f1 varchar(100)
) engine=innodb default charset=latin1;

And then insert a word in utf8 format into it that contains characters that are not in the latin1 character set:

mysql> set names utf8;
Query OK, 0 rows affected (0,00 sec)
mysql> set sql_mode='';
Query OK, 0 rows affected, 1 warning (0,00 sec)
mysql> insert into latin1 values('Sveta'), ('Света');
Query OK, 2 rows affected, 1 warning (0,04 sec)
Records: 2  Duplicates: 0  Warnings: 1

The data in utf8 will be corrupted and can never be recovered:

mysql> select * from latin1;
+-------+
| f1    |
+-------+
| Sveta |
| ????? |
+-------+
2 rows in set (0,00 sec)
mysql> select f1, hex(f1) from latin1;
+-------+------------+
| f1    | hex(f1)    |
+-------+------------+
| Sveta | 5376657461 |
| ????? | 3F3F3F3F3F |
+-------+------------+
2 rows in set (0,01 sec)

However, if your data is stored in a utf8 column and you use latin1 for the connection, you will only get a corrupted result set. The data itself will be left untouched:

mysql> create table utf8(f1 varchar(100)) engine=innodb character set utf8;
Query OK, 0 rows affected (0,18 sec)
mysql> insert into utf8 values('Sveta'), ('Света');
Query OK, 2 rows affected (0,15 sec)
Records: 2  Duplicates: 0  Warnings: 0
mysql> set names latin1;
Query OK, 0 rows affected (0,00 sec)
mysql> select f1, hex(f1) from utf8;
+-------+----------------------+
| f1    | hex(f1)              |
+-------+----------------------+
| Sveta | 5376657461           |
| ????? | D0A1D0B2D0B5D182D0B0 |
+-------+----------------------+
2 rows in set (0,00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0,00 sec)
mysql> select f1, hex(f1) from utf8;
+------------+----------------------+
| f1         | hex(f1)              |
+------------+----------------------+
| Sveta      | 5376657461           |
| Света      | D0A1D0B2D0B5D182D0B0 |
+------------+----------------------+
2 rows in set (0,00 sec)
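
As a general debugging aid for situations like these, it helps to verify which character sets the client, connection and result set are actually using. A minimal check with the standard session variables (the values below are from my test connection and will differ on yours):

mysql> show variables like 'character_set_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+
7 rows in set (0,00 sec)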

Q: Can you discuss how charsets affect mysqldump? Specifically, how do we dump a database containing tables with different default charsets?

A: Yes, you can. MySQL can successfully convert data that uses different character sets, so your only job is to specify the --default-character-set option for mysqldump. In this case, strings in any character set you use can be converted to the character set specified. For example, if you use cp1251 and latin1, you may set --default-character-set to cp1251, utf8 or utf8mb4. However, you cannot set it to latin1, because Cyrillic characters exist in the cp1251 character set but do not exist in latin1.

The default value for mysqldump is utf8. You only need to change this default if you use values that are outside of the range supported by utf8 (for example, the smileys in utf8mb4).
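
For illustration, a full round trip might look like this (the database name and credentials are placeholders; all strings in the dump file are encoded as utf8mb4, and on restore the server converts each value back to the character set of its column):

# dump everything, converting all string data to utf8mb4 in the dump file
mysqldump --default-character-set=utf8mb4 -u root -p db1 > db1.sql

# restore: the client sends utf8mb4, the server converts per column definition
mysql --default-character-set=utf8mb4 -u root -p db1 < db1.sql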

Q: But if you use the --single-transaction option for mysqldump, you can only specify one character set in the default?

A: Yes, and this is OK: all data will be converted into this character set. Then, when you restore the dump, it will be converted back to the character set specified in the column definitions.

Q: I noticed that MySQL doesn’t support case-sensitive UTF-8 character sets. What do you recommend for implementing case-sensitive UTF-8, if it’s at all possible?

A: In the link I provided earlier, Oracle promises to implement case-sensitive collations for utf8mb4 in version 8.0.1. Before that happens, I recommend you implement your own case-sensitive collation.
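
Until then, note that the built-in binary collations already give case-sensitive comparisons (at the cost of being sensitive to accents and everything else as well). A quick way to see the difference, assuming the connection already runs under set names utf8mb4:

mysql> select 'a' = 'A' collate utf8mb4_general_ci as ci, 'a' = 'A' collate utf8mb4_bin as cs;
+----+----+
| ci | cs |
+----+----+
|  1 |  0 |
+----+----+
1 row in set (0,00 sec)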

Q: How are tools like pt-table-checksum affected by charsets? Is it safe to use a 4-byte charset (like utf8mb4) as the default charset for all comparisons? Assuming our tables are a mix of latin1, utf8 and utf8mb4.

A: With this combination, you won’t have any issues: pt-table-checksum uses a complicated set of functions that joins columns and calculates a crc32 checksum on them. In your case, all data will be converted to utf8mb4 and no conflicts will happen.

However, if you use incompatible character sets in a single table, you may get the error "Illegal mix of collations for operation 'concat_ws'":

mysql> create table cp1251(f1 varchar(100) character set latin1, f2 varchar(100) character set cp1251) engine=innodb;
Query OK, 0 rows affected (0,32 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0,00 sec)
mysql> insert into cp1251 values('Sveta', 'Света');
Query OK, 1 row affected (0,07 sec)
sveta@Thinkie:~/build/mysql-8.0/mysql-test$ ~/build/percona-toolkit/bin/pt-table-checksum h=127.0.0.1,P=13000,u=root,D=test
Diffs cannot be detected because no slaves were found.  Please read the --recursion-method documentation for information.
03-18T03:51:58 Error executing EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*explain checksum table*/: DBD::mysql::st execute failed: Illegal mix of collations for operation 'concat_ws' [for Statement "EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*explain checksum table*/"] at /home/sveta/build/percona-toolkit/bin/pt-table-checksum line 11351.
03-18T03:51:58 Error checksumming table db1.cp1251: Error executing checksum query: DBD::mysql::st execute failed: Illegal mix of collations for operation 'concat_ws' [for Statement "REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*checksum table*/" with ParamValues: 0='db1', 1='cp1251', 2=1, 3=undef, 4=undef, 5=undef] at /home/sveta/build/percona-toolkit/bin/pt-table-checksum line 10741.
TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-18T03:51:58      2      0        0       1       0   0.003 db1.cp1251
03-18T03:51:58      0      0        2       1       0   0.167 db1.latin1
03-18T03:51:58      0      0        6       1       0   0.198 db1.test_yo
...

The tool continues working, and will process the rest of your tables. I reported this behavior as Bug #1674266.
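
If you need the checksum to cover such a table anyway, one way out (assuming the data in both columns is valid, and keeping in mind that this rewrites the whole table) is to convert it to a single character set first, so that CONCAT_WS() no longer mixes collations:

mysql> alter table cp1251 convert to character set utf8mb4;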

Thanks for attending the Troubleshooting Issues with MySQL Character Sets webinar.

Mar 28, 2017

AWS launches Amazon Connect, productizes Amazon’s in-house contact center software

AWS continues to add yet more software and services to build out its revenues and touchpoints with businesses that already use its cloud infrastructure for storage and to host and administer services and apps. The latest product, launching today, is Amazon Connect, a cloud-based contact center solution. AWS said it is based on the same tech that Amazon itself has built and uses in-house…

Mar 27, 2017

What’s Next for SQL Databases?

In this blog, I’ll go over my thoughts on what we can expect in the world of SQL databases.

After reading Baron’s prediction on databases, here:

https://www.xaprb.com/blog/defining-moments-in-database-history/

I want to provide my own view on what’s coming up next for SQL databases. I think we live in interesting times, when we can see the beginning of the next generation of RDBMSs.

There are defining characteristics of such databases:

  1. Auto-scaling. The ability to add and use resources depending on the current load and database size. This is done transparently for users and DBAs.
  2. Auto-healing. The automatic handling of node failures.
  3. Multi-regional, cloud-agnostic, geo-distributed. The ability to support multiple data centers and multiple clouds, in different parts of the world.
  4. Transactional. All the above, with the ability to support multi-statements transactional workloads.
  5. Strong consistency. The full definition of strong consistency is pretty involved. For simplicity, let’s say it means that reads (in the absence of ongoing writes) will return the same data, regardless of which region or data center you are reading from. A simple counter-example is the famous MySQL asynchronous replication, where (with slave delay) reading the data on a slave can return very outdated data. I am focusing on reads because, in a distributed environment, consistent read performance will be affected. This is where network latency (often limited by the speed of light) will define performance.
  6. SQL language. SQL, despite being old and widely criticized, is not going anywhere. This is a universal language for app developers to access data.

With this, I see the following interesting projects:

  • Google Cloud Spanner (https://cloud.google.com/spanner/). Recently announced and still in the Beta stage. Definitely an interesting project, with the obvious limitation of running only in Google Cloud.
  • FaunaDB (https://fauna.com/). Also very recently announced, so it is hard to say how it performs. The major downside I see is that it does not provide SQL access, but uses a custom language.
  • Two open source projects:
    • CockroachDB (https://www.cockroachlabs.com/). This is still in the Beta stage, but definitely an interesting project to follow. Initially, the project planned to support only key-value access, but later they made a very smart decision to provide SQL access via a PostgreSQL-compatible protocol.
    • TiDB (https://github.com/pingcap/tidb). Right now in the RC stage, with the target of providing SQL access over a MySQL-compatible protocol (and later the PostgreSQL protocol).

Protocol compatibility is a wise approach, although not strictly necessary. It lowers the entry barrier for existing applications.
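
For example, at the time of writing you can point the stock command-line clients at both projects directly (127.0.0.1 and the default ports are assumptions about a local test setup):

# TiDB speaks the MySQL wire protocol (default port 4000)
mysql -h 127.0.0.1 -P 4000 -u root

# CockroachDB speaks the PostgreSQL wire protocol (default port 26257)
psql -h 127.0.0.1 -p 26257 -U root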

Both CockroachDB and TiDB, at the moment of this writing, still have rough edges and can’t be used in serious deployments (from my experience). I expect both projects to make big progress in 2017.

What shared characteristics can we expect from these systems?

As I mentioned above, we may see that the read performance is degraded (as latency increases), and often it will be defined more by network performance than anything else. Storage IO and CPU cycles will be secondary factors. There will be more work on how to understand and tune the network traffic.

We may need to get used to the fact that point or small range selects become much slower. Right now, we see very fast point selects for traditional RDBMSs (MySQL, PostgreSQL, etc.).

Heavy writes will be problematic. The problem is that all writes will need to go through the consistency protocol. Write-optimized storage engines will help (both CockroachDB and TiDB use RocksDB in the storage layer).

Long transactions (let’s say changing 100,000 or more rows) will also be problematic. There are just too many network round-trips and too much housekeeping work on each node, making long transactions an issue for distributed systems.

Another shared property (at least between CockroachDB and TiDB) is the active use of the Raft protocol to achieve consistency. So it will be important to understand how this protocol works to use it effectively. You can find a good overview of the Raft protocol here: http://container-solutions.com/raft-explained-part-1-the-consenus-problem/.

There are probably more NewSQL technologies than I have mentioned here, but I do not think any of them has captured critical market or mind share. So we are at the beginning of interesting times…

What about MySQL? Can MySQL become the database that provides all these characteristics? It is possible, but I do not think it will happen anytime soon. MySQL would need to provide automatic sharding to do this, which will be very hard to implement given the current internal design. It may happen in the future, though it will require a lot of engineering effort to make it work properly.

Mar 27, 2017

Airobotics scores authorization to fly autonomous drones in Israel

A startup based in Petah Tikva, Israel, Airobotics, has scored the right to fly drones autonomously for business purposes in Israel. The Civil Aviation Authority of Israel (CAAI) was the first in the world to authorize commercial, fully unmanned drone flights in their nation’s airspace.

Mar 27, 2017

Percona Toolkit 3.0.2 is now available

Percona announces the availability of Percona Toolkit 3.0.2 on March 27, 2017.

Percona Toolkit is a collection of advanced command-line tools that perform a variety of MySQL and MongoDB server and system tasks too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This release includes the following changes:

New Features
  • PT-73: Added support for SSL connections to pt-mongodb-summary and pt-mongodb-query-digest
  • 1642751: Enabled gathering of information about locks and transactions by pt-stalk using Performance Schema if it is enabled (Thanks, Agustin Gallego)
Bug Fixes
  • PT-74: Fixed gathering of security settings when running pt-mongodb-summary on a mongod instance that is specified as the host
  • PT-75: Changed the default sort order in pt-mongodb-query-digest output to descending
  • PT-76: Added support of & and # symbols in passwords for pt-mysql-summary
  • PT-77: Updated Makefile to support new MongoDB tools
  • PT-89: Fixed pt-stalk to run top more than once to collect useful CPU usage
  • PT-93: Fixed pt-mongodb-query-digest to make query ID match query key (Thanks, Kamil Dziedzic)
  • PT-94: Fixed pt-online-schema-change to not make duplicate rows in _t_new when updating the primary key. Also, see 1646713.
  • PT-101: Fixed pt-table-checksum to correctly use the --slave-user and --slave-password options (see the example after this list). Also, see 1651002.
  • PT-105: Fixed pt-table-checksum to continue running if a database is dropped in the process
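
As an example of the repaired replica options from PT-101, a run might look like this (the host and account names are placeholders):

pt-table-checksum h=master.example.com,u=ptc_user,p=ptc_pass \
  --slave-user=ptc_user --slave-password=ptc_pass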

You can find release details in the release notes. Bugs can be reported on Toolkit’s launchpad bug tracker.

Mar 25, 2017

Eligible founder Katelyn Gleason’s plan to upend the billion dollar medical billing industry

Medical billing is a largely untapped and lucrative industry, potentially pulling in $55 billion globally by 2020. But its inner workings are still very murky — most of the time it’s not clear how much something will cost, and sometimes you don’t even get the (possibly whopping) bill until months down the road. Founder Katelyn Gleason wants to make it easier…
