Dec
07
2018
--

IBM selling Lotus Notes/Domino business to HCL for $1.8B

IBM announced last night that it is selling the final components from its 1995 acquisition of Lotus to Indian firm HCL for $1.8 billion.

IBM paid $3.5 billion for Lotus back in the day. The big pieces here are Lotus Notes, Domino and Portal. These were a big part of IBM’s enterprise business for a long time, but last year Big Blue began to pull away, selling the development part to HCL, while maintaining control of sales and marketing.

This announcement marks the end of the line for IBM involvement. With the development of the platform out of its control, and in need of cash after spending $34 billion for Red Hat, perhaps IBM simply decided it no longer made sense to keep any part of this in-house.

As for HCL, it sees an opportunity to continue to build the Notes/Domino business, and it’s seizing it with this purchase. “The large-scale deployments of these products provide us with a great opportunity to reach and serve thousands of global enterprises across a wide range of industries and markets,” C Vijayakumar, president and CEO at HCL Technologies, said in a statement announcing the deal.

Alan Lepofsky, an analyst at Constellation Research who keeps close watch on the enterprise collaboration space, says the sale could represent a fresh start for software that IBM hasn’t really been paying close attention to for some time. “HCL is far more interested in Notes/Domino than IBM has been for a decade. They are investing heavily, trying to rejuvenate the brand,” Lepofsky told TechCrunch.

While this software may feel long in the tooth, Notes and Domino are still in use in many corners of the enterprise, and this is especially true in EMEA (Europe, Middle East and Africa) and AP (Asia Pacific), Lepofsky said.

He added that IBM appears to be completely exiting the collaboration space with this sale. “It appears that IBM is done with collaboration, out of the game,” he said.

This move makes sense for IBM, which is moving in a different direction as it develops its cloud business. The Red Hat acquisition in October, in particular, shows that the company wants to embrace private and hybrid cloud deployments, and older software like Lotus Notes and Domino doesn’t really play a role in that world.

The deal, which is subject to regulatory approval processes, is expected to close in the middle of next year.

Dec
07
2018
--

MySQL 8 and The FRM Drop… How To Recover Table DDL

… or what I should keep in mind in case of disaster

Retrieving and maintaining, in SQL format, the definition of all the tables in a database is a best practice that we all should adopt. Keeping those definitions under version control is another best practice worth following.
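
A minimal way to do both, sketched here with the standard mysqldump client (the schema name and file layout are examples, not prescriptions), is to dump only the definitions and commit them:

# Dump table definitions only (no data), plus stored routines, for the windmills schema
mysqldump --no-data --routines windmills > windmills_schema.sql
# Keep the snapshot under version control (assumes a git repository already set up for this)
git add windmills_schema.sql && git commit -m "windmills schema snapshot"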

While doing that may seem redundant, it can become a life saver in several situations: from the need to review what has historically changed in a table, to knowing who changed what and why… to the day you need to recover your data and your beloved MySQL instance won’t start…

But let’s be honest: only a few of us do the right thing, and even fewer keep that information up to date. Given that’s the case, what can we do when we need to discover or recover a table structure?

From the beginning, MySQL has used some external files to describe its internal structure.

For instance, if I have a schema named windmills and a table named wmillAUTOINC1, on the file system I will see this:

-rw-r-----. 1 mysql mysql     8838 Mar 14 2018 wmillAUTOINC1.frm
-rw-r-----. 1 mysql mysql   131072 Mar 14 2018 wmillAUTOINC1.ibd

The ibd file contains the data, while the frm file contains the structure information.

Putting aside ANY discussion about whether this approach is safe, transactional and so on… when we’ve experienced a major crash and data corruption, this approach has been helpful: being able to read the frm file was the easiest way to get the information we needed. Simple tools like DBSake made the task quite trivial, and allowed us to script the extraction of table definitions when we needed to run long, complex, tedious data recoveries:

[root@master1 windmills]# /opt/tools/dbsake frmdump wmillAUTOINC1.frm
--
-- Table structure for table `wmillAUTOINC1`
-- Created with MySQL Version 5.7.20
--
CREATE TABLE `wmillAUTOINC1` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `uuid` char(36) COLLATE utf8_bin NOT NULL,
  `millid` smallint(6) NOT NULL,
  `kwatts_s` int(11) NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) COLLATE utf8_bin NOT NULL,
  `active` tinyint(2) NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC;

Of course, if the frm file was also corrupt, we could try to get the information from the ibdata dictionary. If that was corrupted too (trust me, I’ve seen all of these situations)… well, the last resort was hoping the customer had a recent table definition stored somewhere. But, as mentioned before, we are not so diligent, are we?

Now, though, in MySQL 8 we do not have FRM files: they were dropped. Even more interesting, we do not have the same dictionary; most of the things we knew have changed, including the location of the dictionary. So what can be done?

Well, Oracle has moved the FRM information—and more—into what is called Serialized Dictionary Information (SDI). The SDI is written INSIDE the ibd file, and represents a redundant copy of the information contained in the data dictionary.

The SDI is updated/modified by DDL operations on the tables that reside in that tablespace. The key point is this: if you use one file per table, that file will contain ONLY the SDI for that table; but if you have multiple tables in a tablespace, the SDI information will cover ALL of them.
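
If you are not sure which layout you are running, a quick sanity check (innodb_file_per_table has been ON by default since MySQL 5.6) is:

-- ON means each InnoDB table has its own .ibd file,
-- so that file's SDI describes only that one table.
SHOW VARIABLES LIKE 'innodb_file_per_table';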

To extract this information from the IBD files, Oracle provides a utility called ibd2sdi. This application parses the SDI information and produces JSON output that can be easily manipulated to extract, and rebuild, the table definition.

One exception is partitioned tables: the SDI information is contained ONLY in the first partition, and if you drop that partition, it moves to the next one. I will show that later.

But let’s see how it works. In the next examples I will look for a table’s name, attributes, and datatypes, starting from the dictionary tables.

To obtain the info I will do this:

/opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/mysql.ibd |jq  '.[]?|.[]?|.dd_object?|("------------------------------------"?,"TABLE NAME = ",.name?,"****",(.columns?|.[]?|(.name?,.column_type_utf8?)))'

The result will be something like:

"------------------------------------"
"TABLE NAME = "
"tables"
"****"
"id"
"bigint(20) unsigned"
"schema_id"
"bigint(20) unsigned"
"name"
"varchar(64)"
"type"
"enum('BASE TABLE','VIEW','SYSTEM VIEW')"
"engine"
"varchar(64)"
"mysql_version_id"
"int(10) unsigned"
"row_format"
"enum('Fixed','Dynamic','Compressed','Redundant','Compact','Paged')"
"collation_id"
"bigint(20) unsigned"
"comment"
"varchar(2048)"
<snip>
"------------------------------------"
"TABLE NAME = "
"tablespaces"
"****"
"id"
"bigint(20) unsigned"
"name"
"varchar(259)"
"options"
"mediumtext"
"se_private_data"
"mediumtext"
"comment"
"varchar(2048)"
"engine"
"varchar(64)"
"DB_TRX_ID"
""
"DB_ROLL_PTR"
""

I cut the output for brevity, but if you run the above command yourself you’ll be able to see that this retrieves the information for ALL the tables residing in the IBD.

The other thing I hope you noticed is that I am NOT parsing ibdata, but mysql.ibd. Why? Because the dictionary was moved out from ibdata and is now in mysql.ibd.

Look what happens if I try to parse ibdata:

[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/ibdata1 |jq '.'
[INFO] ibd2sdi: SDI is empty.

Be very careful here to not mess up your mysql.ibd file.
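
ibd2sdi should only read the file, but when poking at a crashed instance it is prudent to work on a copy; for example (the destination path here is arbitrary):

# Copy the data dictionary tablespace and parse the copy instead of the live file
cp -a /opt/mysql_instances/master8/data/mysql.ibd /root/recovery/mysql.ibd
/opt/mysql_templates/mysql-8P/bin/ibd2sdi /root/recovery/mysql.ibd | jq '.' | less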

Now, what can I do to get information about my wmillAUTOINC table in MySQL 8?

That is quite simple:

/opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINC.ibd |jq '.'
[
  "ibd2sdi",
  {
    "type": 1,
    "id": 1068,
    "object": {
      "mysqld_version_id": 80013,
      "dd_version": 80013,
      "sdi_version": 1,
      "dd_object_type": "Table",
      "dd_object": {
        "name": "wmillAUTOINC",
        "mysql_version_id": 80011,
        "created": 20180925095853,
        "last_altered": 20180925095853,
        "hidden": 1,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=1;row_type=2;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [
          {
            "name": "id",
            "type": 9,
            "is_nullable": false,
            "is_zerofill": false,
            "is_unsigned": false,
            "is_auto_increment": true,
            "is_virtual": false,
            "hidden": 1,
            "ordinal_position": 1,
            "char_length": 11,
            "numeric_precision": 19,
            "numeric_scale": 0,
            "numeric_scale_null": false,
            "datetime_precision": 0,
            "datetime_precision_null": 1,
            "has_no_default": false,
            "default_value_null": false,
            "srs_id_null": true,
            "srs_id": 0,
            "default_value": "AAAAAAAAAAA=",
            "default_value_utf8_null": true,
            "default_value_utf8": "",
            "default_option": "",
            "update_option": "",
            "comment": "",
            "generation_expression": "",
            "generation_expression_utf8": "",
            "options": "interval_count=0;",
            "se_private_data": "table_id=1838;",
            "column_key": 2,
            "column_type_utf8": "bigint(11)",
            "elements": [],
            "collation_id": 83,
            "is_explicit_collation": false
          },
<SNIP>
        "indexes": [
          {
            "name": "PRIMARY",
            "hidden": false,
            "is_generated": false,
            "ordinal_position": 1,
            "comment": "",
            "options": "flags=0;",
            "se_private_data": "id=2261;root=4;space_id=775;table_id=1838;trx_id=6585972;",
            "type": 1,
            "algorithm": 2,
            "is_algorithm_explicit": false,
            "is_visible": true,
            "engine": "InnoDB",
<Snip>
        ],
        "foreign_keys": [],
        "partitions": [],
        "collation_id": 83
      }
    }
  },
  {
    "type": 2,
    "id": 780,
    "object": {
      "mysqld_version_id": 80011,
      "dd_version": 80011,
      "sdi_version": 1,
      "dd_object_type": "Tablespace",
      "dd_object": {
        "name": "windmills/wmillAUTOINC",
        "comment": "",
        "options": "",
        "se_private_data": "flags=16417;id=775;server_version=80011;space_version=1;",
        "engine": "InnoDB",
        "files": [
          {
            "ordinal_position": 1,
            "filename": "./windmills/wmillAUTOINC.ibd",
            "se_private_data": "id=775;"
          }
        ]
      }
    }
  }
]

The JSON output contains:

  • A section describing the DB object at a high level
  • An array of columns and related information
  • An array of indexes
  • Partition information (not present here, but shown in the next example)
  • Tablespace information

That is a lot more detail compared to what we had in the FRM, and it is quite relevant and interesting information as well.

Once you have extracted the SDI, any JSON-parsing tool or script can generate the SQL DDL from it.
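
As a rough sketch of the idea (not a complete generator: indexes are omitted and the trailing comma is left as an exercise), a single jq filter over the output shown above already produces a usable skeleton:

/opt/mysql_templates/mysql-8P/bin/ibd2sdi /opt/mysql_instances/master8/data/windmills/wmillAUTOINC.ibd | \
jq -r '.[]? | objects
       | select(.object.dd_object_type? == "Table")
       | .object.dd_object
       | "CREATE TABLE `\(.name)` (",
         (.columns[]
          | select(.column_type_utf8 != "")   # skip internal columns like DB_TRX_ID
          | "  `\(.name)` \(.column_type_utf8),"),
         ");"'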

I mentioned partitions, so let’s look at them a bit more closely, given they can be tricky.

As mentioned, the SDI information is present ONLY in the first partition; all the other partitions hold ONLY the tablespace information. Given that, the first thing to do is to identify which partition is the first one… OR simply try to access all the partitions and extract the details from the one that returns them.

The process is the same:

[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170301.ibd |jq '.'
[
  "ibd2sdi",
  {
    "type": 1,
    "id": 1460,
    "object": {
      "mysqld_version_id": 80013,
      "dd_version": 80013,
      "sdi_version": 1,
      "dd_object_type": "Table",
      "dd_object": {
        "name": "wmillAUTOINCPART",
        "mysql_version_id": 80013,
        "created": 20181125110300,
        "last_altered": 20181125110300,
        "hidden": 1,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=1;row_type=2;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [<snip>
    	  "schema_ref": "windmills",
        "se_private_id": 18446744073709552000,
        "engine": "InnoDB",
        "last_checked_for_upgrade_version_id": 80013,
        "comment": "",
        "se_private_data": "autoinc=31080;version=2;",
        "row_format": 2,
        "partition_type": 7,
        "partition_expression": "to_days(`date`)",
        "partition_expression_utf8": "to_days(`date`)",
        "default_partitioning": 1,
        "subpartition_type": 0,
        "subpartition_expression": "",
        "subpartition_expression_utf8": "",
        "default_subpartitioning": 0,
       ],
<snip>
        "foreign_keys": [],
        "partitions": [
          {
            "name": "PT20170301",
            "parent_partition_id": 18446744073709552000,
            "number": 0,
            "se_private_id": 1847,
            "description_utf8": "736754",
            "engine": "InnoDB",
            "comment": "",
            "options": "",
            "se_private_data": "autoinc=0;version=0;",
            "values": [
              {
                "max_value": false,
                "null_value": false,
                "list_num": 0,
                "column_num": 0,
                "value_utf8": "736754"
              }
            ],

The difference, as you can see, is that the sections related to partitions and subpartitions are filled with all the details you might need to recreate the partitioning.

We will have:

  • Partition type
  • Partition expression
  • Partition values
  • …more

The same applies to subpartitions.

Now again see what happens if I parse the second partition:

[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd |jq '.'
[
  "ibd2sdi",
  {
    "type": 2,
    "id": 790,
    "object": {
      "mysqld_version_id": 80011,
      "dd_version": 80011,
      "sdi_version": 1,
      "dd_object_type": "Tablespace",
      "dd_object": {
        "name": "windmills/wmillAUTOINCPART#P#PT20170401",
        "comment": "",
        "options": "",
        "se_private_data": "flags=16417;id=785;server_version=80011;space_version=1;",
        "engine": "InnoDB",
        "files": [
          {
            "ordinal_position": 1,
            "filename": "./windmills/wmillAUTOINCPART#P#PT20170401.ibd",
            "se_private_data": "id=785;"
          }
        ]
      }
    }
  }
]

I will get only the information about the tablespace, not the table.
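
So, rather than guessing which partition currently holds the SDI, a small loop over the partition files (reusing the paths from these examples) can find it; only the partition carrying the table definition returns a "Table" object:

# Probe every partition file and report the one carrying the table definition
for f in /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#*.ibd; do
  if /opt/mysql_templates/mysql-8P/bin/ibd2sdi "$f" \
       | jq -e '.[]? | objects | select(.object.dd_object_type? == "Table")' > /dev/null; then
    echo "SDI (table definition) found in: $f"
  fi
done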

As promised, let me show you what happens if I drop the first partition, so that the second partition becomes the first:

(root@localhost) [windmills]>alter table wmillAUTOINCPART drop partition PT20170301;
Query OK, 0 rows affected (1.84 sec)
Records: 0  Duplicates: 0  Warnings: 0
[root@master1 ~]# /opt/mysql_templates/mysql-8P/bin/./ibd2sdi   /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd |jq '.'|more
[
  "ibd2sdi",
  {
    "type": 1,
    "id": 1461,
    "object": {
      "mysqld_version_id": 80013,
      "dd_version": 80013,
      "sdi_version": 1,
      "dd_object_type": "Table",
      "dd_object": {
        "name": "wmillAUTOINCPART",
        "mysql_version_id": 80013,
        "created": 20181129130834,
        "last_altered": 20181129130834,
        "hidden": 1,
        "options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=1;row_type=2;stats_auto_recalc=0;stats_sample_pages=0;",
        "columns": [
          {
            "name": "id",
            "type": 9,
            "is_nullable": false,
            "is_zerofill": false,
            "is_unsigned": false,
            "is_auto_increment": true,
            "is_virtual": false,
            "hidden": 1,
            "ordinal_position": 1,

As I mentioned before, each DDL operation updates the SDI, and here we go: I now have all the information about what is NOW the FIRST partition. Note the value of the “created” attribute, comparing the first time I queried the other partition with what I get now:

/opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170301.ibd
       "created": 20181125110300,
/opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd
       "created": 20181129130834,

To be clear: the second “created” timestamp, the one on PT20170401, is from NOW, i.e. from when I dropped the other partition (PT20170301).
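
If you only want to track that attribute, a one-liner against the same JSON structure shown above pulls it out directly:

# Print just the "created" timestamp of the Table object, if this partition holds it
/opt/mysql_templates/mysql-8P/bin/ibd2sdi \
  /opt/mysql_instances/master8/data/windmills/wmillAUTOINCPART#P#PT20170401.ibd \
  | jq '.[]? | objects | select(.object.dd_object_type? == "Table") | .object.dd_object.created'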

Conclusions

In the end, this solution is definitely more powerful than the FRM files. It will allow us to parse the file and identify the table definition more easily, providing us with much more detail and information.

The problems will arise if and when the IBD file becomes corrupt.

As the manual states: “For InnoDB, an SDI record requires a single index page, which is 16KB in size by default. However, SDI data is compressed to reduce the storage footprint.”

This means that, if I associate one record with one table, each table has a page of its own; so in case of IBD corruption I should (most likely) still be able to read those pages. Unless I have bad (very bad) luck.

I still wonder how the size of an IBD file affects SDI retrieval, but given that I have not tried it yet, I will have to let you know.

As an aside, I am working on a script to facilitate the generation of the SQL. It’s not yet ready, but you can find it here.

One last note, but keep this in mind! It is stated in the manual, in a hidden place and in small print:
“DDL operations take longer due to writing to storage, undo logs, and redo logs instead of .frm files.”

References

https://stedolan.github.io/jq/

https://dev.mysql.com/doc/refman/8.0/en/ibd2sdi.html

https://dev.mysql.com/doc/refman/8.0/en/serialized-dictionary-information.html

https://dev.mysql.com/doc/refman/8.0/en/data-dictionary-limitations.html


Photo by chuttersnap on Unsplash

Dec
06
2018
--

Contentful raises $33.5M for its headless CMS platform

Contentful, a Berlin- and San Francisco-based startup that provides content management infrastructure for companies like Spotify, Nike, Lyft and others, today announced that it has raised a $33.5 million Series D funding round led by Sapphire Ventures, with participation from OMERS Ventures and Salesforce Ventures, as well as existing investors General Catalyst, Benchmark, Balderton Capital and Hercules. In total, the company has now raised $78.3 million.

It’s been less than a year since the company raised its Series C round and, as Contentful co-founder and CEO Sascha Konietzke told me, the company didn’t really need to raise right now. “We had just raised our last round about a year ago. We still had plenty of cash in our bank account and we didn’t need to raise as of now,” said Konietzke. “But we saw a lot of economic uncertainty, so we thought it might be a good moment in time to recharge. And at the same time, we already had some interesting conversations ongoing with Sapphire [formerly SAP Ventures] and Salesforce. So we saw the opportunity to add more funding and also start getting into a tight relationship with both of these players.”

The original plan for Contentful was to focus almost exclusively on mobile. As it turns out, though, the company’s customers also wanted to use the service to handle their web-based applications, and these days Contentful happily supports both. “What we’re seeing is that everything is becoming an application,” he told me. “We started with native mobile application, but even the websites nowadays are often an application.”

In its early days, Contentful focused only on developers. Now, however, that’s changing, and having these connections to large enterprise players like SAP and Salesforce surely isn’t going to hurt the company as it looks to bring on larger enterprise accounts.

Currently, the company’s focus is very much on Europe and North America, which account for about 80 percent of its customers. For now, Contentful plans to continue to focus on these regions, though it obviously supports customers anywhere in the world.

Contentful only exists as a hosted platform. As of now, the company doesn’t have any plans for offering a self-hosted version, though Konietzke noted that he does occasionally get requests for this.

What the company is planning to do in the near future, though, is to enable more integrations with existing enterprise tools. “Customers are asking for deeper integrations into their enterprise stack,” Konietzke said. “And that’s what we’re beginning to focus on and where we’re building a lot of capabilities around that.” In addition, support for GraphQL and an expanded rich text editing experience are coming up. The company also recently launched a new editing experience.

Dec
06
2018
--

Looker snags $103 million investment on $1.6 billion valuation

Looker has been helping customers visualize and understand their data for seven years, and today it got a big reward, a $103 million Series E investment on a $1.6 billion valuation.

The round was led by Premji Invest, with new investment from Cross Creek Advisors and participation from the company’s existing investors. With today’s investment, Looker has raised $280.5 million, according to the company.

In spite of the large valuation, Looker CEO Frank Bien really wasn’t in the mood to focus on that particular number, which he said was arbitrary, based on the economic conditions at the time of the funding round. He said having an executive team old enough to remember the dot-com bubble from the late 1990s and the crash of 2008 keeps them grounded when it comes to those kinds of figures.

Instead, he preferred to concentrate on other numbers. He reported that the company has 1,600 customers now and just crossed the $100 million revenue run rate, a significant milestone for any enterprise SaaS company. What’s more, Bien reports revenue is still growing 70 percent year over year, so there’s plenty of room to keep this going.

He said he took such a large round because there was interest and he believed that it was prudent to take the investment as they move deeper into enterprise markets. “To grow effectively into enterprise customers, you have to build more product, and you have to hire sales teams that take longer to activate. So you look to grow into that, and that’s what we’re going to use this financing for,” Bien told TechCrunch.

He said it’s highly likely that this is the last private fundraising the company will undertake as it heads toward an IPO at some point in the future. “We would absolutely view this as our last round unless something drastic changed,” Bien said.

For now, he’s looking to build a mature company that is ready for the public markets whenever the time is right. That involves building internal processes of a public company even if they’re not there yet. “You create that maturity either way, and I think that’s what we’re doing. So when those markets look okay, you could look at that as another funding source,” he explained.

The company currently has around 600 employees. Bien indicated that they added 200 this year alone and expect to add additional headcount in 2019 as the business continues to grow and they can take advantage of this substantial cash infusion.

Dec
06
2018
--

LeanIX, the SaaS that lets enterprises map out their software architecture, closes $30M Series C

LeanIX, the Software-as-a-Service for “Enterprise Architecture Management,” has closed $30 million in Series C funding.

The round is led by Insight Venture Partners, with participation from previous investors Deutsche Telekom Capital Partners (DTCP), Capnamic Ventures and Iris Capital. It brings LeanIX’s total funding to nearly $40 million since the German company was founded in 2012.

Operating in the enterprise architecture space, previously the domain of a company’s IT team only, LeanIX’s SaaS might well be described as a “Google Maps for IT architectures.”

The software lets enterprises map out all of the legacy software or modern SaaS that the organisation runs on, including creating metadata on things like which business processes it is used for or capable of supporting, what tech (and version) powers it, which teams are using or have access to it, who is responsible for it, as well as how the different parts of the architecture fit together.

From this vantage point, enterprises can not only keep a better handle on all of the software from different vendors they are buying in, including how that differs or might be better utilised across distributed teams, but also act in a more nimble way in terms of how they adopt new solutions or decommission legacy ones.

In a call with André Christ, co-founder and CEO, he described LeanIX as providing a “single source of truth” for an enterprise’s architecture. He also explained that the SaaS takes a semi-automatic approach to how it maps out that data. A lot of the initial data entry will need to be done manually, but this is designed to be done collaboratively across an organisation and supported by an “easy-to-use UX,” while LeanIX also extracts some data automatically via integrations with ServiceNow (e.g. scanning software on servers) or Signavio (e.g. how IT Systems are used in Business Processes).

More broadly, Christ tells me that the need for a solution like LeanIX is only increasing, as enterprise architecture has shifted away from monolithic vendors and software to the use of a sprawling array of cloud or on-premise software where each typically does one job or business process really well, rather than many.

“With the rising adoption of SaaS, multi-cloud and microservices, an agile management of the Enterprise Architecture is harder to achieve but more important than ever before,” he says. “Any company in any industry using more than a hundred applications is facing this challenge. That’s why the opportunity is huge for LeanIX to define and own this category.”

To that end, LeanIX says the investment will be used to accelerate growth in the U.S. and for continued product innovation. Meanwhile, the company says that in 2018 it achieved several major milestones, including doubling its global customer base, launching operations in Boston and expanding its global headcount with the appointment of several senior-level executives. Enterprises using LeanIX include Adidas, DHL, Merck and Santander, with strategic partnerships with Deloitte, ServiceNow and PwC, among others.

“For businesses today, effective enterprise architecture management is critical for driving digital transformation, and requires robust tools that enable collaboration and agility,” said Teddie Wardi, principal at Insight Venture Partners, in a statement. “LeanIX is a pioneer in the space of next-generation EA tools, achieved staggering growth over the last year, and is the trusted partner for some of today’s largest and most complex organizations. We look forward to supporting its continued growth and success as one of the world’s leading software solutions for the modernization of IT architectures.”

Dec
05
2018
--

Nondeterministic Functions in MySQL (i.e. rand) Can Surprise You

Working on a test case with sysbench, I encountered this:

mysql> select * from sbtest1 where id = round(rand()*10000, 0);
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k      | c                                                                                                                       | pad                                                         |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|  179 | 499871 | 09833083632-34593445843-98203182724-77632394229-31240034691-22855093589-98577647071-95962909368-34814236148-76937610370 | 62233363025-41327474153-95482195752-11204169522-13131828192 |
| 1606 | 502031 | 81212399253-12831141664-41940957498-63947990218-16408477860-15124776228-42269003436-07293216458-45216889819-75452278174 | 25423822623-32136209218-60113604068-17409951653-00581045257 |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
2 rows in set (0.30 sec)

I was really surprised. First, and most important: id is a primary key and the rand() function should produce just one value. How come the query returns two rows? Second, why is the response time 0.30 sec? That seems really high for a primary key access.

Looking further:

CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1
mysql> explain select * from sbtest1 where id = round(rand()*10000, 0);
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | sbtest1 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 986400 |    10.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------------+

So it is a primary key, but MySQL does not use an index, and it returns two rows. Is this a bug?

Deterministic vs nondeterministic functions

It turned out not to be a bug at all. It is pretty logical behavior from MySQL, but it is not what we would expect. First, why a full table scan? Well, rand() is a nondeterministic function. That means we do not know what it will return ahead of time, and in fact that is exactly the purpose of rand(): to return a random value. In this case, it is only logical to evaluate the function for each row, each time, and compare the results. In our case:

  1. Read row 1, get the value of id, evaluate the value of RAND(), compare
  2. Proceed using the same algorithm with the remaining rows.

In other words, as the value of rand() is not known (not evaluated) beforehand, we can’t use an index.

And with the rand() function we have another interesting consequence: for larger tables with an auto_increment primary key, the probability of the rand() value matching an auto_increment value is higher, so we can get multiple rows back. In fact, if we read the whole table from the beginning and keep comparing the auto_inc sequence with “the roll of the dice”, we can get many rows back.
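
A quick way to see this for yourself, using the same sysbench table, is to repeat a count a few times; the number of matches changes on every execution because rand() is re-evaluated for each row:

-- Each execution scans the table and re-rolls rand() per row,
-- so the count varies from run to run (0, 1, 2, ... matches).
SELECT COUNT(*) FROM sbtest1 WHERE id = ROUND(RAND()*10000, 0);
SELECT COUNT(*) FROM sbtest1 WHERE id = ROUND(RAND()*10000, 0);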

That behavior is totally counter-intuitive. Nevertheless, to me, it’s also the only correct behavior.

We expect to have the rand() function evaluated before running the query.  This can actually be achieved by assigning rand() to a variable:

mysql> set @id=round(rand()*10000, 0); select @id; select * from sbtest1 where id = @id;
Query OK, 0 rows affected (0.00 sec)
+------+
| @id  |
+------+
| 6068 |
+------+
1 row in set (0.00 sec)
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id   | k      | c                                                                                                                       | pad                                                         |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 6068 | 502782 | 84971025350-12845068791-61736600622-38249796467-85706778555-74134284808-24438972515-17848828748-86869270666-01547789681 | 17507194006-70651503059-23792945260-94159543806-65683812344 |
+------+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain select * from sbtest1 where id = @id;
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | sbtest1 | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.01 sec)

This would meet our expectations.

There are (at least) two bug reports filed, with very interesting discussion:

  1. rand() used in scalar functions returns multiple rows
  2. SELECT on PK with ROUND(RAND()) give wrong errors

Other databases

I wanted to see how it works in other SQL databases. In PostgreSQL, the behavior is exactly the same as MySQL:

postgres=# select * from t2 where id = cast(random()*10000 as int);
  id  |    c
------+---------
 4093 | asdasda
 9378 | asdasda
(2 rows)
postgres=# select * from t2 where id = cast(random()*10000 as int);
  id  |    c
------+---------
 5988 | asdasda
 6674 | asdasda
(2 rows)
postgres=# explain select * from t2 where id = cast(random()*10000 as int);
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..159837.60 rows=1 width=12)
   Filter: (id = ((random() * '10000'::double precision))::integer)
(2 rows)

And SQLite seems different, evaluating the random() function beforehand:

sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
16239|asdsadasdsa
sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
32910|asdsadasdsa
sqlite> select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
58658|asdsadasdsa
sqlite> explain select * from t2 where id = cast(abs(CAST(random() AS REAL))/92233720368547 as int);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     12    0                    00  Start at 12
1     OpenRead       0     30182  0     2              00  root=30182 iDb=0; t2
2     Function0      0     0     3     random(0)      00  r[3]=func(r[0])
3     Cast           3     69    0                    00  affinity(r[3])
4     Function0      0     3     2     abs(1)         01  r[2]=func(r[3])
5     Divide         4     2     1                    00  r[1]=r[2]/r[4]
6     Cast           1     68    0                    00  affinity(r[1])
7     SeekRowid      0     11    1                    00  intkey=r[1]; pk
8     Copy           1     5     0                    00  r[5]=r[1]
9     Column         0     1     6                    00  r[6]=t2.c
10    ResultRow      5     2     0                    00  output=r[5..6]
11    Halt           0     0     0                    00
12    Transaction    0     0     2     0              01  usesStmtJournal=0
13    Int64          0     4     0     92233720368547  00 r[4]=92233720368547
14    Goto           0     1     0                    00

Conclusion

Be careful when using MySQL nondeterministic functions in a “where” condition – rand() is the most interesting example – as their behavior may surprise you. Many people believe this to be a bug that should be fixed. Let me know in the comments: do you think it is a bug or not (and why)? I would also be interested to know how this works in other, non-open-source databases (Microsoft SQL Server, Oracle, etc.).

PS: Finally, I’ve got a “clever” idea – what if I “trick” MySQL by using the DETERMINISTIC keyword…

MySQL stored functions: deterministic vs not deterministic

So, I wanted to see how MySQL stored functions behave when they are declared with the DETERMINISTIC and NOT DETERMINISTIC keywords. First, I wanted to “trick” MySQL by declaring the stored function DETERMINISTIC while using rand() inside. To be clear: this is not what you really want to do!

DELIMITER $$
CREATE FUNCTION myrand() RETURNS INT
    DETERMINISTIC
BEGIN
 RETURN round(rand()*10000, 0);
END$$
DELIMITER ;

From the MySQL manual section about stored routines we can read:

Assessment of the nature of a routine is based on the “honesty” of the creator: MySQL does not check that a routine declared DETERMINISTIC is free of statements that produce nondeterministic results. However, misdeclaring a routine might affect results or affect performance. Declaring a nondeterministic routine as DETERMINISTIC might lead to unexpected results by causing the optimizer to make incorrect execution plan choices. Declaring a deterministic routine as NONDETERMINISTIC might diminish performance by causing available optimizations not to be used.

The result is interesting:

mysql> select myrand();
+----------+
| myrand() |
+----------+
|     4202 |
+----------+
1 row in set (0.00 sec)
mysql> select myrand();
+----------+
| myrand() |
+----------+
|     7548 |
+----------+
1 row in set (0.00 sec)
mysql> explain select * from t2 where id = myrand()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Impossible WHERE noticed after reading const tables
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------+
| Level | Code | Message                                                                        |
+-------+------+--------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select '2745' AS `id`,'asasdas' AS `c` from `test`.`t2` where 0 |
+-------+------+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from t2 where id = 4202;
+------+---------+
| id   | c       |
+------+---------+
| 4202 | asasdas |
+------+---------+
1 row in set (0.00 sec)
mysql> select * from t2 where id = 2745;
+------+---------+
| id   | c       |
+------+---------+
| 2745 | asasdas |
+------+---------+
1 row in set (0.00 sec)

So the MySQL optimizer detected the problem (somehow).

If I use the NOT DETERMINISTIC keyword, then MySQL works the same as when using the rand() function:

DELIMITER $$
CREATE FUNCTION myrand2() RETURNS INT
   NOT DETERMINISTIC
BEGIN
 RETURN round(rand()*10000, 0);
END$$
DELIMITER ;
mysql> explain select * from t2 where id = myrand2()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 262208
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

 


Photo by dylan nolte on Unsplash

Dec
05
2018
--

Workato raises $25M for its integration platform

Workato, a startup that offers an integration and automation platform for businesses that competes with the likes of MuleSoft, SnapLogic and Microsoft’s Logic Apps, today announced that it has raised a $25 million Series B funding round from Battery Ventures, Storm Ventures, ServiceNow and Workday Ventures. Combined with its previous rounds, the company has now received investments from some of the largest SaaS players, including Salesforce, which participated in an earlier round.

At its core, Workato’s service isn’t that different from other integration services (you can think of them as IFTTT for the enterprise), in that it helps you to connect disparate systems and services, set up triggers to kick off certain actions (if somebody signs a contract on DocuSign, send a message to Slack and create an invoice). Like its competitors, it connects to virtually any SaaS tool that a company would use, no matter whether that’s Marketo and Salesforce, or Slack and Twitter. And like some of its competitors, all of this can be done with a drag-and-drop interface.

What’s different, Workato founder and CEO Vijay Tella tells me, is that the service was built for business users, not IT admins. “Other enterprise integration platforms require people who are technical to build and manage them,” he said. “With the explosion in SaaS with lines of business buying them — the IT team gets backlogged with the various integration needs. Further, they are not able to handle all the workflow automation needs that businesses require to streamline and innovate on the operations.”

Battery Ventures’ general partner Neeraj Agrawal also echoed this. “As we’ve all seen, the number of SaaS applications run by companies is growing at a very rapid clip,” he said. “This has created a huge need to engage team members with less technical skill-sets in integrating all these applications. These types of users are closer to the actual business workflows that are ripe for automation, and we found Workato’s ability to empower everyday business users super compelling.”

Tella also stressed that Workato makes extensive use of AI/ML to make building integrations and automations easier. The company calls this Recipe IQ. “Leveraging the tens of billions of events processed, hundreds of millions of metadata elements inspected and hundreds of thousands of automations that people have built on our platform — we leverage ML to guide users to build the most effective integration/automation by recommending next steps as they build these automations,” he explained. “It recommends the next set of actions to take, fields to map, auto-validates mappings, etc. The great thing with this is that as people build more automations — it learns from them and continues to make the automation smarter.”

The AI/ML system also handles errors and offers features like sentiment analysis to analyze emails and detect their intent, with the ability to route them depending on the results of that analysis.

As part of today’s announcement, the company is also launching a new AI-enabled feature: Automation Editions for sales, marketing and HR (with editions for finance and support coming in the future). The idea here is to give those departments a kit with pre-built workflows that helps them to get started with the service without having to bring in IT.

Dec
05
2018
--

Camunda hauls in $28M investment as workflow automation remains hot

Camunda, a Berlin-based company that builds open-source workflow automation software, announced a €25 million (approximately $28 million) investment from Highland Europe today.

This is the company’s first investment in its 10-year history. CEO and co-founder Jakob Freund says the company has been profitable since Day One, but decided to bring in outside capital now to take on a more aggressive international expansion.

The company launched in 2008 and for the first five years offered business process management consulting services, but they found traditional offerings from companies like Oracle, IBM and Pega weren’t encouraging software developers to really embrace BPM and build new applications.

In 2013 the company decided to solve that problem and began a shift from consulting to software. “We launched our own open-source project, Camunda BPM, in 2013. We also offered a commercial distribution, obviously, because that’s where the revenue came from,” Freund explained.

The project took off and they flipped their revenue sources from 80 percent consulting/20 percent software to 90 percent software/10 percent consulting in the five years since first creating the product. They boast 200 paying customers and have built out an entire stack of products since their initial product launch.

The company expanded from 13 employees in 2013 to 100 today, with offices in Berlin and San Francisco. Freund wants to open more offices and to expand the head count. To do that, he felt the time was right to go out and get some outside money. He said they continue to be profitable and more than doubled their ARR (annual recurring revenue) in the last 12 months, but knowing they wanted to expand quickly, they wanted the investment as a hedge in case revenue slowed down during the expansion.

“However, we also want to invest heavily right now and build up the team very quickly over the next couple of years. And we want to do that in such a quick way that we want to make sure that if the revenue growth doesn’t happen as quickly as the headcount building, we’re not getting any situation where we would then need to go look funding,” he explained. Instead, they struck while the company and the overall workflow automation space is hot.

He says they want to open more sales and support offices on the east coast of the U.S. and move into Asia, as well. Further, they want to keep investing in the open-source products, and the new money gives them room to do all of this.

Dec
05
2018
--

Pindrop raises $90M to bring its voice-fraud prevention to IoT devices and Europe

When it comes to how humans communicate with each other or with machines, voice is a major interface, with growth in the latter fuelled by the rise of artificial intelligence, faster computing technology and an explosion of new devices — some of which only, or primarily, work with voice commands. But the supreme reign of voice has also opened a window of opportunity for malicious hackers — specifically, in the area of voice fraud.

Now, a security startup called Pindrop is announcing that it has raised $90 million to tackle this with a platform that it says can identify even the most sophisticated impersonations and hacking attempts, by analysing nearly 1,400 acoustic attributes to verify if a caller or a voice command is legit.

“We live in a brave new world where everything you thought you knew about security needs to be challenged,” said Vijay Balasubramaniyan, co-founder, CEO and CTO of Pindrop, who built the company (with co-founders Mustaque Ahamad and Paul Judge) originally out of his PhD thesis.

The funding is a growth round aimed specifically at two areas. First, taking US-based Pindrop into more international markets, starting with Europe — Vijay spoke to me in London — and coming soon to Asia. And second, to expand from customer service scenarios — the vast majority of its business today — into any applications that use voice interfaces, such as connected car platforms, home security devices, smart offices and smart home speakers.

To that end, this Series D includes a mix of strategic and financial investors: led by London’s Vitruvian Partners, it also includes Allegion Ventures (the corporate venture arm of the security giant), Cross Creek, systems integrator Dimension Data (“As you grow you want to be able to sell through partners,” Balasubramaniyan says), Singapore-based EDBI (to help with its push into Asia), and Goldman Sachs. Google’s CapitalG, IVP, Andreessen Horowitz, GV and Citi Ventures — all previous investors — were also in this round.

(The latter group of investors also has at least one strategic name in it: Pindrop is already working with Google, the CEO said.)

Valuation is not being disclosed, but in Pindrop’s Series C round in 2017, the company was valued at $600 million post-money, according to PitchBook, and the valuation now is “much higher,” Balasubramaniyan said with a laugh. The company has raised $212 million to date.

The crux of what Pindrop has built is a platform that makes a voice “fingerprint” that identifies not just the specific tone you emit, but how you speak, where you are typically calling from and the sounds of that space, and even your regular device — something we can do now with the rise of smartphones that we typically don’t share with others — with each handset having a unique acoustic profile. Matching all these against what is determined to be your “normal” circumstances helps to start to build verification, Balasubramaniyan explained.

Pindrop was founded in 2011 in Atlanta, GA, and most of its business today has been built around helping to prevent voice fraud in customer service engagements. That business, Balasubramaniyan said, is on the path to profitability by the first quarter of 2019 and continues to grow well, in a space where voice fraud costs the industry $22 billion ($14 billion in fraud, $8 billion in time and systems wasted on security questions). (Pindrop claims it has stopped over $350 million in voice-based fraud and attacks so far in 2018.)

Current customers include eight of the 10 largest banks and five largest insurance companies in the U.S., with more than 200 million consumer accounts protected at the moment. 

“There are 3.6 million agents in customer service jobs in the UK, with one in every 89 people in the US in this role,” he noted. “But last year, there were 4.4 million new assistants added to the market,” referring to all the devices, apps and services that have hit us, “and that’s where we realised that it’s about expansion for us.”

In cases like connected home or office scenarios, some of the ways that these might get hacked are only starting to become apparent.

Balasubramaniyan noted that it can be something as innocent as a little girl ordering an expensive doll house while playing with Alexa (Pindrop is also now starting to work with Amazon, too, as it happens), or something more nefarious like a fraudster calling your answering machine to command your smart home hub to unlock your front door.

But we are unlikely to turn away from voice interfaces, and that is where a company like Pindrop (as well as competitors like Verint) comes in.

“Voice-enabled interfaces are expanding how consumers interact with IoT devices in their everyday lives – as well as IoT manufacturers’ ability to offer smarter and stronger solutions,” said Allegion Ventures President Rob Martens, in a statement. “We’re excited about the future of voice technology and see Pindrop as a pioneer in the space. We look forward to working with Vijay and his team to accelerate the adoption of voice technology into new markets.”

More generally, as we see the rise of more voice services it’s only natural that we will start to see more ways of trying to hack them. Pindrop puts an interesting focus on the aural details of an experience as a way of helping to fight that. It’s detail that we often overlook in today’s very visual culture, but it’s also in a way a return to more analogue days.

Balasubramaniyan said one of his inspirations for the startup was a story about Bell Labs that he read as a child in 2600, the hacker publication, and that stuck with him. There, a team of blind engineers could identify problems on a phone line by listening to the dial tone. “They had golden hearing,” he said.

 

Dec
05
2018
--

Salesforce wants to deliver more automated field service using IoT data

Salesforce has been talking about the Internet of Things for some time as a way to empower field service workers. Today, the company announced Field Service Lightning, a new component designed to deliver automated IoT data to service technicians in the field on their mobile devices.

Once you connect sensors in the field to Service Cloud, you can make this information available in an automated fashion to human customer service agents and pull in other data about the customer from Salesforce’s CRM system to give the CSR a more complete picture of the customer.

“Drawing on IoT signals surfaced in the Service Cloud console, agents can gauge whether device failure is imminent, quickly determine the source of the problem (often before the customer is even aware a problem exists) and dispatch the right mobile worker with the right skill set,” Salesforce’s SVP and GM for Salesforce Field Service Lightning Paolo Bergamo wrote in a blog post introducing the new feature.

The field service industry has been talking for years about using IoT data from the field to deliver more proactive service and automate the customer service and repair process. That’s precisely what this new feature is designed to do. Let’s say you have a “smart home” with a heating and cooling system that can transmit data to the company that installed your equipment. With a system like this in place, the sensors could tell your HVAC dealer that a part is ready to break down and automatically start a repair process (that would presumably include calling the customer to tell them about it). When a CSR determines a repair visit is required, the repair technician would receive all the details on their smart phone.

Customer Service Console view. GIF: Salesforce

It also could provide a smoother experience, because the repair technician can prepare before he or she leaves for the visit, with the right equipment and parts for the job and a better understanding of what needs to be done before arriving at the customer location. This should theoretically lead to more efficient service calls.

All of this is in line with a vision the field service industry has been talking about for some time: that you could sell a subscription to a device like an air conditioning system instead of the device itself. This would mean that the dealer would be responsible for keeping it up and running, and having access to data like this could help bring that vision closer to reality.

In reality, most companies are probably not ready to implement a system like this, and most equipment in the field has not been fitted with sensors to deliver this information to the Service Cloud. Still, companies like Salesforce, ServiceNow and ServiceMax (owned by GE) want to release products like this for early adopters, and to have something in place as more companies look to put smarter systems in the field.
