Dec
19
2018
--

These ten enterprise M&A deals totaled over $87 billion this year

M&A activity was brisk in the enterprise market this year with 10 high-profile deals totaling almost $88 billion. Companies were opening up their wallets and pouring money into mega acquisitions. It’s worth noting that the $88 billion figure doesn’t include Dell paying investors over $23 billion for VMware tracking stock to take the company public again or several other deals of over a billion dollars that didn’t make our list.

Last year’s big deals included Intel buying Mobileye for $15 billion and Cisco getting AppDynamics for $3.7 billion, but there were not as many big ones. Adobe, which made two large acquisitions this year, was mostly quiet last year, making only a minor purchase. Salesforce, too, was mostly quiet in 2017, buying only a digital creative agency after an active 2016. SAP also made just one purchase in 2017, paying $350 million for Gigya. Microsoft was active, buying nine companies, but these were primarily minor deals. Perhaps everyone was saving their pennies for 2018.

This year, by contrast, was go big or go home, and we saw action across the board from the usual suspects. Large companies looking to change their fortunes or grow their markets went shopping and came home with some expensive trinkets for their collections. Some of the deals are still waiting to clear regulatory hurdles and won’t close until 2019. Either way, it’s too soon to judge whether these big-bucks ventures will pay the dividends their buyers hope for, or whether they will end up as M&A dust in the wind.

IBM acquires Red Hat for $34 billion

By far the biggest and splashiest deal of the year goes to IBM, which bet the farm to acquire Red Hat for a staggering $34 billion. IBM sees this acquisition as a way to build out its hybrid cloud business. It’s a huge bet and one that could determine the success of Big Blue as an organization in the coming years.

Broadcom nets CA Technologies for $18.5 billion

This deal was unexpected, as Broadcom, a chip maker, spent the second largest amount of money in a year of big spending. What Broadcom got for its many billions was an old-school IT management and software provider. Perhaps Broadcom felt it needed to branch out beyond pure chip making, and CA offered a way to do it, albeit a rather expensive one.

SAP buys Qualtrics for $8 billion

While not anywhere close to the money IBM or Broadcom spent, SAP went out and nabbed Qualtrics last month, just before the company was about to IPO, still paying a healthy $8 billion. SAP believes Qualtrics can help build a bridge between the operational data inside its back-end ERP systems and Qualtrics’ customer data on the front end. Time will tell if it’s right.

Microsoft gets GitHub for $7.5 billion

In June, Microsoft swooped in and bought GitHub, giving it a key developer code repository. It was a lot of money to pay, and Google’s Diane Greene expressed regret that her company hadn’t been able to get it. That’s because cloud companies are working hard to win developer hearts and minds. Microsoft has a chance to push GitHub users toward its products, but it has to tread carefully, because those users will balk if it goes too far.

Salesforce snares MuleSoft for $6.5 billion

Salesforce wasn’t about to be left out of the party in 2018: in March, the CRM giant announced it was buying API integration vendor MuleSoft for a cool $6.5 billion. It was a big deal for Salesforce, which tends to be acquisitive, but typically on smaller deals. This was a key purchase, though, because it gives the company the ability to access data wherever it lives, on premises or in the cloud, and that could be crucial for it moving forward.

Adobe snags Marketo for $4.75 billion

Adobe has built a strong company primarily on the strength of its Creative Cloud, but it has been trying to generate more revenue on the marketing side of the business. To that end, it acquired Marketo for $4.75 billion and immediately boosted its marketing business, especially when combined with the $1.68 billion Magento purchase earlier in the year.

SAP acquires CallidusCloud for $2.4 billion

SAP doesn’t do as many acquisitions as some of its fellow large tech companies mentioned here, but this year it did two. Not only did it buy Qualtrics for $8 billion, it also grabbed CallidusCloud for $2.4 billion. SAP is best known for managing back office components with its ERP software, but this adds a cloud-based, front-office sales process piece to the mix.

Cisco grabs Duo Security for $2.35 billion

Cisco has been hard at work buying up a variety of software services over the years, and this year it added to its security portfolio when it acquired Duo Security for $2.35 billion. The Michigan-based company helps businesses secure applications with two-factor authentication delivered through employees’ own mobile devices, and it could be a key part of Cisco’s security strategy moving forward.

Twilio buys SendGrid for $2 billion

Twilio got into the act this year too. While not in the same league as the other large tech companies on this list, it saw a piece it felt would enhance its product set and it was willing to spend big to get it. Twilio, which made its name as a communications API company, saw a kindred spirit in SendGrid, spending $2 billion to get the API-based email service.

Vista snares Apptio for $1.94 billion

Vista Equity Partners is the only private equity firm on the list, but it’s one with an appetite for enterprise technology. With Apptio, it gets a company that helps businesses understand their cloud assets alongside their on-prem ones. Apptio had been public before Vista bought it for $1.94 billion last month.

Dec
19
2018
--

Microsoft launches a new app to make using Office easier

Microsoft today announced a new Office app that’s now available to Windows Insiders and that will soon roll out to all Windows 10 users. The new Office app will replace the existing My Office app (yeah, those names…). While the existing app was mostly about managing Office 365 subscriptions, the new app provides significantly more features and will essentially become the central hub for Office users to switch between apps, see their pinned documents and access other Office features.

The company notes that this launch is part of its efforts to make using Office easier and help users “get the most out of Office and getting them back into their work quickly.” For many Office users, Outlook, Word, PowerPoint and Excel are their central tools for getting work done, so it makes sense to give them a single app that brings together all the information about their work in one place.

Using the app, users can switch between apps and see everything they’ve been working on, as well as recommended documents based on what I assume is data from the Microsoft Graph. There’s also an integrated search feature, and admins will be able to customize the app with other line-of-business applications and their company’s branding.

The app is free and will be available in the oft-forgotten Microsoft Store. It’ll work for all users with Office 365 subscriptions or access to Office 2019, Office 2016 or Office Online.

Dec
19
2018
--

Google’s Cloud Spanner database adds new features and regions

Cloud Spanner, Google’s globally distributed relational database service, is getting a bit more distributed today with the launch of a new region and new ways to set up multi-region configurations. The service is also getting a new feature that gives developers deeper insights into their most resource-consuming queries.

With this update, Google is adding Hong Kong (asia-east2), its newest data center location, to the Cloud Spanner lineup. With this, Cloud Spanner is now available in 14 of 18 Google Cloud Platform (GCP) regions, including seven the company added this year alone. The plan is to bring Cloud Spanner to every new GCP region as it comes online.

The other new region-related news is the launch of two new configurations for multi-region coverage. One, called eur3, focuses on the European Union and is obviously meant for users there who mostly serve a local customer base. The other, called nam6, focuses on North America, with coverage across both coasts and the middle of the country, using data centers in Oregon, Los Angeles, South Carolina and Iowa. Previously, the service only offered a North American configuration with three regions and a global configuration with three data centers spread across North America, Europe and Asia.

While Cloud Spanner is obviously meant for global deployments, these new configurations are great for users who only need to serve certain markets.
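
To make the new configurations concrete, here is a minimal sketch of creating an instance on nam6 with the @google-cloud/spanner Node.js client library; the project and instance names are illustrative assumptions, not part of Google’s announcement:

const { Spanner } = require('@google-cloud/spanner');

const projectId = 'my-project';    // hypothetical project ID
const instanceId = 'my-instance';  // hypothetical instance name
const spanner = new Spanner({ projectId });

async function createMultiRegionInstance() {
  // the config field takes the fully qualified instance-config name;
  // nam6 is the new North American multi-region configuration
  const [, operation] = await spanner.createInstance(instanceId, {
    config: `projects/${projectId}/instanceConfigs/nam6`,
    nodes: 1,
    displayName: 'nam6 multi-region instance',
  });
  await operation.promise();  // wait for the long-running create to finish
  console.log(`Created instance ${instanceId}.`);
}

createMultiRegionInstance().catch(console.error);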

As far as the new query features are concerned, Cloud Spanner is now making it easier for developers to view, inspect and debug queries. The idea here is to give developers better visibility into their most frequent and expensive queries (and maybe make them less expensive in the process).

In addition to the Cloud Spanner news, Google Cloud today announced that its Cloud Dataproc Hadoop and Spark service now supports the R language, in addition to Python 3.7 support on App Engine.

Dec
19
2018
--

Sprout Social nabs $40.5M on an $800M valuation, doubles down on social tools for businesses

Sprout Social, a social media monitoring, marketing and analytics service with 25,000 business customers that helps these organizations manage their public profiles and interact with customers across Twitter, Facebook, Instagram, LinkedIn, Pinterest and Google+ (soon to RIP), has raised $40.5 million in funding in order to expand its business internationally and add more functionality to its platform.

The money — a Series D led by Future Fund with participation from Goldman Sachs and New Enterprise Associates — brings the total raised by Sprout to $103.5 million to date. We’ve confirmed directly with CEO Justyn Howard that the valuation is now around $800 million. For some context, Sprout last raised in 2016 — $42 million, also from Goldman Sachs and NEA — and at the time it had a post-money valuation of $253 million, according to PitchBook, so this is a very healthy leap.

But between then and now, there have been some interesting developments that could have shifted that price in either direction.

On one side, multiple sources have told us that social media management platforms were being courted by Microsoft for acquisition at one point (Microsoft declined to comment on the rumor when we looked into it).

On the other, one of Sprout’s biggest competitors, Hootsuite (with 15 million users, paid and free), has been rumored to be shopping itself for about $750 million, or potentially going public, while smaller competitors have pursued consolidation to bulk up their own presence in the field.

In the meantime, Sprout itself has been growing. The company’s 25,000 customers are up from 16,000 two years ago, with current users including Microsoft, NBCUniversal, the Denver Nuggets, Grubhub and MTV.

One of the reasons for the growth is the larger shift we’ve seen in how businesses interact with the outside world.

Social media is today perhaps the most important platform for businesses to communicate with their users. Not only has it helped customers circumvent the often frustrating spaghetti that lies behind the deceptive phrase “contact us” on websites, it has also become a spotlight that businesses have to watch, lest a sticky situation snowball into a public relations disaster.

Platforms like Twitter and Facebook, to grow their revenues, have ramped up their efforts to work on social media campaigns and interactions directly with organizations. But there is still a place for third parties like Sprout Social to manage work that goes across a number of social sites, and to address services that the social platforms themselves do not necessarily want to invest in building directly.

“I think there are a bunch of reasons why we don’t build bot experience ourselves,” Jeff Lesser, who heads up product marketing for Twitter Business Messaging, told me when Sprout launched a “bot builder” to be used on Twitter, and I asked him why Sprout shouldn’t worry about Twitter cannibalizing its product. “There are millions of types of businesses that can use our platform, so we’re letting the ecosystem build the solutions that they need. We are focusing on building the canvas for them to do that.”

In other words, while Sprout (and competitors) should always be a little wary of platform players who may decide to simply kick them off in the name of business, there are always going to be opportunities if they have the resources to double down on more tech to solve a different problem, or simply execute better on fixing an existing one.

“Social marketing and social data have become mission-critical to virtually all aspects of business. Sprout’s relentless focus on quality and customer success have made us the top customer-rated platform in every category and segment,” said Justyn Howard, CEO of Sprout, in a statement. “In many ways, social is still in its infancy, and we’re fortunate to help so many great customers navigate this evolving set of challenges.”

Dec
19
2018
--

Percona Server for MySQL 5.7.24-27 Is Now Available

Percona announces the release of Percona Server for MySQL 5.7.24-27 on December 19, 2018 (downloads are available here and from the Percona Software Repositories). This release merges the changes of MySQL 5.7.24, including all of its bug fixes. Percona Server for MySQL 5.7.24-27 is now the current GA release in the 5.7 series. All of Percona’s software is open source and free.

If you’re currently using Percona Server for MySQL 5.7, Percona recommends upgrading to this version of 5.7 prior to upgrading to Percona Server for MySQL 8.0.

Bugs Fixed:

  • When uninstalling Percona Server for MySQL packages on CentOS 7, the default configuration file my.cnf would be removed as well. The fix makes a backup of the configuration file instead of removing it. Bug fixed #5092.

Find the release notes for Percona Server for MySQL 5.7.24-27 in our online documentation. Report bugs in the Jira bug tracker.

Dec
19
2018
--

Dataiku raises $101 million for its collaborative data science platform

Dataiku wants to turn buzzwords into an actual service. The company has been building data tools for many years, since before everybody started talking about big data, data science and machine learning.

And the company just raised $101 million in a round led by Iconiq Capital, with Alven Capital, Battery Ventures, Dawn Capital and FirstMark Capital also participating.

If you’re generating a lot of data, Dataiku helps you find the meaning behind your data sets. First, you import your data by connecting Dataiku to your storage system. The platform supports dozens of database formats and sources — Hadoop, NoSQL, images, you name it.

You can then use Dataiku to visualize your data, clean your data set, run some algorithms on your data in order to build a machine learning model, deploy it and more. Dataiku has a visual coding tool, or you can use your own code.

But Dataiku isn’t just a tool for data scientists. Even if you’re a business analyst, you can visualize and extract data from Dataiku directly. And because of its software-as-a-service approach, your entire team of data scientists and data analysts can collaborate on Dataiku.

Clients use it to track churn, detect fraud, forecast demand, optimize lifetime values and more. Customers include General Electric, Sephora, Unilever, KUKA, FOX and BNP Paribas.

With today’s funding round, the company plans to double its staff. It currently employs 200 people in New York, Paris and London, and plans to open offices in Singapore and Sydney as well.

Dec
19
2018
--

Using Partial and Sparse Indexes in MongoDB

In this article I’m going to talk about partial and sparse indexes in MongoDB® and Percona Server for MongoDB®. I’ll show you how to use them, and look at cases where they can be helpful. Before discussing these indexes in MongoDB in detail, though, let’s talk about an issue on a relational database like MySQL®.

The boolean issue in MySQL

Suppose you have a very large table in MySQL with a boolean column. Typically you would create an ENUM(‘T’,’F’) field to store the boolean information, or a TINYINT column to store only 1s and 0s. So far, so good. But now suppose you need to run a lot of queries against the table with a condition on the boolean field, and with no other relevant conditions on indexed columns to filter the examined rows.

Why not create an index on the boolean field? Well, yes, you can, but in some cases this solution will be completely useless and will only introduce overhead for index maintenance.

Suppose you have an even distribution of true and false values in the table, in more or less a 50:50 split. In this situation, the index on the boolean column cannot be used, because MySQL will prefer a full scan of the large table over selecting half of the rows through the BTREE entries. We can say that a boolean field like this one has low cardinality and is not very selective.

Now consider the case where you don’t have an even distribution of the values: let’s say 2% of the rows contain false and the remaining 98% contain true. In such a situation, a query selecting the false values will most probably use the index, while queries selecting the true values won’t, for the same reason discussed previously. In this second case the index is very useful, but only for selecting the small minority of rows. The remaining 98% of the entries in the index are completely useless. This represents a great waste of disk space and resources, because the index must be maintained on each write.

It’s not just booleans that can have this problem in relation to index usage, but any field with a low cardinality.

Note: there are several workarounds for this problem, I know. For example, you can create a multi-column index using a more selective field together with the boolean, or you could design your database differently. Here, I’m simply illustrating the nature of the problem in order to put a MongoDB feature in context.

The boolean issue in MongoDB

How about MongoDB: does it have the same problem? The answer is yes. If you have a lot of documents in a collection with a boolean field or another low-cardinality field, and you create an index on it, you will end up with a very large index that’s not really useful. More importantly, you will suffer write degradation due to index maintenance.

The only difference is that MongoDB will tend to use the index anyway, instead of scanning the entire collection, but the execution time will be of the same magnitude as the COLLSCAN. With very large indexes, a COLLSCAN may even be preferable.

Fortunately, MongoDB has an option you can specify during index creation to define a Partial Index. Let’s take a look.

Partial Index

A partial index is an index that contains only a subset of values, based on a filter rule. So, in the case of the unevenly distributed boolean field, we can create an index on it specifying that we want to record only the false values. This way we avoid storing the remaining 98% of useless true entries. The index will be smaller, we’ll save disk and memory space, and the most frequent writes – those entering true values – won’t trigger any index maintenance. As a result, we avoid most of the write penalties while keeping a useful index for searching the false values.

Generally speaking, when you have an uneven distribution, the most relevant searches are the ones looking for the minority value. This is the typical scenario in real applications.

Let’s see now how to create a Partial Index.

First, let’s create a collection with one million random documents. Each document contains a boolean field generated by the JavaScript function randomBool(). The function generates a false value in roughly 5% of the documents, so we get an uneven distribution. Then we check the number of false values in the collection.

> function randomBool() { var bool = true; var random_boolean = Math.random() >= 0.95; if(random_boolean) { bool = false }; return bool; }
> for (var i = 1; i <= 1000000; i++) { db.test.insert( { _id: i, name: "name"+i, flag: randomBool() } ) }
WriteResult({ "nInserted" : 1 })
> db.test.find().count()
1000000
> db.test.find( { flag: false } ).count()
49949

Create the index on the flag field and look at the index size using db.test.stats().

> db.test.createIndex( { flag: 1 } )
{ "createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1 }
> db.test.stats().indexSizes
{ "_id_" : 13103104, "flag_1" : 4575232 }

The index we created is 4575232 bytes.

Run some simple queries to extract documents by flag value, and take a look at the index usage and execution times. (For this purpose, we use an explainable object.)

// create the explainable object
> var exp = db.test.explain( "executionStats" )
// explain the complete collection scan
> exp.find( {  } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 1000000,
		"executionTimeMillis" : 250,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 1000000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"nReturned" : 1000000,
			"executionTimeMillisEstimate" : 200,
			"works" : 1000002,
			"advanced" : 1000000,
			"needTime" : 1,
			"needYield" : 0,
			"saveState" : 7812,
			"restoreState" : 7812,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 1000000
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// find the documents flag=true
> exp.find( { flag: true } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : true
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[true, true]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 950051,
		"executionTimeMillis" : 1028,
		"totalKeysExamined" : 950051,
		"totalDocsExamined" : 950051,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 950051,
			"executionTimeMillisEstimate" : 990,
			"works" : 950052,
			"advanced" : 950051,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 7422,
			"restoreState" : 7422,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 950051,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 950051,
				"executionTimeMillisEstimate" : 350,
				"works" : 950052,
				"advanced" : 950051,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 7422,
				"restoreState" : 7422,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[true, true]"
					]
				},
				"keysExamined" : 950051,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// find the documents with flag=false
> exp.find( { flag: false } )
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : false
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 49949,
		"executionTimeMillis" : 83,
		"totalKeysExamined" : 49949,
		"totalDocsExamined" : 49949,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 49949,
			"executionTimeMillisEstimate" : 70,
			"works" : 49950,
			"advanced" : 49949,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 390,
			"restoreState" : 390,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 49949,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 49949,
				"executionTimeMillisEstimate" : 10,
				"works" : 49950,
				"advanced" : 49949,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 390,
				"restoreState" : 390,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				},
				"keysExamined" : 49949,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

As expected, MongoDB does a COLLSCAN when looking for db.test.find( {} ). The important thing here is that the entire collection scan takes 250 milliseconds.

In both the other cases – find ({flag:true}) and find({flag:false}) – MongoDB uses the index. But let’s have a look at the execution times:

  • for db.test.find({flag:true}) it is 1028 milliseconds, more than the COLLSCAN. The index is not useful in this case, and a COLLSCAN would be preferable (see the sketch below for a way to verify this directly).
  • for db.test.find({flag:false}) it is 83 milliseconds. This is good: the index is very useful in this case.
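
One way to verify the first point is to force the collection scan with a natural-order hint and compare the measured times yourself. Here’s a minimal sketch in the mongo shell (standard hint() syntax; exact timings will vary):

// let the planner choose: it will use the index on flag
> db.test.find( { flag: true } ).explain("executionStats").executionStats.executionTimeMillis
// force a full collection scan with a natural-order hint and compare
> db.test.find( { flag: true } ).hint( { $natural: 1 } ).explain("executionStats").executionStats.executionTimeMillis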

Now let’s create the partial index on the flag field. To do so, we must use the partialFilterExpression option of the createIndex command.

// drop the existing index
> db.test.dropIndex( { flag: 1} )
{ "nIndexesWas" : 2, "ok" : 1 }
// create the partial index only on the false values
> db.test.createIndex( { flag : 1 }, { partialFilterExpression :  { flag: false }  } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
// get the index size
> db.test.stats().indexSizes
{ "_id_" : 13103104, "flag_1" : 278528 }
// create the explainable object
> var exp = db.test.explain( "executionStats" )
// test the query for flag=false
> exp.find({ flag: false  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : false
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 49949,
		"executionTimeMillis" : 80,
		"totalKeysExamined" : 49949,
		"totalDocsExamined" : 49949,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 49949,
			"executionTimeMillisEstimate" : 80,
			"works" : 49950,
			"advanced" : 49949,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 390,
			"restoreState" : 390,
			"isEOF" : 1,
			"invalidates" : 0,
			"docsExamined" : 49949,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 49949,
				"executionTimeMillisEstimate" : 40,
				"works" : 49950,
				"advanced" : 49949,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 390,
				"restoreState" : 390,
				"isEOF" : 1,
				"invalidates" : 0,
				"keyPattern" : {
					"flag" : 1
				},
				"indexName" : "flag_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"flag" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"flag" : [
						"[false, false]"
					]
				},
				"keysExamined" : 49949,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}
// test the query for flag=true
> exp.find({ flag: true  })
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.test",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"flag" : {
				"$eq" : true
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"flag" : {
					"$eq" : true
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 950051,
		"executionTimeMillis" : 377,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 1000000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"flag" : {
					"$eq" : true
				}
			},
			"nReturned" : 950051,
			"executionTimeMillisEstimate" : 210,
			"works" : 1000002,
			"advanced" : 950051,
			"needTime" : 49950,
			"needYield" : 0,
			"saveState" : 7812,
			"restoreState" : 7812,
			"isEOF" : 1,
			"invalidates" : 0,
			"direction" : "forward",
			"docsExamined" : 1000000
		}
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

We can notice the following:

  • db.test.find({flag:false}) uses the index, and the execution time is more or less the same as before
  • db.test.find({flag:true}) doesn’t use the index; MongoDB does the COLLSCAN, and the execution time is better than before
  • the index size is now only 278528 bytes, a great saving compared to the 4575232-byte complete index on flag. There won’t be index maintenance overhead for writes to the great majority of the documents.

Partial option on other index types

You can use the partialFilterExpression option with compound indexes and other index types as well. Let’s see an example with a compound index.

Insert some documents into the students collection:

db.students.insert( [
{ _id:1, name: "John", class: "Math", grade: 10 },
{ _id: 2, name: "Peter", class: "English", grade: 6 },
{ _id: 3, name: "Maria" , class: "Geography", grade: 8 },
{ _id: 4, name: "Alex" , class: "Geography", grade: 5},
{ _id: 5, name: "George" , class: "Math", grade: 7 },
{ _id: 6, name: "Tony" , class: "English", grade: 9 },
{ _id: 7, name: "Sam" , class: "Math", grade: 6 },
{ _id: 8, name: "Tom" , class: "English", grade: 5 }
])

Create a partial compound index on the name and class fields for grades greater than or equal to 8:

> db.students.createIndex( { name: 1, class: 1  }, { partialFilterExpression: { grade: { $gte: 8} } } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}

Notice that the grade field doesn’t necessarily need to be part of the index.
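
You can confirm this by inspecting the index metadata: the filter appears as partialFilterExpression in the index specification, while the key only covers name and class. In the shell, the output looks along these lines (abridged):

> db.students.getIndexes()
[
	{ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "test.students" },
	{
		"v" : 2,
		"key" : { "name" : 1, "class" : 1 },
		"name" : "name_1_class_1",
		"ns" : "test.students",
		"partialFilterExpression" : { "grade" : { "$gte" : 8 } }
	}
]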

Query coverage

Using the students collection, we now want to show when a partial index can be used.

The important thing to remember is that a partial index is just that: partial. It doesn’t contain all the entries.

In order for MongoDB to use it, the query conditions must include an expression on the filter field, and the selected documents must be a subset of the index.

Let’s see some examples.

The following query can use the index because we are selecting a subset of the partial index.

> db.students.find({name:"Tony", grade:{$gt:8}})
{ "_id" : 6, "name" : "Tony", "class" : "English", "grade" : 9 }
// let's look at the explain
> db.students.find({name:"Tony", grade:{$gt:8}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"name" : {
						"$eq" : "Tony"
					}
				},
				{
					"grade" : {
						"$gt" : 8
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"filter" : {
				"grade" : {
					"$gt" : 8
				}
			},
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"name" : 1,
					"class" : 1
				},
				"indexName" : "name_1_class_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"name" : [ ],
					"class" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : true,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"name" : [
						"[\"Tony\", \"Tony\"]"
					],
					"class" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

The following query cannot use the index, because the condition grade > 5 does not select a subset of the partial index. So the COLLSCAN is needed.

> db.students.find({name:"Tony", grade:{$gt:5}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$and" : [
				{
					"name" : {
						"$eq" : "Tony"
					}
				},
				{
					"grade" : {
						"$gt" : 5
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$and" : [
					{
						"name" : {
							"$eq" : "Tony"
						}
					},
					{
						"grade" : {
							"$gt" : 5
						}
					}
				]
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

The following query cannot use the index either. As we said, the grade field is not part of the index, and a condition on grade alone is not sufficient.

> db.students.find({grade:{$gt:8}}).explain()
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.students",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"grade" : {
				"$gt" : 8
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"grade" : {
					"$gt" : 8
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "ip-172-30-2-181",
		"port" : 27017,
		"version" : "4.0.4",
		"gitVersion" : "f288a3bdf201007f3693c58e140056adf8b04839"
	},
	"ok" : 1
}

Sparse Index

A sparse index is an index that contains entries only for the documents that have the indexed field.

Since MongoDB is a schemaless database, not all the documents in a collection necessarily contain the same fields. So we have two options when creating an index:

  • create a regular “non-sparse” index
    • the index contains one entry per document
    • the index contains null entries for all the documents without the indexed field
  • create a sparse index
    • the index contains one entry per document that has the indexed field

We call it “sparse” because it doesn’t cover all the documents in the collection.

The main advantage of the sparse option is to reduce the index size.

Here’s how to create a sparse index:

db.people.createIndex( { city: 1 }, { sparse: true } )

Sparse indexes are effectively a subset of partial indexes: in fact, you can emulate a sparse index using the following partial index definition.

db.people.createIndex(
{city:  1},
{ partialFilterExpression: {city: {$exists: true} } }
)

For this reason partial indexes are preferred over sparse indexes.
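
One way to see why (an illustrative sketch, not from the MongoDB docs): a partial index can filter on a field other than the indexed one, which a sparse index cannot express. For example, using the same hypothetical people collection:

// a sparse index can only mean "index the documents where city exists";
// a partial index can apply any filter, e.g. index city only for adults
db.people.createIndex(
{ city: 1 },
{ partialFilterExpression: { age: { $gte: 18 } } }
)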

Conclusions

Partial indexing is a great feature in MongoDB. You should consider using it to achieve the following advantages:

  • have smaller indexes
  • save disk and memory space
  • improve write performance

You are strongly encouraged to consider partial indexes if you have one or more of these use cases:

  • you run queries on a boolean field with an uneven distribution, and you look mostly for the less frequent value
  • you have a low cardinality field and the majority of the queries look for a subset of the values
  • the majority of the queries look for a limited subset of the values in a field
  • you don’t have enough memory to store very large indexes – for example, you have a lot of page evictions from the WiredTiger cache

Further readings

Partial indexes: https://docs.mongodb.com/manual/core/index-partial/

Sparse indexes: https://docs.mongodb.com/manual/core/index-sparse/

Photo by Mike Greer from Pexels

Dec
18
2018
--

Percona Server for MongoDB 4.0.4-1 GA Is Now Available

Percona announces the GA release of Percona Server for MongoDB 4.0.4-1 on December 18, 2018. Download the latest version from the Percona website or the Percona software repositories.

Date: December 18, 2018
Download: Percona website
Installation: Installing Percona Server for MongoDB

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 4.0 Community Edition. It supports MongoDB 4.0 protocols and drivers.

Percona Server for MongoDB extends the functionality of MongoDB 4.0 Community Edition by including the Percona Memory Engine storage engine, encrypted WiredTiger storage engine, audit logging, SASL authentication, hot backups and enhanced query profiling. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release includes all the features of MongoDB Community Edition 4.0. Note that the MMAPv1 storage engine is deprecated in MongoDB Community Edition 4.0.

In Percona Server for MongoDB 4.0.4-1, data at rest encryption is considered BETA quality. Do not use this feature in a production environment.

Bugs Fixed

  • PSMDB-235: In some cases, hot backup did not back up the keydb directory; mongod could crash after restore.
  • PSMDB-233: When starting Percona Server for MongoDB with WiredTiger encryption options but using a different storage engine, the server started normally and produced no warnings that these options had been ignored.
  • PSMDB-239: WiredTiger encryption was not disabled when using the Percona Memory Engine storage engine.
  • PSMDB-241: WiredTiger per-database encryption keys were not purged when the database was deleted.
  • PSMDB-243: A log message was added to indicate that the server is running with encryption.
  • PSMDB-245: KeyDB’s WiredTiger logs were not properly rotated without restarting the server.
  • PSMDB-266: When running the server with the --directoryperdb option, the user could add arbitrary collections to the keydb directory, which is designated for data encryption.

Due to the fix of bug PSMDB-266, it is not possible to downgrade from version 4.0.4-1 to version 3.6.8-2.0 of Percona Server for MongoDB if using data at rest encryption (downgrading to PSMDB 3.6 will become possible as soon as the fix for PSMDB-266 is ported to that version).

Dec
18
2018
--

Ex-Googlers meld humans & machines at new cobotics startup Formant

Our distinct skill sets and shortcomings mean people and robots will join forces for the next few decades. Robots are tireless, efficient and reliable, but through intuition and situational awareness, humans can make decisions in a millisecond that machines can’t. Until workplace robots are truly autonomous and don’t require any human thinking, we’ll need software to supervise them at scale. Formant comes out of stealth today to “help people speak robot,” says co-founder and CEO Jeff Linnell. “What’s really going to move the needle in the innovation economy is using humans as an empowering element in automation.”

Linnell learned the grace of uniting flesh and steel while working on the movie Gravity. “We put cameras and Sandra Bullock on dollies,” he bluntly recalls. Artistic vision and robotic precision combined to create gorgeous zero-gravity scenes that made audiences feel weightless. Google bought his startup Bot & Dolly, and Linnell spent four years there as a director of robotics while forming his thesis.

Now with Formant, he wants to make hybrid workforce cooperation feel frictionless.

The company has raised a $6 million seed round from SignalFire, a data-driven VC fund with software for recruiting engineers. Formant is launching its closed beta that equips businesses with cloud infrastructure for collecting, making sense of and acting on data from fleets of robots. It allows a single human to oversee 10, 20 or 100 machines, stepping in to clear confusion when they aren’t sure what to do.

“The tooling is 10 years behind the web,” Linnell explains. “If you build a data company today, you’ll use AWS or Google Cloud, but that simply doesn’t exist for robotics. We’re building that layer.”

A beautiful marriage

“This is going to sound completely bizarre,” Formant CTO Anthony Jules warns me. “I had a recurring dream [as a child] in which I was a ship captain and I had a little mechanical parrot on my shoulder that would look at situations and help me decide what to do as we’d sail the seas trying to avoid this octopus. Since then I knew that building intelligent machines is what I would do in this world.”

So he went to MIT, left a robotics PhD program to build a startup called Sapient Corporation, which grew into a 4,000-employee public company, and worked on the Tony Hawk video games. He too joined Google through an acquisition, meeting Linnell after Redwood Robotics, where he was COO, got acquired. “We came up with some similar beliefs. There are a few places where full autonomy will actually work, but it’s really about creating a beautiful marriage of what machines are good at and what humans are good at,” Jules tells me.

Formant now has SaaS pilots running with businesses in several verticals to make their “robot-shaped data” usable. They range from food manufacturing to heavy infrastructure inspection to construction, and even training animals. Linnell also foresees retail increasingly employing fleets of robots not just in the warehouse but on the showroom floor, and they’ll require precise coordination.

What’s different about Formant is it doesn’t build the bots. Instead, it builds the reins for people to deftly control them.

First, Formant connects to sensors to fill up a cloud with LiDAR, depth imagery, video, photos, log files, metrics, motor torques and scalar values. The software parses that data and when something goes wrong or the system isn’t sure how to move forward, Formant alerts the human “foreman” that they need to intervene. It can monitor the fleet, sniff out the source of errors, and suggest options for what to do next.

For example, “when an autonomous digger encounters an obstacle in the foundation of a construction site, an operator is necessary to evaluate whether it is safe for the robot to proceed or stop,” Linnell writes. “This decision is made in tandem: the rich data gathered by the robot is easily interpreted by a human but difficult or legally questionable for a machine. This choice still depends on the value judgment of the human, and will change depending on if the obstacle is a gas main, a boulder, or an electrical wire.”

Any single data stream alone can’t reveal the mysteries that arise, and people would struggle to juggle the different feeds in their minds. But not only can Formant align the data for humans to act on, it also can turn their choices into valuable training data for artificial intelligence. Formant learns, so next time the machine won’t need assistance.

The industrial revolution, continued

With rock-star talent poached from Google and tides lifting all automated boats, Formant’s biggest threat is competition from tech giants. Old engineering companies like SAP could try to adapt to the new real-time data type, yet Formant hopes to out-code them. Google itself has built reliable cloud scaffolding and has robotics experience from Boston Dynamics, plus buying Linnell’s and Jules’ companies. But the enterprise customization necessary to connect with different clients isn’t typical for the search juggernaut.

Linnell fears that companies that try to build their own robot management software could get hacked. “I worry about people who do homegrown solutions or don’t have the experience we have from being at a place like Google. Putting robots online in an insecure way is a pretty bad problem.” Formant is looking to squash any bugs before it opens its platform to customers in 2019.

With time, humans will become less and less necessary, and that will surface enormous societal challenges for employment and welfare. “It’s in some ways a continuation of the industrial revolution,” Jules opines. “We take some of this for granted but it’s been happening for 100 years. Photographer — that’s a profession that doesn’t exist without the machine that they use. We think that transformation will continue to happen across the workforce.”

Dec
18
2018
--

Box releases Skills, which lets developers apply AI and machine learning to Box content

When you have as much data under management as Box does, you have the key ingredient for artificial intelligence and machine learning, which feeds on copious amounts of data. Box is giving developers access to this data, while letting them choose the AI and machine learning algorithms they want to use. Today, the company announced the general availability of the Box Skills SDK, originally announced at BoxWorks a year ago.

Jeetu Patel, Box’s chief product officer and chief strategy officer, says beta customers have been focusing on use cases specific to each company. They have been pulling information from different classes of content that matter most to them to bring an element of automation to their content management. “If there’s a way to bring a level of automation with machine learning, rather than doing it manually, that would meaningfully change the way that business processes can function,” Patel told TechCrunch.

Among the use cases Box has been seeing with the 300 beta testers is using artificial intelligence to recognize the contents of a photo for auto-tagging, eliminating the need for humans to tag images manually. Another example is contract management, where key terms are pulled automatically from the contract, saving the legal team from having to extract them by hand.

Where this can get really powerful though is that the Skills SDK can drive a more complex automated workflow inside of Box. If, for example, Skills is driving the creation of automated metadata, that can in turn drive a workflow, Patel said.
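
To make that concrete, here is a hedged sketch (not Box’s published Skills code) of writing machine-generated tags back to a file as metadata with the Box Node.js SDK; the imageTags template key, the file ID and the developer-token shortcut are assumptions for illustration:

const BoxSDK = require('box-node-sdk');

// a real Skill would authenticate with OAuth 2.0 or JWT; a developer token
// is used here only to keep the sketch short
const sdk = new BoxSDK({ clientID: 'CLIENT_ID', clientSecret: 'CLIENT_SECRET' });
const client = sdk.getBasicClient('DEVELOPER_TOKEN');

// attach ML-generated tags to a file as enterprise metadata, so downstream
// workflows (routing, retention, search) can key off of it
async function applyAutoTags(fileId, tags) {
  await client.files.addMetadata(fileId, 'enterprise', 'imageTags', {
    tags: tags.join(','),
  });
}

applyAutoTags('123456789', ['contract', 'signed']).catch(console.error);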

Box is providing the means to ingest Box data into a given AI or machine learning algorithm, but instead of trying to create those on its own, it’s been relying on partners that have more specific expertise, such as IBM Watson, Microsoft Azure, Google Cloud Platform and Amazon Web Services. In fact, Box says it is working with dozens of AI and machine learning partners.

For customers that aren’t comfortable doing any of this on their own, Box is also providing a consulting service, where it can come into a customer and help work through a set of requirements and choose the best algorithm for the job.
