Jul 28, 2020
--

Hevo draws in $8 million Series A for its no-code data pipeline service

Hevo founders Manish Jethani and Sourabh Agarwal.

According to data pipeline startup Hevo, many small to medium-sized companies juggle more than 40 different applications to manage sales, marketing, finance, customer support and other operations. All of these applications are important sources of data that can be analyzed to improve a company’s performance. That data often remains separate, however, making it difficult for different teams to collaborate.

Hevo enables its clients’ employees to integrate data from more than 150 different sources, including enterprise software from Salesforce and Oracle, even if they don’t have any technical experience. The company announced today that it has raised an $8 million Series A round led by Singapore-based venture capital firm Qualgro and Lachy Groom, a former executive at payments company Stripe.

The round, which brings Hevo’s total raised so far to $12 million, also included participation from returning investors Chiratae Ventures and Sequoia Capital India’s early-stage startup program Surge. The company was first covered by TechCrunch when it raised seed funding in 2017.

Hevo’s Series A will be used to increase the number of integrations available on its platform, and hire sales and marketing teams in more countries, including the United States and Singapore. The company currently has clients in 16 markets, including the U.S., India, France, Australia and Hong Kong, and counts payments company Marqeta among its customers.

In a statement, Puneet Bysani, tech lead manager at Marqeta, said, “Hevo saved us many engineering hours, and our data teams could focus on creating meaningful KPIs that add value to Marqeta’s business. With Hevo’s pre-built connectors, we were able to get data from many sources into Redshift and Snowflake very quickly.”

Based in Bangalore and San Francisco, Hevo was founded in 2017 by chief executive officer Manish Jethani and chief technology officer Sourabh Agarwal. The two previously launched SpoonJoy, a food delivery startup that was acquired by Grofers, one of India’s largest online grocery delivery services, in 2015. Jethani and Agarwal spent a year working at Grofers before leaving to start Hevo.

Hevo originated in the challenges Jethani and Agarwal faced while developing tech for SpoonJoy’s order and delivery system.

“All of our team members would come to us and say, ‘hey, we want to look at these metrics,’ or we would ask our teams questions if something wasn’t working. Oftentimes, they would not have the data available to answer those questions,” Jethani told TechCrunch.

Then at Grofers, Jethani and Agarwal realized that even large companies face the same challenges. They decided to work on a solution to allow companies to quickly integrate data sources.

For example, a marketing team at an e-commerce company might have data about its advertising on social media platforms, and how much traffic campaigns bring to their website or app. But they might not have access to data about how many of those visitors actually make purchases, or if they become repeat customers. By building a data pipeline with Hevo, they can bring all that information together.

Hevo is designed to serve all sectors, including e-commerce, healthcare and finance. In order to use it, companies sign up for Hevo’s services on its website and employees enter their credentials for software supported by the platform. Then Hevo automatically extracts and organizes the data from those sources and prepares it for cloud-based data warehouses, such as Amazon Redshift and Snowflake. A user dashboard allows companies to customize integrations or hide sensitive data.

Hevo is among several “no code, low code” startups that have recently raised venture capital funding for building tools that enable non-developers to add features to their existing software. The founders say Hevo’s most direct competitor is Fivetran, an Oakland, California-based company that also builds pipelines to move data to warehouses and prepare it for analysis.

Jethani said Hevo differentiates by “optimizing our product for non-technical users.”

“The number of companies who need to use data is very high and there is not enough talent available in the market. Even if it is available, it is very competitive and expensive to hire that engineering talent because big companies like Google and Amazon are also competing for the same talent,” he added. “So we felt that there has to be some democratization of who can use this technology.”

Hevo also focuses on integrating data in real time, which is especially important for companies that provide on-demand deliveries or services. During the COVID-19 pandemic, Jethani says e-commerce clients have used Hevo to manage an influx in orders as people under stay-at-home orders purchase more items online. Companies are also relying on Hevo to help organize and manage data as their employees continue to work remotely.

In a statement about the funding, Qualgro managing partner Heang Chhor said, “Hevo provides a truly innovative solution for extracting and transforming data across multiple data sources — in real time with full automation. This helps enterprises to fully capture the benefit of data flowing through the many databases and software they currently use. Hevo’s founders are the type of globally-minded entrepreneurs that we like to support.”

Feb 13, 2020
--

Google closes $2.6B Looker acquisition

When Google announced that it was acquiring data analytics startup Looker for $2.6 billion, it was a big deal on a couple of levels. It was a lot of money and it represented the first large deal under the leadership of Thomas Kurian. Today, the company announced that the deal has officially closed and Looker is part of the Google Cloud Platform.

While Kurian was happy to announce that Looker was officially part of the Google family, he made it clear in a blog post that the analytics arm would continue to support multiple cloud vendors beyond Google.

“Google Cloud and Looker share a common philosophy around delivering open solutions and supporting customers wherever they are—be it on Google Cloud, in other public clouds, or on premises. As more organizations adopt a multi-cloud strategy, Looker customers and partners can expect continued support of all cloud data management systems like Amazon Redshift, Azure SQL, Snowflake, Oracle, Microsoft SQL Server and Teradata,” Kurian wrote.

As is typical in a deal like this, Looker CEO Frank Bien sees the much larger Google giving his company the resources to grow much faster than it could have on its own. “Joining Google Cloud provides us better reach, strengthens our resources, and brings together some of the best minds in both analytics and cloud infrastructure to build an exciting path forward for our customers and partners. The mission that we undertook seven years ago as Looker takes a significant step forward beginning today,” Bien wrote in his post.

At the time the deal was announced in June, the company shared a slide showing where Looker fits into what it calls its “Smart Analytics Platform,” which provides ways to process, understand, analyze and visualize data. Looker fills in a spot in the visualization stack while continuing to support other clouds.

Slide: Google

Looker was founded in 2011 and raised more than $280 million, according to Crunchbase. Investors included Redpoint, Meritech Capital Partners, First Round Capital, Kleiner Perkins, CapitalG and PremjiInvest. The last deal before the acquisition was a $103 million Series E investment on a $1.6 billion valuation in December 2018.

Jan 22, 2020
--

Placer.ai, a location data analytics startup, raises $12 million Series A

Placer.ai, a startup that analyzes location and foot traffic analytics for retailers and other businesses, announced today that it has closed a $12 million Series A. The round was led by JBV Capital, with participation from investors including Aleph, Reciprocal Ventures and OCA Ventures.

The funding will be used for research and development of new features and to expand Placer.ai’s operations in the United States.

Launched in 2016, Placer.ai’s SaaS platform gives its clients real-time data that helps them make decisions like where to rent or buy properties, when to hold sales and promotions and how to manage assets.

Placer.ai analyzes foot traffic and also creates consumer profiles to help clients make marketing and ad spending decisions. It does this by collecting geolocation and proximity data from devices that are enabled to share that information. Placer.ai’s co-founder and CEO Noam Ben-Zvi says the company protects privacy and follows regulation by displaying aggregated, anonymous data and does not collect personally identifiable data. It also does not sell advertising or raw data.

The company currently serves clients in the retail (including large shopping centers), commercial real estate and hospitality verticals, including JLL, Regency, SRS, Brixmor, Verizon* and Caesars Entertainment.

“Up until now, we’ve been heavily focused on the commercial real estate sector, but this has very organically led us into retail, hospitality, municipalities and even [consumer packaged goods],” Ben-Zvi told TechCrunch in an email. “This presents us with a massive market, so we’re just focused on building out the types of features that will directly address the different needs of our core audience.”

He adds that a lack of data has hurt retail businesses with major offline operations, but that “by effectively addressing this gap, we’re helping drive more sustainable growth for larger players or minimizing the risk for smaller companies to drive expansion plans that are strategically aggressive.”

Other startups in the same space include Dor, Aislelabs, RetailNext, ShopperTrak and Density. Ben-Zvi says Placer.ai wants to differentiate itself by providing more types of real-time data analysis.

“While there are a lot of companies touching the location analytics space, we’re in a unique situation as the only company providing these deep and actionable insights for any location in the country in a real-time platform with a wide array of functionality,” he said.

*Disclosure: Verizon Media is the parent company of TechCrunch.

Nov 15, 2017
--

Microsoft makes Databricks a first-party service on Azure

 Databricks has made a name for itself as one of the most popular commercial services around the Apache Spark data analytics platform (which, not coincidentally, was started by the founders of Databricks). Now it’s coming to Microsoft’s Azure platform in the form of a preview of the imaginatively named “Azure Databricks.” Read More

Sep 6, 2017
--

Dataiku to enhance data tools with $28 million investment led by Battery Ventures

 Dataiku, a French startup that helps data analysts communicate with data scientists to build more meaningful data applications, announced a significant funding round today. The company scored a $28 million Series B investment led by Battery Ventures with help from FirstMark, Serena Capital and Alven. Today’s money brings the total raised to almost $45 million. Its most recent prior round… Read More

Aug 29, 2017
--

Meltwater acquires Algo, an AI-based news and data tracker

 Meltwater, a company originally founded in Norway that provides data to more than 25,000 businesses to track where and how they are mentioned in media and other public platforms, has acquired a startup to double down on how it uses machine learning and artificial intelligence to do its job. The company has acquired Algo, a startup that has built a data analytics platform for real-time… Read More

Jun 12, 2017
--

Microsoft’s Power BI business analytics tool learns new tricks

Back in 2015, Microsoft launched its highly visual Power BI data exploration and interactive reporting tool into general availability. The service is now in active use at 200,000 different organizations and the team has shipped 400 updates and new features over the course of the last two years. Current users span a wide gamut and include the likes of the Seattle Seahawks, CA Technologies,… Read More

Jan 11, 2017
--

FarmLogs raises $22 million to make agriculture a more predictable business

Ann Arbor, Michigan-based FarmLogs has raised $22 million in a Series C round of funding for technology that helps farmers monitor and measure their crops, predict profits, manage risks from weather and pests and more. Naspers Ventures led the round, joined by the company’s earlier backers Drive Capital, Huron River Ventures, Hyde Park Venture Partners, SV Angel and individual… Read More

Aug 3, 2016
--

Panoply.io raises $7M Series A for its data analytics and warehousing platform

Panoply.io, a startup that wants to make setting up a data warehousing and analytics infrastructure as easy as spinning up an AWS server, today announced that it has raised a $7 million Series A round led by Intel Capital, with participation from previous investor Blumberg Capital. This follows Panoply’s $1.3 million seed round from late last year. Read More

Jun 2, 2014
--

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala.

This time I’ve tried InfiniDB for Hadoop (the open-source version) on modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InfiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses a MySQL server to execute queries. This allows for an easy “migration” of existing analytical tools. The results are quite interesting and promising.

Quick How-To

The InfiniDB documentation is not very clear on step-by-step instructions, so I’ve created this quick guide:

  1. Install a Hadoop cluster (a minimal install will work). I’ve used Cloudera Manager (CDH5) so that I could compare the speed of InfiniDB to Cloudera Impala. Install the tools listed in the “Pre-requirements” section of the InfiniDB for Hadoop manual.
  2. Install the InfiniDB for Hadoop binaries on one Hadoop node (you can choose any node). This will install InfiniDB and its own version of MySQL (based on MySQL 5.1).
  3. After installation, the installer will tell you which variables to set and to run the postConfigure script. Example:
    # Set the Java environment required by InfiniDB (paths are from this example install)
    export JAVA_HOME=/usr/java/jdk1.6.0_31
    export LD_LIBRARY_PATH=/usr/java/jdk1.6.0_31/jre/lib/amd64/server
    # Source the HDFS environment, then run the InfiniDB configuration script
    . /root/setenv-hdfs-20
    /usr/local/Calpont/bin/postConfigure
  4. The postConfigure script will ask a series of questions. A couple of important notes:
  • Make sure to use HDFS as the “type of Data Storage”.
  • Performance module 1 (pm1) should point to the host (hostname and IP) you are running the postConfigure script on. The other pm(s) should point to the other Hadoop nodes.

When the installation is finished you will be able to log in to the MySQL server. It uses a script called ibdmysql, which calls the mysql CLI with the correct socket and port. Check that InfiniDB is enabled by running “show engines”; InfiniDB should be in the list.
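
As a quick sanity check, that step might look like this from the InfiniDB mysql prompt (a minimal sketch; the exact columns and wording of the output depend on the build):

mysql> SHOW ENGINES;

The result set should contain a row whose Engine column is InfiniDB; if it does not, re-check the postConfigure step above.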

The next step will be importing data.

Data import

First, we need to create a MySQL table with “engine=InfiniDB”:

CREATE TABLE `ontime` (
  `YearD` int(11) NOT NULL,
  `Quarter` tinyint(4) DEFAULT NULL,
  `MonthD` tinyint(4) DEFAULT NULL,
  `DayofMonth` tinyint(4) DEFAULT NULL,
  `DayOfWeek` tinyint(4) DEFAULT NULL,
  `FlightDate` date DEFAULT NULL,
...
) ENGINE=InfiniDB DEFAULT CHARSET=latin1

Second, I’ve used the cpimport tool to load the data. It turned out to be much more efficient and easier to load one big file rather than 20×12 smaller files (the original “ontime” data is one file per month), so I exported the “ontime” data from a MySQL table and created one big file, “ontime.psv”.
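
For illustration, one way to produce such a pipe-delimited file from a regular MySQL copy of the table is SELECT ... INTO OUTFILE (the source table name and output path below are placeholders; the '|' delimiter matches the -s '|' option passed to cpimport below):

mysql> SELECT * FROM ontime
    -> INTO OUTFILE '/tmp/ontime.psv'
    -> FIELDS TERMINATED BY '|'
    -> LINES TERMINATED BY '\n';

Note that INTO OUTFILE writes the file on the MySQL server host and requires the FILE privilege.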

I used the following command to import the data into InfiniDB:

[root@n0 ontime]# /usr/local/Calpont/bin/cpimport -s '|' ontime ontime ontime.psv
2014-05-20 15:12:58 (18787) INFO : Running distributed import (mode 1) on all PMs...
2014-05-20 15:25:28 (18787) INFO : For table ontime.ontime: 155083620 rows processed and 155083620 rows inserted.
2014-05-20 15:25:28 (18787) INFO : Bulk load completed, total run time : 751.561 seconds

The data is stored in Hadoop:

[root@n0 ontime]# hdfs dfs -du -h /usr/local/Calpont
1.4 G /usr/local/Calpont/data1
1.4 G /usr/local/Calpont/data2
1.4 G /usr/local/Calpont/data3
1.4 G /usr/local/Calpont/data4
1.4 G /usr/local/Calpont/data5
1.4 G /usr/local/Calpont/data6
1.4 G /usr/local/Calpont/data7
1.4 G /usr/local/Calpont/data8

The total size of the data is 8 × 1.4G = 11.2G (compressed). For comparison, the same dataset in Impala Parquet format takes 3.6G. The original size was ~60G.

[root@n0 ontime]# hdfs dfs -du -h /user/hive/warehouse
3.6 G /user/hive/warehouse/ontime_parquet_snappy

Now we can run the 2 queries I’ve tested before:

1. Simple group-by

mysql> select yeard, count(*) from ontime group by yeard order by yeard;
+-------+----------+
| yeard | count(*) |
+-------+----------+
|  1988 |  5202096 |
|  1989 |  5041200 |
|  1990 |  5270893 |
|  1991 |  5076925 |
|  1992 |  5092157 |
|  1993 |  5070501 |
|  1994 |  5180048 |
|  1995 |  5327435 |
|  1996 |  5351983 |
|  1997 |  5411843 |
|  1998 |  5384721 |
|  1999 |  5527884 |
|  2000 |  5683047 |
|  2001 |  5967780 |
|  2002 |  5271359 |
|  2003 |  6488540 |
|  2004 |  7129270 |
|  2005 |  7140596 |
|  2006 |  7141922 |
|  2007 |  7455458 |
|  2008 |  7009726 |
|  2009 |  6450285 |
|  2010 |  6450117 |
|  2011 |  6085281 |
|  2012 |  6096762 |
|  2013 |  6369482 |
|  2014 |  1406309 |
+-------+----------+
27 rows in set (0.22 sec)

2. The complex query from my original post:

mysql> select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(if(ArrDelayMinutes>30, 1, 0)) as flights_delayed, round(sum(if(ArrDelayMinutes>30, 1, 0))/count(*),2) as rate FROM ontime WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' GROUP by carrier HAVING cnt > 100000 and max(yeard) > 1990 ORDER by rate DESC, cnt desc LIMIT  1000;
+------------+------------+---------+----------+-----------------+------+
| min(yeard) | max(yeard) | Carrier | cnt      | flights_delayed | rate |
+------------+------------+---------+----------+-----------------+------+
|       2003 |       2009 | EV      |  1454777 |          237698 | 0.16 |
|       2003 |       2009 | FL      |  1082489 |          158748 | 0.15 |
|       2006 |       2009 | YV      |   740608 |          110389 | 0.15 |
|       2006 |       2009 | XE      |  1016010 |          152431 | 0.15 |
|       2003 |       2009 | B6      |   683874 |          103677 | 0.15 |
|       2001 |       2009 | MQ      |  3238137 |          448037 | 0.14 |
|       2003 |       2005 | DH      |   501056 |           69833 | 0.14 |
|       2004 |       2009 | OH      |  1195868 |          160071 | 0.13 |
|       2003 |       2006 | RU      |  1007248 |          126733 | 0.13 |
|       1988 |       2009 | UA      |  9593284 |         1197053 | 0.12 |
|       2003 |       2006 | TZ      |   136735 |           16496 | 0.12 |
|       1988 |       2001 | TW      |  2656286 |          280283 | 0.11 |
|       1988 |       2009 | AA      | 10568437 |         1183786 | 0.11 |
|       1988 |       2009 | CO      |  6023831 |          673354 | 0.11 |
|       1988 |       2009 | DL      | 11866515 |         1156048 | 0.10 |
|       2003 |       2009 | OO      |  2654259 |          257069 | 0.10 |
|       1988 |       2009 | AS      |  1506003 |          146920 | 0.10 |
|       2007 |       2009 | 9E      |   577244 |           59440 | 0.10 |
|       1988 |       2009 | US      | 10276862 |          990995 | 0.10 |
|       1988 |       2009 | NW      |  7601727 |          725460 | 0.10 |
|       1988 |       2005 | HP      |  2607603 |          235675 | 0.09 |
|       1988 |       2009 | WN      | 12722174 |         1107840 | 0.09 |
|       2005 |       2009 | F9      |   307569 |           28679 | 0.09 |
|       1988 |       1991 | PA      |   203401 |           19263 | 0.09 |
+------------+------------+---------+----------+-----------------+------+
24 rows in set (0.86 sec)

The same query in Impala (on the same hardware) runs in 7.18 seconds:

[n8.local:21000] > select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(if(ArrDelayMinutes>30, 1, 0)) as flights_delayed, round(sum(if(ArrDelayMinutes>30, 1, 0))/count(*),2) as rate FROM ontime_parquet_snappy WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' GROUP by carrier HAVING cnt > 100000 and max(yeard) > 1990 ORDER by rate DESC LIMIT  1000;
Query: select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(if(ArrDelayMinutes>30, 1, 0)) as flights_delayed, round(sum(if(ArrDelayMinutes>30, 1, 0))/count(*),2) as rate FROM ontime_parquet_snappy WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' GROUP by carrier HAVING cnt > 100000 and max(yeard) > 1990 ORDER by rate DESC LIMIT  1000
  +------------+------------+---------+----------+-----------------+------+
  | min(yeard) | max(yeard) | carrier | cnt      | flights_delayed | rate |
  +------------+------------+---------+----------+-----------------+------+
  | 2003       | 2009       | EV      | 1454777  | 237698          | 0.16 |
  | 2003       | 2009       | FL      | 1082489  | 158748          | 0.15 |
  | 2006       | 2009       | XE      | 1016010  | 152431          | 0.15 |
  | 2006       | 2009       | YV      | 740608   | 110389          | 0.15 |
  | 2003       | 2009       | B6      | 683874   | 103677          | 0.15 |
  | 2001       | 2009       | MQ      | 3238137  | 448037          | 0.14 |
  | 2003       | 2005       | DH      | 501056   | 69833           | 0.14 |
  | 2004       | 2009       | OH      | 1195868  | 160071          | 0.13 |
  | 2003       | 2006       | RU      | 1007248  | 126733          | 0.13 |
  | 1988       | 2009       | UA      | 9593284  | 1197053         | 0.12 |
  | 2003       | 2006       | TZ      | 136735   | 16496           | 0.12 |
  | 1988       | 2001       | TW      | 2656286  | 280283          | 0.11 |
  | 1988       | 2009       | CO      | 6023831  | 673354          | 0.11 |
  | 1988       | 2009       | AA      | 10568437 | 1183786         | 0.11 |
  | 1988       | 2009       | US      | 10276862 | 990995          | 0.10 |
  | 2007       | 2009       | 9E      | 577244   | 59440           | 0.10 |
  | 1988       | 2009       | DL      | 11866515 | 1156048         | 0.10 |
  | 2003       | 2009       | OO      | 2654259  | 257069          | 0.10 |
  | 1988       | 2009       | NW      | 7601727  | 725460          | 0.10 |
  | 1988       | 2009       | AS      | 1506003  | 146920          | 0.10 |
  | 1988       | 1991       | PA      | 203401   | 19263           | 0.09 |
  | 1988       | 2009       | WN      | 12722174 | 1107840         | 0.09 |
  | 1988       | 2005       | HP      | 2607603  | 235675          | 0.09 |
  | 2005       | 2009       | F9      | 307569   | 28679           | 0.09 |
  +------------+------------+---------+----------+-----------------+------+
  Returned 24 row(s) in 7.18s

Conclusion and charts

To summarize, I’ve created the following charts:

Simple query:

As we can see, InfiniDB looks pretty good here. It also speaks the MySQL protocol, so existing applications that use MySQL will work without any additional “connectors”.

One note regarding my query example: the “complex” query is designed in a way that makes it hard to use any particular set of indexes; the query has to scan more than 70% of the table to generate the result set. That is why it is so slow in MySQL compared to columnar databases. Another “issue” is that the table is very wide and most of the columns are declared as varchar (the table is not normalized), which makes it large in MySQL. All of this makes it ideal for columnar storage and compression. Other cases may not show such a huge difference.

So far I have been testing with small data (60G); I plan to run a big data benchmark next.

The post Using InfiniDB MySQL server with Hadoop cluster for data analytics appeared first on MySQL Performance Blog.
