Oct
13
2022
--

Using ClickHouse as an Analytic Extension for MySQL

MySQL is an outstanding database for online transaction processing. With suitable hardware, it can easily execute more than 1 million queries per second and handle tens of thousands of simultaneous connections. Many of the most demanding web applications on the planet are built on MySQL. With capabilities like that, why would MySQL users need anything else?

Well, analytic queries for starters. Analytic queries answer important business questions like finding the number of unique visitors to a website over time or figuring out how to increase online purchases. They scan large volumes of data and compute aggregates, including sums, averages, and much more complex calculations besides. The results are invaluable but can bog down online transaction processing on MySQL. 

Fortunately, there’s ClickHouse: a powerful analytic database that pairs well with MySQL. Percona is working closely with our partner Altinity to help users add ClickHouse easily to existing MySQL applications. You can read more about our partnership in our recent press release as well as about our joint MySQL-to-ClickHouse solution. 

This article provides tips on how to recognize when MySQL is overburdened with analytics and can benefit from ClickHouse’s unique capabilities. We then show three important patterns for integrating MySQL and ClickHouse. The result is more powerful, cost-efficient applications that leverage the strengths of both databases. 

Signs that indicate MySQL needs analytic help

Let’s start by digging into some obvious signs that your MySQL database is overburdened with analytics processing. 

Huge tables of immutable data mixed in with transaction tables 

Tables that drive analytics tend to be very large, rarely have updates, and may also have many columns. Typical examples are web access logs, marketing campaign events, and monitoring data. If you see a few outlandishly large tables of immutable data mixed with smaller, actively updated transaction processing tables, it’s a good sign your users may benefit from adding an analytic database. 

Complex aggregation pipelines

Analytic processing produces aggregates, which are numbers that summarize large datasets to help users identify patterns. Examples include unique site visitors per week, average page bounce rates, or counts of web traffic sources. MySQL may take minutes or even hours to compute such values. To improve performance it is common to add complex batch processes that precompute aggregates. If you see such aggregation pipelines, it is often an indication that adding an analytic database can reduce the labor of operating your application as well as deliver faster and more timely results for users. 

MySQL is too slow or inflexible to answer important business questions

A final clue is the in-depth questions you don’t ask about MySQL-based applications because it is too hard to get answers. Why don’t users complete purchases on eCommerce sites?  Which strategies for in-game promotions have the best payoff in multi-player games? Answering these questions directly from MySQL transaction data often requires substantial time and external programs. It’s sufficiently difficult that most users simply don’t bother. Coupling MySQL with a capable analytic database may be the answer. 

Why is ClickHouse a natural complement to MySQL? 

MySQL is an outstanding database for transaction processing. Yet the features of MySQL that make it work well (storing data in rows, single-threaded queries, and optimization for high concurrency) are exactly the opposite of those needed to run analytic queries that compute aggregates on large datasets.

ClickHouse on the other hand is designed from the ground up for analytic processing. It stores data in columns, has optimizations to minimize I/O, computes aggregates very efficiently, and parallelizes query processing. ClickHouse can answer complex analytic questions almost instantly in many cases, which allows users to sift through data quickly. Because ClickHouse calculates aggregates so efficiently, end users can pose questions in many ways without help from application designers. 

These are strong claims. To understand them it is helpful to look at how ClickHouse differs from MySQL. Here is a diagram that illustrates how each database pulls in data for a query that reads all values of three columns of a table. 

MySQL stores table data by rows, so it must read entire rows even when a query needs only three columns. MySQL production systems also typically do not use compression, because it has performance downsides for transaction processing. Finally, MySQL uses a single thread per query and cannot parallelize work within it.

By contrast, ClickHouse reads only the columns referenced in queries. Storing data in columns enables ClickHouse to compress data at levels that often exceed 90%. Finally, ClickHouse stores tables in parts and scans them in parallel.

The amount of data you read, how greatly it is compressed, and the ability to parallelize work make an enormous difference. Here’s a picture that illustrates the reduction in I/O for a query reading three columns.  

MySQL and ClickHouse give the same answer. To get it, MySQL reads 59 GB of data, whereas ClickHouse reads only 21 MB. That’s close to 3000 times less I/O, hence far less time to access the data. ClickHouse also parallelizes query execution very well, further improving performance. It is little wonder that analytic queries run hundreds or even thousands of times faster on ClickHouse than on MySQL. 
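As a purely illustrative example of such a query (the table and column names are hypothetical), the following aggregation references only three columns, so ClickHouse reads just those three column files rather than whole rows:

SELECT
    toStartOfDay(event_time) AS day,
    uniq(visitor_id)         AS visitors,
    sum(bytes_sent)          AS bytes
FROM access_log
GROUP BY day
ORDER BY day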

ClickHouse also has a rich set of features to run analytic queries quickly and efficiently. These include a large library of aggregation functions, the use of SIMD instructions where possible, the ability to read data from Kafka event streams, and efficient materialized views, just to name a few. 
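For instance, a materialized view can maintain a pre-aggregated rollup as rows arrive. Here is a minimal sketch of the pattern, with hypothetical table and column names, rather than a prescription for any particular schema:

CREATE MATERIALIZED VIEW daily_visitors
ENGINE = AggregatingMergeTree
ORDER BY day
AS SELECT
    toDate(event_time)    AS day,
    uniqState(visitor_id) AS visitors
FROM access_log
GROUP BY day;

-- uniqMerge finalizes the partial aggregation states stored in the view
SELECT day, uniqMerge(visitors) AS unique_visitors
FROM daily_visitors
GROUP BY day
ORDER BY day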

There is a final ClickHouse strength: excellent integration with MySQL. Here are a few examples. 

  • ClickHouse can ingest mysqldump and CSV data directly into ClickHouse tables. 
  • ClickHouse can perform remote queries on MySQL tables, which provides another way to explore data as well as ingest it quickly (see the sketch after this list). 
  • The ClickHouse SQL dialect is similar to MySQL’s, including system commands such as SHOW PROCESSLIST. 
  • ClickHouse can even speak the MySQL wire protocol, so many MySQL clients and tools can connect to it directly. 
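To illustrate the remote-query point above, here is a minimal sketch using ClickHouse's mysql() table function; the host, schema, table, and credentials are placeholders:

-- Query a MySQL table in place from ClickHouse
SELECT customer_id, count() AS orders
FROM mysql('mysql-host:3306', 'sales', 'orders', 'user', 'password')
GROUP BY customer_id
ORDER BY orders DESC
LIMIT 10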

For all of these reasons, ClickHouse is a natural choice to extend MySQL capabilities for analytic processing. 

Why is MySQL a natural complement to ClickHouse? 

Just as ClickHouse can add useful capabilities to MySQL, it is important to see that MySQL adds useful capabilities to ClickHouse. ClickHouse is outstanding for analytic processing but there are a number of things it does not do well. Here are some examples.  

  • Transaction processing – ClickHouse does not have full ACID transactions. You would not want to use ClickHouse to process online orders. MySQL does this very well. 
  • Rapid updates on single rows – Selecting all columns of a single row is inefficient in ClickHouse, as you must read many files. Updating a single row may require rewriting large amounts of data. You would not want to put eCommerce session data in ClickHouse. It is a standard use case for MySQL. 
  • Large numbers of concurrent queries – ClickHouse queries are designed to use as many resources as possible, not share them across many users. You would not want to use ClickHouse to hold metadata for microservices, but MySQL is commonly used for such purposes. 

In fact, MySQL and ClickHouse are highly complementary. Users get the most powerful applications when ClickHouse and MySQL are used together. 

Introducing ClickHouse to MySQL integration

There are three main ways to integrate MySQL data with ClickHouse analytic capabilities. They build on each other. 

  • View MySQL data from ClickHouse. MySQL data is queryable from ClickHouse using native ClickHouse SQL syntax. This is useful for exploring data as well as joining on data where MySQL is the system of record.
  • Move data permanently from MySQL to ClickHouse. ClickHouse becomes the system of record for that data. This unloads MySQL and gives better results for analytics. 
  • Mirror MySQL data in ClickHouse. Make a snapshot of data into ClickHouse and keep it updated using replication. This allows users to ask complex questions about transaction data without burdening transaction processing. 

Viewing MySQL data from ClickHouse

ClickHouse can run queries on MySQL data using the MySQL database engine, which makes MySQL data appear as local tables in ClickHouse. Enabling it is as simple as executing a single SQL command like the following on ClickHouse:

CREATE DATABASE sakila_from_mysql
ENGINE = MySQL('mydb:3306', 'sakila', 'user', 'password')

Here is a simple illustration of the MySQL database engine in action. 

The MySQL database engine makes it easy to explore MySQL tables and make copies of them in ClickHouse. ClickHouse queries on remote data may even run faster than in MySQL! This is because ClickHouse can sometimes parallelize queries even on remote data. It also offers more efficient aggregation once it has the data in hand. 
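As a rough illustration (the database follows the sakila example above; the local table name and ordering key are otherwise placeholders), exploring and copying a remote table might look like this:

-- List the remote tables exposed through the database engine
SHOW TABLES FROM sakila_from_mysql

-- Copy a remote MySQL table into a local ClickHouse MergeTree table
CREATE TABLE rental_local
ENGINE = MergeTree
ORDER BY rental_id
AS SELECT * FROM sakila_from_mysql.rental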

Moving MySQL data to ClickHouse

Migrating large tables with immutable records permanently to ClickHouse can give vastly accelerated analytic query performance while simultaneously unloading MySQL. The following diagram illustrates how to migrate a table containing web access logs from MySQL to ClickHouse. 

On the ClickHouse side, you’ll normally use the MergeTree table engine or one of its variants, such as ReplicatedMergeTree. MergeTree is the go-to engine for big data on ClickHouse. Here are three important features that will help you get the most out of ClickHouse. 

  1. Partitioning – MergeTree divides tables into partitions using a partition key. Access logs and other big data tend to be time ordered, so it’s common to divide data by day, week, or month. For best performance, it’s advisable to pick a key that results in roughly 1,000 partitions or fewer. 
  2. Ordering – MergeTree sorts rows and constructs an index on rows to match the ordering you choose. It’s important to pick a sort order that gives you large “runs” when scanning data. For instance, you could sort by tenant followed by time. That would mean a query on a tenant’s data would not need to jump around to find rows related to that tenant. 
  3. Compression and codecs – ClickHouse uses LZ4 compression by default but also offers ZSTD compression as well as codecs. Codecs reduce column data before turning it over to compression. 

These features can make an enormous difference in performance. We cover them, along with more performance tips, in Altinity videos and blog articles.
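To make these three features concrete, here is a minimal sketch of a table definition; the columns, partition granularity, and codec choices are illustrative rather than prescriptive:

CREATE TABLE access_log (
    tenant_id  UInt32,
    visitor_id UInt64,
    event_time DateTime CODEC(Delta, ZSTD),
    url        String,
    bytes_sent UInt64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)  -- monthly partitions keep the partition count modest
ORDER BY (tenant_id, event_time)   -- long per-tenant runs when scanning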

The ClickHouse MySQL database engine can also be very useful in this scenario. It enables ClickHouse to “see” and select data from remote transaction tables in MySQL. Your ClickHouse queries can join local tables on transaction data whose natural home is MySQL. Meanwhile, MySQL handles transactional changes efficiently and safely. 

Migrating tables to ClickHouse generally proceeds as follows. We’ll use the example of the access log shown above. 

  1. Create a matching schema for the access log data on ClickHouse. 
  2. Dump/load data from MySQL to ClickHouse using any of the following tools:
    1. Mydumper – A parallel dump/load utility that handles mysqldump and CSV formats. 
    2. MySQL Shell – A general-purpose utility for managing MySQL that can import and export tables. 
    3. Copy data using SELECT on MySQL database engine tables.
    4. Native database commands – Use MySQL SELECT ... INTO OUTFILE to dump data to CSV and read it back in using ClickHouse INSERT ... SELECT FROM file() (see the sketch after this list). ClickHouse can even read mysqldump format.  
  3. Check performance with suitable queries; make adjustments to the schema and reload if necessary. 
  4. Adapt front end and access log ingest to ClickHouse. 
  5. Run both systems in parallel while testing. 
  6. Cut over from MySQL alone to the combined MySQL + ClickHouse solution. 
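As a sketch of the native-command path in step 2 (the file path, table, and column layout are assumptions), a dump/load round trip might look roughly like this:

-- On MySQL: dump the table to CSV (the directory must be permitted by secure_file_priv)
SELECT * FROM access_log
INTO OUTFILE '/var/lib/mysql-files/access_log.csv'
FIELDS TERMINATED BY ',';

-- On ClickHouse: load the CSV from the server's user_files directory
INSERT INTO access_log
SELECT * FROM file('access_log.csv', 'CSV',
    'tenant_id UInt32, visitor_id UInt64, event_time DateTime, url String, bytes_sent UInt64')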

Migration can take as little as a few days but it’s more common to take weeks to a couple of months in large systems. This helps ensure that everything is properly tested and the roll-out proceeds smoothly.  

Mirroring MySQL Data in ClickHouse

The other way to extend MySQL is to mirror the data in ClickHouse and keep it up to date using replication. Mirroring allows users to run complex analytic queries on transaction data without (a) changing MySQL and its applications or (b) affecting the performance of production systems. 

Here are the working parts of a mirroring setup. 

ClickHouse has a built-in way to handle mirroring: the experimental MaterializedMySQL database engine, which reads binlog records directly from the MySQL primary and propagates data into ClickHouse tables. The approach is simple but is not yet recommended for production use. It may eventually be important for 1-to-1 mirroring cases but needs additional work before it can be widely used. 

Altinity has developed a new approach to replication using Debezium, Kafka-compatible event streams, and the Altinity Sink Connector for ClickHouse. The mirroring configuration looks like the following. 

The externalized approach has a number of advantages. They include working with current ClickHouse releases, taking advantage of fast dump/load programs like mydumper or direct SELECT using MySQL database engine, support for mirroring into replicated tables, and simple procedures to add new tables or reset old ones. Finally, it can extend to multiple upstream MySQL systems replicating to a single ClickHouse cluster. 

ClickHouse can mirror data from MySQL thanks to the unique capabilities of the ReplacingMergeTree table engine. It has an efficient method of dealing with inserts, updates, and deletes that is ideally suited for use with replicated data. As mentioned already, ClickHouse cannot update individual rows easily, but it inserts data extremely quickly and has an efficient process for merging rows in the background. ReplacingMergeTree builds on these capabilities to handle changes to data in a “ClickHouse way.”  

Replicated table rows use version and sign columns to represent the version of changed rows as well as whether the change is an insert or delete. The ReplacingMergeTree will only keep the last version of a row, which may in fact be deleted. The sign column lets us apply another ClickHouse trick to make those deleted rows inaccessible. It’s called a row policy. Using row policies we can make any row where the sign column is negative disappear.  

Here’s an example of ReplacingMergeTree in action that combines the effect of the version and sign columns to handle mutable data. 
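As a minimal sketch (the table, column names, and policy name are hypothetical; the schema produced by an actual replication tool may differ), such a mirrored table and its row policy could look like this:

CREATE TABLE orders_mirror (
    id       UInt64,
    status   String,
    _version UInt64,  -- incremented for every change replicated from MySQL
    _sign    Int8     -- 1 for insert/update, -1 for delete
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY id;

-- Hide rows whose most recent version is a delete
CREATE ROW POLICY hide_deleted ON orders_mirror FOR SELECT USING _sign > 0 TO ALL;

-- Until background merges complete, FINAL forces deduplication at query time
SELECT count() FROM orders_mirror FINAL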

Mirroring data into ClickHouse may appear more complex than migration but in fact is relatively straightforward because there is no need to change MySQL schema or applications and the ClickHouse schema generation follows a cookie-cutter pattern. The implementation process consists of the following steps. 

  1. Create schema for replicated tables in ClickHouse. 
  2. Configure and start replication from MySQL to ClickHouse. 
  3. Dump/load data from MySQL to ClickHouse using the same tools as migration. 

At this point, users are free to start running analytics or build additional applications on ClickHouse whilst changes replicate continuously from MySQL. 

Tooling improvements are on the way!

MySQL-to-ClickHouse migration is an area of active development, both at Altinity and in the ClickHouse community at large. Improvements fall into three general categories.  

Dump/load utilities – Altinity is working on a new utility that reduces schema creation and data transfer to a single command. We will have more to say on this in a future blog article. 

Replication – Altinity is sponsoring the Sink Connector for ClickHouse, which automates high-speed replication, including monitoring as well as integration into Altinity.Cloud. Our goal is similarly to reduce replication setup to a single command. 

ReplacingMergeTree – Currently users must include the FINAL keyword after table names to force the merging of row changes, and they must add a row policy to make deleted rows disappear automatically. There are pull requests in progress to add a MergeTree property that applies FINAL automatically in queries and makes deleted rows disappear without a row policy. Together they will make handling of replicated updates and deletes completely transparent to users. 

We are also watching carefully for improvements on MaterializedMySQL as well as other ways to integrate ClickHouse and MySQL efficiently. You can expect further blog articles in the future on these and related topics. Stay tuned!

Wrapping up and getting started

ClickHouse is a powerful addition to existing MySQL applications. Large tables with immutable data, complex aggregation pipelines, and unanswered questions on MySQL transactions are clear signs that integrating ClickHouse is the next step to provide fast, cost-efficient analytics to users. 

Depending on your application, it may make sense to mirror data onto ClickHouse using replication or even migrate some tables into ClickHouse. ClickHouse already integrates well with MySQL and better tooling is arriving quickly. Needless to say, all Altinity contributions in this area are open source, released under Apache 2.0 license. 

The most important lesson is to think in terms of MySQL and ClickHouse working together, not as one being a replacement for the other. Each database has unique and enduring strengths. The best applications will build on these to provide users with capabilities that are faster and more flexible than using either database alone. 

Percona, well-known experts in open source databases, partners with Altinity to deliver robust analytics for MySQL applications. If you would like to learn more about MySQL integration with ClickHouse, feel free to contact us or leave a message on our forum at any time. 

Aug
04
2021
--

FullStory raises $103M at a $1.8B valuation to combat rage clicks on websites and apps

Even with all the years of work that have been put into improving how screen-based interfaces work, our experiences with websites, mobile apps and any other interactive service you might use still often come up short: we can’t find what we want, we’re bombarded with exactly what we don’t need or the flow is just buggy in one way or another.

Now, FullStory, one of the startups that’s built a platform to identify when all of the above happens and provide suggestions to publishers for fixing it — it’s obsessed enough with the issue that it went so far as to trademark the phrase “Rage Clicks”, the focus of its mission — is announcing a big round of funding, a sign of its success and ambitions to do more.

The Atlanta-based company has closed a Series D round of $103 million, an oversubscribed round that actually was still growing between me interviewing the company and publishing this story (when we talked last week the figure was $100 million). Permira’s growth fund — which has previously invested in other customer experience startups like Klarna and Nexthink — is leading this round, with previous investors Kleiner Perkins, GV, Stripes, Dell Technologies Capital, Salesforce Ventures and Glynn Capital also participating.

FullStory, which has raised close to $170 million to date, has confirmed that the investment values the company at $1.8 billion.

Scott Voigt, FullStory’s founder and CEO, tells me that FullStory currently has some 3,100 paying customers on its books across verticals like retail, SaaS, finance and travel (customers include Peloton, the Financial Times, VMware and JetBlue), which collectively are on course to rack up more than 15 billion user sessions this year — working out to 1 trillion interactions involving clicks, navigations, highlights, scrolls and frustration signals. It says that annual recurring revenue has to date risen by more than 70% year-on-year.

The plan now will be to continue investing in R&D to bring more real-time intelligence into its products, “and pass those insights on to customers,” and also to “move more aggressively into Europe and Asia Pacific,” he added.

FullStory competes with others like Glassbox and Decibel, although it also claims its tools have more presence on websites than its three biggest competitors combined.

Working across different divisions like product, customer success and marketing, and engineering, FullStory uses machine learning algorithms to analyze how people navigate websites and other digital interfaces.

If approved as part of the “consent gate” you might encounter because of, say, GDPR regulations, it then tracks things like when people are clicking in areas excessively over a short period of time because of delays (the so-called “rage clicks”); or when a click leads nowhere because of, for example, a blip in a piece of JavaScript; or when a person is just scrolling or moving their mouse or cursor or finger in a frustrated (fast) way — again with little or no subsequent activity (or activity from the customer ceasing altogether) resulting from it. It doesn’t — nor does it have plans to — use eye tracking, or anything like sentiment analysis around data that customers put into, say, customer response windows.

FullStory then packages up the insights that it does collect into data streams that can be used with various visualization tools (having Salesforce as a strategic backer is interesting in this regard, given that it owns Tableau), or spreadsheets, or whatever a customer chooses to put them into. While it doesn’t offer direct remediation (perhaps an area it could tackle in the future), it does offer suggestions for alternative actions to fix whatever problems are arising.

Part of what has given FullStory a big boost in recent times (this round is by far the biggest fundraise the company has ever done) is the fact that, in today’s world, digital business has become the centerpiece of all business. Because COVID-19 and the need for social distancing have taken away some of the traffic of in-person experiences like going to stores, organizations that natively built their experiences online are seeing unprecedented amounts of traffic, and they are now joined by organizations that have shifted into digital experiences simply to stay in business.

All of that has contributed to a huge amount of content online, and a big shift in mindset to making it better (and in the most urgent of cases, even more basically, simply usable), and that has resulted in the stars aligning for companies like FullStory.

“The category was so nascent to begin with that we had to explain the concept to customers,” Voigt told me of the company’s early days, when selling meant selling would-be customers on the very idea of digital experience insights. “But digital experience, in the wake of COVID-19, suddenly mattered more than it ever has before, and the continued amount of inbound interest has been an afterburner for us.” He noted that demand is increasing among mid-market and enterprise organizations, and something that has also helped FullStory grow is the general movement of talent in the industry.

“Our customers tend to take their tools with them when they change their jobs,” he said. Those tools include FullStory’s analytics.

The evolution of bringing more AI into the world of basically structuring what might otherwise be unstructured data has been a big boost to the world of analytics, and investors are interested in FullStory because of how it’s taken that trend and grown its business on top of it.

“We are very excited to partner with the FullStory team as they continue to expand and build a truly extraordinary technology brand that improves the digital experience for all stakeholders,” said Alex Melamud, who led the transaction on behalf of Permira Growth, in a statement.

“Traditional analytics have been upended by AI- and ML-enabled approaches that can instantly uncover nuanced patterns and anomalies in customer behavior,” said Bruce Chizen, a senior advisor at Permira, in a statement. “Leveraging both structured and unstructured data, FullStory has rapidly established itself as the market and technology leader in DXI and is now the fastest-growing company in the category and the de facto system of record for all digital experience data.” Chizen is joining the FullStory Board with this round.

Jun
24
2021
--

Firebolt raises $127M more for its new approach to cheaper and more efficient Big Data analytics

Snowflake changed the conversation for many companies when it comes to the potentials of data warehousing. Now one of the startups that’s hoping to disrupt the disruptor is announcing a big round of funding to expand its own business.

Firebolt, which has built a new kind of cloud data warehouse that promises much more efficient, and cheaper, analytics around whatever is stored within it, is announcing a major Series B of $127 million on the heels of huge demand for its services.

The company, which only came out of stealth mode in December, is not disclosing its valuation with this round, which brings the total raised by the Israeli company to $164 million. New backers Dawn Capital and K5 Global are in this round, alongside previous backers Zeev Ventures, TLV Partners, Bessemer Venture Partners and Angular Ventures.

Nor is it disclosing many details about its customers at the moment. CEO and co-founder Eldad Farkash told me in an interview that most of them are U.S.-based, and that the numbers have grown from the dozen or so that were using Firebolt when it was still in stealth mode (it worked quietly for a couple of years building its product and onboarding customers before finally launching six months ago). They are all migrating from existing data warehousing solutions like Snowflake or BigQuery. In other words, its customers are already cloud-native, Big Data companies: it’s not trying to proselytize on the basic concept but work with those who are already in a specific place as a business.

“If you’re not using Snowflake or BigQuery already, we prefer you come back to us later,” he said. Judging by the size and quick succession of the round, that focus is paying off.

The challenge that Firebolt set out to tackle is that while data warehousing has become a key way for enterprises to analyze, update and manage their big data stores — after all, your data is only as good as the tools you have to parse it and keep it secure — typically data warehousing solutions are not efficient, and they can cost a lot of money to maintain.

The challenge was seen firsthand by the three founders of Firebolt, Farkash (CEO), Saar Bitner (COO) and Ariel Yaroshevich (CTO) when they were at a previous company, the business intelligence powerhouse Sisense, where respectively they were one of its co-founders and two members of its founding team. At Sisense, the company continually came up against an issue: When you are dealing in terabytes of data, cloud data warehouses were straining to deliver good performance to power their analytics and other tools, and the only way to potentially continue to mitigate that was by piling on more cloud capacity. And that started to become very expensive.

Firebolt set out to fix that by taking a different approach, rearchitecting the concept. As Farkash sees it, while data warehousing has indeed been a big breakthrough in Big Data, it has started to feel like a dated solution as data troves have grown.

“Data warehouses are solving yesterday’s problem, which was, ‘How do I migrate to the cloud and deal with scale?’” he told me back in December. Google’s BigQuery, Amazon’s RedShift and Snowflake are fitting answers for that issue, he believes, but “we see Firebolt as the new entrant in that space, with a new take on design on technology. We change the discussion from one of scale to one of speed and efficiency.”

The startup claims that its performance is up to 182 times faster than that of other data warehouses, thanks to a SQL-based system built on academic research that had yet to be applied anywhere, covering how to handle data in a lighter way and new techniques for compression and parsing. Data lakes in turn can be connected with a wider data ecosystem, and what this translates to is a much smaller requirement for cloud capacity, and lower costs.

Fast forward to today, and the company says the concept is gaining a lot of traction with engineers and developers in industries like business intelligence, customer-facing services that need to parse a lot of information to serve information to users in real time and back-end data applications. That is proving out what investors suspected would be a shift before the startup even launched, stealthily or otherwise.

“I’ve been an investor at Firebolt since their Series A round and before they had any paying customers,” said Oren Zeev of Zeev Ventures. “What had me invest in Firebolt is mostly the team. A group of highly experienced executives mostly from the big data space who understand the market very well, and the pain organizations are experiencing. In addition, after speaking to a few of my portfolio companies and Firebolt’s initial design partners, it was clear that Firebolt is solving a major pain, so all in all, it was a fairly easy decision. The market in which Firebolt operates is huge if you consider the valuations of Snowflake and Databricks. Even more importantly, it is growing rapidly as the migration from on-premise data warehouse platforms to the cloud is gaining momentum, and as more and more companies rely on data for their operations and are building data applications.”

Apr
30
2021
--

Analytics as a service: Why more enterprises should consider outsourcing

With an increasing number of enterprise systems, growing teams, a rising proliferation of the web and multiple digital initiatives, companies of all sizes are creating loads of data every day. This data contains excellent business insights and immense opportunities, but it has become impossible for companies to derive actionable insights from this data consistently due to its sheer volume.

According to Verified Market Research, the analytics-as-a-service (AaaS) market is expected to grow to $101.29 billion by 2026. Organizations that have not started on their analytics journey or are spending scarce data engineer resources to resolve issues with analytics implementations are not identifying actionable data insights. Through AaaS, managed services providers (MSPs) can help organizations get started on their analytics journey immediately without extravagant capital investment.

MSPs can take ownership of the company’s immediate data analytics needs, resolve ongoing challenges and integrate new data sources to manage dashboard visualizations, reporting and predictive modeling — enabling companies to make data-driven decisions every day.

AaaS could come bundled with multiple business-intelligence-related services. Primarily, the service includes (1) services for data warehouses; (2) services for visualizations and reports; and (3) services for predictive analytics, artificial intelligence (AI) and machine learning (ML). When a company partners with an MSP for analytics as a service, it is able to tap into business intelligence easily, instantly and at a lower cost of ownership than doing it in-house. This empowers the enterprise to focus on delivering better customer experiences, be unencumbered in decision-making and build data-driven strategies.

In today’s world, where customers value experiences over transactions, AaaS helps businesses dig deeper into their psyche and tap insights to build long-term winning strategies. It also enables enterprises to forecast and predict business trends by looking at their data and allows employees at every level to make informed decisions.

Mar
23
2021
--

Orca Security raises $210M Series C at a unicorn valuation

Orca Security, an Israeli cybersecurity startup that offers an agent-less security platform for protecting cloud-based assets, today announced that it has raised a $210 million Series C round at a $1.2 billion valuation. The round was led by Alphabet’s independent growth fund CapitalG and Redpoint Ventures. Existing investors GGV Capital, ICONIQ Growth and angel syndicate Silicon Valley CISO Investment also participated. YL Ventures, which led Orca’s seed round and participated in previous rounds, is not participating in this round — and it’s worth noting that the firm recently sold its stake in Axonius after that company reached unicorn status.

If all of this sounds familiar, that may be because Orca only raised its $55 million Series B round in December, after it announced its $20.5 million Series A round in May. That’s a lot of funding rounds in a short amount of time, but something we’ve been seeing more often in the last year or so.

Orca Security co-founders Gil Geron (left) and Avi Shua (right). Image Credits: Orca Security

As Orca co-founder and CEO Avi Shua told me, the company is seeing impressive growth and it — and its investors — want to capitalize on this. The company ended last year beating its own forecast from a few months before, which he noted was already aggressive, by more than 50%. Its current slate of customers includes Robinhood, Databricks, Unity, Live Oak Bank, Lemonade and BeyondTrust.

“We are growing at an unprecedented speed,” Shua said. “We were 20-something people last year. We are now closer to a hundred and we are going to double that by the end of the year. And yes, we’re using this funding to accelerate on every front, from dramatically increasing the product organization to add more capabilities to our platform, for post-breach capabilities, for identity access management and many other areas. And, of course, to increase our go-to-market activities.”

Shua argues that most current cloud security tools don’t really work in this new environment. Many, because they are driven by metadata, can only detect a small fraction of the risks, and agent-based solutions may take months to deploy and still not cover a business’ entire cloud estate. The promise of Orca Security is that it can not only cover a company’s entire range of cloud assets but that it is also able to help security teams prioritize the risks they need to focus on. It does so by using what the company calls its “SideScanning” technology, which allows it to map out a company’s entire cloud environment and file systems.

“Almost all tools are essentially just looking at discrete risk trees and not the forest. The risk is not just about how pickable the lock is, it’s also where the lock resides and what’s inside the box. But most tools just look at the issues themselves and prioritize the most pickable lock, ignoring the business impact and exposure — and we change that.”

It’s no secret that there isn’t a lot of love lost between Orca and some of its competitors. Last year, Palo Alto Networks sent Orca Security a sternly worded letter (PDF) to stop it from comparing the two services. Shua was not amused at the time and decided to fight it. “I completely believe there is space in the markets for many vendors, and they’ve created a lot of great products. But I think the thing that simply cannot be overlooked, is a large company that simply tries to silence competition. This is something that I believe is counterproductive to the industry. It tries to harm competition, it’s illegal, it’s unconstitutional. You can’t use lawyers to take your competitors out of the media.”

Currently, though, it doesn’t look like Orca needs to worry too much about the competition. As GGV Capital managing partner Glenn Solomon told me, as the company continues to grow and bring in new customers — and learn from the data it pulls in from them — it is also able to improve its technology.

“Because of the novel technology that Avi and [Orca Security co-founder and CPO] Gil [Geron] have developed — and that Orca is now based on — they see so much. They’re just discovering more and more ways and have more and more plans to continue to expand the value that Orca is going to provide to customers. They sit in a very good spot to be able to continue to leverage information that they have and help DevOps teams and security teams really execute on good hygiene in every imaginable way going forward. I’m super excited about that future.”

As for this funding round, Shua noted that he found CapitalG to be a “huge believer” in this space and an investor that is looking to invest into the company for the long run (and not just trying to make a quick buck). The fact that CapitalG is associated with Alphabet was obviously also a draw.

“Being associated with Alphabet, which is one of the three major cloud providers, allowed us to strengthen the relationship, which is definitely a benefit for Orca,” he said. “During the evaluation, they essentially put Orca in front of the security leadership at Google. Definitely, they’ve done their own very deep due diligence as part of that.”



Mar
22
2021
--

No-code business intelligence service y42 raises $2.9M seed round

Berlin-based y42 (formerly known as Datos Intelligence), a data warehouse-centric business intelligence service that promises to give businesses access to an enterprise-level data stack that’s as simple to use as a spreadsheet, today announced that it has raised a $2.9 million seed funding round led by La Famiglia VC. Additional investors include the co-founders of Foodspring, Personio and Petlab.

The service, which was founded in 2020, integrates with more than 100 data sources, covering all the standard B2B SaaS tools, from Airtable to Shopify and Zendesk, as well as database services like Google’s BigQuery. Users can then transform and visualize this data, orchestrate their data pipelines and trigger automated workflows based on this data (think sending Slack notifications when revenue drops or emailing customers based on your own custom criteria).

Like similar startups, y42 extends the idea of the data warehouse, which was traditionally used for analytics, and helps businesses operationalize this data. At the core of the service is a lot of open-source software; the company, for example, contributes to GitLab's Meltano platform for building data pipelines.

y42 founder and CEO Hung Dang. Image Credits: y42

“We’re taking the best of breed open-source software. What we really want to accomplish is to create a tool that is so easy to understand and that enables everyone to work with their data effectively,” Y42 founder and CEO Hung Dang told me. “We’re extremely UX obsessed and I would describe us as a no-code/low-code BI tool — but with the power of an enterprise-level data stack and the simplicity of Google Sheets.”

Before y42, Vietnam-born Dang co-founded a major events company that operated in more than 10 countries and made millions in revenue (but with very thin margins), all while finishing up his studies with a focus on business analytics. And that in turn led him to also found a second company that focused on B2B data analytics.

Image Credits: y42

Even while building his events company, he noted, he was always very product- and data-driven. “I was implementing data pipelines to collect customer feedback and merge it with operational data — and it was really a big pain at that time,” he said. “I was using tools like Tableau and Alteryx, and it was really hard to glue them together — and they were quite expensive. So out of that frustration, I decided to develop an internal tool that was actually quite usable and in 2016, I decided to turn it into an actual company. ”

He then sold this company to a major publicly listed German company. An NDA prevents him from talking about the details of this transaction, but maybe you can draw some conclusions from the fact that he spent time at Eventim before founding y42.

Given his background, it’s maybe no surprise that y42’s focus is on making life easier for data engineers and, at the same time, putting the power of these platforms in the hands of business analysts. Dang noted that y42 typically provides some consulting work when it onboards new clients, but that’s mostly to give them a head start. Given the no-code/low-code nature of the product, most analysts are able to get started pretty quickly — and for more complex queries, customers can opt to drop down from the graphical interface to y42’s low-code level and write queries in the service’s SQL dialect.

The service itself runs on Google Cloud and the 25-person team manages about 50,000 jobs per day for its clients. The company's customers include the likes of LifeMD, Petlab and Everdrop.

Until raising this round, Dang self-funded the company and had also raised some money from angel investors. But La Famiglia felt like the right fit for y42, especially due to its focus on connecting startups with more traditional enterprise companies.

“When we first saw the product demo, it struck us how on top of analytical excellence, a lot of product development has gone into the y42 platform,” said Judith Dada, general partner at LaFamiglia VC. “More and more work with data today means that data silos within organizations multiply, resulting in chaos or incorrect data. y42 is a powerful single source of truth for data experts and non-data experts alike. As former data scientists and analysts, we wish that we had y42 capabilities back then.”

Dang tells me he could have raised more but decided that he didn’t want to dilute the team’s stake too much at this point. “It’s a small round, but this round forces us to set up the right structure. For the Series A, which we plan to be towards the end of this year, we’re talking about a dimension which is 10x,” he told me.

Mar
19
2021
--

It’s time to abandon business intelligence tools

Organizations spend ungodly amounts of money — millions of dollars — on business intelligence (BI) tools. Yet, adoption rates are still below 30%. Why is this the case? Because BI has failed businesses.

Logi Analytics’ 2021 State of Analytics: Why Users Demand Better survey showed that knowledge workers spend more than five hours a day in analytics, and more than 99% consider analytics very to extremely valuable when making critical decisions. Unfortunately, many are dissatisfied with their current tools due to the loss of productivity, multiple “sources of truth,” and the lack of integration with their current tools and systems.

A gap exists between the functionalities provided by current BI and data discovery tools and what users want and need.

Throughout my career, I’ve spoken with many executives who wonder why BI continues to fail them, especially when data discovery tools like Qlik and Tableau have gained such momentum. The reality is, these tools are great for a very limited set of use cases among a limited audience of users — and the adoption rates reflect that reality.

Data discovery applications allow analysts to link with data sources and perform self-service analysis, but still come with major pitfalls. Lack of self-service customization, the inability to integrate into workflows with other applications, and an overall lack of flexibility seriously impact the ability of most users (who aren't data analysts) to derive meaningful information from these tools.

BI platforms and data discovery applications are supposed to launch insight into action, informing decisions at every level of the organization. But many are instead left with costly investments that actually create inefficiencies, hinder workflows and exclude the vast majority of employees who could benefit from those operational insights. Now that’s what I like to call a lack of ROI.

Business leaders across a variety of industries — including “legacy” sectors like manufacturing, healthcare and financial services — are demanding better and, in my opinion, they should have gotten it long ago.

It’s time to abandon BI — at least as we currently know it.

Here’s what I’ve learned over the years about why traditional BI platforms and newer tools like data discovery applications fail and what I’ve gathered from companies that moved away from them.

The inefficiency breakdown is killing your company

Traditional BI platforms and data discovery applications require users to exit their workflow to attempt data collection. And, as you can guess, stalling teams in the middle of their workflow creates massive inefficiencies. Instead of having the data you need to make a decision readily available to you, you have to exit the application, enter another application, secure the data and then reenter the original application.

According to the 2021 State of Analytics report, 99% of knowledge workers had to spend additional time searching for information they couldn’t easily locate in their analytics solution.

Mar
16
2021
--

Noogata raises $12M seed round for its no-code enterprise AI platform

Noogata, a startup that offers a no-code AI solution for enterprises, today announced that it has raised a $12 million seed round led by Team8, with participation from Skylake Capital. The company, which was founded in 2019 and counts Colgate and PepsiCo among its customers, currently focuses on e-commerce, retail and financial services, but it notes that it will use the new funding to power its product development and expand into new industries.

The company’s platform offers a collection of what are essentially pre-built AI building blocks that enterprises can then connect to third-party tools like their data warehouse, Salesforce, Stripe and other data sources. An e-commerce retailer could use this to optimize its pricing, for example, thanks to recommendations from the Noogata platform, while a brick-and-mortar retailer could use it to plan which assortment to allocate to a given location.

Image Credits: Noogata

“We believe data teams are at the epicenter of digital transformation and that to drive impact, they need to be able to unlock the value of data. They need access to relevant, continuous and explainable insights and predictions that are reliable and up-to-date,” said Noogata co-founder and CEO Assaf Egozi. “Noogata unlocks the value of data by providing contextual, business-focused blocks that integrate seamlessly into enterprise data environments to generate actionable insights, predictions and recommendations. This empowers users to go far beyond traditional business intelligence by leveraging AI in their self-serve analytics as well as in their data solutions.”

Image Credits: Noogata

We’ve obviously seen a plethora of startups in this space lately. The proliferation of data — and the advent of data warehousing — means that most businesses now have the fuel to create machine learning-based predictions. What’s often lacking, though, is the talent. There’s still a shortage of data scientists and developers who can build these models from scratch, so it’s no surprise that we’re seeing more startups that are creating no-code/low-code services in this space. The well-funded Abacus.ai, for example, targets about the same market as Noogata.

“Noogata is perfectly positioned to address the significant market need for a best-in-class, no-code data analytics platform to drive decision-making,” writes Team8 managing partner Yuval Shachar. “The innovative platform replaces the need for internal build, which is complex and costly, or the use of out-of-the-box vendor solutions which are limited. The company’s ability to unlock the value of data through AI is a game-changer. Add to that a stellar founding team, and there is no doubt in my mind that Noogata will be enormously successful.”



Feb
17
2021
--

TigerGraph raises $105M Series C for its enterprise graph database

TigerGraph, a well-funded enterprise startup that provides a graph database and analytics platform, today announced that it has raised a $105 million Series C funding round. The round was led by Tiger Global and brings the company’s total funding to over $170 million.

“TigerGraph is leading the paradigm shift in connecting and analyzing data via scalable and native graph technology with pre-connected entities versus the traditional way of joining large tables with rows and columns,” said TigerGraph founder and CEO, Yu Xu. “This funding will allow us to expand our offering and bring it to many more markets, enabling more customers to realize the benefits of graph analytics and AI.”

Current TigerGraph customers include the likes of Amgen, Citrix, Intuit, Jaguar Land Rover and UnitedHealth Group. Using a SQL-like query language (GSQL), these customers can use the company’s services to store and quickly query their graph databases. At the core of its offerings is the TigerGraphDB database and analytics platform, but the company also offers a hosted service, TigerGraph Cloud, with pay-as-you-go pricing, hosted either on AWS or Azure. With GraphStudio, the company also offers a graphical UI for creating data models and visually analyzing them.

The promise for the company’s database services is that they can scale to tens of terabytes of data with billions of edges. Its customers use the technology for a wide variety of use cases, including fraud detection, customer 360, IoT, AI and machine learning.

Like so many other companies in this space, TigerGraph is enjoying a tailwind thanks to the fact that many enterprises have accelerated their digital transformation projects during the pandemic.

“Over the last 12 months with the COVID-19 pandemic, companies have embraced digital transformation at a faster pace driving an urgent need to find new insights about their customers, products, services, and suppliers,” the company explains in today’s announcement. “Graph technology connects these domains from the relational databases, offering the opportunity to shrink development cycles for data preparation, improve data quality, identify new insights such as similarity patterns to deliver the next best action recommendation.”

Nov
19
2020
--

Amazon S3 Storage Lens gives IT visibility into complex S3 usage

As your S3 storage requirements grow, it gets harder to understand exactly what you have, and this is especially true when your storage crosses multiple regions. This could have broad implications for administrators, who are forced to build their own solutions to get that missing visibility. AWS changed that this week when it announced a new product called Amazon S3 Storage Lens, a way to understand highly complex S3 storage environments.

The tool provides analytics that help you understand what’s happening across your S3 object storage installations and take action when needed. As the company describes the new service in a blog post, “This is the first cloud storage analytics solution to give you organization-wide visibility into object storage, with point-in-time metrics and trend lines as well as actionable recommendations.”

Amazon S3 Storage Lens Console

Image Credits: Amazon

The idea is to present a set of 29 metrics in a dashboard that help you “discover anomalies, identify cost efficiencies and apply data protection best practices,” according to the company. IT administrators can get a view of their storage landscape and can drill down into specific instances when necessary, such as if there is a problem that requires attention. The product comes out of the box with a default dashboard, but admins can also create their own customized dashboards, and even export S3 Lens data to other Amazon tools.

Companies with complex storage requirements, as in thousands or even tens of thousands of S3 storage instances, have had to kludge together ways to understand what’s happening across those systems; this gives them a single view across it all.

S3 Storage Lens is now available in all AWS regions, according to the company.
