Data virtualization service Varada raises $12M

Varada, a Tel Aviv-based startup that focuses on making it easier for businesses to query data across services, today announced that it has raised a $12 million Series A round led by Israeli early-stage fund MizMaa Ventures, with participation by Gefen Capital.

“If you look at the storage aspect for big data, there’s always innovation, but we can put a lot of data in one place,” Varada CEO and co-founder Eran Vanounou told me. “But translating data into insight? It’s so hard. It’s costly. It’s slow. It’s complicated.”

That’s a lesson he learned during his time as CTO of LivePerson, which he described as a classic big data company. And just like at LivePerson, where the team had to reinvent the wheel to solve its data problems, again and again, every company — and not just the large enterprises — now struggles with managing their data and getting insights out of it, Vanounou argued.

varada architecture diagram

Image Credits: Varada

The rest of the founding team, David Krakov, Roman Vainbrand and Tal Ben-Moshe, already had a lot of experience in dealing with these problems, too, with Ben-Moshe having served as the chief software architect of Dell EMC’s XtremIO flash array unit, for example. They built the system for indexing big data that’s at the core of Varada’s platform (with the open-source Presto SQL query engine being one of the other cornerstones).

Essentially, Varada embraces the idea of data lakes and enriches that with its indexing capabilities. And those indexing capabilities are where Varada’s smarts can be found. As Vanounou explained, the company is using a machine learning system to understand when users tend to run certain workloads, and then caches the data ahead of time, making the system far faster than its competitors.

“If you think about big organizations and think about the workloads and the queries, what happens during the morning time is different from evening time. What happened yesterday is not what happened today. What happened on a rainy day is not what happened on a shiny day. […] We listen to what’s going on and we optimize. We leverage the indexing technology. We index what is needed when it is needed.”

That helps speed up queries, but it also means less data has to be replicated, which also brings down the cost. As MizMaa’s Aaron Applbaum noted, since Varada is not a SaaS solution, the buyers still get all of the discounts from their cloud providers, too.

In addition, the system can allocate resources intelligently so that different users can tap into different amounts of bandwidth. You can tell it to give customers more bandwidth than your financial analysts, for example.

“Data is growing like crazy: in volume, in scale, in complexity, in who requires it and what the business intelligence uses are, what the API uses are,” Applbaum said when I asked him why he decided to invest. “And compute is getting slightly cheaper, but not really, and storage is getting cheaper. So if you can make the trade-off to store more stuff, and access things more intelligently, more quickly, more agile — that was the basis of our thesis, as long as you can do it without compromising performance.”

Varada, with its team of experienced executives, architects and engineers, ticked a lot of the company’s boxes in this regard, but he also noted that unlike some other Israeli startups, the team understood that it had to listen to customers and understand their needs, too.

“In Israel, you have a history — and it’s become less and less the case — but historically, there’s a joke that it’s ‘ready, fire, aim.’ You build a technology, you’ve got this beautiful thing and you’re like, ‘alright, we did it,’ but without listening to the needs of the customer,” he explained.

The Varada team is not afraid to compare itself to Snowflake, which at least at first glance seems to make similar promises. Vanounou praised the company for opening up the data warehousing market and proving that people are willing to pay for good analytics. But he argues that Varada’s approach is fundamentally different.

“We embrace the data lake. So if you are Mr. Customer, your data is your data. We’re not going to take it, move it, copy it. This is your single source of truth,” he said. And in addition, the data can stay in the company’s virtual private cloud. He also argues that Varada isn’t so much focused on the business users but the technologists inside a company.



Microsoft partners with Redis Labs to improve its Azure Cache for Redis

For a few years now, Microsoft has offered Azure Cache for Redis, a fully managed caching solution built on top of the open-source Redis project. Today, it is expanding this service by adding Redis Enterprise, Redis Labs’ commercial offering, to its platform. It’s doing so in partnership with Redis Labs, and while Microsoft will offer some basic support for the service, Redis Labs will handle most of the software support itself.

Julia Liuson, Microsoft’s corporate VP of its developer tools division, told me that the company wants to be seen as a partner to open-source companies like Redis Labs, which was among the first companies to change its license to prevent cloud vendors from commercializing and repackaging their free code without contributing back to the community. Last year, Redis Labs partnered with Google Cloud to bring its own fully managed service to its platform and so maybe it’s no surprise that we are now seeing Microsoft make a similar move.

Liuson tells me that with this new tier for Azure Cache for Redis, users will get a single bill and native Azure management, as well as the option to deploy natively on SSD flash storage. The native Azure integration should also make it easier for developers on Azure to integrate Redis Enterprise into their applications.

It’s also worth noting that Microsoft will support Redis Labs’ own Redis modules, including RediSearch, a Redis-powered search engine, as well as RedisBloom and RedisTimeSeries, which provide support for new datatypes in Redis.

“For years, developers have utilized the speed and throughput of Redis to produce unbeatable responsiveness and scale in their applications,” says Liuson. “We’ve seen tremendous adoption of Azure Cache for Redis, our managed solution built on open source Redis, as Azure customers have leveraged Redis performance as a distributed cache, session store, and message broker. The incorporation of the Redis Labs Redis Enterprise technology extends the range of use cases in which developers can utilize Redis, while providing enhanced operational resiliency and security.”


Amazon acquires flash-based cloud storage startup E8 Storage

Amazon has acquired Israeli storage tech startup E8 Storage, as first reported by Reuters, CNBC and Globes and confirmed by TechCrunch. The acquisition will bring the team and technology from E8 into Amazon’s existing Amazon Web Services center in Tel Aviv, per reports.

E8 Storage’s particular focus was on building storage hardware that employs flash-based memory to deliver faster performance than competing offerings, according to its own claims. How exactly AWS intends to use the company’s talent or assets isn’t yet known, but it clearly lines up with their primary business.

AWS acquisitions this year include TSO Logic, a Vancouver-based startup that optimizes data center workload operating efficiency, and Israel-based CloudEndure, which provides data recovery services in the event of a disaster.


48-hour, buy-one-get-one free — TC Sessions: Enterprise 2019

Every startupper we’ve ever met loves a great deal, and so do we. That’s why we’re celebrating Prime Day with a 48-hour flash sale on tickets to TC Sessions: Enterprise 2019, which takes place September 5 at the Yerba Buena Center for the Arts in San Francisco.

We’re talking a classic BOGO — buy-one-get-one — deal that starts today and ends tomorrow, July 16, at 11:59 p.m. (PT). Buy one early-bird ticket ($249) and you get a second ticket for free. But this BOGO goes bye-bye in just 48 hours, so don’t wait. Buy your TC Sessions: Enterprise tickets now and save.

Get ready to join more than 1,000 attendees for a day-long, intensive experience exploring the enterprise colossus — a tech category that generates hundreds of new startups, along with a steady stream of multibillion-dollar acquisitions, every year.

What can you expect at TC Sessions: Enterprise? For starters, you’ll hear TechCrunch editors interview enterprise software leaders, including tech titans, rising founders and boundary-breaking VCs.

One such titan, George Brady — Capital One’s executive VP in charge of tech operations — will join us to discuss how the financial institution left legacy hardware and software behind to embrace the cloud. Quite a journey in such a highly regulated industry.

Our growing speaker roster features other enterprise heavy-hitters, including Aaron Levie, Box co-founder and CEO; Aparna Sinha, Google’s director of product management for Kubernetes and Anthos; Jim Clarke, Intel’s director of quantum hardware; and Scott Farquhar, co-founder and co-CEO of Atlassian.

Looking for in-depth information on technical enterprise topics? You’ll find them in our workshops and breakout sessions. Check out the exhibiting early-stage enterprise startups focused on disrupting, well, everything. Enjoy receptions and world-class networking with other founders, investors and technologists actively building the next generation of enterprise services.

TC Sessions: Enterprise 2019 takes place September 5, and we pack a lot of value into a single day. Double your ROI and take advantage of our 48-hour BOGO sale. Buy your ticket before July 16 at 11:59 p.m. (PT) and get another ticket free. That’s two tickets for one early-bird price. And if that’s not enough value, get this: we’ll register you for a free Expo-only pass to Disrupt SF 2019 for every TC Sessions: Enterprise ticket you purchase (mic drop).

Interested in sponsoring TC Sessions: Enterprise? Fill out this form and a member of our sales team will contact you.


The Slack origin story

Let’s rewind a decade. It’s 2009. Vancouver, Canada.

Stewart Butterfield, known already for his part in building Flickr, a photo-sharing service acquired by Yahoo in 2005, decided to try his hand — again — at building a game. Flickr had been a failed attempt at a game called Game Neverending followed by a big pivot. This time, Butterfield would make it work.

To make his dreams a reality, he joined forces with Flickr’s original chief software architect Cal Henderson, as well as former Flickr employees Eric Costello and Serguei Mourachov, who like himself, had served some time at Yahoo after the acquisition. Together, they would build Tiny Speck, the company behind an artful, non-combat massively multiplayer online game.

Years later, Butterfield would pull off a pivot more massive than his last. Slack, born from the ashes of his fantastical game, would lead a shift toward online productivity tools that fundamentally change the way people work.

Glitch is born

In mid-2009, former TechCrunch reporter-turned-venture-capitalist M.G. Siegler wrote one of the first stories on Butterfield’s mysterious startup plans.

“So what is Tiny Speck all about?” Siegler wrote. “That is still not entirely clear. The word on the street has been that it’s some kind of new social gaming endeavor, but all they’ll say on the site is ‘we are working on something huge and fun and we need help.’”

Siegler would go on to invest in Slack as a general partner at GV, the venture capital arm of Alphabet.

“Clearly this is a creative project,” Siegler added. “It almost sounds like they’re making an animated movie. As awesome as that would be, with people like Henderson on board, you can bet there’s impressive engineering going on to turn this all into a game of some sort (if that is in fact what this is all about).”

After months of speculation, Tiny Speck unveiled its project: Glitch, an online game set inside the brains of 11 giants. It would be free with in-game purchases available and eventually, a paid subscription for power users.


Using NVMe Command Line Tools to Check NVMe Flash Health


In this blog post, I’ll look at the types of NVMe flash health information you can get from using the NVMe command line tools.

Checking SATA-based drive health is easy. Whether it’s an SSD or older spinning drive, you can use the smartctl command to get a wealth of information about the device’s performance and health. As an example:

root@blinky:/var/lib/mysql# smartctl -A /dev/sda
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-62-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke,
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
 1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
 5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
 9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       41
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   065   059   000    Old_age   Always       -       35 (Min/Max 21/41)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0030   100   100   001    Old_age   Offline      -       0
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       145599393
247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4550280
248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       582524
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       1260
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0



While smartctl might not know all vendor-specific SMART values, you can typically Google the drive model along with “smart attributes” and find documents like this to get more details.

If you move to newer generation NVMe-based flash storage, smartctl won’t work anymore, or at least it doesn’t work with the packages available for Ubuntu 16.04 (what I’m running). It looks like support for NVMe in Smartmontools is coming, and it would be great to get a single tool that supports both SATA and NVMe flash storage.

In the meantime, you can use the nvme tool available from the nvme-cli package. It provides some basic information for NVMe devices.

To get information about the NVMe devices installed:

root@alex:~# nvme list
Node             SN                   Model                                    Version  Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S3EVNCAHB01861F      Samsung SSD 960 PRO 1TB                  1.2      1         689.63  GB /   1.02  TB    512   B +  0 B   1B6QCXP7

To get SMART information:

root@alex:~# nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 34 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 3,465,389
data_units_written                  : 9,014,689
host_read_commands                  : 89,719,366
host_write_commands                 : 134,671,295
controller_busy_time                : 310
power_cycles                        : 11
power_on_hours                      : 21
unsafe_shutdowns                    : 8
media_errors                        : 0
num_err_log_entries                 : 1
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1                : 34 C
Temperature Sensor 2                : 47 C
Temperature Sensor 3                : 0 C
Temperature Sensor 4                : 0 C
Temperature Sensor 5                : 0 C
Temperature Sensor 6                : 0 C
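If you want to work with these values in a script, the plain-text output can be split into key/value pairs. The following is a minimal Python sketch, assuming the output keeps the "key : value" layout shown above (the layout may differ between nvme-cli versions). The helper name parse_smart_log is hypothetical, and the sample text is an abridged copy of the output above.

```python
# Abridged copy of the `nvme smart-log` output shown above.
SAMPLE = """\
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 34 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 3,465,389
data_units_written                  : 9,014,689
host_read_commands                  : 89,719,366
host_write_commands                 : 134,671,295
unsafe_shutdowns                    : 8
media_errors                        : 0
"""


def parse_smart_log(text):
    """Hypothetical helper: map each 'key : value' line to an integer."""
    fields = {}
    for line in text.splitlines()[1:]:  # skip the header line
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        # Drop thousands separators and trailing units such as "%" or " C".
        cleaned = value.strip().replace(",", "").rstrip("%C ").strip()
        try:
            fields[key.strip()] = int(cleaned)
        except ValueError:
            continue  # leave out anything that is not a plain number
    return fields


smart = parse_smart_log(SAMPLE)
print(smart["data_units_read"], smart["temperature"])
```

The later calculations in this post only need a handful of these fields, so a small dictionary like this is enough.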

To get additional SMART information (not all devices support it):

root@ts140i:/home/pz/workloads/1m# nvme smart-log-add /dev/nvme0
Additional Smart Log for NVME device:nvme0 namespace-id:ffffffff
key                               normalized raw
program_fail_count              : 100%       0
erase_fail_count                : 100%       0
wear_leveling                   :  62%       min: 1114, max: 1161, avg: 1134
end_to_end_error_detection_count: 100%       0
crc_error_count                 : 100%       0
timed_workload_media_wear       : 100%       37.941%
timed_workload_host_reads       : 100%       51%
timed_workload_timer            : 100%       446008 min
thermal_throttle_status         : 100%       0%, cnt: 0
retry_buffer_overflow_count     : 100%       0
pll_lock_loss_count             : 100%       0
nand_bytes_written              : 100%       sectors: 16185227
host_bytes_written              : 100%       sectors: 6405605

Some of this information is self-explanatory, and some of it isn’t. After looking at the NVMe specification document, here is my read on some of the data:

Available Spare. Contains a normalized percentage (0 to 100%) of the remaining spare capacity that is available.

Available Spare Threshold. When the Available Spare capacity falls below the threshold indicated in this field, an asynchronous event completion can occur. The value is indicated as a normalized percentage (0 to 100%).

(Note: I’m not quite sure what the practical meaning of “asynchronous event completion” is, but it looks like something to avoid!)

Percentage Used. Contains a vendor specific estimate of the percentage of the NVM subsystem life used, based on actual usage and the manufacturer’s prediction of NVM life.

(Note: the number can be more than 100% if you’re using storage for longer than its planned life.)

Data Units Read/Data Units Written. This is the number of 512-byte data units read or written, but it is reported in an unusual way: a value of 1 corresponds to 1000 of the 512-byte units. So you can multiply the value by 512,000 to get the total in bytes. It does not include metadata accesses.
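As a quick worked example of that conversion, here is a small Python sketch that applies the 512,000-byte multiplier to the data_units_read and data_units_written values from the smart-log output above. The function and constant names are made up for illustration.

```python
# One reported data unit = 1000 units of 512 bytes each.
DATA_UNIT_BYTES = 512 * 1000


def data_units_to_bytes(units):
    """Convert a data_units_read/written counter to bytes."""
    return units * DATA_UNIT_BYTES


# Values taken from the smart-log output above.
read_bytes = data_units_to_bytes(3_465_389)
written_bytes = data_units_to_bytes(9_014_689)

print(f"read:    {read_bytes / 1e12:.2f} TB")     # prints "read:    1.77 TB"
print(f"written: {written_bytes / 1e12:.2f} TB")  # prints "written: 4.62 TB"
```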

Host Read/Write Commands. The number of commands of the appropriate type issued. Using this value together with the data units value above, you can compute the average IO size for “physical” reads and writes.
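To illustrate the average IO size calculation, the sketch below divides the bytes transferred by the number of host commands, using the counters from the smart-log output above. It assumes the 512,000-byte data unit described earlier, and the function name is hypothetical.

```python
# One reported data unit = 1000 units of 512 bytes each.
DATA_UNIT_BYTES = 512 * 1000


def average_io_bytes(data_units, commands):
    """Average bytes per command; returns 0.0 if no commands were issued."""
    if commands == 0:
        return 0.0
    return data_units * DATA_UNIT_BYTES / commands


# Counters taken from the smart-log output above.
avg_read = average_io_bytes(3_465_389, 89_719_366)
avg_write = average_io_bytes(9_014_689, 134_671_295)

print(f"average read size:  {avg_read / 1024:.1f} KiB")
print(f"average write size: {avg_write / 1024:.1f} KiB")
```

For this device, that works out to roughly 19 KiB per read and 33 KiB per write.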

Controller Busy Time. Time in minutes that the controller was busy servicing commands. This can be used to gauge long-term storage load trends.

Unsafe Shutdowns. The number of times a power loss happened without a shutdown notification being sent. Depending on the NVMe device you’re using, an unsafe shutdown might corrupt user data.

Warning Temperature Time/Critical Temperature Time. The time in minutes a device operated above a warning or critical temperature. It should be zeroes.
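The fields described so far can be combined into a simple health check. This is only a sketch based on my reading of the fields above: the dictionary keys and the function name are hypothetical, and the checks are illustrative rather than an official health policy.

```python
def health_warnings(log):
    """Return a list of warning strings for a dict of SMART values."""
    warnings = []
    if log["available_spare"] < log["available_spare_threshold"]:
        warnings.append("available spare is below the threshold")
    if log["percentage_used"] >= 100:
        warnings.append("device has reached or exceeded its planned life")
    if log["warning_temp_time"] > 0 or log["critical_temp_time"] > 0:
        warnings.append("device has run above warning/critical temperature")
    if log["unsafe_shutdowns"] > 0:
        warnings.append(f"{log['unsafe_shutdowns']} unsafe shutdowns recorded")
    return warnings


# Values copied from the smart-log output above.
sample = {
    "available_spare": 100,
    "available_spare_threshold": 10,
    "percentage_used": 0,
    "warning_temp_time": 0,
    "critical_temp_time": 0,
    "unsafe_shutdowns": 8,
}

for warning in health_warnings(sample):
    print(warning)
```

With the sample values, the only thing flagged is the unsafe shutdown count.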

Wear_Leveling. This shows how much of the rated cell life was used, as well as the min/max/avg write count for different cells. In this case, it looks like the cells are rated for 1800 writes and about 1100 on average were used.

Timed Workload Media Wear. The media wear by the current “workload.” This device allows you to measure some statistics from the time you reset them (called the “workload”) in addition to showing the device lifetime values.

Timed Workload Host Reads. The percentage of IO operations that were reads (since the workload timer was reset).

Thermal Throttle Status. This shows if the device is throttled due to overheating, and when there were throttling events in the past.

Nand Bytes Written. The bytes written to NAND cells. For this device, the measured unit seems to be in 32MB values. It might be different for other devices.

Host Bytes Written. The bytes written to the NVMe storage from the system. This unit also is in 32MB values. The scale of these values is not very important, as they are most helpful for finding the write amplification of your workload. This is the ratio of writes to NAND to writes from the host. For this example, the Write Amplification Factor (WAF) is 16185227 / 6405605 = 2.53.
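The same ratio can be computed in a couple of lines of Python. This is a minimal sketch using the nand_bytes_written and host_bytes_written values from the smart-log-add output above. The function name is made up for illustration, and since both counters use the same unit, no conversion is needed.

```python
def write_amplification(nand_units, host_units):
    """Ratio of NAND writes to host writes; both in the same unit."""
    if host_units <= 0:
        raise ValueError("host writes must be positive")
    return nand_units / host_units


# Values taken from the smart-log-add output above.
waf = write_amplification(16_185_227, 6_405_605)
print(f"WAF = {waf:.2f}")  # prints "WAF = 2.53"
```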

As you can see, the NVMe command line tools provide a lot of good information for understanding the health and performance of NVMe devices. You don’t need to use vendor specific tools (like isdct).
