Sep 16, 2021

Fiberplane nabs €7.5M seed to bring Google Docs-like collaboration to incident response

Fiberplane, an Amsterdam-based early-stage startup building collaborative notebooks where SREs (site reliability engineers) can work together on an incident, much like a group editing a Google Doc, announced a €7.5 million (approximately $8.8 million USD) seed round today.

The round was co-led by Crane Venture Partners and Notion Capital, with participation from Northzone, System.One and Basecase Capital.

Micha Hernandez van Leuffen (known as Mies) is founder and CEO at Fiberplane. When his previous startup, Wercker, was sold to Oracle in 2017, Hernandez van Leuffen became part of a much larger company, where he saw people struggling to deal with outages (which happen at every company).

“We were always going back and forth between metrics, logs and traces, what I always call this sort of treasure hunt, and figuring out what was the underlying root cause of an outage or downtime,” Hernandez van Leuffen told me.

He said that this experience led to a couple of key insights about incident response: first, you needed a centralized place to pull all the incident data together, and second, as a distributed team managing a distributed system, you needed to collaborate in real time, often across different time zones.

When he left Oracle in August 2020, he began thinking about giving DevOps teams and SREs the same kind of group editing capabilities that other teams inside an organization have with tools like Google Docs or Notion, and the idea for his new company began to take shape.

What he created with Fiberplane is a collaborative notebook where SREs can pull in various data types and work together to resolve the incident, while keeping a natural audit trail of what happened and how they resolved the issue. Different people can participate in the notebook, just as several people can edit a Google Doc, fulfilling that original vision.

Fiberplane incident response notebook with various types of data about the incident.

Fiberplane collaborative notebook example with multiple people involved. Image Credit: Fiberplane

He doesn’t plan to stop there though. The longer-term vision is an operational platform for SREs and DevOps teams to deal with every aspect of an outage. “This is our starting point, but we are planning to expand from there as more I would say an SRE workbench, where you’re also able to command and control your infrastructure,” he said.

Today the company has 13 employees and is growing, and as it grows, it is exploring ways to build a diverse company, looking at concrete strategies to find more diverse candidates.

“To hire diversely, we’re re-examining our top of the funnel processes. Our efforts include posting our jobs in communities of underrepresented people, running our job descriptions through a gender decoder and facilitating a larger time frame for jobs to remain open,” Elena Boroda, marketing manager at Fiberplane, said.

While Hernandez van Leuffen is based in Amsterdam, the company has been hiring people in the U.K., Berlin, Copenhagen and the U.S., he said. The plan is to make Amsterdam the central hub when offices reopen, since the majority of employees are located there.

Jul 08, 2021

Rootly nabs $3.2M seed to build SRE incident management solution inside Slack

As companies look for ways to respond to incidents in their complex microservices-driven software stacks, SREs (site reliability engineers) are left to deal with the issues involved in making everything work and keeping the application up and running. Rootly, a new early-stage startup, wants to help by building an incident-response solution inside Slack.

Today the company emerged from stealth with a $3.2 million seed investment. XYZ Venture Capital led the round with participation from 8VC, Y Combinator and several individual tech executives.

Rootly co-founder and CEO Quentin Rousseau says that he cut his SRE teeth working at Instacart. When he joined in 2015, the company was processing hundreds of orders a day, and when he left in 2018 it was processing thousands. It was his job to make sure the app was up and running for shoppers, consumers and stores even as it scaled.

He said that while he was at Instacart, he learned to see patterns in the way people responded to an issue, and after he left, he began a side project aimed at bringing the incident response process under control inside Slack. He connected with co-founder JJ Tang, who had started at Instacart after Rousseau left in 2018, and the two of them decided to start Rootly to help solve the unique problems that SREs face around incident response.

“Basically we want people to manage and resolve incidents directly in Slack. We don’t want to add another layer of complexity on top of that. We feel like there are already so many tools out there and when things are chaotic and things are on fire, you really want to focus quickly on the resolution part of it. So we’re really trying to be focused on the Slack experience,” Rousseau explained.

The Rootly solution helps SREs quickly connect to their various tools inside Slack, whether that’s Jira, Zendesk, Datadog or PagerDuty, and it compiles an incident report in the background based on the conversation happening in Slack around resolving the incident. That report will help when the team meets for a post-mortem after the issue is resolved.

The company is small at the moment, with fewer than 10 employees, but it plans to hire some engineers and salespeople over the next year as it puts this capital to work.

Tang says that they have built diversity as a core component of the company culture, and it helps that they are working with investor Ross Fubini, managing partner at lead investor XYZ Venture Capital. “That’s also one of the reasons why we picked Ross as our lead investor. [His firm] has probably one of the deepest focuses around [diversity], not only as a fund, but also how they influence their portfolio companies,” he said.

Fubini says building a diverse company has two main parts: creating a system to find diverse pools of talent, and then creating an environment where people from underrepresented groups feel welcome once they are hired.

“One of our early conversations we had with Rootly was how do we both bring a diverse group in and benefit from a diverse set of people, and what’s going to both attract them, and when they come in make them feel like this is a place that they belong,” Fubini explained.

The company is fully remote right now, with Rousseau in San Francisco and Tang in Toronto, and the plan is to remain remote even after offices can fully reopen. It’s worth noting that Rousseau and Tang are members of the current Y Combinator batch.

Dec 07, 2020

Jeli.io announces $4M seed to build incident analysis platform

When one of AWS’s East Coast data centers went down at the end of last month, it had an impact on countless companies relying on its services, including Roku, Adobe and Shipt. Once the incident was resolved, AWS had to analyze what happened. For most companies, that involves manually pulling together information from various internal tools, not a focused incident platform.

Jeli.io wants to change that by providing one central place for incident analysis, and today the company announced a $4 million seed round led by Boldstart Ventures with participation from Harrison Metal and Heavybit.

Jeli CEO and founder Nora Jones knows a thing or two about incident analysis. She helped build the chaos engineering tools at Netflix, and later headed chaos engineering at Slack. While chaos engineering helps simulate possible incidents by stress-testing systems, incidents still happen, of course. She knew that there was a lot to learn from them, but there wasn’t a way to pull together all of the data around an incident automatically. She created Jeli to do that.

“While I was at Netflix pre-pandemic, I discovered the secret that looking at incidents when they happen — like when Netflix goes down, when Slack goes down or when any other organization goes down — that’s actually a catalyst for understanding the delta between how you think your org works and how your org actually works,” Jones told me.

She began to see that there would be great value in trying to figure out the decision-making processes, the people and tools involved and what companies could learn from how they reacted in these highly stressful situations, how they resolved them and what they could do to prevent similar outages from happening again in the future. With no products to help, Jones began building tooling herself at her previous jobs, but she believed there needed to be a broader solution.

“We started Jeli and began building tooling to help engineers by [serving] the insights to help them know where to look after incidents,” she said. They do this by pulling together all of the data from emails, Slack channels, PagerDuty, Zoom recordings, logs and so forth that captured information about the incident, surfacing insights to help understand what happened without having to manually pull all of this information together.

The startup currently has eight employees, with plans to add people across the board in 2021. As she does this, she is cognizant of the importance of building a diverse workforce. “I am extremely committed to diversity and inclusion. It is something that’s been important and a requirement for me from day one. I’ve been in situations in organizations before where I was the only one represented, and I know how that feels. I want to make sure I’m including that from day one because ultimately it leads to a better product,” she said.

The product is currently in private beta, and the company is working with early customers to refine the platform. The plan is to continue inviting companies in the coming months, then open it up more widely sometime next year.

Eliot Durbin, general partner at Boldstart Ventures, says that he began talking to Jones a couple of years ago when she was at Netflix just to learn about this space, and when she was ready to start a company, his firm jumped at the chance to write an early check, even while the startup was pre-revenue.

“When we met Nora we realized that she’s on a lifelong mission to make things much more resilient […]. And we had the benefit of getting to know her for years before she started the company, so it was really a natural continuation to a conversation that we were already in,” Durbin explained.

Feb 10, 2020

Facebook Workplace co-founder launches downtime fire alarm Kintaba

“It’s an open secret that every company is on fire,” says Kintaba co-founder John Egan. “At any given moment something is going horribly wrong in a way that it has never gone wrong before.” Code failure downtimes, server outages and hack attacks plague engineering teams. Yet the tools for waking up the right employees, assembling a team to fix the problem and doing a post-mortem to assess how to prevent it from happening again can be as chaotic as the crisis itself.

Text messages, Slack channels, task managers and Google Docs aren’t sufficient for actually learning from mistakes. Alerting systems like PagerDuty focus on the rapid response, but not the educational process in the aftermath. Finally, there’s a more holistic solution to incident response with today’s launch of Kintaba.

The Kintaba team experienced these pains firsthand while working at Facebook after Egan and Zac Morris’ Y Combinator-backed data transfer startup Caffeinated Mind was acqui-hired in 2012. Years later, when they tried to build a blockchain startup and the whole stack was constantly in flames, they longed for a better incident alert tool. So they built one themselves and named it after the Japanese art of Kintsugi, where gold is used to fill in cracked pottery, “which teaches us to embrace the imperfect and to value the repaired,” Egan says.

With today’s launch, Kintaba offers a clear dashboard where everyone in the company can see what major problems have cropped up, plus who’s responding and how. Kintaba’s live activity log and collaboration space for responders let them debate and analyze their mitigation moves. It integrates with Slack, and lets team members subscribe to different levels of alerts or search through issues with categorized hashtags.

“The ability to turn catastrophes into opportunities is one of the biggest differentiating factors between successful and unsuccessful teams and companies,” says Egan. That’s why Kintaba doesn’t stop when your outage does.

Kintaba Founders (from left): John Egan, Zac Morris and Cole Potrocky

As the fire gets contained, Kintaba provides a rich text editor connected to its dashboard for quickly constructing a post-mortem of what went wrong, why, what fixes were tried, what worked and how to safeguard systems for the future. Its automated scheduling assistant helps teams plan meetings to internalize the post-mortem.

Kintaba’s well-pedigreed team and its approach to an unsexy but critical software-as-a-service category attracted $2.25 million in funding led by New York’s FirstMark Capital.

“All these features add up to Kintaba taking away all the annoying administrative overhead and organization that comes with running a successful modern incident management practice,” says Egan, “so you can focus on fixing the big issues and learning from the experience.”

Egan, Morris and Cole Potrocky met while working at Facebook, which is known for spawning other enterprise productivity startups based on its top-notch internal tools. Facebook co-founder Dustin Moskovitz built a task management system to reduce how many meetings he had to hold, then left to turn that into Asana, which filed to go public this week.

The trio had been working on internal communication and engineering tools as well as the procedures for employing them. “We saw firsthand working at companies like Facebook how powerful those practices can be and wanted to make them easier for anyone to implement without having to stitch a bunch of tools together,” Egan tells me. He stuck around to co-found Facebook’s enterprise collaboration suite Workplace while Potrocky built engineering architecture there and Morris became a mobile security lead at Uber.

Like many blockchain projects, Kintaba’s predecessor, crypto collectibles wallet Vault, proved an engineering nightmare without clear product-market fit. So the team ditched it and pivoted to build out the internal alerting tool they’d been tinkering with. That origin story sounds a lot like Slack’s, which began as a gaming company that pivoted to turn its internal chat tool into a business.

So what’s the difference between Kintaba and just using Slack and email or a monitoring tool like PagerDuty, Splunk’s VictorOps or Atlassian’s OpsGenie? Here’s how Egan breaks down a site downtime situation handled with Kintaba:

You’re on call and your pager is blowing up because all your servers have stopped serving data. You’re overwhelmed and the root cause could be any of the multitude of systems sending you alerts. With Kintaba, you aren’t left to fend for yourself. You declare an incident with high severity and the system creates a collaborative space that automatically adds an experienced IMOC (incident manager on call) along with other relevant on calls. Kintaba also posts in a company-wide incident Slack channel. Now you can work together to solve the problem right inside the incident’s collaborative space or in Slack while simultaneously keeping stakeholders updated by directing them to the Kintaba incident page instead of sending out update emails. Interested parties can get quick info from the stickied comments and #tags. Once the incident is resolved, Kintaba helps you write a postmortem of what went wrong, how it was fixed, and what will be done to prevent it from happening. Kintaba then automatically distributes the postmortem and sets up an incident review on your calendar.

Essentially, instead of one employee panicking about what to do while the team struggles to coordinate across a bunch of fragmented messaging threads, the incident reporting process runs more smoothly and all the discussion happens in Kintaba. And if there’s a security breach that a non-engineer notices, they can launch a Kintaba alert and assemble the legal and PR team to help, too.

Alternatively, Egan describes the downtime fiascoes he’d experience without Kintaba like this:

The on call has to start waking up their management chain to try and figure out who needs to be involved. The team maybe throws a Slack channel together but since there’s no common high severity incident management system and so many teams are affected by the downtime, other teams are also throwing slack channels together, email threads are happening all over the place, and multiple groups of people are trying to solve the problem at once. Engineers begin stepping all over each other and sales teams start emailing managers demanding to know what’s happening. Once the problem is solved, no one thinks to write up a postmortem and even if they do it only gets distributed to a few people and isn’t saved outside that email chain. Managers blame each other and point fingers at people instead of taking a level headed approach to reviewing the process that led to the failure. In short: panic, thrash, and poor communication.

While monitoring apps like PagerDuty can do a good job of indicating there’s a problem, they’re weaker at the collaborative resolution and post-mortem process, and they’re designed just for engineers rather than for everyone, as Kintaba is. Egan says, “It’s kind of like comparing the difference between the warning lights on a piece of machinery and the big red emergency button on a factory floor. We’re the big red button . . . That also means you don’t have to rip out PagerDuty to use Kintaba,” since it can be the trigger that starts the Kintaba flow.

Still, Kintaba will have to prove that it’s much better than a shared Google Doc, and that it’s either an adequate replacement for monitoring solutions or a necessary add-on worth paying $12 per user per month for. PagerDuty’s deeper technical focus helped it go public a year ago, though its stock has since fallen about 60% from its peak, leaving it with a market cap of $1.75 billion. Customers like Dropbox, Zoom and Vodafone rely on its SMS incident alerts, while Kintaba’s integration with Slack might not be enough to rouse coders from their slumber when something catches fire.

If Kintaba can succeed in incident resolution with today’s launch, the four-person team sees adjacent markets in task prioritization, knowledge sharing, observability and team collaboration, though those would pit it against some massive rivals. If it can’t, perhaps Slack or Microsoft Teams could be suitable soft landings for Kintaba, bringing more structured systems for dealing with major screw-ups to their communication platforms.

When asked why he wanted to build a legacy atop software that might seem a bit boring on the surface, Egan concluded that, “Companies using Kintaba should be learning faster than their competitors . . . Everyone deserves to work within a culture that grows stronger through failure.”

Sep 04, 2018

Atlassian acquires OpsGenie, launches Jira Ops for managing incidents

Atlassian today announced the first beta of a new edition of its flagship Jira project and issue tracking tool that is meant to help ops teams handle incidents faster and more efficiently.

Jira Ops integrates with tools like OpsGenie, PagerDuty, xMatters, Statuspage, Slack and others. Many teams already use these tools when their services go down, but Atlassian argues that most companies currently use a rather ad hoc approach to working with them. Jira Ops aims to be the glue that keeps everybody on the same page and provides visibility into ongoing incidents.

Update: after Atlassian announced Jira Ops, it also announced that it has acquired OpsGenie for $295 million.

This is obviously not the first time Atlassian is using Jira to branch out from its core developer audience. Jira Service Desk and Jira Core, for example, aim at a far broader audience. Ops, however, goes after a very specific vertical.

“Service Desk was the first step,” Jens Schumacher, Head of Software Teams at Atlassian, told me. “And we were looking at what are the other verticals that we can attack with Jira.” Schumacher also noted that Atlassian built a lot of tools for its internal ops teams over the years to glue together all the different pieces that are necessary to track and manage incidents. With Jira Ops, the company is essentially turning its own playbook into a product.

In a way, though, using Jira Ops adds yet another piece to the puzzle. Schumacher, however, argues that the idea here is to have a single place to manage the process. “The idea is that when an incident happens, you have a central place where you can go, where you can find out everything about the incident,” he said. “You can see who has been paged and alerted; you can alert more people if you need to right from there; you know what Slack channel the incident is being discussed in.”

Unlike some of Atlassian’s other products, Jira Ops has no self-hosted version planned at the moment. The argument here is pretty straightforward: if your infrastructure goes down, then Jira Ops could also go down, and then you don’t have a tool for managing that downtime.

Jira Ops is now available for free for early access beta users. The company expects to launch version 1.0 in early 2019. By then Atlassian will surely also have figured out a pricing plan, something it didn’t announce today.
