Dec
03
2019
--

AWS announces new enterprise search tool powered by machine learning

Today at AWS re:Invent in Las Vegas, the company announced a new search tool called Kendra, which provides natural language search across a variety of content repositories using machine learning.

Matt Wood, AWS VP of artificial intelligence, said the new search tool uses machine learning, but doesn’t actually require machine learning expertise of any kind. Amazon is taking care of that for customers under the hood.

You start by identifying your content repositories. This could be anything from an S3 storage repository to OneDrive to Salesforce — anywhere you store content. You can use pre-built connectors from AWS, provide your credentials and connect to all of these different tools.

Kendra then builds an index from the content it finds in the connected repositories, and users can begin querying it in natural language. The tool understands concepts like time, so if the question is something like “When is the IT Help Desk open?”, the search engine understands that the question is about time, checks the index and delivers the right information to the user.

The appeal of this search tool is not only that it uses machine learning, but also that it learns from simple user feedback, such as a smiley-face or sad-face emoji, which answers are good and which need improvement. It does this automatically for the search team.
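Kendra's internals aren't described in the announcement, so the following is only a toy illustration of the two ideas above: an index built from connected content, and answers re-ranked by user feedback. It uses plain keyword overlap rather than machine learning, and every name in it (`FeedbackSearch`, `record_feedback`, the sample pages) is hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-case word tokens; a crude stand-in for real language understanding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class FeedbackSearch:
    """Toy keyword index whose ranking is nudged by smiley/sad-face feedback."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = defaultdict(set)  # token -> doc ids
        self.feedback: dict[str, int] = defaultdict(int)    # doc id -> net votes

    def add(self, doc_id: str, text: str) -> None:
        for token in tokenize(text):
            self.index[token].add(doc_id)

    def search(self, query: str) -> list[str]:
        # Base score: how many query tokens each document contains.
        scores: dict[str, int] = defaultdict(int)
        for token in tokenize(query):
            for doc_id in self.index.get(token, ()):
                scores[doc_id] += 1
        # Final score adds net feedback; doc id breaks ties deterministically.
        return sorted(scores, key=lambda d: (scores[d] + self.feedback[d], d), reverse=True)

    def record_feedback(self, doc_id: str, helpful: bool) -> None:
        # A smiley face counts +1, a sad face -1.
        self.feedback[doc_id] += 1 if helpful else -1


if __name__ == "__main__":
    engine = FeedbackSearch()
    engine.add("hours-page", "The IT help desk is open 9am to 5pm")
    engine.add("faq-page", "When is the IT help desk wiki open for edits")
    query = "When is the IT help desk open"
    print(engine.search(query))  # ['faq-page', 'hours-page']
    engine.record_feedback("hours-page", helpful=True)
    engine.record_feedback("faq-page", helpful=False)
    print(engine.search(query))  # ['hours-page', 'faq-page']
```

The FAQ page initially wins on keyword overlap alone; two pieces of feedback are enough to promote the page that actually answers the question.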

Once you have it set up, you can drop the search onto your company intranet or use it internally inside an application, and it behaves as you would expect a search tool to, with features like type-ahead.

Dec
03
2019
--

AWS’ CodeGuru uses machine learning to automate code reviews

AWS today announced CodeGuru, a new machine learning-based service that automates code reviews based on the data the company has gathered from doing code reviews internally.

Developers write the code and simply add CodeGuru to the pull requests. It supports GitHub and CodeCommit, for the time being. CodeGuru uses its knowledge of reviews from Amazon and about 10,000 open-source projects to find issues, then comments on the pull request as needed. It will obviously identify the issues, but it also will suggest remediations and offer links to the relevant documentation.

Encoded in CodeGuru are AWS’s own best practices. Among other things, it also finds concurrency issues, incorrect handling of resources and issues with input validation.
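As an illustration of the "incorrect handling of resources" category, here is the kind of pattern an automated reviewer typically flags, alongside the usual remediation. This is a generic Python example chosen for this write-up, not actual CodeGuru output; CodeGuru's language support and rule set may differ, and the function names are hypothetical.

```python
import os
import tempfile


def read_config_leaky(path: str) -> str:
    # Flagged pattern: nothing closes the file explicitly. It is released only
    # when the object is garbage-collected, which is implementation-dependent
    # and can be delayed if read() raises.
    f = open(path, encoding="utf-8")
    return f.read()


def read_config(path: str) -> str:
    # Remediation: the with-statement closes the file on every exit path,
    # including when an exception is raised.
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False, encoding="utf-8") as tmp:
        tmp.write("region=us-east-1\n")
    try:
        print(read_config(tmp.name))
    finally:
        os.remove(tmp.name)
```

Both functions return the same content in the happy path; the difference only shows up under errors or on runtimes without reference counting, which is exactly why such issues are easy to miss in manual review.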

AWS and Amazon’s consumer side have used the profiler part of CodeGuru for the last few years to find the “most expensive line of code.” Over that time, even as some of the company’s applications grew, some teams were able to increase their CPU utilization by more than 325% at 36% lower cost.
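CodeGuru Profiler is a managed service, so it can't be reproduced locally, but the underlying idea of letting a profiler tell you where the time goes can be sketched with Python's built-in `cProfile` and `pstats` modules. This is a swapped-in local technique: it works at function rather than line granularity, `get_stats_profile()` requires Python 3.9+, and the workload functions are hypothetical.

```python
import cProfile
import pstats


def slow_path(n: int) -> int:
    total = 0
    for i in range(n):  # deliberately expensive pure-Python loop
        total += i * i
    return total


def fast_path(n: int) -> int:
    return n * (n - 1) // 2  # closed-form, effectively free


def workload() -> None:
    slow_path(300_000)
    fast_path(300_000)


def most_expensive_function() -> str:
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    stats = pstats.Stats(profiler).get_stats_profile()
    # Rank by self time ("tottime"): time spent in each function's own body,
    # excluding callees, so workload() itself does not win.
    return max(stats.func_profiles, key=lambda name: stats.func_profiles[name].tottime)


if __name__ == "__main__":
    print(most_expensive_function())
```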

Dec
03
2019
--

AWS AutoPilot gives you more visible AutoML in SageMaker Studio

Today at AWS re:Invent in Las Vegas, the company announced AutoPilot, a new tool that gives you greater visibility into automated machine learning model creation, known as AutoML. This new tool is part of the new SageMaker Studio also announced today.

As AWS CEO Andy Jassy pointed out onstage today, one of the problems with AutoML is that it’s basically a black box. If you want to improve a mediocre model, or just evolve it for your business, you have no idea how it was built.

The idea behind AutoPilot is to give you the ease of model creation you get from an AutoML-generated model, but also give you much deeper insight into how the system built the model. “AutoPilot is a way to create a model automatically, but give you full visibility and control,” Jassy said.

“Using a single API call, or a few clicks in Amazon SageMaker Studio, SageMaker Autopilot first inspects your data set, and runs a number of candidates to figure out the optimal combination of data preprocessing steps, machine learning algorithms and hyperparameters. Then, it uses this combination to train an Inference Pipeline, which you can easily deploy either on a real-time endpoint or for batch processing. As usual with Amazon SageMaker, all of this takes place on fully-managed infrastructure,” the company explained in a blog post announcing the new feature.

You can look at the model’s parameters and see 50 automatically generated models, and the tool provides a leaderboard showing which models performed best. What’s more, you can look at each model’s underlying notebook and see what trade-offs were made to generate the best model. For instance, a model may be the most accurate but sacrifice speed to get there.
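Autopilot's internals aren't published in this announcement, but the general pattern described above (try combinations of preprocessing, algorithm and hyperparameters, then rank them on a leaderboard) can be sketched in plain Python. Everything here is a hypothetical stand-in: a toy k-nearest-neighbours classifier, two preprocessing options, and a made-up dataset. Elapsed time is recorded so speed/accuracy trade-offs are visible.

```python
import itertools
import math
import time

# Hypothetical toy data: (feature_1, feature_2) -> label. Feature 2 has a much
# larger scale than feature 1, so the preprocessing choice matters.
TRAIN = [((1.0, 100.0), 0), ((1.2, 300.0), 0), ((0.8, 200.0), 0),
         ((3.0, 150.0), 1), ((3.2, 250.0), 1), ((2.9, 120.0), 1)]
VALID = [((1.1, 280.0), 0), ((0.9, 130.0), 0), ((3.1, 290.0), 1), ((2.8, 110.0), 1)]


def fit_identity(xs):
    return lambda x: x


def fit_min_max(xs):
    # Learn per-feature min/max on training data only, then rescale to [0, 1].
    lows = [min(col) for col in zip(*xs)]
    highs = [max(col) for col in zip(*xs)]
    return lambda x: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(x, lows, highs))


def knn_predict(train, x, k):
    # Majority label among the k nearest training points (Euclidean distance).
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def run_candidates():
    board = []
    preprocessors = [("none", fit_identity), ("min-max", fit_min_max)]
    for (prep_name, fit), k in itertools.product(preprocessors, [1, 3, 5]):
        start = time.perf_counter()
        transform = fit([x for x, _ in TRAIN])
        train = [(transform(x), y) for x, y in TRAIN]
        correct = sum(knn_predict(train, transform(x), k) == y for x, y in VALID)
        board.append({
            "preprocessing": prep_name,
            "k": k,
            "accuracy": correct / len(VALID),
            "seconds": time.perf_counter() - start,
        })
    # Leaderboard: best accuracy first; the faster candidate wins ties.
    return sorted(board, key=lambda r: (-r["accuracy"], r["seconds"]))


if __name__ == "__main__":
    for row in run_candidates():
        print(row)
```

A real AutoML system searches far larger spaces and uses proper cross-validation; the point here is only that every candidate stays inspectable, which is the visibility the article describes.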

Your company may have its own set of unique requirements and you can choose the best model based on whatever parameters you consider to be most important, even though it was generated in an automated fashion.

Once you have the model you like best, you can go into SageMaker Studio, select it and launch it with a single click. The tool is available now.

Dec
03
2019
--

AWS speeds up Redshift queries 10x with AQUA

At its re:Invent conference, AWS CEO Andy Jassy today announced the launch of AQUA (the Advanced Query Accelerator) for Amazon Redshift, the company’s data warehousing service. As Jassy noted in his keynote, it’s hard to scale data warehouses when you want to do analytics over that data. At some point, as your data warehouse or lake grows, the data starts overwhelming your network or available compute, even with today’s high-speed networks and chips. To handle this, AQUA acts as a hardware-accelerated cache, and AWS promises up to 10x better query performance than competing cloud-based data warehouses.

“Think about how much data you have to move over the network to get to your compute,” Jassy said. And if that’s not a problem for a company today, he added, it will likely become one soon, given how much data most enterprises now generate.

With this, Jassy explained, you’re bringing the compute power you need directly to the storage layer. The cache sits on top of Amazon’s standard S3 service and can therefore scale out across as many nodes as needed.
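The case for bringing compute to the storage layer comes down to how much data crosses the network. The toy comparison below, which is not how AQUA is implemented, counts rows shipped from hypothetical storage nodes to a compute node when a filter runs at the compute layer versus next to the data.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    rows: list[dict]

    def scan(self) -> list[dict]:
        # Ship every row to the caller; filtering happens later, at compute.
        return list(self.rows)

    def scan_where(self, predicate) -> list[dict]:
        # Evaluate the filter next to the data and ship only the matches.
        return [r for r in self.rows if predicate(r)]


def query(nodes, predicate, pushdown: bool):
    """Return (matching rows, number of rows sent over the 'network')."""
    shipped, result = 0, []
    for node in nodes:
        batch = node.scan_where(predicate) if pushdown else node.scan()
        shipped += len(batch)
        result.extend(r for r in batch if predicate(r))
    return result, shipped


if __name__ == "__main__":
    # Three hypothetical nodes, 1,000 rows each; 10% of rows match the filter.
    nodes = [
        StorageNode([{"id": n * 1000 + i, "region": "eu" if i % 10 == 0 else "us"}
                     for i in range(1000)])
        for n in range(3)
    ]
    is_eu = lambda r: r["region"] == "eu"
    _, moved_without = query(nodes, is_eu, pushdown=False)
    _, moved_with = query(nodes, is_eu, pushdown=True)
    print(moved_without, moved_with)  # 3000 300
```

Both strategies return the same rows; only the volume moved differs, and the gap grows with table size and filter selectivity.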

AWS designed its own analytics processors to power this service and accelerate the data compression and encryption on the fly.

Unsurprisingly, the service is also 100% compatible with the current version of Redshift.

In addition, AWS also today announced next-generation compute instances for Redshift, the RA3 instances, with 48 vCPUs and 384GiB of memory and up to 64 TB of storage. You can build clusters of these with up to 128 instances.

Dec
02
2019
--

New Amazon tool simplifies delivery of containerized machine learning models

As part of the flurry of announcements coming this week out of AWS re:Invent, Amazon announced the release of Amazon SageMaker Operators for Kubernetes, a way for data scientists and developers to simplify training, tuning and deploying containerized machine learning models.

Packaging machine learning models in containers can help put them to work inside organizations faster, but getting there often requires a lot of extra management to make it all work. Amazon SageMaker Operators for Kubernetes is supposed to make it easier to run and manage those containers, the underlying infrastructure needed to run the models and the workflows associated with all of it.

“While Kubernetes gives customers control and portability, running ML workloads on a Kubernetes cluster brings unique challenges. For example, the underlying infrastructure requires additional management such as optimizing for utilization, cost and performance; complying with appropriate security and regulatory requirements; and ensuring high availability and reliability,” AWS’ Aditya Bindal wrote in a blog post introducing the new feature.

When you combine that with the workflows associated with delivering a machine learning model inside an organization at scale, it becomes part of a much bigger delivery pipeline, one that is challenging to manage across departments and a variety of resource requirements.

This is precisely what Amazon SageMaker Operators for Kubernetes has been designed to help DevOps teams do. “Amazon SageMaker Operators for Kubernetes bridges this gap, and customers are now spared all the heavy lifting of integrating their Amazon SageMaker and Kubernetes workflows. Starting today, customers using Kubernetes can make a simple call to Amazon SageMaker, a modular and fully-managed service that makes it easier to build, train, and deploy machine learning (ML) models at scale,” Bindal wrote.
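With the operators installed, a SageMaker training job is described as a Kubernetes custom resource and submitted with `kubectl apply`. The sketch below builds such a manifest in Python and prints it as JSON, which `kubectl` accepts as well as YAML. The `apiVersion`, `kind` and field names follow AWS's published XGBoost sample as best recalled here and should be checked against the operator's documentation; the role ARN, bucket, image URI, job name and hyperparameter are placeholders.

```python
import json


def training_job_manifest(name, role_arn, region, image, bucket):
    """Build a TrainingJob custom resource (field names assumed from AWS's sample)."""
    return {
        "apiVersion": "sagemaker.aws.amazon.com/v1",
        "kind": "TrainingJob",
        "metadata": {"name": name},
        "spec": {
            "roleArn": role_arn,
            "region": region,
            "algorithmSpecification": {"trainingImage": image, "trainingInputMode": "File"},
            # Algorithm-specific settings; values are passed as strings.
            "hyperParameters": [{"name": "num_round", "value": "10"}],
            "inputDataConfig": [{
                "channelName": "train",
                "dataSource": {"s3DataSource": {
                    "s3DataType": "S3Prefix",
                    "s3Uri": f"s3://{bucket}/{name}/train/",
                    "s3DataDistributionType": "FullyReplicated",
                }},
                "contentType": "text/csv",
            }],
            "outputDataConfig": {"s3OutputPath": f"s3://{bucket}/{name}/models/"},
            # SageMaker provisions this capacity for the job and releases it afterwards.
            "resourceConfig": {"instanceCount": 1, "instanceType": "ml.m4.xlarge", "volumeSizeInGB": 5},
            "stoppingCondition": {"maxRuntimeInSeconds": 86400},
        },
    }


if __name__ == "__main__":
    manifest = training_job_manifest(
        name="xgboost-demo",                                      # placeholder
        role_arn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
        region="us-east-1",
        image="<xgboost-image-uri-for-your-region>",              # placeholder
        bucket="my-example-bucket",                               # placeholder
    )
    # Save the output to job.json, then submit it with: kubectl apply -f job.json
    print(json.dumps(manifest, indent=2))
```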

The promise of Kubernetes is that it can orchestrate the delivery of containers at the right moment, but if you haven’t automated delivery of the underlying infrastructure, you can over- or under-provision and fail to supply the resources required to run the job. That’s where this new tool, combined with SageMaker, can help.

“With workflows in Amazon SageMaker, compute resources are pre-configured and optimized, only provisioned when requested, scaled as needed, and shut down automatically when jobs complete, offering near 100% utilization,” Bindal wrote.

Amazon SageMaker Operators for Kubernetes are available today in select AWS regions.

Nov
27
2019
--

Box looks to balance growth and profitability as it matures

Prevailing wisdom states that as an enterprise SaaS company evolves, there’s a tendency to sacrifice profitability for growth — understandably so, especially in the early days of the company. At some point, however, a company needs to become profitable.

Box has struggled to reach that goal since going public in 2015, but yesterday, it delivered a mostly positive earnings report. Wall Street seemed to approve, with the stock up 6.75% as we published this article.

Box CEO Aaron Levie says the goal moving forward is to find better balance between growth and profitability. In his post-report call with analysts, Levie pointed to some positive numbers.

“As we shared in October [at BoxWorks], we are focused on driving a balance of long-term growth and improved profitability as measured by the combination of revenue growth plus free cash flow margin. On this combined metric, we expect to deliver a significant increase in FY ’21 to at least 25% and eventually reaching at least 35% in FY ’23,” Levie said.
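The combined metric Levie describes is the sum of two percentages: year-over-year revenue growth and free cash flow margin (free cash flow divided by revenue). A quick sketch, with purely hypothetical figures rather than Box's reported numbers:

```python
def growth_plus_fcf_margin(revenue_prev: float, revenue_now: float, free_cash_flow: float) -> float:
    """Revenue growth % plus free-cash-flow margin %, in percentage points."""
    growth_pct = (revenue_now / revenue_prev - 1) * 100
    fcf_margin_pct = free_cash_flow / revenue_now * 100
    return growth_pct + fcf_margin_pct


# Hypothetical company: revenue grows from $500M to $560M (12% growth) and
# it generates $70M of free cash flow (12.5% margin).
print(round(growth_plus_fcf_margin(500, 560, 70), 1))  # 24.5
```

Because the two terms are simply added, a company can hit the same target with faster growth and thinner margins or the reverse, which is the balance Levie says Box is aiming for.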

Growing the platform

Part of the maturation and drive to profitability is spurred by the fact that Box now has a more complete product platform. While many struggle to understand the company’s business model, it provides content management in the cloud and is modernizing that aspect of enterprise software. As a result, there are few pure-play content management vendors that can do what Box does in a cloud context.

Nov
26
2019
--

New Amazon capabilities put machine learning in reach of more developers

Today, Amazon announced a new approach that it says will put machine learning technology in reach of more developers and line of business users. Amazon has been making a flurry of announcements ahead of its re:Invent customer conference next week in Las Vegas.

While the company offers plenty of tools for data scientists to build machine learning models and to process, store and visualize data, it wants to put that capability directly in the hands of developers with the help of the popular database query language, SQL.

By taking advantage of tools like Amazon QuickSight, Aurora and Athena in combination with SQL queries, developers can have much more direct access to machine learning models and underlying data without any additional coding, says VP of artificial intelligence at AWS, Matt Wood.

“This announcement is all about making it easier for developers to add machine learning predictions to their products and their processes by integrating those predictions directly with their databases,” Wood told TechCrunch.

For starters, Wood says developers can take advantage of Aurora, the company’s MySQL (and Postgres)-compatible database to build a simple SQL query into an application, which will automatically pull the data into the application and run whatever machine learning model the developer associates with it.

The second piece involves Athena, the company’s serverless query service. As with Aurora, developers can write a SQL query — in this case, against any data store — and based on a machine learning model they choose, return a set of data for use in an application.

The final piece is QuickSight, Amazon’s data visualization tool. After returning a set of data with one of the other tools, developers can use it to build visualizations inside whatever application they are creating.

“By making sophisticated ML predictions more easily available through SQL queries and dashboards, the changes we’re announcing today help to make ML more usable and accessible to database developers and business analysts. Now anyone who can write SQL can make — and importantly use — predictions in their applications without any custom code,” Amazon’s Matt Asay wrote in a blog post announcing these new capabilities.

Asay added that this approach is far easier than what developers had to do in the past to achieve this. “There is often a large amount of fiddly, manual work required to take these predictions and make them part of a broader application, process or analytics dashboard,” he wrote.

As an example, Wood offers a lead-scoring model you might use to pick the most likely sales targets to convert. “Today, in order to do lead scoring you have to go off and wire up all these pieces together in order to be able to get the predictions into the application,” he said. With this new capability, you can get there much faster.

“Now, as a developer I can just say that I have this lead scoring model which is deployed in SageMaker, and all I have to do is write literally one SQL statement that I do all day long into Aurora, and I can start getting back that lead scoring information. And then I just display it in my application and away I go,” Wood explained.
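Aurora's actual mechanism (a SQL function backed by a SageMaker endpoint) can't be run locally, but the developer experience Wood describes, getting model output from one ordinary SQL statement, can be imitated with Python's built-in `sqlite3` module by registering a Python function as a SQL function. This is a stand-in, not Aurora syntax: the `lead_score` function is a hypothetical hand-written substitute for a deployed model, and the table and column names are made up.

```python
import sqlite3


def lead_score(visits: int, opened_emails: int) -> float:
    # Hypothetical stand-in for a deployed model: a fixed linear score capped at 1.0.
    return min(1.0, 0.05 * visits + 0.1 * opened_emails)


conn = sqlite3.connect(":memory:")
conn.create_function("lead_score", 2, lead_score)  # expose the "model" to SQL
conn.execute("CREATE TABLE leads (name TEXT, visits INTEGER, opened_emails INTEGER)")
conn.executemany(
    "INSERT INTO leads VALUES (?, ?, ?)",
    [("Acme", 4, 3), ("Globex", 12, 6), ("Initech", 1, 0)],
)

# The one SQL statement: predictions come back alongside ordinary columns.
rows = conn.execute(
    "SELECT name, lead_score(visits, opened_emails) AS score "
    "FROM leads ORDER BY score DESC"
).fetchall()
for name, score in rows:
    print(f"{name}: {score:.2f}")  # Globex: 1.00, Acme: 0.50, Initech: 0.05
```

The application code never touches the model directly; it only issues a query, which is the simplification the announcement is selling.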

As for the machine learning models, these can come pre-built from Amazon, be developed by an in-house data science team or purchased in a machine learning model marketplace on Amazon, says Wood.

Today’s announcements from Amazon are designed to simplify machine learning and data access, and to reduce the amount of coding needed to get from query to answer.

Nov
26
2019
--

Comparing S3 Streaming Tools with Percona XtraBackup

Making backups over the network can be done in two ways: save to disk first and then transfer, or transfer without saving locally. Both ways have their strengths and weaknesses. The second way in particular depends heavily on upload speed, which directly determines how long the backup takes. Other factors that influence it are chunk size and the number of upload threads.

Percona XtraBackup 2.4.14 has gained S3 streaming, which is the capability to upload backups directly to S3-compatible storage without saving them locally first. This feature was developed because we wanted to improve the upload speeds of backups in Percona Operator for XtraDB Cluster.

There are many implementations of S3-compatible storage: AWS S3, Google Cloud Storage, DigitalOcean Spaces, Alibaba Cloud OSS, MinIO, and Wasabi.

We’ve measured the speed of AWS CLI, gsutil, MinIO client, rclone, gof3r and the xbcloud tool (part of Percona XtraBackup) on AWS (in single- and multi-region setups) and on Google Cloud. XtraBackup was compared in two variants: a default configuration and one with a tuned chunk size and number of upload threads.

Here are the results.

AWS (Same Region)

The backup data was streamed from the AWS EC2 instance to the AWS S3, both in the us-east-1 region.

Tool          Settings                  CPU    Max mem    Speed        Speed vs. baseline
AWS CLI       default settings          66%    149 MB     130 MiB/s    baseline
AWS CLI       10 MB chunks, 16 threads  68%    169 MB     141 MiB/s    +8%
MinIO client  not changeable            10%    679 MB     59 MiB/s     -55%
rclone rcat   not changeable            102%   7138 MB    139 MiB/s    +7%
gof3r         default settings          69%    252 MB     97 MiB/s     -25%
gof3r         10 MB chunks, 16 threads  77%    520 MB     108 MiB/s    -17%
xbcloud       default settings          10%    96 MB      25 MiB/s     -81%
xbcloud       10 MB chunks, 16 threads  60%    185 MB     134 MiB/s    +3%

Tip: If you run MySQL on an EC2 instance and only need backups within one region, take snapshots instead.

AWS (From US to EU)

The backup data was streamed from AWS EC2 in us-east-1 to AWS S3 in eu-central-1.

Tool          Settings                  CPU    Max mem    Speed        Speed vs. baseline
AWS CLI       default settings          31%    149 MB     61 MiB/s     baseline
AWS CLI       10 MB chunks, 16 threads  33%    169 MB     66 MiB/s     +8%
MinIO client  not changeable            3%     679 MB     20 MiB/s     -67%
rclone rcat   not changeable            55%    9307 MB    77 MiB/s     +26%
gof3r         default settings          69%    252 MB     97 MiB/s     +59%
gof3r         10 MB chunks, 16 threads  77%    520 MB     108 MiB/s    +77%
xbcloud       default settings          4%     96 MB      10 MiB/s     -84%
xbcloud       10 MB chunks, 16 threads  59%    417 MB     123 MiB/s    +101%

Tip: Think about disaster recovery and what you will do when a whole region is unavailable. It makes no sense to back up to the same region; always transfer backups to another region.

Google Cloud (From US to EU)

The backup data was streamed from a Compute Engine instance in us-east1 to Cloud Storage in europe-west3. Interestingly, Google Cloud Storage supports both its native protocol and the S3 (interoperability) API, so Percona XtraBackup can transfer data to Google Cloud Storage directly via the S3 API.

Tool          Settings                               CPU    Max mem    Speed        Speed vs. baseline
gsutil        not changeable, native protocol        8%     246 MB     23 MiB/s     baseline
rclone rcat   not changeable, native protocol        6%     61 MB      16 MiB/s     -30%
xbcloud       default settings, S3 protocol          3%     97 MB      9 MiB/s      -61%
xbcloud       10 MB chunks, 16 threads, S3 protocol  50%    417 MB     133 MiB/s    +478%

Tip: A cloud provider can block your account for many reasons, such as human or automated mistakes, abuse reports about inappropriate content after your account is hacked, an expired credit card, sanctions, etc. Think about disaster recovery and what you will do if a cloud provider blocks your account; it may make sense to back up to another cloud provider or on-premises.
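Using the Google Cloud figures above, the "speed vs. baseline" column is each tool's throughput relative to the baseline row, and the same throughput translates directly into a backup-window estimate. A small sketch, assuming a hypothetical 100 GiB backup and constant throughput for the whole transfer:

```python
def relative_speed_pct(speed: float, baseline: float) -> float:
    """Percentage difference versus the baseline throughput."""
    return (speed / baseline - 1) * 100


def transfer_minutes(size_gib: float, speed_mib_s: float) -> float:
    """Minutes to stream size_gib at a constant speed_mib_s (1 GiB = 1024 MiB)."""
    return size_gib * 1024 / speed_mib_s / 60


# Google Cloud, US to EU: gsutil (baseline) vs. tuned xbcloud over the S3 API.
print(round(relative_speed_pct(133, 23)))       # 478
print(round(transfer_minutes(100, 23), 1))      # 74.2
print(round(transfer_minutes(100, 133), 1))     # 12.8
```

In other words, for this hypothetical backup, tuning chunk size and thread count shrinks the cross-region window from over an hour to under a quarter of an hour.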

Conclusion

With tuned settings, the xbcloud tool (part of Percona XtraBackup) is 2-5 times faster than the native cloud vendor tools over long distances, and it is 14% faster and uses 20% less memory than gof3r with the same settings. xbcloud is also the most reliable tool for transferring backups to S3-compatible storage, for two reasons:

  • It calculates MD5 checksums during upload, stores them in a .md5/filename.md5 file, and verifies them on download (gof3r does the same).
  • xbcloud sends data in 10 MB chunks and resends a chunk if a network failure occurs.
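The two reliability features listed above, per-chunk checksums and resending failed chunks, together with the chunk-size and thread-count tuning from the tables, can be sketched as follows. This is not xbcloud's code: `FlakyStore` is a hypothetical in-memory stand-in for an S3-compatible bucket that fails the first upload of every third chunk, and the chunk size is scaled down so the example runs instantly.

```python
import hashlib
import io
import threading
from concurrent.futures import ThreadPoolExecutor


class FlakyStore:
    """Hypothetical in-memory object store: the first PUT of every third chunk fails."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self._failed: set[str] = set()
        self._lock = threading.Lock()

    def put(self, key: str, data: bytes) -> None:
        with self._lock:
            chunk_no = int(key.rsplit(".", 1)[1])
            if chunk_no % 3 == 0 and key not in self._failed:
                self._failed.add(key)
                raise ConnectionError(f"simulated network failure on {key}")
            self.objects[key] = data


def upload_chunk(store: FlakyStore, key: str, data: bytes, max_attempts: int = 5) -> str:
    """Send one chunk, resending it on network errors; return its MD5 hex digest."""
    for attempt in range(1, max_attempts + 1):
        try:
            store.put(key, data)
            return hashlib.md5(data).hexdigest()
        except ConnectionError:
            if attempt == max_attempts:
                raise


def stream_upload(store, name, stream, chunk_size, threads):
    """Cut a stream into fixed-size chunks and upload them in parallel.

    Returns {object_key: md5}. A real tool would also cap how many chunks are
    held in memory at once; this sketch submits each chunk as soon as it is read.
    """
    futures = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        index = 0
        while chunk := stream.read(chunk_size):
            key = f"{name}.{index:08d}"
            futures[key] = pool.submit(upload_chunk, store, key, chunk)
            index += 1
    # Leaving the with-block waits for all uploads; result() re-raises failures.
    return {key: fut.result() for key, fut in futures.items()}


def verify(store, checksums):
    """Download-side check: recompute each chunk's MD5 and compare."""
    return all(hashlib.md5(store.objects[k]).hexdigest() == v for k, v in checksums.items())


if __name__ == "__main__":
    payload = bytes(range(256)) * 40  # 10,240 bytes of stand-in backup data
    store = FlakyStore()
    sums = stream_upload(store, "backup.xbstream", io.BytesIO(payload), chunk_size=1024, threads=4)
    restored = b"".join(store.objects[k] for k in sorted(sums))
    print(len(sums), restored == payload, verify(store, sums))  # 10 True True
```

Retrying at chunk granularity means a network error costs one chunk's worth of re-transfer rather than the whole backup, and the zero-padded index keeps the chunks in order when they are reassembled.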

PS: Please find instructions on GitHub if you would like to reproduce this article’s results.

Nov
26
2019
--

Coralogix announces $10M Series A to bring more intelligence to logging

Coralogix, a startup that wants to bring automation and intelligence to logging, announced a $10 million Series A investment today.

The round was led by Aleph with participation from StageOne Ventures, Janvest Capital Partners and 2B Angels. Today’s investment brings the total raised to $16.2 million, according to the company.

CEO and co-founder Ariel Assaraf says his company focuses on two main areas: logging and analysis. The startup has been doing traditional application performance monitoring up until now, but today it also announced it is getting into security logging, where it tracks logs for anomalies and shares this information with security information and event management (SIEM) tools.

“We do standard log analytics in terms of ingesting, parsing, visualizing, alerting and searching for log data at scale using scaled, secure infrastructure,” Assaraf said. In addition, the company has developed a set of algorithms to analyze the data, and begin to understand patterns of expected behavior, and how to make use of that data to recognize and solve problems in an automated fashion.

“So the idea is to generally monitor a system automatically for customers plus giving them the tools to quickly drill down into data, understand how it behaves and get context to the issues that they see,” he said.

For instance, the tool could learn that a certain sequence of events, such as a user logging in, the system authenticating that user and then redirecting him or her to the application or website, happens the same way every time. If something deviates from that pattern, the system recognizes it and alerts the DevOps team that something is amiss.
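Coralogix hasn't described its algorithms in detail, so the following is only a minimal illustration of the general idea: learn which event-to-event transitions occur in normal log sequences, then flag any new sequence containing a transition never seen before. The event names and the transition-pair approach are assumptions made for this sketch.

```python
def learn_transitions(sequences):
    """Collect every (event, next_event) pair seen in known-good sequences."""
    seen = set()
    for seq in sequences:
        seen.update(zip(seq, seq[1:]))
    return seen


def unexpected_transitions(seq, seen):
    """Return the transitions in seq that never appeared during learning."""
    return [pair for pair in zip(seq, seq[1:]) if pair not in seen]


normal = [
    ["login", "authenticate", "redirect"],
    ["login", "authenticate", "redirect", "logout"],
]
model = learn_transitions(normal)

print(unexpected_transitions(["login", "authenticate", "redirect"], model))  # []
print(unexpected_transitions(["login", "redirect"], model))  # [('login', 'redirect')]
```

The second sequence skips authentication entirely, so the unseen `login -> redirect` transition is exactly the kind of deviation worth surfacing to a DevOps team.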

The company, which has offices in Tel Aviv, San Francisco and Kiev, was founded in 2015. It already has 1,500 customers, including Postman, Fiverr, KFC and Caesars Palace. It has built the company with just 30 people so far, but wants to expand the sales and marketing team to grow the customer base further. The new money should help in that regard.

Nov
26
2019
--

Vivun snags $3M seed round to bring order to pre-sales

Vivun, a startup that wants to help companies keep better track of pre-sales data, announced a $3 million seed round today led by Unusual Ventures, the venture firm run by Harness CEO Jyoti Bansal.

Vivun founder and CEO Matt Darrow says that pre-sales teams work more closely with the customer than anyone else, delivering demos and proofs of concept, and generally helping sales get deals over the finish line. While sales has CRM to store knowledge about the customer, pre-sales has lacked a tool to track information about its interactions with customers, and that’s what his company built.

“The main problem that we solve is we give technology to those pre-sales leaders to run and operate their teams, but then take those insights from the group that knows more about the technology and the customer than anybody else, and we deliver that across the organization to the product team, sales team and executive staff,” Darrow explained.

Darrow is a Zuora alumnus, and his story is similar to that of Zuora founder Tien Tzuo, who built the first billing system for Salesforce, then founded Zuora to build a subscription billing system for everyone else. Similarly, Darrow built a pre-sales tool for Zuora after finding there wasn’t anything else out there devoted specifically to tracking that kind of information.

“At Zuora, I had to build everything from scratch. After the IPO, I realized that this is something that every tech company can take advantage of because every technology company will really need this role to be of high value and impact,” he said.

The company not only tracks information via a mobile app and browser tool, but also provides a reporting dashboard to help companies understand and share what the pre-sales team is hearing from customers. For example, the team might know that x number of customers have been asking for a certain feature, and this information can be organized and passed on to other parts of the company.

Bansal, who was previously CEO and co-founder at AppDynamics, a company he sold to Cisco for $3.7 billion just before its planned IPO in 2017, saw Vivun filling a big hole in the enterprise software ecosystem. He is not just an investor; he’s also a customer.

“To be successful, a technology company needs to understand three things: where it will be in five years, what its customers need right now, and what the market wants that it’s not currently providing. Pre-sales has answers to all three questions and is a strategically important department that needs management, analytics, and tools for accelerating deals. Yet, no one was making software for this critical department until Vivun,” he said in a statement.

The company was founded in 2018 and has been bootstrapped until now. It spent the first year building out the product. Today, the company has 20 customers including SignalFx (acquired by Splunk in August for $1.05 billion) and Harness.