Jul 31, 2019

Calling all hardware startups! Apply to Hardware Battlefield @ TC Shenzhen

Got hardware? Well then, listen up, because our search continues for boundary-pushing, early-stage hardware startups to join us in Shenzhen, China for an epic opportunity: launch your startup on a global stage and compete in Hardware Battlefield at TC Shenzhen on November 11-12.

Apply here to compete in TC Hardware Battlefield 2019. Why? It’s your chance to demo your product to the top investors and technologists in the world. Hardware Battlefield, cousin to Startup Battlefield, focuses exclusively on innovative hardware because, let’s face it, it’s the backbone of technology. From enterprise solutions to agtech advancements, medical devices to consumer product goods — hardware startups are in the international spotlight.

If you make the cut, you’ll compete against 15 of the world’s most innovative hardware makers for bragging rights, plenty of investor love, media exposure and $25,000 in equity-free cash. Just participating in a Battlefield can change the whole trajectory of your business in the best way possible.

We chose to bring our fifth Hardware Battlefield to Shenzhen because of its outstanding track record of supporting hardware startups. The city achieves this through a combination of accelerators, rapid prototyping and world-class manufacturing. What’s more, TC Hardware Battlefield 2019 takes place as part of the larger TechCrunch Shenzhen that runs November 9-12.

Creativity and innovation know no boundaries, and that’s why we’re opening this competition to any early-stage hardware startup from any country. While we’ve seen amazing hardware in previous Battlefields — like robotic arms, food testing devices, malaria diagnostic tools, smart socks for diabetics and e-motorcycles — we can’t wait to see the next generation of hardware, so bring it on!

Meet the minimum requirements listed below, and we’ll consider your startup:

Here’s how Hardware Battlefield works. TechCrunch editors vet every qualified application and pick 15 startups to compete. Those startups receive six rigorous weeks of free coaching. Forget stage fright. You’ll be prepped and ready to step into the spotlight.

Teams have six minutes to pitch and demo their products, which is immediately followed by an in-depth Q&A with the judges. If you make it to the final round, you’ll repeat the process in front of a new set of judges.

The judges will name one outstanding startup the Hardware Battlefield champion. Hoist the Battlefield Cup, claim those bragging rights and the $25,000. This nerve-wracking thrill-ride takes place in front of a live audience, and we capture the entire event on video and post it to our global audience on TechCrunch.

Hardware Battlefield at TC Shenzhen takes place on November 11-12. Don’t hide your hardware or miss your chance to show us — and the entire tech world — your startup magic. Apply to compete in TC Hardware Battlefield 2019, and join us in Shenzhen!

Is your company interested in sponsoring or exhibiting at Hardware Battlefield at TC Shenzhen? Contact our sponsorship sales team by filling out this form.

Jul 31, 2019

Amazon acquires flash-based cloud storage startup E8 Storage

Amazon has acquired Israeli storage tech startup E8 Storage, as first reported by Reuters, CNBC and Globes and confirmed by TechCrunch. The acquisition will bring the team and technology from E8 into Amazon’s existing Amazon Web Services center in Tel Aviv, per reports.

E8 Storage’s particular focus was on building storage hardware that employs flash-based memory to deliver faster performance than competing offerings, according to its own claims. How exactly AWS intends to use the company’s talent or assets isn’t yet known, but it clearly lines up with its primary business.

AWS acquisitions this year include TSO Logic, a Vancouver-based startup that optimizes data center workload operating efficiency, and Israel-based CloudEndure, which provides data recovery services in the event of a disaster.

Jul 31, 2019

Save with group discounts and bring your team to TechCrunch’s first-ever Enterprise event Sept. 5 in SF

Get ready to dive into the fiercely competitive waters of enterprise software. Join more than 1,000 attendees for TC Sessions Enterprise 2019 on September 5 to navigate this rapidly evolving category with the industry’s brightest minds, biggest names and exciting startups.

Our $249 early-bird ticket price remains in play, which saves you $100. But one is the loneliest number, so why not take advantage of our group discount, buy in bulk and bring your whole team? Save an extra 20% when you buy four or more tickets at once.

We’ve packed this day-long conference with an outstanding lineup of presentations, interviews, panel discussions, demos, breakout sessions and, of course, networking. Check out the agenda, which includes both industry titans and boundary-pushing startups eager to disrupt the status quo.

We’ll add more surprises along the way, but these sessions provide a taste of what to expect — and why you’ll need your posse to absorb as much intel as possible.

Talking Developer Tools
Scott Farquhar (Atlassian)

With tools like Jira, Bitbucket and Confluence, few companies influence how developers work as much as Atlassian. The company’s co-founder and co-CEO Scott Farquhar will join us to talk about growing his company, how it is bringing its tools to enterprises and what the future of software development in and for the enterprise will look like.

Keeping the Enterprise Secure
Martin Casado (Andreessen Horowitz), Wendy Nather (Duo Security), Emily Heath (United Airlines)

Enterprises face a litany of threats from both inside and outside the firewall. Now more than ever, companies — especially startups — have to put security first. From preventing data from leaking to keeping bad actors out of your network, enterprises have it tough. How can you secure the enterprise without slowing growth? We’ll discuss the role of a modern CSO and how to move fast — without breaking things.

Keeping an Enterprise Behemoth on Course
Bill McDermott (SAP)

With over $166 billion in market cap, Germany-based SAP is one of the most valuable tech companies in the world today. Bill McDermott took over leadership in 2014, becoming the first American to hold the position. Since then, he has quickly grown the company, in part thanks to a number of $1 billion-plus acquisitions. We’ll talk to him about his approach to these acquisitions, his strategy for growing the company in a quickly changing market and the state of enterprise software in general.

The Quantum Enterprise
Jim Clarke (Intel), Jay Gambetta (IBM) and Krysta Svore (Microsoft)
4:20 PM – 4:45 PM

While we’re still a few years away from having quantum computers that will fulfill the full promise of this technology, many companies are already starting to experiment with what’s available today. We’ll talk about what startups and enterprises should know about quantum computing today to prepare for tomorrow.

TC Sessions Enterprise 2019 takes place on September 5. You can’t be everywhere at once, so bring your team, cover more ground and increase your ROI. Get your group discount tickets and save.

Jul 31, 2019

Prodly announces $3.5M seed to automate low-code cloud deployments

Low-code programming is supposed to make things easier on companies, right? Low-code means you can count on trained administrators instead of more expensive software engineers to handle most tasks, but like any issue solved by technology, there are always unintended consequences. While running his former company, Steelbrick, which he sold to Salesforce in 2015 for $360 million, Max Rudman identified a persistent problem with low-code deployments. He decided to fix it with automation and testing, and the idea for his latest venture, Prodly, was born.

The company announced a $3.5 million seed round today, but more important than the money is the customer momentum. In spite of being a very early-stage startup, the company already has 100 customers using the product, a testament to the fact that other people were probably experiencing that same pain point Rudman was feeling, and there is a clear market for his idea.

As Rudman learned with his former company, going live with the data on a platform like Salesforce is just part of the journey. If you are updating configuration and pricing information on a regular basis, that means updating all the tables associated with that information. Sure, it’s been designed to be point and click, but if you have changes across 48 tables, it becomes a very tedious task, indeed.

The idea behind Prodly is to automate much of the configuration, provide a testing environment to be sure all the information is correct and, finally, automate deployment. For now, the company is just concentrating on configuration, but with the funding it plans to expand the product to solve the other problems, as well.

Rudman is careful to point out that his company’s solution is not built strictly for the Salesforce platform. The startup is taking aim at Salesforce admins for its first go-round, but he sees the same problem with other cloud services that make heavy use of trained administrators to make changes.

“The plan is to start with Salesforce, but this problem actually exists on most cloud platforms — ServiceNow, Workday — none of them have the tools we have focused on for admins, and making the admins more productive and building the tooling that they need to efficiently manage a complex application,” Rudman told TechCrunch.

Customers include Nutanix, Johnson & Johnson, Splunk, Tableau and Verizon (which owns this publication). The $3.5 million round was led by Shasta Ventures, with participation from Norwest Venture Partners.

Jul 31, 2019

PostgreSQL: Simple C Extension Development for a Novice User (and Performance Advantages)


One of the great features of PostgreSQL is its extensibility. My colleague and senior PostgreSQL developer Ibar has blogged about developing an extension with much broader capabilities, including callback functionality. In this blog post, however, I am addressing a complete novice: a user who has never tried this before but wants to develop a simple function with business logic. Towards the end of the post, I show how lightweight the function is with a simple, repeatable benchmark, which should act as a strong justification for why end users should do this kind of development.

Generally, PostgreSQL and extension developers work on a PostgreSQL source build. For a novice user that may not be required; instead, the dev/devel packages provided for your Linux distro are sufficient. Assuming that you have installed PostgreSQL already, the following steps will get you the additional development libraries required.

On Ubuntu/Debian

$ sudo apt install postgresql-server-dev-11

On RHEL/CentOS

$ sudo yum install postgresql11-devel

The next step is to add the PostgreSQL binary path to your environment, to ensure that pg_config is in the path. On my Ubuntu laptop, it looks like this:

export PATH=/usr/lib/postgresql/11/bin:$PATH

The paths mentioned above may vary according to your environment.

Please make sure that pg_config executes without specifying the full path:

$ pg_config

A PostgreSQL installation provides a build infrastructure for extensions, called PGXS, so that simple extension modules can be built against an already-installed server. It automates common build rules for simple server extension modules.

$ pg_config --pgxs
/usr/lib/postgresql/11/lib/pgxs/src/makefiles/pgxs.mk

Now let’s create a directory for development. I am going to develop a simple extension addme with a function addme to add 2 numbers.

$ mkdir addme

Now we need to create a Makefile which builds the extension. Luckily, we can use all PGXS macros.

MODULES = addme
EXTENSION = addme
DATA = addme--0.0.1.sql
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

MODULES specifies the shared object without the file extension, and EXTENSION specifies the name of the extension. DATA defines the installation script. The reason for specifying --0.0.1 in the name is that it should match the version we specify in the control file.

Now we need a control file addme.control with the following content:

comment = 'Simple number add function'
default_version = '0.0.1'
relocatable = true
module_pathname = '$libdir/addme'

And we can prepare our function in C which will add 2 integers:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(addme);

Datum
addme(PG_FUNCTION_ARGS)
{
int32 arg1 = PG_GETARG_INT32(0);
int32 arg2 = PG_GETARG_INT32(1);

PG_RETURN_INT32(arg1 + arg2);
}

At this stage, we have only 3 files in the directory.

$ ls
addme.c addme.control Makefile

Now we can make the file:

$ make

For installing the extension, we need a SQL file with the CREATE FUNCTION statement. This SQL file name should be the same as the one we specified in the DATA parameter in the Makefile, which is addme--0.0.1.sql.

Add the following content into this file:

CREATE OR REPLACE FUNCTION
addme(int,int) RETURNS int AS 'MODULE_PATHNAME','addme'
LANGUAGE C STRICT;

And install the extension:

$ sudo make install

Now we can proceed to create the extension and test it:

postgres=# create extension addme;
CREATE EXTENSION
postgres=# select addme(2,3);
addme
-------
5
(1 row)

Just like any function, we can use it in queries against multiple tuples.

postgres=# select 7||'+'||g||'='||addme(7,g) from generate_series(1,10) as g;
?column?
----------
7+1=8
7+2=9
7+3=10
7+4=11
7+5=12
7+6=13
7+7=14
7+8=15
7+9=16
7+10=17
(10 rows)

Performance Benchmarking

Now it is important to understand the performance characteristics of calling a C function in an extension. For comparison, we have two alternatives:
1. The ‘+’ operator provided by SQL:

select 1+2;

2. A PL/pgSQL function, as below:

CREATE FUNCTION addmepl(a integer, b integer)
 RETURNS integer
as $$ 
BEGIN
  return a+b;
END;
$$ LANGUAGE plpgsql;

For this test/benchmark, I am going to call the function 1 million times!

SQL + operator

$ time psql -c "select floor(random() * (100-1+1) + 1)::int+g from generate_series(1,1000000) as g" > out.txt

C function call

$ time psql -c "select addme(floor(random() * (100-1+1) + 1)::int,g) from generate_series(1,1000000) as g" > out.txt

PL function call

$ time psql -c "select addmepl(floor(random() * (100-1+1) + 1)::int,g) from generate_series(1,1000000) as g" > out.txt

I performed each test 6 times and tabulated the results below.

[Test run timing results table]

As we can see, the built-in ‘+’ operator and the custom C function in the extension take the least time, with almost identical performance. The PL/pgSQL function call, however, is slower and shows considerable overhead. I hope this justifies why heavily used functions are worth writing as a native C extension.

Jul 31, 2019

Blog Poll: Who’s Responsible for Securing the Data in your Databases?


Next up in line for the blog poll series is a question about database management. Last year, we asked you a few questions in a blog poll and we received a great amount of feedback. We wanted to follow up on some of those same survey questions to see what may have changed over the last 12 months. So with that in mind, we’re hoping you can take a minute or so to answer the first survey question in this series: In your company, who is responsible for securing the data in your databases? DBAs? Security team? Cloud vendor? We would like to know!

Note: There is a poll embedded within this post; please visit the site to participate in this post’s poll.

This poll will be up for one month and will be maintained over in the sidebar should you wish to come back at a later date and take part. We look forward to seeing your responses!

Jul 30, 2019

Using plpgsql_check to Find Compilation Errors and Profile Functions

There is always a need for profiling tools in databases for admins or developers. While it is easy to understand where an SQL statement is spending more time using EXPLAIN or EXPLAIN ANALYZE in PostgreSQL, the same would not work for functions. Recently, Jobin published a blog post where he demonstrated how plprofiler can be useful in profiling functions. plprofiler builds call graphs and creates flame graphs which make the report very easy to understand. Similarly, there is another interesting project called plpgsql_check which can be used for a similar purpose as plprofiler, while it also looks at the code and points out compilation errors. Let us see all of that in action in this blog post.

Installing plpgsql_check

You can use yum on RedHat/CentOS to install this extension from the PGDG repository. Steps to perform a source installation on Ubuntu/Debian are also shown below.

On RedHat/CentOS

$ sudo yum install plpgsql_check_11

On Ubuntu/Debian

$ sudo apt-get install postgresql-server-dev-11 libicu-dev gcc make 
$ git clone https://github.com/okbob/plpgsql_check.git 
$ cd plpgsql_check/ 
$ make && make install

Creating and enabling this extension

There are 3 advantages of using plpgsql_check:

  1. Checking for compilation errors in a function code
  2. Finding dependencies in functions
  3. Profiling functions

When using plpgsql_check for the first 2 requirements, you may not need to add any entry to shared_preload_libraries. However, if you need to use it for profiling functions (3), then you should add appropriate entries to shared_preload_libraries so that it could load both plpgsql and plpgsql_check. Due to dependencies, plpgsql must be before plpgsql_check in the shared_preload_libraries config as you see in the following example:

shared_preload_libraries = 'plpgsql, plpgsql_check'

Any change to shared_preload_libraries requires a restart. You may see the following error when you do not have plpgsql before plpgsql_check in the shared_preload_libraries config.

$ grep "shared_preload_libraries" $PGDATA/postgresql.auto.conf
shared_preload_libraries = 'pg_qualstats, pg_stat_statements, plpgsql_check'

$ pg_ctl -D /var/lib/pgsql/11/data restart -mf
waiting for server to shut down.... done
server stopped
waiting for server to start....2019-07-07 02:07:10.104 EDT [6423] 
FATAL: could not load library "/usr/pgsql-11/lib/plpgsql_check.so": /usr/pgsql-11/lib/plpgsql_check.so: undefined symbol: plpgsql_parser_setup
2019-07-07 02:07:10.104 EDT [6423] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

$ grep "shared_preload_libraries" $PGDATA/postgresql.auto.conf
shared_preload_libraries = 'pg_qualstats, pg_stat_statements, plpgsql, plpgsql_check'

$ pg_ctl -D /var/lib/pgsql/11/data start
.....
server started

Once the PostgreSQL instance is started, create this extension in the database where you need to perform any of the previously discussed 3 tasks.

$ psql -d percona -c "CREATE EXTENSION plpgsql_check"
CREATE EXTENSION

Finding Compilation Errors

As discussed earlier, this extension can help developers and admins find compilation errors. But why is it needed? Let’s consider the following example, where we see no errors while creating the function. By the way, I have taken this example from my previous blog post where I was talking about Automatic Index recommendations using hypopg and pg_qualstats. You might want to read that blog post to understand the logic of the following function.

percona=# CREATE OR REPLACE FUNCTION find_usable_indexes()
percona-# RETURNS VOID AS
percona-# $$
percona$# DECLARE
percona$#     l_queries     record;
percona$#     l_querytext     text;
percona$#     l_idx_def       text;
percona$#     l_bef_exp       text;
percona$#     l_after_exp     text;
percona$#     hypo_idx      record;
percona$#     l_attr        record;
percona$#     /* l_err       int; */
percona$# BEGIN
percona$#     CREATE TABLE IF NOT EXISTS public.idx_recommendations (queryid bigint, 
percona$#     query text, current_plan jsonb, recmnded_index text, hypo_plan jsonb);
percona$#     FOR l_queries IN
percona$#     SELECT t.relid, t.relname, t.queryid, t.attnames, t.attnums, 
percona$#     pg_qualstats_example_query(t.queryid) as query
percona$#       FROM 
percona$#         ( 
percona$#          SELECT qs.relid::regclass AS relname, qs.relid AS relid, qs.queryid, 
percona$#          string_agg(DISTINCT attnames.attnames,',') AS attnames, qs.attnums
percona$#          FROM pg_qualstats_all qs
percona$#          JOIN pg_qualstats q ON q.queryid = qs.queryid
percona$#          JOIN pg_stat_statements ps ON q.queryid = ps.queryid
percona$#          JOIN pg_amop amop ON amop.amopopr = qs.opno
percona$#          JOIN pg_am ON amop.amopmethod = pg_am.oid,
percona$#          LATERAL 
percona$#               ( 
percona$#                SELECT pg_attribute.attname AS attnames
percona$#                FROM pg_attribute
percona$#                JOIN unnest(qs.attnums) a(a) ON a.a = pg_attribute.attnum 
percona$#                AND pg_attribute.attrelid = qs.relid
percona$#                ORDER BY pg_attribute.attnum) attnames,     
percona$#          LATERAL unnest(qs.attnums) attnum(attnum)
percona$#                WHERE NOT 
percona$#                (
percona$#                 EXISTS 
percona$#                       ( 
percona$#                        SELECT 1
percona$#                        FROM pg_index i
percona$#                        WHERE i.indrelid = qs.relid AND 
percona$#                        (arraycontains((i.indkey::integer[])[0:array_length(qs.attnums, 1) - 1], 
percona$#                         qs.attnums::integer[]) OR arraycontains(qs.attnums::integer[], 
percona$#                         (i.indkey::integer[])[0:array_length(i.indkey, 1) + 1]) AND i.indisunique)))
percona$#                        GROUP BY qs.relid, qs.queryid, qs.qualnodeid, qs.attnums) t
percona$#                        GROUP BY t.relid, t.relname, t.queryid, t.attnames, t.attnums                   
percona$#     LOOP
percona$#         /* RAISE NOTICE '% : is queryid',l_queries.queryid; */
percona$#         execute 'explain (FORMAT JSON) '||l_queries.query INTO l_bef_exp;
percona$#         execute 'select hypopg_reset()';
percona$#         execute 'SELECT indexrelid,indexname FROM hypopg_create_index(''CREATE INDEX on '||l_queries.relname||'('||l_queries.attnames||')'')' INTO hypo_idx;      
percona$#         execute 'explain (FORMAT JSON) '||l_queries.query INTO l_after_exp;
percona$#         execute 'select hypopg_get_indexdef('||hypo_idx.indexrelid||')' INTO l_idx_def;
percona$#         INSERT INTO public.idx_recommendations (queryid,query,current_plan,recmnded_index,hypo_plan) 
percona$#         VALUES (l_queries.queryid,l_querytext,l_bef_exp::jsonb,l_idx_def,l_after_exp::jsonb);        
percona$#     END LOOP;    
percona$#         execute 'select hypopg_reset()';
percona$# END;
percona$# $$ LANGUAGE plpgsql;
CREATE FUNCTION

From the above log, we can see that the function was created with no errors. Unless we call the function, we do not know whether it has any compilation errors. Surprisingly, with this extension we can use the plpgsql_check_function_tb() function to learn whether there are any errors, without actually calling it.

percona=# SELECT functionid, lineno, statement, sqlstate, message, detail, hint, level, position, 
left (query,60),context FROM plpgsql_check_function_tb('find_usable_indexes()');
-[ RECORD 1 ]------------------------------------------------------------
functionid | find_usable_indexes
lineno     | 14
statement  | FOR over SELECT rows
sqlstate   | 42P01
message    | relation "pg_qualstats_all" does not exist
detail     | 
hint       | 
level      | error
position   | 306
left       | SELECT t.relid, t.relname, t.queryid, t.attnames, t.attnums,
context    |

From the above log, it is clear that there is an error: a relation used in the function does not exist. But if we are using dynamic SQL that is assembled at runtime, false positives are possible, as you can see in the following example. So you may want to avoid functions using dynamic SQL, or comment out the code containing those SQL statements, before checking for other compilation errors.

percona=# select * from plpgsql_check_function_tb('find_usable_indexes()');
-[ RECORD 1 ]------------------------------------------------------------------------------
functionid | find_usable_indexes
lineno     | 50
statement  | EXECUTE
sqlstate   | 00000
message    | cannot determinate a result of dynamic SQL
detail     | There is a risk of related false alarms.
hint       | Don't use dynamic SQL and record type together, when you would check function.
level      | warning
position   | 
query      | 
context    | 
-[ RECORD 2 ]------------------------------------------------------------------------------
functionid | find_usable_indexes
lineno     | 52
statement  | EXECUTE
sqlstate   | 55000
message    | record "hypo_idx" is not assigned yet
detail     | The tuple structure of a not-yet-assigned record is indeterminate.
hint       | 
level      | error
position   | 
query      | 
context    | SQL statement "SELECT 'select hypopg_get_indexdef('||hypo_idx.indexrelid||')'"

Finding Dependencies

This extension can be used to find dependent objects in a function. This way, it becomes easy for administrators to understand the objects being used in a function without spending a huge amount of time reading the code. The trick is to simply pass your function as a parameter to plpgsql_show_dependency_tb() as you see in the following example.

percona=# select * from plpgsql_show_dependency_tb('find_usable_indexes()');
   type   |  oid  |   schema   |            name            |  params   
----------+-------+------------+----------------------------+-----------
 FUNCTION | 16547 | public     | pg_qualstats               | ()
 FUNCTION | 16545 | public     | pg_qualstats_example_query | (bigint)
 FUNCTION | 16588 | public     | pg_stat_statements         | (boolean)
 RELATION |  2601 | pg_catalog | pg_am                      | 
 RELATION |  2602 | pg_catalog | pg_amop                    | 
 RELATION |  1249 | pg_catalog | pg_attribute               | 
 RELATION |  1262 | pg_catalog | pg_database                | 
 RELATION |  2610 | pg_catalog | pg_index                   | 
 RELATION | 16480 | public     | idx_recommendations        | 
 RELATION | 16549 | public     | pg_qualstats               | 
 RELATION | 16559 | public     | pg_qualstats_all           | 
 RELATION | 16589 | public     | pg_stat_statements         | 
(12 rows)

Profiling Functions

This is one of the very interesting features. Once you have added the appropriate entries to shared_preload_libraries as discussed earlier, you could easily enable or disable profiling through a GUC: plpgsql_check.profiler. This parameter can either be set globally or for only your session. Here’s an example to understand it better.

percona=# ALTER SYSTEM SET plpgsql_check.profiler TO 'ON';
ALTER SYSTEM
percona=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

When you set it globally, all the functions running in the database are automatically profiled and stored. This may be fine in a development or a testing environment, but you could also enable profiling of functions called in a single session through a session-level setting as you see in the following example.

percona=# BEGIN;
BEGIN
percona=# SET plpgsql_check.profiler TO 'ON';
SET
percona=# select salary_update(1000);
 salary_update 
---------------
 
(1 row)

percona=# select lineno, avg_time, source from plpgsql_profiler_function_tb('salary_update(int)');
 lineno | avg_time |                                                    source                                                     
--------+----------+---------------------------------------------------------------------------------------------------------------
      1 |          | 
      2 |          | DECLARE 
      3 |          | l_abc record;
      4 |          | new_sal     INT;
      5 |    0.005 | BEGIN
      6 |   144.96 |   FOR l_abc IN EXECUTE 'SELECT emp_id, salary FROM employee where emp_id between 1 and 10000 and dept_id = 2'
      7 |          |   LOOP
      8 |      NaN |       new_sal := l_abc.salary + sal_raise;
      9 |      NaN |       UPDATE employee SET salary = new_sal WHERE emp_id = l_abc.emp_id;
     10 |          |   END LOOP;
     11 |          | END;
(11 rows)

--- Create an Index and check if it improves the execution time of FOR loop.

percona=# CREATE INDEX idx_1 ON employee (emp_id, dept_id);
CREATE INDEX
percona=# select salary_update(1000);
 salary_update 
---------------
 
(1 row)

percona=# select lineno, avg_time, source from plpgsql_profiler_function_tb('salary_update(int)');
 lineno | avg_time |                                                    source                                                     
--------+----------+---------------------------------------------------------------------------------------------------------------
      1 |          | 
      2 |          | DECLARE 
      3 |          | l_abc record;
      4 |          | new_sal     INT;
      5 |    0.007 | BEGIN
      6 |   73.074 |   FOR l_abc IN EXECUTE 'SELECT emp_id, salary FROM employee where emp_id between 1 and 10000 and dept_id = 2'
      7 |          |   LOOP
      8 |      NaN |       new_sal := l_abc.salary + sal_raise;
      9 |      NaN |       UPDATE employee SET salary = new_sal WHERE emp_id = l_abc.emp_id;
     10 |          |   END LOOP;
     11 |          | END;
(11 rows)

percona=# END;
COMMIT
percona=# show plpgsql_check.profiler;
 plpgsql_check.profiler 
------------------------
 on
(1 row)

In the above log, I have opened a new transaction block and enabled the parameter plpgsql_check.profiler only for that block. So any function that I executed in that transaction is profiled, and the profile can be extracted using plpgsql_profiler_function_tb(). Once we have identified the area where the execution time is high, the immediate action is to tune that piece of code. After creating the index, I called the function again; it now performs faster than before.

Conclusion

Special thanks to Pavel Stehule, who authored this extension, and also to the contributors who have brought it to a usable state today. This is one of the simplest extensions that can be used to check for compilation errors and dependencies. While it can also be a handy profiling tool, a developer may find either plprofiler or plpgsql_check helpful for profiling as well.

Jul 30, 2019

Network (Transport) Encryption for MongoDB


Why do I need Network encryption?

In our previous blog post MongoDB Security vs. Five ‘Bad Guys’ there’s an overview of five main areas of security functions.

Let’s say you’ve enabled #1 and #2 (Authentication, Authorization) and #4 and #5 (Storage encryption, a.k.a. encryption-at-rest, and Auditing) mentioned in the previous blog post. Only authenticated users will be connecting, and they will be only doing what they’re authorized to. With storage encryption configured properly, the database data can’t be decrypted even if the server’s disk were stolen or accidentally given away.

You will have some pretty tight database servers indeed. However, consider the following movement of user data over the network:

  • Clients sending updates to the database (to a mongos, or mongod if unsharded).
  • A mongos or mongod sending query results back to a client.
  • Between replica set members as they replicate to each other.
  • mongos nodes retrieving collection data from the shards before relaying it to the user.
  • Shards communicating with each other as chunks of sharded collections are moved between them

As it moves, the user collection data is no longer within the database ‘fortress’. It’s riding in plain, unencrypted TCP packets. It can be grabbed off the wire with tcpdump etc., as shown here:

~$ #mongod primary is running on localhost:28051 is this example.
~$ #
~$ #In a different terminal I run: 
~$ #  mongo --port 28051 -u akira -p secret  --quiet --eval 'db.getSiblingDB("payments").TheImportantCollection.findOne()'
~$ 
~$ sudo ngrep -d lo . 'port 28051'
interface: lo (127.0.0.0/255.0.0.0)
filter: ( port 28051 ) and ((ip || ip6) || (vlan && (ip || ip6)))
match: .
####
...
...
T 127.0.0.1:51632 -> 127.0.0.1:28051 [AP] #19
  ..........................find.....TheImportantCollection..filter.......lim
  it........?.singleBatch...lsid......id........-H..HN.n.`..}{..$clusterTime.
  X....clusterTime...../%9].signature.3....hash.........>.9...(.j. ..H4. .key
  Id.....fK.]...$db.....payments..                                           
#
T 127.0.0.1:28051 -> 127.0.0.1:51632 [AP] #20
  ....4................s....cursor......firstBatch......0......_id..........c
  ustomer.d....fn.....Smith..gn.....Ken..city.....Georgeville..street1.....1 
  Wishful St...postcode.....45031...order_ids.........id..........ns. ...paym
  ents.TheImportantCollection...ok........?.operationTime...../%9].$clusterTi
  me.X....clusterTime...../%9].signature.3....hash.........>.9...(.j. ..H4. .
  keyId.....fK.]...                                                          
#
T 127.0.0.1:51632 -> 127.0.0.1:28051 [AP] #21
  \....................G....endSessions.&....0......id........-H..HN.n.`..}{.
  ..$db.....admin..                                                          
#
T 127.0.0.1:28051 -> 127.0.0.1:51632 [AP] #22
  ....5.....................ok........?.operationTime...../%9].$clusterTime.X
  ....clusterTime...../%9].signature.3....hash.........>.9...(.j. ..H4. .keyI
  d.....fK.]...                                                              
###^Cexit

The key names and strings such as customer name and address are visible at a glance. This is proof that the TCP data isn’t encrypted. It is moving around in the plain. (You can use “mongoreplay monitor” if you want to see numeric and other non-ascii-string data in a fully human-readable way.)

(If you can unscramble the ascii soup above and see the whole BSON document in your head – great! But you failed the “I am not a robot” test so now you have to stop reading this web page.)

For comparison, this is what the same ngrep command prints when I change to using TLS in the client <-> database connection.

~$ #ngrep during: mongo --port 28051 --ssl --sslCAFile /data/tls_certs_etc/root.crt \
~$ #  --sslPEMKeyFile /data/tls_certs_etc/client.foo_app.pem -u akira -p secret --quiet \
~$ #  --eval 'db.getSiblingDB("payments").TheImportantCollection.findOne()'
~$ 
~$ sudo ngrep -d lo . 'port 28051'
interface: lo (127.0.0.0/255.0.0.0)
filter: ( port 28051 ) and ((ip || ip6) || (vlan && (ip || ip6)))
match: .
####
...
...
T 127.0.0.1:51612 -> 127.0.0.1:28051 [AP] #23
  .........5nYe.).I.M..H.T..j...r".4{.1\.....>...N.Vm.........C..m.V....7.nP.
  f..Z37......}..c?...$.......edN..Qj....$....O[Zb...[...v.....<s.T..m8..u.u3
  R.?....5;...$.F.h...]....@...uq....."..F.M(^.b.....cv.../............\.z..9
  hY........Bz.QEu...`z.W..O@...\.K..C.....N..=.......}.                     
#
T 127.0.0.1:28051 -> 127.0.0.1:51612 [AP] #24
  .....*......4...p.t...G5!.Od...e}.......b.dt.\.xo..^0g.F'.;""..a.....L....#
  DXR.H..)..b.3`.y.vB{@...../..;lOn.k.$7R.]?.M.!H..BC.7........8..k..Rl2.._..
  .pa..-.u...t..;7T8s. z4...Q.....+.Y.\B.............B`.R.(.........~@f..^{.s
  .....\}.D[&..?..m.j#jb.....*.a..`. J?".........Z...J.,....B6............M>.
  ....J....N.H.).!:...B.g2...lua.....5......L9.?.a3....~.G..:...........VB..v
  ........E..[f.S."+...W...A...3...0.G5^.                                    
#
T 127.0.0.1:51612 -> 127.0.0.1:28051 [AP] #25
  ....m..m.5...u...i.H|..L;...M..~#._.v.....7..e...7w.0.......[p..".E=...a.?.
  G{{TS&.........s\..).U.vwV.t...t..2.%..                                    
#
T 127.0.0.1:28051 -> 127.0.0.1:51612 [AP] #26
  .....?..u.*.j...^.LF]6...I.5..5...X.....?..IR(v.T..sX.`....>..Vos.v...z.3d.
  .z.(d.DFs..j.SIA.d]x..s{7..{....n.[n{z.'e'...r............\..#.<<.Y.5.K....
  .....[......[6.....2......[w.5....H                                        
###^Cexit

 

No more plain data to see! The high-loss ascii format being printed by ngrep above can’t provide genuine satisfaction that this is perfectly encrypted binary data, but I hope I’ve demonstrated a quick, useful way to do a ‘sanity check’ that you are using TLS and are not still sending data in the plain.

Note: I’ve used ngrep because I found it made the shortest example. If you prefer tcpdump you can capture the dump with tcpdump <interface filter> <bpf filter> -w <dump file>, then open it with the Wireshark GUI or view it with tshark -r <dump file> -V on the command line. And for real satisfaction that the TLS traffic is cryptographically protected data, you can print the captured data in hexadecimal / binary format (as opposed to ‘ascii’) and run an entropy assessment on it.
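As a concrete sketch of that tcpdump/tshark route (same loopback interface and port as the ngrep examples above; the capture file name is arbitrary):

~$ sudo tcpdump -i lo -w mongo28051.pcap 'port 28051'
~$ tshark -r mongo28051.pcap -V | less    # full protocol dissection
~$ tshark -r mongo28051.pcap -x | less    # hex + ascii view of the raw packet bytes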

What’s the risk, really?

It’s probably a very difficult task for a hypothetical spy who was targeting you 1-to-1 to find and occupy a place in your network where they can just read the TCP traffic as a man-in-the-middle. But wholesale network scanners, who don’t know or care who any target is beforehand, will find any place that happens to have a gate open on the day they were passing by.

The scrambled look of raw TCP data to the human eye is not a distraction to them as it is to you, the DBA or server or application programmer. They’ve already scripted the unpacking of all the protocols. I assume the technical problem for the blackhat hackers is more a big-data one (too much copied data to process). As an aside, I hypothesize that they are already pioneering a lot of edge-computing techniques.

It is true that data captured on the move between servers might be only a tiny fraction of the whole data. But if you are making backups by the dump method once a day, and the traffic between the database server and the backup store server is being listened to, then it would be the full database.

How to enable MongoDB network encryption

MongoDB traffic is not encrypted until you create a set of TLS/SSL certificates and keys, and apply them in the mongod and mongos configuration files of your entire cluster (or non-sharded replica set). If you are an experienced TLS/SSL admin, or you are a DBA who has been given a certificate and key set by security administrators elsewhere in your organization, then I think you will find enabling MongoDB’s TLS easy – just distribute the files, reference them in the net.ssl.* options, and stop all the nodes and restart them. Gradually enabling without downtime takes longer but is still possible by using rolling restarts changing net.ssl.mode from disabled -> allowSSL -> preferSSL -> requireSSL (doc link) in each restart.
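To make that concrete, here is a minimal sketch of the relevant server-side settings (MongoDB 4.0-era net.ssl option names; the certificate paths are placeholders in the style of the earlier examples):

# fragment of a mongod.conf / mongos.conf - a sketch, not a complete config
net:
  ssl:
    mode: requireSSL                              # or allowSSL / preferSSL during a rolling rollout
    PEMKeyFile: /data/tls_certs_etc/server.pem    # this node's certificate + private key
    CAFile: /data/tls_certs_etc/root.crt          # CA certificate used to validate peer certificates

After distributing the files and updating each node’s configuration, restart the nodes, or use the rolling net.ssl.mode progression described above to avoid downtime.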

Conversely, if you are an experienced DBA and it will be your first time creating and distributing TLS certificates and keys, be prepared to spend some time learning about it first.

The way the certificates and PEM key files are created varies according to the following choices:

  • Using an external certificate authority or making a new root certificate just for these MongoDB clusters
  • Whether you are using it just for the internal system authentication between mongod and mongos nodes, or you are enabling TLS for clients too
  • How strict you will be making certificates (e.g. with host limitations)
  • Whether you need the ability to revoke certificates

To repeat the first point in this section: if you have a security administration team who already know and control these public key infrastructure (PKI) components – ask them for help, in the interests of saving time and being more certain you’re getting certificates that conform with internal policy.

Self-made test certificates

Percona Security Team note: This is not a best practice, even though it is in the documentation as a tutorial; we recommend you do not use this in production deployments.

So you want to get hands-on with TLS configuration of MongoDB a.s.a.p.? You’ll need certificates and PEM key files. Having the patience to fully master certificate administration would be a virtue, but you are not that virtuous. So you are going to use the existing tutorials (links below) to create self-signed certificates.

The quickest way to create certificates (a minimal openssl sketch follows this list) is:

  • Make a new root certificate
  • Generate server certificates (i.e. the ones the mongod and mongos nodes use for net.ssl.PEMKeyFile) from that root certificate
  • Generate client certificates from the new root certificate too
    • Skip setting CN / “subject” fields that limit the hosts or domains the client certificate can be used on
  • Self-sign those certificates
  • Skip making revocation certificates
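For illustration only, a minimal sketch of those steps with openssl (file names, subjects and the validity period are placeholders; a real deployment should follow the tutorials linked below and your own security policy):

~$ # new self-signed root certificate
~$ openssl genrsa -out root.key 4096
~$ openssl req -x509 -new -key root.key -days 365 -out root.crt -subj "/CN=TestMongoRootCA"
~$ # server certificate signed by that root (repeat similarly for client certificates)
~$ openssl genrsa -out server.key 4096
~$ openssl req -new -key server.key -out server.csr -subj "/CN=your-mongod-hostname"
~$ openssl x509 -req -in server.csr -CA root.crt -CAkey root.key -CAcreateserial -days 365 -out server.crt
~$ cat server.key server.crt > server.pem    # this combined file becomes net.ssl.PEMKeyFile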

The weaknesses in these certificates are:

  • A man in the middle attack is possible (MongoDB doc link):
    “MongoDB can use any valid TLS/SSL certificate issued by a certificate authority or a self-signed certificate. If you use a self-signed certificate, although the communications channel will be encrypted, there will be no validation of server identity. Although such a situation will prevent eavesdropping on the connection, it leaves you vulnerable to a man-in-the-middle attack. Using a certificate signed by a trusted certificate authority will permit MongoDB drivers to verify the server’s identity.”
  • What will happen if someone gets a copy of one of them?
    • If they get the client or a server certificate, they will be able to decrypt traffic or spoof being an SSL encrypting-and-decrypting network peer on the network edges to those nodes.
    • When using self-signed certificates you distribute a copy of the root certificate with the server or client certificate to every mongod, mongos, and client app. I.e. it’s as likely to be misplaced or stolen as a single client or server certificate. With the root certificate spoofing can be done on any edge in the network.
    • You can’t revoke a stolen client or server certificate and cut them off from further access. You’re stuck with it. You’ll have to completely replace all the server-side and client certificates with cluster-wide downtime (at least for MongoDB < 4.2).

Examples of how to make self-signed certificates:

  • This snippet from MongoDB’s Kevin Adistimba is the most concise I’ve seen.
  • This replica set setup tutorial from Percona’s Corrado Pandiani includes similar instructions with more MongoDB context on the page.

Reference in the MongoDB docs:

Various configuration file examples

Three detailed appendix entries on how to make OpenSSL Certificates for Testing.

Troubleshooting

I like the brevity of the SSL Troubleshooting page in Gabriel Ciciliani’s MongoDB administration cool tips presentation from Percona Live Europe ’18. Speaking from my own experience: before enabling them in the MongoDB config, it’s crucial to make sure the PEM files (both server and client ones) pass the ‘openssl verify’ test against the root / CA certificate they’re derived from. Absolutely, 100% do this before trying to use them.
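For example, something along these lines (using the file paths from the mongo shell example earlier; substitute your own):

~$ openssl verify -CAfile /data/tls_certs_etc/root.crt /data/tls_certs_etc/client.foo_app.pem
/data/tls_certs_etc/client.foo_app.pem: OK

Anything other than "OK" here needs to be resolved before touching the MongoDB configuration.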

If “openssl verify”-confirmed certificates still produce a MongoDB replica set or cluster that you cannot connect to, then add the --sslAllowInvalidHostnames option when connecting with the mongo shell, and/or net.ssl.allowInvalidHostnames in the mongod/mongos configuration. This is a differential diagnosis to see whether the hostname requirements of the certificates are the only thing causing the SSL rules to reject them.

If you find it takes --sslAllowInvalidHostnames to make it work, it means the CN subject field and/or SAN fields in the certificate need to be edited until they match the hostnames and domains that the SSL library identifies the hosts as. Don’t be tempted to just conveniently forget about it; disabling hostname verification is a gap that might be leveraged into a man-in-the-middle attack.
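To see which names a certificate actually carries, you can dump its subject and SAN fields with openssl (the server.pem path here is a placeholder):

~$ openssl x509 -in /data/tls_certs_etc/server.pem -noout -subject
~$ openssl x509 -in /data/tls_certs_etc/server.pem -noout -text | grep -A1 'Subject Alternative Name'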

If you are still experiencing trouble, my next step would be to check the mongod logs. You will find lines matching the grep expression ‘NETWORK .*SSL’ in the log if there are rejections. (This might become “TLS” later.) E.g.

2019-07-25T16:34:49.981+0900 I NETWORK  [conn11] Error receiving request from client: SSLHandshakeFailed: SSL peer certificate validation failed: self signed certificate in certificate chain. Ending connection from 127.0.0.1:33456 (connection id: 11)

You might also try grepping for '[EW] NETWORK' to look for all network errors and warnings.
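For instance (the log path is hypothetical; use wherever your mongod writes its log):

~$ grep 'NETWORK .*SSL' /var/log/mongodb/mongod.log
~$ grep '[EW] NETWORK' /var/log/mongodb/mongod.log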

For SSL there is no need to raise the logging verbosity to see errors and warnings. From what I can see in ssl_manager_openssl.cpp, those all come at the default log verbosity of 0. Only if you want to confirm normal, successful connections would I advise briefly raising the log verbosity in the config file to level 2 for the exact log ‘component’ (in this case, the “network” component). (Don’t forget to turn it off soon after – forgetting you set log level 2 is a great way to fill your disk.) But for this topic the only thing I think log level 2 will add is “Accepted TLS connection from peer” confirmations like the following.

2019-07-25T16:29:41.779+0900 D NETWORK  [conn18] Accepted TLS connection from peer: emailAddress=akira.kurogane@nowhere.com,CN=svrA80v,OU=testmongocluster,O=MongoTestCorp,L=Tokyo,ST=Tokyo,C=JP
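If you would rather not touch the config file, the same component verbosity can be toggled at runtime from the mongo shell (a sketch; remember to set it back afterwards):

> // raise only the network component to verbosity level 2 on this node
> db.setLogLevel(2, "network")
> // ... check the log for "Accepted TLS connection from peer" lines ...
> db.setLogLevel(-1, "network")    // revert to inheriting the global verbosity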

Take a peek in the code

Certificate acceptance rules are a big topic and I am not the author to cover it. But take a look at the SSLManagerOpenSSL::parseAndValidatePeerCertificate(…) function in ssl_manager_openssl.cpp as a starting point if you’d like to be a bit more familiar with MongoDB’s application.

Jul 30, 2019

Catalyst raises $15M from Accel to transform data-driven customer success

Managing your customers has changed a lot in the past decade. Out are the steak dinners and ballgame tickets to get a sense of a contract’s chance at renewal, and in are churn analysis and a whole bunch of data science to learn whether a customer and their users like or love your product. That customer experience revolution has been critical to the success of SaaS products, but it can remain wickedly hard to centralize all the data needed to drive top performance in a customer success organization.

That’s where Catalyst comes in. The company, founded in New York City in 2017 and launched April last year, wants to centralize all of your disparate data sources on your customers into one easy-to-digest tool to learn how to approach each of them individually to optimize for the best experience.

The company’s early success has attracted more top investors. It announced today that it has raised a $15 million Series A led by Vas Natarajan of Accel, who previously backed enterprise companies like Frame.io, Segment, InVision, and Blameless. The company had previously raised $3 million from NYC enterprise-focused Work-Bench and $2.4 million from True Ventures. Both firms participated in this new round.

Catalyst CEO Edward Chiu told me that Accel was attractive because of the firm’s recent high-profile success in the enterprise space, including IPOs like Slack, PagerDuty, and CrowdStrike.

When we last spoke with Catalyst a year and a half ago, the firm had just raised its first seed round and was just the company’s co-founders — brothers Edward and Kevin Chiu — and a smattering of employees. Now, the company has 19 employees and is targeting 40 employees by the end of the year.


In that time, the product has continued to evolve as it has worked with its customers. One major feature of Catalyst’s product is a “health score” that determines whether a customer is likely to grow or churn in the coming months based on ingested data around usage. CEO Chiu said that “we’ve gotten our health score to be very very accurate” and “we have the ability to take automated action based on that health score.” Today, the company offers “perfect sync” with Salesforce, Mixpanel and Zendesk, among other services, and will continue to make investments in new integrations.

One high priority for the company has been increasing the speed of integration when a new customer signs up for Catalyst. Chiu said that new customers can be onboarded in minutes, and they can use the platform’s formula builder to define the exact nuances of their health score for their specific customers. “We mold to your use case,” he said.

One lesson the company has learned is that as success teams increasingly become critical to the lifeblood of companies, other parts of the organization and senior executives are working together to improve their customers’ experiences. Chiu told me that the startup often starts with onboarding a customer success team, only to later find that the C-suite and other team leads have also joined and are interacting together on the platform.

An interesting dynamic for the company is that it does its own customer success on its customer success platform. “We are our own best customer,” Chiu said. “We log in every day to see the health of our customers… our product managers log in to Catalyst every day to read product feedback.”

Since the last time we checked in, the company has added a slew of senior execs, including Cliff Kim as head of product, Danny Han as head of engineering, and Jessica Marucci as head of people, with whom the two Chius had worked together at cloud infrastructure startup DigitalOcean.

Moving forward, Chiu expects to invest further in data analysis and engineering. “One of the most unique things about us is that we are collecting so much unique data: usage patterns, [customer] spend fluctuations, [customer] health scores,” Chiu said. “It would be a hugely missed opportunity not to analyze that data and work on churn.”

Jul 30, 2019

Confluera snags $9M Series A to help stop cyberattacks in real time

Just yesterday, we experienced yet another major breach when Capital One announced it had been hacked and years of credit card application information had been stolen. Another day, another hack, but the question is how can companies protect themselves in the face of an onslaught of attacks. Confluera, a Palo Alto startup, wants to help with a new tool that purports to stop these kinds of attacks in real time.

Today the company, which launched last year, announced a $9 million Series A investment led by Lightspeed Venture Partners. It also has the backing of several influential technology execs, including John W. Thompson, who is chairman of Microsoft and former CEO at Symantec; Frank Slootman, CEO at Snowflake and formerly CEO at ServiceNow; and Lane Bess, former CEO of Palo Alto Networks.

What has attracted this interest is the company’s approach to cybersecurity. “Confluera is a real-time cybersecurity company. We are delivering the industry’s first platform to deterministically stop cyberattacks in real time,” company co-founder and CEO Abhijit Ghosh told TechCrunch.

To do that, Ghosh says, his company’s solution watches across the customer’s infrastructure, finds issues and recommends ways to mitigate the attack. “We see the problem that there are too many solutions which have been used. What is required is a platform that has visibility across the infrastructure, and uses security information from multiple sources to make that determination of where the attacker currently is and how to mitigate that,” he explained.

Microsoft chairman John Thompson, who is also an investor, says this is more than just real-time detection or real-time remediation. “It’s not just the audit trail and telling them what to do. It’s more importantly blocking the attack in real time. And that’s the unique nature of this platform, that you’re able to use the insight that comes from the science of the data to really block the attacks in real time.”

It’s early days for Confluera, as it has 19 employees and three customers using the platform so far. For starters, it will be officially launching next week at Black Hat. After that, it has to continue building out the product and prove that it can work as described to stop the types of attacks we see on a regular basis.
