Nov 30, 2020

Vista acquires Gainsight for $1.1B, adding to its growing enterprise arsenal

Vista Equity Partners hasn’t been shy about scooping up enterprise companies over the years, and today it added to a growing portfolio with its purchase of Gainsight. The company’s software helps clients with customer success, meaning it helps create a positive customer experience when they interact with your brand, making them more likely to come back and recommend you to others. Sources pegged the price tag at $1.1 billion.

As you might expect, both parties are putting a happy face on the deal, talking about how they can work together to grow Gainsight further. Certainly, other companies like Ping Identity seem to have benefited from joining forces with Vista. Being part of a well-capitalized firm allowed them to make some strategic investments along the way to eventually going public last year.

Gainsight and Vista are certainly hoping for a similar outcome in this case. Monti Saroya, co-head of the Vista Flagship Fund and senior managing director at the firm, sees a company with a lot of potential that could expand and grow with help from Vista’s consulting arm, which helps portfolio companies with different aspects of their business like sales, marketing and operations.

“We are excited to partner with the Gainsight team in its next phase of growth, helping the company to expand the category it has created and deliver even more solutions that drive retention and growth to businesses across the globe,” Saroya said in a statement.

Gainsight CEO Nick Mehta likes the idea of being part of Vista’s portfolio of enterprise companies, many of which use his company’s products.

“We’ve known Vista for years, since 24 of their portfolio companies use Gainsight. We’ve seen Gainsight clients like JAMF and Ping Identity partner with Vista and then go public. We believe we are just getting started with customer success, so we wanted the right partner for the long term and we’re excited to work with Vista on the next phase of our journey,” Mehta told TechCrunch.

Brent Leary, principal analyst at CRM Essentials, who covers the sales and marketing space, says it appears that Vista is piecing together a sales and marketing platform that it could flip or take public in a few years.

“It’s not only the power that’s in the platform, it’s also the money. And Vista seems to be piecing together an engagement platform based on the acquisitions of Gainsight, Pipedrive and even last year’s Acquia purchase. Vista isn’t afraid to spend big money, if they can make even bigger money in a couple years if they can make these pieces fit together,” Leary told TechCrunch.

While Gainsight exits as a unicorn, the deal might not have been the outcome it was looking for. The company raised more than $187 million, according to PitchBook data, though its fundraising had slowed in recent years. Gainsight raised $50 million in April of 2017 at a post-money valuation of $515 million, again per PitchBook. In July of 2018 it added $25 million to its coffers, and the final entry was a small debt investment raised in 2019.

It could be that the startup saw its growth slow down, leaving it somewhere between ready for new venture investment and profitability. That's the kind of gap that PE shops like Vista look for: they write a check, shake up the company and hopefully exit at an elevated price.

Gainsight hired a new chief revenue officer last month, notably. Per Forbes, the company was on track to reach “about” $100 million ARR by the end of 2020, giving it a revenue multiple of around 11x in the deal. That’s under current market norms, which could imply that Gainsight had either lower gross margins than comparable companies, or as previously noted, that its growth had slowed.

A $1.1 billion exit is never something to bemoan — and every startup wants to become a unicorn — but Gainsight and Mehta are well known, and we were hoping for the details only an S-1 could deliver. Perhaps one day with Vista’s help that could happen.

Nov 30, 2020

C3.ai’s initial IPO pricing guidance spotlights the public market’s tech appetite

On the heels of news that DoorDash is targeting an initial IPO valuation up to $27 billion, C3.ai also dropped a new S-1 filing detailing a first-draft guess of what the richly valued company might be worth after its debut.

C3.ai posted an initial IPO price range of $31 to $34 per share, with the company anticipating a sale of 15.5 million shares at that price. The enterprise-focused artificial intelligence company is also selling $100 million of stock at its IPO price to Spring Creek Capital, and another $50 million to Microsoft at the same terms. And there are 2.325 million shares reserved for its underwriters as well.

The total tally of shares that C3.ai will have outstanding after its IPO bloc is sold, Spring Creek and Microsoft buy in, and its underwriters take up their option, is 99,216,958. At the extremes of its initial IPO price range, the company would be worth between $3.08 billion and $3.37 billion using that share count.

Those numbers decline by around $70 and $80 million, respectively, if the underwriters do not purchase their option.
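
For reference, the math is simply share count times price: 99,216,958 shares × $31 ≈ $3.08 billion and 99,216,958 × $34 ≈ $3.37 billion, with the underwriters' 2.325 million-share option accounting for the roughly $72 million to $79 million swing.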

So is the IPO a win for the company at those prices? And is it a win for all C3.ai investors? Amazingly enough, it feels like the answers are yes and no. Let’s explore why.

Slowing growth, rising valuation

If we just look at C3.ai's revenue history in chunks, you can argue a growth story for the company: it grew from $73.8 million in the two quarters of 2019 ending July 31, to $81.8 million in revenue during the same portion of 2020. That's growth of just under 11% on a year-over-year basis. Not great, but positive.

Nov 30, 2020

Support for Percona XtraDB Cluster in ProxySQL (Part Two)

How the scheduler and external scripts handle failover (Percona and Marco examples)

In part one of this series, I illustrated how simple scenarios may fail or have problems when using the Galera native support inside ProxySQL. In this post, I will repeat the same tests, but using the scheduler option and an external script.

The Scheduler

First a brief explanation about the scheduler.

The scheduler inside ProxySQL was created to allow administrators to extend ProxySQL's capabilities. The scheduler gives you the option to add any kind of script or application and run it at a specified interval. It was also the first way we had to deal with Galera/Percona XtraDB Cluster (PXC) node management in case of issues.

The scheduler table is composed as follows:

CREATE TABLE scheduler (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
    interval_ms INTEGER CHECK (interval_ms>=100 AND interval_ms<=100000000) NOT NULL,
    filename VARCHAR NOT NULL,
    arg1 VARCHAR,
    arg2 VARCHAR,
    arg3 VARCHAR,
    arg4 VARCHAR,
    arg5 VARCHAR,
    comment VARCHAR NOT NULL DEFAULT '')

The relevant elements are:

  • Active: defines whether the scheduler should execute the external script or not.
  • Interval_ms: frequency of the execution. There is NO check that the previous execution has terminated, so the script must include its own check to prevent launching multiple instances, which would likely create conflicts and resource issues.
  • Filename: the FULL path of the script/app you want to be executed.
  • Arg(s): whatever you want to pass as arguments. When you have a complex script, either use a configuration file or collapse the multiple arguments into a single string.

The Scripts

In this blog, I will present two different scripts (as examples). Both cover the scenarios from the previous article and can do more, but I will focus only on that part for now.

One script is written in Bash and is a port of the proxysql_galera_checker that Percona was using with ProxySQL-admin in ProxySQL version 1.4. The script is available here from Percona-Lab (git clone ).

The other, which I wrote, is in Perl and is probably the first such script, dating back to 2016. I have made some enhancements and bug fixes to it over the years. It is available here (git clone).

Both are offered as examples, and I am not suggesting you use them as-is in critical production environments.

The Setup

To use the two scripts, some custom setup is required. First of all, check that the files are executable by the user running ProxySQL.

Let's start with mine, written in Perl.

To make it work, we need to define a set of host groups that will act as Reader/Writer/Backup-writer/Backup-reader (the backups are optional but recommended). The difference from the native support is that, instead of indicating them in a specialized table, we will use the mysql_servers table.

  • Writer: 100
  • Readers: 101
  • Backup Writers: 8100
  • Backup Readers: 8101

Given the above, on top of the servers already defined in the previous article, we just need to add the 8000-series HGs (8100 and 8101).

For example:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.22',8100,3306,1000,2000,'Failover server preferred');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.23',8100,3306,999,2000,'Second preferred');    
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.233',8100,3306,998,2000,'Third and last in the list');      
    
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.22',8101,3306,100,2000,'');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.23',8101,3306,1000,2000,'');    
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.233',8101,3306,1000,2000,'');

After that we need to insert the instructions for the scheduler:

INSERT  INTO scheduler (id,active,interval_ms,filename,arg1) values (10,0,2000,"/opt/tools/proxy_sql_tools/galera_check.pl","-u=cluster1 -p=clusterpass -h=192.168.4.191 -H=100:W,101:R -P=6032 --retry_down=2 --retry_up=1 --main_segment=2 --debug=0  --log=/var/lib/proxysql/galeraLog --active_failover=1");

The result will be:

id: 10
     active: 0
interval_ms: 2000
   filename: /opt/tools/proxy_sql_tools/galera_check.pl
       arg1: -u=cluster1 -p=clusterpass -h=192.168.4.191 -H=100:W,101:R -P=6032 --retry_down=2 --retry_up=1 --main_segment=2 --debug=0  --log=/var/lib/proxysql/galeraLog --active_failover=1
       arg2: NULL
       arg3: NULL
       arg4: NULL
       arg5: NULL
    comment:

Please refer to the instructions on GitHub for the details of the parameters. What is worth pointing out here:

  • -H=100:W,101:R The host groups the script must manage as the ones dealing with our PXC cluster (writer and reader)
  • --active_failover=1 The failover method to apply
  • --retry_down=2 --retry_up=1 Whether action should be taken immediately or only after a retry. This avoids a possible yo-yo effect caused by transient node or network delays.

Note that active is set to 0 in the INSERT above: always load the scheduler entry deactivated, and activate it only when everything is in place and you are ready to go. Once the above is done, the script ProxySQL will run is galera_check.
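
When you are ready, enabling it is just a matter of flipping the active flag and loading the scheduler to runtime; a minimal sketch, using the scheduler entry with id 10 defined above:

UPDATE scheduler SET active=1 WHERE id=10;
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;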

Percona proxysql_galera_checker

One limitation of this script is that you cannot use different IPs for the PXC internal communication and for the ProxySQL connections. Given that, we need to modify the setup from the previous blog to match the script's requirements. Here too we need to define which HG will be the writer and which the reader, but we will specify the internal IPs, and, of course, ProxySQL must have access to that network as well.

  • Writer HG: 200
  • Reader HG: 201
  • Network IPs: 10.0.0.22, 10.0.0.23, 10.0.0.33

Given that, our ProxySQL setup will be:

delete from mysql_users where username='app_test';
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent,comment) values ('app_test','test',1,200,'mysql',1,'application test user DC1');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;

delete from mysql_query_rules where rule_id in(1040,1042);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(1040,6033,'app_test',200,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(1042,6033,'app_test',201,1,3,'^SELECT.*$',1);
load mysql query rules to run;save mysql query rules to disk;

delete from mysql_servers where hostgroup_id in (200,201);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('10.0.0.22',200,3306,10000,2000,'DC1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('10.0.0.22',201,3306,100,2000,'DC1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('10.0.0.23',201,3306,10000,2000,'DC1');    
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('10.0.0.33',201,3306,10000,2000,'DC1');        

load mysql servers to run;save mysql servers to disk;

As you can see, here we also need to redefine the user and the query rules to match the different HGs; if you keep the same ones (100-101), there is no need to do that. Now it's time to add the scheduler entry:

delete from scheduler where id=60;
INSERT  INTO scheduler (id,active,interval_ms,filename,arg1) values (60,0,3000,"/opt/tools/proxysql-scheduler/proxysql_galera_checker","--config-file=/opt/tools/proxysql-scheduler/proxysql-admin-sample.cnf --writer-is-reader=always --write-hg=200 --read-hg=201 --writer-count=1 --priority=10.0.0.22:3306,10.0.0.23:3306,10.0.0.33:3306 --mode=singlewrite --debug --log=/tmp/pxc_test_proxysql_galera_check.log");
LOAD SCHEDULER TO RUNTIME;SAVE SCHEDULER TO DISK;

Also in this case, please refer to the documentation for the parameter details, but it's worth mentioning:

  • --write-hg=200 --read-hg=201 Host group definitions
  • --writer-is-reader=always Keep this as ALWAYS, please; as we will see, you do not need anything different.
  • --mode=singlewrite Possible modes are load balancing and single writer; the load-balancing mode is a leftover from the past. Never, ever use Galera/PXC in multi-primary mode, period.
  • --priority=10.0.0.22:3306,10.0.0.23:3306,10.0.0.33:3306 This is where we define the priority order for the writers.

Also in this case, when loading the scheduler entry, keep it deactivated and enable it only when you are ready.

The Tests

Read Test

The first test is the simple read test, so while we have sysbench running in read_only mode we remove one reader after the other.
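
The host group snapshots shown in this section come from the ProxySQL admin interface. The author does not show the exact statement, but a query along these lines, joining runtime_mysql_servers with stats_mysql_connection_pool (whose column names match the output), should reproduce them; this is my reconstruction, not taken from the post:

-- Run against the ProxySQL admin interface (port 6032)
SELECT a.weight, b.hostgroup, b.srv_host, b.srv_port, b.status, b.ConnUsed, b.ConnFree
  FROM runtime_mysql_servers a
  JOIN stats_mysql_connection_pool b
    ON a.hostgroup_id = b.hostgroup
   AND a.hostname     = b.srv_host
   AND a.port         = b.srv_port
 ORDER BY b.hostgroup, a.weight DESC;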

Marco script:

+---------+-----------+---------------+----------+--------------+----------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed | ConnFree |
+---------+-----------+---------------+----------+--------------+----------+----------+
| 10000   | 100       | 192.168.4.22  | 3306     | ONLINE       | 0        | 0        |
| 10000   | 101       | 192.168.4.233 | 3306     | ONLINE       | 38       | 8        |
| 10000   | 101       | 192.168.4.23  | 3306     | ONLINE       | 15       | 49       |
| 100     | 101       | 192.168.4.22  | 3306     | ONLINE       | 0        | 64       |

As we can see, by just setting the weight we will be able to prevent sending reads to the Writer, and while some will still arrive there, it is negligible. Once we put all the readers down…

Marco script: 

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000   | 100       | 192.168.4.22  | 3306     | ONLINE       | 0        |
| 10000   | 101       | 192.168.4.233 | 3306     | SHUNNED      | 0        |
| 10000   | 101       | 192.168.4.23  | 3306     | SHUNNED      | 0        |
| 100     | 101       | 192.168.4.22  | 3306     | ONLINE       | 58       |

Given it is the last node left, it serves all the reads even with its low weight.

Percona Script:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000   | 200       | 10.0.0.22     | 3306     | ONLINE       | 0        |
| 10000   | 201       | 10.0.0.33     | 3306     | ONLINE       | 22       |
| 10000   | 201       | 10.0.0.23     | 3306     | ONLINE       | 21       |
| 100     | 201       | 10.0.0.22     | 3306     | ONLINE       | 1        |

Removing the readers:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000   | 200       | 10.0.0.22     | 3306     | ONLINE       | 0        |
| 10000   | 201       | 10.0.0.33     | 3306     | OFFLINE_SOFT | 0        |
| 10000   | 201       | 10.0.0.23     | 3306     | OFFLINE_SOFT | 0        |
| 100     | 201       | 10.0.0.22     | 3306     | ONLINE       | 62       |

In both cases, no issue at all; the writer takes the load of the reads only when left alone. 

Maintenance Test

In this test, I will simply put the node into maintenance mode using pxc_maint_mode=MAINTENANCE, as done in the other article. As a reminder, this also worked fine with the native Galera support.
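
For reference, switching a PXC node in and out of maintenance is done on the node itself with the standard variable (a minimal sketch; pxc_maint_transition_period controls how long the node waits while draining):

-- On the PXC node to be drained:
SET GLOBAL pxc_maint_mode = 'MAINTENANCE';
-- ...perform the maintenance...
SET GLOBAL pxc_maint_mode = 'DISABLED';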


Marco script:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000   | 100       | 192.168.4.22  | 3306     | ONLINE       | 50       |
| 10000   | 101       | 192.168.4.233 | 3306     | ONLINE       | 8        |
| 10000   | 101       | 192.168.4.23  | 3306     | ONLINE       | 3        |
| 100     | 101       | 192.168.4.22  | 3306     | ONLINE       | 0        |
| 1000000 | 200       | 10.0.0.23     | 3306     | OFFLINE_SOFT | 0        |

After:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 999     | 100       | 192.168.4.23  | 3306     | ONLINE       | 50       |
| 10000   | 100       | 192.168.4.22  | 3306     | OFFLINE_SOFT | 0        |
| 10000   | 101       | 192.168.4.233 | 3306     | ONLINE       | 5        |
| 10000   | 101       | 192.168.4.23  | 3306     | ONLINE       | 6        |
| 100     | 101       | 192.168.4.22  | 3306     | OFFLINE_SOFT | 0        |

A new node was elected, and the connections on the old writer were able to finish gracefully thanks to OFFLINE_SOFT. Putting the node back by removing it from maintenance:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 999     | 100       | 192.168.4.23  | 3306     | ONLINE       | 50       |
| 10000   | 101       | 192.168.4.233 | 3306     | ONLINE       | 5        |
| 10000   | 101       | 192.168.4.23  | 3306     | ONLINE       | 5        |
| 100     | 101       | 192.168.4.22  | 3306     | ONLINE       | 0        |

The node WILL NOT fail back by default (this is by design); this allows you to warm up caches, or do anything else that may be meaningful, before moving the node back to the Primary role.

The Percona script will behave a bit differently:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000000 | 200       | 10.0.0.23     | 3306     | OFFLINE_SOFT | 0        |
| 10000   | 200       | 10.0.0.22     | 3306     | ONLINE       | 50       |
| 10000   | 201       | 10.0.0.33     | 3306     | ONLINE       | 4        |
| 10000   | 201       | 10.0.0.23     | 3306     | ONLINE       | 10       |
| 100     | 201       | 10.0.0.22     | 3306     | ONLINE       | 0        |

Then I put the node under maintenance:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000000 | 200       | 10.0.0.23     | 3306     | ONLINE       | 26       |
| 10000   | 200       | 10.0.0.22     | 3306     | OFFLINE_SOFT | 22       |
| 10000   | 201       | 10.0.0.33     | 3306     | ONLINE       | 8        |
| 10000   | 201       | 10.0.0.23     | 3306     | ONLINE       | 12       |
| 100     | 201       | 10.0.0.22     | 3306     | OFFLINE_SOFT | 0        |

Connections are moved to the new Writer gradually, at a pace that depends on how the application manages its connections.

But when I put the node back from maintenance:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000000 | 200       | 10.0.0.23     | 3306     | OFFLINE_SOFT | 0        |
| 10000   | 200       | 10.0.0.22     | 3306     | ONLINE       | 49       |
| 10000   | 201       | 10.0.0.33     | 3306     | ONLINE       | 5        |
| 10000   | 201       | 10.0.0.23     | 3306     | ONLINE       | 14       |
| 100     | 201       | 10.0.0.22     | 3306     | ONLINE       | 0        |

The old Writer is put back as Primary. As indicated above, I consider this wrong, given that we risk putting back a node whose caches are cold, which can affect production performance. It is true that taking a node out of maintenance is a controlled action, but the more checks, the better.

Testing Node Crash

Marco script:

To emulate a crash I will kill the mysqld process with kill -9 <pid>.

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000    | 100       | 192.168.4.22  | 3306     | ONLINE       | 50       |
| 10000   | 101       | 192.168.4.233 | 3306     | ONLINE       | 12       |
| 10000   | 101       | 192.168.4.23  | 3306     | ONLINE       | 4        |
| 100     | 101       | 192.168.4.22  | 3306     | ONLINE       | 0        |

Kill the process:

59,50,53.99,6603.16,6205.21,218.97,178.98,1561.52,0.00,2.00
60,50,54.11,5674.25,5295.50,215.43,163.32,1648.20,0.00,1.00 
61,50,3.99,3382.12,3327.22,30.95,23.96,2159.29,0.00,48.91   <--- start
62,50,0.00,820.35,820.35,0.00,0.00,0.00,0.00,0.00         
63,50,0.00,2848.86,2550.67,195.13,103.07,0.00,0.00,0.00
64,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
65,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
66,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
67,50,50.00,4268.99,4066.99,52.00,150.00,7615.89,0.00,1.00  <--- failover end 
68,50,72.00,6522.40,6096.37,268.02,158.01,1109.09,0.00,1.00

Failover consistently takes about five seconds, of which two are because I set the scheduler to run every two seconds, plus the configured retry. The new Primary is serving while the failed node is removed:

+---------+-----------+---------------+----------+--------------+----------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed | ConnFree |
+---------+-----------+---------------+----------+--------------+----------+----------+
| 999     | 100       | 192.168.4.23  | 3306     | ONLINE       | 0        | 50       |
| 10000   | 101       | 192.168.4.233 | 3306     | ONLINE       | 0        | 34       |
| 10000   | 101       | 192.168.4.23  | 3306     | ONLINE       | 0        | 35       |
| 100     | 101       | 192.168.4.22  | 3306     | SHUNNED      | 0        | 0        |

Percona script:

Also, in this case, the Percona script behaves a bit differently.

Before the crash:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000   | 200       | 10.0.0.22     | 3306     | ONLINE       | 49       |
| 10000   | 201       | 10.0.0.33     | 3306     | ONLINE       | 5        |
| 10000   | 201       | 10.0.0.23     | 3306     | ONLINE       | 14       |
| 100     | 201       | 10.0.0.22     | 3306     | ONLINE       | 0        |

Then kill the process:

29,50,41.05,4099.74,3838.44,155.18,106.12,2009.23,0.00,0.00
30,50,8.01,1617.92,1547.79,37.07,33.06,1803.47,0.00,50.09
31,50,0.00,2696.60,2696.60,0.00,0.00,0.00,0.00,0.00       <--- start
32,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
33,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
34,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
35,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
36,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
37,50,12.96,2385.82,2172.46,91.72,121.63,8795.93,0.00,0.00  <--- failback ends 6"
38,50,39.95,4360.00,4083.38,148.80,127.82,9284.15,0.00,0.00

Variable time to recover but around 6-12 seconds.

+---------+-----------+---------------+----------+---------+----------+
| weight  | hostgroup | srv_host      | srv_port | status  | ConnUsed |
+---------+-----------+---------------+----------+---------+----------+
| 1000000 | 200       | 10.0.0.23     | 3306     | ONLINE  | 50       | <-- new
| 10000   | 201       | 10.0.0.33     | 3306     | ONLINE  | 11       |
| 10000   | 201       | 10.0.0.23     | 3306     | ONLINE  | 5        |
New Primary is elected. But on node recovery:

+---------+-----------+---------------+----------+--------------+----------+
| weight  | hostgroup | srv_host      | srv_port | status       | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000000 | 200       | 10.0.0.23     | 3306     | OFFLINE_SOFT | 50       | 
| 10000   | 200       | 10.0.0.22     | 3306     | ONLINE       | 0        |<--old is back
| 10000   | 201       | 10.0.0.33     | 3306     | ONLINE       | 10       |
| 10000   | 201       | 10.0.0.23     | 3306     | ONLINE       | 6        |
| 1000    | 201       | 10.0.0.22     | 3306     | ONLINE       | 0        |

As with maintenance, when the node comes back it is moved to the Primary role again by default. As already explained, I consider this wrong and dangerous, but it is one view of what a script should do.

Conclusions

PXC is a complex product, it can be deployed in many different ways, and it is not easy (or even possible) to identify all the possible variants.

Having the opportunity to use the native support could be the easier solution, but as illustrated in part one of this series, misbehavior is just around the corner and it may seriously impact your production environment.

Using the scheduler with a properly developed script/application to handle the Galera support can guarantee better consistency and behavior that matches your own expectations.

There are solutions out there that may fit you and your needs; if not, you can develop your own and be sure it stays consistent when you change versions of ProxySQL and/or PXC/Galera. In the end, once the main work is done, maintaining a script is much easier than having to patch a product or wait for a feature request to be implemented.

I know that moving away from native support and using a scheduler again may look like a step back. But it is not; it is just the acknowledgment that sometimes it is better to keep things simple and do some specific tuning/work, rather than trying to put the universe in a bottle and overcomplicating the problem.

Nov 30, 2020

As Slack acquisition rumors swirl, a look at Salesforce’s six biggest deals

The rumors ignited last Thursday that Salesforce had interest in Slack. This morning, CNBC is reporting the deal is all but done and will be announced tomorrow. Chances are this is going to be a big number, but this won't be Salesforce's first big acquisition. In light of these rumors, we thought it would be useful to look back at the company's biggest deals.

Salesforce has already surpassed $20 billion in annual revenue, and the company has a history of making a lot of deals to fill in the road map and give it more market lift as it searches for ever more revenue.

The biggest deal so far was the $15.7 billion Tableau acquisition last year. The deal gave Salesforce a missing data visualization component and a company with a huge existing market to feed the revenue beast. In an August interview with TechCrunch, Salesforce president and chief operating officer Bret Taylor (who came to the company in the $750 million Quip deal in 2016) described Tableau as a key part of the company's growing success:

“Tableau is so strategic, both from a revenue and also from a technology strategy perspective,” he said. That’s because as companies make the shift to digital, it becomes more important than ever to help them visualize and understand that data in order to understand their customers’ requirements better.

Next on the Salesforce acquisition hit parade was the $6.5 billion MuleSoft acquisition in 2018. MuleSoft gave Salesforce access to something it didn’t have as an enterprise SaaS company — data locked in silos across the company, even in on-prem applications. The CRM giant could leverage MuleSoft to access data wherever it lived, and when you put the two mega deals together, you could see how you could visualize that data and also give more fuel to its Einstein intelligence layer.

In 2016, the company spent $2.8 billion on Demandware to make a big splash in e-commerce, a component of the platform that has grown in importance during the pandemic when companies large and small have been forced to move their businesses online. The company was incorporated into the Salesforce behemoth and became known as Commerce Cloud.

In 2013, the company made its first billion-dollar acquisition when it bought ExactTarget for $2.5 billion. This represented the first foray into what would become the Marketing Cloud. The purchase gave the company entrée into the targeted email marketing business, which again would grow increasingly in importance in 2020 when communicating with customers became crucial during the pandemic.

Last year, just days after closing the Tableau acquisition, Salesforce opened its wallet one more time and paid $1.35 billion for ClickSoftware. This one was a nod to the company's Service Cloud, which encompasses both customer service and field service. This acquisition was about the latter, giving the company access to a bigger body of field service customers.

The final billion-dollar deal (until we hear about Slack perhaps) is the $1.33 billion Vlocity acquisition earlier this year. This one was a gift for the core CRM product. Vlocity gave Salesforce several vertical businesses built on the Salesforce platform and was a natural fit for the company. Using Vlocity’s platform, Salesforce could (and did) continue to build on these vertical markets giving it more ammo to sell into specialized markets.

While we can’t know for sure if the Slack deal will happen, it sure feels like it will, and chances are this deal will be even larger than Tableau as the Salesforce acquisition machine keeps chugging along.

Nov 30, 2020

Support for Percona XtraDB Cluster in ProxySQL (Part One)

How native ProxySQL stands in failover support (both v2.0.15 and v2.1.0)

In recent times I have been designing several solutions focused on High Availability and Disaster Recovery. Some of them use Percona Server for MySQL with Group Replication, some use Percona XtraDB Cluster (PXC). What many of them have in common is the use of ProxySQL for the connection layer. This is because I consider a layer 7 proxy preferable, given the advantages it provides in read/write splitting and SQL filtering.

The other positive aspect provided by ProxySQL, at least for Group Replication, is the native support which allows us to have a very quick resolution of possible node failures.

ProxySQL has Galera support as well, but in the past that support had proven to be pretty unstable, and the old method of using the scheduler was still the best way to go.

After Percona Live Online 2020 I decided to try it again and to see if at least the basics were now working fine. 

What I Have Tested

I was not looking for complicated tests involving different transaction isolation levels. I was instead interested in the simpler, more basic ones. My scenario was:

1 ProxySQL node v2.0.15  (192.168.4.191)
1 ProxySQL node v2.1.0  (192.168.4.108)
3 PXC 8.20 nodes (192.168.4.22/23/233) with internal network (10.0.0.22/23/33) 

ProxySQL was freshly installed. 

All the commands used to modify the configuration are here. Tests were done first using ProxySQL v2.0.15, then v2.1.0. Only where the results diverge will I report the version and the differing results.

PXC- Failover Scenario

As mentioned above I am going to focus on the fail-over needs, period. I will have two different scenarios:

  • Maintenance
  • Node crash 

From the ProxySQL point of view I will have three scenarios, always with a single Primary:

  • Writer is NOT a reader (writer_is_also_reader = 0 and 2)
  • Writer is also a reader (writer_is_also_reader = 1)

The configuration of the native support will be:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.22',100,3306,10000,2000,'Preferred writer');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.23',100,3306,1000,2000,'Second preferred ');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.233',100,3306,100,2000,'Last chance');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.22',101,3306,100,2000,'last reader');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.23',101,3306,10000,2000,'reader1');    
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.233',101,3306,10000,2000,'reader2');

Galera host groups:

  • Writer: 100
  • Reader: 101
  • Backup_writer: 102
  • Offline_hostgroup: 9101

Before going ahead, let us analyze the mysql_servers settings. As you can notice, I am using the weight attribute to indicate to ProxySQL which is my preferred writer. But I also use weight in the READ host group to indicate which servers should be used, and how.

Given that, we have:

  • Write
    • 192.168.4.22  is the preferred Primary
    • 192.168.4.23  is the first failover
    • 192.168.4.233 is the last chance
  • Read
    • 192.168.4.233/23 have the same weight, and the load should be balanced between the two of them
    • 192.168.4.22, given it is the preferred writer, should NOT receive the same read load, and so has a lower weight value

The Tests

First Test

The first test is to see how the cluster will behave in the case of 1 Writer and 2 readers, with the option writer_is_also_reader = 0.
To achieve this the settings for proxysql will be:

insert into mysql_galera_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup, offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) values (100,102,101,9101,1,1,0,10);
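
For completeness, the definition only takes effect once it is loaded to runtime (and optionally persisted). As far as I know, the mysql_galera_hostgroups table is activated together with the servers, so the usual commands used elsewhere in these posts apply:

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;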

As soon as I load this to runtime, ProxySQL should move the nodes to the relevant host groups. But this is not what happens; instead, it keeps the readers in the writer HG and SHUNs them.

+---------+--------------+---------------+--------------+
| weight  | hostgroup_id | srv_host      | status       |
+---------+--------------+---------------+--------------+
| 100     | 100          | 192.168.4.233 | SHUNNED      |
| 1000    | 100          | 192.168.4.23  | SHUNNED      |
| 10000   | 100          | 192.168.4.22  | ONLINE       |
| 100     | 102          | 192.168.4.233 | ONLINE       |
| 1000    | 102          | 192.168.4.23  | ONLINE       |
+---------+--------------+---------------+--------------+

This is, of course, wrong. But why does it happen?

The reason is simple. ProxySQL is expecting to see all nodes in the reader group with READ_ONLY flag set to 1. 

In ProxySQL documentation we can read:

writer_is_also_reader=0: nodes with read_only=0 will be placed either in the writer_hostgroup and in the backup_writer_hostgroup after a topology change, these will be excluded from the reader_hostgroup.

This is conceptually wrong. 

A PXC cluster is a tightly coupled replication cluster, with virtually synchronous replication. One of its benefits is to have the node “virtually” aligned with respect to the data state. 

In this kind of model, the cluster is data-centric, and each node shares the same data view.

What it also means is that, if correctly configured, the nodes will be fully consistent for data READs.

The other characteristic of the cluster is that ANY node can become a writer at any time.
While best practice indicates it is better to use one Writer at a time as Primary to prevent certification conflicts, this does not mean that the nodes not currently elected as Primary should be prevented from becoming a writer.

Which is exactly what the READ_ONLY flag does when activated.

Not only that: the need to have READ_ONLY set means that we must change it BEFORE the node is able to become a writer in case of failover.

This, in short, means we need either a topology manager or a script that will do that, with all the related checks and logic required to be safe. At failover time this adds delay and complexity that is not really needed, and it goes against the concept of the tightly-coupled cluster itself.

Given the above, we can say that this ProxySQL method tied to writer_is_also_reader = 0, as it is implemented today for Galera, is at best useless.

Why does it work for Group Replication? That is easy: because Group Replication, when used in single-Primary mode, internally uses a mechanism to lock/unlock the non-primary nodes. That internal mechanism was implemented as a safety guard to prevent random writes on multiple nodes, and it also manages the READ_ONLY flag.
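
For comparison, in single-Primary Group Replication that flag management comes for free and can be verified with standard MySQL 8.0 tables and variables (shown here as a reference sketch, not output from my test machines):

-- Who is the current primary?
SELECT MEMBER_HOST, MEMBER_ROLE FROM performance_schema.replication_group_members;
-- On a secondary, super_read_only is switched ON automatically:
SHOW GLOBAL VARIABLES LIKE 'super_read_only';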

Second Test

Let us move on and test with writer_is_also_reader = 2. Again from the documentation:

writer_is_also_reader=2 : Only the nodes with read_only=0 which are placed in the backup_writer_hostgroup are also placed in the reader_hostgroup after a topology change i.e. the nodes with read_only=0 exceeding the defined max_writers.

Given the settings as indicated above, my layout before using Galera support is:

+---------+--------------+---------------+--------------+
| weight  | hostgroup_id | srv_host      | status       |
+---------+--------------+---------------+--------------+
| 100     | 100          | 192.168.4.233 | ONLINE       |
| 1000    | 100          | 192.168.4.23  | ONLINE       |
| 10000   | 100          | 192.168.4.22  | ONLINE       |
| 10000   | 101          | 192.168.4.233 | ONLINE       |
| 10000   | 101          | 192.168.4.23  | ONLINE       |
| 100     | 101          | 192.168.4.22  | ONLINE       |
+---------+--------------+---------------+--------------+

After enabling Galera support:

+--------+-----------+---------------+----------+---------+
| weight | hostgroup | srv_host      | srv_port | status  |
+--------+-----------+---------------+----------+---------+
| 100    | 100       | 192.168.4.233 | 3306     | SHUNNED |
| 1000   | 100       | 192.168.4.23  | 3306     | SHUNNED |
| 10000  | 100       | 192.168.4.22  | 3306     | ONLINE  |
| 100    | 101       | 192.168.4.233 | 3306     | ONLINE  |
| 1000   | 101       | 192.168.4.23  | 3306     | ONLINE  |
| 100    | 102       | 192.168.4.233 | 3306     | ONLINE  |
| 1000   | 102       | 192.168.4.23  | 3306     | ONLINE  |
+--------+-----------+---------------+----------+---------+

So the node ending in .22 (the elected Primary) is not in the reader pool. Which can be OK, I assume.

But what is not OK at all is that the READERS now have completely different weights. Nodes x.23 and x.233 are no longer balancing the load, because the weights are not equal, nor are they the ones I defined. They are instead copied over from the WRITER settings.

Well, of course, this is wrong and not what I want. Anyhow, let's test the READ failover.

I will use sysbench read-only:

sysbench ./src/lua/windmills/oltp_read.lua  --mysql-host=192.168.4.191 --mysql-port=6033 --mysql-user=app_test --mysql-password=test --mysql-db=windmills_s --db-driver=mysql --tables=10 --table_size=10000  --rand-type=zipfian --rand-zipfian-exp=0.5 --skip_trx=true  --report-interval=1  --mysql_storage_engine=innodb --auto_inc=off --histogram --table_name=windmills  --stats_format=csv --db-ps-mode=disable --point-selects=50 --range-selects=true --threads=50 --time=2000   run

mysql> select * from  runtime_mysql_galera_hostgroups \G
*************************** 1. row ***************************
       writer_hostgroup: 100
backup_writer_hostgroup: 102
       reader_hostgroup: 101
      offline_hostgroup: 9101
                 active: 1
            max_writers: 1
  writer_is_also_reader: 2
max_transactions_behind: 10
                comment: NULL

Test Running

+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 100    | 100       | 192.168.4.233 | 3306     | SHUNNED | 0        |
| 1000   | 100       | 192.168.4.23  | 3306     | SHUNNED | 0        |
| 10000  | 100       | 192.168.4.22  | 3306     | ONLINE  | 0        |
| 100    | 101       | 192.168.4.233 | 3306     | ONLINE  | 1        |
| 1000   | 101       | 192.168.4.23  | 3306     | ONLINE  | 51       |
| 100    | 102       | 192.168.4.233 | 3306     | ONLINE  | 0        |
| 1000   | 102       | 192.168.4.23  | 3306     | ONLINE  | 0        |
+--------+-----------+---------------+----------+---------+----------+

As indicated above, the reads are not balanced. Removing node x.23 using wsrep_reject_queries=ALL:

+---------+--------------+---------------+--------------+----------+
| weight  | hostgroup_id | srv_host      | status       | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100     | 100          | 192.168.4.233 | SHUNNED      | 0        |
| 10000   | 100          | 192.168.4.22  | ONLINE       | 0        |
| 100     | 101          | 192.168.4.233 | ONLINE       | 48       |
| 100     | 102          | 192.168.4.233 | ONLINE       | 0        |
+---------+--------------+---------------+--------------+----------+
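
For completeness, “removing” a node in these read tests is done on the node itself with the standard Galera variable; a sketch:

-- On 192.168.4.23 (and later on 192.168.4.233):
SET GLOBAL wsrep_reject_queries = ALL;   -- stop serving queries while staying in the cluster
-- To bring the node back:
SET GLOBAL wsrep_reject_queries = NONE;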

The remaining node, x.233, takes all the reads, good. If I also set wsrep_reject_queries=ALL on x.233:

+---------+--------------+---------------+--------------+
| weight  | hostgroup_id | srv_host      | status       |
+---------+--------------+---------------+--------------+
| 10000   | 100          | 192.168.4.22  | ONLINE       |
| 100     | 9101         | 192.168.4.233 | SHUNNED      |
| 10000   | 9101         | 192.168.4.23  | ONLINE       |
+---------+--------------+---------------+--------------+

And application failed:

FATAL: mysql_drv_query() returned error 9001 (Max connect timeout reached while reaching hostgroup 101 after 10000ms) for query ‘SELECT id, millid, date,active,kwatts_s FROM windmills2 WHERE id=9364’

Now, this may be by design, but I have serious difficulty understanding the reasoning here, given that we allow the platform to stop serving while we still have a healthy server.

Last but not least, I am not allowed to decide WHICH nodes the backup_writers are; ProxySQL chooses them from my list of writer servers. So why not also include the one I have declared as Primary, at least in case of need? ¯\_(ツ)_/¯

Third Test

Ok last try with writer_is_also_reader = 1.

mysql> select * from  runtime_mysql_galera_hostgroups \G
*************************** 1. row ***************************
       writer_hostgroup: 100
backup_writer_hostgroup: 102
       reader_hostgroup: 101
      offline_hostgroup: 9101
                 active: 1
            max_writers: 1
  writer_is_also_reader: 1
max_transactions_behind: 10
                comment: NULL
1 row in set (0.01 sec)

And now I have:

+---------+--------------+---------------+--------------+----------+
| weight  | hostgroup_id | srv_host      | status       | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100     | 100          | 192.168.4.233 | SHUNNED      | 0        |
| 1000    | 100          | 192.168.4.23  | SHUNNED      | 0        |
| 10000   | 100          | 192.168.4.22  | ONLINE       | 0        |
| 100     | 101          | 192.168.4.233 | ONLINE       | 0        |
| 1000    | 101          | 192.168.4.23  | ONLINE       | 0        |
| 10000   | 101          | 192.168.4.22  | ONLINE       | 35       | <-- :(
| 100     | 102          | 192.168.4.233 | ONLINE       | 0        |
| 1000    | 102          | 192.168.4.23  | ONLINE       | 0        |
+---------+--------------+---------------+--------------+----------+

Then I remove one Reader at a time, as before:

+---------+--------------+---------------+--------------+----------+
| weight  | hostgroup_id | srv_host      | status       | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100     | 100          | 192.168.4.233 | SHUNNED      | 0        |
| 10000   | 100          | 192.168.4.22  | ONLINE       | 0        |
| 100     | 101          | 192.168.4.233 | ONLINE       | 0        |
| 10000   | 101          | 192.168.4.22  | ONLINE       | 52       | <-- :(
| 100     | 102          | 192.168.4.233 | ONLINE       | 0        |
| 10000   | 9101         | 192.168.4.23  | ONLINE       | 0        |
+---------+--------------+---------------+--------------+----------+

+---------+--------------+---------------+--------------+----------+
| weight  | hostgroup_id | srv_host      | status       | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 10000   | 100          | 192.168.4.22  | ONLINE       | 0        |
| 100     | 101          | 192.168.4.22  | ONLINE       | 39       | <-- :(
| 100     | 9101         | 192.168.4.233 | SHUNNED      | 0        |
| 10000   | 9101         | 192.168.4.23  | ONLINE       | 0        |
+---------+--------------+---------------+--------------+----------+

Now, as you may have already realized, the point here is that YES, my node x.22 (the Primary) is able to take READs as well, but it was taking the whole read load from the beginning. This is because of the shift ProxySQL made to the weights.

This happens because, while ProxySQL initially populates its internal mysql_servers_incoming table with the data from mysql_servers, after several steps that information is overwritten for the readers as well, using the data coming from the writer.

This messes up the desired result.
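
An easy way to see this overwrite happen is to compare the configured weights with the runtime ones; a diagnostic sketch against the admin interface:

SELECT hostgroup_id, hostname, weight FROM mysql_servers ORDER BY hostgroup_id, hostname;
SELECT hostgroup_id, hostname, weight FROM runtime_mysql_servers ORDER BY hostgroup_id, hostname;
-- After the Galera hostgroup logic kicks in, the runtime weights of the readers
-- no longer match what was configured in mysql_servers.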

Fourth Test

Failover due to maintenance. In this case, I will set pxc_maint_mode = MAINTENANCE on the writer to fail over to another writer.
The sysbench command used:

sysbench ./src/lua/windmills/oltp_read_write.lua  --mysql-host=192.168.4.191 --mysql-port=6033 --mysql-user=app_test --mysql-password=test --mysql-db=windmills_s --db-driver=mysql --tables=10 --table_size=10000  --rand-type=zipfian --rand-zipfian-exp=0.5 --skip_trx=false  --report-interval=1  --mysql_storage_engine=innodb --auto_inc=off --histogram --table_name=windmills  --stats_format=csv --db-ps-mode=disable --point-selects=50 --range-selects=true --threads=50 --time=2000   run

After starting sysbench, I set the writer in maintenance mode:

+-----------------------------+-------------+
| Variable_name               | Value       |
+-----------------------------+-------------+
| pxc_encrypt_cluster_traffic | OFF         |
| pxc_maint_mode              | MAINTENANCE |
| pxc_maint_transition_period | 10          |
| pxc_strict_mode             | ENFORCING   |
+-----------------------------+-------------+

ProxySQL sets the node as SHUNNED, but it is not able to move the connections over, given that sysbench uses sticky connections.

+---------+--------------+---------------+--------------+----------+
| weight  | hostgroup_id | srv_host      | status       | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100     | 100          | 192.168.4.233 | SHUNNED      | 0        |
| 1000    | 100          | 192.168.4.23  | ONLINE       | 0        |
| 10000   | 100          | 192.168.4.22  | SHUNNED      | 50       |
| 100     | 101          | 192.168.4.233 | ONLINE       | 2        |
| 1000    | 101          | 192.168.4.23  | ONLINE       | 13       |
| 100     | 102          | 192.168.4.233 | ONLINE       | 0        |
| 10000   | 9101         | 192.168.4.22  | ONLINE       | 0        |
+---------+--------------+---------------+--------------+----------+

THIS IS EXPECTED!
If your application uses sticky connections and never refreshes them, you must restart the application. Adding --reconnect=50 to the sysbench command, I can see that the connections shift to the new master as expected:

+---------+--------------+---------------+--------------+----------+
| weight  | hostgroup_id | srv_host      | status       | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100     | 100          | 192.168.4.233 | SHUNNED      | 0        |
| 1000    | 100          | 192.168.4.23  | ONLINE       | 26       | <-- New Primary
| 10000   | 100          | 192.168.4.22  | SHUNNED      | 19       | <-- shift
| 100     | 101          | 192.168.4.233 | ONLINE       | 0        |
| 10000   | 101          | 192.168.4.23  | ONLINE       | 21       |
| 100     | 102          | 192.168.4.233 | ONLINE       | 0        |
| 10000   | 9101         | 192.168.4.23  | ONLINE       | 0        | <-- ??
| 10000   | 9101         | 192.168.4.22  | ONLINE       | 0        |
+---------+--------------+---------------+--------------+----------+

As we can see, ProxySQL performs the failover to node x.23 as expected. But it also adds the node to HG 9101, which is supposed to host the offline servers.

So why move the Primary there? 

Once maintenance is over, disabling pxc_maint_mode will restore the master. In short, ProxySQL will fail back.

The whole process will not be impactful if the application is NOT using sticky connections; otherwise, the application will have to deal with:

  • Errors on the connection
  • A retry cycle to re-run the dropped DML

Failover Because of a Crash

To check the next case, I will add --mysql-ignore-errors=all to sysbench, to be able to see how many errors I get and for how long when a failover is needed. To simulate a crash, I will kill -9 the mysqld process on the writer.

After Kill:

98,50,53.00,6472.71,6070.73,221.99,179.99,1327.91,0.00,1.00 <--
99,50,0.00,2719.17,2719.17,0.00,0.00,0.00,0.00,50.00        <--start
100,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
101,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
102,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
103,50,0.00,2849.89,2549.90,193.99,106.00,0.00,0.00,0.00
104,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
105,50,49.85,2663.99,2556.31,23.93,83.75,7615.89,0.00,6.98  <-- done

In this case, it takes 6 seconds for a failover.

+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed | 
+--------+-----------+---------------+----------+---------+----------+
| 100    | 100       | 192.168.4.233 | 3306     | SHUNNED | 0        | 
| 1000   | 100       | 192.168.4.23  | 3306     | ONLINE  | 48       | 
| 100    | 101       | 192.168.4.233 | 3306     | ONLINE  | 1        | 
| 1000   | 101       | 192.168.4.23  | 3306     | ONLINE  | 18       | 
| 100    | 102       | 192.168.4.233 | 3306     | ONLINE  | 0        | 
| 10000  | 9101      | 192.168.4.22  | 3306     | SHUNNED | 0        | 
+--------+-----------+---------------+----------+---------+----------+

So all good here. But during one of my tests, ONLY on v2.0.15 and only when using the same weight for all nodes, I had the following weird behavior: once the failover was done, I found that ProxySQL was sending connections to BOTH remaining nodes.

Check the data below, taken one snapshot after the other as the nodes start to take over; keep in mind that here the PRIMARY was node 192.168.4.233:

+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 10000  | 100       | 192.168.4.233 | 3306     | SHUNNED | 0        |
| 10000  | 100       | 192.168.4.23  | 3306     | ONLINE  | 10       |<--
| 10000  | 100       | 192.168.4.22  | 3306     | SHUNNED | 40       |<--
| 10000  | 101       | 192.168.4.233 | 3306     | SHUNNED | 0        |
| 10000  | 101       | 192.168.4.23  | 3306     | ONLINE  | 3        |
| 10000  | 101       | 192.168.4.22  | 3306     | ONLINE  | 12       |
| 10000  | 102       | 192.168.4.22  | 3306     | ONLINE  | 0        |
+--------+-----------+---------------+----------+---------+----------+
...
+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 10000  | 100       | 192.168.4.233 | 3306     | SHUNNED | 0        |
| 10000  | 100       | 192.168.4.23  | 3306     | ONLINE  | 37       |<--
| 10000  | 100       | 192.168.4.22  | 3306     | SHUNNED | 13       |<--
| 10000  | 101       | 192.168.4.233 | 3306     | SHUNNED | 0        |
| 10000  | 101       | 192.168.4.23  | 3306     | ONLINE  | 7        |
| 10000  | 101       | 192.168.4.22  | 3306     | ONLINE  | 12       |
| 10000  | 102       | 192.168.4.22  | 3306     | ONLINE  | 0        |
+--------+-----------+---------------+----------+---------+----------+
...
+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 10000  | 100       | 192.168.4.233 | 3306     | SHUNNED | 0        |
| 10000  | 100       | 192.168.4.23  | 3306     | ONLINE  | 49       |<--
| 10000  | 100       | 192.168.4.22  | 3306     | SHUNNED | 0        |<--
| 10000  | 101       | 192.168.4.233 | 3306     | SHUNNED | 0        |
| 10000  | 101       | 192.168.4.23  | 3306     | ONLINE  | 10       |
| 10000  | 101       | 192.168.4.22  | 3306     | ONLINE  | 10       |
| 10000  | 102       | 192.168.4.22  | 3306     | ONLINE  | 0        |
+--------+-----------+---------------+----------+---------+----------+

In the end, only one node remained as Primary, but for some amount of time both were serving, even though only ONE node was declared ONLINE.

A Problem Along the Road… (only with v2.0.15)

While I was trying to “fix” the issue with the weight for READERS…

Let’s say we have this:

+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 10000  | 100       | 192.168.4.23  | 3306     | ONLINE  | 686      |
| 10000  | 100       | 192.168.4.22  | 3306     | SHUNNED | 0        |
| 10000  | 101       | 192.168.4.233 | 3306     | ONLINE  | 62       |
| 10000  | 101       | 192.168.4.23  | 3306     | ONLINE  | 43       |
| 10000  | 101       | 192.168.4.22  | 3306     | ONLINE  | 19       |
| 10000  | 102       | 192.168.4.22  | 3306     | ONLINE  | 0        |
+--------+-----------+---------------+----------+---------+----------+

And I want to release some of the READ load from WRITER (currently 192.168.4.23).

If I simply do:

update mysql_servers set weight=100 where hostgroup_id=101 and hostname='192.168.4.23';

+--------------+---------------+------+-----------+--------+--------+
| hostgroup_id | hostname      | port | gtid_port | status | weight | 
+--------------+---------------+------+-----------+--------+--------+
| 100          | 192.168.4.23  | 3306 | 0         | ONLINE | 10000  | 
| 101          | 192.168.4.22  | 3306 | 0         | ONLINE | 10000  | 
| 101          | 192.168.4.23  | 3306 | 0         | ONLINE | 100    | 
| 101          | 192.168.4.233 | 3306 | 0         | ONLINE | 10000  | 
+--------------+---------------+------+-----------+--------+--------+

Now I load it into runtime, and… if I am lucky:

+--------+-----------+---------------+----------+---------+
| weight | hostgroup | srv_host      | srv_port | status  |
+--------+-----------+---------------+----------+---------+
| 10000  | 100       | 192.168.4.23  | 3306     | ONLINE  |
| 10000  | 100       | 192.168.4.22  | 3306     | SHUNNED |
| 10000  | 101       | 192.168.4.233 | 3306     | ONLINE  |
| 100    | 101       | 192.168.4.23  | 3306     | ONLINE  |
| 10000  | 101       | 192.168.4.22  | 3306     | ONLINE  |
| 10000  | 102       | 192.168.4.22  | 3306     | ONLINE  |
+--------+-----------+---------------+----------+---------+

And then it changes to:

+--------+-----------+---------------+----------+---------+
| weight | hostgroup | srv_host      | srv_port | status  |
+--------+-----------+---------------+----------+---------+
| 10000  | 100       | 192.168.4.23  | 3306     | ONLINE  |
| 10000  | 100       | 192.168.4.22  | 3306     | SHUNNED |
| 10000  | 101       | 192.168.4.233 | 3306     | ONLINE  |
| 10000  | 101       | 192.168.4.23  | 3306     | ONLINE  |
| 10000  | 101       | 192.168.4.22  | 3306     | ONLINE  |
| 10000  | 102       | 192.168.4.22  | 3306     | ONLINE  |
+--------+-----------+---------------+----------+---------+

As you can see, ProxySQL initially applies the value I chose, then reverts the weight to the one defined for the node in HG 100. Worse, if I am not lucky:

+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host      | srv_port | status  | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 100    | 100       | 192.168.4.23  | 3306     | SHUNNED | 0        |
| 10000  | 100       | 192.168.4.22  | 3306     | ONLINE  | 0        |
| 10000  | 101       | 192.168.4.233 | 3306     | ONLINE  | 718      |
| 100    | 101       | 192.168.4.23  | 3306     | ONLINE  | 0        |
| 10000  | 101       | 192.168.4.22  | 3306     | SHUNNED | 0        |
| 100    | 102       | 192.168.4.23  | 3306     | ONLINE  | 0        |
+--------+-----------+---------------+----------+---------+----------+

the weight is changed (seemingly at random) for HG 102 as well, which also affects the WRITER HG and triggers a failover. At this point I stopped testing; too many moving parts were making the failover scenarios unpredictable.

Conclusions

ProxySQL is built on a great concept and certainly fills a real gap in the MySQL ecosystem, optimizing and strengthening the connection layer between the application and the data layer.

But when it comes to Galera support, we are not there yet. The support provided is not only limited, it is misleading, and it can lead to serious and/or unexpected problems. Even with writer_is_also_reader=1, which is the only setting worth using, we still see too many issues in how the nodes are managed during serious events such as a failover.
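
For reference, that option lives in the mysql_galera_hostgroups table. A minimal sketch of enabling it, assuming the writer hostgroup is 100 as in the examples above:

UPDATE mysql_galera_hostgroups SET writer_is_also_reader=1 WHERE writer_hostgroup=100;
LOAD MYSQL SERVERS TO RUNTIME;  -- Galera hostgroup settings are activated together with the servers
SAVE MYSQL SERVERS TO DISK;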

ProxySQL v2.1.0 seems to have fixed some of the instabilities, but there are still too many open issues to trust the native Galera support. My advice is to stay away from it and use the scheduler to manage the Galera cluster instead. If you need to customize the actions, write a robust script that covers your specific needs; the scheduler will serve you well.

If you would rather not write your own, there is a sample in Percona-Lab: it is the old script used with ProxySQL 1.4.x, modified to work with ProxySQL 2.x. I have also written one, a long time ago, that can help as well. Both come without any guarantee, and I advise you to use them as examples for your own; see Part 2 of this post for details.
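
To illustrate how such a check script is hooked into the scheduler, here is a minimal sketch; the filename, interval, and arguments below are placeholders, not the actual interface of either script:

INSERT INTO scheduler (id, active, interval_ms, filename, arg1)
VALUES (10, 1, 2000, '/var/lib/proxysql/galera_check.pl', '--placeholder-args');  -- hypothetical path and arguments
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;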

Finally, let me say that ProxySQL is a great tool, but no tool can cover everything. Those of us who have been around long enough have seen this happen many times, and it comes as no surprise.

Great MySQL to all.

References

https://www.percona.com/doc/percona-xtradb-cluster/LATEST/install/index.html

https://galeracluster.com/

https://proxysql.com/blog/proxysql-native-galera-support/

https://www.percona.com/blog/2019/02/20/proxysql-native-support-for-percona-xtradb-cluster-pxc/

https://proxysql.com/documentation/galera-configuration/

Nov
30
2020
--

Materialize scores $40 million investment for SQL streaming database

Materialize, the SQL streaming database startup built on top of the open-source Timely Dataflow project, announced a $32 million Series B investment led by Kleiner Perkins, with participation from Lightspeed Ventures.

While it was at it, the company also announced a previously unannounced $8 million Series A from last year, led by Lightspeed, bringing the total raised to $40 million.

These firms see a solid founding team that includes CEO Arjun Narayan, formerly of Cockroach Labs, and chief scientist Frank McSherry, who created the Timely Dataflow project on which the company is based.

Narayan says that the company believes fundamentally that every company needs to be a real-time company, and it will take a streaming database to make that happen. Further, he says the company is built using SQL because of its ubiquity, and the founders wanted to make sure that customers could access and make use of that data quickly without learning a new query language.

“Our goal is really to help any business to understand streaming data and build intelligent applications without using or needing any specialized skills. Fundamentally what that means is that you’re going to have to go to businesses using the technologies and tools that they understand, which is standard SQL,” Narayan explained.

Bucky Moore, the partner at Kleiner Perkins leading the B round, sees this standard querying ability as a key part of the technology. “As more businesses integrate streaming data into their decision-making pipelines, the inability to ask questions of this data with ease is becoming a non-starter. Materialize’s unique ability to provide SQL over streaming data solves this problem, laying the foundation for them to build the industry’s next great data platform,” he said.

They would naturally get compared to Confluent, the data streaming platform built on top of the open-source Apache Kafka project, but Narayan says his company uses standard SQL for querying, while Confluent uses its own SQL-like dialect.

The company is still working out the commercial side of the house and currently provides a typical service offering for paying customers, with support and a service-level agreement (SLA). The startup is working on a SaaS version of the product, which it expects to release some time next year.

They currently have 20 employees with plans to double that number by the end of next year as they continue to build out the product. As they grow, Narayan says the company is definitely thinking about how to build a diverse organization.

He says he’s found that hiring in general has been challenging during the pandemic, and he hopes that changes in 2021, but he says that he and his co-founders are looking at the top of the hiring funnel because otherwise, as he points out, it’s easy to get complacent and rely on the same network of people you have been working with before, which tends to be less diverse.

“The KPIs and the metrics we really want to use to ensure that we really are putting in the extra effort to ensure a diverse sourcing in your hiring pipeline and then following that through all the way through the funnel. That’s I think the most important way to ensure that you have a diverse [employee base], and I think this is true for every company,” he said.

While he is working remotely now, he sees having multiple offices with a headquarters in NYC when the pandemic finally ends. Some employees will continue to work remotely, with the majority coming into one of the offices.

Nov
30
2020
--

Webinar December 15: The Open Source Alternative to Paying for MongoDB

Please join Barrett Chambers, Percona Solutions Engineer, for an engaging webinar on MongoDB vs. Open Source.

Many organizations rely on MongoDB Enterprise subscriptions for the coverage and features they provide. However, many are unaware that there is an open source alternative offering the features and benefits of a MongoDB Enterprise subscription without the licensing fees. In this talk, we will cover the features that make Percona Server for MongoDB an enterprise-grade, license-free alternative to MongoDB Enterprise Edition.

In this discussion we will address:

– Brief History of MongoDB

– MongoDB Enterprise versus Percona Server for MongoDB features, including:

  • Authentication
  • Authorization
  • Encryption
  • Governance
  • Auditing
  • Storage Engines
  • Monitoring & Alerting

Please join Barrett Chambers, Percona Solutions Engineer, on Tuesday, December 15 at 11:00 AM EST for his webinar “The Open Source Alternative to Paying for MongoDB”.

Register for Webinar

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

Nov
30
2020
--

ServiceNow is acquiring Element AI, the Canadian startup building AI services for enterprises

ServiceNow, the cloud-based IT services company, is making a significant acquisition today to fill out its longer-term strategy to be a big player in the worlds of automation and artificial intelligence for enterprises. It is acquiring Element AI, a startup out of Canada.

Founded by AI pioneers and backed by some of the world’s biggest AI companies — it raised hundreds of millions of dollars from the likes of Microsoft, Intel, Nvidia and Tencent, among others — Element AI’s aim was to build and provision AI-based IT services for enterprises, in many cases organizations that are not technology companies by nature.

Terms of the deal are not being disclosed, a spokesperson told TechCrunch, but we now have multiple sources telling us the price was around $500 million. For some context, Element AI was valued at between $600 million and $700 million when it last raised money, $151 million (or C$200 million at the time) in September 2019.

Even at $500 million, this deal would be ServiceNow’s biggest acquisition, although it would be a sizeable devaluation compared to the startup’s last price at fundraising.

A spokesperson confirmed that ServiceNow is making a full acquisition and will retain most of Element AI’s technical talent, including AI scientists and practitioners, but that it will be winding down its existing business after integrating what it wants and needs.

“Our focus with this acquisition is to gain technical talent and AI capabilities,” the spokesperson said. That will also include Element AI co-founder and CEO, JF Gagné, joining ServiceNow, and co-founder Dr. Yoshua Bengio taking on a role as technical advisor.

Those who are not part of those teams will be supported with severance or assistance in looking for other jobs within ServiceNow. A source estimated to us that this could affect around half of the organization.

The startup is headquartered in Montreal, and ServiceNow’s plan is to create an AI Innovation Hub based around that “to accelerate customer-focused AI innovation in the Now Platform.” (That is the brand name of its automation services.)

Last but not least, ServiceNow will start re-platforming some of Element AI’s capabilities, she said. “We expect to wind down most of Element AI’s customers after the deal is closed.”

The deal is the latest move for a company aiming to build a modern platform fit for our times.

ServiceNow, under CEO Bill McDermott (who joined in October 2019 from SAP), has been on a big investment spree in the name of bringing more AI and automation chops to the SaaS company. That has included a number of acquisitions this year, including Sweagle, Passage AI and Loom (respectively for $25 million, $33 million and $58 million), plus regular updates to its larger workflow automation platform.

ServiceNow has been around since 2004, so it’s not strictly a legacy business, but all the same, the publicly traded company, with a current market cap of nearly $103 billion, is vying to position itself as the go-to company for “digital transformation” — the buzz term for enterprise IT services this year, as everyone scrambles to do more online, in the cloud and remotely to continue operating through a global health pandemic and whatever comes in its wake.

“Technology is no longer supporting the business, technology is the business,” McDermott said earlier this year. In a tight market where it is completely plausible that Salesforce might scoop up Slack, ServiceNow is making a play for more tools to cover its own patch of the field.

“AI technology is evolving rapidly as companies race to digitally transform 20th century processes and business models,” said ServiceNow Chief AI Officer Vijay Narayanan, in a statement today. “ServiceNow is leading this once-in-a-generation opportunity to make work, work better for people. With Element AI’s powerful capabilities and world class talent, ServiceNow will empower employees and customers to focus on areas where only humans excel – creative thinking, customer interactions, and unpredictable work. That’s a smarter way to workflow.”

Element AI was always a very ambitious concept for a startup. Dr. Yoshua Bengio, winner of the 2018 Turing Award, who co-founded the company with AI expert Nicolas Chapados and Jean-François Gagné (Element AI’s CEO) alongside Anne Martel, Jean-Sebastien Cournoyer and Philippe Beaudoin, saw a gap in the market.

Their idea was to build AI services for businesses that were not tech companies in their DNA, but that still very much needed to tap into the innovations of the tech world in order to keep growing and remain competitive as the tech companies moved deeper into a wider range of industries and those businesses required increasing sophistication to operate and grow. They needed, in essence, to disrupt themselves before getting unceremoniously disrupted by someone else.

And on top of that, Element AI could work for and with the tech companies taking strategic investments in Element AI, as those investors wanted to tap some of that expertise themselves, as well as work with the startup to bring more services and win more deals in the enterprise. In addition to its four (sometimes fiercely competitive) investors, other backers included the likes of McKinsey.

Yet what form all of that would take was never completely clear.

When I covered the startup’s most recent tranche of funding last year, I noted that it wasn’t very forthcoming on who its customers actually were. Looking at its website, it still isn’t, although it does lay out several verticals where it aims to work. They include insurance, pharma, logistics, retail, supply chain, manufacturing, government and capital markets.

There were some other positive points. Element AI also played a strong ethics card with its AI For Good efforts, starting with work with Amnesty in 2018 and most recently Mozilla. Indeed, 2018 — a year after Element AI was founded — was also the year AI seemed to hit the mainstream consciousness — and also started to appear somewhat more creepy, with algorithmic misfires, pervasive facial recognition and more “automated” applications that didn’t work that well and so on — so launching an ethical aim definitely made sense.

But for all of that, it seems that there perhaps were not enough threads there to need a bigger cloth as a standalone business. Glassdoor reviews also speak of an endemic disorganization at the startup, which might not have helped, or was perhaps a sign of bigger issues.

“Element AI’s vision has always been to redefine how companies use AI to help people work smarter,” said Element AI founder and CEO, Jean-Francois Gagné in a statement. “ServiceNow is leading the workflow revolution and we are inspired by its purpose to make the world of work, work better for people. ServiceNow is the clear partner for us to apply our talent and technology to the most significant challenges facing the enterprise today.”

The acquisition is expected to be completed by early 2021.

Nov
29
2020
--

Talking Drupal #272 – GitLab Update

In episode #223, from August 2019, Tim Lehnen of the Drupal Association joined us to talk about Drupal.org's Git infrastructure moving to GitLab. A year and a half later, Tim joins us again to talk about the project's status and recent advancements.

www.talkingdrupal.com/272

Topics

  • Nic – Facemasks
  • John – Everything
  • Stephen – Teachers
  • Tim – Shout out
  • What is the Gitlab project?
  • GitLab project status
  • What is next?
  • Biggest obstacles
  • Next steps

Resources

Documentation

Blog overview w/ demo vid

Example merge requests – Core, Contrib

All Drupal merge request activity so far

Hosts

Stephen Cross – www.stephencross.com @stephencross

John Picozzi – www.oomphinc.com @johnpicozzi

Nic Laflin – www.nLighteneddevelopment.com @nicxvan

Guest

Tim Lehnen  @TimLehnen  www.drupal.org/u/hestenet

 

Nov
27
2020
--

Wall Street needs to relax, as startups show remote work is here to stay

We are hearing that a COVID-19 vaccine could be on the way sooner rather than later, and that means we could return to normal life some time in 2021. That's the good news. The perplexing news, however, is that each time some positive news emerges about a vaccine — and believe me, I'm not complaining — Wall Street punishes stocks it thinks benefit from us being stuck at home. That would be companies like Zoom and Peloton.

While I’m not here to give investment advice, I’m confident that these companies are going to be fine even after we return to the office. While we surely pine for human contact, office brainstorming, going out to lunch with colleagues and just meeting and collaborating in the same space, it doesn’t mean we will simply return to life as it was before the pandemic and spend five days a week in the office.

One thing is clear in my discussions with startups born or growing up during the pandemic: They have learned to operate, hire and sell remotely, and many say they will continue to be remote-first when the pandemic is over. Established larger public companies like Dropbox, Facebook, Twitter, Shopify and others have announced they will continue to offer a remote-work option going forward. There are many other such examples.

It’s fair to say that we learned many lessons about working from home over this year, and we will carry them with us whenever we return to school and the office — and some percentage of us will continue to work from home at least some of the time, while a fair number of businesses could become remote-first.

Wall Street reactions

On November 9, news that the Pfizer vaccine was at least 90% effective threw the markets for a loop. The summer trade, in which investors moved capital from traditional, non-tech industries and pushed it into software shares, flipped; suddenly the stocks that had been riding a pandemic wave were losing ground while old-fashioned, even stodgy, companies shot higher.
