Aug
18
2018
--

Distributed teams are rewriting the rules of office(less) politics

When we think about designing our dream home, we don’t think of having a thousand roommates in the same room with no doors or walls. Yet in today’s workplace where we spend most of our day, the purveyors of corporate office design insist that tearing down walls and bringing more people closer together in the same physical space will help foster better collaboration while dissolving the friction of traditional hierarchy and office politics.

But what happens when there is no office at all?

This is the reality for Jason Fried, Founder and CEO of Basecamp, and Matt Mullenweg, Founder and CEO of Automattic (makers of WordPress), who both run teams that are 100% distributed across six continents and many time zones. Fried and Mullenweg are the founding fathers of a movement that has inspired at least a dozen other companies to follow suit, including Zapier, GitHub, and Buffer. Both have either written a book, or have had a book written about them, on the topic.

For all of the discussions about how to hire, fire, coordinate, motivate, and retain remote teams though, what is strangely missing is a discussion about how office politics changes when there is no office at all. To that end, I wanted to seek out the experience of these companies and ask: does remote work propagate, mitigate, or change the experience of office politics? What tactics are startups using to combat office politics, and are any of them effective?

“Can we take a step back here?”

Office politics is best described by a simple example. There is a project, with its goals, metrics, and timeline, and then there’s who gets to decide how it’s run, who gets to work on it, and who gets credit for it. The process for deciding this is a messy human one. While we all want to believe that these decisions are merit-based, data-driven, and objective, we all know the reality is very different. As a flood of research shows, they come with the baggage of human bias in perceptions, heuristics, and privilege.

Office politics is the internal maneuvering and positioning to shape these biases and perceptions to achieve a goal or influence a decision. When incentives are aligned, these goals point in the same direction as the company’s. When they don’t, dysfunction ensues.

Perhaps this sounds too Darwinian, but it is a natural and inevitable outcome of being part of any organization where humans make the decisions. There is your work, and then there’s the management of your coworker’s and boss’s perception of your work.

There is no section in your employee handbook that will tell you how to navigate office politics. These are the tacit, unofficial rules that aren’t documented. This could include reworking your wardrobe to match your boss’s style (if you don’t believe me, ask how many people at Facebook own a pair of Nike Frees). Or making time to go to weekly happy hour not because you want to, but because it’s what you were told you needed to do to get ahead.

One of my favorite memes about workplace culture is Sarah Cooper’s “10 Tricks to Appear Smart in Meetings,” which includes…

  • Encouraging everyone to “take a step back” and ask “what problem are we really trying to solve”
  • Nodding continuously while appearing to take notes
  • Stepping out to take an “important phone call”
  • Jumping out of your seat to draw a Venn diagram on the whiteboard

Sarah Cooper, The Cooper Review

These cues and signals used in physical workplaces to shape and influence perceptions do not map onto the remote workplace, which gives us a unique opportunity to study how office politics can be different through the lens of the officeless.

Friends without benefits

For employees, the analogy that coworkers are like family is true in one sense — they are the roommates that we never got to choose. Learning to work together is difficult enough, but the physical office layers on the additional challenge of learning to live together. Contrast this with remote workplaces, which Mullenweg of Automattic believes helps alleviate the “cohabitation annoyances” that come with sharing the same space, allowing employees to focus on how to best work with each other, versus how their neighbor “talks too loud on the phone, listens to bad music, or eats smelly food.”

Additionally, remote workplaces free us of the tyranny of the tacit expectations and norms that might not have anything to do with work itself. At an investment bank, everyone knows that analysts come in before the managing director does, and leave after they do. This signals that you’re working hard.

Basecamp’s Fried calls this the “presence prison,” the need to be constantly aware of where your coworkers are and what they are doing at all times, both physically and virtually. And he’s waging a crusade against it, even to the point of removing the green dot on Basecamp’s product. “As a general rule, nobody at Basecamp really knows where anyone else is at any given moment. Are they working? Dunno. Are they taking a break? Dunno. Are they at lunch? Dunno. Are they picking up their kid from school? Dunno. Don’t care.”

There is a credible basis for this practice. A study of factory workers by Harvard Business School showed that workers were 10% to 15% more productive when managers weren’t watching. This increase was attributed to giving workers the space and freedom to experiment with different approaches before explaining them to managers, versus the control group, which tended to follow prescribed instructions under the leery watch of their managers.

Remote workplaces experience a similar phenomenon, though by coincidence rather than by design. “Working hard” can’t be observed physically, so it has to be explained, documented, measured, and shared across the company. Cultural norms are not left to chance, or steered by fear or pressure, which should give individuals the autonomy to focus on the work itself, versus how their work is perceived.

Lastly, while physical workplaces can be the source of meaningful friendships and community, recent research by the Wharton School of Business is just beginning to unravel the complexities behind workplace friendships, which can be fraught with tensions from obligations, reciprocity and allegiances. When conflicts arise, you need to choose between what’s best for the company, and what’s best for your relationship with that person or group. You’re not going to help Bob because your best friend Sally used to date him and he was a dick. Or you’re willing to do anything for Jim because he coaches your kid’s soccer team, and vouched for you to get that promotion.

In remote workplaces, you don’t share the same neighborhood, your kids don’t go to the same school, and you don’t have to worry about which coworkers to invite to dinner parties. Your physical/personal and work communities don’t overlap, which means you (and your company) unintentionally avoid many of the hazards of toxic workplace relationships.

On the other hand, these same relationships can be important to overall employee engagement and well-being. This is evidenced by one of the findings in Buffer’s 2018 State of Remote Work Report, which surveyed over 1900 remote workers around the world. It found that next to collaborating and communicating, loneliness was the biggest struggle for remote workers.

Graph by Buffer (State of Remote Work 2018)

So while you may be able to feel like your own boss and avoid playing office politics in your home office, ultimately being alone may be more challenging than putting on a pair of pants and going to work.

Feature, not a bug?

Physical offices can have workers butting heads with each other. Image by UpperCut Images via Getty Images.

For organizations, the single biggest difference between remote and physical teams is the greater dependence on writing to establish the permanence and portability of organizational culture, norms and habits. Writing is different than speaking because it forces concision, deliberation, and structure, and this impacts how politics plays out in remote teams.

Writing changes the politics of meetings. Every Friday, Zapier employees send out a bulletin with: (1) things I said I’d do this week and their results, (2) other issues that came up, (3) things I’m doing next week. Everyone spends the first 10 minutes of the meeting in silence reading everyone’s updates.

Remote teams practice this context setting out of necessity, but it also provides positive auxiliary benefits of “hearing” from everyone around the table, and not letting meetings default to the loudest or most senior in the room. This practice can be adopted by companies with physical workplaces as well (in fact, Zapier CEO Wade Foster borrowed this from Amazon), but it takes discipline and leadership to change behavior, particularly when it is much easier for everyone to just show up like they’re used to.

Writing changes the politics of information sharing and transparency. At Basecamp, there are no all-hands or town hall meetings. All updates, decisions, and subsequent discussions are posted publicly to the entire company. For companies, this is pretty bold. It’s like having a Facebook wall with all your friends chiming in on your questionable decisions of the distant past that you can’t erase. But the beauty is that there is now a body of written decisions and discussions that serves as a rich and permanent artifact of institutional knowledge, accessible to anyone in the company. Documenting major decisions in writing depoliticizes access to information.

Remote workplaces are not without their challenges. Even though communication can be asynchronous through writing, leadership is not. Maintaining an apolitical culture (or any culture) requires a real-time feedback loop of not only what is said, but what is done, and how it’s done. Leaders lead by example in how they speak, act, and make decisions. This is much harder in a remote setting.

A designer from WordPress notes the interpersonal challenges of leading a remote team. “I can’t always see my teammates’ faces when I deliver instructions, feedback, or design criticism. I can’t always tell how they feel. It’s difficult to know if someone is having a bad day or a bad week.”

Zapier’s Foster is also well aware of these challenges in interpersonal dynamics. In fact, he has written a 200-page manifesto on how to run remote teams, where he has an entire section devoted to coaching teammates on how to meet each other for the first time. “Because we’re wired to look for threats in any new situation… try to limit phone or video calls to 15 minutes.” Or “listen without interrupting or sharing your own stories.” And to “ask short, open ended questions.” For anyone looking for a grade school refresher on how to make new friends, Wade Foster is the Dale Carnegie of the remote workforce.

To office, or not to office

What we learn from companies like Basecamp, Automattic, and Zapier is that closer proximity is not the antidote for office politics, and certainly not the quick fix for a healthy, productive culture.

Maintaining a healthy culture takes work, with deliberate processes and planning. Remote teams have to work harder to design and maintain these processes because they don’t have the luxury of assuming shared context through a physical workspace.

The result is a wealth of new ideas for a healthier, less political culture — being thoughtful about when to bring people together, and when to give people their time apart (ending the presence prison), or when to speak, and when to read and write (to democratize meetings). It seems that remote teams have largely succeeded in turning a bug into a feature. For any company still considering tearing down those office walls and doors, it’s time to pay attention to the lessons of the officeless.

Aug
17
2018
--

Percona Server for MySQL 5.6.41-84.1 Is Now Available


Percona announces the release of Percona Server for MySQL 5.6.41-84.1 on August 17, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.6.41, including all the bug fixes in it. Percona Server for MySQL 5.6.41-84.1 is now the current GA release in the 5.6 series. All of Percona’s software is open-source and free.

Bugs Fixed
  • A simple SELECT query on a table with CHARSET=euckr COLLATE=euckr_bin could return different results each time it was executed. Bug fixed #4513 (upstream 91091).
  • Percona Server 5.6.39-83.1 could crash when altering an InnoDB table that has a full-text search index defined. Bug fixed #3851 (upstream 68987).
Other Bugs Fixed
  • #3325 “online upgrade GTID cause binlog damage in high write QPS situation”
  • #3976 “Errors in MTR tests main.variables-big, main.information_schema-big, innodb.innodb_bug14676111”
  • #4506 “Backport fixes from 8.0 for InnoDB memcached Plugin”

Find the release notes for Percona Server for MySQL 5.6.41-84.1 in our online documentation. Report bugs in the Jira bug tracker.


Aug
17
2018
--

Percona Server for MySQL 5.5.61-38.13 Is Now Available


Percona announces the release of Percona Server for MySQL 5.5.61-38.13 on August 17, 2018 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.5.61, including all the bug fixes in it. Percona Server for MySQL 5.5.61-38.13 is now the current GA release in the 5.5 series. All of Percona’s software is open-source and free.

Bugs Fixed
  • The --innodb-optimize-keys option of the mysqldump utility fails when a column name is used as a prefix of a column which has the AUTO_INCREMENT attribute. Bug fixed #4524.
Other Bugs Fixed
  • #4566 “stack-use-after-scope in reinit_io_cache()” (upstream 91603)
  • #4581 “stack-use-after-scope in _db_enter_() / mysql_select_db()” (upstream 91604)
  • #4600 “stack-use-after-scope in _db_enter_() / get_upgrade_info_file_name()” (upstream 91617)
  • #3976 “Errors in MTR tests main.variables-big, main.information_schema-big, innodb.innodb_bug14676111”

Find the release notes for Percona Server for MySQL 5.5.61-38.13 in our online documentation. Report bugs in the Jira bug tracker.


Aug
17
2018
--

Incentivai launches to simulate how hackers break blockchains

Cryptocurrency projects can crash and burn if developers don’t predict how humans will abuse their blockchains. Once a decentralized digital economy is released into the wild and the coins start to fly, it’s tough to implement fixes to the smart contracts that govern them. That’s why Incentivai is coming out of stealth today with its artificial intelligence simulations that test not just for security holes, but for how greedy or illogical humans can crater a blockchain community. Crypto developers can use Incentivai’s service to fix their systems before they go live.

“There are many ways to check the code of a smart contract, but there’s no way to make sure the economy you’ve created works as expected,” says Incentivai’s solo founder Piotr Grudzień. “I came up with the idea to build a simulation with machine learning agents that behave like humans so you can look into the future and see what your system is likely to behave like.”

Incentivai will graduate from Y Combinator next week and already has a few customers. They can either pay Incentivai to audit their project and produce a report, or they can use the hosted AI simulation tool as a software-as-a-service. The first deployments of blockchains it’s checked will go out in a few months, and the startup has released some case studies to prove its worth.

“People do theoretical work or logic to prove that under certain conditions, this is the optimal strategy for the user. But users are not rational. There’s lots of unpredictable behavior that’s difficult to model,” Grudzień explains. Incentivai explores those illogical trading strategies so developers don’t have to tear out their hair trying to imagine them.

Protecting crypto from the human x-factor

There’s no rewind button in the blockchain world. The immutable and irreversible qualities of this decentralized technology prevent inventors from meddling with it once in use, for better or worse. If developers don’t foresee how users could make false claims and bribe others to approve them, or take other actions to screw over the system, they might not be able to thwart the attack. But given the right open-ended incentives (hence the startup’s name), AI agents will try everything they can to earn the most money, exposing the conceptual flaws in the project’s architecture.

“The strategy is the same as what DeepMind does with AlphaGo, testing different strategies,” Grudzień explains. He developed his AI chops earning a master’s at Cambridge before working on natural language processing research for Microsoft.

Here’s how Incentivai works. First a developer writes the smart contracts they want to test for a product like selling insurance on the blockchain. Incentivai tells its AI agents what to optimize for and lays out all the possible actions they could take. The agents can have different identities, like a hacker trying to grab as much money as they can, a faker filing false claims or a speculator that cares about maximizing coin price while ignoring its functionality.

Incentivai then tweaks these agents to make them more or less risk averse, or care more or less about whether they disrupt the blockchain system in its totality. The startup monitors the agents and pulls out insights about how to change the system.

For example, Incentivai might learn that uneven token distribution leads to pump and dump schemes, so the developer should more evenly divide tokens and give fewer to early users. Or it might find that an insurance product where users vote on what claims should be approved needs to increase its bond price that voters pay for verifying a false claim so that it’s not profitable for voters to take bribes from fraudsters.

Grudzień has made some predictions about his own startup too. He thinks that if the use of decentralized apps rises, there will be a lot of startups trying to copy his approach to security services. He says there are already some doing token engineering audits, incentive design and consultancy, but he hasn’t seen anyone else with a functional simulation product that’s produced case studies. “As the industry matures, I think we’ll see more and more complex economic systems that need this.”

Aug
17
2018
--

Replication from Percona Server for MySQL to PostgreSQL using pg_chameleon


Replication is one of the well-known features that allows us to build an identical copy of a database. It is supported in almost every RDBMS. The advantages of replication may be huge, especially HA (High Availability) and load balancing. But what if we need to build replication between 2 heterogeneous databases like MySQL and PostgreSQL? Can we continuously replicate changes from a MySQL database to a PostgreSQL database? The answer to this question is pg_chameleon.

For replicating continuous changes, pg_chameleon uses the mysql-replication library to pull the row images from MySQL, which are transformed into a jsonb object. A pl/pgsql function in postgres decodes the jsonb and replays the changes into the postgres database. In order to set up this type of replication, your mysql binlog_format must be “ROW”.

A few points you should know before setting up this tool (a quick way to verify these prerequisites is sketched after the list):

  1. Tables that need to be replicated must have a primary key.
  2. Works for PostgreSQL versions > 9.5 and MySQL > 5.5
  3. binlog_format must be ROW in order to set up this replication.
  4. Python version must be > 3.3
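Before going further, it can help to confirm these prerequisites on the source MySQL server, the target PostgreSQL server, and the host that will run pg_chameleon. A minimal check, assuming a root MySQL login and the postgres OS user for psql (adjust credentials and paths to your environment), might look like this:

$ mysql -uroot -p -e "SHOW VARIABLES LIKE 'binlog_format'"      # must be ROW
$ mysql -uroot -p -e "SHOW VARIABLES LIKE 'binlog_row_image'"   # FULL is recommended
$ mysql -uroot -p -e "SELECT VERSION()"                         # must be newer than 5.5
$ psql -c "SELECT version()"                                    # must be newer than 9.5
$ python3 --version                                             # must be newer than 3.3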

When you initialize the replication, pg_chameleon pulls the data from MySQL using the CSV format in slices, to prevent memory overload. This data is flushed to postgres using the COPY command. If COPY fails, it tries INSERT, which may be slow. If INSERT fails, then the row is discarded.

To replicate changes from mysql, pg_chameleon mimics the behavior of a mysql slave. It creates the schema in postgres, performs the initial data load, connects to the MySQL replication protocol, and stores the row images in a table in postgres. The respective functions in postgres then decode those rows and apply the changes. This is similar to storing relay logs in postgres tables and applying them to a postgres schema. You do not have to create a postgres schema using any DDLs. This tool automatically does that for the tables configured for replication. If you need to specifically convert any types, you can specify this in the configuration file.

The following is just an exercise that you can experiment with and implement if it completely satisfies your requirement. We performed these tests on CentOS Linux release 7.4.

Prepare the environment

Set up Percona Server for MySQL

Install Percona Server for MySQL 5.7 and add the appropriate parameters for replication.

In this exercise, I have installed Percona Server for MySQL 5.7 using YUM repo.

yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm
yum install Percona-Server-server-57
echo "mysql ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
usermod -s /bin/bash mysql
sudo su - mysql

pg_chameleon requires the following parameters to be set in your my.cnf file (the parameter file of your MySQL server). You may add the following parameters to /etc/my.cnf

binlog_format= ROW
binlog_row_image=FULL
log-bin = mysql-bin
server-id = 1

Now start your MySQL server after adding the above parameters to your my.cnf file.

$ service mysql start

Fetch the temporary root password from mysqld.log, and reset the root password using mysqladmin

$ grep "temporary" /var/log/mysqld.log
$ mysqladmin -u root -p password 'Secret123!'

Now, connect to your MySQL instance and create sample schema/tables. I have also created an emp table for validation.

$ wget http://downloads.mysql.com/docs/sakila-db.tar.gz
$ tar -xzf sakila-db.tar.gz
$ mysql -uroot -pSecret123! < sakila-db/sakila-schema.sql
$ mysql -uroot -pSecret123! < sakila-db/sakila-data.sql
$ mysql -uroot -pSecret123! sakila -e "create table emp (id int PRIMARY KEY, first_name varchar(20), last_name varchar(20))"

Create a user for configuring replication using pg_chameleon and give appropriate privileges to the user using the following steps.

$ mysql -uroot -p
create user 'usr_replica'@'%' identified by 'Secret123!';
GRANT ALL ON sakila.* TO 'usr_replica'@'%';
GRANT RELOAD, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'usr_replica'@'%';
FLUSH PRIVILEGES;

While creating the user in your mysql server (‘usr_replica’@’%’), you may wish to replace % with the appropriate IP or hostname of the server on which pg_chameleon is running.
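For example, if the pg_chameleon host were 192.168.56.20 (a placeholder address used here purely for illustration), the more restrictive version of the grants above would look like this:

create user 'usr_replica'@'192.168.56.20' identified by 'Secret123!';
GRANT ALL ON sakila.* TO 'usr_replica'@'192.168.56.20';
GRANT RELOAD, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'usr_replica'@'192.168.56.20';
FLUSH PRIVILEGES;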

Set up PostgreSQL

Install PostgreSQL and start the database instance.

You may use the following steps to install PostgreSQL 10.x

yum install https://yum.postgresql.org/10/redhat/rhel-7.4-x86_64/pgdg-centos10-10-2.noarch.rpm
yum install postgresql10*
su - postgres
$ /usr/pgsql-10/bin/initdb
$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start

As seen in the following log, create a user in PostgreSQL that pg_chameleon can use to write the changed data to PostgreSQL, and create the target database.

postgres=# CREATE USER usr_replica WITH ENCRYPTED PASSWORD 'secret';
CREATE ROLE
postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica;
CREATE DATABASE
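Depending on your authentication settings, you may also need to allow usr_replica to connect to db_replica over TCP. A minimal sketch of the pg_hba.conf entry, assuming pg_chameleon runs on the same host and connects via 127.0.0.1 (adjust the address and auth method to your environment):

# /var/lib/pgsql/10/data/pg_hba.conf (sketch)
host    db_replica    usr_replica    127.0.0.1/32    md5

$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data reload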

Steps to install and setup replication using pg_chameleon

Step 1: In this exercise, I installed Python 3.6 and pg_chameleon 2.0.8 using the following steps. You may skip the python install steps if you already have the desired python release. We can create a virtual environment if the OS does not include Python 3.x by default.

yum install gcc openssl-devel bzip2-devel wget
cd /usr/src
wget https://www.python.org/ftp/python/3.6.6/Python-3.6.6.tgz
tar xzf Python-3.6.6.tgz
cd Python-3.6.6
./configure --enable-optimizations
make altinstall
python3.6 -m venv venv
source venv/bin/activate
pip install pip --upgrade
pip install pg_chameleon

Step 2: This tool requires a configuration file to store the source/target server details, and a directory to store the logs. Use the following command to let pg_chameleon create the configuration file template and the respective directories for you.

$ chameleon set_configuration_files

The above command produces the following output, which shows that it created some directories and a sample configuration file under the home directory of the user running it.

creating directory /var/lib/pgsql/.pg_chameleon
creating directory /var/lib/pgsql/.pg_chameleon/configuration/
creating directory /var/lib/pgsql/.pg_chameleon/logs/
creating directory /var/lib/pgsql/.pg_chameleon/pid/
copying configuration example in /var/lib/pgsql/.pg_chameleon/configuration//config-example.yml

Copy the sample configuration file to another file, let’s say default.yml

$ cd .pg_chameleon/configuration/
$ cp config-example.yml default.yml

Here is how my default.yml file looks after adding all the required parameters. In this file, we can optionally specify data type conversions, tables to be skipped from replication, and the DML events that need to be skipped for a selected list of tables.

---
#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''
# type_override allows the user to override the default type conversion into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"
#postgres  destination connection
pg_conn:
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "secret"
  database: "db_replica"
  charset: "utf8"
sources:
  mysql:
    db_conn:
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      sakila: sch_sakila
    limit_tables:
#      - delphis_mediterranea.foo
    skip_tables:
#      - delphis_mediterranea.bar
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
    skip_events:
      insert:
#        - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo
      delete:
#        - delphis_mediterranea #skips deletes on schema delphis_mediterranea
      update:

Step 3: Initialize the replica using this command:

$ chameleon create_replica_schema --debug

The above command creates a schema and nine tables in the PostgreSQL database that you specified in the .pg_chameleon/configuration/default.yml file. These tables are needed to manage replication from source to destination. The same can be observed in the following log.

db_replica=# \dn
  List of schemas
     Name      |    Owner
---------------+-------------
 public        | postgres
 sch_chameleon | target_user
(2 rows)
db_replica=# \dt sch_chameleon.t_*
               List of relations
    Schema     |       Name       | Type  |    Owner
---------------+------------------+-------+-------------
 sch_chameleon | t_batch_events   | table | target_user
 sch_chameleon | t_discarded_rows | table | target_user
 sch_chameleon | t_error_log      | table | target_user
 sch_chameleon | t_last_received  | table | target_user
 sch_chameleon | t_last_replayed  | table | target_user
 sch_chameleon | t_log_replica    | table | target_user
 sch_chameleon | t_replica_batch  | table | target_user
 sch_chameleon | t_replica_tables | table | target_user
 sch_chameleon | t_sources        | table | target_user
(9 rows)

Step 4: Add the source details to pg_chameleon using the following command. Provide the name of the source as specified in the configuration file. In this example, the source name is mysql and the target is the postgres database defined under pg_conn.

$ chameleon add_source --config default --source mysql --debug

Once you run the above command, you should see that the source details are added to the t_sources table.

db_replica=# select * from sch_chameleon.t_sources;
-[ RECORD 1 ]-------+----------------------------------------------
i_id_source | 1
t_source | mysql
jsb_schema_mappings | {"sakila": "sch_sakila"}
enm_status | ready
t_binlog_name |
i_binlog_position |
b_consistent | t
b_paused | f
b_maintenance | f
ts_last_maintenance |
enm_source_type | mysql
v_log_table | {t_log_replica_mysql_1,t_log_replica_mysql_2}
$ chameleon show_status --config default
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql ready Yes N/A N/A

Step 5: Initialize the replica/slave using the following command. Specify the source from which you are replicating the changes to the PostgreSQL database.

$ chameleon init_replica --config default --source mysql --debug

Initialization involves the following tasks on the MySQL server (source).

1. Flush the tables with read lock
2. Get the master’s coordinates
3. Copy the data
4. Release the locks

The above command creates the target schema in your postgres database automatically.
In the default.yml file, we mentioned the following schema_mappings.

schema_mappings:
sakila: sch_sakila

So it has now created the new schema sch_sakila in the target database db_replica.

db_replica=# \dn
  List of schemas
     Name      |    Owner
---------------+-------------
 public        | postgres
 sch_chameleon | usr_replica
 sch_sakila    | usr_replica
(3 rows)

Step 6: Now, start replication using the following command.

$ chameleon start_replica --config default --source mysql

Step 7: Check replication status and any errors using the following commands.

$ chameleon show_status --config default
$ chameleon show_errors

This is how the status looks:

$ chameleon show_status --source mysql
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql running No N/A N/A
== Schema mappings ==
Origin schema Destination schema
--------------- --------------------
sakila sch_sakila
== Replica status ==
--------------------- ---
Tables not replicated 0
Tables replicated 17
All tables 17
Last maintenance N/A
Next maintenance N/A
Replayed rows
Replayed DDL
Skipped rows

Now, you should see that the changes are continuously getting replicated from MySQL to PostgreSQL.

Step 8: To validate, insert a record into the MySQL table that we created for this purpose, and check that it is replicated to postgres.

$ mysql -u root -pSecret123! -e "INSERT INTO sakila.emp VALUES (1,'avinash','vallarapu')"
mysql: [Warning] Using a password on the command line interface can be insecure.
$ psql -d db_replica -c "select * from sch_sakila.emp"
 id | first_name | last_name
----+------------+-----------
  1 | avinash    | vallarapu
(1 row)

In the above log, we see that the record that was inserted to the MySQL table was replicated to the PostgreSQL table.

You may also add multiple sources for replication to PostgreSQL (target).
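A minimal sketch of what this could look like in default.yml is below, using a hypothetical second source named mysql_finance (the host, schema names and server id are placeholders; each source is assumed to need its own my_server_id and its own add_source/init_replica/start_replica run):

sources:
  mysql:
    # ... the source shown earlier ...
  mysql_finance:               # hypothetical second source
    db_conn:
      host: "192.168.56.30"    # placeholder address
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      finance: sch_finance
    my_server_id: 101          # assumed to need a value distinct from the first source
    type: mysql
    # ... plus the remaining per-source settings used for the first source (batch sizes, copy_mode, out_dir, and so on)

$ chameleon add_source --config default --source mysql_finance --debug
$ chameleon init_replica --config default --source mysql_finance --debug
$ chameleon start_replica --config default --source mysql_finance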

Reference: http://www.pgchameleon.org/documents/

Please refer to the above documentation to find out about the many more options that are available with pg_chameleon.


Aug
17
2018
--

Klarity uses AI to strip drudgery from contract review

Klarity, a member of the Y Combinator 2018 Summer class, wants to automate much of the contract review process by applying artificial intelligence, specifically natural language processing.

Company co-founder and CEO Andrew Antos has experienced the pain of contract reviews first hand. After graduating from Harvard Law, he landed a job spending 16 hours a day reviewing contract language, a process he called mind-numbing. He figured there had to be a way to put technology to bear on the problem and Klarity was born.

“A lot of companies are employing internal or external lawyers because their customers, vendors or suppliers are sending them a contract to sign,” Antos explained. “They have to get somebody to read it, understand it and figure out whether it’s something that they can sign or if it requires specific changes.”

You may think that this kind of work would be difficult to automate, but Antos said that  contracts have fairly standard language and most companies use ‘playbooks.’ “Think of the playbook as a checklist for NDAs, sales agreements and vendor agreements — what they are looking for and specific preferences on what they agree to or what needs to be changed,” Antos explained.

Klarity is a subscription cloud service that checks contracts in Microsoft Word documents using NLP. It makes suggestions when it sees something that doesn’t match up with the playbook checklist. The product then generates a document, and a human lawyer reviews and signs off on the suggested changes, reducing the review time from an hour or more to 10 or 15 minutes.

Screenshot: Klarity

They launched the first iteration of the product last year and have 14 companies using it, with four paying customers so far, including one of the world’s largest private equity funds. These companies signed on because they have to process huge numbers of contracts. Klarity is helping them save time and money, while applying their preferences in a consistent fashion, something that a human reviewer can have trouble doing.

He acknowledges the solution could be taking away work from human lawyers, something they think about quite a bit. Ultimately though, they believe that contract review is so tedious that automating it frees up lawyers for work that requires a greater level of intellectual rigor and creativity.

Antos met his co-founder and CTO, Nischal Nadhamuni, at an MIT entrepreneurship class in 2016 and the two became fast friends. In fact, he says that they pretty much decided to start a company the first day. “We spent 3 hours walking around Cambridge and decided to work together to solve this real problem people are having.”

They applied to Y Combinator two other times before being accepted in this summer’s cohort. The third time was the charm. He says the primary value of being in YC is the community and friendships they have formed and the help they have had in refining their approach.

“It’s like having a constant mirror that helps you realize any mistakes or any suboptimal things in your business on a high speed basis,” he said.

Aug
17
2018
--

This Week in Data with Colin Charles 49: MongoDB Conference Opportunities and Serverless Aurora MySQL


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Beyond the MongoDB content that will be at Percona Live Europe 2018, there is also a bit of an agenda for MongoDB Europe 2018, happening on November 8 in London—a day after Percona Live in Frankfurt. I expect you’ll see a diverse set of MongoDB content at Percona Live.

The Percona Live Europe Call for Papers closes TODAY! (Friday August 17, 2018)

From Amazon, there have been some good MySQL changes. You now have access to time-delayed replication as a strategy for your high availability and disaster recovery. This works with versions 5.7.22, 5.6.40 and later. It is worth noting that this isn’t documented as working for MariaDB (yet?). It arrived in MariaDB Server in 10.2.3.
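For reference, on a stock MySQL replica the equivalent feature is configured with MASTER_DELAY; a minimal sketch (note that RDS and Aurora expose this through Amazon’s own configuration interfaces rather than a direct CHANGE MASTER statement):

STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;  -- apply replicated events one hour behind the master
START SLAVE;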

Another MySQL change from Amazon? Aurora Serverless MySQL is now generally available. You can build and run applications without thinking about instances: previously, serverless offerings were not all that focused on the database tier. This on-demand, auto-scaling serverless Aurora should be fun to use. Only Aurora MySQL 5.6 is supported at the moment, and be aware that it is not yet available in all regions (e.g. Singapore).

Releases

  • pgmetrics is described as an open-source, zero-dependency, single-binary tool that can collect a lot of information and statistics from a running PostgreSQL server and display it in easy-to-read text format or export it as JSON for scripting.
  • PostgreSQL 10.5, 9.6.10, 9.5.14, 9.4.19, 9.3.24, and 11 Beta 3 have been released; the two fixed security vulnerabilities may inspire an upgrade.

Link List

Industry Updates

  • Martin Arrieta (LinkedIn) is now a Site Reliability Engineer at Fastly. Formerly of Pythian and Percona.
  • Ivan Zoratti (LinkedIn) is now Director of Product Management at Neo4j. He was previously on founding teams, was the CTO of MariaDB Corporation (then SkySQL), and is a long time MySQL veteran.

Upcoming Appearances

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

 


Aug
16
2018
--

Work-Bench enterprise report predicts end of SaaS could be coming

Work-Bench, a New York City venture capital firm that spends a lot of time around Fortune 1000 companies, has put together The Work-Bench Enterprise Almanac: 2018 Edition, which you could think of as a State of the Enterprise report. It’s somewhat like Mary Meeker’s Internet Trends report, but with a focus on the tools and technologies that will be having a major impact on the enterprise in the coming year.

Perhaps the biggest take-away from the report could be that the end of SaaS as we’ve known it could be coming, if modern tools make it easier for companies to build software themselves. More on this later.

While the report writers state that their findings are based at least partly on anecdotal evidence, it is clearly an educated set of observations and predictions related to the company’s work with enterprise startups and the large companies they tend to target.

As they wrote in their Medium post launching the report, “Our primary aim is to help founders see the forest from the trees. For Fortune 1000 executives and other players in the ecosystem, it will help cut through the noise and marketing hype to see what really matters.” Whether that’s the case will be in the eye of the reader, but it’s a comprehensive attempt to document the state of the enterprise as they see it, and there are not too many who have done that.

The big picture

The report points out the broader landscape in which enterprise companies — startups and established players alike — are operating today. You have traditional tech companies like Cisco and HP, the mega cloud companies like Amazon, Microsoft and Google, the Growth Guard with companies like Snowflake, DataDog and Sumo Logic and the New Guard, those early stage enterprise companies gunning for the more established players.

 

As the report states, the mega cloud players are having a huge impact on the industry by providing the infrastructure services for startups to launch and grow without worrying about building their own data centers or scaling to meet increasing demand as a company develops.

The mega clouders also scoop up a fair number of startups. Yet they don’t devote quite as much to M&A as you might think, based on how acquisitive the likes of Salesforce, Microsoft and Oracle have tended to be over the years. In fact, in spite of all the action and multi-billion dollar deals we’ve seen, Work-Bench sees room for even more.

It’s worth pointing out that Work-Bench predicts Salesforce itself could become a target for mega cloud M&A action. They are predicting that either Amazon or Microsoft could buy the CRM giant. We saw such speculation several years ago, and it turned out that Salesforce was too rich for even these companies’ blood. While they may have more cash to spend, the price has probably only gone up as Salesforce acquires more and more companies and its revenue has surpassed $10 billion.

About those mega trends

The report dives into 4 main areas of coverage, none of which are likely to surprise you if you read about the enterprise regularly in this or other publications:

  • Machine Learning
  • Cloud
  • Security
  • SaaS

All of these are really interconnected: SaaS is part of the cloud, all of them need security, and all will be (if they aren’t already) taking advantage of machine learning. Work-Bench doesn’t see it in such simple terms, of course, diving into each area in detail.

The biggest take-away is perhaps that infrastructure could end up devouring SaaS in the long run. Software as a Service grew out of a couple of earlier trends, the first being the rise of the Web as a way to deliver software, then the rise of mobile to move it beyond the desktop. The cloud-mobile connection is well documented and allowed companies like Uber and Airbnb, as just a couple of examples, to flourish by providing scalable infrastructure and a computer in our pockets to access their services whenever we needed them. These companies could never have existed without the combination of cloud-based infrastructure and mobile devices.

End of SaaS dominance?

But today, Work-Bench is saying that we are seeing some other trends that could be tipping the scales back to infrastructure. That includes containers and microservices, serverless, Database as a Service and React for building front ends. Work-Bench argues that if every company is truly a software company, these tools could make it easier for companies to build these kind of services cheaply and easily, and possibly bypass the SaaS vendors.

What’s more, they suggest that if these companies are doing mass customization of these services, then it might make more sense to build instead of buy, at least on one level. In the past, we have seen what happens when companies tried to take these kinds of massive software projects on themselves, and it hardly ever ended well. They were usually bulky, difficult to update and put the companies behind the curve competitively. Whether simplifying the entire developer tool kit would change that remains to be seen.

They don’t necessarily see companies running wholesale away from SaaS just yet to do this, but they do wonder if developers could push this trend inside of organizations as more tools appear on the landscape to make it easier to build your own.

The remainder of the report goes in depth into each of these trends, and this article has just scratched the surface of the information you’ll find there. The entire report is embedded below.

Aug
16
2018
--

Cisco’s $2.35 billion Duo acquisition front and center at earnings call

When Cisco bought Ann Arbor, Michigan-based security company Duo for a whopping $2.35 billion earlier this month, it showed the growing value of security and security startups in the eyes of traditional tech companies like Cisco.

In yesterday’s earnings report, even before the ink had dried on the Duo acquisition contract, Cisco was reporting that its security business grew 12 percent year over year to $627 million. Given those numbers, the acquisition was top of mind in CEO Chuck Robbins’ comments to analysts.

“We recently announced our intent to acquire Duo Security to extend our intent-based networking portfolio into multi-cloud environments. Duo’s SaaS delivered solution will expand our cloud security capabilities to help enable any user on any device to securely connect to any application on any network,” he told analysts.

Indeed, security is going to continue to take center stage moving forward. “Security continues to be our customers number one concern and it is a top priority for us. Our strategy is to simplify and increase security efficacy through an architectural approach with products that work together and share analytics and actionable threat intelligence,” Robbins said.

That fits neatly with the Duo acquisition, whose guiding philosophy has been to simplify security. It is perhaps best known for its two-factor authentication tool. Often companies send a text with a code number to your phone after you change a password to prove it’s you, but even that method has proven vulnerable to attack.

What Duo does is send a message through its app to your phone asking if you are trying to sign on. You can approve if it’s you or deny if it’s not, and if you can’t get the message for some reason you can call instead to get approval. It can also verify the health of the app before granting access to a user. It’s a fairly painless and secure way to implement two-factor authentication, while making sure employees keep their software up-to-date.

Duo Approve/Deny tool in action on smartphone.

While Cisco’s security revenue accounted for a fraction of the company’s overall $12.8 billion for the quarter, the company clearly sees security as an area that could continue to grow.

Cisco hasn’t been shy about using its substantial cash holdings to expand in areas like security beyond pure networking hardware to provide a more diverse recurring revenue stream. The company currently has over $54 billion in cash on hand, according to Y Charts.

Cisco spent a fair amount of money on Duo, which according to reports has $100 million in annual recurring revenue, a number that is expected to continue to grow substantially. It had raised over $121 million in venture investment since inception. In its last funding round in September 2017, the company raised $70 million at a valuation of $1.19 billion.

The acquisition price ended up more than doubling that valuation. That could be because it’s a security company with recurring revenue, and Cisco clearly wanted it badly as another piece in its security solutions portfolio, one it hopes can help keep pushing that security revenue needle ever higher.

Aug
16
2018
--

MongoDB: how to use the JSON Schema Validator


The flexibility of MongoDB as a schemaless database is one of its strengths. In early versions, it was left to application developers to ensure that any necessary data validation is implemented. With the introduction of JSON Schema Validator there are new techniques to enforce data integrity for MongoDB. In this article, we use examples to show you how to use the JSON Schema Validator to introduce validation checks at the database level—and consider the pros and cons of doing so.

Why validate?

MongoDB is a schemaless database. This means that we don’t have to define a fixed schema for a collection. We just need to insert a JSON document into a collection and that’s all. Documents in the same collection can have a completely different set of fields, and even the same fields can have different types on different documents. The same object can be a string in some documents and can be a number in other documents.

The schemaless feature has given MongoDB great flexibility and the capability to adapt the database to the changing needs of applications. Let’s say that this flexibility is one of the main reasons to use MongoDB. Relational databases are not so flexible: you always need to define a schema first. Then, when you need to add new columns, create new tables, or change the existing architecture to respond to the needs of the application, it can be a very hard task.

The real world can often be messy and MongoDB can really help, but in most cases the real world requires some kind of backbone architecture too. In real applications built on MongoDB there is always some kind of “fixed schema” or “validation rules” in collections and in documents. It’s possible to have in a collection two documents that represent two completely different things.

Well, it’s technically possible, but it doesn’t make sense in most cases for the application. Most of the arguments for enforcing a schema on the data are well known: schemas maintain structure, giving a clear idea of what’s going into the database, reducing preventable bugs and allowing for cleaner code. Schemas are a form of self-documenting code, as they describe exactly what type of data something should be, and they let you know what checks will be performed. It’s good to be flexible, but behind the scenes we need some strong regulations.

So, what we need to do is find a balance between flexibility and schema validation. In real world applications, we need to define a sort of “backbone schema” for our data while retaining the flexibility to manage specific particularities. In the past, developers implemented schema validation in their applications, but starting from version 3.6, MongoDB supports the JSON Schema Validator. We can rely on it to define a fixed schema and validation rules directly in the database and free the applications from having to take care of it.
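As a quick sanity check before trying the examples below, you can confirm that your deployment is actually running 3.6 or later and has the matching feature compatibility version (a sketch; the values returned will of course depend on your installation):

MongoDB > db.version()
MongoDB > db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )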

Let’s have a look at how it works.

JSON Schema Validator

In fact, a “Validation Schema” was already introduced in 3.2, but the new “JSON Schema Validator” introduced in the 3.6 release is by far the best and most user-friendly way to manage validation in MongoDB.

What we need to do is to define the rules using the operator $jsonSchema in the db.createCollection command. The $jsonSchema operator requires a JSON document where we specify all the rules to be applied on each inserted or updated document: for example what are the required fields, what type the fields must be, what are the ranges of the values, what pattern a specific field must have, and so on.

Let’s have a look at the following example where we create a collection people defining validation rules with JSON Schema Validator.

db.createCollection( "people" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "name", "surname", "email" ],
      properties: {
         name: {
            bsonType: "string",
            description: "required and must be a string" },
         surname: {
            bsonType: "string",
            description: "required and must be a string" },
         email: {
            bsonType: "string",
            pattern: "^.+\@.+$",
            description: "required and must be a valid email address" },
         year_of_birth: {
            bsonType: "int",
            minimum: 1900,
            maximum: 2018,
            description: "the value must be in the range 1900-2018" },
         gender: {
            enum: [ "M", "F" ],
            description: "can be only M or F" }
      }
   }
}})

Based on what we have defined, only 3 fields are strictly required in every document of the collection: name, surname, and email. In particular, the email field must match a specific pattern to be sure the content is a valid address. (Note: to properly validate an email address you need a more complex regular expression; here we use a simpler version just to check that there is an @ symbol.) The other fields are not required, but in case someone inserts them, we have defined a validation rule.
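If you want something a little stricter than “contains an @”, one option (still far from RFC-complete, and purely a sketch) is a pattern that also requires a dot in the domain part. You can test a candidate regular expression directly in the mongo shell before putting it into the validator; inside the validator the pattern is a string, so the backslashes would be doubled, i.e. "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$".

MongoDB > var stricter = /^[^@\s]+@[^@\s]+\.[^@\s]+$/
MongoDB > stricter.test("john.smith@gmail.com")
true
MongoDB > stricter.test("john.smith.gmail.com")
false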

Let’s try to do some example inserting documents to test if everything is working as expected.

Insert a document with one of the required fields missing:

MongoDB > db.people.insert( { name : "John", surname : "Smith" } )
    WriteResult({
      "nInserted" : 0,
      "writeError" : {
      "code" : 121,
      "errmsg" : "Document failed validation"
   }
})

Insert a document with all the required fields but with an invalid email address

MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith.gmail.com" } )
   WriteResult({
      "nInserted" : 0,
      "writeError" : {
      "code" : 121,
      "errmsg" : "Document failed validation"
   }
})

Finally, insert a valid document

MongoDB > db.people.insert( { name : "John", surname : "Smith", email : "john.smith@gmail.com" } )
WriteResult({ "nInserted" : 1 })

Let’s now try some more inserts that also include the other fields.

MongoDB > db.people.insert( { name : "Bruce", surname : "Dickinson", email : "bruce@gmail.com", year_of_birth : NumberInt(1958), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Corrado", surname : "Pandiani", email : "corrado.pandiani@percona.com", year_of_birth : NumberInt(1971), gender : "M" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Marie", surname : "Adamson", email : "marie@gmail.com", year_of_birth : NumberInt(1992), gender : "F" } )
WriteResult({ "nInserted" : 1 })

The records were inserted correctly because all the rules on the required fields, and on the other non-required fields, were satisfied. Let’s now see a case where the year_of_birth or gender fields are not correct.

MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(1980), gender : "X" } )
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", year_of_birth : NumberInt(1899), gender : "F" } )
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})

In the first insert, gender is X, but the only valid values are M or F. In the second insert, the year of birth is outside the permitted range.

Let’s try now to insert documents with arbitrary extra fields that are not in the JSON Schema Validator.

MongoDB > db.people.insert( { name : "Tom", surname : "Tom", email : "tom@gmail.com", year_of_birth : NumberInt(2000), gender : "M", shirt_size : "XL", preferred_band : "Coldplay" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.insert( { name : "Luise", surname : "Luise", email : "tom@gmail.com", gender : "F", shirt_size : "M", preferred_band : "Maroon Five" } )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people.find().pretty()
{
"_id" : ObjectId("5b6b12e0f213dc83a7f5b5e8"),
"name" : "John",
"surname" : "Smith",
"email" : "john.smith@gmail.com"
}
{
"_id" : ObjectId("5b6b130ff213dc83a7f5b5e9"),
"name" : "Bruce",
"surname" : "Dickinson",
"email" : "bruce@gmail.com",
"year_of_birth" : 1958,
"gender" : "M"
}
{
"_id" : ObjectId("5b6b1328f213dc83a7f5b5ea"),
"name" : "Corrado",
"surname" : "Pandiani",
"email" : "corrado.pandiani@percona.com",
"year_of_birth" : 1971,
"gender" : "M"
}
{
"_id" : ObjectId("5b6b1356f213dc83a7f5b5ed"),
"name" : "Marie",
"surname" : "Adamson",
"email" : "marie@gmail.com",
"year_of_birth" : 1992,
"gender" : "F"
}
{
"_id" : ObjectId("5b6b1455f213dc83a7f5b5f0"),
"name" : "Tom",
"surname" : "Tom",
"email" : "tom@gmail.com",
"year_of_birth" : 2000,
"gender" : "M",
"shirt_size" : "XL",
"preferred_band" : "Coldplay"
}
{
"_id" : ObjectId("5b6b1476f213dc83a7f5b5f1"),
"name" : "Luise",
"surname" : "Luise",
"email" : "tom@gmail.com",
"gender" : "F",
"shirt_size" : "M",
"preferred_band" : "Maroon Five"
}

As we can see, we have the flexibility to add new fields with no restrictions on the permitted values.

Having a really fixed schema

The behavior we have seen so far, which permits adding extra fields that are not in the validation rules, is the default. If we want to be more restrictive and have a truly fixed schema for the collection, we need to add the additionalProperties: false parameter to the createCollection command.

In the following example, we create a validator to permit only the required fields. No other extra fields are permitted.

db.createCollection( "people2" , {
   validator: {
     $jsonSchema: {
        bsonType: "object",
        additionalProperties: false,
        properties: {
           _id : {
              bsonType: "objectId" },
           name: {
              bsonType: "string",
              description: "required and must be a string" },
           age: {
              bsonType: "int",
              minimum: 0,
              maximum: 100,
              description: "required and must be in the range 0-100" }
        }
     }
}})

Note a couple of differences:

  • we did not specify a required list this time; note that additionalProperties: false only restricts documents to the listed fields, so to make a field mandatory you would still need to add it to a required array
  • we need to list even the _id field explicitly (see the sketch after the following test)

As you can see in the following test, we are no longer allowed to add extra fields: only the name and age fields (plus _id) are accepted.

MongoDB > db.people2.insert( {name : "George", age: NumberInt(30)} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people2.insert( {name : "Maria", age: NumberInt(35), surname: "Peterson"} )
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
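To see why the _id field has to be listed explicitly, here is a minimal sketch using a hypothetical people2b collection where it is omitted: even a document containing only name and age is rejected, because the automatically generated _id counts as an additional property.

db.createCollection( "people2b" , {
   validator: { $jsonSchema: {
      bsonType: "object",
      additionalProperties: false,
      properties: {
         name: { bsonType: "string" },
         age: { bsonType: "int" }
      }
   }
}})
MongoDB > db.people2b.insert( { name : "George", age: NumberInt(30) } )
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})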

In this case we give up the flexibility that is the main benefit of having a NoSQL database like MongoDB.

Well, it’s up to you to use it or not. It depends on the nature and goals of your application. I wouldn’t recommend it in most cases.

Add validation to existing collections

We have seen so far how to create a new collection with validation rules. But what about existing collections? How can we add rules to them?

This is quite trivial. The syntax to use for $jsonSchema remains the same; we just need to use the collMod command instead of createCollection. The following example shows how to create validation rules on an existing collection.

First we create a simple new collection people3, inserting some documents.

MongoDB > db.people3.insert( {name: "Corrado", surname: "Pandiani", year_of_birth: NumberLong(1971)} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( {name: "Tom", surname: "Cruise", year_of_birth: NumberLong(1961), gender: "M"} )
WriteResult({ "nInserted" : 1 })
MongoDB > db.people3.insert( {name: "Kevin", surname: "Bacon", year_of_birth: NumberLong(1964), gender: "M", shirt_size: "L"} )
WriteResult({ "nInserted" : 1 })

Let’s create the validator.

MongoDB > db.runCommand( { collMod: "people3",
   validator: {
      $jsonSchema : {
         bsonType: "object",
         required: [ "name", "surname", "gender" ],
         properties: {
            name: {
               bsonType: "string",
               description: "required and must be a string" },
            surname: {
               bsonType: "string",
               description: "required and must be a string" },
            gender: {
               enum: [ "M", "F" ],
               description: "required and must be M or F" }
         }
       }
},
validationLevel: "moderate",
validationAction: "warn"
})

The two new options validationLevel and validationAction are important in this case.

validationLevel can have the following values:

  • “off” : validation is not applied
  • “strict”: it’s the default value. Validation applies to all inserts and updates
  • “moderate”: validation applies to inserts, and to updates of existing documents that are already valid. Updates to existing invalid documents are not checked.

When creating validation rules on existing collections, the “moderate” value is the safest option.

validationAction can have the following values:

  • “error”: it’s the default value. The document must pass the validation in order to be written
  • “warn”: a document that doesn’t pass the validation is written but a warning message is logged

When adding validation rules to an existing collection, the safest option is “warn”.

These two options can also be applied with createCollection. We didn’t use them there because the default values are good in most cases.
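Once the existing documents have been cleaned up, a minimal sketch of tightening the collection back to the default behavior could be another collMod call that changes only these two options:

MongoDB > db.runCommand( { collMod: "people3", validationLevel: "strict", validationAction: "error" } )
{ "ok" : 1 }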

How to investigate a collection definition

In case we want to see how a collection was defined, and, in particular, what the validator rules are, the command db.getCollectionInfos() can be used. The following example shows how to investigate the “schema” we have created for the people collection.

MongoDB > db.getCollectionInfos( {name: "people"} )
[
  {
    "name" : "people",
    "type" : "collection",
    "options" : {
      "validator" : {
        "$jsonSchema" : {
          "bsonType" : "object",
          "required" : [
            "name",
            "surname",
            "email"
          ],
          "properties" : {
            "name" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "surname" : {
              "bsonType" : "string",
              "description" : "required and must be a string"
            },
            "email" : {
              "bsonType" : "string",
              "pattern" : "^.+@.+$",
              "description" : "required and must be a valid email address"
             },
             "year_of_birth" : {
               "bsonType" : "int",
               "minimum" : 1900,
               "maximum" : 2018,
               "description" : "the value must be in the range 1900-2018"
             },
             "gender" : {
               "enum" : [
                 "M",
                 "F"
               ],
             "description" : "can be only M or F"
        }
      }
    }
  }
},
"info" : {
  "readOnly" : false,
  "uuid" : UUID("5b98c6f0-2c9e-4c10-a3f8-6c1e7eafd2b4")
},
"idIndex" : {
  "v" : 2,
  "key" : {
    "_id" : 1
  },
"name" : "_id_",
"ns" : "test.people"
}
}
]

Limitations and restrictions

Validators cannot be defined for collections in the following databases: admin, local, config.

Validators cannot be defined for system.* collections.

A limitation in the current implementation of the JSON Schema Validator is that the error messages are not very good in terms of helping you understand which of the rules the document did not satisfy. The failing rule has to be identified manually by doing some tests, and that’s not so easy when dealing with complex documents. Having more specific error strings, hopefully taken from the validator definition, would be very useful when debugging application errors and warnings. This is definitely something that should be improved in the next releases.

While waiting for improvements, someone has developed a wrapper for the mongo client to gather more defined error strings. You can have a look at https://www.npmjs.com/package/mongo-schemer. You can test it and use it, but pay attention to the clause “Running in prod is not recommended due to the overhead of validating documents against the schema”.

Conclusions

Doing schema validation in the application remains, in general, a best practice, but JSON Schema Validator is a good tool to enforce validation directly into the database.

Even though it needs some improvements, the JSON Schema feature is good enough for most common cases. We suggest testing it and using it when you really need to create a backbone structure for your data.


