Oct 22, 2018

Pulumi raises $15M for its infrastructure as code platform

Pulumi, a Seattle-based startup that lets developers specify and manage their cloud infrastructure using the programming language they already know, today announced that it has raised a $15 million Series A funding round led by Madrona Venture Group. Tola Capital also participated in this round and Tola managing director Sheila Gulati will get a seat on the Pulumi board, where she’ll join former Microsoft exec and Madrona managing director S. Somasegar.

In addition to announcing its raise, the company also today launched its commercial platform, which builds upon Pulumi’s open-source work.

“Since launch, we’ve had a lot of inbound interest, both on the community side — so you’re seeing a lot of open source contributions, and they’re really impactful contributions, including, for example, community-led support for VMware and OpenStack,” Pulumi co-founder and CEO Eric Rudder told me. “So we’re actually seeing a lot of vibrancy in the open-source community. And at the same time, we have a lot of inbound interest on the commercial side of things. That is, teams wanting to operationalize Pulumi and put it into production and wanting to purchase the product.”

So to meet that opportunity, the team decided to raise a new round to scale out both its team and product. That product now includes a commercial offering of Pulumi, the company’s new ‘team edition.’ This commercial version includes support for unlimited users, integrations with third-party tools like GitHub and Slack, as well as role-based access controls, onboarding and 12×5 support. Like the free, single-user community edition, the team edition is delivered as a SaaS product and supports deployments to all of the major public and private cloud platforms.

“We’re all seeing the same things — the cloud is a foregone conclusion,” Tola’s Gulati told me when I asked her why she was investing in Pulumi. “Enterprises have a lot of complexity as they come over to the cloud. And so dealing with VMs, containers and serverless is a reality for these enterprises. And the ability to do that in a way that there’s a single toolset, letting developers use real programming languages, letting them exist where they have skills today, but then allows them to bring the best of cloud into their organization. Frankly, Pulumi really has thought through the existing complexity, the developer reality, the IT and developer relationship from both a runtime and deployment perspective. And they are the best that we’ve seen.”

Pulumi will, of course, continue to develop its open source tools, too. Indeed, the company noted that it would invest heavily in building out the community around its tools. The team told me that it is already seeing a lot of momentum, but with the new funding it’ll redouble its efforts.

With the new funding, the company will also work on making the onboarding process much easier, up to the point where it will become a full self-serve experience. But that doesn’t work for most large organizations, so Pulumi will also invest heavily in its pre- and post-sales organization. Right now, like most companies at this stage, the team is mostly composed of engineers.

Oct 19, 2018

PostgreSQL Q&A: Building an Enterprise-Grade PostgreSQL Setup Using Open Source Tools


Hello everyone, and thank you to those who attended our webinar on Building an Enterprise-grade PostgreSQL setup using open source tools last Wednesday. You’ll find the recording, as well as the slides we used during the presentation, here.

We had over forty questions during the webinar but were only able to tackle a handful during the time available, so most remained unanswered. We address the remaining ones below, and they have been grouped in categories for better organization. Thank you for sending them over! We have merged related questions and kept some of our answers concise, but please leave us a comment if you would like to see a particular point addressed further.

Backups

Q: In our experience, pg_basebackup with compression is slow due to single-thread gzip compression. How to speed up online compressed full backup?

Single-thread operation is indeed a limitation of pg_basebackup, and this is not limited to compression only. pgBackRest is an interesting alternative tool in this regard as it does have support for parallel processing.

Q: Usually one sets up database backups on the primary DB in an HA setup. Is it possible to automatically activate backups on the new primary DB after a Patroni failover (or with other HA solutions)?

Yes. This can be done transparently by pointing your backup system to the “master-role” port in the HAProxy instead – or to the “replica-role” port; in fact, it’s more common to use standby replicas as the backup source.

Q: Do backups and WAL backups work with third party backup managers like NetBackup for example?

Yes, as usual it depends on how good the vendor support is. NetBackup supports PostgreSQL, and so does Zmanda to mention another one.

Security and auditing

Q: Do you know a TDE solution for PostgreSQL? Can you talk a little bit about the encryption at rest solution for Postgres PCI/PII applications from Percona standpoint.

At this point PostgreSQL does not provide native Transparent Data Encryption (TDE) functionality, relying instead on the underlying file system for data-at-rest encryption. Encryption at the column level can be achieved through the pgcrypto module.
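
As an illustration, here is a minimal sketch of column-level encryption with pgcrypto; the table, column and passphrase are made up for the example:

    -- Hypothetical table: only the ciphertext is stored on disk
    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    CREATE TABLE customer (
        id      serial PRIMARY KEY,
        name    text,
        card_no bytea          -- encrypted with pgp_sym_encrypt()
    );

    INSERT INTO customer (name, card_no)
    VALUES ('Alice', pgp_sym_encrypt('4111-1111-1111-1111', 'a_strong_passphrase'));

    -- Only sessions that know the passphrase can read the plaintext back
    SELECT name, pgp_sym_decrypt(card_no, 'a_strong_passphrase') AS card_no
    FROM customer;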

Moreover, other PostgreSQL security features are relevant to PCI compliance as well.

Q: How can we prevent the superuser account from accessing raw data in Postgres? (…) The companies we encounter usually ask that even managed accounts cannot access the real data by any means.

It is fundamental to maintain a superuser account that is able to access any object in the database for maintenance activities. Having said that, currently it is not possible to deny a superuser direct access to the raw data found in tables. What you can do to protect sensitive data from superuser access is to have it stored encrypted. As mentioned above, pgcrypto offers the necessary functionality for achieving this.

Furthermore, avoiding connecting to the database as a superuser is a best practice. The set_user extension allows unprivileged users to escalate to superuser for maintenance tasks on demand while providing an additional layer of logging and control for better auditing. Also, as discussed in the webinar, it’s possible to implement segregation of users using roles and privileges. Remember, it’s best practice to grant a role, including application users, only the privileges essential to fulfilling its duties. Additionally, password authentication should be enforced for superusers.
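
For instance, a minimal sketch of a least-privilege role setup (role, database and schema names are illustrative):

    CREATE ROLE app_readonly NOLOGIN;
    GRANT CONNECT ON DATABASE appdb TO app_readonly;
    GRANT USAGE ON SCHEMA public TO app_readonly;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;

    -- The login role inherits only the privileges of app_readonly
    CREATE ROLE reporting_user LOGIN PASSWORD 'change_me' IN ROLE app_readonly;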

Q: How can you make audit logging in Postgres record DMLs while masking data content in these recorded SQLs?

To the best of our knowledge, there is currently no solution to apply query obfuscation to logs. Bind parameters are always included in both the audit and the logging of DMLs, and that is by design. If you would rather avoid logging bind parameters and only want to keep track of the statements executed, you can use the pg_stat_statements extension instead. Note that while pg_stat_statements provides overall statistics on the executed statements, it does not keep track of when each DML was executed.
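
As a rough sketch, this is how pg_stat_statements can be used to review normalized statements without their bind parameters (it must first be added to shared_preload_libraries, which requires a restart):

    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- Statements are normalized: constants and bind parameters appear as $1, $2, ...
    SELECT query, calls, total_time
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;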

Q: How to setup database audit logging effectively when utilizing pgbouncer or pgpool?

A key part of auditing is having separate user accounts in the database instead of a single, shared account. The connection to the database should be made with the appropriate user/application account. In PgBouncer we can define a separate pool for each user account. Every action made by a connection from that pool will then be audited against the corresponding user.

High Availability and replication

Q: Is there anything like Galera for PostgreSQL ?

The Galera replication library provides support for multi-master, active-active MySQL clusters based on synchronous replication, such as Percona XtraDB Cluster. PostgreSQL does support synchronous replication, but it is limited to a single active master.

There are, however, clustering solutions for PostgreSQL that address similar business requirements or problem domains such as scalability and high availability (HA). We have presented one of them, Patroni, in our webinar; it focuses on HA and read scaling. For write scaling, there have long been sharding-based solutions, including Citus, and PostgreSQL 10 (and now 11!) bring substantial new features in the partitioning area. Finally, PostgreSQL-based solutions like Greenplum and Amazon Redshift address scalability for analytical processing, while TimescaleDB has been conceived to handle large volumes of time series data.

Q: Pgpool can load balance – what is the benefit of HAProxy over Pgpool?

No doubt Pgpool is feature rich, including load balancing besides connection pooling, among other functionality. It could be used in place of HAProxy and PgBouncer, yes. But the feature set is just one of the criteria for selecting a solution. In our evaluation we gave more weight to lightweight, fast and scalable solutions. HAProxy is well known for its lightweight connection routing capability and does not consume much of the server’s resources.

Q: How to combine PgBouncer and Pgpool together so that one can achieve transaction pooling + load balancing? Can you let me know between the two scaling solutions which one is better, PgBouncer or Pgpool-II?

It depends, and must be analyzed on a case-by-case basis. If what we really need is just a connection pooler, PgBouncer will be our first choice because it is more lightweight than Pgpool. PgBouncer is thread-based, while Pgpool is process-based: like PostgreSQL, it forks its main process for each inbound connection, which is a relatively expensive operation. PgBouncer is more efficient on this front.

However, Pgpool’s relative heaviness comes with a lot of features, including the capability to manage PostgreSQL replication and the ability to parse statements fired against PostgreSQL and redirect them to certain cluster nodes for load balancing. Also, when your application cannot differentiate between read and write requests, Pgpool can parse the individual SQL statements and redirect them to the master, if it is a write, or to a standby replica, if it is a read, as configured in your Pgpool setup. The demo application we used in our webinar setup was able to distinguish reads from writes and use multiple connection strings accordingly, so we employed HAProxy on top of Patroni.

We have seen environments where Pgpool was used for its load balancing capabilities while connection pooling duties were left for PgBouncer, but this is not a great combination. As described above, HAProxy is more efficient than Pgpool as a load balancer.

Finally, as discussed in the webinar, any external connection pooler like Pgbouncer is required only if there is no proper application layer connection pooler, or if the application layer connection pooler is not doing a great job in maintaining a proper connection pool, resulting in frequent connections and disconnections.

Q: Is it possible for Postgres to have a built-in connection pool worker? Maybe merge Pgbouncer into postgres core? That would make it much easier to use advanced authentication mechanisms (e.g. LDAP).

A great thought. That would indeed be a better approach in many aspects than employing an external connection pooler like Pgbouncer. Recently there were discussions among PostgreSQL contributors on the related topic, as seen here. A few sample patches have been submitted by hackers but nothing has been accepted yet. The PostgreSQL community is very keen to keep the server code lightweight and stable.

Q: Is rebooting the standby the only way to change master in PostgreSQL?

A standby-to-master promotion does not involve any restart.

From the perspective of the user, a standby is promoted with the pg_ctl promote command or by creating a trigger file. During this operation, the replica stops the recovery-related processing and becomes a read-write database.

Once we have a new master, all the other standby servers need to start replicating from it. This involves changes to the recovery.conf parameters and, yes, a restart: the restart happens only on the standby side, when its current master has to be changed. PostgreSQL currently does not allow us to change this parameter with a SIGHUP.

Q: Are external connection pooling solutions (PgBouncer, Pgpool) compatible with Java Hibernate ORM ?

External connection poolers like PgBouncer and Pgpool are compatible with regular PostgreSQL connections, so connections from Hibernate ORM can treat PgBouncer as a regular PostgreSQL server running on a different port (or the same one, depending on how you configure it). An important point to remember is that they are complementary to connection pools that integrate well with ORM components; for example, c3p0 is a well-known connection pooler for Hibernate. If an ORM connection pooler can be tuned well enough to avoid frequent connections and disconnections, then external pooling solutions like PgBouncer or Pgpool become redundant and can/should be avoided.

Q: Question regarding connection pool: I want to understand if the connections are never closed or if there are any settings to force the closing of the connection after some time.

There is no need to close a connection if it can be reused (recycled) again and again instead of having a new one created. That is the very purpose of the connection pooler. When an application “closes” a connection, the connection pooler will virtually release the connection from the application and return it to the pool of connections. On the next connection request, instead of establishing a new connection to the database, the connection pooler will pick a connection from the pool and “lend” it to the application. Furthermore, most connection poolers include a parameter to control the release of connections after a specified idle time.

Q: Question regarding Patroni: can we select in the settings to not failover automatically and only used Patroni for manual failover/failback?

Yes, Patroni allows users to pause its automation, leaving them to manually trigger operations such as failover. The actual procedure for achieving this would make an interesting blog post (we’ve put it on our to-do list).

Q: Where should we install PgBouncer, Patroni and HAProxy to fulfill the three-layer format: web frontends, app backends and DB servers? What about etcd?

Patroni and etcd must be installed on the database servers. In fact, etcd can run on other servers as well, because the set of etcd instances simply forms the distributed consensus store. HAProxy and PgBouncer can be installed on the application servers for simplicity, or optionally they can run on dedicated servers, especially when you run a large number of them. Having said that, HAProxy is very lightweight and can be maintained on each application server without added impact. If you want to install PgBouncer on dedicated servers, just make sure to avoid a SPOF (single point of failure) by employing active-passive servers.

Q: How does HAProxy in your demo setup know how to route DML appropriately to the master and slaves (e.g. writes always go to the master and reads are load balanced between the replicas)?

HAProxy does not parse SQL statements in the intermediate layer in order to redirect them to the master or to one of the replicas accordingly—this must be done at the application level. In order to benefit from this traffic distribution, your application needs to send write requests to the appropriate HAproxy port; the same with read requests. In our demo setup, the application connected to two different ports, one for reads and another for writes (DML).

Q: How often does the cluster poll each node/slave? Is it tunable for poor performing networks?

Patroni uses an underlying distributed consensus mechanism for all heartbeat checks. For example, etcd, which can be used for this, has a default heartbeat interval of 100 ms, but it is adjustable. Apart from this, in every layer of the stack there are tunable TCP-like timeouts. For connection routing, HAProxy polls by making use of the Patroni API, which also allows further control over how the checks are done. Having said that, please keep in mind that poorly performing networks are often a bad choice for distributed services, with problems spanning beyond timeout checks.

Miscellaneous

Q: Hi Avinash/Nando/Jobin, maybe I wasn’t able to catch up with the DDL part, but what’s the best way to handle DDLs? In MySQL, we can use pt-online-schema-change and avoid large replica lag. Is there a way to achieve the same in PostgreSQL without blocking/downtime, or does Percona have an equivalent tool for PostgreSQL? Looking forward to this!

Currently, PostgreSQL locks tables for DDLs. Some DDLs, such as creating triggers and indexes, may not block every activity on the table. There isn’t a tool like pt-online-schema-change for PostgreSQL yet. There is, however, an extension called pg_repack, which assists in rebuilding a table online. Additionally, adding the keyword CONCURRENTLY to a CREATE INDEX statement makes it gentle on the system and allows concurrent DMLs and queries to proceed while the index is being built. Suppose you want to rebuild the index behind a primary key or unique key: a new index can be built independently and then swapped in behind the key with only a momentary lock, which may be effectively seamless.
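
To make that last point concrete, here is a sketch of rebuilding the index behind a primary key with CONCURRENTLY; the table and index names are purely illustrative:

    -- Build the replacement index without blocking concurrent DML
    CREATE UNIQUE INDEX CONCURRENTLY orders_pkey_new ON orders (id);

    -- Swap the constraint over to the new index; only a brief lock is taken here
    BEGIN;
    ALTER TABLE orders DROP CONSTRAINT orders_pkey;
    ALTER TABLE orders ADD CONSTRAINT orders_pkey PRIMARY KEY USING INDEX orders_pkey_new;
    COMMIT;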

A lot of new features are added in this space with each new release. One of the extreme cases of extended locking is adding a NOT NULL column with a DEFAULT value to a table. In most database systems this operation holds a write lock on the table until it completes. Just released, PostgreSQL 11 makes it a brief operation irrespective of the size of the table: it is now achieved with a simple metadata change rather than a complete table rewrite. Since there is no table rewrite, excessive I/O and other side effects like replication lag are avoided. As PostgreSQL continues to get better at handling DDLs, the scope for external tools is shrinking.

Q: What are the actions that can be performed by the parallelization option in PostgreSQL ?

This is an area where PostgreSQL has improved significantly in the last few versions. The answer, then, depends on which version you are using. Parallelization was introduced in PostgreSQL 9.6, with more capabilities added in version 10. As of version 11, pretty much everything can make use of parallelization, including index building. The more CPU cores your server has at its disposal, the more you will benefit from the latest versions of PostgreSQL, provided it is properly tuned for parallel execution.

Q: Is there any flashback query or flashback database option in PostgreSQL?

If flashback queries are an application requirement, please consider using temporal tables to better visualize data from a specific time or period. If the application is handling time series data (like IoT devices), then TimescaleDB may be an interesting option for you.

Flashback of the database can be achieved in multiple ways, either with the help of backup tools (and point-in-time recovery) or using a delayed standby replica.

Q: Question regarding pg_repack: we have attempted running pg_repack and for some reason it kept running forever; can we simply cancel/abort its execution ?

Yes, the execution of pg_repack can be aborted without prejudice. This is safe to do because the tool creates an auxiliary table and uses it to rearrange the data, swapping it with the original table at the end of the process. If its execution is interrupted before it completes, the swapping of tables just doesn’t take place. However, since it works online and doesn’t hold an exclusive lock on the target table, depending on its size and the changes made on the target table during the process, it might take considerable time to complete. Please explore the parallel feature available with pg_repack.

Q: Will the monitoring tool from Percona be open source ?

Percona Monitoring and Management (PMM) has been released already as an open source project with its source code being available at GitHub.

Q: It’s unfortunate that the master/slave terminology is still used on the slides. Why not use leader/follower or orchestrator node/node instead?

We agree with you, particularly regarding the reference to “slave” – “replica” is a more generally accepted term (for good reason), with “standby” [server|replica] being more commonly used with PostgreSQL.

Patroni usually employs the terms “leader” and “followers”.

The use of “cluster” (and thus “node”) in PostgreSQL, however, contrasts with what is usually the norm (when we think about traditional Beowulf clusters, or even Galera and Patroni), as it denotes the set of databases running on a single PostgreSQL instance/server.

Oct 18, 2018

ProxySQL 1.4.11 and Updated proxysql-admin Tool Now in the Percona Repository


ProxySQL 1.4.11, released by ProxySQL, is now available for download in the Percona Repository along with an updated version of Percona’s proxysql-admin tool.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. René Cannaò created ProxySQL for DBAs as a means of solving complex replication topology issues.

The ProxySQL 1.4.11 source and binary packages available at https://percona.com/downloads/proxysql include ProxySQL Admin, a tool developed by Percona to configure Percona XtraDB Cluster nodes for use with ProxySQL. Docker images for release 1.4.11 are available as well: https://hub.docker.com/r/percona/proxysql/. You can download the original ProxySQL from https://github.com/sysown/proxysql/releases. The documentation is hosted on GitHub in wiki format.

Improvements

  • mysql_query_rules_fast_routing is enabled in ProxySQL Cluster. For more information, see #1674 at GitHub.
  • In this release, the rpmdb checksum error is ignored when building ProxySQL in Docker.
  • By default, the permissions for proxysql.cnf are set to 600 (only the owner of the file can read it or make changes to it).

Bugs Fixed

  • Fixed a bug that could cause ProxySQL to crash if IPv6 listening was enabled. For more information, see #1646 at GitHub.

ProxySQL is available under the open source GPLv3 license.

Oct 18, 2018

Twilio launches a new SIM card and narrowband dev kit for IoT developers

Twilio is hosting its Signal developer conference in San Francisco this week. Yesterday was all about bots and taking payments over the phone; today is all about IoT. The company is launching two new (but related) products today that will make it easier for IoT developers to connect their devices. The first is the Global Super SIM that offers global connectivity management through the networks of Twilio’s partners. The second is Twilio Narrowband, which, in cooperation with T-Mobile, offers a full software and hardware kit for building low-bandwidth IoT solutions and the narrowband network to connect them.

Twilio also announced that it is expanding its wireless network partnerships with the addition of Singtel, Telefonica and Three Group. Unsurprisingly, those are also the partners that make the company’s Super SIM project possible.

The Super SIM, which is currently in private preview and will launch in public beta in the spring of 2019, provides developers with a global network that lets them deploy and manage their IoT devices anywhere (assuming there is a cell connection or other internet connectivity, of course). The Super SIM gives developers the ability to choose the network they want to use or to let Twilio pick the defaults based on the local networks.

Twilio Narrowband is a slightly different solution. Its focus right now is on the U.S., where T-Mobile rolled out its Narrowband IoT network earlier this year. As the name implies, this is about connecting low-bandwidth devices that only need to send out small data packets like timestamps, GPS coordinates or status updates. Twilio Narrowband sits on top of this, using Twilio’s Programmable Wireless and SIM card. It then adds an IoT developer kit with an Arduino-based development board and the standard Grove sensors on top of that, as well as a T-Mobile-certified hardware module for connecting to the narrowband network. To program that all, Twilio is launching an SDK for handling network registrations and optimizing the communication between the devices and the cloud.

The narrowband service will launch as a beta in early 2019 and offer three pricing plans: a developer plan for $2/month, an annual production plan for $10/year or $5/year at scale, and a five-year plan for $8/year or $4/year at scale.

Oct 18, 2018

Percona Statement on MongoDB Community Server License Change


MongoDB, Inc. announced it has elected to change the license for MongoDB Community Server from AGPLv3 to a new license type it has created, called the “Server Side Public License (SSPL),” citing the need for a license better suited to the age of Software-as-a-Service.

First, it is important to state that MongoDB, Inc. is fully within its rights as a software copyright holder to change the license of MongoDB Community Server to a license which better reflects its business interests.

In our opinion, however, announcing the license and making the change effective immediately is not respectful to users of MongoDB Community Server. For many organizations, while AGPL may be an approved software license, the SSPL is not, and their respective internal review processes may take weeks. During this time users can’t get access, even to patch versions of old major releases, which might be required to ensure security in their environment, among other potential issues.

This issue is compounded by the fact that the SSPL has only recently been submitted for evaluation by the Open Source Initiative (OSI), and it is not yet clear whether it will be considered an open source license.

We believe it would have been much better for the MongoDB community and the open source community at large if MongoDB, Inc. had chosen to release the SSPL and announce the move to this license with some future effective date, allowing for a more orderly transition.

This is a developing situation, and I’m sure over the next few days and weeks we will both hear from the OSI with its decision and see further clarification on many points of the SSPL in the FAQ, and possibly in the license itself. At Percona we’re watching this situation closely and will provide additional updates regarding potential impacts to our community and customers.

At this point we can state the following:

  • Percona will continue to support the latest AGPL versions of MongoDB Community Server and Percona Server for MongoDB until more clarity in regards to SSPL is available, giving companies time to complete their assessment of whether moving to the SSPL software version is feasible for them.
  • Because Percona Server for MongoDB is based on MongoDB Community Server, we anticipate that its license will change to the SSPL when we move to the SSPL codebase released by MongoDB, Inc.
  • We believe this change does not impact other Percona software which interfaces with MongoDB, such as Percona Toolkit and Percona Monitoring and Management. At this point, we do not anticipate a license change for this software.
  • This license change does not impact Percona support customers, who will receive the same level of comprehensive, responsive, and cost-effective support as before. We encourage customers to evaluate the impact of this license change for their own software.

Oct 18, 2018

Atlassian launches the new Jira Software Cloud

Atlassian previewed the next generation of its hosted Jira Software project tracking tool earlier this year. Today, it’s available to all Jira users. To build the new Jira, Atlassian both redesigned the back-end stack and rethought the user experience from the ground up. That’s not an easy change, given how important Jira has become for virtually every company that develops software — and given that it is Atlassian’s flagship product. And with this launch, Atlassian is now essentially splitting the hosted version of Jira (which runs on AWS) from the self-hosted server version and prioritizing different features for each.

So the new version of Jira that’s launching to all users today doesn’t just have a new, cleaner look, but more importantly, new functionality that allows for a more flexible workflow that’s less dependent on admins and gives more autonomy to teams (assuming the admins don’t turn those features off).

Because changes to such a popular tool are always going to upset at least some users, it’s worth noting at the outset that the old classic view isn’t going away. “It’s important to note that the next-gen experience will not replace our classic experience, which millions of users are happily using,” Jake Brereton, head of marketing for Jira Software Cloud, told me. “The next-gen experience and the associated project type will be available in addition to the classic projects that users have always had access to. We have no plans to remove or sunset any of the classic functionality in Jira Cloud.”

The core tenet of the redesign is that software development in 2018 is very different from the way developers worked in 2002, when Jira first launched. Interestingly enough, the acquisition of Trello also helped guide the overall design of the new Jira.

“One of the key things that guided our strategy is really bringing the simplicity of Trello and the power of Jira together,” Sean Regan, Atlassian’s head of growth for Software Teams, told me. “One of the reasons for that is that modern software development teams aren’t just developers down the hall taking requirements. In the best companies, they’re embedded with the business, where you have analysts, marketing, designers, product developers, product managers — all working together as a squad or a triad. So JIRA, it has to be simple enough for those teams to function but it has to be powerful enough to run a complex software development process.”

Unsurprisingly, the influence of Trello is most apparent in the Jira boards, where you can now drag and drop cards, add new columns with a few clicks and easily filter cards based on your current needs (without having to learn Jira’s powerful but arcane query language). Gone are the days where you had to dig into the configuration to make even the simplest of changes to a board.

As Regan noted, when Jira was first built, it was built with a single team in mind. Today, there’s a mix of teams from different departments that use it. So while a singular permissions model for all of Jira worked for one team, it doesn’t make sense anymore when the whole company uses the product. In the new Jira then, the permissions model is project-based. “So if we wanted to start a team right now and build a product, we could design our board, customize our own issues, build our own workflows — and we could do it without having to find the IT guy down the hall,” he noted.

One feature the team seems to be especially proud of is roadmaps. That’s a new feature in Jira that makes it easier for teams to see the big picture. Like with boards, it’s easy enough to change the roadmap by just dragging the different larger chunks of work (or “epics,” in Agile parlance) to a new date.

“It’s a really simple roadmap,” Brereton explained. “It’s that way by design. But the problem we’re really trying to solve here is to bring in any stakeholder in the business and give them one view where they can come in at any time and know that what they’re looking at is up to date. Because it’s tied to your real work, you know that what we’re looking at is up to date, which seems like a small thing, but it’s a huge thing in terms of changing the way these teams work for the positive.”

The Atlassian team also redesigned what’s maybe the most-viewed page of the service: the Jira issue. Now, issues can have attachments of any file type, for example, making it easier to work with screenshots or files from designers.

Jira now also features a number of new APIs for integrations with Bitbucket and GitHub (which launched earlier this month), as well as InVision, Slack, Gmail and Facebook for Work.

With this update, Atlassian is also increasing the user limit to 5,000 seats, and Jira now features compliance with three different ISO certifications and SOC 2 Type II.

Oct 18, 2018

PostgreSQL 11! Our First Take On The New Release


You may be aware that the new major version of PostgreSQL was released today. PostgreSQL 11 is going to be one of the most vibrant releases in recent times. It incorporates many features found in proprietary, industry-leading database systems, further qualifying PostgreSQL as a strong open source alternative.

Without further ado, let’s have a look at some killer features in this new release.

Just In Time (JIT) Compilation of SQL Statements

This is a cutting-edge feature in PostgreSQL: SQL statements can get compiled into native code for execution. It’s well known how much Google’s V8 JIT revolutionized JavaScript. JIT in PostgreSQL 11 accelerates two important operations—expression evaluation and tuple deforming during query execution—and helps CPU-bound queries perform faster. Hopefully this is a new era in the SQL world.
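
A quick sketch of how you might try JIT out on PostgreSQL 11; the cost threshold shown is the default, and the query is just a synthetic CPU-bound example:

    SHOW jit;                      -- JIT must be compiled in and enabled
    SET jit = on;
    SET jit_above_cost = 100000;   -- plans costlier than this are JIT-compiled

    EXPLAIN (ANALYZE)
    SELECT sum(x * 2 + 1) FROM generate_series(1, 5000000) AS t(x);
    -- When JIT was used, the plan output ends with a "JIT:" summary section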

Parallel B-tree Index build

This could be the feature most sought after by DBAs, especially those migrating large databases from other database systems to PostgreSQL. Gone are the days when a lot of time was spent on building indexes during data migration. Index maintenance (rebuild) for very large tables can now make effective use of multiple cores in the server by parallelizing the operation, taking considerably less time to complete.
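
A brief sketch of what this looks like in practice, with illustrative table names and settings:

    SET max_parallel_maintenance_workers = 4;        -- new setting in PostgreSQL 11
    CREATE INDEX idx_orders_created_at ON orders (created_at);

    -- Optionally pin the degree of parallelism per table
    ALTER TABLE orders SET (parallel_workers = 4);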

Lightweight and super fast ALTER TABLE for NOT NULL column with DEFAULT values

In the process of continuous enhancement and adding new features, we see many application developments that involve schema changes to the database. Most such changes include adding new columns to a table. This can be a nightmare if a new column needs to be added to a large table with a default value and a NOT NULL constraint, because the ALTER statement can hold a write lock on the table for a long period and may involve excessive I/O due to a table rewrite. PostgreSQL 11 addresses this issue by ensuring that adding a column with a default value and a NOT NULL constraint avoids a table rewrite.
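
In other words, on PostgreSQL 11 a statement like the following (table and column names are illustrative) completes almost instantly, with no table rewrite:

    ALTER TABLE big_table
        ADD COLUMN status text NOT NULL DEFAULT 'active';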

Stored procedures with transaction control

PostgreSQL 11 includes stored procedures. What existed in PostgreSQL so far were functions. The lack of native stored procedures made migrating database code from other databases complex, often requiring extensive manual work from experts. Since stored procedures may include transaction blocks with BEGIN, COMMIT, and ROLLBACK, it was necessary to apply workarounds to meet this requirement in past PostgreSQL versions, but not anymore.
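
Here is a minimal sketch of a procedure that uses transaction control to delete rows in batches; the table and the 90-day retention rule are made up for the example:

    CREATE PROCEDURE purge_old_events()
    LANGUAGE plpgsql
    AS $$
    BEGIN
        LOOP
            DELETE FROM events
            WHERE ctid IN (SELECT ctid FROM events
                           WHERE created_at < now() - interval '90 days'
                           LIMIT 10000);
            EXIT WHEN NOT FOUND;
            COMMIT;   -- transaction control inside procedures is allowed as of PostgreSQL 11
        END LOOP;
    END;
    $$;

    CALL purge_old_events();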

Load cache upon crash or restart – pg_prewarm

Memory is becoming cheaper and faster, year over year. The latest generation of servers commonly ships with several hundred GBs of RAM, making it easy to employ large caches (shared_buffers) in PostgreSQL. Until today, you might have used pg_prewarm to warm up the cache manually (or automatically at server start). PostgreSQL 11 now includes a background worker that will take care of that for you, recording the contents of shared_buffers—in fact, the “addresses” of those blocks—to the file autoprewarm.blocks. Upon crash recovery or a normal server restart, two such workers run in the background, reloading those blocks into the cache.
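
A sketch of how this could be set up, assuming you also want the manual pg_prewarm() function available (the table name is illustrative):

    -- postgresql.conf (restart required): enables the autoprewarm background workers
    --   shared_preload_libraries = 'pg_prewarm'

    CREATE EXTENSION IF NOT EXISTS pg_prewarm;

    -- Manual warm-up of a single relation is still possible
    SELECT pg_prewarm('orders');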

Hash Partition

Until PostgreSQL 9.6 we used table inheritance for partitioning a table. PostgreSQL 10 came up with declarative partitioning, using two of the three most common partitioning methods: list and range. And now, PostgreSQL 11 has introduced the missing piece: hash partitioning.
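
A small sketch of hash partitioning in PostgreSQL 11, with illustrative names and four partitions:

    CREATE TABLE measurements (
        sensor_id int,
        recorded  timestamptz,
        value     numeric
    ) PARTITION BY HASH (sensor_id);

    CREATE TABLE measurements_p0 PARTITION OF measurements FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE measurements_p1 PARTITION OF measurements FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE measurements_p2 PARTITION OF measurements FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE measurements_p3 PARTITION OF measurements FOR VALUES WITH (MODULUS 4, REMAINDER 3);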

Advanced partitioning features that were always on demand

There were a lot of new features committed to the partitioning space in PostgreSQL 11. It now allows us to attach an index to a given partition even though it won’t behave as a global index.

Also, row updates now automatically move rows to new partitions (if necessary) based on the updated fields. During query processing, the optimizer may now simply skip “unwanted” partitions from the execution plan, which greatly simplifies the work to be done. Previously, it had to consider all the partitions, even if the target data was to be found in just a subset of them.
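
For instance, continuing the hash-partitioned table sketched above, an index created on the partitioned table now cascades to every partition, and pruning is on by default:

    CREATE INDEX ON measurements (recorded);   -- created on each partition automatically
    SHOW enable_partition_pruning;             -- 'on' by default in PostgreSQL 11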

We will discuss these new features in detail in a future blog post.

Tables can have default partitions

Until PostgreSQL 10, PostgreSQL had to reject a row being inserted when it did not satisfy any of the existing partition definitions. That changes with the introduction of default partitions in PostgreSQL 11.
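
A minimal sketch of a default partition on a list-partitioned table (names and regions are illustrative; note that hash-partitioned tables do not accept a default partition):

    CREATE TABLE sales (
        id     bigint,
        region text
    ) PARTITION BY LIST (region);

    CREATE TABLE sales_eu PARTITION OF sales FOR VALUES IN ('eu');
    CREATE TABLE sales_us PARTITION OF sales FOR VALUES IN ('us');

    -- Rows that match no other partition land here instead of being rejected
    CREATE TABLE sales_other PARTITION OF sales DEFAULT;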

Parallel hash join

Most SQL statements with equi-joins perform hash joins in the background. There is a great opportunity to speed up performance if we can leverage the power of the hardware by spinning off multiple parallel workers. PostgreSQL 11 now allows hash joins to be performed in parallel.
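
A rough sketch of how to see it in action; the tables are illustrative and the plan shape depends on table sizes, statistics and settings:

    SET max_parallel_workers_per_gather = 4;

    EXPLAIN (ANALYZE)
    SELECT count(*)
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
    -- On PostgreSQL 11 the plan can contain a "Parallel Hash Join" node under a Gather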

Write-Ahead Log (WAL) improvements

Historically, PostgreSQL had a default WAL segment size of 16 MB and we had to recompile PostgreSQL in order to operate with WAL segments of a different size. Now it is possible to change the WAL segment size during the initialization of the data directory (initdb) or while resetting WALs using pg_resetwal, by means of the parameter --wal-segsize=<wal_segment_size>.

Add extensions to convert JSONB data to/from PL/Perl and PL/Python

Python as a programming language continues to gain popularity. It is always among the top five in the TIOBE Index. One of the greatest features of PostgreSQL is that you can write stored procedures and functions in most popular programming languages, including Python (with PL/Python). Now it is also possible to transform the JSONB (binary JSON) data type to/from PL/Python. This feature was later made available for PL/Perl too. It can be a great add-on for organizations using PostgreSQL as a document store.
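
A small sketch of the PL/Python side, assuming the plpython3u and jsonb_plpython3u extensions are installed on the server; the function name is made up:

    CREATE EXTENSION IF NOT EXISTS plpython3u;
    CREATE EXTENSION IF NOT EXISTS jsonb_plpython3u;

    CREATE FUNCTION count_keys(doc jsonb) RETURNS int
    LANGUAGE plpython3u
    TRANSFORM FOR TYPE jsonb
    AS $$
    # With the transform, 'doc' arrives as a native Python dict, not a string
    return len(doc)
    $$;

    SELECT count_keys('{"a": 1, "b": 2}'::jsonb);   -- returns 2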

Command line improvements in psql: autocomplete and quit/exit

psql has always been friendly to first time PostgreSQL users through the various options like autocomplete and shortcuts. There’s an exception though: users may find it difficult to understand how to effectively quit from psql, and often attempt to use non-existing quit and exit commands. Eventually, they find \q or ctrl + D, but not without frustrating themselves first. Fortunately, that shouldn’t happen anymore: among many recent fixes and improvements to psql is the addition of the intuitive quit and exit commands to safely leave this popular command line client.

Improved statistics

PostgreSQL 10 introduced the new statement CREATE STATISTICS to collect additional statistics about columns of a table. This has been further improved in PostgreSQL 11. Previously, while collecting optimizer statistics, most-common values (MCVs) were chosen based on their significance compared to all column values. Now, MCVs are chosen based on their significance compared to non-MCV values.
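
For reference, a sketch of extended statistics as introduced in PostgreSQL 10; table and column names are illustrative:

    CREATE STATISTICS addr_city_zip (dependencies, ndistinct)
        ON city, zip FROM addresses;

    ANALYZE addresses;   -- the planner can now account for the city/zip correlation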

The new features of PostgreSQL 11 are not limited to the ones mentioned above. We will be looking further into most of them in future blog posts. We invite you to leave a comment below and let us know if there is any particular feature you would be interested in knowing more about.

Oct 18, 2018

Daivergent connects people on the autism spectrum with jobs in data management

Great startups normally come from a personal place. Byran Dai’s new company, Daivergent, is no different.

Founded in December 2017, Daivergent looks to connect enterprise clients with folks on the autism spectrum who will help complete tasks in AI/ML data management.

Dai’s younger brother, Brandon, is on the autism spectrum. Dai realized that his brother and other folks on the spectrum are perfect candidates for certain high-complexity tasks that require extraordinary attention to detail, such as data entry and enrichment, quality assurance and data validation, and content moderation.

In a landscape where just about everyone is working on AI and machine learning algorithms, organizing data is a top priority. Daivergent believes that it can put together the perfect pool of data specialists to complete any task in this space.

Daivergent partners with various agencies including the AHRC and Autism Speaks to source talent. Those folks go through a screening process, which assesses their abilities to complete these sorts of tasks. They then become Daivergent contractors, where they get further training and then start working on projects.

The company says that there are 2.5 million adults with autism in the U.S., and Autism Speaks reports an 85 percent unemployment rate among college-educated adults with autism.

Daivergent not only provides a way for these people to get into the workforce, but it also offers a way for corporations and companies to employ American workers on projects for which they would likely otherwise hire overseas contractors.

When a new task comes in to Daivergent, the company splits that project into smaller tasks and then assigns those tasks to its workers. The company also determines the complexity of the overall project, factoring in the urgency level of the request, to decide pricing.

Daivergent takes a small cut of the earnings and passes the rest on to the workers.

Right now, Daivergent has 25 active workers performing tasks for customers, with 150 workers registered and going through the qualification process and another 400 adults with autism in the candidate pool.

The company recently graduated from the ERA accelerator.

Oct 18, 2018

Seva snares $2.4M seed investment to find info across cloud services

Seva, a New York City startup that wants to help customers find content wherever it lives across SaaS products, announced a $2.4 million seed round today. Avalon Ventures led the round with participation from Studio VC and Datadog founder and CEO Olivier Pomel.

Company founder and CEO Sanjay Jain says that he started this company because he felt the frustration personally of having to hunt across different cloud services to find the information he was looking for. When he began researching the idea for the company, he found others who also complained about this fragmentation.

“Our fundamental vision is to change the way that knowledge workers acquire the information they need to do their jobs from one where they have to spend a ton of time actually seeking it out to one where the Seva platform can prescribe the right information at the right time when and where the knowledge worker actually needs it, regardless of where it lives.”

Seva, which is currently in beta, certainly isn’t the first company to try to solve this issue. Jain believes that with a modern application of AI, machine learning and single sign-on, Seva can provide a much more user-centric approach than past solutions could, simply because the technology wasn’t there yet for them.

The way they do this is by looking across the different information types. Today they support a range of products including Gmail, Google Calendar, Google Drive, Box, Dropbox, Slack, Jira and Confluence. Jain says they will be adding additional services over time.

Screenshot: Seva

Customers can link Seva to these products by simply selecting one and entering the user credentials. Seva inherits all of the security and permissioning applied to each of the services, so when it begins pulling information from different sources, it doesn’t violate any internal permissioning in the process.

Jain says once connected to these services, Seva can then start making logical connections between information wherever it lives. A salesperson might have an appointment with a customer in his or her calendar, information about the customer in a CRM and a training video related to the customer visit. It can deliver all of this information as a package, which users can share with one another within the platform, giving it a collaborative element.

Seva currently has six employees, but with the new funding it is looking to hire a couple more engineers for the team. Jain hopes the money will be a bridge to a Series A round at the end of next year, by which time the product will be generally available.

Oct 17, 2018

Upcoming Webinar Thurs 10/18: MongoDB 4.0 Features – Transactions & More


Please join Percona’s Principal Consultant, Alex Rubin, as he presents MongoDB 4.0 Features – Transactions & More on Thursday, October 18th at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

 

MongoDB 4.0 adds support for multi-document ACID transactions, combining the document model with ACID guarantees. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity.

This webinar mainly focuses on MongoDB transactions (the major feature of the latest update) and future transaction improvements. We will also cover other new MongoDB features, such as non-blocking secondary reads, security improvements and more.

After attending the webinar you will learn more about the latest MongoDB features.

Register for this webinar to learn about MongoDB transactions and other features.
