Jun
14
2021
--

MongoDB Integrated Alerting in Percona Monitoring and Management

Percona Monitoring and Management (PMM) recently introduced the Integrated Alerting feature as a technical preview. This was an eagerly awaited feature, as PMM no longer needs to integrate with an external alerting system. Recently we blogged about the release of this feature.

PMM includes some built-in templates, and in this post, I am going to show you how to add your own alerts.

Enable Integrated Alerting

The first thing to do is navigate to the PMM Settings by clicking the gear icon in the left menu and choosing Settings:

Next, go to Advanced Settings, and click on the slider to enable Integrated Alerting down in the “Technical Preview” section.

While you’re here, if you want to enable SMTP or Slack notifications you can set them up right now by clicking the new Communications tab (which shows up after you hit “Apply Changes” turning on the feature).

The example below shows how to configure email notifications through Gmail:

You should now see the Integrated Alerting option in the left menu under Alerting, so let’s go there next:

Configuring Alert Destinations

After clicking on the Integrated Alerting option, go to the Notification Channels to configure the destination for your alerts. At the time of this writing, email via your SMTP server, Slack and PagerDuty are supported.

Creating a Custom Alert Template

Alerts are defined using MetricsQL, which is backward compatible with PromQL (the Prometheus query language). As an example, let’s configure an alert to let us know if MongoDB is down.

First, let’s go to the Explore option from the left menu. This is the place to play with the different metrics available and create the expressions for our alerts:

To identify MongoDB being down, one option is using the up metric. The following expression would give us the alert we need:

up{service_type="mongodb"}

To validate this, I shut down a member of a 3-node replica set and verified that the expression returns 0 when the node is down:
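The same check can be scripted. The following is only a minimal sketch: it assumes the query result arrives in the standard Prometheus HTTP API instant-query format, and the sample response and service names are made up for illustration. It lists the services whose up value is 0:

```python
import json

# Hypothetical instant-query response for up{service_type="mongodb"},
# shaped like the Prometheus HTTP API "vector" result format.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"service_name": "mongo1", "service_type": "mongodb"}, "value": [1623672000, "1"]},
      {"metric": {"service_name": "mongo2", "service_type": "mongodb"}, "value": [1623672000, "0"]},
      {"metric": {"service_name": "mongo3", "service_type": "mongodb"}, "value": [1623672000, "1"]}
    ]
  }
}
"""


def services_down(response_text):
    """Return the service_name of every series whose sample value is 0."""
    payload = json.loads(response_text)
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # the API encodes the value as a string
        if float(value) == 0:
            down.append(series["metric"].get("service_name", "<unknown>"))
    return down


print(services_down(SAMPLE_RESPONSE))
```

With the sample data above, only the stopped member (mongo2 here) is reported.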

The next step is creating a template for this alert. I won’t go into a lot of detail here, but you can check Integrated Alerting Design in Percona Monitoring and Management for more information about how templates are defined.

Navigate to the Integrated Alerting page again, and click on the Add button, then add the following template:

---
templates:
  - name: MongoDBDown
    version: 1
    summary: MongoDB is down
    expr: |-
      up{service_type="mongodb"} == 0
    severity: critical
    annotations:
      summary: MongoDB is down ({{ $labels.service_name }})
      description: |-
        MongoDB {{ $labels.service_name }} on {{ $labels.node_name }} is down

This is how it looks:

Next, go to the Alert Rules and create a new rule. We can use the Filters section to add comma-separated “key=value” pairs to filter alerts per node, per service, per agent, etc.

For example: node_id=/node_id/123456, service_name=mongo1, agent_id=/agent_id/123456
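To make the filter format concrete, here is a small sketch that splits such a string into key/value pairs. It only illustrates the comma-separated “key=value” syntax; parse_filters is a hypothetical helper, not the way PMM itself processes the field:

```python
def parse_filters(text):
    """Split 'key=value, key=value' into a dict, trimming whitespace."""
    filters = {}
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        filters[key.strip()] = value.strip()
    return filters


example = "node_id=/node_id/123456, service_name=mongo1, agent_id=/agent_id/123456"
print(parse_filters(example))
```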

After you are done, hit the Save button and go to the Alerts dashboard to see if the alert is firing:

From this page, you can also silence any firing alerts.

If you configured email as a destination, you should have also received a message like this one:

For now, a single notification is sent. In the future, it will be possible to customize the behavior.

Creating MongoDB Alerts

In addition to the obvious “MongoDB is down” alert, there are a couple more things we should monitor. For starters, I’d suggest creating alerts for the following conditions:

  • Replica set member in an unusual state
mongodb_replset_member_state != 1 and mongodb_replset_member_state != 2

  • Connections higher than expected
avg by (service_name) (mongodb_connections{state="current"}) > 5000

  • Cache evictions higher than expected
avg by(service_name, type) (rate(mongodb_mongod_wiredtiger_cache_evicted_total[5m])) > 5000

  • Low WiredTiger tickets
avg by(service_name, type) (max_over_time(mongodb_mongod_wiredtiger_concurrent_transactions_available_tickets[1m])) < 50

The values listed above are for illustrative purposes only; you need to decide the proper thresholds for your specific environment(s).
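As a rough illustration of what the first condition selects, the sketch below applies the same logic in Python to a made-up set of member states. The member names and values are hypothetical; state codes 1 (PRIMARY), 2 (SECONDARY), and 3 (RECOVERING) are MongoDB’s own:

```python
# MongoDB replica set state codes: 1 is PRIMARY, 2 is SECONDARY.
HEALTHY_STATES = {1, 2}


def unusual_members(states):
    """Mirror `state != 1 and state != 2`: keep members in any other state."""
    return {member: state for member, state in states.items()
            if state not in HEALTHY_STATES}


# Hypothetical snapshot of a 3-node replica set; 3 means RECOVERING.
sample = {"mongo1": 1, "mongo2": 2, "mongo3": 3}
print(unusual_members(sample))
```

Only mongo3 is returned, which is the member the alert would fire for.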

As another example, let’s add the alert template for the low WiredTiger tickets:

---
templates:
  - name: MongoDB Wiredtiger Tickets
    version: 1
    summary: MongoDB Wiredtiger Tickets low
    expr: avg by(service_name, type) (max_over_time(mongodb_mongod_wiredtiger_concurrent_transactions_available_tickets[1m])) < 50
    severity: warning
    annotations:
      description: "WiredTiger available tickets on (service {{ $labels.service_name }}) are less than 50"
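Both templates use {{ $labels.… }} placeholders in their annotations. The sketch below shows roughly how such a placeholder expands once an alert fires. It is a simplified stand-in written for this post, not the actual template engine, and rendering a missing label as an empty string is a choice made for the sketch:

```python
import re

# Matches label references such as {{ $labels.node_name }}.
LABEL_REF = re.compile(r"\{\{\s*\$labels\.(\w+)\s*\}\}")


def render_annotation(template, labels):
    """Replace each {{ $labels.X }} with labels[X]; unknown labels become ''."""
    return LABEL_REF.sub(lambda m: labels.get(m.group(1), ""), template)


description = "MongoDB {{ $labels.service_name }} on {{ $labels.node_name }} is down"
labels = {"service_name": "mongo1", "node_name": "node1"}  # hypothetical label values
print(render_annotation(description, labels))
```

Keep in mind that only labels present on the expression’s result can be referenced. An aggregation such as avg by(service_name, type) keeps just those two labels.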

Conclusion

Integrated Alerting is a really nice feature to have. While it is still in technical preview, there are already a few built-in alerts you can test, and you can also define your own. Make sure to check the official Integrated Alerting documentation for more information about this topic.

Do you have any specific MongoDB alerts you’d like to see? Given the feature is still in technical preview, any contributions and/or feedback about the functionality are welcome as we’re looking to release this as GA very soon!

Sep
29
2020
--

Using Security Threat Tool and Alertmanager in Percona Monitoring and Management

With version 2.9.1 of Percona Monitoring and Management (PMM) we delivered some new improvements to its Security Threat Tool (STT).

Aside from an updated user interface, you now have the ability to run STT checks manually at any time, instead of waiting for the normal 24-hour check cycle. This can be useful if, for example, you want to confirm that an alert is gone after you fixed the underlying issue. Moreover, you can now also temporarily mute (for 24 hours) some alerts you may want to work on later.

But how do these actions work?

Alertmanager

In a previous article, we briefly explained how the STT back end publishes alerts to Alertmanager so they appear in the STT section of PMM.

Now, before we uncover the details of that, please bear in mind that PMM’s built-in Alertmanager is still under development. We do not recommend you use it directly for your own needs, at least not for now.

With that out of the way, let’s see the details of the interaction with Alertmanager.

To retrieve the current alerts, the interface calls the Alertmanager API, filtering for non-silenced alerts:

GET /alertmanager/api/v2/alerts?silenced=false[...]

This call returns a list of active alerts, which looks like this:

[
  {
    "annotations": {
      "description": "MongoDB admin password does not meet the complexity requirement",
      "summary": "MongoDB password is weak"
    },
    "endsAt": "2020-09-30T14:39:03.575Z",
    "startsAt": "2020-04-20T12:08:48.946Z",
    "labels": {
      "service_name": "mongodb-inst-rpl-1",
      "severity": "warning",
      ...
    },
    ...
  },
  ...
]

Active alerts have a startsAt timestamp at the current time or in the past, while the endsAt timestamp is in the future. The other properties contain descriptions and the severity of the issue the alert is about. The labels, in particular, uniquely identify a specific alert and are used by Alertmanager to deduplicate alerts. (There are also other “meta” properties, but they are out of the scope of this article.)
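The timestamp rule can be expressed in a few lines of Python. This is a minimal sketch that uses only the startsAt and endsAt fields from the payload above; is_active and parse_ts are illustrative helper names, not part of any Alertmanager client:

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an RFC 3339 timestamp such as 2020-09-30T14:39:03.575Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_active(alert, now=None):
    """True when startsAt is now or in the past and endsAt is in the future."""
    now = now or datetime.now(timezone.utc)
    return parse_ts(alert["startsAt"]) <= now < parse_ts(alert["endsAt"])


alert = {
    "startsAt": "2020-04-20T12:08:48.946Z",
    "endsAt": "2020-09-30T14:39:03.575Z",
}
# A fixed "current time" keeps the example reproducible.
print(is_active(alert, datetime(2020, 9, 29, 12, 0, tzinfo=timezone.utc)))
```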

Force Check

Clicking on “Run DB checks” will trigger an API call to the PMM server, which will execute the checks workflow on the PMM back end (you can read more about it here). At the end of that workflow, alerts are sent to Alertmanager through a POST call to the same endpoint used to retrieve active alerts. The call payload has the same structure as shown above.

Note that while you could create alerts manually this way, that’s highly discouraged, since it could negatively impact STT alerts. If you want to define your own rules for Alertmanager, PMM can integrate with an external Alertmanager, independent of STT. You can read more in Percona Monitoring and Management, Meet Prometheus Alertmanager.

Silences

Alertmanager has the concept of Silences. To temporarily mute an alert, the front end generates a “silence” payload starting from the metadata of the alert the user wants to mute and calls the silence API on Alertmanager:

POST /alertmanager/api/v2/silences

An example of a silence payload:

{
  "matchers": [
    { "name": "service_name", "value": "mongodb-inst-rpl-1", "isRegex": false },
    { "name": "severity", "value": "warning", "isRegex": false },
    ...
  ],
  "startsAt": "2020-09-14T20:24:15Z",
  "endsAt": "2020-09-15T20:24:15Z",
  "createdBy": "someuser",
  "comment": "reason for this silence",
  "id": "a-silence-id"
}

As a confirmation of success, this API call will return a silenceID:

{ "silenceID": "1fcaae42-ec92-4272-ab6b-410d98534dfc" }
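To illustrate how a silence payload can be derived from an alert, here is a hedged sketch that turns an alert’s labels into matchers and sets a 24-hour window. build_silence is a hypothetical helper rather than the PMM front-end code, and the id field is left out on the assumption that a new silence is being created:

```python
from datetime import datetime, timedelta, timezone


def build_silence(alert, created_by, comment, now=None, hours=24):
    """Build a silence payload that matches every label of the given alert."""
    now = now or datetime.now(timezone.utc)  # must be UTC for the "Z" suffix
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False}
            for name, value in sorted(alert["labels"].items())
        ],
        "startsAt": now.strftime(fmt),
        "endsAt": (now + timedelta(hours=hours)).strftime(fmt),
        "createdBy": created_by,
        "comment": comment,
    }


alert = {"labels": {"service_name": "mongodb-inst-rpl-1", "severity": "warning"}}
start = datetime(2020, 9, 14, 20, 24, 15, tzinfo=timezone.utc)
print(build_silence(alert, "someuser", "reason for this silence", now=start))
```

The resulting dictionary could then be serialized as JSON and sent in the POST request shown above.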

 

Conclusion

From this quick overview, you can hopefully understand how simple it is for us to deliver security checks. Alertmanager helps us a lot in simplifying the final stage of delivering security checks to you in a reliable way. It allows us to focus more on the checks we deliver and the way you can interact with them.

We’re constantly improving our Security Threat Tool, adding more checks and features to help you protect your organization’s valuable data. While we’ll try to make our checks as comprehensive as possible, we know that you might have very specific needs. That’s why for the future we plan to make STT even more flexible, adding scheduling of checks (since some need to run more/less frequently than others), disabling of checks, and even the ability to let you add your own checks! Keep following the latest releases as we continue to iterate on STT.

For now, let us know in the comments: what other checks or features would you like to see in STT? We love to hear your feedback!

Check out our Percona Monitoring and Management Demo site or download Percona Monitoring and Management today and give it a try!

Feb
02
2017
--

PMM Alerting with Grafana: Working with Templated Dashboards

In this blog post, we will look into more intricate details of PMM alerting. More specifically, we’ll look at how to set up alerting based on templated dashboards.

Percona Monitoring and Management (PMM) 1.0.7 includes Grafana 4.0, which comes with the Alerting feature. Barrett Chambers shared how to enable alerting in general. This blog post looks at the specifics of setting up alerting based on the templated dashboards. Grafana 4.0 does not support alerting on templated dashboards out of the box.

PMM Alerting 1

This means if I try to set up an alert on the number of MySQL threads running, I get the error “Template variables are not supported in alert queries.”

What is the solution?

Until Grafana provides a better option, you need to do alerting based on graphs (which don’t use templating). This is how to do it.

Click on “Create New” in the Dashboards list to create a basic dashboard for your alerts:

PMM Alerting 2

Click on “Add Panel” and select “Graph”:

PMM Alerting 3

Click on the title of the new panel to open its menu, and then click on “Panel JSON”.

PMM Alerting 4

This shows you the JSON of the panel, which will look something like this:

PMM Alerting 5

Now go to another browser window and open the dashboard with the graph you want to alert on. Show the panel JSON for it. In our case, we go to “MySQL Overview” and show the JSON for the “MySQL Active Threads” panel.

Copy the JSON from the “MySQL Active Threads” panel and paste it into the new panel in the dashboard created for alerting.

Once we have done the copy/paste, click on the green Update button, and we’ll see the broken panel:

PMM Alerting 6

It’s broken because the dashboard expressions use template variables, and none of them are set up in this dashboard, so the expressions won’t work. We must replace the template variables in the formulas with the actual hosts, instances, mount points, etc., that we want to alert on:

PMM Alerting 7

We need to change $host to the name of the host we want to alert on, and $interval should align with the data capture interval (here we’ll set it to 5 seconds):

PMM Alerting 8

If correctly set up, you should see the graph showing the data.
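The substitution itself is plain text replacement. The sketch below shows the idea in Python; the expression and host name are hypothetical stand-ins for whatever the copied panel contains:

```python
from string import Template

# Hypothetical panel expression that still uses Grafana template variables.
templated = 'max_over_time(mysql_global_status_threads_running{instance="$host"}[$interval])'

# Replace the variables with the concrete host and the capture interval.
concrete = Template(templated).substitute(host="db1", interval="5s")
print(concrete)
```

The resulting expression contains no template variables, so it can be used in an alert query.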

Finally, we can go to edit the graph. Click on the “Alert” and “Create Alert”.

PMM Alerting 9

Specify Evaluate Every, as well as the alert conditions, to create an alert. Evaluate Every sets the evaluation interval for the alert rule. The more often the rule evaluates its condition, the sooner you are alerted when something goes wrong.

In our case, we want to get an alert if the number of running threads is sustained at a high level. To do this, check whether the minimum number of running threads over the last minute is above 30:

PMM Alerting 10

Note that our query has two parameters: “A” is the number of threads connected, and “B” is the number of threads running. We’re choosing to Alert on “B”. 
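Using the minimum over the window is what makes the alert react to sustained load rather than short spikes. Here is a small sketch of that logic, with made-up sample values (one sample every 5 seconds for a minute):

```python
def should_alert(samples, threshold=30):
    """Fire only if every sample in the window is above the threshold."""
    return bool(samples) and min(samples) > threshold


# Hypothetical threads-running samples from the last minute.
spike = [12, 14, 95, 13, 12, 11, 13, 12, 14, 12, 13, 12]
sustained = [41, 38, 45, 52, 39, 44, 47, 40, 36, 43, 48, 42]

print(should_alert(spike), should_alert(sustained))
```

A single spike to 95 does not trigger the alert, while a minute spent entirely above 30 does.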

The beautiful thing Grafana does is show the alert threshold clearly on the graph, and allows you to edit the alert just by moving this alert line with a mouse:

PMM Alerting 11

You may want to click on the floppy disk icon at the top to save the dashboard (giving it whatever identifying name you want).

PMM Alerting 12

At this point, you should see the alert working. A little heart sign appears by the graph title, colored green (indicating it is not active) or red (indicating it is active). Additionally, you will see the red and green vertical lines in the alert history. These show when this alert gets triggered and when the system went back to normal.

PMM Alerting 13

You probably want to set up notifications as well as see alerts on the graphs. 

To set up notifications, go to the Grafana Configuration menu and configure Alerting. Grafana supports Email, Slack, PagerDuty, and general Webhook notification options (with more on the way, I’m sure).

The same way you added the “Graph” panel to set up an alert, you can add the “Alert List” panel to see all the alerts you have set up (and their status):

PMM Alerting 14

Summary

As you can see, it is possible to set up alerts in PMM using the new Grafana 4.0 alerting feature. It is not very convenient or easy to do. This is the first release with alerting support for Grafana and PMM. As such, I’m sure it will become much easier and more convenient over time.

Jun
22
2016
--

NGINX’s Amplify monitoring tool is now in public beta

NGINX today launched Amplify, its new application monitoring tool, out of private beta. While the cloud-based tool is still officially in beta, it’s now available to all NGINX users, both those who run the paid NGINX Plus edition and those on the free open-source version. As NGINX CEO Gus Robertson and CMO Peter Guagenti told me, the company’s users told the team that they wanted to…
