Grafana Labs open-sources the OnCall incident response system

Grafana Labs, which develops the Grafana data visualization platform and contributes to the Prometheus monitoring system, has announced that it is opening the source code of OnCall, an incident response system designed to help teams work together to resolve and analyze incidents. OnCall was previously shipped as a proprietary product and came to Grafana Labs through its acquisition of Amixr Inc. last year. The project code is written in Python and released under the AGPLv3 license.

The system collects information about anomalies and events from various monitoring systems, then automatically groups the data, sends notifications to the responsible teams, and tracks the progress of problem resolution. Integration with the Grafana, Prometheus, Alertmanager, and Zabbix monitoring systems is supported. Minor and insignificant events are filtered out of the information received from the monitoring systems, duplicates are aggregated, and problems that can be resolved without human involvement are excluded.
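The filtering and aggregation described above can be pictured with a minimal sketch. This is not OnCall's actual code: the `Alert` fields, the `group_alerts` function, and the rule of grouping by source and alert name are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape, not OnCall's real data model.
    source: str             # e.g. "prometheus" or "zabbix"
    name: str               # alert name reported by the monitoring system
    severity: str           # "info", "warning" or "critical"
    resolved: bool = False  # True if the problem cleared without human action


def group_alerts(alerts, ignored_severities=("info",)):
    """Drop minor and self-resolved alerts, then merge duplicates by (source, name)."""
    groups = defaultdict(list)
    for alert in alerts:
        if alert.severity in ignored_severities or alert.resolved:
            continue  # insignificant, or needs no human involvement
        groups[(alert.source, alert.name)].append(alert)
    return dict(groups)


alerts = [
    Alert("prometheus", "HighCPU", "critical"),
    Alert("prometheus", "HighCPU", "critical"),             # duplicate
    Alert("zabbix", "DiskSpace", "info"),                   # minor event
    Alert("zabbix", "LinkDown", "warning", resolved=True),  # self-resolved
]
groups = group_alerts(alerts)
for key, members in groups.items():
    print(key, len(members))
```

In this sketch the four incoming alerts collapse into a single group of two duplicates, which is the only item that would be passed on for notification.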

Significant events, cleared of unnecessary noise, are passed to the alerting subsystem, which identifies the employees responsible for the detected categories of problems and sends notifications that take into account their work schedules and how busy they are (data from the calendar is evaluated). Rotation of incident assignment among employees is supported, as is escalation of especially important or unresolved problems to other team members or to higher-level staff.
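One way to think about rotation and escalation is the following sketch. It assumes a simple weekly rotation and a fixed escalation step. The function names, the 15-minute step, and the start date are hypothetical and do not come from OnCall itself.

```python
from datetime import date


def on_call(rotation, day, start=date(2022, 1, 3)):
    """Return who is on call on `day`, rotating weekly through `rotation`."""
    weeks = (day - start).days // 7
    return rotation[weeks % len(rotation)]


def escalation_targets(rotation, managers, day, minutes_unacknowledged, step_minutes=15):
    """List everyone notified so far: one more level per `step_minutes` without an acknowledgement."""
    chain = [on_call(rotation, day)] + list(managers)
    levels = 1 + minutes_unacknowledged // step_minutes
    return chain[:levels]


rotation = ["alice", "bob", "carol"]  # hypothetical on-call engineers
managers = ["team-lead", "cto"]       # hypothetical higher escalation levels
day = date(2022, 1, 12)               # falls in the second week of the rotation

print(escalation_targets(rotation, managers, day, minutes_unacknowledged=0))
print(escalation_targets(rotation, managers, day, minutes_unacknowledged=20))
```

A real scheduler would also consult calendar data to skip people who are busy, as the article describes. That part is left out here to keep the sketch short.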


Depending on the severity of the incident, notifications can be sent via phone calls, SMS, email, calendar events, and the Slack and Telegram messengers. In addition, Slack channels can be created automatically to discuss the resolution of an incident, with both individual employees and entire teams added to them automatically.
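Choosing channels by severity might look like the sketch below. The channel names follow the article, but the specific mapping and the `plan_notifications` helper are assumptions for illustration, not OnCall's configuration format.

```python
# Hypothetical severity-to-channel policy; the mapping itself is an assumption.
CHANNELS_BY_SEVERITY = {
    "critical": ["phone_call", "sms", "slack"],
    "warning": ["slack", "telegram"],
    "info": ["email"],
}


def plan_notifications(recipients, severity):
    """Return (recipient, channel) pairs for every channel the severity calls for."""
    channels = CHANNELS_BY_SEVERITY.get(severity, ["email"])  # unknown severity: email only
    return [(person, channel) for person in recipients for channel in channels]


plan = plan_notifications(["alice", "bob"], "warning")
for person, channel in plan:
    print(f"notify {person} via {channel}")
```

Keeping the policy in a plain dictionary makes it easy to adjust per team without touching the dispatch logic.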

The system offers flexible options for extension and configuration (for example, you can adjust event grouping and routing to your preferences and define notification rules and channels). An API and Terraform support are provided for integration with external systems. The system is managed through a web interface.

