Disk Usage Alerts for Slack
I am trying to create alerts for a Slack webhook that fire when disk usage rises above a certain percentage on any volume of our production instances. So far I have an alert that fires for one particular drive on one particular instance, but the message sent to Slack doesn't include the percentage of disk space used or the critical level of the alert. First, is there a more efficient way to get the percentage of disk used for each drive we have in production than the way I am doing it now (see first attachment)? Second, is there a way to include more specific information about the percentage of disk space used in the alert that is sent to Slack (see second attachment)? Right now it only lists the alert name, description, query, and time range.
Thanks.
edit: changed first screen cap to make it more readable
edit 2: added two more screen shots of alert setup page, and for our webhook payload
edit 3: In the second-to-last screenshot I tried to get all the disk usage percentages by selecting * for DevName, but the query then used every possible combination of DevNames between #A and #B. For example, it paired drive F on #A with drive C on #B, and every other combination as well. I did get all the disk_used percentages I had hoped for, but also a lot of extraneous, meaningless metrics. To fix this I tried adding the filter you see in the last screenshot, so that it would only show metrics where #A.DevName == #B.DevName. This did not work and threw a syntax error.
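To make the pairing problem concrete, here is a minimal Python sketch (not query syntax for the alerting tool). It assumes #A holds used space and #B holds total space per device, which the post does not state, and the device names and numbers are made up for illustration.

```python
from itertools import product

# Hypothetical sample data: used space (#A) and total space (#B), keyed by DevName.
used = {"C": 40, "F": 90}
total = {"C": 100, "F": 200}

# Cross product: every #A device paired with every #B device.
# This is the unwanted behaviour, e.g. drive F on #A with drive C on #B.
cross = list(product(used, total))

# Keyed join: keep only the pairs where the device name matches on both sides.
percent_used = {
    dev: 100 * used[dev] / total[dev]
    for dev in used.keys() & total.keys()
}

print(len(cross), "cross pairs vs", len(percent_used), "matched pairs")
```

The intended result is the keyed join: one percentage per drive, with no mixed pairs.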
Official comment
See Webhook_Payload_Variables. Since you have a metrics query, you should be able to modify your webhook payload to include AlertSource, among other variables, to make the alert more meaningful and actionable.
{{AlertThreshold}}: The condition that triggered the alert (for example, above 90 at least once in the last 5 minutes).
{{AlertSource}}: The metric and sourceHost that triggered the alert.
{{AlertID}}: The ID of the triggered alert.
{{AlertStatus}}: Current status of the time series that triggered (for example, Critical or Warning).
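As a rough sketch of how these variables might be placed in the payload, the Python snippet below builds a JSON body with a single "text" field, the basic shape a Slack incoming webhook accepts. The labels and layout are assumptions for illustration, not the original poster's payload; only the four placeholders come from the list above, and they are left as literal text for the alerting service to substitute when the alert fires.

```python
import json

# Placeholders from the list above; they stay as literal text in the payload
# and are filled in by the alerting service at send time.
# The labels on the left are hypothetical and can be renamed freely.
fields = {
    "Status": "{{AlertStatus}}",
    "Source": "{{AlertSource}}",
    "Threshold": "{{AlertThreshold}}",
    "Alert ID": "{{AlertID}}",
}

# One line per field, using Slack's *bold* markup for the label.
text = "\n".join(f"*{label}:* {placeholder}" for label, placeholder in fields.items())

# Serialize to the JSON body that would be pasted into the webhook payload box.
payload = json.dumps({"text": text}, indent=2)
print(payload)
```

The printed JSON can be pasted into the webhook payload field and adjusted from there.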