Comments

4 comments

  • Ramakrishna Hande

    Thanks for the above article.

    In addition to this, suppose there is a requirement: when sc_errors_ratio > 0.5, we need to filter out the errors starting with 500 and keep only the errors starting with 400 in the result set, then send this as an alert to a Slack channel. How can this query be altered?

  • Graham Watts

    Hey Ramakrishna,

    Are you trying to show the trend in the percentage of 400s? If so, you could use something like this:

    _sourceCategory=graham/travel/nginx
    | parse "HTTP/1.1\" * " as sc nodrop
    | if(sc matches "4*", 1, 0) as sc_400_counter
    | timeslice 15m
    | count as total_logs, sum(sc_400_counter) as sc_400s by _timeslice
    | (sc_400s/total_logs) as sc_400_ratio
    | fields sc_400_ratio, _timeslice

    You could also add the lines below to apply the outlier operator and alert on spikes (note that the field name must match the sc_400_ratio computed above):


    | outlier sc_400_ratio
    | where sc_400_ratio_violation > 0
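
    For reference, the per-timeslice math this query performs can be sketched in plain Python (a hypothetical illustration of the aggregation, not Sumo Logic query language; the sample data is made up):

    ```python
    from collections import defaultdict

    def ratio_of_400s(logs, slice_minutes=15):
        """Group (minute, status_code) pairs into 15-minute slices and
        compute the fraction of 4xx responses per slice, mirroring the
        timeslice / count / sum(sc_400_counter) steps in the query."""
        totals = defaultdict(int)
        fours = defaultdict(int)
        for minute, status in logs:
            bucket = minute // slice_minutes      # plays the role of _timeslice
            totals[bucket] += 1
            if str(status).startswith("4"):       # sc matches "4*"
                fours[bucket] += 1
        return {b: fours[b] / totals[b] for b in totals}

    # Hypothetical logs: (minute offset, HTTP status)
    sample = [(0, 200), (3, 404), (10, 500), (17, 401), (20, 403)]
    print(ratio_of_400s(sample))
    ```

    Each returned value is the sc_400_ratio for one timeslice; the outlier operator then flags slices whose ratio deviates from the running baseline.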
  • Ramakrishna Hande

    Hi Graham,

    My requirement is something like this.

    I have logging set up in Sumo Logic. Each log is a JSON object that contains the response time of the request in a key named "response_time". Each request is identified by a unique ID in the key "request_id" and a URL in the key "url". I need to alert to a Slack channel based on the following conditions.

    1) In a window of 10 minutes, if there are 100 requests and more than 5% of them have a response time over 100 ms, then alert with the "url", "request_id", and "response_time" of all those requests.

    2) If 5% or fewer of the requests have a response time over 100 ms, then don't alert at all.

    I wrote a query like this:

    _sourceName=<my_source_name>
    | json field=_raw "response_time" as response_time
    | json field=_raw "request_id" as request_id
    | if (num(response_time) > 100, 1, 0) as higher
    | if (num(response_time) <= 100, 1, 0) as lower
    | count as total_requests, sum(higher) as response_time_greater_than_100, sum(lower) as response_time_less_than_100
    | (response_time_greater_than_100/total_requests) as failure_ratio
    | where (failure_ratio > 0.05)


    The query above returns results when more than 5% of requests have a response_time over 100 ms, and returns nothing otherwise. But when it does return results, it gives me all requests regardless of their response time.

    In addition, I want to filter the result set further to only the requests with "response_time" > 100 ms.

    Do I need to write a subquery for this, or can it be done without one? I also want to send the results to a real-time alert. Is that possible?

  • Graham Watts

    Hey Ramakrishna,

    This is interesting. I think this subquery will work:

    • The key is to hack the subquery to only return some string if the 5% threshold is met
    • If it's met, I pass some text in every log from the subquery, then filter for the > 100ms lines in the parent query
    _sourcecategory = <my_source_name>
    [subquery:
    _sourcecategory = <my_source_name>
    | json field=_raw "response_time" as response_time
    | json field=_raw "request_id" as request_id
    | if (num(response_time) > 100, 1, 0) as higher
    | if (num(response_time) <= 100, 1, 0) as lower
    | count as total_requests, sum(higher) as response_time_greater_than_100, sum(lower) as response_time_less_than_100
    | (response_time_greater_than_100/total_requests) as failure_ratio
    | where (failure_ratio > 0.05)
    | "response_time" as some_string_in_all_logs
    | compose some_string_in_all_logs keywords
    ]
    | json field=_raw "response_time" as response_time
    | where (num(response_time) > 100)
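
    The two-pass logic of this subquery pattern can be sketched in plain Python (an illustrative sketch only, with made-up field names matching the thread; not Sumo Logic query language):

    ```python
    def alert_rows(requests, threshold_ms=100, ratio_limit=0.05):
        """First pass (the subquery): check whether the share of slow
        requests exceeds the limit. Second pass (the parent query): if it
        does, return only the slow requests themselves."""
        slow = [r for r in requests if r["response_time"] > threshold_ms]
        if not requests or len(slow) / len(requests) <= ratio_limit:
            return []   # threshold not met: no alert at all
        return slow     # threshold met: alert only the > 100 ms requests

    # Hypothetical batch of 10 requests, 3 of them slower than 100 ms
    reqs = [{"url": "/a", "request_id": i, "response_time": t}
            for i, t in enumerate([50, 60, 120, 130, 40, 70, 150, 30, 20, 10])]
    print(alert_rows(reqs))
    ```

    Here 3 of 10 requests (30%) exceed 100 ms, so all three slow rows are returned; if the slow share were 5% or less, the function would return nothing, matching requirement 2.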
