How do I query for most recent log entry per host?

Comments

  • Ben Newton

    A few options I can think of:

    1) EASIEST

    If you have a fairly fixed number of hosts, you could save and schedule the search (https://service.sumologic.com/ui/help/Default.htm#Sched_a_search.htm):

    _sourceCategory=Grocery/Chef | count by _sourceHost

    If you have 20 chef clients, then you would alert if the number of results < 20. The downside is that you would have to update the search on a regular basis.
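
    If you'd rather alert on a single number than on the result count, a minimal sketch of the same idea using count_distinct (the threshold of 20 is assumed from above):

    _sourceCategory=Grocery/Chef
    | count_distinct(_sourceHost) as active_hosts
    // alert when active_hosts < 20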

    2) MORE COMPLICATED

    You could create a lookup table with the latest time each chef client sent data.

    Create a scheduled search that runs once an hour and saves the lookup table:

    _sourceCategory=Grocery/Chef
    | _sourceHost as sourceHost
    | timeslice by 1h
    | last(_timeslice) as time_block by sourceHost
    | save append /chef/timestamp
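
    Here _timeslice is the start of each one-hour bucket as an epoch-millisecond timestamp, so the table saved above ends up with one row per host holding the last hour that host reported. A quick sketch to read the saved table back and eyeball it (same /chef/timestamp path as above):

    _sourceCategory=Grocery/Chef
    | _sourceHost as sourceHost
    | lookup time_block as last_time from /chef/timestamp on sourceHost=sourceHost
    | fields sourceHost, last_time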

    Then create another scheduled job that checks against it:

    _sourceCategory=Grocery/Chef
    | _sourceHost as sourceHost
    | timeslice by 1h
    | last(_timeslice) as current_hour by sourceHost
    | (current_hour - 60*60*1000) as last_hour
    | lookup time_block as last_time from /chef/timestamp on sourceHost=sourceHost
    | "stuck" as state
    | if( last_time=current_hour , "good", state) as state
    | if( last_time=last_hour , "good", state) as state
    | fields sourceHost, state
    | where state = "stuck"

    On the second search, you alert if you get > 0 results.
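
    The epoch arithmetic is the only subtle part: since _timeslice is an epoch-millisecond timestamp, subtracting 60*60*1000 (3,600,000 ms) lands exactly one one-hour slice back. For example, if current_hour is 1420070400000 (2015-01-01 00:00 UTC), then last_hour is 1420066800000 (2014-12-31 23:00 UTC); a host whose saved time_block matches either value is marked "good", and anything older stays "stuck".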

  • Joe Zulli

    Awesome. Thanks!

  • David Marcoux

    Try this query over the past 30 days. It provides a lot of really helpful information about _sourcehosts that have stopped sending logs. Note that there are a few configurable parameters inside the query.

    _index=sumologic_volume | where _sourceCategory="sourcehost_volume"
    | timeslice 1h
    | join
    ( parse regex "(?<sourcehost>\"[^\"]+\"):{\"sizeInBytes\":(?<bytes>\d+),\"count\":(?<count>\d+)}" multi
    | parse regex field=sourcehost "\"(?<sourcehost>\S+)\""
    | first(_timeslice) as mostrecentlog by sourcehost ) as t1,
    ( parse regex "(?<sourcehost>\"[^\"]+\"):{\"sizeInBytes\":(?<bytes>\d+),\"count\":(?<count>\d+)}" multi
    | parse regex field=sourcehost "\"(?<sourcehost>\S+)\""
    | last(_timeslice) as leastrecentlog by sourcehost ) as t2,
    ( parse regex "(?<sourcehost>\"[^\"]+\"):{\"sizeInBytes\":(?<bytes>\d+),\"count\":(?<count>\d+)}" multi
    | parse regex field=sourcehost "\"(?<sourcehost>\S+)\""
    | count as buckets by sourcehost ) as t3
    on t1.sourcehost = t2.sourcehost and t1.sourcehost = t3.sourcehost
    | t3_buckets as frequency | t1_sourcehost as sourcehost
    | where t1_mostrecentlog - t2_leastrecentlog > (3600000 * 48) // 48 means 48 hours (configurable)
    | (t3_buckets * 100) / ((t1_mostrecentlog - t2_leastrecentlog) / (1000*60*5)) as regularity
    | if(regularity > 100, 100, regularity) as regularity | if(regularity < 1, 1, regularity) as regularity
    | round(regularity) as regularity
    | round((now() - t1_mostrecentlog)/1000/60/60) as hours_ago
    | round(hours_ago/24) as days_ago
    | .05 * regularity * hours_ago as certainty // .05 is a configurable parameter
    | if(certainty > 99, 99, certainty) as certainty | if(certainty < 1, 1, certainty) as certainty
    | round(certainty) as certainty | concat(certainty,"%") as confidence
    | where hours_ago > 3
    | sort by certainty desc
    | concat(regularity,"%") as regularity
    | fields sourcehost, hours_ago, days_ago, regularity, confidence
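
    To get a feel for the certainty score: with the default .05 multiplier, a host with regularity 80 that last reported 12 hours ago scores .05 * 80 * 12 = 48, i.e. 48% confidence, while the same host after 25 silent hours reaches 100 and is clamped to the 99 cap.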

