Histogram chart

Comments

7 comments

  • Official comment
    Avatar
    David Woller

    Hi Gordon,

    I believe the timeslice operator is what you're looking for. It allows you to see a histogram of count over time "slices" that you can specify. For example, "timeslice 15m" will divide the search time range into 15-minute intervals and place the events in their given interval. Here's a link to the timeslice operator help page:

    https://help.sumologic.com/Search/Search-Query-Language/Search-Operators/timeslice

    Please let me know if you need any additional help with the operator or if this is not what you're looking for, thank you!

    Comment actions Permalink
  • Avatar
    Gordon Zheng

    Hey David,

    Thanks for the quick reply.  Since I don't care about how these counts changes over time, the timeslice operator isn't what I'm looking for.  I want to see the frequency of the counts for the total time range of the search.

    0
    Comment actions Permalink
  • Avatar
    Matt Sullivan

    If I'm interpreting correctly, the chart is fine, just that floor will give you zero for much of your data and that bucket seems to be by far the largest. Since histogram buckets don't have to have specific size ranges, you might make something more readable using the if operator as well as sum .  Here's a snippet I just tested out that creates 4 buckets based on the count.  I hope it helps 

    // flag each count total as being in a particular bucket
    | if(_count > 0 and _count < 1000, 1, 0) as to_1k
    | if(_count >= 1000 and _count < 5000, 1, 0) as to_5k
    | if(_count >= 5000 and _count < 10000, 1, 0) as to_10k
    | if(_count >= 10000, 1, 0) as over10k
     
    // sum up the #hits per bucket
    | sum(to_1k) as sum_to1k, sum(to_5k) as sum_to5k, sum(to_10k) as sum_to10k, sum(over10k) as sum_over10k
     
    edited to remove unnecessary code and fix a typo
    0
    Comment actions Permalink
  • Avatar
    Gordon Zheng

    Thanks for the thoughtful suggestion, Matt!  Larger bins of unequal size might provide a more helpful view of this data.

    At the same time, I'd still like to have bins of equal size. It seems that Sumo Logic isn't the best tool to produce this visualization. Instead, it's probably best to export the data to another tool like a Python notebook.

    It'd be nice if Sumo Logic supported histograms with equally sized bins.

    0
    Comment actions Permalink
  • Avatar
    Christian Beedgen

    hey Gordon - have a look at the fillmissing operator:

    https://help.sumologic.com/Search/Search-Query-Language/aaGroup/fillmissing

    I think it might do the trick. let us know - otherwise, would love to see a screenshot from your notebook so we have a reference point to think about a solution!

     

    chr.

    0
    Comment actions Permalink
  • Avatar
    Gordon Zheng

    Hey Christian,

    Thanks for the suggestion! It looks like the fillmissing operator would be helpful here.  However, I'm running into a problem when I'm sorting by the bucket column after the missing rows are inserted.  The search hangs and never completes.  Here's my query:

    parse "[*]" as blah
    | count by blah
    | floor(_count / 1000) as bucket
    | count by bucket
    | fillmissing values("0","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100") in bucket with 0 for _count
    | sort by bucket asc

    I suspect the issue has something to do with the fact that the bucket column having both integers and strings.  When I try casting the bucket column with num(), I get an error:

    parse "[*]" as blah
    | count by blah
    | floor(_count / 1000) as bucket 
    | count by bucket 
    | fillmissing values("0","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100") in bucket with 0 for _count
    | num(bucket)
    The following errors were reported when executing your search:<br>Error while processing data element (400)

    Any idea what's going wrong here?  It'd also be nice if there were a fillmissing generator for a range of integer values, so that you wouldn't have to manually enumerate a range of numbers as a list of string literals.  Then you could do something like:

    fillmissing numbers(min(bucket), max(bucket))

     

    0
    Comment actions Permalink
  • Avatar
    Piotr Woch

    Hi Gordon,

    I would like to propose an approach that hopefully will change your mind in terms of applicability of Sumo to your specific use case.

    The key to the solution I am proposing is to break the search down into two steps:

    1) Generate buckets data and save the results

    2) Generate the histogram's X-axis data and join the buckets data 

    Below you will find a working example I tested in our internal instance. This example shows how to produce a histogram of lengths of our customer names.

    Step 1: Buckets data generation

    _view=<here goes the view name>
    | length(org_name) as len
    | fields len
    | count by len
    | sort by len asc
    | concat(toString(len),".0") as len   //this line is just to ensure the data matches in the join operation in Step 2
    | save myFolder/Peter/Histogram

     

    Step 2: X-data generation and merging with the buckets data

    _view=<here goes the view name>
    | length(org_name) as len
    | fields len
    | count by len
    | "||||" as dup_rows //the number of pipe characters determines how many missing data points this algorithm can handle calculated as follows: 100% - (100%/# of pipes)
    | parse regex field=dup_rows "(?<duped>:|)" multi
    | 1 as row_count
    | accum row_count as row_num //generate the labels of the X-axis
    | where row_num <= 56 // generate only as many X labels as is needed based on the results of the Step 1
    | toString(row_num) as row_num
    | lookup len, _count from myFolder/Peter/Histogram on len = row_num   // combine the X-axis data with histogram buckets
    | fields row_num, _count

    The end-result:

     

    Hopefully you will find this solution more acceptable.

    Please do let us know if you were able to generate the histogram you needed or if you have any other question or concern.

    Best regards,

    Peter Woch

    0
    Comment actions Permalink

Please sign in to leave a comment.