Histogram chart
Hi there,
Is there a way to visualize the frequency of values over some range? I'd like to see a histogram of counts.
In an attempt to generate a histogram, I wrote a quick query which buckets the values into intervals of 1000 (these counts range from 0 - 250k).
parse "[*]" as blah
| count by blah
| floor(_count / 1000) as bucket
| count by bucket
| sort by bucket asc
I'm able to produce something that looks like a histogram, but with missing sections of the x axis, since bins with 0 frequency aren't created.
Is there a better way to generate a histogram in Sumo Logic? I scoured the documentation but couldn't find anything relevant.
-
Official comment
Hi Gordon,
I believe the timeslice operator is what you're looking for. It allows you to see a histogram of count over time "slices" that you can specify. For example, "timeslice 15m" will divide the search time range into 15-minute intervals and place the events in their given interval. Here's a link to the timeslice operator help page:
https://help.sumologic.com/Search/Search-Query-Language/Search-Operators/timeslice
Please let me know if you need any additional help with the operator or if this is not what you're looking for, thank you!
Comment actions -
If I'm interpreting correctly, the chart is fine, just that floor will give you zero for much of your data and that bucket seems to be by far the largest. Since histogram buckets don't have to have specific size ranges, you might make something more readable using the if operator as well as sum . Here's a snippet I just tested out that creates 4 buckets based on the count. I hope it helps
// flag each count total as being in a particular bucket| if(_count > 0 and _count < 1000, 1, 0) as to_1k| if(_count >= 1000 and _count < 5000, 1, 0) as to_5k| if(_count >= 5000 and _count < 10000, 1, 0) as to_10k| if(_count >= 10000, 1, 0) as over10k// sum up the #hits per bucket| sum(to_1k) as sum_to1k, sum(to_5k) as sum_to5k, sum(to_10k) as sum_to10k, sum(over10k) as sum_over10kedited to remove unnecessary code and fix a typo -
Thanks for the thoughtful suggestion, Matt! Larger bins of unequal size might provide a more helpful view of this data.
At the same time, I'd still like to have bins of equal size. It seems that Sumo Logic isn't the best tool to produce this visualization. Instead, it's probably best to export the data to another tool like a Python notebook.
It'd be nice if Sumo Logic supported histograms with equally sized bins.
-
hey Gordon - have a look at the fillmissing operator:
https://help.sumologic.com/Search/Search-Query-Language/aaGroup/fillmissing
I think it might do the trick. let us know - otherwise, would love to see a screenshot from your notebook so we have a reference point to think about a solution!
chr.
-
Hey Christian,
Thanks for the suggestion! It looks like the fillmissing operator would be helpful here. However, I'm running into a problem when I'm sorting by the bucket column after the missing rows are inserted. The search hangs and never completes. Here's my query:
parse "[*]" as blah
| count by blah
| floor(_count / 1000) as bucket
| count by bucket
| fillmissing values("0","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100") in bucket with 0 for _count
| sort by bucket ascI suspect the issue has something to do with the fact that the bucket column having both integers and strings. When I try casting the bucket column with num(), I get an error:
parse "[*]" as blah
| count by blah
| floor(_count / 1000) as bucket
| count by bucket
| fillmissing values("0","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100") in bucket with 0 for _count
| num(bucket)The following errors were reported when executing your search:<br>Error while processing data element (400)
Any idea what's going wrong here? It'd also be nice if there were a fillmissing generator for a range of integer values, so that you wouldn't have to manually enumerate a range of numbers as a list of string literals. Then you could do something like:
fillmissing numbers(min(bucket), max(bucket))
-
Hi Gordon,
I would like to propose an approach that hopefully will change your mind in terms of applicability of Sumo to your specific use case.
The key to the solution I am proposing is to break the search down into two steps:
1) Generate buckets data and save the results
2) Generate the histogram's X-axis data and join the buckets data
Below you will find a working example I tested in our internal instance. This example shows how to produce a histogram of lengths of our customer names.
Step 1: Buckets data generation
_view=<here goes the view name>
| length(org_name) as len
| fields len
| count by len
| sort by len asc
| concat(toString(len),".0") as len //this line is just to ensure the data matches in the join operation in Step 2
| save myFolder/Peter/HistogramStep 2: X-data generation and merging with the buckets data
_view=<here goes the view name>
| length(org_name) as len
| fields len
| count by len
| "||||" as dup_rows //the number of pipe characters determines how many missing data points this algorithm can handle calculated as follows: 100% - (100%/# of pipes)
| parse regex field=dup_rows "(?<duped>:|)" multi
| 1 as row_count
| accum row_count as row_num //generate the labels of the X-axis
| where row_num <= 56 // generate only as many X labels as is needed based on the results of the Step 1
| toString(row_num) as row_num
| lookup len, _count from myFolder/Peter/Histogram on len = row_num // combine the X-axis data with histogram buckets
| fields row_num, _countThe end-result:
Hopefully you will find this solution more acceptable.
Please do let us know if you were able to generate the histogram you needed or if you have any other question or concern.
Best regards,
Peter Woch
Please sign in to leave a comment.
Comments
7 comments