How to aggregate and show a list of values
Hi,
I would like to aggregate results but instead of using the typical "stats" functions (avg, first, last, ...) I would like to obtain the list of values of that field. Is it possible?
To illustrate what I want, an example will probably help.
Suppose I have the following search results:
=====================
Event         User   IP
Failed login  Test   1.1.1.1
Failed login  Test   1.1.1.2
Failed login  Test   1.1.1.3
Failed login  Test   1.1.1.4
=====================
I would want to aggregate them to obtain something like this:
======================
Event         User   ListOfIP
Failed login  Test   1.1.1.1, 1.1.1.2, 1.1.1.3, 1.1.1.4
======================
Is it possible?
Thanks! Regards,
Marc
-
Official comment
Sounds like you want a version of concat (https://help.sumologic.com/Search/Search-Query-Language/Search-Operators/concat) that works across messages (i.e. as an aggregator) rather than one that produces a new user-defined field within a single message. I cannot think of a clean way to do this. It looks like this has been asked before, so I would recommend upvoting this idea: https://sumologic.aha.io/ideas/ideas/SL-I-1525. It could also be that someone has a way now that I'm just not aware of.
-
You should be able to get this type of aggregation to work by using a combination of the 'transactionize' and 'merge' operators:
https://help.sumologic.com/Search/Search-Query-Language/Transaction-Analytics/Merge-Operator
_sourceCategory=<replace_this>
// the parse statement below is not needed if you've already extracted the field via a Field Extraction Rule
| parse regex "(?<ip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)"
| transactionize event, user (merge event takeFirst, user takeFirst, ip join with ",")
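// with the four sample events above, this should collapse to a single row per event/user with ip = "1.1.1.1,1.1.1.2,1.1.1.3,1.1.1.4"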
The only thing I will caution is that this is very compute-intensive, so search performance could be an issue for a large time frame and/or data set.
-
In case performance does become an issue with Latimer's elegant solution, a more convoluted but in some circumstances more performant approach would be to export the Sumo query results using the API:
e.g. https://github.com/SumoLogic/sumo-java-client
or a command shell:
https://github.com/SumoLogic/sumoshell
then process the data using a DB engine's native string aggregation method, e.g.:
SQL Server:
https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql
Oracle:
https://oracle-base.com/articles/misc/string-aggregation-techniques
and then feed the resulting file back to Sumo using a dedicated Sumo collector.
Each of the above steps should be executed serially.
This approach introduces additional complexity by injecting an external dependency into the process, so it probably only makes sense to consider it when the volume of transactions causes Sumo's merge operator to take more than a couple of minutes to execute.
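For what it's worth, the middle "process the data" step does not have to be a DB engine; any script that groups the rows and joins the values will do. Below is a minimal sketch in Python that performs the string aggregation on a CSV export of the results; the file names and column names (results.csv, aggregated.csv, event/user/ip) are assumptions and would need to match your actual export and collector configuration.

#!/usr/bin/env python3
# Minimal sketch of the string-aggregation step, run locally on an exported CSV.
# Assumptions: the search results were exported to results.csv with columns
# "event", "user" and "ip", and an installed collector picks up aggregated.csv.
import csv

INPUT_FILE = "results.csv"      # exported Sumo query results (assumed name)
OUTPUT_FILE = "aggregated.csv"  # file to be ingested by a Sumo collector (assumed name)

def aggregate(input_path, output_path):
    # Collect every IP seen for each (event, user) pair, preserving order.
    groups = {}
    with open(input_path, newline="") as f:
        for row in csv.DictReader(f):
            groups.setdefault((row["event"], row["user"]), []).append(row["ip"])

    # Write one row per (event, user) with the comma-joined list of IPs,
    # mirroring the ListOfIP column from the example in the question.
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["event", "user", "list_of_ip"])
        for (event, user), ips in groups.items():
            writer.writerow([event, user, ", ".join(ips)])

if __name__ == "__main__":
    aggregate(INPUT_FILE, OUTPUT_FILE)

With the four sample events from the question this would produce a single aggregated row: Failed login, Test, "1.1.1.1, 1.1.1.2, 1.1.1.3, 1.1.1.4".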
-
Thank you Matt, Latimer and Piotr!
I have upvoted the SL-I-1525 idea.
For now I'll use Latimer's approach; although it can only be used when there is a single type of "event", it will serve some of the ideas I had in mind. When there are two fields to "transactionize", the transitive nature of the function means it does not behave as expected.
I had also thought about using the API to process the results with a script, but I would prefer to keep all alerts inside the Sumo Logic portal.