Lossless grouping
When working with Pig and Hadoop, I could aggregate data as shown below, but I have not been able to reproduce this in Sumo Logic. Is it possible, and if so, how?
I have keys and values, simple enough:
Keys; Values
--------
K1; V1a
K1; V1b
K1; V1c
K2; V2a
K3; V3a
K3; V3b
K3; V3c
K3; V3d
As an example, using "|count by Keys" would give me
Keys; _count
-------------
K1; 3
K2; 1
K3; 4
I want to get
Keys;group_of_Values
-------------------
K1; {V1a; V1b; V1c}
K2; {V2a}
K3; {V3a; V3b; V3c; V3d}
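To make the target concrete, here is a minimal sketch in plain Python (not Sumo Logic syntax) of the grouping I am after. The function name group_values is just something I made up for illustration; the rows are the sample data above.

```python
from collections import defaultdict

# Sample (key, value) rows from the table above.
rows = [
    ("K1", "V1a"), ("K1", "V1b"), ("K1", "V1c"),
    ("K2", "V2a"),
    ("K3", "V3a"), ("K3", "V3b"), ("K3", "V3c"), ("K3", "V3d"),
]

def group_values(pairs):
    """Collect every value under its key, preserving input order (nothing is dropped)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

grouped = group_values(rows)
for key, values in grouped.items():
    joined = "; ".join(values)
    print(key + "; {" + joined + "}")
```

The point is that every value survives the aggregation, unlike count, which only keeps how many there were.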
Any way to manage this?
-
Yes, there is definitely a way to achieve this type of grouping in Sumo, by combining two operators: transactionize and merge.
Please take a look at the transactionize operator here: https://help.sumologic.com/05Search/Search-Query-Language/Transaction-Analytics/Transactionize-operator
and some Merge examples here: https://help.sumologic.com/05Search/Search-Query-Language/Transaction-Analytics/Merge-Operator
Hope this helps!
-
You could also use the transpose operator to accomplish this, depending on how exactly you wanted it formatted. Something like this:
| count by keys, values
| transpose row keys column values
Thanks,
Nick
Sumo Logic - Customer Success
-
Hi, and thanks to both of you. The transactionize/merge method looks like it should be able to do what I need, and I *think* I've got it (though I want to state that I am disappointed by the lack of formal documentation; maybe I just haven't found it...).
I start with
| count by k, v, team, service, environment
| fields -_count // just to make explicit that I don't need it
| toLong(0) as _messagetime // either transactionize or merge absolutely wants this hidden field
If I do
| transactionize k (merge k, v join with ", ")
then it looks good except that all the team/service/environment values are ignored and lost.
If I try to correct by using
| transactionize team, service, environment, k (merge k, v join with ", ")
then team/service/environment are still lost and I get {group of k}, {group of v}, which is useless.
Doing
| transactionize k (merge k takeFirst, v join with ", ", team takeFirst, service takeFirst, environment takeFirst)
looks like it does what I want! I still have to test with overlapping keys in different sets (for the moment I only have my test team / test service / test environment). I'm wary of the takeFirst eating values that are different.
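Here is a rough Python model of that worry, not Sumo Logic code. It assumes takeFirst simply keeps the first value seen in each group, which is my assumption rather than documented behavior. The team names "alpha" and "beta" and the helper merge_by_key are made up for illustration.

```python
# Hypothetical rows: the same key K1 shows up under two different teams.
rows = [
    {"k": "K1", "v": "V1a", "team": "alpha"},
    {"k": "K1", "v": "V1b", "team": "alpha"},
    {"k": "K1", "v": "V1c", "team": "beta"},
]

def merge_by_key(records, key_fields):
    """Group records by key_fields; join 'v' values, keep the first value of every other field."""
    merged = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in merged:
            # First record of the group: its field values are the ones that are kept.
            first = {f: val for f, val in rec.items() if f != "v"}
            first["v"] = [rec["v"]]
            merged[key] = first
        else:
            # Later records only contribute their 'v'; differing other fields are silently dropped.
            merged[key]["v"].append(rec["v"])
    return merged

by_k = merge_by_key(rows, ["k"])
by_k_and_team = merge_by_key(rows, ["k", "team"])

print(by_k)           # one group; team "beta" is gone
print(by_k_and_team)  # two groups; both teams survive
```

In this model, grouping on k alone loses the second team, while grouping on the composite key (k, team) keeps the sets apart. Whether the real operators behave the same way is exactly what I still need to test.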
Again, thanks for your help.