Setting up an outlier against a query looking at distinct file names downloaded (Google Drive).
Hello,
I've written a log search that essentially let's me see how many distinct files have been downloaded from Google Drive and gets counted by user. I'm wanting to create an outlier now so I can have it alert when the number of downloads spikes out of the ordinary (potentially implying that one of our users is dumping a bunch of files to take outside of the organization). However, I'm having a hard time getting this component in.
In order to create and use an outlier, there is a requirement that timeslice is needed. When I try to incorporate timeslice into my log search and run it, it simply throws up the error that timeslice cannot be found even though SumoLogic looks to recognize it. Anyone else run into something similar or have already gone through setting this up for Google Drive?
Thanks,
-
Hi Will,
Could you provide the query you are currently trying to use, the one that presents the error? It sounds like you have not referenced the timeslice in your aggregations or the timeslice gets overwritten with later aggregations. Knowing the query will help to identify where the issue may be occurring.
The basic format of an outlier query would be something like below.
| timeslice by 1h
| count_distinct(fiile_names) as file_names by _timeslice, user
| outlier file_names window=5,threshold=3,consecutive=1,direction=+-
In the above example the "by _timeslice" portion of the count_distinct operation will create an _timeslice field, which is then used by the outlier operation. If this is missing you will potentially encounter the error you described. -
That makes sense. The following is my query and I release part of it may be a little redundant:
_sourceCategory = company/gsuite/drive "\"applicationName\": \"drive\"" "\"type\": \"access\"" "\"name\": \"download\""
| json "id", "ipAddress", "events", "actor", "4.name" nodrop
| json field=actor "email"
| json field=id "applicationName"
| json "events[0].type" as eventType nodrop | json "events[0].name" as eventName nodrop | json "events[0].parameters[4].value" as fileName nodrop
| where applicationName="drive" and eventType="access" and eventName="download"
| count_distinct(fileName) as distinct_fileName by email,fileName
| count(distinct_fileName) by email,distinct_fileNameWhat I was looking to accomplish with the above was to count all the files being downloaded per user but wrote it so that it would also ignore instances of a user downloading the same file name multiple times.
-
Might try something like the following.
_sourceCategory = company/gsuite/drive "\"applicationName\": \"drive\"" "\"type\": \"access\"" "\"name\": \"download\""
| json "id", "ipAddress", "events", "actor", "4.name" nodrop
| json field=actor "email"
| json field=id "applicationName"
| json "events[0].type" as eventType nodrop | json "events[0].name" as eventName nodrop | json "events[0].parameters[4].value" as fileName nodrop
| where applicationName="drive" and eventType="access" and eventName="download"
| timeslice by 15m
| count as by email, fileName, _timeslice
| count as count by email, _timeslice
| outlier count ........
The first count operation should get you a count of times a specific file is opened by a user within the timeslice bucket. The second count then counts the number of different file names there were per user and time bucket. You can then outlier based on that last count field. -
I managed to get the timeslice to take but it's giving me insanely high numbers. Not entirely sure how it's coming up with these and they hardly seem usable or relevant for what I was aiming for. Thanks Kevin Keech - thanks for the counts though. Here's a screen shot of what I mean:
-
Will,
From your screenshot it looks like your performing the outlier operation on the _timeslice field instead of the count field. Those really high numbers are actually epoch times, which is what the _timeslice contains.
Can you try the below query over the 24 hour period? This should show you upper/lower values based on the counts.
_sourceCategory=company/gsuite/drive "\"applicationName\": \"drive\"" "\"type\": \"access\"" "\"name\": \"download\""
| json "id", "ipAddress", "events", "actor", "4.name" nodrop
| json field=actor "email"
| json field=id "applicationName"
| json "events[0].type" as eventType nodrop
| json "events[0].name" as eventName nodrop
| json "events[0].parameters[4].value" as fileName nodrop
| where applicationName="drive" and eventType="access" and eventName="download"
| timeslice by 15m
| count by email, fileName, _timeslice
| count as count by email, _timeslice
| outlier count by email direction=+You may need to play with the outlier operation a bit more to get the specific results you need. You can then also add the following to the end of the query to see where there is a possible violation, which is what you may ultimately want to alert on.
| where count_violation > 0
-
I tried what you had provided above and it does generate numbers that seem more reasonable. However, I'm losing the distinct's from before so I'm getting multiple lines of the same email rather than a single users multiple instances getting combined into one line.
Sorry if the screenshot is squished too small.
Please sign in to leave a comment.
Comments
7 comments