Setting up an outlier against a query looking at distinct file names downloaded (Google Drive).

Comments

7 comments

  • Avatar
    Kevin Keech

    Hi Will,

    Could you provide the query you are currently trying to use, the one that presents the error? It sounds like you have not referenced the timeslice in your aggregations or the timeslice gets overwritten with later aggregations. Knowing the query will help to identify where the issue may be occurring. 

    The basic format of an outlier query would be something like below. 

    | timeslice by 1h
    | count_distinct(fiile_names) as file_names by _timeslice, user
    | outlier file_names window=5,threshold=3,consecutive=1,direction=+-

    In the above example the "by _timeslice" portion of the count_distinct operation will create an _timeslice field, which is then used by the outlier operation. If this is missing you will potentially encounter the error you described. 

    0
    Comment actions Permalink
  • Avatar
    Will Bolt

    That makes sense. The following is my query and I release part of it may be a little redundant:

     

    _sourceCategory = company/gsuite/drive "\"applicationName\": \"drive\"" "\"type\": \"access\"" "\"name\": \"download\""
    | json "id", "ipAddress", "events", "actor", "4.name" nodrop
    | json field=actor "email"
    | json field=id "applicationName"
    | json "events[0].type" as eventType nodrop | json "events[0].name" as eventName nodrop | json "events[0].parameters[4].value" as fileName nodrop
    | where applicationName="drive" and eventType="access" and eventName="download"
    | count_distinct(fileName) as distinct_fileName by email,fileName
    | count(distinct_fileName) by email,distinct_fileName

     

    What I was looking to accomplish with the above was to count all the files being downloaded per user but wrote it so that it would also ignore instances of a user downloading the same file name multiple times.

    0
    Comment actions Permalink
  • Avatar
    Kevin Keech

    Might try something like the following. 

    _sourceCategory = company/gsuite/drive "\"applicationName\": \"drive\"" "\"type\": \"access\"" "\"name\": \"download\""
    | json "id", "ipAddress", "events", "actor", "4.name" nodrop
    | json field=actor "email"
    | json field=id "applicationName"
    | json "events[0].type" as eventType nodrop | json "events[0].name" as eventName nodrop | json "events[0].parameters[4].value" as fileName nodrop
    | where applicationName="drive" and eventType="access" and eventName="download"
    | timeslice by 15m
    | count as by email, fileName, _timeslice
    | count as count by email, _timeslice
    | outlier count ........

    The first count operation should get you a count of times a specific file is opened by a user within the timeslice bucket. The second count then counts the number of different file names there were per user and time bucket. You can then outlier based on that last count field. 

    1
    Comment actions Permalink
  • Avatar
    Will Bolt

    I gave that a try and those counts pretty much don't bring me back any results, even if I up the timeslice to a day. I feel like I'm missing something about how the timeslice interacts with the data.

    0
    Comment actions Permalink
  • Avatar
    Will Bolt

    I managed to get the timeslice to take but it's giving me insanely high numbers. Not entirely sure how it's coming up with these and they hardly seem usable or relevant for what I was aiming for. Thanks Kevin Keech - thanks for the counts though. Here's a screen shot of what I mean:

     

    0
    Comment actions Permalink
  • Avatar
    Kevin Keech

    Will,

    From your screenshot it looks like your performing the outlier operation on the _timeslice field instead of the count field. Those really high numbers are actually epoch times, which is what the _timeslice contains. 

    Can you try the below query over the 24 hour period? This should show you upper/lower values based on the counts. 

    _sourceCategory=company/gsuite/drive "\"applicationName\": \"drive\"" "\"type\": \"access\"" "\"name\": \"download\""
    | json "id", "ipAddress", "events", "actor", "4.name" nodrop
    | json field=actor "email"
    | json field=id "applicationName"
    | json "events[0].type" as eventType nodrop
    | json "events[0].name" as eventName nodrop
    | json "events[0].parameters[4].value" as fileName nodrop
    | where applicationName="drive" and eventType="access" and eventName="download"
    | timeslice by 15m
    | count by email, fileName, _timeslice
    | count as count by email, _timeslice
    | outlier count by email direction=+

    You may need to play with the outlier operation a bit more to get the specific results you need. You can then also add the following to the end of the query to see where there is a possible violation, which is what you may ultimately want to alert on. 

     

    | where count_violation > 0
    0
    Comment actions Permalink
  • Avatar
    Will Bolt

    I tried what you had provided above and it does generate numbers that seem more reasonable. However, I'm losing the distinct's from before so I'm getting multiple lines of the same email rather than a single users multiple instances getting combined into one line. 

    Sorry if the screenshot is squished too small.

    0
    Comment actions Permalink

Please sign in to leave a comment.