How do I parse Amazon S3 usage logs?

Comments

2 comments

  • Ben Newton

    I was just working with a customer on this last week. As a reference, here is a query that parses out every field:

    _sourceCategory=S3* | parse regex "^(?<bucket_owner>\S+)\s+(?<bucket>\S+)\s+(?:\S+\s+){2}(?<remoteIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(?<requester>\S+)\s+(?<requestID>\S+)\s+(?<operation>\S+)\s+(?<key>\S+)\s+(?<request>\S+\s+\S+\s+\S+)\s+(?<HTTP_Status>\d{3})\s+(?<ErrorCode>\S+)\s+(?<BytesSent>\S+)\s+(?<ObjectSize>\S+)\s+(?<totalTime>\S+)\s+(?<turnAroundTime>\S+)\s+\"(?<referrer>\S+)\"\s+\"(?<userAgent>\S+)\"\s+(?<version_id>\S+)$"
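
    If you want to sanity-check the regex outside Sumo Logic, here is a rough Python translation run against a made-up log line. It is a sketch, not the query above verbatim: Python writes named groups as `(?P<name>...)`, the IP dots are escaped, the bracketed timestamp is captured as an extra `time` field instead of being skipped, the quoted fields use `[^"]*` so user agents containing spaces still match, and there is no `$` anchor in case your logs carry more fields after `version_id`. `SAMPLE_LINE` and `parse_s3_line` are illustrative names.

```python
import re
from typing import Optional

# Rough Python translation of the forum regex (assumption: standard
# space-separated S3 server access log layout). Differences: (?P<name>...)
# group syntax, escaped dots in the IP, the [timestamp] captured as "time",
# quoted fields matched with [^"]* so they may contain spaces, and no "$"
# anchor so any trailing fields after version_id are ignored.
S3_LOG_RE = re.compile(
    r'^(?P<bucket_owner>\S+)\s+(?P<bucket>\S+)\s+'
    r'\[(?P<time>[^\]]+)\]\s+'
    r'(?P<remoteIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
    r'(?P<requester>\S+)\s+(?P<requestID>\S+)\s+(?P<operation>\S+)\s+(?P<key>\S+)\s+'
    r'"(?P<request>[^"]*)"\s+(?P<HTTP_Status>\d{3})\s+(?P<ErrorCode>\S+)\s+'
    r'(?P<BytesSent>\S+)\s+(?P<ObjectSize>\S+)\s+'
    r'(?P<totalTime>\S+)\s+(?P<turnAroundTime>\S+)\s+'
    r'"(?P<referrer>[^"]*)"\s+"(?P<userAgent>[^"]*)"\s+(?P<version_id>\S+)'
)

# Hypothetical log line for illustration only (documentation IP, fake IDs).
SAMPLE_LINE = (
    'ownerid123 examplebucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'requesterid456 3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg '
    '"GET /examplebucket/photos/cat.jpg HTTP/1.1" 200 - 2662992 3462992 '
    '70 10 "-" "aws-cli/1.16.0 Python/3.7" -'
)


def parse_s3_line(line: str) -> Optional[dict]:
    """Return the named fields as a dict, or None if the line does not match."""
    m = S3_LOG_RE.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    fields = parse_s3_line(SAMPLE_LINE)
    print(fields["remoteIP"], fields["HTTP_Status"], fields["userAgent"])
```
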

    I wouldn't suggest parsing every field in every query, though, so here are some other example queries to get you started:

    Geolocation on S3 Logs

    _sourceCategory=S3* | parse regex "(?<remoteIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+"
    | lookup latitude, longitude, country_code, country_name, region, city, postal_code, area_code, metro_code from geo://default on ip = remoteIP
    | count by latitude, longitude, country_code, country_name, region, city, postal_code, area_code, metro_code
    | sort _count
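
    The geolocation query is a short pipeline: pull the first IP-shaped token out of each message, enrich it from a geo database, count per location, and sort by that count. The Python sketch below walks the same steps to show the data flow. It cannot reproduce Sumo Logic's `geo://default` lookup, so `GEO_TABLE` is a hypothetical stand-in dictionary with documentation-range IPs, only two location fields are kept, and IPs with no entry are skipped, which is an assumption about how you would want misses handled. `count_by_location` is an illustrative name.

```python
import re
from collections import Counter

# Same idea as the Sumo regex: first dotted-quad in the line (dots escaped).
IP_RE = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+")

# Hypothetical stand-in for Sumo Logic's geo://default lookup.
GEO_TABLE = {
    "192.0.2.3": ("US", "Seattle"),
    "198.51.100.7": ("DE", "Berlin"),
}


def count_by_location(lines):
    """Return [((country_code, city), count), ...], highest count first."""
    counts = Counter()
    for line in lines:
        m = IP_RE.search(line)
        if not m:
            continue
        location = GEO_TABLE.get(m.group(1))
        if location is None:  # no geo match: skipped in this sketch
            continue
        counts[location] += 1
    return counts.most_common()
```
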

    S3 Requests over Time

    _sourceCategory=S3* and GET | parse regex "^(?<bucket_owner>\S+)\s+(?<bucket>\S+)\s+(?:\S+\s+){2}(?<remoteIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(?<requester>\S+)\s+(?<requestID>\S+)\s+(?<operation>\S+)\s+(?<path>[a-z.\/]+)\s+" | timeslice by 15m | count by _timeslice
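
    `timeslice by 15m | count by _timeslice` assigns each matching message to a 15-minute window and counts messages per window. Here is a minimal Python sketch of that bucketing, under two assumptions: it reads the time from the bracketed S3 timestamp (Sumo Logic slices on its own message timestamp, which may not be identical), and it approximates the `and GET` keyword term with a plain substring test. `requests_per_15m` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

TIME_RE = re.compile(r"\[([^\]]+)\]")  # e.g. [06/Feb/2019:00:00:38 +0000]


def requests_per_15m(lines):
    """Count GET-matching lines per 15-minute window, keyed by window start."""
    counts = Counter()
    for line in lines:
        if "GET" not in line:  # rough stand-in for Sumo's keyword search
            continue
        m = TIME_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        # Floor the timestamp to the start of its 15-minute slice.
        window = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
        counts[window] += 1
    return dict(sorted(counts.items()))
```
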

    S3 Response Times

    _sourceCategory=S3* | parse regex "\s+(?<totalTime>\S+)\s+(?<turnAroundTime>\S+)\s+\"(?<referrer>\S+)\"\s+\"(?<userAgent>\S+)\"\s+(?<version_id>\S+)$"
    | number(totalTime)
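
    The response-time query stops once `totalTime` is converted to a number; in practice you would usually follow it with an aggregate. As one hedged illustration, here is a Python sketch that applies the same end-of-line regex (with the quoted fields loosened to `[^"]*`) and averages `totalTime`, which S3 reports in milliseconds, skipping non-numeric values such as `-`. The averaging step and the `average_total_time` name are illustrative additions, not part of the original query.

```python
import re
from typing import Optional

# End-of-line pattern from the forum query, in Python syntax. Assumes the
# line ends at version_id, as the original "$" anchor does.
TAIL_RE = re.compile(
    r'\s(?P<totalTime>\S+)\s+(?P<turnAroundTime>\S+)\s+'
    r'"(?P<referrer>[^"]*)"\s+"(?P<userAgent>[^"]*)"\s+(?P<version_id>\S+)$'
)


def average_total_time(lines) -> Optional[float]:
    """Mean totalTime (ms) over lines that have a numeric value, else None."""
    times = []
    for line in lines:
        m = TAIL_RE.search(line)
        if m and m.group("totalTime").isdigit():  # skips "-" placeholders
            times.append(int(m.group("totalTime")))
    return sum(times) / len(times) if times else None
```
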

    Errors over time

    _sourceCategory=S3* | parse regex "^(?<bucket_owner>\S+)\s+(?<bucket>\S+)\s+(?:\S+\s+){2}(?<remoteIP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(?<requester>\S+)\s+(?<requestID>\S+)\s+(?<operation>\S+)\s+(?<key>\S+)\s+(?<request>\S+\s+\S+\s+\S+)\s+(?<HTTP_Status>\d{3})\s+(?<ErrorCode>\S+)\s+(?<BytesSent>\S+)\s+(?<ObjectSize>\S+)\s+(?<totalTime>\S+)\s+(?<turnAroundTime>\S+)\s+\"(?<referrer>\S+)\"\s+\"(?<userAgent>\S+)\"\s+(?<version_id>\S+)$" | where HTTP_Status <> "200" | timeslice by 15m | count by _timeslice, HTTP_Status | transpose row _timeslice column HTTP_Status

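    The errors query chains four steps: drop 200 responses, bucket into 15-minute slices, count per slice and status code, then `transpose` so each status code becomes its own column. A minimal Python sketch of the same reshaping follows. It assumes the standard field order (IP, requester, request ID, operation, key, then the quoted request and the status), reads time from the bracketed timestamp, and fills missing slice/status combinations with 0; that zero-fill is a choice made here and may not match how Sumo Logic renders empty cells. `errors_by_slice` is a hypothetical name.

```python
import re
from collections import Counter
from datetime import datetime

# [time], then five space-separated fields (IP, requester, request ID,
# operation, key), then the quoted request line and the 3-digit status.
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\](?:\s+\S+){5}\s+"[^"]*"\s+(?P<status>\d{3})\s'
)


def errors_by_slice(lines):
    """Return (columns, rows): rows map 15-minute slice start -> {status: count}."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("status") == "200":  # like: where HTTP_Status <> "200"
            continue
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        window = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
        counts[(window, m.group("status"))] += 1

    # "Transpose": one row per slice, one column per status code (0 if absent).
    columns = sorted({status for _, status in counts})
    rows = {}
    for window in sorted({w for w, _ in counts}):
        rows[window] = {status: counts.get((window, status), 0) for status in columns}
    return columns, rows
```
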
    I hope this helps!

  • Elliott Schroeder

    That's great. Thank you!

