Parsing URL correctly

Comments


  • Official comment
    Jason

    Hi Mark,

    Great question! You can do this in Sumo Logic with the parse regex operator, like so:

    | parse regex "(?<url_base>.*\.[a-zA-Z0-9]*\/)"

    This will create a new column on the fly called "url_base", which will contain everything up to and including the domain's ".com/", ".io/", ".org/", etc. You get the picture :)

    If you want to get rid of the "/" at the end after applying parse regex, you can add another line to your search statement so it looks like this:

    | parse regex "(?<url_base>.*\.[a-zA-Z0-9]*\/)"
    | parse field=url_base "*/" as url

    The second line extracts the domain from "url_base" without the trailing slash and saves it as another column called "url". This is a good example of using regular expressions to match difficult patterns in your logs.

    If you don't feel like seeing the "url_base" column after getting "url" in your results, you can use:

    | fields - url_base

    to drop that column from your results once you have used it to get the "url" column.
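
    Put together, the steps could look like the sketch below. Several things in it are assumptions rather than part of the lines above: the scope line is a placeholder, the first parse assumes each message contains "URL: <value>," and stores it in a hypothetical field "full_url", and the URLs are assumed to have no "http://" prefix. The pattern is also tightened to [^/]* and limited to that field, because the greedy .* would otherwise run on to the last ".xyz/" it can find. For example, on a value like "example.com/files/v1.2/setup.exe" it would capture "example.com/files/v1.2/".

    ```
    _sourceCategory=your/source/category
    // Assumption: the message contains "URL: <value>," (full_url is a hypothetical field name)
    | parse "URL: *," as full_url
    // Match only within full_url and stop at the first slash
    | parse regex field=full_url "^(?<url_base>[^/]*\.[a-zA-Z0-9]+\/)"
    // Drop the trailing slash
    | parse field=url_base "*/" as url
    // Remove the helper columns from the results
    | fields - url_base, full_url
    ```

    If your URLs do include a scheme, the pattern would need to allow for the "//" after it.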

    Please let us know if this works for you!

    Cheers,

    Jason

  • Mark Babatunde

    Hi Jason,

    Thanks for your response! If I'm parsing the URL without regex, is there a way to extract the domain with parse field? Currently, my query starts as follows:
    _sourceCategory=Threat/Malc0de
    | parse "URL: *, IP Address: *, Country: *, ASN: *, MD5: *</description>" as url,ip,country,asn,md5
    | parse field=url "*/" as url_base
    This is not parsing url correctly.

    EDIT: Jason, this works. I just hadn't updated the lines that follow to use url_base. Thanks for your help!
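
    For anyone following along, the working version might look like the sketch below, with the later lines referring to url_base instead of url. The final count line is a hypothetical follow-on step added for illustration, not part of the original query.

    ```
    _sourceCategory=Threat/Malc0de
    | parse "URL: *, IP Address: *, Country: *, ASN: *, MD5: *</description>" as url,ip,country,asn,md5
    | parse field=url "*/" as url_base
    // Hypothetical downstream step: it must reference url_base, not url
    | count by url_base
    ```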

    Best,

    Mark

  • Jason

    Hi Mark,

    No problem, glad to be of service!

    Cheers,

    Jason
