Anchor parse vs. regex performance: is it faster to anchor parse and then drop the parsed fields, or to go pure regex?

Comments

  • Cory Singleton

    Hi Jonathan, 

    You are correct that parse anchor gets trickier when multiple fields share the same anchor, in this case pipes. In these cases I generally suggest parsing the entire message with parse anchor, like this:

    | parse anchor "|*|*|*|*" as level,identity,stuff,message
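    To make the wildcard pattern above concrete, here is a minimal Python sketch that approximates it with a regex. The helper name parse_anchor and the sample log line are assumptions made for illustration only; this is not how Sumo implements the operator.

```python
import re

def parse_anchor(pattern, text):
    """Hypothetical helper: emulate a '*' wildcard anchor pattern with a regex."""
    # Escape the literal anchors and join them with lazy capture groups.
    regex = "(.*?)".join(re.escape(part) for part in pattern.split("*"))
    # If the pattern ends with a wildcard, let the last group run to the end.
    if pattern.endswith("*"):
        regex += "$"
    match = re.search(regex, text, flags=re.DOTALL)
    return match.groups() if match else None

# Assumed sample line: the last field may itself contain pipes.
sample = "|INFO|user42|ctx|disk full | retrying"
level, identity, stuff, message = parse_anchor("|*|*|*|*", sample)
```

    In this sketch the final wildcard absorbs everything after the fourth pipe, so a free-form message stays in one field.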

    If that won't work in your case, parse regex is another option, as you mentioned, though it might be more complex than necessary. Sumo does have an operator that helps with what you are trying to do here: split. The split operator lets you target specific fields in your parse while ignoring the others. So, using your example:

    | split _raw delim='|' extract 4 as identity

    As you can see, split lets us extract a field by its index, that is, the position of the pipe-delimited field we are interested in. It also works for CSV and similar delimited formats. Hope this helps.
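    The idea of extracting by position can be sketched outside Sumo as well. The Python below is only an illustration: the helper names, the sample log line, and the assumption that positions are counted from 1 are all mine, and it says nothing about how either approach performs inside Sumo.

```python
import re

def extract_field(raw, position, delim="|"):
    """Hypothetical helper: return the field at a 1-based position, or None."""
    fields = raw.split(delim)
    return fields[position - 1] if 1 <= position <= len(fields) else None

# Regex alternative: skip three pipe-terminated fields, capture the fourth.
FOURTH_FIELD = re.compile(r"^(?:[^|]*\|){3}([^|]*)")

def extract_fourth_regex(raw):
    """Hypothetical helper: same result as extract_field(raw, 4) via a regex."""
    match = FOURTH_FIELD.match(raw)
    return match.group(1) if match else None

# Assumed sample line with the identity in the fourth position.
sample = "2024-01-01 12:00:00|INFO|api|user42|request done"
by_split = extract_field(sample, 4)
by_regex = extract_fourth_regex(sample)
```

    Both helpers return the same field here; the positional version simply states the intent (field number 4) without a pattern to maintain.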

    Cory

  • Jonathan Korol

    Cory,

    Regarding your suggestion of parsing the entire message, do you mean I should do this at data ingestion, so that "message" becomes one of my fields? This final "message" item could be an enormous amount of text in an unknown format. On the other hand, if you mean doing this parse in a log search, am I not doubling up on fields? What happens when I duplicate the parsing of those first couple of fields (in a field extraction rule I create "level", and then later in a log search I parse "level" again)?

    Thanks,

    Jonathan

  • Cory Singleton

    Yes, I was referring to parsing the whole log message in a field extraction rule. If the message portion of the log is the last field and it is not needed for searching, you can simply leave it out. I will reach out to you separately via email and we can chat a bit more.

  • Jonathan Korol

    Cory,

    The "split _raw" solution you proposed works well and has simplified my search. Thank you.

    Jonathan
