Understanding Parse Regex
Hello,
I am new to Sumo Logic and am trying to understand how to use the parse regex operator. I would like to parse variable information, but I'm having trouble understanding which expressions to use and what each symbol means. I'm also fairly new to coding, so I'm not sure whether Sumo Logic's syntax is based on another language that I could look up and study. I'd appreciate it if someone could point me to a breakdown of these expressions. I think a good grasp of regex will be very useful for future queries, so any help is appreciated! For example, I would like to extract the account name of the user that failed to log in from this event:
Computer = "AWSSouth-ADFS02.fakedomain.local";
EventCode = 4625;
EventIdentifier = 4625;
Logfile = "Security";
RecordNumber = 3069646;
SourceName = "Microsoft-Windows-Security-Auditing";
TimeGenerated = "20170930124836.000000-000";
TimeWritten = "20170930124836.000000-000";
Type = "Audit Failure";
EventType = 5;
Category = 12544;
CategoryString = "Logon";
Message = "An account failed to log on.
Subject:
Security ID: S-1-5-21-1074439237-558803454-2912513782-17121
Account Name: svc-adfs
Account Domain: FDOMAIN
Logon ID: 0x15BC79
Logon Type: 3
Account For Which Logon Failed:
Security ID: S-1-0-0
Account Name: tuser@fakedomain.com
Account Domain:
-
Hi Blake,
Our regex engine uses the same syntax as most others, so there are many resources out there. I personally like these two:
https://www.regular-expressions.info/tutorial.html
https://zeroturnaround.com/rebellabs/java-regular-expressions-cheat-sheet/
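To make the individual symbols concrete, here is a small Python sketch that tries each token used in the parse regex statements in this reply against a made-up string. Python is used only because it is easy to run locally; it is not Sumo Logic's engine. One difference to keep in mind: Python writes a named group as (?P<name>...), while the Sumo Logic queries here use (?<name>...).

```python
import re

# Made-up sample strings; each pattern isolates one token used in the
# parse regex statements in this thread.
# Note: Python spells named groups (?P<name>...); Sumo Logic uses (?<name>...).
EXAMPLES = [
    # \s  one whitespace character (space, tab, \r, \n, ...)
    (r"Logon\sType", "Logon Type"),
    # \t+ one or more tab characters
    (r":\t+", "Type:\t\t3"),
    # \d{1,2} one or two digits
    (r"\d{1,2}", "Logon Type:\t10\r\n"),
    # [^\r]+ one or more characters that are NOT a carriage return
    (r"[^\r]+", "svc-adfs\r\nnext line"),
    # [\s\S]+? any characters, newlines included, as FEW as possible (lazy)
    (r"A[\s\S]+?B", "A\r\nxx B yy B"),
]

results = {}
for pattern, text in EXAMPLES:
    match = re.search(pattern, text)
    results[pattern] = match.group(0)
    print(f"{pattern!r:18} matched {match.group(0)!r}")

# (?P<user>...) stores the captured text under the name "user"
named = re.search(r"Name:\t+(?P<user>[^\r]+)", "Name:\tsvc-adfs\r\n")
print("user =", named.group("user"))  # user = svc-adfs
```

Note how the lazy [\s\S]+? stops at the first B, and how [^\r]+ stops right before the line break. Both behaviours are what the Windows queries rely on.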
Windows logs are trickier. Here is an example that parses all the relevant fields from 4625 events:
| parse regex "Logon\sType:\t+(?<logon_type>\d{1,2})\r"
| parse regex "Subject:[\s\S]+?Account\sName:\t+(?<src_user>[^\r]+)"
| parse regex "Subject:[\s\S]+?Account\sDomain:\t+(?<src_domain>[^\r]+)"
| parse regex "Account\sFor\sWhich\sLogon\sFailed:[\s\S]+?Account\sName:\t+(?<dest_user>[^\r]+)"
| parse regex "Account\sFor\sWhich\sLogon\sFailed:[\s\S]+?Account\sDomain:\t+(?<dest_domain>[^\r]+)"
| parse regex "Network\sInformation:[\s\S]+?Workstation\sName:\t+(?<wkstation>[^\r]+)"
| parse regex "Network\sInformation:[\s\S]+?Source\sNetwork\sAddress:\t+(?<src_ip>[^\r]+)"
| parse regex "Failure\sInformation:[\s\S]+?Failure\sReason:\t+(?<fail_reason>[^\r]+)"
| parse regex "Failure\sInformation:[\s\S]+?Status:\t+(?<fail_status>[^\r]+)"
| parse regex "Logon\sProcess:\t+(?<logon_process>[^\r]+)"
This looks complicated, but every line follows the same pattern. Two things to note:
1) Each regex takes advantage of the key/value structure of these logs and of the fact that every value is followed by a line break:
(?<logon_process>[^\r]+)
Once we reach the beginning of the value, we simply match everything up to the next carriage return (\r), i.e. the end of the line. [^\r]+ means "one or more characters that are not \r".
2) Some keys appear twice. Account Name, the one you asked about, is one of them: it appears in the Subject section (we call this src_user) and in the Account For Which Logon Failed section (dest_user). To capture both accurately, we anchor each regex on the static section header that comes before the value; the lazy [\s\S]+? then skips ahead to the first Account Name after that header:
| parse regex "Subject:[\s\S]+?Account\sName:\t+(?<src_user>[^\r]+)"
| parse regex "Account\sFor\sWhich\sLogon\sFailed:[\s\S]+?Account\sName:\t+(?<dest_user>[^\r]+)"
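A quick way to check both points outside Sumo Logic is to run the same patterns in Python against a copy of the event. The message below is a hypothetical reconstruction of the one in the question: it assumes tab characters after each key and \r\n line endings (which is what the \t+ and \r in the query expect), and it rewrites (?<name>...) as Python's (?P<name>...). Real 4625 events contain more sections than this sample.

```python
import re

# Hypothetical reconstruction of the 4625 message from the question.
# Assumes tabs after each "Key:" and CRLF (\r\n) line endings.
MESSAGE = (
    "An account failed to log on.\r\n"
    "\r\n"
    "Subject:\r\n"
    "\tSecurity ID:\t\tS-1-5-21-1074439237-558803454-2912513782-17121\r\n"
    "\tAccount Name:\t\tsvc-adfs\r\n"
    "\tAccount Domain:\t\tFDOMAIN\r\n"
    "\tLogon ID:\t\t0x15BC79\r\n"
    "\r\n"
    "Logon Type:\t\t\t3\r\n"
    "\r\n"
    "Account For Which Logon Failed:\r\n"
    "\tSecurity ID:\t\tS-1-0-0\r\n"
    "\tAccount Name:\t\ttuser@fakedomain.com\r\n"
)

# Same patterns as the Sumo Logic query, with (?<name>...) rewritten
# as Python's (?P<name>...).
PATTERNS = {
    "logon_type": r"Logon\sType:\t+(?P<logon_type>\d{1,2})\r",
    "src_user": r"Subject:[\s\S]+?Account\sName:\t+(?P<src_user>[^\r]+)",
    "src_domain": r"Subject:[\s\S]+?Account\sDomain:\t+(?P<src_domain>[^\r]+)",
    "dest_user": r"Account\sFor\sWhich\sLogon\sFailed:[\s\S]+?"
                 r"Account\sName:\t+(?P<dest_user>[^\r]+)",
}

fields = {}
for name, pattern in PATTERNS.items():
    match = re.search(pattern, MESSAGE)
    # Record None when a pattern finds nothing in this sample
    fields[name] = match.group(name) if match else None

# Without a section anchor, the first "Account Name" in the message wins,
# which is the Subject's account, not the one that failed.
naive = re.search(r"Account\sName:\t+(?P<user>[^\r]+)", MESSAGE).group("user")

print(fields)
print("naive match:", naive)
```

With this sample, dest_user is tuser@fakedomain.com, while the unanchored pattern returns svc-adfs. That is exactly why the section header is included in the regex.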
Hope this helps
Olaf