I'm using the Sumo Logic SDK for Python to run one of my queries, and after testing my script I noticed that it could not resolve the hostnames for all of the sites.
Looking at the query results, I get URLs pointing to executables and zip files, and even URLs with characters that don't decode properly as Unicode (for example: "down11246.yzzzn.com/index.html?%2F152926%2Fveryhuo1%2F���� (...)").
My question is: how can I remove everything after the top-level domain? I just want sitename.com and nothing else. Currently, my query starts the parsing like this:

_sourceCategory=Threat/Malc0de | parse "URL: *, IP Address: *, Country: *, ASN: *, MD5: *</description>" as url,ip,country,asn,md5
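If the cleanup happens in Python after the SDK returns the parsed url field, a minimal sketch could look like the following. The helper names hostname_from_url and registered_domain are hypothetical. It relies only on the standard library: urlsplit() pulls out the host once a "//" prefix is added to scheme-less URLs (without it, the host would end up in the path). The "keep the last two labels" step is a naive assumption that gives yzzzn.com for down11246.yzzzn.com but is wrong for multi-part suffixes such as co.uk; a Public Suffix List-based library would be needed for those. Bare IP addresses are returned unchanged.

```python
import ipaddress
import re
from typing import Optional
from urllib.parse import urlsplit

# Matches an explicit scheme such as "http://" or "ftp://" at the start of the string.
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def hostname_from_url(raw_url: str) -> Optional[str]:
    """Return the lowercase hostname of a URL, or None if none can be found."""
    url = raw_url.strip()
    if not _SCHEME_RE.match(url):
        # Feed URLs often lack a scheme; "//" makes urlsplit() treat the
        # first component as the network location instead of the path.
        url = "//" + url
    try:
        # .hostname drops the port, path, query string and fragment.
        return urlsplit(url).hostname
    except ValueError:
        # e.g. malformed IPv6 brackets
        return None


def registered_domain(hostname: str) -> str:
    """Naively keep the last two labels ("a.b.example.com" -> "example.com").

    Assumption: not correct for suffixes like "co.uk".
    """
    try:
        ipaddress.ip_address(hostname)
        return hostname  # bare IP addresses have no domain to trim
    except ValueError:
        pass
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:])


if __name__ == "__main__":
    samples = [
        "down11246.yzzzn.com/index.html?%2F152926%2Fveryhuo1%2F\ufffd\ufffd",
        "http://198.51.100.7:8080/a.exe",
    ]
    for raw in samples:
        host = hostname_from_url(raw)
        print(host, registered_domain(host) if host else None)
    # prints:
    # down11246.yzzzn.com yzzzn.com
    # 198.51.100.7 198.51.100.7
```

Resolving only the extracted hostname (rather than the full URL with its path and query string) is what keeps the undecodable characters out of the DNS lookup.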
Any suggestions would be greatly appreciated!