Sumo Logic CloudWatch Logs Lambda SourceHost and SourceName mapping
As I read the Sumo Logic descriptions of Source Host and Source Name against the Amazon CloudWatch Logs descriptions of Log Streams and Log Groups, Source Host seems to map much better to a Log Stream, and Source Name seems to map much better to a Log Group:
Source Host: "...Sumo Logic recommends that you carefully select a meaningful name that uniquely identifies the host from which data is collected..."
Log Stream: "...a log stream is generally intended to represent the sequence of events coming from the application instance or resource..."
Source Name: "... the file path entered when you created your Source..."
Log Group: "...a typical log group organization for a fleet of Apache web servers could be the following: MyWebsite.com/Apache/access_log, or MyWebsite.com/Apache/error_log..."
However, the log collector maps them the opposite way: Source Host is taken from the Log Group and Source Name from the Log Stream.
I understand swapping this would probably mess with many clients' queries if they actually consumed it, so I don't have a problem forking this for our own purposes. I don't want to use the overrides or map in my log data, because this is extra duplicate configuration I don't want to manage.
But we are wondering why it isn't mapped the way I described. What is the logic?
Official comment
Hey Zachary,
I'm reposting my team member Duc's reply from GitHub so the community can see the answer:
For Sumo local files, Source Name points to the exact file path when the Sumo local file source (https://help.sumologic.com/Send-Data/Sources/01Sources-for-Installed-Collectors/Local-File-Source) uses wildcard syntax. Source Host is the (source) hostname/IP. It is usually populated automatically by the Sumo collector and overwritten explicitly by users when that value is not accurate (e.g. "localhost").
For Amazon contexts, I think the mapping is not 1 to 1. For example, a log stream can point to a specific network interface for VPC flow logs; in that case Source Name is the right match. LogGroup is trickier: going by Amazon's definition, I personally think Source Category is a better match than Source Host. Of course, metadata can overlap or be duplicated, depending on your case. Feel free to modify the function; I do think it will need to be customized to the CWLogs it is collecting (e.g. VPC flow logs will be different from diagnostic logs, etc.). Thanks!
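For anyone forking the function, here is a rough Python sketch of the mapping under discussion. It is not the code of the Sumo Lambda itself. It assumes the standard CloudWatch Logs subscription event (base64-encoded, gzip-compressed JSON under `awslogs.data`) and the `X-Sumo-Host`, `X-Sumo-Name` and `X-Sumo-Category` headers of a Sumo HTTP source. The helper names `decode_awslogs` and `sumo_headers` and the `swap` flag are made up for illustration.

```python
import base64
import gzip
import json


def decode_awslogs(event):
    """Decode the base64 + gzip payload CloudWatch Logs passes to a Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def sumo_headers(payload, swap=False, category=None):
    """Build Sumo metadata headers from a decoded CloudWatch Logs payload.

    swap=False: host <- log group, name <- log stream (the mapping the
                question describes the collector as using).
    swap=True:  host <- log stream, name <- log group (the mapping the
                question proposes).
    """
    log_group = payload["logGroup"]
    log_stream = payload["logStream"]
    if swap:
        host, name = log_stream, log_group
    else:
        host, name = log_group, log_stream
    headers = {"X-Sumo-Host": host, "X-Sumo-Name": name}
    # Optional: carry the log group as Source Category, as the reply suggests.
    if category is not None:
        headers["X-Sumo-Category"] = category
    return headers
```

Calling `sumo_headers(decode_awslogs(event), swap=True)` in a fork would give the Host/Name assignment proposed in the question, and passing `category=payload["logGroup"]` follows the Source Category suggestion. Which variant fits will depend on the kind of CWLogs being collected.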