Problem:
We have been losing log messages sent to Sumo Logic hosted HTTP collectors, which serve as the endpoint for log messages from the Sumo Logic Log4j 2 appender (configured via the log4j2.xml file):
https://github.com/SumoLogic/sumologic-log4j2-appender
Cause:
If flushAllBeforeStopping is not configured, data still in the buffer may not be ingested into Sumo Logic if the appender terminates abnormally.
If maxQueueSizeBytes is not configured, the buffer uses the default size of 1,000,000 bytes. When the data to be ingested exceeds that size, we have seen data not be ingested into Sumo Logic. Depending on its version, the appender may or may not log a message like the following example:
2018-11-21 11:12:21,005 Log4j2-TF-1-AsyncLoggerConfig--1 WARN Evicted 1 messages from buffer
Resolution:
Ensure the following settings are configured correctly:
flushAllBeforeStopping is an optional setting that should be set to true. It flushes all messages before stopping, regardless of flushingAccuracyMs, and avoids potential data loss at ingestion.
maxQueueSizeBytes is another optional setting: the maximum capacity (in bytes) of the message queue, 1000000 by default. If your message queue needs to hold more than that, increase this setting or risk losing data at ingestion.
The relevant section in log4j2.xml would look like this:
<SumoLogicAppender name="SumoAppender" url="${sumologic.httpsource.url}" flushAllBeforeStopping="true" maxQueueSizeBytes="<suitable_value>">
<PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss,SSS Z} [%t] level: %-5p category: %c - message: %m%n" />
</SumoLogicAppender>
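One rough way to choose a value for maxQueueSizeBytes is to estimate how many bytes can pile up in the queue while messages wait to be sent, then add headroom. The sketch below is only back-of-the-envelope arithmetic and is not part of the appender. The class name, the method name, and the example figures (message rate, average message size, longest expected wait, safety factor) are all hypothetical assumptions; replace them with measurements from your own application.

```java
public class QueueSizeEstimate {

    /**
     * Rough estimate of the bytes that may be waiting in the queue at once.
     * All inputs are assumptions supplied by the caller, not appender settings.
     *
     * @param messagesPerSecond peak log messages produced per second
     * @param avgMessageBytes   average size of one formatted message, in bytes
     * @param maxWaitMs         longest time messages are expected to wait before being sent
     * @param safetyFactor      headroom multiplier, for example 2.0 for double the estimate
     */
    public static long estimateQueueBytes(long messagesPerSecond, long avgMessageBytes,
                                          long maxWaitMs, double safetyFactor) {
        // Bytes produced per millisecond at the assumed peak rate.
        double bytesPerMs = (messagesPerSecond * avgMessageBytes) / 1000.0;
        // Bytes accumulated over the wait window, scaled by the headroom factor.
        return (long) Math.ceil(bytesPerMs * maxWaitMs * safetyFactor);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 500 messages/s, 400 bytes each, 10 s wait, 2x headroom.
        long estimate = estimateQueueBytes(500, 400, 10_000, 2.0);
        System.out.println("Suggested maxQueueSizeBytes: " + estimate); // prints 4000000
    }
}
```

With these example numbers the estimate is four times the 1000000-byte default, which would suggest raising maxQueueSizeBytes for such a workload.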
Comments
Make sure your pattern layout begins with the proper date format; otherwise, the default ingest behavior is to search for dates in your message string. Per Sumo support, for the logback appender the correct date format is ...
If this is left out or is incorrect, messages can be assigned incorrect message times at ingest, which means you won't see them in queries in the expected time frame (unless you specify "Use Receipt Time"). Note that you're free to modify the pattern layout after the date string. Also, I haven't tested anything beyond the default settings, but I believe you have flexibility here with respect to how message times are set and what date format is used on ingest, according to this ...
https://help.sumologic.com/03Send-Data/Collector-FAQs/Troubleshooting-time-discrepancies
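As a quick local sanity check, the sketch below tests whether a log line starts with a timestamp in the shape produced by the date pattern from the log4j2.xml example in the article (yyyy-MM-dd HH:mm:ss,SSS Z). It is an illustration only: the class and method names are hypothetical, it uses the standard java.time API rather than anything from Sumo Logic, and it says nothing about how Sumo Logic itself detects timestamps at ingest.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class TimestampPrefixCheck {

    // Same pattern letters as the %d{...} conversion in the log4j2.xml example.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS Z");

    // Length of a formatted timestamp such as "2018-11-21 11:12:21,005 +0000".
    private static final int PREFIX_LENGTH = 29;

    /** Returns true if the line begins with a timestamp matching FORMAT. */
    public static boolean startsWithTimestamp(String logLine) {
        if (logLine == null || logLine.length() < PREFIX_LENGTH) {
            return false;
        }
        try {
            OffsetDateTime.parse(logLine.substring(0, PREFIX_LENGTH), FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            // The prefix is not a timestamp in the expected format.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, used only for illustration.
        String good = "2018-11-21 11:12:21,005 +0000 [main] level: INFO  category: app - message: started";
        String bad = "[main] 2018-11-21 11:12:21,005 +0000 level: INFO  category: app - message: started";
        System.out.println(startsWithTimestamp(good)); // prints true
        System.out.println(startsWithTimestamp(bad));  // prints false
    }
}
```

Running a few captured log lines through a check like this before deploying can catch a pattern layout that no longer starts with the date string.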
The following query will show you if you have messages where the receipt time and message time are off by more than two minutes.
// be sure to click "Use Receipt Time" when running this query
* |
_receiptTime as receipttime_ms |
_messageTime as messagetime_ms |
messagetime_ms - receipttime_ms as diff_ms |
if(diff_ms < 0, diff_ms * -1, diff_ms) as absolute_time_diff_ms |
where absolute_time_diff_ms > 2 * 60 * 1000 |
timeslice 1d |
formatDate(_timeslice, "yyyy-MM-dd") as day |
count as count, min(receipttime_ms) as oldest_receiptime_ms, max(receipttime_ms) as latest_receiptime_ms, min(absolute_time_diff_ms) as min_time_diff_ms, max(absolute_time_diff_ms) as max_time_diff_ms group by day, _sourceName |
formatDate(fromMillis(toLong(oldest_receiptime_ms)), "yyyy-MM-dd HH:mm:ss.SSS") as oldest |
formatDate(fromMillis(toLong(latest_receiptime_ms)), "yyyy-MM-dd HH:mm:ss.SSS") as latest |
round(min_time_diff_ms / (1000 * 60)) as min_time_diff_minutes |
round(max_time_diff_ms / (1000 * 60)) as max_time_diff_minutes |
fields day, _sourceName, count, oldest, latest, min_time_diff_minutes, max_time_diff_minutes |
sort by day desc, _sourceName