With the multi-line option selected, the Sumo Logic Collector attempts to detect a common pattern that denotes the first line of a multi-line message. The Collector examines each line coming in from a Source and attempts to match it against the known expression. If the line matches, the Collector marks it as the start of a new message, and any following lines that do not match the expression are assumed to be part of that message. Once the Collector detects another line matching the expression, it flushes the previous lines as a single message and marks the matching line as the start of a new message.
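The grouping behavior described above can be sketched in a few lines of Python. This is only an illustration of the described logic, not the Collector's actual implementation: the function name `group_messages`, the sample log lines, and the boundary expression are all hypothetical, and Python's `re` module stands in for whatever regex engine the Collector uses. The sketch also assumes that any lines seen before the first matching line are flushed together as one message.

```python
import re


def group_messages(lines, boundary):
    """Group raw lines into messages (hypothetical helper).

    A line that fully matches `boundary` starts a new message; lines that
    do not match are appended to the message currently being built.
    """
    pattern = re.compile(boundary)
    messages = []
    current = []
    for line in lines:
        # A matching line flushes whatever has been collected so far.
        if pattern.fullmatch(line) and current:
            messages.append("\n".join(current))
            current = []
        current.append(line)
    # Flush the final message once the input is exhausted.
    if current:
        messages.append("\n".join(current))
    return messages


# Made-up sample input: a stack trace followed by a single-line message.
lines = [
    "2019-09-17 14:39:15,523 -0700 [main] ERROR Something failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Foo.bar(Foo.java:42)",
    "2019-09-17 14:39:16,001 -0700 [main] INFO Recovered",
]

# Timestamp anchor followed by the rest of the line.
boundary = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} .*"

messages = group_messages(lines, boundary)
for message in messages:
    print(message)
    print("-----")
```

With this input, the first three lines are grouped into one message and the fourth line becomes its own message.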
When the "Infer boundaries" option is used, the Collector applies an algorithm to the first xx lines to try to determine a pattern that denotes the starting line of a new message. Auto-detection works best if the log messages begin with a common anchor, such as a timestamp, and the messages received by the Source are consistently formatted. If a single Source is used to collect multiple types of files in varying formats, or if no consistent pattern is detected in the messages being received, each line may be flushed as a single message, or some messages may be improperly grouped into a single message. Even when a Source ingests a single log type, auto-detection is not guaranteed to work in all cases. This is noted within the Source configuration with the following text: "Please note, Infer Boundaries may not be accurate for all log types." In these cases, a custom "Boundary Regex" expression may be required to detect the start of a new message.
When the "Boundary Regex" option is used with multi-line detection, the Collector uses the supplied regular expression to match the first line of a multi-line message. Note: the supplied expression must match the entire first line of a message up to, and in some cases including, the trailing line feed or carriage return.
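The difference between an expression that matches only part of the first line and one that matches all of it can be illustrated with a short Python sketch. The log line, the expressions, and the helper `is_boundary` are hypothetical, and Python's `re.fullmatch` is used here only to model the "must match the entire line" requirement; the Collector's own regex engine may differ in its details.

```python
import re

# Made-up first line of a multi-line message.
first_line = "2019-09-17 14:39:15,523 -0700 [main] INFO Starting up"

# Matches only the leading timestamp, not the whole line.
partial = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"

# Same anchor, followed by ".*" so the rest of the line is covered too.
full = partial + r".*"

# Variant that also tolerates a trailing carriage return and/or line feed.
full_with_eol = full + r"\r?\n?"


def is_boundary(expression, line):
    """Return True only if `expression` matches the entire line (hypothetical helper)."""
    return re.fullmatch(expression, line) is not None


partial_ok = is_boundary(partial, first_line)            # timestamp only: no full match
full_ok = is_boundary(full, first_line)                  # covers the entire line
eol_ok = is_boundary(full_with_eol, first_line + "\n")   # also covers the line feed

print(partial_ok, full_ok, eol_ok)
```

In this sketch, the timestamp-only expression is rejected because it leaves the remainder of the line unmatched, while appending `.*` lets the same anchor cover the whole line.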
For example, given the following multi-line message:
2019-09-17 14:39:15,523 -0700 [CPU-ResourceMonitor-1] INFO
com.sumologic.scala.collector.monitoring.CollectorResourceMonitor - With current users:
current usage is 0
Acceptable boundary expressions may be:
Unacceptable boundary expressions would include the following, since they do not match the entire first line: