Problem:
My Collector shows a red exclamation mark or a red icon on the Manage Data -> Collection page.
Cause:
A red exclamation mark on installed collectors and their sources indicates that the collectors are not communicating with the Sumo Logic service. The collector communicates with the backend service through heartbeats that are sent every minute. When heartbeats have not reached the backend service for at least 30 minutes, the UI lists the collector under Manage Data -> Collection -> Stopped Collectors.
Answer:
The following steps troubleshoot collector-side service issues and connectivity issues.
Solution 1: Check if the collector service is running on the server and manually start the collector service as needed.
On Linux, from the collector directory, run either of the following commands in a terminal window:
./collector status
OR
ps -ef |grep -i coll
On Windows, check the status of the sumo-collector service in Task Manager or services.msc.
a. If the collector service is not running, start it manually as follows. Then check under Manage Data -> Collection whether the collector's status changes to a green check mark within 15 minutes, assuming the collector can communicate with the backend service.
On Linux, from the collector directory, run:
./collector start
On Windows, start the sumo-collector service from Task Manager or services.msc.
b. If the collector service starts but stops soon afterward, it may be failing because of a configuration issue, for example incorrect user credentials in the config/user.properties file. To identify the cause, review the ERROR or WARN messages in the logs/collector.log or logs/collector.out.log files.
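As a starting point for reviewing those logs, here is a minimal sketch of a shell helper that prints the most recent ERROR or WARN lines from a log file. The function name recent_problems and the default of 20 lines are illustrative choices, not part of the collector tooling, and the pattern assumes the log level appears as a separate word on each line.

```shell
# Hypothetical helper (not part of the collector): print the last N
# lines of a log file whose level is ERROR or WARN.
#   $1 = path to the log file
#   $2 = maximum number of lines to print (defaults to 20)
recent_problems() {
  grep -E '[[:space:]](ERROR|WARN)[[:space:]]' "$1" | tail -n "${2:-20}"
}
```

For example, running recent_problems logs/collector.log 50 from the collector directory would list up to 50 of the latest warnings and errors.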
Solution 2: If the collector service is running and the collector still shows as Stopped, check the network connectivity between the server and Sumo Logic and troubleshoot as needed. For example, a firewall could be blocking the connection.
To verify that there is no connectivity issue with the Sumo Logic service, run the following test from the host on which the Collector is running:
$ curl -i https://collectors.sumologic.com
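If a scripted check is more convenient than reading the full response, the sketch below wraps curl so that it prints only the HTTP status code. The function name probe_https and the 10-second timeout are assumptions made for illustration. curl reports the code as 000 when no HTTP response was received, which points to a connection problem (DNS, firewall, proxy, or TLS) rather than an application-level error.

```shell
# Hypothetical helper: print only the HTTP status code for a URL.
# Prints 000 and returns non-zero if no connection could be made.
#   $1 = URL to probe
probe_https() {
  curl -s -o /dev/null --connect-timeout 10 -w '%{http_code}' "$1"
}

# Example (not run here):
#   probe_https https://collectors.sumologic.com
```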
Make sure outbound SSL connectivity exists from the collector to the Sumo Logic backend service on port 443. Below is an example of connect exceptions in logs/collector.log:
2018-11-22 06:37:04,465 [WrapperSimpleAppMain] WARN com.sumologic.scala.collector.auth.CollectorRegistrationManager - Unexpected when pinging sumo service, retrying in 60 seconds
org.apache.http.conn.HttpHostConnectException: Connect to endpoint1.collection.sumologic.com:443 [endpoint1.collection.sumologic.com/34.202.125.89, endpoint1.collection.sumologic.com/52.44.176.223, endpoint1.collection.sumologic.com/34.236.95.98, endpoint1.collection.sumologic.com/34.193.145.38, endpoint1.collection.sumologic.com/18.205.140.62, endpoint1.collection.sumologic.com/52.1.132.165, endpoint1.collection.sumologic.com/52.202.93.142, endpoint1.collection.sumologic.com/18.214.84.80] failed: Connection refused (Connection refused)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
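To see which endpoints the collector failed to reach, and how often, the log can be summarized with standard text tools. The following is a rough sketch that assumes the exception lines follow the "Connect to host:port" wording of the example above. The function name failed_endpoints is hypothetical.

```shell
# Hypothetical helper: count HttpHostConnectException entries per
# endpoint in a collector log.
#   $1 = path to the log file (for example logs/collector.log)
failed_endpoints() {
  grep 'HttpHostConnectException' "$1" \
    | sed -n 's/.*Connect to \([^ ]*\) .*/\1/p' \
    | sort \
    | uniq -c
}
```

Each output line shows a count followed by the host:port that refused or dropped the connection, which can then be compared against the firewall or proxy rules on the host.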
Solution 3: If the collector service is running and network connectivity is good, restart the collector as a precaution.
If some of the collector's threads are hung, a restart may help. Before restarting, it may help to capture a thread dump of the collector service as follows:
- First, identify the collector process ID (pid) on Linux, for example with the ps command shown in Solution 1.
- Second, generate a collector thread dump on Linux:
kill -3 <collector-java-pid>
- Third, restart the collector service:
./collector restart
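The three steps above can be sketched as a small script. The helper collector_pids is hypothetical. It filters ps -ef output with the same "coll" pattern used in Solution 1, so it may list more than one process, and the pid of the Java process still has to be picked by hand.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# column (second field) of every line that mentions "coll", skipping
# the grep command itself.
collector_pids() {
  grep -i 'coll' | grep -v 'grep' | awk '{print $2}'
}

# Example sequence (not run here), from the collector directory:
#   ps -ef | collector_pids         # list candidate pids
#   kill -3 <collector-java-pid>    # ask the JVM for a thread dump
#   ./collector restart             # restart the collector service
```

kill -3 sends SIGQUIT, which causes the JVM to print a thread dump to its standard output without stopping the process.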
If the issue persists, please open a ticket with the Sumo Logic Support team and attach a compressed archive (zip on Windows, tarball on Linux) of the <sumo_install_dir>/logs directory.
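On Linux, the tarball can be created with tar. The sketch below is one way to do it. The function name bundle_logs and the output file name in the example are illustrative assumptions.

```shell
# Hypothetical helper: create a gzip-compressed tarball of the logs
# directory found under the collector installation directory.
#   $1 = collector installation directory (<sumo_install_dir>)
#   $2 = path of the archive to create
bundle_logs() {
  tar -czf "$2" -C "$1" logs
}

# Example (not run here):
#   bundle_logs <sumo_install_dir> /tmp/sumo-collector-logs.tar.gz
```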