
S8300 Alarms--_WD

See Table 15 through Table 24 for a list of S8300 Alarms--_WD.

Table 15. Alarm #8
Number
8
Source
_WD
Event ID
4
Alarm Level
Major
Alarm Text Description
Maximum retries for app start
Possible Causes
The application failed to start the maximum allowed number of times. The application is present but is not launching.
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
1. On the Web Interface, choose View Process Status and select the appropriate settings
2. From the Web Interface, choose Alarms and Notification; select the appropriate alarm; choose Clear
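The retry limit behind this alarm can be pictured with a minimal sketch. It is illustrative only and is not taken from watchd: the function name `start_with_retries`, the `start_app` callback, and the `max_retries` parameter are all hypothetical.

```python
from typing import Callable


def start_with_retries(start_app: Callable[[], bool], max_retries: int) -> bool:
    """Try to start an application up to max_retries times.

    Returns True on the first successful start. Returns False once the
    retry budget is used up, which is the point where a supervisor would
    raise a "maximum retries" alarm (hypothetical behavior, not watchd's code).
    """
    for _attempt in range(max_retries):
        if start_app():
            return True
    return False
```

A caller would pass in whatever function launches the application and treat a False result as the alarm condition.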

Table 16. Alarm #9
Number
9
Source
_WD
Event ID
6
Alarm Level
Major
Alarm Text Description
Cannot open config parameter file
Possible Causes
Watchdog cannot read its configuration file /etc/opt/ecs/watchd.conf.
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
1. Get a fresh copy of watchd.conf (from the CD in the field, or from the remote server or /root2 in the labs).
2. From the Web Interface, choose Alarms and Notification; select the appropriate alarm; choose Clear

Table 17. Alarm #10
Number
10
Source
_WD
Event ID
7
Alarm Level
Major
Alarm Text Description
Cannot open exe using config file PID
Possible Causes
Watchdog has a bad path name for an application it is supposed to start.
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
1. Verify that the file named in the log exists and is executable.
2. Verify that the string in watchd.conf is correct.
3. From the Web Interface, choose Alarms and Notification; select the appropriate alarm; choose Clear
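Step 1 above (confirming that a file exists and is executable) can be checked by hand or scripted. The following is a minimal sketch using only the Python standard library; the function name `check_executable` is hypothetical, and the path to check must come from the Watchdog log, not from this example.

```python
import os


def check_executable(path: str) -> str:
    """Classify a path as 'missing', 'not a regular file', 'not executable', or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        return "not a regular file"
    # os.access checks permissions for the user running this script,
    # which may differ from the user that watchd runs as.
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"
```

Any result other than "ok" points to the kind of bad path name described under Possible Causes.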

Table 18. Alarm #11
Number
11
Source
_WD
Event ID
15
Alarm Level
Major
Alarm Text Description
Detected a rolling reboot
Possible Causes
Watchdog has detected x Linux reboots within y minutes, where x and y are configurable in /etc/opt/ecs/watchd.conf. Many different faults can cause a rolling reboot, so it is not possible to list them all.
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
1. (Lab only) Make sure all the executables listed in watchd.conf exist and are executable. The most common cause of a rolling reboot is that files are not where they are expected.
2. If step 1 finds no problem, investigate the trace log further.
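The "x reboots within y minutes" condition for this alarm amounts to a sliding-window check over reboot timestamps. The sketch below shows one way to express it. It assumes the threshold means "at least x reboots inside any y-minute window", which the source does not spell out, and the names `is_rolling_reboot`, `max_reboots`, and `window_minutes` are hypothetical, not watchd.conf parameters.

```python
from datetime import datetime, timedelta
from typing import Iterable


def is_rolling_reboot(reboot_times: Iterable[datetime],
                      max_reboots: int,
                      window_minutes: int) -> bool:
    """Return True if any window of window_minutes holds at least max_reboots reboots."""
    if max_reboots < 1:
        raise ValueError("max_reboots must be at least 1")
    times = sorted(reboot_times)
    window = timedelta(minutes=window_minutes)
    # Compare each reboot with the one (max_reboots - 1) positions later:
    # if that span fits in the window, the threshold has been reached.
    for i in range(len(times) - max_reboots + 1):
        if times[i + max_reboots - 1] - times[i] <= window:
            return True
    return False
```

For example, three reboots at 12:00, 12:05, and 12:09 trip a threshold of 3 reboots in 10 minutes, but not a threshold of 3 reboots in 8 minutes.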

Table 19. Alarm #12
Number
12
Source
_WD
Event ID
18
Alarm Level
Warning
Alarm Text Description
Application Restarted
Possible Causes
An application has failed and watchdog has restarted it successfully.
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
From the Web Interface, choose Alarms and Notification; select the appropriate alarm; choose Clear

Table 20. Alarm #13
Number
13
Source
_WD
Event ID
19
Alarm Level
Minor
Alarm Text Description
Application failed unintentionally
Possible Causes
Watchdog is bringing the system down because an application has failed to start correctly. The application may have failed to start because the file did not exist (coincident with Event ID 7), or because required parameters for the application in watchd.conf were missing or invalid.
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
1. Verify that the file named in the log exists and is executable.
2. Verify that the string in watchd.conf is correct.
3. From the Web Interface, choose Alarms and Notification; select the appropriate alarm; choose Clear

Table 21. Alarm #14
Number
14
Source
_WD
Event ID
20
Alarm Level
Major
Alarm Text Description
Application totally failed
Possible Causes
The application failed the maximum allowed number of times.
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
1. Access the web page; view summary status
2. If the application is down, use "start -s application" to start the application.
3. From the Web Interface, choose Alarms and Notification; select the appropriate alarm; choose Clear

Table 22. Alarm #15
Number
15
Source
_WD
Event ID
22
Alarm Level
Minor
Alarm Text Description
Application was shutdown
Possible Causes
Watchdog successfully shut down the named application.
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
From the Web Interface, choose Alarms and Notification; select the appropriate alarm; choose Clear

Table 23. Alarm #16
Number
16
Source
_WD
Event ID
23
Alarm Level
Major
Alarm Text Description
Watchd high monitor thread is rebooting the system
Possible Causes
The lo-monitor thread is missing heartbeats (it cannot get CPU time), and the hi-monitor thread has tried 3 times to recover the system by killing processes stuck in an infinite loop. That is, if the lo-monitor thread is still not heartbeating after 3 rounds of CPU occupancy profiling and recovery, watchd reboots the server.
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
1. Clear the alarm: From the Web Interface, choose Alarms and Notification; select the appropriate alarm; choose Clear.
2. Watch to see whether the alarm returns.
The server should have rebooted by the time a support person can analyze the system. A reboot normally fixes problems with unresponsive software.

Table 24. Alarm #17
Number
17
Source
_WD
Event ID
24
Alarm Level
Major
Alarm Text Description
Watchd high monitor thread is stopping tickling of hw
Possible Causes
This alarm is raised if the server reboot attempted for Event ID 23 (Alarm #16) does not work. That reboot is done through a Linux system call, which may not succeed. This can occur if a Linux kernel semaphore is stuck. watchd starts a timer before calling reboot. If the timer expires, watchd stops the HW sanity tickling in the hope that the HW sanity watchdog will reboot the processor (that is, a hard reboot).
Determining Cause
Go to the Web Interface; choose Diagnostics; View System Logs; select Watchdog Logs
Resolution
1. Clear the alarm: From the Web Interface, choose Alarms and Notification; select the appropriate alarm; choose Clear.
2. Watch to see whether the alarm returns.
The server should have rebooted by the time a support person can analyze the system. A reboot normally fixes problems with unresponsive software.
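The escalation described for Event IDs 23 and 24 (recovery attempts, then a reboot through a system call, then stopping the hardware tickle) can be summarized as a small decision sketch. This is an illustrative model and not watchd's implementation: the `Action` enum, the `escalate` function, and its callback parameters are hypothetical, and the timer is modeled simply as whether the reboot call completes in time.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    RECOVERED = "heartbeat restored"
    SOFT_REBOOT = "reboot through system call"
    HARD_REBOOT = "stop tickling; hardware watchdog resets processor"


def escalate(heartbeat_ok: Callable[[], bool],
             recover: Callable[[], None],
             reboot_completes: Callable[[], bool],
             max_attempts: int = 3) -> Action:
    """Model the hi-monitor escalation path under the stated assumptions."""
    for _ in range(max_attempts):
        recover()  # e.g. profile CPU occupancy and kill the runaway process
        if heartbeat_ok():
            return Action.RECOVERED
    # Recovery attempts exhausted: corresponds to Event ID 23.
    if reboot_completes():
        return Action.SOFT_REBOOT
    # Reboot call did not finish before the timer expired: Event ID 24.
    return Action.HARD_REBOOT
```

The default of 3 attempts follows the text of Table 23; everything else in the sketch is a simplification for illustration.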

