Avaya Support Forums

smaiyaur 12-09-2013 12:48 AM

CS1K: All releases BERR0705 EXC 1 Exception 18 in Task for CPP4 cards only
 
The Pentium 4 and P6 family processors implement a machine-check architecture that provides a mechanism for detecting and reporting hardware (machine) errors, such as system bus errors, ECC (Error correction code) errors, parity errors, cache errors, and TLB (Translation lookaside buffer) errors.
BERR0705 occurred because of Exception 18. For exception (vector number) 18, the error codes (if any) and the source are model dependent. (This exception was introduced in the Pentium processor and enhanced in the P6 family processors.) Vector 18 is an "abort" type of exception: an abort does not always report the precise location of the instruction causing the exception and does not allow the program or task that caused the exception to be restarted. Aborts are used to report severe errors, such as hardware errors and inconsistent or illegal values in system tables.
Exception 18 is the Machine-Check Exception. A Machine Check Exception (MCE) is a type of computer hardware error that occurs when a computer's central processing unit detects an unrecoverable hardware (!) problem.
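
For anyone who wants to see what the machine-check architecture actually reports, here is a rough sketch of decoding one IA32_MCi_STATUS register value. The bit positions follow the architectural layout in the Intel manuals. Whether the CS1000 exposes this register value alongside BERR0705 is not something this thread establishes, so treat that as an assumption. The function name decode_mci_status and the sample value are made up for illustration.

```python
# Bit positions of the architectural flags in an IA32_MCi_STATUS register
# (Intel machine-check architecture). Names follow the Intel manuals.
FLAGS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten by this one
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # IA32_MCi_MISC holds additional information
    "ADDRV": 58,  # IA32_MCi_ADDR holds an error address
    "PCC": 57,    # processor context may be corrupted
}


def decode_mci_status(status):
    """Split a 64-bit IA32_MCi_STATUS value into its architectural fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    fields["mca_error_code"] = status & 0xFFFF                # bits 15:0
    fields["model_specific_code"] = (status >> 16) & 0xFFFF   # bits 31:16
    return fields


if __name__ == "__main__":
    # Hypothetical value for illustration only, not taken from a CS1K log.
    sample = 0xB200000000000135
    for name, value in decode_mci_status(sample).items():
        print(name, value)
```

An uncorrected error (UC set) with PCC set means the processor state cannot be trusted, which fits the "abort" behaviour described above: the interrupted task cannot simply be resumed.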

Technical info:
http://en.wikipedia.org/wiki/Machine_Check_Exception
http://en.wikipedia.org/wiki/Machine_check_architecture
Similar issues:
(2006) wi00603174 / Q01323064
(2006) wi00606795 / Q01505207
(2007) wi00615767 / Q01675165
(2009) wi00616257 / Q01981027
(2009) wi00623509 / Q01984122

Suggestions:
Check the CPP4 pack for possible overheating. Replace the CPP4 pack if necessary.

Summary:
Nothing can be done from the software perspective for this hardware issue. The operating system cannot continue normal operation if a hardware problem, such as overheating, occurs with the CPU or RAM, so the system just restarts the currently active task. In our case the task was "tSL1", and the system decided to restart the whole system. This is the design intent from the CS1000 software perspective.

Solution:
Replace the card.

roberto 12-09-2013 05:34 AM

There seems to be a lot of history on this type of error. Intel does not publish MTBF data for its Pentium processors, but they have become progressively more resilient over the years.

Before doing a full rip and replace of hardware, some other things are worth checking:

1) Clean out all dust that may have clogged the ventilation into the box. It may be worthwhile opening it up and using a pressurized air canister to blow it all out. Dust is usually the main cause of overheating, which can result in hardware failures such as MCEs.

2) Check for power fluctuations. If the CPU or other motherboard components, such as memory, are not receiving clean and steady power, they will not behave properly. Automatic Voltage Regulators (AVRs) or other power filtering devices can help reduce power fluctuations.


All times are GMT -7.