The site encountered an issue where, during the midnight routines (MIDN), both Call Servers INIed because of a bad compact flash (CF) card in the RMD slot. They have noticed this on a few sites, and replacing the RMD corrects the issue. The Call Server should not INI because of a bad RMD. The site is on Release 7.6 with the current service pack. The platform is CP PIV. The CF was an Astek 256 MB card, the same one that was in use on Release 7.5 before the upgrade. They have replaced it with a Nortel 1 GB card.
INI every 5 minutes (we are in an INI loop).
WARM START IN PROGRESS - Reason 42
BUG7060 SWD: HARDWARE WATCHDOG INTERRUPT EVENT
BUG7058 SWD: Swd watchdog timer expired on task tLS
SRPT0107 Hardware reset reason = Watchdog L1
Patch disables semWatch.
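As a rough triage aid, the signatures listed above can be checked against a captured log. The following is a minimal sketch, not a supported tool: the function names and the rule that all four signatures must be present are assumptions made for illustration, and the message strings are only those quoted in this report.

```python
import re

# Signatures quoted in this report. Grouping them into one check is an
# assumption made for illustration.
SIGNATURES = {
    "warm_start_42": re.compile(r"WARM START IN PROGRESS - Reason 42"),
    "BUG7060": re.compile(r"BUG7060\s+SWD: HARDWARE WATCHDOG INTERRUPT EVENT"),
    "BUG7058": re.compile(r"BUG7058\s+SWD: Swd watchdog timer expired on task tLS"),
    "SRPT0107": re.compile(r"SRPT0107\s+Hardware reset reason = Watchdog L1"),
}


def match_signatures(log_text):
    """Return the sorted names of the signatures found in log_text."""
    return sorted(name for name, pat in SIGNATURES.items() if pat.search(log_text))


def looks_like_this_issue(log_text):
    """True only when every signature is present (a hypothetical threshold)."""
    return len(match_signatures(log_text)) == len(SIGNATURES)
```

A log that matches only some of the signatures may still be worth a closer look; the all-or-nothing rule here is just the simplest choice.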
EXAMPLE FROM SITE:
------------------
[1021] 10/08/2013 12:44:29 SRPT0107 Hardware reset reason = Watchdog L1 [register value = 0x20]
edi = 1139ab34 esi = 0090006b ebp = 1139ab14 esp = 1139aae4
ebx = 1139aaec edx = 00000101 ecx = 1139ab1f eax = 0032a6f6
eflags = 00000212 pc = 01a65484
Return Address Stack:
011fc7ec (sysPhase1+25c)
0228a90c (vxTaskEntry+c)
Stack (top = 0x1139aae4):
1139aae4: 00000002 00000003 1139ab34 0090006b 1139ab14 1139aae4 1139aaec 00000101
1139ab04: 1139ab1f 0032a6f6 00000212 01a65484 1139ab48 011fc7ec 0090006b 0032a6f6
1139ab24: 00000020 00000000 00000000 00000000 4379742f 00312f6f eeeeeeee eeeeeeee
1139ab44: 0028a900 00000000 0228a90c 00000000 00000000 00000000 00000000 00000000

We can also observe a software watchdog expiry on the tLS task.

[1059] 10/08/2013 12:48:23 BUG7058 SWD: Swd watchdog timer expired on task tLS, TID: 0x10e57fc8, 1 time(s), PC=0x23d78f1, PRI=60, STATUS=DELAY
edi = 10e57fc8 esi = 0000012c ebp = 10e57f6c esp = 10e57f54
ebx = 10e57fc8 edx = 00000084 ecx = 0000002c eax = 00000000
eflags = 00000246 pc = 023d78f1
Return Address Stack:
00fa7d96 (hiPJobServer+1e6)
0228a90c (vxTaskEntry+c)
Stack (top = 0x10e57f54):
10e57f54: 05391e08 05391e20 011ead9d 05391dc8 05391e08 10e57f98 10e57f98 00fa7d96
10e57f74: 0000012c 00000000 00000000 00000000 0000003c 00534c74 00000000 00000000

It seems that the software watchdog is not kicking the hardware watchdog, and hence the hardware watchdog is resetting the system.
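To relate the two events in time, the entry headers can be pulled out of the log and compared. This is a sketch under stated assumptions: it expects one entry header per line in the form "[sequence] date time CODE message", as in the site example, and it assumes the date is day/month/year, which the report does not confirm. The function names are hypothetical.

```python
import re
from datetime import datetime

# Header shape taken from the site example: "[1021] 10/08/2013 12:44:29 SRPT0107 ..."
ENTRY = re.compile(
    r"\[(?P<seq>\d+)\]\s+(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<code>[A-Z]+\d+)\s+(?P<msg>[^\n]*)"
)


def parse_entries(text, date_format="%d/%m/%Y %H:%M:%S"):
    """Return one dict per entry header found in text.

    date_format is an assumption; pass "%m/%d/%Y %H:%M:%S" if the system
    prints month first.
    """
    entries = []
    for m in ENTRY.finditer(text):
        stamp = datetime.strptime(f"{m['date']} {m['time']}", date_format)
        entries.append(
            {"seq": int(m["seq"]), "time": stamp, "code": m["code"], "msg": m["msg"].strip()}
        )
    return entries


def seconds_between(entries, first_code, second_code):
    """Seconds from the first entry with first_code to the first with second_code."""
    first = next(e for e in entries if e["code"] == first_code)
    second = next(e for e in entries if e["code"] == second_code)
    return (second["time"] - first["time"]).total_seconds()
```

Applied to the two headers in the site example, the SRPT0107 entry (12:44:29) and the BUG7058 entry (12:48:23) are 234 seconds apart, which is consistent with the reported INI loop of roughly five minutes.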
The issue was reviewed and found to be a hardware INI caused by a hardware watchdog (HWD) timeout, which can occur when drive reads take longer than expected. The site was using a third-party CF on a CP PIV, a combination in which drive reads would take the longest and could therefore expose this bug. Patch MPLR32939 has been created to address the issue and will be included in the next service pack.
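The failure mode described here, a slow drive read delaying the watchdog kick until the hardware watchdog fires, can be illustrated with a toy timing model. This is not the Call Server's implementation: the loop structure, the function name, and the timeout value are all hypothetical, and the real HWD timeout is not stated in this report.

```python
def simulate(read_durations, hw_timeout=2.0):
    """Toy model of a loop that kicks the hardware watchdog, then does one
    blocking drive read per iteration.

    read_durations: seconds each read takes (hypothetical values).
    hw_timeout: seconds the hardware watchdog tolerates without a kick
                (hypothetical; not taken from the report).

    Returns the simulated time at which the hardware watchdog would reset
    the system, or None if every read finishes in time.
    """
    now = 0.0
    for duration in read_durations:
        last_kick = now          # kick at the top of the iteration
        now += duration          # blocked in the read; no kick is possible
        if now - last_kick > hw_timeout:
            return last_kick + hw_timeout
    return None
```

With fast reads such as `simulate([0.2, 0.3, 0.1])` the model never resets. A single slow read, as in `simulate([0.2, 3.0, 0.1])`, exceeds the timeout and the model reports a reset partway through that read.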