CM5.2: "statapp" shows all processes are down on the standby server


Doc ID    SOLN265673
Version:    1.0
Status:    Published
Published date:    23 Mar 2015
Author:    pcanonce
 

Problem Clarification

CM5.2.1 / S8730

The customer had already reinstalled CM (after finding that all processes were down), but the issue persisted. The customer then tried switching to "sw" duplication, and all processes came UP after this.

In the logs, we are seeing segmentation fault / EIP errors.

20150228:113059584:287:hmm(8055):MED:[pr_sigmapbt: release R015x.02.1.016.4:drint-sp03:alawint:/usr/add-on/field_base521/cm5.2.1/SP/cm5-016_4-20102SP.pj@10/17/12 02:14:35 PM]

20150228:113059584:288:hmm(8055):MED:[pr_sigmapbt: signal=11 (Segmentation fault)]

20150228:113059584:289:hmm(8055):MED:[pr_siginfo: SIGNAL=11 (Segmentation fault)]

20150228:113059584:290:hmm(8055):MED:[   pr_siginfo: si_signo=11 si_errno=0 si_code=1 ]

20150228:113059584:291:hmm(8055):MED:[(SEGV_MAPERR - address not mapped)]

20150228:113059584:292:hmm(8055):MED:[   fault address=0x8188000]

20150228:113059584:293:hmm(8055):MED:[Registers:]

20150228:113059585:294:hmm(8055):MED:[ EIP: 0x001a5cfc      CS: 0x00000073     ESP: 0xbf9c186c   SS: 0x0000007b  UESP: 0xbf9c186c]

20150228:113059585:295:hmm(8055):MED:[ ERR: 0x00000004  EFLAGS: 0x00010216  TRAPNO: 0x0000000e  CR2: 0x08188000 (trap address)]

20150228:113059585:296:hmm(8055):MED:[ EAX: 0x00000001     EBX: 0x00d62000     ECX: 0x0004e000  EDX: 0x00139000]

20150228:113059585:297:hmm(8055):MED:[ EBP: 0xbf9c1898     ESI: 0x08188000     EDI: 0x00603000]

20150228:113059585:298:hmm(8055):MED:[  DS: 0x0000007b      ES: 0x0000007b      FS: 0x00000000   GS: 0x00000033]

20150228:113059585:299:hmm(8055):MED:[Link Map:]

20150228:113059585:300:hmm(8055):MED:[ start  : name]

20150228:113059585:301:hmm(8055):MED:[0x00000000: ]

20150228:113059585:302:hmm(8055):MED:[0x00355000: ]

20150228:113059585:303:hmm(8055):MED:[0x00000000: /opt/defty/lib/libndup.so.1]

20150228:113059585:304:hmm(8055):MED:[0x003e1000: /opt/defty/lib/liboryx.so.1]

20150228:113059585:305:hmm(8055):MED:[0x0038a000: /opt/ecs/lib/libarb.so.1]

20150228:113059585:306:hmm(8055):MED:[0x008fb000: /usr/lib/libstdc++.so.6]

20150228:113059585:307:hmm(8055):MED:[0x00110000: /lib/tls/libm.so.6]

20150228:113059585:308:hmm(8055):MED:[0x00133000: /lib/libgcc_s.so.1]

20150228:113059585:309:hmm(8055):MED:[0x0013b000: /lib/tls/libc.so.6]

20150228:113059585:310:hmm(8055):MED:[0x0051b000: /opt/ecs/lib/libinads.so.1]

20150228:113059585:311:hmm(8055):MED:[0x005fe000: /opt/defty/lib/libsync.so.1]

20150228:113059585:312:hmm(8055):MED:[0x007fe000: /lib/ld-linux.so.2]

20150228:113059585:313:hmm(8055):MED:[Stack Backtrace:]

20150228:113059585:314:hmm(8055):MED:[   Frame(0): 500073f1]

20150228:113059585:315:hmm(8055):MED:[   Frame(1): 080e1730]

20150228:113059585:316:hmm(8055):MED:[   Frame(2): 080df4e8]

20150228:113059585:317:hmm(8055):MED:[   Frame(3): 080db872]

20150228:113059585:318:hmm(8055):MED:[   Frame(4): 08068a7a]

20150228:113059585:319:hmm(8055):MED:[   Frame(5): 0806a783]

20150228:113059585:320:hmm(8055):MED:[   Frame(6): 0806ba5c]

20150228:113059585:321:hmm(8055):MED:[   Frame(7): 003f6181]

20150228:113059585:322:hmm(8055):MED:[   Frame(8): 003f1f1f]

20150228:113059585:323:hmm(8055):MED:[   Frame(9): 003f2703]

20150228:113059585:324:hmm(8055):MED:[   Frame(10): 080ded14]

20150228:113059585:325:hmm(8055):MED:[   Frame(11): 0014fe23]

20150228:113059585:326:prc_mgr(8038):HIGH:[_op_sighandler: received signal 17 (Child exited)]

20150228:113059585:327:prc_mgr(8038):HIGH:[signalAcp: sent Stopped (signal) signal to all O/P processes]

20150228:113059586:328:prc_mgr(8038):HIGH:[opdebug: called for pid 65546]

20150228:113059586:329:prc_mgr(8038):HIGH:[mcd_close: name of mini-coredump file: 2015-0228-113059.586.mcd]

20150228:113059586:330:prc_mgr(8038):HIGH:[mcd_close: sending SIGDEFGENCORE (36) to pid 6990]

20150228:113059586:331:prc_mgr(8038):HIGH:[debug done]

20150228:113059586:332:prc_mgr(8038):HIGH:[signalAcp: sent Stopped (signal) signal to all O/P processes]

20150228:113059586:333:prc_mgr(8038):HIGH:[restart_req: source software, request WARM, current escalation level 4, escalated, doing REBOOT]

20150228:113059586:334:prc_mgr(8038):HIGH:[restart_req: restart request WARM while a restart UNKNOWN is in progress]
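
When a trace like the one above runs to many thousands of lines, it can help to pull out just the crash events. The following is a minimal Python sketch, not an Avaya tool. It assumes each trace line has the layout date:time:sequence:process(pid):priority:[message], which is inferred only from the excerpt above. The names LINE_RE and summarize_crashes are hypothetical.

```python
import re

# Assumed line layout, inferred from the excerpt above (not from vendor docs):
# YYYYMMDD:HHMMSSmmm:sequence:process(pid):PRIORITY:[message]
LINE_RE = re.compile(
    r"^(?P<date>\d{8}):(?P<time>\d{9}):(?P<seq>\d+):"
    r"(?P<proc>[\w-]+)\((?P<pid>\d+)\):(?P<prio>[A-Z]+):\[(?P<msg>.*)\]\s*$"
)
FAULT_RE = re.compile(r"fault address=(0x[0-9a-fA-F]+)")


def summarize_crashes(lines):
    """Return one dict per 'signal=11' event found in the trace lines."""
    crashes = []
    current = None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # blank lines and anything not in the assumed layout
        msg = m.group("msg")
        if "pr_sigmapbt: signal=11" in msg:
            # Start of a new segmentation fault report.
            current = {
                "date": m.group("date"),
                "time": m.group("time"),
                "process": m.group("proc"),
                "pid": int(m.group("pid")),
                "fault_address": None,
                "restart": None,
            }
            crashes.append(current)
        elif current is not None:
            fault = FAULT_RE.search(msg)
            if fault:
                current["fault_address"] = fault.group(1)
            elif msg.startswith("restart_req:") and current["restart"] is None:
                # Keep the first restart request that follows the crash.
                current["restart"] = msg
    return crashes


if __name__ == "__main__":
    import sys

    # Usage: python summarize_crashes.py <trace file>
    with open(sys.argv[1], errors="replace") as trace:
        for crash in summarize_crashes(trace):
            print(crash)
```

Run against the excerpt above, this should report a single crash of hmm (pid 8055) together with its fault address and the first restart_req message that followed it. Repeated crashes at differing addresses across restarts and reinstalls would be consistent with the hardware cause given below, although the script itself cannot confirm a memory fault.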

 

Cause

Defective DAL2 memory card.

Solution

Replace the 512 DDR memory card, or replace the entire DAL2 card.


Avaya -- Proprietary. Use pursuant to the terms of your signed agreement or Avaya policy