Findings.
display alarms Page 1
ALARM REPORT
Port  Mtce      On    Alt   Alarm  Svc    Ack?  Date         Date
      Name      Brd?  Name  Type   State  1  2  Alarmed      Resolved
001   MED-GTWY  n           MINOR         y     09/10/15:09  09/10/16:00
001   MED-GTWY  n           MINOR         y     09/10/15:14  09/10/16:00
display errors Page 1
HARDWARE ERROR REPORT - ACTIVE ALARMS
Port  Mtce      Alt   Err   Aux   First/Last    Err  Err  Rt/  Al  Ac
      Name      Name  Type  Data  Occurrence    Cnt  Rt   Hr   St
001   MED-GTWY        769   0     09/06/16:19   1    0    1    r   y
                                  09/06/16:19
001   MED-GTWY        769   0     09/10/15:09   1    0    0    r   y
                                  09/10/15:09
001   MED-GTWY        1     0     09/10/15:14   1    0    0    r   y
                                  09/10/15:14
Error code definitions, per the attached Avaya alarms manual (page 619):
Error Type 769 is a transient error indicating that the Media Gateway has unregistered from the server. If the Media Gateway re-registers, the alarm is resolved. If the Link Loss Delay Timer (LLDT) on the primary server expires first, Error Type 1 is logged. (The H.248 Link Loss Delay Timer is set to 1 minute longer than the gateway primary search timer, so that gateway calls are left up as long as possible; this is configured on the "change system-parameters ip-options" form of Communication Manager.)
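As a minimal illustration of that sequence, the Python sketch below (illustration only, not Avaya code) models when Error Type 769 alone is logged versus when Error Type 1 follows. The LLDT value used is a hypothetical example, not a value read from this system:

# Sketch only: model when CM logs Error Type 769 vs. Error Type 1.
def cm_reaction(reregister_after_s, lldt_s=300.0):
    """Return the events CM would log for a given MG re-registration delay.
    lldt_s is a hypothetical Link Loss Delay Timer value in seconds."""
    events = ["Error Type 769: MG unregistered (transient)"]
    if reregister_after_s <= lldt_s:
        events.append("MG re-registered before LLDT expiry: 769 alarm resolves")
    else:
        events.append("Error Type 1: LLDT expired, H.248 link declared down")
    return events

print(cm_reaction(reregister_after_s=40))    # quick re-registration -> alarm resolves
print(cm_reaction(reregister_after_s=600))   # LLDT expires -> Error Type 1 logged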
Error Type 1: failure of the H.248 link keep-alive messages between the server and the Media Gateway. This indicates that the LAN or the platform is down.
MG keep-alive mechanism:
After an MG is registered, Keep Alive (KA) messages are used to check the status of the communication link between the MG and the MGC. If the link goes down, the MG needs to be unregistered in the MGC. The responsibility for sending the KA message lies with the MG, and it is sent only when there is no traffic on the TCP link. It is the function of the MGC to respond to the KA message. The KA message is sent as an H.248 NotifyRequest, and the MGC replies with an H.248 NotifyReply message.
The current keep-alive strategy is: if there is no traffic on the TCP link, the MG sends a Notify message addressed to the Root Termination ID with a keepalive event every twenty seconds.
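The Python sketch below (illustration only, not the MG firmware; the missed-reply threshold and the callable names are assumptions) shows the keep-alive loop described above and how repeated failures lead to the "keepAliveFailed() - Close H248 socket" behavior seen in the MG log further below:

import time

KA_INTERVAL_S = 20          # keep-alive interval described above
MAX_MISSED_REPLIES = 3      # hypothetical threshold, not an Avaya-documented value

def keepalive_loop(link_idle, send_notify_request, wait_for_notify_reply, close_h248_socket):
    """All four arguments are placeholder callables used for illustration only."""
    missed = 0
    while missed < MAX_MISSED_REPLIES:
        time.sleep(KA_INTERVAL_S)
        if not link_idle():
            continue                       # traffic on the TCP link: no KA needed
        send_notify_request("ROOT")        # H.248 NotifyRequest with a keepalive event
        if wait_for_notify_reply(timeout=KA_INTERVAL_S):
            missed = 0                     # MGC answered with NotifyReply: link is healthy
        else:
            missed += 1                    # no reply from the MGC
    close_h248_socket()                    # keep-alive failed: drop the H.248 link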
I accessed MG 1 remotely to double-check these findings against the MG's internal logs.
g4501a-001(develop)# show sys
System Name :
System Location :
System Contact :
Uptime (d,h:m:s) : 200,05:07:20 >>> MG has 200 days uptime >>>
MG internal logs
0001982706 09/10-15:11:32.00 GWG-STAMAJNO-00160 0 0x27 0 keepAliveFailed() - Close H248 socket >> Consistent with CM Error Type 769 (loss of registration): the MG internal log confirms the H.248 socket was closed >>
0001982705 09/10-15:11:14.00 SMG-LGCINFNO-00000 0 0xd15d4a0 0xd15d470 SMG:Sanity Timeout has occured 1 of 5 TaskName:tLogTffs >> Timeout that contributed to the logical H.248 registration loss >>
0001982701 09/10-15:11:04.00 SMG-LGCINFNO-00000 0x3d0002 0xd322680 0xd322650 SMG:Sanity Timeout has occured 1 of 5 TaskName:tLogMgr >> Timeout that contributed to the logical H.248 registration loss >>
0001982700 09/10-15:10:57.00 SMG-LGCINFNO-00000 0 0xf2cfe90 0xf2cfe6 SMG:Sanity Timeout has occured 1 of 5 TaskName:tstackCheck >> Timeout that contributed to the logical H.248 registration loss >>
The MG has no internal HW/SW faults:
g4501a-001(develop)# show faults
CURRENTLY ACTIVE FAULTS
--------------------------------------------------------------------------
No Fault Messages
Current Alarm Indications, ALM LED is off
--------------------------------------------------------------------------
None
Done!
g4501a-001(develop)#
This can also be seen with:
display errors
2055 Reset MG - Pkt Send Err 20 1CC7C7A4 09/16/17:31 12/10/08:54 29
2055 Reset MG - Pkt Send Err 1F 1CC7C474 10/17/21:36 12/10/08:51 9
Definition: 2055 Reset MG - Pkt Send Err - reset of the media gateway signaling link due to an error in sending packets.
This indicates a network issue.
/var/log/ecs
20151210:085126685:15142881:capro(6455):MED:[ConnClosed: MG=#31 disconnected: socket closure, moved to link-bounce state; near_ipaddr = 10.0.10.16, far_ipaddr = 10.0.30.15]
20151210:085201907:15143035:capro(6455):MED:[ConnClosed: MG=#32 disconnected: socket closure, moved to link-bounce state; near_ipaddr = 10.0.10.16, far_ipaddr = 10.0.30.16]
Per the CM and MG logs, the Avaya solution worked correctly; the logical connectivity path (the network) suffered a transient/intermittent timeout/delay that caused the MG to lose its registration.
If this was a one-time event, the customer could involve their data team to review, end to end, the LAN/WAN devices that provide this connectivity path, and check whether any device logs offer clues about the root cause.
In my experience, the historical logs of typical data devices often do not capture this specific H.248 logical connectivity loss; the devices may need internal traces running full time in case the issue appears again.
As a proactive/preventive action, in case the issue reoccurs, the customer should involve their data team to install a packet capture analyzer (sniffer, Wireshark, etc.) on the CLAN/procr side where MG 1 is registered, and a second capture analyzer on the MG 1 side where it connects to the LAN network. If the issue appears again, the customer's team will have captures from both ends available for analysis. If required, the customer can open a new ticket.
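As one possible way to script such a capture, the Python/scapy sketch below filters only the H.248 signaling between the server and one gateway. This is a minimal sketch under assumptions: the IP addresses are copied from the /var/log/ecs excerpt above (MG #31) and must be replaced with the actual procr/CLAN and MG 1 addresses, and the H.248 TCP port shown is an assumed default that has to be confirmed for this system:

from scapy.all import sniff, wrpcap

CM_PROCR_IP = "10.0.10.16"   # near_ipaddr from the ecs log above; substitute the real procr/CLAN address
MG1_IP = "10.0.30.15"        # far_ipaddr from the ecs log; substitute the real MG 1 address
H248_PORT = 2944             # assumed H.248/MEGACO text port; verify the port this MG actually uses

# Capture only the H.248 signaling between CM and the MG, so a link bounce
# (TCP resets, retransmissions, missing keep-alives) is visible in the trace.
bpf = "host {} and host {} and tcp port {}".format(CM_PROCR_IP, MG1_IP, H248_PORT)
packets = sniff(filter=bpf, timeout=3600)        # run until the issue reproduces
wrpcap("h248_mg1_capture.pcap", packets)         # save for offline analysis in Wireshark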