Communication Manager: Media Gateway 1 out of service.


Doc ID    SOLN236443
Version:    9.0
Status:    Published
Published date:    27 Feb 2020
Created Date:    13 Sep 2013
Author:   
Juan Gabriel Ayala Cedillo
 

Details

All Communication Manager versions

Problem Clarification

Media Gateway 1 is out of service. Unable to recover the gateway remotely.

Cause

Findings:

 

display alarms                                                         Page   1

 

                                 ALARM REPORT

 

Port      Mtce      On   Alt          Alarm   Svc   Ack? Date        Date

          Name      Brd? Name         Type    State 1 2  Alarmed     Resolved

001       MED-GTWY  n                 MINOR         y    09/10/15:09 09/10/16:00
001       MED-GTWY  n                 MINOR         y    09/10/15:14 09/10/16:00


 

display errors                                                         Page   1

 

                     HARDWARE ERROR REPORT - ACTIVE ALARMS        

 

Port      Mtce     Alt             Err   Aux    First/Last   Err Err Rt/ Al Ac

          Name     Name            Type  Data   Occurrence   Cnt Rt  Hr  St

 

001       MED-GTWY                 769   0      09/06/16:19  1   0   1   r  y
                                                09/06/16:19
001       MED-GTWY                 769   0      09/10/15:09  1   0   0   r  y
                                                09/10/15:09
001       MED-GTWY                 1     0      09/10/15:14  1   0   0   r  y
                                                09/10/15:14


Error code definitions, per the Avaya maintenance alarms manual (page 619):
Error Type 769 is a transient error, indicating that the link has unregistered with the Media Gateway. If the Media Gateway re-registers, the alarm is resolved. If the Link Loss Delay Timer (LLDT) on the primary server expires, Error Type 1 is logged. (The H.248 Link Loss Delay Timer is set to 1 minute longer than the gateway primary search timer to ensure gateway calls are left up as long as possible. This change is made on the "change system-parameters ip-options" page of Communication Manager.)
Error Type 1: failure of the H.248 link keep alive messages between the server and the Media Gateway. This is an indication that the LAN or the platform is down.
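For reference, the Link Loss Delay Timer mentioned above is administered on the ip-options form. The excerpt below is only a rough sketch for orientation; the exact page number, layout, and default values vary by Communication Manager release:

change system-parameters ip-options                             Page   2 of   x
                        IP-OPTIONS SYSTEM PARAMETERS

 H.248 MEDIA GATEWAY                         H.323 IP ENDPOINT
   Link Loss Delay Timer (min): 5              Link Loss Delay Timer (min): 5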

MG Keep Alive mechanism:

After an MG is registered, Keep Alive (KA) messages are used to check the status of the communication link between the MG and the MGC. If the link goes down, the MG needs to be "unregistered" in the MGC. The responsibility for sending the KA message lies with the MG, and it is sent only when there is no traffic on the TCP link. It is the function of the MGC to respond to the KA message. The KA message is sent as an H.248 NotifyRequest, and the MGC reply is an H.248 NotifyReply message.

The current keepalive strategy is: if there is no traffic on the TCP link, the MG sends a Notify message addressing the Root Termination ID with a keepalive event every twenty seconds.
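To make the mechanism concrete, here is a minimal Python sketch of the behavior described above. It is illustrative only and is not Avaya's implementation; the "link" object and its helper methods are hypothetical placeholders, and only the 20-second keepalive interval, the Error Type 769/1 sequence, and the LLDT behavior are taken from this article.

import time

KEEPALIVE_INTERVAL_S = 20         # MG sends a keepalive Notify after 20 s of idle time (per the text above)
LINK_LOSS_DELAY_TIMER_S = 5 * 60  # hypothetical LLDT value; administered on the ip-options form

def mg_keepalive_loop(link):
    # Illustrative MG-side loop: send H.248 NotifyRequest keepalives while the TCP link is idle.
    while link.registered:
        if link.idle_seconds() >= KEEPALIVE_INTERVAL_S:
            got_reply = link.send_notify_root_keepalive()  # expects an H.248 NotifyReply from the MGC
            if not got_reply:
                link.close_h248_socket()                   # compare "keepAliveFailed() - Close H248 socket" in the MG log
                return
        time.sleep(1)

def mgc_handle_unregister(link):
    # Illustrative MGC-side behavior: Error Type 769 on unregister, Error Type 1 if the LLDT expires.
    link.log_error(769)                                    # transient: MG has unregistered
    deadline = time.time() + LINK_LOSS_DELAY_TIMER_S
    while time.time() < deadline:
        if link.mg_reregistered():
            return                                         # alarm resolves on re-registration
        time.sleep(1)
    link.log_error(1)                                      # LLDT expired: H.248 link keepalive failure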
 

Remote access to MG 1 was performed to double-check the findings and review the MG logs.

g4501a-001(develop)# show sys
System Name             :
System Location         :
System Contact          :
Uptime (d,h:m:s)        : 200,05:07:20  >>> MG has 200 days of uptime >>>
 
MG internal logs:
0001982706 09/10-15:11:32.00 GWG-STAMAJNO-00160 0          0x27       0 keepAliveFailed() - Close H248 socket  >> Matches CM Error Type 769 (loss of registration): the MG internal log shows the H.248 socket being closed >>
0001982705 09/10-15:11:14.00 SMG-LGCINFNO-00000 0          0xd15d4a0  0xd15d470 SMG:Sanity Timeout has occured 1 of 5 TaskName:tLogTffs  >> Timeout that induced the logical loss of H.248 registration >>
0001982701 09/10-15:11:04.00 SMG-LGCINFNO-00000 0x3d0002   0xd322680  0xd322650 SMG:Sanity Timeout has occured 1 of 5 TaskName:tLogMgr >> Timeout that induced the logical loss of H.248 registration >>
0001982700 09/10-15:10:57.00 SMG-LGCINFNO-00000 0          0xf2cfe90  0xf2cfe6 SMG:Sanity Timeout has occured 1 of 5 TaskName:tstackCheck >> Timeout that induced the logical loss of H.248 registration >>
 
The MG has no internal hardware/software faults:
g4501a-001(develop)# show faults
CURRENTLY ACTIVE FAULTS
--------------------------------------------------------------------------
No Fault Messages
Current Alarm Indications, ALM LED is off
--------------------------------------------------------------------------
None
Done!
g4501a-001(develop)#

This can also be seen with:
display error
2055  Reset MG -  Pkt Send Err  20        1CC7C7A4 09/16/17:31 12/10/08:54  29
2055  Reset MG -  Pkt Send Err  1F        1CC7C474 10/17/21:36 12/10/08:51  9
 
Definition: 2055 Reset MG - Pkt Send Err: the media gateway signaling link was reset due to an error sending packets.
This indicates a network issue.

/var/log/ecs
20151210:085126685:15142881:capro(6455):MED:[ConnClosed: MG=#31 disconnected: socket closure, moved to link-bounce state, near_ipaddr = 10.0.10.16, far_ipaddr = 10.0.30.15]
20151210:085201907:15143035:capro(6455):MED:[ConnClosed: MG=#32 disconnected: socket closure, moved to link-bounce state, near_ipaddr = 10.0.10.16, far_ipaddr = 10.0.30.16]
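If the issue is active, a basic reachability check from the gateway CLI toward the CM signaling address (the near_ipaddr shown above) can help confirm whether the path is currently usable. The commands below are illustrative only; the address is taken from the log excerpts in this article and should be replaced with the actual CLAN/procr address serving the affected gateway:

g4501a-001(develop)# ping 10.0.10.16
g4501a-001(develop)# traceroute 10.0.10.16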
 
Per the CM and MG logs, the Avaya solution worked correctly; the logical connectivity path (the network) had a transient/intermittent timeout/delay that caused the MG to lose its registration.
 
If this was a one-time event, the customer could involve their data team to review, end to end, the LAN/WAN devices that provide this connectivity, in case the logs of those devices offer any clue about the root cause.
 
In my experience, the historical logs of typical data devices often do not provide information about this specific H.248 logical loss of connectivity. The data devices may need internal data traces running full time in case the issue appears again.
 
As a proactive/preventive action, in case the issue reoccurs, the customer should involve their data team to install a data capture analyzer (sniffer, Wireshark, etc.) on the CLAN/procr side where MG 1 is registered, and a second data capture analyzer on the MG 1 side where it connects to the LAN. If the issue appears again, both captures will give the customer's team the traces needed for analysis (an example capture command is sketched below). If required, the customer can open a new ticket.
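For example, a capture on a Linux host mirroring either side of the link could be started with tcpdump. This is only a sketch; the interface name and the MG address are placeholders to be replaced with the real values:

tcpdump -i eth0 -s 0 -w mg1_h248.pcap host <MG 1 IP address>

The resulting .pcap file can be opened in Wireshark, which dissects H.248/Megaco signaling (display filters "h248" or "megaco"), making it possible to see the keepalive Notify exchange and the point where the link drops.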

Solution

The issue was network induced. The Media Gateway recovered on its own after the network path to the CM core was restored by the customer's network personnel.

 

Additional Relevant Phrases

Problem with stability of the MG G450, cutting the voice stream during calls.

Avaya -- Proprietary. Use pursuant to the terms of your signed agreement or Avaya policy