AVAYA Aura Web Gateway (AAWG): CSACassandra status is FAILED, Not running, and getting errors on web page of AAWG


Doc ID    SOLN325646
Version:    7.0
Status:    Published
Published date:    10 Oct 2023
Created Date:    24 May 2018
Author:    Charles Kuhn
 

Details

The AAWG web interface is inaccessible and the console indicates that Cassandra is not running. The device had been working fine for many weeks since installation. Rebooting and restarting services made no difference.
 

CallSignallingAgent:3.3.0.0.683 (can affect any version of AAWG)

A similar failure can occur on Avaya Aura Device Services (AADS) and Avaya Multimedia Messaging (AMM).

Problem Clarification

The state of the services running on the AAWG shows a problem with the Call Signaling Agent (CSA), specifically with Cassandra:

[admin@aawg ~]$ app status

2018-05-15_10:00:43 Displaying status for Avaya Aura Web Gateway Services Application
2018-05-15_10:00:43 ulimit file count ................... [ OK ]
2018-05-15_10:00:43 ulimit process count ................ [ OK ]
2018-05-15_10:00:44 firewalld status ..................... [ OK ]
2018-05-15_10:00:44 net-SNMP status ..................... [ OK ]
2018-05-15_10:00:44 CSAKeepalived status ................. [INACTIVE]
2018-05-15_10:00:44 CSATomcat status ..................... [ OK ]
2018-05-15_10:00:44 CSANginx status ..................... [ OK ]
2018-05-15_10:00:44 CSACassandra status ................. [FAILED]
2018-05-15_10:00:44 CSATelportal status .................. [ OK ]
 
[admin@aawg ~]$ svc csa status
Status of Avaya Aura Web Gateway Services Application
 
CSATomcat       Running         n/a             16685
CSATelportal    Running         n/a             16763
CSANginx        Running         n/a             15634
CSACassandra    Not running     Activating      n/a
 
 

From /opt/Avaya/CallSignallingAgent/3.3.0.0.683/logs/CSA_utility.log on the AAWG:

May 23 12:12:01 aawg.company.domain bash[27255]: Starting Cassandra ..................
May 23 12:12:01 aawg.company.domain bash[27255]: Cassandra failed to start because NTP is not synchronized.
May 23 12:12:01 aawg.company.domain systemd[1]: CSACassandra.service: control process exited, code=exited status=1
May 23 12:12:01 aawg.company.domain systemd[1]: Failed to start Cassandra Service (Avaya).

On an AMM, AMM_utility.log shows the same condition: the Cassandra service is unable to start because NTP is not synchronized.
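The log check above can be scripted. The following is a minimal sketch, assuming the utility log is readable by the current user; check_cassandra_ntp_failure is a hypothetical helper name, not an Avaya tool, and the message text is copied from the log excerpt above.

```shell
#!/usr/bin/env bash
# check_cassandra_ntp_failure: hypothetical helper (not an Avaya tool).
# Reports whether a utility log contains the NTP-related Cassandra failure.
# Returns 0 if the message is found, 1 if not, 2 if the log cannot be read.
check_cassandra_ntp_failure() {
    local log_file="$1"
    local hits
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file"
        return 2
    fi
    # -F treats the message as a fixed string; -c counts matching lines.
    # "|| true" keeps a zero-match result from aborting under "set -e".
    hits=$(grep -cF "Cassandra failed to start because NTP is not synchronized" "$log_file" || true)
    if [ "$hits" -gt 0 ]; then
        echo "NTP-related Cassandra start failure found ($hits occurrence(s))"
        return 0
    fi
    echo "no NTP-related Cassandra start failure found"
    return 1
}
```

On the AAWG in this article it would be called with the CSA_utility.log path shown above; on an AMM, with the path to AMM_utility.log.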

Cause

The company NTP servers configured on the AAWG were not trusted, so the AAWG was not syncing its date and time with them. Even though the date and time on the AAWG device were correct, an NTP health check takes place during boot or service startup, and if NTP is not in sync, Cassandra will not reach a running state.

 

[admin@aawg ~]$ ntpstat
unsynchronised
   polling server every 8 s
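The same check can be made from a script by looking at the exit status of ntpstat, which is 0 when the clock is synchronised, 1 when it is not, and 2 when the state cannot be determined. This is only a sketch: ntp_sync_state is a hypothetical helper name, and whether the Avaya startup check uses ntpstat in this way is an assumption, not something this article confirms.

```shell
#!/usr/bin/env bash
# ntp_sync_state: hypothetical helper that maps the ntpstat exit status
# to a short label and returns that same status to the caller.
ntp_sync_state() {
    local rc=0
    ntpstat >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "synchronised" ;;
        1) echo "unsynchronised" ;;
        *) echo "indeterminate" ;;   # 2, or ntpstat missing / ntpd not reachable
    esac
    return "$rc"
}
```

For example, `ntp_sync_state || echo "NTP is not in sync; Cassandra is not expected to start"` prints the warning whenever ntpstat reports anything other than synchronised.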

Solution

The NTP administrator needs to either fix the NTP source or try a different NTP server until the AAWG reports that it is synchronized.

 

Some basics on NTP:

 

[admin@aawg ~]$ ntpq -p
     remote           refid      st t when poll reach   delay   offset jitter
==============================================================================
 ntp1.company. 192.168.110.100 2 u   41   64 377   26.181 4200.95   1.485

 

The ntpstat command shows whether the basic criteria are met, that is, whether the clock is synchronized. If it reports unsynchronised, something is not healthy with the NTP source. The ntpq -p command lists the configured sources; if there is no * next to the remote name, the source is not trusted for some reason. The likely reason is high root dispersion.
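The check for the * marker can be sketched as a small filter over ntpq -p output. selected_ntp_peer is a hypothetical helper name; it only looks at the first character of each line, where ntpq prints the tally code.

```shell
#!/usr/bin/env bash
# selected_ntp_peer: hypothetical helper. Reads "ntpq -p" (or "ntpq -pn")
# output on stdin and prints the remote marked with '*', the source ntpd
# is currently synchronised to. Returns 1 if no source carries that mark.
selected_ntp_peer() {
    awk 'substr($0, 1, 1) == "*" { print substr($1, 2); found = 1 }
         END { exit (found ? 0 : 1) }'
}
```

A typical call would be `ntpq -pn | selected_ntp_peer || echo "no trusted NTP source selected"`. With the output shown above, where the remote line starts with a space, the fallback message is printed.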

 

[admin@aawg ~]$ ntpq -nc ass
 
ind assid status conf reach auth condition last_event cnt
===========================================================
 1 28029 9024   yes   yes none    reject   reachable 2
 
[admin@aawg ~]$ ntpq
ntpq> rv 28029
associd=28029 status=9024 conf, reach, sel_reject, 2 events, reachable,
srcadr=ntp1.company.domain, srcport=123, dstadr=172.16.24.33,
dstport=123, leap=00, stratum=2, precision=-23, rootdelay=41.397,
rootdisp=10243.896, refid=192.168.110.100,
reftime=deb177e7.966eb265 Thu, May 24 2018 12:41:59.587,
rec=deb17a6e.cd5dbf44 Thu, May 24 2018 12:52:46.802, reach=037,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=313,
flash=400 peer_dist, keyid=0, offset=1.268, delay=25.702,
dispersion=0.925, jitter=0.764, xleave=0.083,
filtdelay=    25.70   25.76   25.70   26.08   25.97   25.96   25.67   26.14,
filtoffset=   1.27    1.00    0.76    0.62    0.44    0.40    0.22    0.38,
filtdisp=      0.00    1.02    2.01    3.03    3.89    3.92    3.95    3.98

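The output above shows a root dispersion of 10243.896 ms, which is too high. In basic terms, root dispersion is the maximum error the server has accumulated relative to its primary reference clock, and it grows the longer the server goes without a good update from its own upstream source. If the root dispersion is too high, the root distance that ntpd computes for the server exceeds its distance threshold and the client will REJECT the server as a valid source. That is what condition reject, sel_reject and flash=400 peer_dist indicate here.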
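To see how far over the limit a source is, the rv output can be reduced to an approximate root distance. The sketch below uses a simplified form of the RFC 5905 root distance, (rootdelay + delay)/2 + rootdisp + dispersion + jitter, and leaves out the aging and precision terms that ntpd also adds. root_distance_ms is a hypothetical helper name, and the threshold to compare against (on the order of 1 to 1.5 seconds by default, depending on the ntpd version) should be confirmed for the installed release.

```shell
#!/usr/bin/env bash
# root_distance_ms: hypothetical helper. Reads "ntpq -c 'rv <assid>'" output
# on stdin and prints an approximate root distance in milliseconds.
# Simplified formula (assumption): aging and precision terms are omitted.
root_distance_ms() {
    tr ',' '\n' | awk -F= '
        # Each "name=value" pair is now on its own line; trim the name.
        { gsub(/^[ \t]+|[ \t]+$/, "", $1); v[$1] = $2 }
        END {
            printf "%.1f\n", (v["rootdelay"] + v["delay"]) / 2 \
                + v["rootdisp"] + v["dispersion"] + v["jitter"]
        }'
}
```

For the association above, `ntpq -c "rv 28029" | root_distance_ms` gives about 10279 ms, almost all of it from rootdisp, which is far above any default threshold.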


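There is also a way to get ntpd to essentially ignore the high root dispersion at startup: stop the service, run ntpd once with -g (allow a first correction of any size) and -q (set the clock once and exit), then start the service again. The following example corrected this problem on an AMM server: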

sudo /bin/systemctl stop ntpd
sudo ntpd -gq
sudo /bin/systemctl start ntpd

The -gq flags are useful if the customer is using a Windows time server.
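The three commands can be wrapped so that a failure in one step stops the sequence. This is a sketch only: resync_ntp and the DRY_RUN variable are hypothetical names introduced for illustration, and the commands themselves are the ones shown above.

```shell
#!/usr/bin/env bash
# resync_ntp: hypothetical wrapper around the stop / ntpd -gq / start sequence.
# Set DRY_RUN=1 to print the commands instead of running them.
resync_ntp() {
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    $run sudo /bin/systemctl stop ntpd || return 1
    # -g: allow a first correction larger than the panic threshold
    # -q: set the clock once, then exit
    $run sudo ntpd -gq || return 1
    $run sudo /bin/systemctl start ntpd || return 1
}
```

Running `DRY_RUN=1 resync_ntp` first shows what would be executed. After a real run, ntpstat and ntpq -p can be used to confirm that the source is now selected.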

If the customer does not have a good NTP server on the internal network, they can try a public NTP server address, for example 88.147.254.230. This address has already been validated on one customer system.

Add the following lines for the NTP server to /etc/ntp.conf on the AAWG server:

server 88.147.254.230 iburst maxpoll 10
restrict 88.147.254.230 mask 255.255.255.255 nomodify notrap noquery
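If the change has to be repeated on several servers, the two lines can be appended by a small function that skips addresses that are already configured. add_ntp_server is a hypothetical helper name; the sketch assumes an IPv4 address, write access to the file (root for /etc/ntp.conf), and that ntpd is restarted afterwards so the new configuration is read.

```shell
#!/usr/bin/env bash
# add_ntp_server: hypothetical helper. Appends the server and restrict lines
# for one IPv4 address to an ntp.conf-style file, unless a "server" line for
# that address is already present.
# Usage: add_ntp_server <address> [config file, default /etc/ntp.conf]
add_ntp_server() {
    local addr="$1"
    local conf="${2:-/etc/ntp.conf}"
    # Exact match on the first two fields avoids partial-address matches.
    if awk -v a="$addr" '$1 == "server" && $2 == a { found = 1 }
                         END { exit (found ? 0 : 1) }' "$conf"; then
        echo "server $addr already present in $conf"
        return 0
    fi
    printf 'server %s iburst maxpoll 10\nrestrict %s mask 255.255.255.255 nomodify notrap noquery\n' \
        "$addr" "$addr" >> "$conf"
}
```

Testing against a copy of the file first, for example `add_ntp_server 88.147.254.230 /tmp/ntp.conf.test`, avoids touching the live configuration until the result has been reviewed.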

 

 

Additional Relevant Phrases

CSACassandra status is FAILED on AAWG

Avaya -- Proprietary. Use pursuant to the terms of your signed agreement or Avaya policy