Avaya Aura Web Gateway (AAWG): CSACassandra status is FAILED / Not running, and the AAWG web page returns errors

Doc ID		SOLN325646
Version:		7.0
Status:		Published
Published date:		10 Oct 2023
Created Date:		24 May 2018

Author:

Charles Kuhn

Details

The AAWG web interface is inaccessible, and the console indicates that Cassandra is not running. The device had been working for many weeks since installation. Rebooting and restarting services made no difference.

CallSignallingAgent: 3.3.0.0.683 (can affect any version of AAWG)

A similar failure can occur on Device Services (AADS) and Multimedia Messaging (AMM).

Problem Clarification

The state of the services running on the AAWG shows a problem with the Call Signalling Agent (CSA), specifically Cassandra:

[admin@aawg ~]$ app status

2018-05-15_10:00:43 Displaying status for Avaya Aura Web Gateway Services Application

2018-05-15_10:00:43 ulimit file count ................... [ OK ]

2018-05-15_10:00:43 ulimit process count ................ [ OK ]

2018-05-15_10:00:44 firewalld status ..................... [ OK ]

2018-05-15_10:00:44 net-SNMP status ..................... [ OK ]

2018-05-15_10:00:44 CSAKeepalived status ................. [INACTIVE]

2018-05-15_10:00:44 CSATomcat status ..................... [ OK ]

2018-05-15_10:00:44 CSANginx status ..................... [ OK ]

2018-05-15_10:00:44 CSACassandra status ................. [FAILED]

2018-05-15_10:00:44 CSATelportal status .................. [ OK ]

[admin@aawg ~]$ svc csa status

Status of Avaya Aura Web Gateway Services Application

CSATomcat Running n/a 16685

CSATelportal Running n/a 16763

CSANginx Running n/a 15634

CSACassandra Not running Activating n/a
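
As a minimal sketch, the failed service can be picked out of the `app status` output with a small filter. The function name `failed_services` is hypothetical and not part of the AAWG tooling. It assumes the line layout shown above (timestamp, service name, then the bracketed state).

```shell
# Print the second field (the service name) of every status line
# that contains the literal marker [FAILED].
# Reads `app status` style output on stdin.
failed_services() {
    awk '/\[FAILED\]/ { print $2 }'
}
```

Usage on the AAWG would be `app status | failed_services`. For the output above it would print CSACassandra.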

From /opt/Avaya/CallSignallingAgent/3.3.0.0.683/logs/CSA_utility.log on the AAWG:

May 23 12:12:01 aawg.company.domain bash[27255]: Starting Cassandra ..................

May 23 12:12:01 aawg.company.domain bash[27255]: Cassandra failed to start because NTP is not synchronized.

May 23 12:12:01 aawg.company.domain systemd[1]: CSACassandra.service: control process exited, code=exited status=1

May 23 12:12:01 aawg.company.domain systemd[1]: Failed to start Cassandra Service (Avaya).

On an AMM, AMM_utility.log shows that the Cassandra service is unable to start because NTP is not synchronized.
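
A quick way to confirm that this is the failure in play is to search the utility log for the message shown above. The sketch below is an illustration only. The function name `cassandra_blocked_by_ntp` is hypothetical, and the message text is taken from the log excerpt in this article.

```shell
# Succeed (exit status 0) if the log text on stdin contains the message
# that appears in the utility log when Cassandra is refused because NTP
# is not synchronized; fail (exit status 1) otherwise.
cassandra_blocked_by_ntp() {
    grep -q 'Cassandra failed to start because NTP is not synchronized'
}
```

For example: `cassandra_blocked_by_ntp < /opt/Avaya/CallSignallingAgent/3.3.0.0.683/logs/CSA_utility.log && echo "NTP is blocking Cassandra"`.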

Cause

The company NTP servers configured on the AAWG were not trusted, so the AAWG was not synchronizing its date and time with them. Even though the date and time on the AAWG were correct, an NTP health check runs during boot and service startup, and if NTP is not in sync, Cassandra will not reach a running state.

[admin@aawg ~]$ ntpstat

unsynchronised

polling server every 8 s
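
The same check can be scripted. The sketch below classifies `ntpstat` output by its first line. It is not the health check the AAWG itself runs, which this article does not show, and the function name `ntp_is_synchronised` is hypothetical.

```shell
# Succeed if the first line of `ntpstat` output (read from stdin) starts
# with "synchronised"; fail for "unsynchronised" or anything else.
ntp_is_synchronised() {
    head -n 1 | grep -q '^synchronised'
}
```

Usage: `ntpstat | ntp_is_synchronised && echo "NTP in sync" || echo "NTP not in sync"`.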

Solution

The NTP administrator needs to either fix the NTP source or try a different NTP server until the AAWG reports that it is synchronized.

Some basics on NTP:

[admin@aawg ~]$ ntpq -p

remote refid st t when poll reach delay offset jitter

==============================================================================

ntp1.company. 192.168.110.100 2 u 41 64 377 26.181 4200.95 1.485

The ntpstat command shows whether the basic criteria are met, that is, whether the clock is synchronized. If it reports unsynchronised, something is not healthy with the NTP source. The ntpq -p command lists the configured sources; if there is no * next to the remote name, the source is not trusted for some reason. A likely reason is high root dispersion.
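
As an illustration, the check for a * next to the remote name can be automated with a short filter over the `ntpq -p` output. The function name `selected_peer` is hypothetical.

```shell
# Read `ntpq -p` output on stdin. For each line whose first character is
# "*" (the peer ntpd has selected), print the remote name without the "*".
# Exit status is 0 if such a line was found, 1 otherwise.
selected_peer() {
    awk '/^\*/ { sub(/^\*/, "", $1); print $1; found = 1 }
         END   { exit !found }'
}
```

Usage: `ntpq -p | selected_peer || echo "no trusted NTP source"`. For the output above, where the remote has no *, it prints nothing and returns 1.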

[admin@aawg ~]$ ntpq -nc ass

ind assid status conf reach auth condition last_event cnt

===========================================================

1 28029 9024 yes yes none reject reachable 2

[admin@aawg ~]$ ntpq

ntpq> rv 28029

associd=28029 status=9024 conf, reach, sel_reject, 2 events, reachable,

srcadr=ntp1.company.domain, srcport=123, dstadr=172.16.24.33,

dstport=123, leap=00, stratum=2, precision=-23, rootdelay=41.397,

rootdisp=10243.896, refid=192.168.110.100,

reftime=deb177e7.966eb265 Thu, May 24 2018 12:41:59.587,

rec=deb17a6e.cd5dbf44 Thu, May 24 2018 12:52:46.802, reach=037,

unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=313,

flash=400 peer_dist, keyid=0, offset=1.268, delay=25.702,

dispersion=0.925, jitter=0.764, xleave=0.083,

filtdelay= 25.70 25.76 25.70 26.08 25.97 25.96 25.67 26.14,

filtoffset= 1.27 1.00 0.76 0.62 0.44 0.40 0.22 0.38,

filtdisp= 0.00 1.02 2.01 3.03 3.89 3.92 3.95 3.98

The output above shows a root dispersion of 10243.896 ms, which is too high. In basic terms, the root dispersion is the estimated maximum error that has accumulated between the server and the primary reference clock at the root of its synchronization chain. If the root dispersion is too high, the client rejects the server as a valid source, as shown by the reject condition and flash=400 peer_dist above.
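
The sketch below pulls rootdelay and rootdisp out of the `rv` output and compares an approximate root distance, rootdelay/2 + rootdisp, against a limit. This is a simplification: ntpd's real distance calculation includes further terms. The 1500 ms default is an assumption based on ntpd's documented default `tos maxdist` of 1.5 s and should be verified for the installed version. The function name `root_distance_ok` is hypothetical.

```shell
# Read `ntpq -c "rv <assid>"` output on stdin, print the approximate root
# distance in ms (rootdelay/2 + rootdisp), and return 1 if it exceeds the
# limit in ms given as $1 (assumed default: 1500), otherwise return 0.
root_distance_ok() {
    # Split the comma/space separated name=value pairs onto separate lines.
    tr ', ' '\n\n' | awk -F= -v limit="${1:-1500}" '
        $1 == "rootdelay" { delay = $2 }
        $1 == "rootdisp"  { disp = $2 }
        END {
            dist = delay / 2 + disp
            printf "%.3f\n", dist
            exit (dist > limit)
        }'
}
```

Usage: `ntpq -c "rv 28029" | root_distance_ok || echo "source exceeds distance limit"`. With the values above (rootdelay=41.397, rootdisp=10243.896) the result is roughly 10264.6 ms, far beyond the assumed limit.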

There is also a way to get ntpd to essentially ignore the high root dispersion at startup. Here is an example that corrected this problem on an AMM server:

sudo /bin/systemctl stop ntpd
sudo ntpd -gq
sudo /bin/systemctl start ntpd

The -gq flags (-g allows a large initial offset to be corrected, -q sets the clock once and exits) are useful if the customer is using a Windows time server.
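
After restarting ntpd it can take several polling intervals before the source is selected. As a rough sketch, the loop below waits until `ntpstat` exits with status 0, which it does when the clock is synchronised. The function name `wait_for_ntp_sync` and its default attempt count and interval are illustrative assumptions, not Avaya-recommended values.

```shell
# Poll `ntpstat` until it exits 0 (clock synchronised) or attempts run out.
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 10).
# Returns 0 once synchronised, 1 if every attempt failed.
wait_for_ntp_sync() {
    attempts=${1:-30}
    interval=${2:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ntpstat >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

Usage: `wait_for_ntp_sync 30 10 && echo "NTP in sync, Cassandra can be started"`.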

If the customer does not have a good NTP server on the internal network, they can try a public NTP server address, for example 88.147.254.230. This address has already been validated on one customer's system.

Add the NTP server below to /etc/ntp.conf on the AAWG server:

server 88.147.254.230 iburst maxpoll 10

restrict 88.147.254.230 mask 255.255.255.255 nomodify notrap noquery
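
If a different server address is used, the same two lines can be generated for it. This is a minimal sketch; the function name `ntp_conf_entry` is hypothetical, and the options simply mirror the entry shown above.

```shell
# Print the server and restrict lines for the NTP server address in $1,
# using the same options as the example entry in this article.
ntp_conf_entry() {
    printf 'server %s iburst maxpoll 10\n' "$1"
    printf 'restrict %s mask 255.255.255.255 nomodify notrap noquery\n' "$1"
}
```

Usage: `ntp_conf_entry 88.147.254.230 | sudo tee -a /etc/ntp.conf`, followed by `sudo /bin/systemctl restart ntpd`.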

Additional Relevant Phrases

CSACassandra status is FAILED on AAWG

Avaya -- Proprietary. Use pursuant to the terms of your signed agreement or Avaya policy