Log Insight 2.5 - The worker node sending this alert was unable to contact the standalone node

I have identified an issue in Log Insight 2.5 where alerts sent by email or forwarded to vROps contain the following text in the message:

“Notification event – The worker node sending this alert was unable to contact the standalone node. You may receive duplicate notifications for this alert.”

I confirmed that forward DNS resolution and reverse lookups were working as expected. I was also able to reproduce the issue in a lab environment where DNS was working correctly.
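For anyone wanting to repeat the DNS check, here is a minimal sketch of a forward and reverse lookup test. The function name `check_forward_and_reverse` is made up for illustration, and it is not the exact check I ran. The resolver functions are passed in as parameters so the logic can be exercised without a live DNS server.

```python
import socket


def check_forward_and_reverse(hostname,
                              forward=socket.gethostbyname,
                              reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then resolve that IP back to a name.

    Returns (ip, reverse_name, matches), where matches is True when the
    reverse lookup (primary name or any alias) gives back the original name.
    """
    ip = forward(hostname)
    reverse_name, aliases, _addresses = reverse(ip)
    names = {reverse_name.lower()}
    names.update(alias.lower() for alias in aliases)
    return ip, reverse_name, hostname.lower() in names


# Offline demonstration with stand-in resolvers. The name and address are the
# ones from the lab configuration in this post; on a real node, drop the two
# keyword arguments to use the system resolver.
result = check_forward_and_reverse(
    "vrlimn01.spiesr.com",
    forward=lambda name: "192.168.1.33",
    reverse=lambda ip: ("vrlimn01.spiesr.com", [], [ip]),
)
print(result)
```

A `matches` value of False only means the reverse record points at a different name; it does not by itself explain the alert message.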

The following information was collected from the lab environment:

<LOGINSIGHTNODE>/storage/var/loginsight/runtime.log shows:

[2015-09-16 17:04:07.981+0000] [ScheduledQueryServiceThread/192.168.1.33 ERROR] [com.vmware.loginsight.notifications.AlertNotifier] [Failed to send alert to standalone, 2 retries remaining.]
        at com.vmware.loginsight.notifications.AlertNotifier.relayToMaster(AlertNotifier.java:181)
        at com.vmware.loginsight.notifications.AlertNotifier.sendAlertNotification(AlertNotifier.java:127)
        at com.vmware.loginsight.notifications.AlertNotifier.sendAlertNotification(AlertNotifier.java:98)
        at com.vmware.loginsight.web.background.ScheduledQueryService$ScheduledQueryServiceImpl.searchAndRaiseAlertIfNeeded(ScheduledQueryService.java:372)

[2015-09-16 17:04:07.982+0000] [ScheduledQueryServiceThread/192.168.1.33 ERROR] [com.vmware.loginsight.notifications.AlertNotifier] [Failed to send alert to standalone, 1 retries remaining.]
        at com.vmware.loginsight.notifications.AlertNotifier.relayToMaster(AlertNotifier.java:181)
        at com.vmware.loginsight.notifications.AlertNotifier.sendAlertNotification(AlertNotifier.java:127)
        at com.vmware.loginsight.notifications.AlertNotifier.sendAlertNotification(AlertNotifier.java:98)
        at com.vmware.loginsight.web.background.ScheduledQueryService$ScheduledQueryServiceImpl.searchAndRaiseAlertIfNeeded(ScheduledQueryService.java:372)

[2015-09-16 17:04:07.983+0000] [ScheduledQueryServiceThread/192.168.1.33 ERROR] [com.vmware.loginsight.notifications.AlertNotifier] [Failed to send alert to standalone, 0 retries remaining.]
        at com.vmware.loginsight.notifications.AlertNotifier.relayToMaster(AlertNotifier.java:181)
        at com.vmware.loginsight.notifications.AlertNotifier.sendAlertNotification(AlertNotifier.java:127)
        at com.vmware.loginsight.notifications.AlertNotifier.sendAlertNotification(AlertNotifier.java:98)
        at com.vmware.loginsight.web.background.ScheduledQueryService$ScheduledQueryServiceImpl.searchAndRaiseAlertIfNeeded(ScheduledQueryService.java:372)

[2015-09-16 17:04:07.984+0000] [ScheduledQueryServiceThread/192.168.1.33 INFO] [com.vmware.loginsight.notifications.AlertNotifier] [Could not connect to Master, sending alert notifications directly.]
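To see which node is logging these failures, and how often, the entries can be counted per reporting IP address. The following is a rough sketch only: the line layout is inferred from the excerpts above, and `count_relay_failures` is a hypothetical helper, not part of Log Insight.

```python
import re
from collections import Counter

# Layout inferred from the excerpts above:
# [timestamp] [thread/ip LEVEL] [class] [message]
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] "
    r"\[(?P<thread>[^/\]]+)/(?P<ip>\S+) (?P<level>[A-Z]+)\] "
    r"\[(?P<cls>[^\]]+)\] "
    r"\[(?P<msg>.*)\]$"
)

FAILURE_PREFIX = "Failed to send alert to standalone"


def count_relay_failures(lines):
    """Count relay-failure ERROR entries, keyed by the IP in the thread field."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if (match and match.group("level") == "ERROR"
                and match.group("msg").startswith(FAILURE_PREFIX)):
            counts[match.group("ip")] += 1
    return counts


# Demonstration on two of the lines quoted above; stack trace lines are ignored.
sample = [
    "[2015-09-16 17:04:07.981+0000] [ScheduledQueryServiceThread/192.168.1.33 ERROR] "
    "[com.vmware.loginsight.notifications.AlertNotifier] "
    "[Failed to send alert to standalone, 2 retries remaining.]",
    "        at com.vmware.loginsight.notifications.AlertNotifier.relayToMaster(AlertNotifier.java:181)",
    "[2015-09-16 17:04:07.984+0000] [ScheduledQueryServiceThread/192.168.1.33 INFO] "
    "[com.vmware.loginsight.notifications.AlertNotifier] "
    "[Could not connect to Master, sending alert notifications directly.]",
]
print(dict(count_relay_failures(sample)))
```

On a node, the same function could be fed the open file object for /storage/var/loginsight/runtime.log instead of the sample list.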

The original Log Insight configuration file contains:

(File is located at: /storage/core/loginsight/config/loginsight-config.xml#33)

  <distributed overwrite-children="true">
    <daemon host="vrlimn01.spiesr.com" port="16520" token="d015a445-76c0-42a4-807c-c68f1485642c">
      <service-group name="standalone" />
    </daemon>
    <daemon host="192.168.1.34" port="16520" token="f3a3d23d-8d37-4e15-a4ee-451044841cbd">
      <service-group name="workernode" />
    </daemon>
  </distributed>
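A quick way to spot which daemon entries are recorded by name rather than by address is to parse the section and test each host value. This is a sketch under assumptions: `daemons_using_hostnames` is a hypothetical helper, and it expects well-formed XML such as the fragment above.

```python
import ipaddress
import xml.etree.ElementTree as ET


def daemons_using_hostnames(xml_text):
    """Return (service_group, host) pairs whose host is not an IP literal."""
    root = ET.fromstring(xml_text)
    found = []
    for daemon in root.iter("daemon"):
        host = daemon.get("host", "")
        group = daemon.find("service-group")
        role = group.get("name") if group is not None else None
        try:
            ipaddress.ip_address(host)  # raises ValueError for names
        except ValueError:
            found.append((role, host))
    return found


# The fragment from the lab configuration file shown above.
fragment = """
<distributed overwrite-children="true">
  <daemon host="vrlimn01.spiesr.com" port="16520" token="d015a445-76c0-42a4-807c-c68f1485642c">
    <service-group name="standalone" />
  </daemon>
  <daemon host="192.168.1.34" port="16520" token="f3a3d23d-8d37-4e15-a4ee-451044841cbd">
    <service-group name="workernode" />
  </daemon>
</distributed>
"""
print(daemons_using_hostnames(fragment))
```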

When the cluster is created by joining a new worker node through the UI, the configuration file records the master (standalone) node by its FQDN. However, it looks as if there is a bug where the alert thread fails to resolve that host name, even when DNS is configured and working properly. Strangely, most of the errors in our environment are logged by the master (standalone) node itself, indicating that it was unable to contact the standalone node (itself!).

The really strange thing is that I went through all of the documentation and searched high and low on the internet for a fix, yet could not find anyone else who had documented this issue. None of the VMware PSO and GSS staff working alongside us on site had seen it before either. So, I had to go away and do some testing.

After some testing and digging, I found that changing the “standalone” node address from the FQDN to the node’s IP address makes the issue go away.

The fix:

1. Using the Admin UI, place the worker nodes in maintenance mode
2. Stop the Log Insight service on each of the worker nodes
service loginsight stop

3. Stop the Log Insight service on the master node
service loginsight stop

4. On each node, change the host entry for the standalone node in the configuration file from the FQDN to the IP address
vi /storage/core/loginsight/config/loginsight-config.xml#NN

(Where NN matches the file with the highest number)

  <distributed overwrite-children="true">
    <daemon host="192.168.1.33" port="16520" token="d015a445-76c0-42a4-807c-c68f1485642c">
      <service-group name="standalone" />
    </daemon>
    <daemon host="192.168.1.34" port="16520" token="f3a3d23d-8d37-4e15-a4ee-451044841cbd">
      <service-group name="workernode" />
    </daemon>
  </distributed>

5. Start the Log Insight service on the master node
service loginsight start

6. Start the Log Insight service on each worker node
service loginsight start

7. With the changes made to the configuration file and the services restarted, log into the Admin UI and re-apply the license key by removing it and adding it back into Log Insight
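The file selection and host change in step 4 can also be sketched in code. This is an illustration, not a supported tool: `latest_config` and `replace_daemon_host` are hypothetical helpers, the substitution works on the text in memory, and anything that writes the result back should only run with the services stopped and a backup of the file taken first.

```python
import re
from pathlib import Path


def latest_config(config_dir):
    """Return the loginsight-config.xml#NN path with the highest NN, or None."""
    best, best_n = None, -1
    for path in Path(config_dir).iterdir():
        match = re.fullmatch(r"loginsight-config\.xml#(\d+)", path.name)
        if match and int(match.group(1)) > best_n:
            best, best_n = path, int(match.group(1))
    return best


def replace_daemon_host(xml_text, old_host, new_host):
    """Swap old_host for new_host in the host attribute of <daemon> tags only.

    Returns (new_text, number_of_replacements). The rest of the file is left
    byte-for-byte unchanged.
    """
    pattern = re.compile(
        r'(<daemon\b[^>]*\bhost=")' + re.escape(old_host) + r'(")'
    )
    return pattern.subn(lambda m: m.group(1) + new_host + m.group(2), xml_text)


# Demonstration using the standalone entry from the lab configuration.
before = (
    '<daemon host="vrlimn01.spiesr.com" port="16520" '
    'token="d015a445-76c0-42a4-807c-c68f1485642c">'
)
after, changed = replace_daemon_host(before, "vrlimn01.spiesr.com", "192.168.1.33")
print(changed, after)
```

A replacement count of zero would mean the FQDN was not found in any daemon entry, in which case the file should be left alone.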

The cluster status in the UI view will show (I know I only have 2 nodes where 3 is recommended, but this was just a quick test):

Messages now arrive as:

Errors in the runtime event logs relating to the “standalone” node not being contactable no longer appear.

[UPDATE 18 September 2015 13:52 BST]:

VMware has now confirmed that this change is supported. I expect VMware to release a KB article for this issue soon.
