Hot Standby Fault Failover




Overview

Web NMS has two BE servers: a primary and a standby.

The Primary and Standby BE servers are redundant configurations designed to serve the same functionality. They both have access to the same database. When the primary server fails or is brought down, the standby server takes over the functions that were being performed by the primary. The primary server may be brought down for scheduled maintenance.

The standby server of Web NMS offers warm standby support. Any operation or request in the network during the failover period (that is, the time between the failure of the primary and the complete takeover by the standby) is lost. Critical traps and notifications sent from the agents during this period could also be missed. To avoid losing notifications, you can enable hot standby fault failover on the standby server.

For more information on the failover mechanism in Web NMS, see Failover Service in the Web NMS Developer Guide.
 

Working Mechanism

How primary server failure is best identified depends on the user environment. Hence we provide a framework where customers can plug in their own failure identification mechanism. The sooner the primary failure is identified, the shorter the failover period. We provide a default implementation for identifying primary server failure, based on TCP communication failure. This default implementation (FailOverTransmitter.java) is available under the <Web NMS Home>/default_impl/failover directory.
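
The idea of identifying failure through a TCP communication failure can be illustrated with a minimal sketch. This is not the shipped FailOverTransmitter and does not use the Web NMS API; the class name TcpFailureProbe and its methods are hypothetical, and the retry policy (declare failure only when every probe fails) is an assumption made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not part of the Web NMS API.
final class TcpFailureProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    /** Treats the primary as failed only if all of the given number of probes fail. */
    static boolean primaryFailed(String host, int port, int timeoutMillis, int attempts) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be at least 1");
        }
        for (int i = 0; i < attempts; i++) {
            if (isReachable(host, port, timeoutMillis)) {
                return false;
            }
        }
        return true;
    }
}
```

A shorter timeout and fewer attempts detect a failure sooner but make a false alarm on a congested network more likely, which is why the detection policy is left to the user environment.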

The user implementation should extend the abstract class com.adventnet.nms.util.AbstractFailOverTransmitter and provide the implementation for failure identification. The entry for the implementation should be given in the SERVER_PARAMS tag of the FailOver.xml file, as shown below.

        <SERVER_PARAMS>
           <PARAM NAME="HOTSTANDBY_MONITOR" VALUE="test.UserImplementation"/>
        </SERVER_PARAMS>

To start your own module as a hot standby process, make a PROCESS entry in the HOTSTANDBY_PROCESSES tag of the FailOver.xml file, as shown below. To be notified before and after failover, the process should extend com.adventnet.nms.util.HotStandbyListener.

        <HOTSTANDBY_PROCESSES>
           <PROCESS NAME="test.MyProcess" />
        </HOTSTANDBY_PROCESSES>
 

How to Set Up Hot Standby Fault Failover?

To enable hot standby fault failover, follow the steps given below.

Step 1

Start one server in primary mode. The FailOver.xml file contains tags for starting the server in primary or standby mode. To start the server in primary mode (as the primary server), provide the entries in FailOver.xml as shown below.

Note: These are the entries for the default implementation we have provided. If you write your own implementation for identifying primary server failure, then its class name should be given in the HOTSTANDBY_MONITOR PARAM tag.
 

<FAILOVER> 
     <PRIMARY HEART_BEAT_INTERVAL="60" /> 
     <SERVER_PARAMS>
        <PARAM NAME="HOTSTANDBY_MONITOR" VALUE="com.adventnet.nms.example.FailOverTransmitter"/>
        <PARAM NAME="MONITOR_PORT" VALUE="2014"/>
     </SERVER_PARAMS> 
</FAILOVER>

Step 2

Start the other server in standby mode (as the standby server). To start the server in standby mode with hot standby fault failover, provide the entries in FailOver.xml as shown below.

Note: These are the entries for the default implementation we have provided. If you write your own implementation for identifying primary server failure, then its class name should be given in the HOTSTANDBY_MONITOR PARAM tag.
 

<FAILOVER>
     <STANDBY FAIL_OVER_INTERVAL="60"
              RETRY_COUNT="0">
        <BACKUP ENABLED="TRUE"
                BACKUP_INTERVAL="600" />
        <SEND_EMAIL SMTP_SERVER="mail-server1"
                    TO_ADDRESS="nms-support@india.adventnet.com"
                    FROM_ADDRESS="nms-support@india.adventnet.com"
                    SUBJECT="Web NMS Primary Server Failed"
                    BODY="The Web NMS Back End Server has failed and is taken over by the HotStandBy Server"/>
        <HOTSTANDBY_PROCESSES>
           <PROCESS NAME="com.adventnet.nms.util.RunRmiRegistry"/>
           <PROCESS NAME="com.adventnet.nms.ms.NMSMServer" />
           <PROCESS NAME="com.adventnet.nms.eventdb.EventMgr" />
           <PROCESS NAME="com.adventnet.nms.eventdb.tl1.TL1EventProcess" />
        </HOTSTANDBY_PROCESSES>
     </STANDBY>
     <SERVER_PARAMS>
        <PARAM NAME="HOTSTANDBY_MONITOR" VALUE="com.adventnet.nms.example.FailOverTransmitter"/>
        <PARAM NAME="MONITOR_PORT" VALUE="2014"/>
     </SERVER_PARAMS>
</FAILOVER>

When this is done, hot standby failover support for Fault will be successfully enabled.
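
Before starting the servers, it can help to confirm that the entries in FailOver.xml are the ones you intended. The following is a minimal sketch that reads the PARAM and PROCESS entries with the standard Java XML parser. The class name FailOverConfigReader and its methods are hypothetical and are not part of Web NMS; the sketch only assumes the tag and attribute names shown in the entries above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical checker for illustration only; not part of the Web NMS API.
final class FailOverConfigReader {

    /** Parses the XML text, failing if it is not well-formed. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("FailOver.xml content is not well-formed", e);
        }
    }

    /** Collects the NAME/VALUE pairs of all PARAM elements, in document order. */
    static Map<String, String> serverParams(Document doc) {
        Map<String, String> params = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("PARAM");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element param = (Element) nodes.item(i);
            params.put(param.getAttribute("NAME"), param.getAttribute("VALUE"));
        }
        return params;
    }

    /** Collects the NAME attribute of all PROCESS elements, in document order. */
    static List<String> hotStandbyProcesses(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("PROCESS");
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("NAME"));
        }
        return names;
    }
}
```

Because the parser rejects XML that is not well-formed, a malformed entry such as a PROCESS tag with a missing space before NAME is reported immediately instead of surfacing at server startup.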

How to Show the Received Alarms in the Client During the Failover Period?

We provide a default implementation for showing, in the Web NMS Client, the alarms received during the failover period. The default implementation is available under the <Web NMS Home>/default_impl/failover directory and is explained below.

The FlashAlarmBrowser acts as a FailOverListener. When the primary server fails, it dynamically adds the FlashAlarmPanel to the tree under the Alarms view. The FlashAlarmPanel acts as an AlertListener, so it is notified whenever an alert is added or updated, and it then updates the alarms in the client List View. After the standby server has taken over, the FlashAlarmPanel is removed from the tree.
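
The lifecycle described above (panel added on primary failure, updated on each alert, removed after takeover) can be modelled with a small sketch. It does not use the real FlashAlarmBrowser, FlashAlarmPanel, FailOverListener, or AlertListener classes, whose signatures are not shown here. Every class and method name below is hypothetical; only the node name FailOverAlarms is borrowed from the PANEL-KEY used in the default configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; not the Web NMS client classes.
final class FailoverAlarmViewModel {

    private static final String NODE = "FailOverAlarms";

    // Names of the nodes currently shown under the Alarms view.
    private final List<String> treeNodes = new ArrayList<>();
    // Rows of the temporary panel (alert id -> latest severity); null when no panel is shown.
    private Map<String, String> panelRows;

    /** Failover notification: the primary has failed, so add the temporary panel. */
    void primaryFailed() {
        if (panelRows == null) {
            panelRows = new LinkedHashMap<>();
            treeNodes.add(NODE);
        }
    }

    /** Alert notification: add or update a row, but only while the panel exists. */
    void publish(String alertId, String severity) {
        if (panelRows != null) {
            panelRows.put(alertId, severity);
        }
    }

    /** Failover notification: the standby has taken over, so remove the panel. */
    void standbyTookOver() {
        panelRows = null;
        treeNodes.remove(NODE);
    }

    List<String> treeNodes() {
        return Collections.unmodifiableList(treeNodes);
    }

    Map<String, String> visibleRows() {
        return panelRows == null
                ? Collections.<String, String>emptyMap()
                : Collections.unmodifiableMap(panelRows);
    }
}
```

Keying the rows by alert identifier means an update to an existing alert replaces its row rather than adding a duplicate, which matches the "added or updated" behaviour described above.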

The following steps must be carried out on both the primary and the standby server to enable the default implementation:

Entry in Configuration Files

          <PANEL className="com.adventnet.nms.alertui.FlashAlarmBrowser" TABLE-SELECTION-BGCOLOR="153-153-153" PANEL-KEY="FailOverAlarms" />

        ENABLE_HOTSTANDBY="true"

Update the client jar

Go to the <Web NMS Home>/classes directory and execute the following commands to update NmsClientClasses.jar.

jar -uvf NmsClientClasses.jar com/adventnet/nms/alertui/*.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/alertdb/AlertAPIImpl_Stub.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/alertdb/Alert.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/alertdb/AlertActionInformer.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/alertdb/AlertAnnotation.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/alertdb/AlertListener.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/alertdb/AlertObserver.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/fault/FaultException.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/util/TimeoutException.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/util/AccessDeniedException.class
jar -uvf NmsClientClasses.jar com/adventnet/management/transaction/UserTransactionException.class
jar -uvf NmsClientClasses.jar com/adventnet/nms/admin/InvalidParameterException.class


Copyright © 1996-2004, AdventNet Inc. All Rights Reserved.