Forum Discussion

br1
12 years ago

VCS Cluster not starting.

Hello All,
I am having difficulty getting VCS to start on this system.
I have attached what I have so far. I would appreciate any comments or suggestions on where to go from here.
Thank you

The hostnames in main.cf correspond to those of the servers.

hastatus -sum
VCS ERROR V-16-1-10600 Cannot connect to VCS engine
VCS WARNING V-16-1-11046 Local system not available

hasys -state
VCS ERROR V-16-1-10600 Cannot connect to VCS engine

hastop -all -force
VCS ERROR V-16-1-10600 Cannot connect to VCS engine

hastart   / hastart -onenode
dmesg: Exiting: Another copy of VCS may be running


engine_A.log
2013/10/22 15:16:43 VCS NOTICE V-16-1-11051 VCS engine join version=4.1000
2013/10/22 15:16:43 VCS NOTICE V-16-1-11052 VCS engine pstamp=4.1 03/03/05-14:58:00
2013/10/22 15:16:43 VCS NOTICE V-16-1-10114 Opening GAB library
2013/10/22 15:16:43 VCS NOTICE V-16-1-10619 'HAD' starting on: db1
2013/10/22 15:16:45 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2013/10/22 15:17:00 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding

#gabconfig -a
GAB Port Memberships
===============================================================

#lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
   * 0 db1          OPEN
                                  bge1   UP      00:03:BA:15
                                  bge2   UP      00:03:BA:15
     1 db2          CONNWAIT
                                  bge1   DOWN
                                  bge2   DOWN

bash-2.05$ lltconfig
LLT is running

ps -ef | grep had
    root   826     1  0 15:16:43 ?        0:00 /opt/VRTSvcs/bin/had
    root   836     1  0 15:16:45 ?        0:00 /opt/VRTSvcs/bin/hashadow
 

  • If only one of the two nodes can connect through LLT (see your lltstat -nvv output, where db1 is OPEN and db2 is in CONNWAIT with both links DOWN), the cluster will attempt to start but will wait for both nodes to be available.

    This ensures that, in a heartbeat disconnection or split-brain scenario, you do not end up with two separate clusters starting.

    If this is a known condition, you can run the command

    # gabconfig -c -x

    This overrides the number of nodes needed to seed the cluster, but it should only be run if you are certain the other node does not already have a running cluster. You should also diagnose why the other node's heartbeat links are not visible from LLT.
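Before seeding manually, it can help to confirm from the LLT output which peers are actually unreachable. The following is a minimal sketch, not an official tool: it parses text in the layout of the lltstat -nvv output posted above and lists peers that have no link UP. The function names (parse_lltstat, unreachable_peers) are hypothetical, and the parsing assumes the column layout shown in this thread. It cannot tell you whether the other node is running its own cluster, so it does not replace checking that node directly.

```python
import re

# Sample copied from the lltstat -nvv output posted earlier in this thread.
SAMPLE = """\
LLT node information:
    Node                 State    Link  Status  Address
   * 0 db1          OPEN
                                  bge1   UP      00:03:BA:15
                                  bge2   UP      00:03:BA:15
     1 db2          CONNWAIT
                                  bge1   DOWN
                                  bge2   DOWN
"""

# A node line: optional '*' (marks the local node), node id, name, state.
NODE_RE = re.compile(r"^\s*(\*)?\s*(\d+)\s+(\S+)\s+(\S+)\s*$")
# A link line: indented link name followed by UP or DOWN.
LINK_RE = re.compile(r"^\s+(\S+)\s+(UP|DOWN)\b")


def parse_lltstat(text):
    """Return a list of node dicts parsed from lltstat -nvv style text."""
    nodes = []
    for line in text.splitlines():
        m = NODE_RE.match(line)
        if m:
            nodes.append({
                "local": bool(m.group(1)),
                "id": int(m.group(2)),
                "name": m.group(3),
                "state": m.group(4),
                "links": {},
            })
            continue
        m = LINK_RE.match(line)
        if m and nodes:
            # Attach the link to the most recently seen node.
            nodes[-1]["links"][m.group(1)] = m.group(2)
    return nodes


def unreachable_peers(nodes):
    """Names of non-local nodes that have no link in the UP state."""
    return [
        n["name"]
        for n in nodes
        if not n["local"] and "UP" not in n["links"].values()
    ]


if __name__ == "__main__":
    print(unreachable_peers(parse_lltstat(SAMPLE)))
```

In practice the text would come from running lltstat -nvv on the node rather than from the embedded sample.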

     

  • That command worked; the VCS server came up online.
    There are currently network issues with the secondary server.

    Out of curiosity: hastart -onenode didn't work.

    Thank you very much.

  • From the error you posted:

    hastart   / hastart -onenode
    dmesg: Exiting: Another copy of VCS may be running

    ... unless you killed the previous instance of had (which would have started automatically at boot and was probably still running, waiting for GAB membership from the other node), it looks like hastart -onenode failed because it found that copy of had already running.

    Note the following from the man page:

    -onenode

    Use this option only to start VCS on a single system where LLT and GAB are not required. Do not use this option to start VCS on a node in a multisystem cluster.

    So you shouldn't use this option to start had in a cluster with more than one node. Use the gabconfig -c -x procedure provided by AHerr if there are known LLT/network issues. Note that you must ensure the other node is definitely down, or you may end up with problems such as a split-brain cluster.
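The "Another copy of VCS may be running" message can be checked against the process list before running hastart again. This is a rough sketch under stated assumptions: it scans text in the layout of the ps -ef output posted in the question and reports the PIDs of had and hashadow. The function name running_vcs_daemons is hypothetical, and the parser assumes the command is the eighth whitespace-separated field, which holds for the lines shown here but not when the start-time column contains a space.

```python
# Sample copied from the ps -ef output posted in the question.
SAMPLE_PS = """\
    root   826     1  0 15:16:43 ?        0:00 /opt/VRTSvcs/bin/had
    root   836     1  0 15:16:45 ?        0:00 /opt/VRTSvcs/bin/hashadow
"""

VCS_DAEMONS = ("had", "hashadow")


def running_vcs_daemons(ps_text):
    """Map daemon name to PID for had/hashadow entries in ps -ef style text."""
    found = {}
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a full ps -ef process line
        pid, command = fields[1], fields[7]
        # Compare only the executable's base name, so that a
        # 'grep had' line is not mistaken for the daemon itself.
        name = command.rsplit("/", 1)[-1]
        if name in VCS_DAEMONS:
            found[name] = int(pid)
    return found


if __name__ == "__main__":
    daemons = running_vcs_daemons(SAMPLE_PS)
    if "had" in daemons:
        print("had is already running with PID", daemons["had"])
    else:
        print("no running had found")
```

If had is listed, it is most likely the instance started at boot that is still waiting for GAB membership, which would match the engine_A.log timestamps above.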