Forum Discussion

tanislavm
Level 6
10 years ago

GCO fencing

Hi,

I think VxFen fencing can only be set up within each cluster at each site; I am not aware of any way to set it up as global fencing. Suppose I have one node at site 1 and another node at site 2, and a failover group runs on either node1 or node2. If the heartbeat link between node1 and node2 fails, a split brain occurs. How is this resolved so that data is not corrupted?

In my opinion, the advantage of GCO over classic disaster recovery is the speed of recovery: the failover groups fail over quickly to the other site if a site goes down. Is that right?

Thanks a lot.

  • Hi,

    The scope of VxFen is limited to nodes within the same cluster; it cannot be used across sites/clusters in GCO. In this case, node1 and node2 belong to two different sites/clusters, so VxFen cannot be used if communication between node1 and node2 fails. To address split-brain between clusters in GCO, VCS uses the Steward process.

     

     

    Steward process to minimize chances of a wide-area split-brain

    Failure of all heartbeats between any two clusters in a global cluster indicates one of the following:

    1. The remote cluster is faulted.
    2. All communication links between the two clusters are broken.

    In a two-cluster setup, VCS uses the Steward process to minimize chances of a wide-area split-brain. The process runs as a standalone binary on a system outside of the global cluster configuration. The figure depicts the Steward process in a two-cluster setup.

    [Figure: Steward_0.jpg]

    When all communication links between any two clusters are lost, each cluster contacts the Steward with an inquiry message. The Steward sends an ICMP ping to the cluster in question and responds with a negative inquiry if the cluster is running, or with a positive inquiry if the cluster is down. The Steward can also be used in configurations with more than two clusters.

    More details about GCO configuration and the Steward process's configuration are available at: https://sort.symantec.com/public/documents/sf/5.1/aix/html/vcs_admin/ch_vcs_globalcluster22.html

     

     

    Thanks & Regards,

    Sunil Y
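
The Steward inquiry described in the reply above can be sketched as a small decision model. This is only an illustration of the reasoning, not VCS code: the names `InquiryReply`, `steward_reply`, and `diagnose` are hypothetical, and the "unknown" outcome for an unreachable Steward is an assumption added to make the model complete.

```python
from enum import Enum


class InquiryReply(Enum):
    """Steward's answer to an inquiry about the remote cluster."""
    NEGATIVE = "remote cluster is running"  # ICMP ping succeeded
    POSITIVE = "remote cluster is down"     # ICMP ping failed


def steward_reply(ping_succeeds: bool) -> InquiryReply:
    """Model the Steward pinging the cluster in question and answering."""
    return InquiryReply.NEGATIVE if ping_succeeds else InquiryReply.POSITIVE


def diagnose(heartbeats_up: bool, steward_reachable: bool,
             ping_succeeds: bool) -> str:
    """What the local cluster can conclude about the remote cluster."""
    if heartbeats_up:
        # Inter-cluster heartbeats are fine, so no inquiry is needed.
        return "remote cluster running"
    if not steward_reachable:
        # Assumption: without the Steward, a remote fault and a broken
        # link look identical from the local cluster's point of view.
        return "unknown"
    if steward_reply(ping_succeeds) is InquiryReply.POSITIVE:
        return "remote cluster faulted"
    return "links broken"


if __name__ == "__main__":
    # All heartbeats lost, but the Steward can still ping the remote side.
    print(diagnose(heartbeats_up=False, steward_reachable=True,
                   ping_succeeds=True))
```

The model also shows why the Steward needs a network path that is independent of the inter-cluster links: if it shares their fate, the local cluster ends up in the "unknown" branch.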

  • GCO is usually configured for manually initiated failover, but it is still quicker because VCS does all the work; you just confirm that VCS should go ahead. You can configure automatic failover if you use a Steward, but I wouldn't recommend it unless you really have an independent link to your Steward, because if you lose the connection between the Prod and DR sites you will often lose the connection to your Steward as well.

    Configuring ClusterFailOverPolicy as "Connected" is another option: if a service group fails on all systems in its system list but at least one node in the cluster is still up, failover is automatic, and because a node is up, you know split-brain is not an issue.

    So, for example, if you configure the clusters as in your other post, where failoversg runs on nodea in cluster1 and fails over to nodec in cluster2, then if nodea fails and nodeb in cluster1 is up, nodeb knows nodea is actually down (you should have at least 2 independent LLT links between nodea and nodeb, and you could configure local fencing as well). With ClusterFailOverPolicy set to "Connected", failoversg will then automatically fail over to nodec in cluster2.

    Mike
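
Mike's description of ClusterFailOverPolicy can be sketched as a decision function. This is a simplified model, not VCS code: the function `remote_takeover` and its return strings are hypothetical, and the behaviour shown for the "Auto" value (automatic failover even when the whole local cluster is down) is an assumption that should be checked against the VCS documentation for your release.

```python
VALID_POLICIES = ("Manual", "Connected", "Auto")


def remote_takeover(policy: str, local_cluster_up: bool,
                    group_can_run_locally: bool) -> str:
    """Decide what happens to a global service group after a fault.

    local_cluster_up: at least one node of the owning cluster is still
        running and talking to the remote cluster.
    group_can_run_locally: some system in the group's system list in the
        owning cluster can still host the group.
    """
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown ClusterFailOverPolicy: {policy!r}")
    if local_cluster_up and group_can_run_locally:
        # Normal local failover; the remote cluster is not involved.
        return "stay local"
    if policy == "Manual":
        return "wait for administrator"
    if policy == "Connected":
        # Automatic only while the owning cluster is still alive to
        # confirm the group is really down there (no split-brain risk).
        if local_cluster_up:
            return "automatic failover"
        return "wait for administrator"
    # Assumption: "Auto" also fails over when the owning cluster is gone.
    return "automatic failover"


if __name__ == "__main__":
    # Mike's example: nodea is down, nodeb is up, and failoversg has no
    # other system to run on in cluster1.
    print(remote_takeover("Connected", local_cluster_up=True,
                          group_can_run_locally=False))
```

In this model, "Connected" never acts on a silent remote site, which is why it avoids the wide-area split-brain case without depending on a Steward.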

     

  • Hi,

    I would also like any comments on the advantages and disadvantages of GCO over disaster recovery. Thanks a lot.

  • Not sure what you are asking.... GCO is used for disaster recovery.
  • Hi, classic DR is when we have 2 sites far apart: one site is production and the other is only on standby.

    GCO, as I see it, uses both sites as production, so if one site goes down, I think the service groups fail over faster than in the classic setup described above.

  • At least 2 VCS clusters together form a global cluster using the VCS Global Cluster Option, popularly known as GCO. GCO is a means to achieve Disaster Recovery (DR), where one cluster is the Production/Primary cluster and the other clusters act as DR/Standby. In case of a service group or cluster fault, the service group fails over from Production/Primary to DR/Standby.

     

    GCO and DR are essentially the same thing: GCO is a means to achieve Disaster Recovery (DR).

     

    The observation about faster failover doesn't seem factual.

     

    Thanks & Regards,
    Sunil Y