Forum Discussion

c0derx
Level 3
9 years ago

Unsuccessful cluster failover occurred because the NIC faulted

Hello, 

Platform: Solaris 11

Logs: 

Jun 23 16:36:49 nodeA Had[5211]: [ID 702911 daemon.notice] VCS ERROR V-16-1-54031 Resource csgnic (Owner: Unspecified, Group: ClusterService) is FAULTED on sys nodeA

Jun 23 16:37:49 nodeA Had[5211]: [ID 702911 daemon.notice] VCS ERROR V-16-1-54031 Resource nic_proxy_aggr1 (Owner: Unspecified, Group: oracle) is FAULTED on sys nodeA
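A small sketch for pulling the key fields out of messages like the two above, so that each fault can be lined up by time against other logs. It assumes the lines arrive in exactly the syslog-wrapped form shown (timestamp, host, Had[pid], then the V-16-1-54031 text); FAULT_RE, FaultEvent and parse_faults are illustrative names, not part of any VCS tool.

```python
import re
from dataclasses import dataclass

# Assumed layout, taken from the two log lines in this post:
# "<Mon> <DD> <HH:MM:SS> <host> Had[<pid>]: ... V-16-1-54031 Resource <res>
#  (Owner: <owner>, Group: <group>) is FAULTED on sys <system>"
FAULT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) Had\[\d+\]:.*"
    r"V-16-1-54031 Resource (?P<resource>\S+) \(Owner: [^,]*, Group: (?P<group>[^)]+)\) "
    r"is FAULTED on sys (?P<system>\S+)"
)


@dataclass
class FaultEvent:
    timestamp: str  # syslog stamp as written, e.g. "Jun 23 16:36:49" (no year)
    host: str       # host that logged the message
    resource: str   # VCS resource name
    group: str      # service group the resource belongs to
    system: str     # cluster system the fault was reported on


def parse_faults(lines):
    """Return a FaultEvent for every line matching the V-16-1-54031 message."""
    events = []
    for line in lines:
        m = FAULT_RE.search(line.strip())
        if m:
            events.append(FaultEvent(m["ts"], m["host"], m["resource"], m["group"], m["system"]))
    return events
```

Applied to the two lines above, it reports csgnic (group ClusterService) at 16:36:49 and nic_proxy_aggr1 (group oracle) one minute later, both on nodeA.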

 

Question 1)

I need more detail about the problem. I checked /var/log/messages, /var/adm/messages and the /var/fm/fmd/* files, but I can't see anything related to this error.

Which logs should be checked on a Solaris 11 system in this situation?
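One way to approach this is to pull every line within a few minutes of the fault timestamp from /var/adm/messages and any other syslog-format file. The sketch below assumes the standard "Mon DD HH:MM:SS" line prefix; because syslog omits the year, the caller supplies it. The VCS engine log (typically /var/VRTSvcs/log/engine_A.log; treat the path as an assumption) uses a different timestamp format, so this parser would skip its lines. The function names and the near_fault.py script name are hypothetical.

```python
import sys
from datetime import datetime, timedelta

SYSLOG_TS_LEN = 15  # len("Jun 23 16:36:49"); single-digit days are space-padded


def parse_syslog_ts(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    Returns None if the line does not start with such a stamp.
    """
    stamp = line[:SYSLOG_TS_LEN]
    try:
        # Prepend the year explicitly, since syslog lines do not carry one.
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_near(lines, center, minutes=5):
    """Return the lines whose syslog timestamp is within +/- `minutes` of `center`."""
    lo = center - timedelta(minutes=minutes)
    hi = center + timedelta(minutes=minutes)
    hits = []
    for line in lines:
        ts = parse_syslog_ts(line, center.year)
        if ts is not None and lo <= ts <= hi:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Usage: near_fault.py YEAR FILE [FILE ...]
    # The time below is the csgnic fault from this post; YEAR must be given.
    if len(sys.argv) >= 3:
        fault_time = datetime(int(sys.argv[1]), 6, 23, 16, 36, 49)
        for path in sys.argv[2:]:
            with open(path, errors="replace") as fh:
                for hit in lines_near(fh, fault_time):
                    print(f"{path}: {hit}")
    else:
        print("usage: near_fault.py YEAR FILE [FILE ...]", file=sys.stderr)
```

A 5-minute window is an arbitrary default; widening it helps if the underlying link problem started well before VCS noticed it.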

Question 2)

What method would you advise for investigating the NIC problem and getting more information from the platform?

Question 3) 

What configuration should I apply to handle NIC failures?
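One knob VCS offers for transient faults is the resource type attribute ToleranceLimit, which sets how many times the monitor may report OFFLINE before the resource is declared FAULTED (default 0). Whether raising it for the NIC type is appropriate here is a judgment call best confirmed with support and the VCS 6.2.1 administrator's guide. The sketch below does not talk to VCS; it only simulates that counting rule, assuming the count resets whenever a monitor cycle reports ONLINE. first_fault is a hypothetical name.

```python
from typing import Iterable, Optional


def first_fault(monitor_results: Iterable[bool], tolerance_limit: int) -> Optional[int]:
    """Return the index of the monitor cycle at which the resource would be
    declared FAULTED, or None if it never is.

    monitor_results: True means the monitor reported ONLINE, False OFFLINE.
    Assumed rule: the resource faults once the number of consecutive OFFLINE
    results exceeds tolerance_limit; an ONLINE result resets the count.
    """
    misses = 0
    for i, online in enumerate(monitor_results):
        if online:
            misses = 0
        else:
            misses += 1
            if misses > tolerance_limit:
                return i
    return None


# A single missed cycle faults the resource with the default limit of 0,
# but is tolerated with a limit of 1.
blip = [True, False, True, True]
print(first_fault(blip, 0), first_fault(blip, 1))
```

The trade-off is that a higher limit hides short blips but also delays failover by extra monitor cycles when the NIC has genuinely failed.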

  • Did you look at the system logs, such as /var/adm/messages, around the time VCS reported the resource as faulted? VCS typically marks a resource FAULTED only when there is an actual problem with the underlying resource. Have you contacted the OS vendor?

    I don't think a detailed RCA is possible in this forum, since it would require a lot of evidence and a thorough investigation. A better route would be to contact technical support with the requested evidence.

  • It is not a new setup; it is a production system. We brought the resource online and it is working fine, but sometimes I see:

    Jun 23 16:36:49 nodeA Had[5211]: [ID 702911 daemon.notice] VCS ERROR V-16-1-54031 Resource csgnic (Owner: Unspecified, Group: ClusterService) is FAULTED on sys nodeA

    Jun 23 16:37:49 nodeA Had[5211]: [ID 702911 daemon.notice] VCS ERROR V-16-1-54031 Resource nic_proxy_aggr1 (Owner: Unspecified, Group: oracle) is FAULTED on sys nodeA

    in the logs.

  • Is this a new setup or was it working earlier?

    Did you try to clear the fault and bring the resource online?

  • There is no FAILED flag in the ifconfig -a output. I can't easily copy and paste the output.

    And it only happens sometimes.

     

  • Do you see any FAILED flag in the ifconfig output? You will need to paste the entire output for us to determine that.

  • NIC name: aggr1

    OS version: SunOS nodeA 5.11 11.2 sun4v sparc sun4v

    VRTSvcs version: 6.2.1.0,REV=6.2.1.0

  • Run the following commands and give us the output:

    ifconfig -a on nodeA (since, per the logs above, the NIC faulted on nodeA)

    uname -a

    pkginfo -l VRTSvcs
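Since copying output off the console was a problem, one option is to capture the three requested commands into a single text file and transfer that. The sketch assumes Python 3.7 or later is installed on the node, which may not be the case on a stock Solaris 11.2 install; redirecting each command into a file from the shell does the same job. COMMANDS, collect and the collect_diag.py script name are hypothetical.

```python
import subprocess
import sys

# The three commands requested above.
COMMANDS = [
    ["ifconfig", "-a"],
    ["uname", "-a"],
    ["pkginfo", "-l", "VRTSvcs"],
]


def collect(commands, timeout=30):
    """Run each command and return one text report with a header per command.

    Missing commands and timeouts are recorded in the report instead of
    aborting, so partial output is still usable.
    """
    sections = []
    for cmd in commands:
        header = "### " + " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            body = proc.stdout + proc.stderr
        except FileNotFoundError:
            body = "(command not found)\n"
        except subprocess.TimeoutExpired:
            body = f"(timed out after {timeout}s)\n"
        sections.append(f"{header}\n{body}")
    return "\n".join(sections)


if __name__ == "__main__":
    # Example: python3 collect_diag.py > nodeA-diag.txt
    sys.stdout.write(collect(COMMANDS))
```

Stderr is appended to each command's stdout so that errors such as a missing package show up next to the command that produced them.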