Dackey
Level 4
5 years ago

Master Server lost connection with all the Media Servers

Hello all,

Everything was fine last week.

When I checked the backups today, the master server had lost its connection to all the media servers.

In the BAR > Host Properties > Media Server, when I try to connect to a media server I get "Socket open failed".

The connection is lost for both the Appliance and the Windows media servers.

There has been no change to the network.

When I try to open the BAR on a media server, it takes a very long time and the interface never loads completely.

I can't attach the bprd log because of the sensitivity of the environment, but the errors I found are these:

Unable to perform peer host name validation. Curl error has occurred for peer name: @IP, self name: MasterServer, nbu status = 8506, severity = 2, Additional Message: [PROXY] Encountered error (VALIDATE_PEER_HOST_PROTOCOL_RUNNING) while processing(ValidatePeerHostProtocol)., nbu status = 1, severity = 1

00:00:00.392 [12372.25432] <16> daemon_proxy_proto: vnet_proxy_socket_swap() failed: vnet status 8506, nb status 21
00:00:00.392 [12372.25432] <4> db_error_add_to_file: secure proxy protocol failed

00:02:08.580 [14876.13520] <16> vnet_proxy_protocol_from_legacy: proxy returned status: 8506 msg: {"status": 8506, "local_proxy_info": {}, "domain_constraints_set": {"process_hint": "e1ed483e-f5c0-4151-b3c5-610f74b05454", "process_hint_reason": "insecure connections are only valid for the domain of the primary master", "process_hint_server_name": 

00:02:15.627 [9888.18756] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
00:02:16.721 [20108.23712] <16> dump_proxy_info: ----process_hint_reason: insecure connections are only valid for the domain of the primary master


00:02:24.877 [8460.7528] <32> bprd: secure proxy protocol failed
00:02:24.877 [17752.23348] <16> vnet_proxy_socket_swap: vnet_proxy_protocol_from_legacy() failed: 8506
00:02:24.877 [17752.23348] <16> daemon_proxy_proto: vnet_proxy_socket_swap() failed: vnet status 8506, nb status 21
00:02:24.877 [17752.23348] <4> db_error_add_to_file: secure proxy protocol failed
00:02:24.877 [17752.23348] <32> bprd: secure proxy protocol failed

00:02:54.830 [9888.18756] <16> dump_proxy_info: ----ca_roots: e1ed483e-f5c0-4151-b3c5-610f74b05454
00:02:54.830 [9888.18756] <16> dump_proxy_info: ----ca_roots_excluded: UNCONSTRAINED
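
The excerpt above appears to follow the legacy log layout `time [pid.tid] <severity> function: message`. A minimal Python sketch for tallying the more serious messages, assuming that layout holds for every line and that larger severity numbers are more serious (both are inferred from the excerpt, not taken from product documentation). The names `LINE_RE` and `summarize` are made up for this illustration:

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt:
# HH:MM:SS.mmm [pid.tid] <severity> function: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<severity>\d+)> "
    r"(?P<func>[^:]+): (?P<msg>.*)$"
)


def summarize(lines, min_severity=16):
    """Count messages at or above min_severity, keyed by (function, message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and int(match.group("severity")) >= min_severity:
            counts[(match.group("func"), match.group("msg"))] += 1
    return counts


# Sample lines copied from the bprd excerpt in the post.
sample = [
    "00:02:24.877 [8460.7528] <32> bprd: secure proxy protocol failed",
    "00:02:24.877 [17752.23348] <16> daemon_proxy_proto: vnet_proxy_socket_swap() failed: vnet status 8506, nb status 21",
    "00:02:24.877 [17752.23348] <4> db_error_add_to_file: secure proxy protocol failed",
    "00:02:24.877 [17752.23348] <32> bprd: secure proxy protocol failed",
]

summary = summarize(sample)
for (func, msg), n in summary.most_common():
    print(n, func, msg)
```

Grouping by function and message makes it easier to see that the same "secure proxy protocol failed" error repeats across processes, without posting the full log.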

Test with the command line bptestbpcd -client DC.FQDN -debug -verbose:

PS F:\Veritas\NetBackup\bin\admincmd> .\bptestbpcd.exe -client DC.FQDN -debug -verbose
13:22:59.696 [4684.1100] <2> bptestbpcd: VERBOSE = 3
13:23:21.775 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:23:30.807 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:23:53.823 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:24:02.839 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:24:26.855 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:24:35.886 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:01.903 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:10.919 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:40.935 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:49.951 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:49.951 [4684.1100] <16> connect_to_service: connect failed STATUS (18) CONNECT_FAILED status: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA pbx status: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA vnetd
13:25:49.982 [4684.1100] <16> connect_to_service: JSON data = {"allow_large_status": {"timestamp": 1589368979, "who": "vnet_tss_init", "line_number": 32, "comment": "allow vnet status > 255", "data": true}, "direct_connect": {"timestamp": 1589368979, "who": "connect_to_service", "line_number": 838, "comment": "connect parameters", "data": {"who": "vnet_connect_to_bpcd", "host": "DC.FQDN", "service": "bpcd", "override_required_interface": null, "extra_tries_on_connect": 0, "getsock_disable_to": 0, "overide_connect_timeout": 0, "connect_options": {"server": null, "callback_kind": {"number": 1, "symbol": "NBCONF_CALLBACK_KIND_VNETD", "description": "Vnetd"}, "daemon_port_type": {"number": 0, "symbol": "NBCONF_DAEMON_PORT_TYPE_AUTOMATIC", "description": "Automatic"}, "reserved_port_kind": {"number": 0, "symbol": "NBCONF_RESERVED_PORT_KIND_LEGACY", "description": "Legacy"}}}}, "status": {"timestamp": 1589369149, "who": "connect_to_service", "line_number": 985, "comment": "vnet status", "data": 18}, "connect_recs": {"timestamp": 1589369149, "who": "vnet_tss_get", "line_number": 97, "comment": "connect rec status messages", "data": "connect failed STATUS (18) CONNECT_FAILED\n\tstatus: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO rgscdc1.wsd.tadfr.thales @IP-DC bpcd VIA pbx\n\tstatus: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA vnetd"}}
13:25:50.092 [4684.1100] <8> vnet_connect_to_bpcd: [vnet_connect.c:569] connect_to_service() failed 18 0x12
13:25:50.092 [4684.1100] <16> local_bpcr_connect: vnet_connect_to_bpcd(DC.FQDN) failed: 18
13:25:50.092 [4684.1100] <2> local_bpcr_connect: Can't connect to client DC.FQDN
13:25:50.107 [4684.1100] <2> ConnectToBPCD: bpcd_connect_and_verify(DC.FQDN, DC.FQDN) failed: 25
13:25:50.107 [4684.1100] <16> bptestbpcd main: JSON proxy message = {"allow_large_status": {"timestamp": 1589368979, "who": "vnet_tss_init", "line_number": 32, "comment": "allow vnet status > 255", "data": true}, "direct_connect": {"timestamp": 1589368979, "who": "connect_to_service", "line_number": 838, "comment": "connect parameters", "data": {"who": "vnet_connect_to_bpcd", "host": "DC.FQDN", "service": "bpcd", "override_required_interface": null, "extra_tries_on_connect": 0, "getsock_disable_to": 0, "overide_connect_timeout": 0, "connect_options": {"server": null, "callback_kind": {"number": 1, "symbol": "NBCONF_CALLBACK_KIND_VNETD", "description": "Vnetd"}, "daemon_port_type": {"number": 0, "symbol": "NBCONF_DAEMON_PORT_TYPE_AUTOMATIC", "description": "Automatic"}, "reserved_port_kind": {"number": 0, "symbol": "NBCONF_RESERVED_PORT_KIND_LEGACY", "description": "Legacy"}}}}, "status": {"timestamp": 1589369149, "who": "connect_to_service", "line_number": 985, "comment": "vnet status", "data": 18}, "connect_recs": {"timestamp": 1589369150, "who": "vnet_tss_get", "line_number": 97, "comment": "connect rec status messages", "data": "connect failed STATUS (18) CONNECT_FAILED\n\tstatus: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA pbx\n\tstatus: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA vnetd"}}
<16>bptestbpcd main: Function ConnectToBPCD(DC.FQDN) failed: 25
13:25:50.217 [4684.1100] <16> bptestbpcd main: Function ConnectToBPCD(DC.FQDN) failed: 25
<16>bptestbpcd main: cannot connect on socket
13:25:50.248 [4684.1100] <16> bptestbpcd main: cannot connect on socket
<2>bptestbpcd: cannot connect on socket
13:25:50.248 [4684.1100] <2> bptestbpcd: cannot connect on socket
<2>bptestbpcd: EXIT status = 25
13:25:50.248 [4684.1100] <2> bptestbpcd: EXIT status = 25
cannot connect on socket
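
The output above shows both connection attempts (via pbx and via vnetd) ending in Windows socket error 10060, which is a connect timeout rather than an active refusal. A rough Python sketch for checking plain TCP reachability of those two services from the master, assuming the default ports 1556 (PBX) and 13724 (vnetd) are in use; the function names `probe` and `probe_all` are made up for this illustration, and the check only exercises the network path, not NetBackup's certificate handling:

```python
import socket

# Assumed default NetBackup ports: PBX on 1556, vnetd on 13724.
NBU_PORTS = {"pbx": 1556, "vnetd": 13724}


def probe(host, port, timeout=5.0):
    """Try one TCP connect and classify the result as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout, comparable to error 10060 above.
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable networks, and so on.
        return f"error: {exc}"


def probe_all(host, ports=NBU_PORTS, timeout=5.0):
    """Probe every named port on host and return {name: result}."""
    return {name: probe(host, port, timeout) for name, port in ports.items()}
```

Calling `probe_all("DC.FQDN")` with a real media server name returns one result per service. A "timeout" on both ports would point toward a firewall or routing problem, while "open" on both would suggest looking at the NetBackup layer instead (for example certificates or host name validation).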

I'll keep searching. Thanks for your help ^^

Output of the command line nbemmcmd -listhosts -verbose:

For the MasterServer, MachineState = active for disk jobs (12)

For the others:

MachineState = not reachable by master

MachineState = not active

MachineState = not reachable by master

MachineState = not reachable by master

 

  • Did you restart nbwmc after performing the steps in the TechNote? You show one media server as 'Not Active', but the others are all shown as unreachable. What is different about that media server vs. the others?

    Either way, the certificate (either Tomcat or web service, perhaps both) has expired on your master server. You need to run the commands from the TechNote you linked on the master server, not the media servers.

    If you're in a non-English locale, look at this TechNote.

    Once you renew the certificate on the master server, you should be able to restart services on the media servers and have them reconnect. If you still have issues, run the commands below to override the existing certificate with a new one.

    On the media server, run:

    nbcertcmd -getCACertificate

    nbcertcmd -getCertificate -force

    You will then be prompted for a reissue token.

    In your Admin Console, go to Host Management > right-click the media server > Generate Reissue Token.

    Paste the reissue token into the prompt of the nbcertcmd -getCertificate -force command.

    Once the certificate is reissued, restart services on the media server and you should now see it active.
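
The media server steps above could be wrapped in a small script. The following Python sketch is only an illustration: it assumes `nbcertcmd` is on the PATH (otherwise pass its full path), the helper names `reissue_commands` and `run_reissue` are made up, and it defaults to a dry run that only prints the commands. Generating the reissue token in the Admin Console remains a manual step.

```python
import subprocess


def reissue_commands(nbcertcmd="nbcertcmd"):
    """Return the two commands from the reply, in the order they are run."""
    return [
        [nbcertcmd, "-getCACertificate"],
        [nbcertcmd, "-getCertificate", "-force"],
    ]


def run_reissue(nbcertcmd="nbcertcmd", dry_run=True):
    """Print (dry run) or execute each command; stop at the first failure.

    Returns the list of exit codes collected so far.
    """
    exit_codes = []
    for cmd in reissue_commands(nbcertcmd):
        if dry_run:
            print("would run:", " ".join(cmd))
            exit_codes.append(0)
            continue
        # stdin and stdout are inherited, so the reissue token prompt
        # from the second command stays interactive.
        completed = subprocess.run(cmd)
        exit_codes.append(completed.returncode)
        if completed.returncode != 0:
            break
    return exit_codes
```

Running `run_reissue()` prints the two commands without touching the system; `run_reissue(dry_run=False)` executes them on the media server, after which the services still need to be restarted as described above.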

    • Dackey
      Level 4

      My reply was deleted by the moderator...

      In that reply I had posted the solution I found.

      And it's exactly as you said.

      So thank you so much.

      • Marianne
        Level 6

        Dackey

        I found your post in the Spam Quarantine - this is automatically done by the spam filter.

        I tried to mark it as 'Not Spam', but received a 'Request Entity Too Large' error.

        So, no idea what you tried to post. Just bear in mind that if any kind of output is fairly big, it is best to put it in a text file and then post it as an attachment.