Master Server lost connection with all Media Servers
Hello all,
Last week everything was working fine.
When I checked the backups today, I found that the master server had lost its connection to all the media servers.
In the BAR, under Host Properties > Media Servers, when I try to connect to a media server I get "Socket open failed".
The connection is lost for both the appliance and the Windows media servers.
There has been no change to the network.
When I try to connect to the BAR on a media server, it takes a very long time and the interface never loads completely.
I can't attach the bprd log because of the sensitivity of the environment, but these are the errors I found:
Unable to perform peer host name validation. Curl error has occurred for peer name: @IP, self name: MasterServer, nbu status = 8506, severity = 2, Additional Message: [PROXY] Encountered error (VALIDATE_PEER_HOST_PROTOCOL_RUNNING) while processing(ValidatePeerHostProtocol)., nbu status = 1, severity = 1
00:00:00.392 [12372.25432] <16> daemon_proxy_proto: vnet_proxy_socket_swap() failed: vnet status 8506, nb status 21
00:00:00.392 [12372.25432] <4> db_error_add_to_file: secure proxy protocol failed
00:02:08.580 [14876.13520] <16> vnet_proxy_protocol_from_legacy: proxy returned status: 8506 msg: {"status": 8506, "local_proxy_info": {}, "domain_constraints_set": {"process_hint": "e1ed483e-f5c0-4151-b3c5-610f74b05454", "process_hint_reason": "insecure connections are only valid for the domain of the primary master", "process_hint_server_name":
00:02:15.627 [9888.18756] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
00:02:16.721 [20108.23712] <16> dump_proxy_info: ----process_hint_reason: insecure connections are only valid for the domain of the primary master
00:02:24.877 [8460.7528] <32> bprd: secure proxy protocol failed
00:02:24.877 [17752.23348] <16> vnet_proxy_socket_swap: vnet_proxy_protocol_from_legacy() failed: 8506
00:02:24.877 [17752.23348] <16> daemon_proxy_proto: vnet_proxy_socket_swap() failed: vnet status 8506, nb status 21
00:02:24.877 [17752.23348] <4> db_error_add_to_file: secure proxy protocol failed
00:02:24.877 [17752.23348] <32> bprd: secure proxy protocol failed
00:02:54.830 [9888.18756] <16> dump_proxy_info: ----ca_roots: e1ed483e-f5c0-4151-b3c5-610f74b05454
00:02:54.830 [9888.18756] <16> dump_proxy_info: ----ca_roots_excluded: UNCONSTRAINED
Test with the command bptestbpcd -client DC.FQDN -debug -verbose:
PS F:\Veritas\NetBackup\bin\admincmd> .\bptestbpcd.exe -client DC.FQDN -debug -verbose
13:22:59.696 [4684.1100] <2> bptestbpcd: VERBOSE = 3
13:23:21.775 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:23:30.807 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:23:53.823 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:24:02.839 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:24:26.855 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:24:35.886 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:01.903 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:10.919 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:40.935 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:49.951 [4684.1100] <8> async_connect: [vnet_connect.c:2085] getsockopt SO_ERROR returned 10060 0x274c
13:25:49.951 [4684.1100] <16> connect_to_service: connect failed STATUS (18) CONNECT_FAILED
        status: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA pbx
        status: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA vnetd
13:25:49.982 [4684.1100] <16> connect_to_service: JSON data = {"allow_large_status": {"timestamp": 1589368979, "who": "vnet_tss_init", "line_number": 32, "comment": "allow vnet status > 255", "data": true}, "direct_connect": {"timestamp": 1589368979, "who": "connect_to_service", "line_number": 838, "comment": "connect parameters", "data": {"who": "vnet_connect_to_bpcd", "host": "DC.FQDN", "service": "bpcd", "override_required_interface": null, "extra_tries_on_connect": 0, "getsock_disable_to": 0, "overide_connect_timeout": 0, "connect_options": {"server": null, "callback_kind": {"number": 1, "symbol": "NBCONF_CALLBACK_KIND_VNETD", "description": "Vnetd"}, "daemon_port_type": {"number": 0, "symbol": "NBCONF_DAEMON_PORT_TYPE_AUTOMATIC", "description": "Automatic"}, "reserved_port_kind": {"number": 0, "symbol": "NBCONF_RESERVED_PORT_KIND_LEGACY", "description": "Legacy"}}}}, "status": {"timestamp": 1589369149, "who": "connect_to_service", "line_number": 985, "comment": "vnet status", "data": 18}, "connect_recs": {"timestamp": 1589369149, "who": "vnet_tss_get", "line_number": 97, "comment": "connect rec status messages", "data": "connect failed STATUS (18) CONNECT_FAILED\n\tstatus: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO rgscdc1.wsd.tadfr.thales @IP-DC bpcd VIA pbx\n\tstatus: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA vnetd"}}
13:25:50.092 [4684.1100] <8> vnet_connect_to_bpcd: [vnet_connect.c:569] connect_to_service() failed 18 0x12
13:25:50.092 [4684.1100] <16> local_bpcr_connect: vnet_connect_to_bpcd(DC.FQDN) failed: 18
13:25:50.092 [4684.1100] <2> local_bpcr_connect: Can't connect to client DC.FQDN
13:25:50.107 [4684.1100] <2> ConnectToBPCD: bpcd_connect_and_verify(DC.FQDN, DC.FQDN) failed: 25
13:25:50.107 [4684.1100] <16> bptestbpcd main: JSON proxy message = {"allow_large_status": {"timestamp": 1589368979, "who": "vnet_tss_init", "line_number": 32, "comment": "allow vnet status > 255", "data": true}, "direct_connect": {"timestamp": 1589368979, "who": "connect_to_service", "line_number": 838, "comment": "connect parameters", "data": {"who": "vnet_connect_to_bpcd", "host": "DC.FQDN", "service": "bpcd", "override_required_interface": null, "extra_tries_on_connect": 0, "getsock_disable_to": 0, "overide_connect_timeout": 0, "connect_options": {"server": null, "callback_kind": {"number": 1, "symbol": "NBCONF_CALLBACK_KIND_VNETD", "description": "Vnetd"}, "daemon_port_type": {"number": 0, "symbol": "NBCONF_DAEMON_PORT_TYPE_AUTOMATIC", "description": "Automatic"}, "reserved_port_kind": {"number": 0, "symbol": "NBCONF_RESERVED_PORT_KIND_LEGACY", "description": "Legacy"}}}}, "status": {"timestamp": 1589369149, "who": "connect_to_service", "line_number": 985, "comment": "vnet status", "data": 18}, "connect_recs": {"timestamp": 1589369150, "who": "vnet_tss_get", "line_number": 97, "comment": "connect rec status messages", "data": "connect failed STATUS (18) CONNECT_FAILED\n\tstatus: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA pbx\n\tstatus: FAILED, (42) CONNECT_REFUSED; system: (10060) Connection timed out.; FROM 0.0.0.0 TO DC.FQDN @IP-DC bpcd VIA vnetd"}}
<16>bptestbpcd main: Function ConnectToBPCD(DC.FQDN) failed: 25
13:25:50.217 [4684.1100] <16> bptestbpcd main: Function ConnectToBPCD(DC.FQDN) failed: 25
<16>bptestbpcd main: cannot connect on socket
13:25:50.248 [4684.1100] <16> bptestbpcd main: cannot connect on socket
<2>bptestbpcd: cannot connect on socket
13:25:50.248 [4684.1100] <2> bptestbpcd: cannot connect on socket
<2>bptestbpcd: EXIT status = 25
13:25:50.248 [4684.1100] <2> bptestbpcd: EXIT status = 25
cannot connect on socket
I'll keep searching. Thanks for your help ^^
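The repeated 10060 (connection timed out) results via both pbx and vnetd in the bptestbpcd output can be cross-checked with a plain TCP probe that does not involve NetBackup at all. The following is a minimal sketch, not a Veritas tool: it assumes the usual default ports (1556 for PBX, 13724 for vnetd, 13782 for bpcd), and the function names probe and probe_all are made up for this example.

```python
import socket

# Assumed default NetBackup ports; adjust if your site uses different ones.
NBU_PORTS = {"pbx": 1556, "vnetd": 13724, "bpcd": 13782}


def probe(host, port, timeout=5.0):
    """Try one TCP connection and describe the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: comparable to the 10060 seen in the logs.
        return "timed out"
    except OSError as exc:
        # Refused, unreachable, name resolution failure, and so on.
        return f"failed: {exc}"


def probe_all(host, ports=NBU_PORTS, timeout=5.0):
    """Probe every port in `ports` and return {service name: outcome}."""
    return {name: probe(host, port, timeout) for name, port in ports.items()}
```

Calling probe_all("DC.FQDN") from the master server would report each port as "open", "timed out", or "failed: ...". A timeout on every port usually points at something dropping packets between the two hosts, while "open" on all of them would suggest looking at the certificate and secure proxy errors instead.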
Output of the command nbemmcmd -listhosts -verbose:
For the master server: MachineState = active for disk jobs (12)
For the others:
MachineState = not reachable by master
MachineState = not active
MachineState = not reachable by master
MachineState = not reachable by master
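With many media servers, picking the affected hosts out of the verbose listing by eye gets tedious. The sketch below is a rough illustration only: it assumes the saved output contains `MachineName = "..."` and `MachineState = ...` lines for each host, and the host names in SAMPLE are hypothetical, not taken from this environment.

```python
import re

# Hypothetical excerpt of saved `nbemmcmd -listhosts -verbose` output.
SAMPLE = '''
master01
        MachineName = "master01"
        MachineState = active for disk jobs (12)
media01
        MachineName = "media01"
        MachineState = not reachable by master
media02
        MachineName = "media02"
        MachineState = not active
'''


def parse_machine_states(text):
    """Map each MachineName to the MachineState line that follows it."""
    states = {}
    name = None
    for line in text.splitlines():
        match = re.match(r"\s*(\w+)\s*=\s*(.*)$", line)
        if not match:
            continue  # host header lines and banners have no "key = value"
        key, value = match.group(1), match.group(2).strip().strip('"')
        if key == "MachineName":
            name = value
        elif key == "MachineState" and name is not None:
            states[name] = value
    return states


def problem_hosts(states):
    """Return hosts whose state does not start with 'active', sorted by name."""
    return sorted(h for h, s in states.items() if not s.startswith("active"))


print(problem_hosts(parse_machine_states(SAMPLE)))  # ['media01', 'media02']
```

The check on the "active" prefix is a deliberate simplification: both "not active" and "not reachable by master" are reported, which matches the two failure states shown above.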
Did you restart nbwmc after performing the steps in the TechNote? You show one media server as 'Not Active', but the others are all shown as unreachable. What is different about that media server vs. the others?
Either way, the certificate (either Tomcat or web service, perhaps both) has expired on your master server. You need to run the commands from the TechNote you linked on the master server, not the media servers.
If you're in a non-English locale, look at this TechNote.
Once you renew the certificate on the master server, you should be able to restart services on the media servers and have them reconnect. If you still have issues, run the commands below to override the existing certificate with a new one.
On the media server, run:
nbcertcmd -getCACertificate
nbcertcmd -getCertificate -force
You will then be prompted for a reissue token.
In the Admin Console, go to Host Management, right-click the media server, and select Generate Reissue Token.
Paste the reissue token into the prompt where you ran the nbcertcmd -getCertificate -force command.
Once the certificate is reissued, restart services on the media server and you should now see it active.