Forum Discussion

rookie11's avatar
rookie11
Moderator
10 years ago

vmware backup failure error code 156

Hello experts

NBU master os win2008, nbu version 7.6.0.1, vmware backups goes to NBU appliance 5220. around 40 vm machines are backedup from a policy. its details are: policy type flashbackup-windows, clients selected manually from vcenter, backup selection all local drives, perform offhost backup is ON (media server). Only 1 vm server  failed on past various attempts. 

GUI logs: 

06/03/2015 20:19:41 - Info nbjm (pid=3184) starting backup job (jobid=87915) for client TOYFIN, policy VMWAREPOLICY, schedule Full

06/03/2015 20:19:41 - Info nbjm (pid=3184) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=87915, request id:{721BEA25-7184-46DF-A972-86FC0B0FC5E5})

06/03/2015 20:19:41 - requesting resource STUNBUAPP

06/03/2015 20:19:41 - requesting resource MASTERSERVER.NBU_CLIENT.MAXJOBS.TOYFIN

06/03/2015 20:19:41 - requesting resource MASTERSERVER.NBU_POLICY.MAXJOBS.VMWAREPOLICY

06/03/2015 20:19:41 - Info nbrb (pid=2796) Limit has been reached for the logical resource MASTERSERVER.NBU_POLICY.MAXJOBS.VMWAREPOLICY

06/03/2015 20:27:24 - granted resource  MASTERSERVER.NBU_CLIENT.MAXJOBS.TOYFIN

06/03/2015 20:27:24 - granted resource  MASTERSERVER.NBU_POLICY.MAXJOBS.VMWAREPOLICY

06/03/2015 20:27:24 - granted resource  MediaID=@aaaah;DiskVolume=PureDiskVolume;DiskPool=DISK_NBUAPP;Path=PureDiskVolume;StorageServer=NBUAPP;MediaServer=NBUAPP

06/03/2015 20:27:24 - granted resource  STUNBUAPP

06/03/2015 20:27:24 - estimated 66933276 kbytes needed

06/03/2015 20:27:24 - begin Parent Job

06/03/2015 20:27:24 - begin Flash Backup Windows: Start Notify Script

06/03/2015 20:27:24 - Info RUNCMD (pid=10760) started

06/03/2015 20:27:24 - Info RUNCMD (pid=10760) exiting with status: 0

Operation Status: 0

06/03/2015 20:27:24 - end Flash Backup Windows: Start Notify Script; elapsed time 0:00:00

06/03/2015 20:27:24 - begin Flash Backup Windows: Step By Condition

Operation Status: 0

06/03/2015 20:27:24 - end Flash Backup Windows: Step By Condition; elapsed time 0:00:00

06/03/2015 20:27:24 - begin Flash Backup Windows: Read File List

Operation Status: 0

06/03/2015 20:27:24 - end Flash Backup Windows: Read File List; elapsed time 0:00:00

06/03/2015 20:27:24 - begin Flash Backup Windows: Create Snapshot

06/03/2015 20:27:26 - started process bpbrm (pid=3321)

06/03/2015 20:27:29 - Info bpbrm (pid=3321) TOYFIN is the host to backup data from

06/03/2015 20:27:29 - Info bpbrm (pid=3321) reading file list for client

06/03/2015 20:27:29 - Info bpbrm (pid=3321) start bpfis on client

06/03/2015 20:27:29 - Info bpbrm (pid=3321) Starting create snapshot processing

06/03/2015 20:27:41 - Info bpfis (pid=16960) Backup started

06/03/2015 20:27:41 - snapshot backup of client TOYFIN using method VMware

06/03/2015 20:33:45 - Critical bpbrm (pid=3321) from client TOYFIN: FTL - VMware snapshot failed: SYM_VMC_TASK_REACHED_ERROR_STATE

06/03/2015 20:33:45 - Critical bpbrm (pid=3321) from client TOYFIN: FTL - VMware error received: An error occurred while quiescing the virtual machine. See the virtual machine's event log for details.

06/03/2015 20:33:50 - Critical bpbrm (pid=3321) from client TOYFIN: FTL - snapshot processing failed, status 156

06/03/2015 20:33:50 - Critical bpbrm (pid=3321) from client TOYFIN: FTL - snapshot creation failed, status 156

06/03/2015 20:33:50 - Warning bpbrm (pid=3321) from client TOYFIN: WRN - ALL_LOCAL_DRIVES is not frozen

06/03/2015 20:33:50 - Info bpfis (pid=16960) done. status: 156

06/03/2015 20:33:50 - end Flash Backup Windows: Create Snapshot; elapsed time 0:06:26

06/03/2015 20:33:50 - Info bpfis (pid=16960) done. status: 156: snapshot error encountered

06/03/2015 20:33:50 - end writing

Operation Status: 156

06/03/2015 20:33:50 - end Parent Job; elapsed time 0:06:26

06/03/2015 20:33:50 - begin Flash Backup Windows: Stop On Error

Operation Status: 0

06/03/2015 20:33:50 - end Flash Backup Windows: Stop On Error; elapsed time 0:00:00

06/03/2015 20:33:50 - begin Flash Backup Windows: Delete Snapshot

06/03/2015 20:33:52 - started process bpbrm (pid=4079)

06/03/2015 20:33:55 - Info bpbrm (pid=4079) Starting delete snapshot processing

06/03/2015 20:34:06 - Info bpfis (pid=12592) Backup started

06/03/2015 20:34:06 - Critical bpbrm (pid=4079) from client TOYFIN: FTL - cannot open C:\Program Files\VERITAS\NetBackup\online_util\fi_cntl\bpfis.fim.TOYFIN_1433356044.1.0

06/03/2015 20:34:06 - Info bpfis (pid=12592) done. status: 1542

06/03/2015 20:34:06 - end Flash Backup Windows: Delete Snapshot; elapsed time 0:00:16

06/03/2015 20:34:06 - Info bpfis (pid=12592) done. status: 1542: An existing snapshot is no longer valid and cannot be mounted for subsequent operations

06/03/2015 20:34:06 - end writing

Operation Status: 1542

06/03/2015 20:34:06 - begin Flash Backup Windows: End Notify Script

06/03/2015 20:34:06 - Info RUNCMD (pid=10416) started

06/03/2015 20:34:07 - Info RUNCMD (pid=10416) exiting with status: 0

Operation Status: 0

06/03/2015 20:34:07 - end Flash Backup Windows: End Notify Script; elapsed time 0:00:01

Operation Status: 156

snapshot error encountered  (156)

checked bpfis logs on offhost server:

20:34:06.654 [12592.17416] <32> fis_rebuild_from_db: FTL - cannot open C:\Program Files\VERITAS\NetBackup\online_util\fi_cntl\bpfis.fim.TOYFIN_1433356044.1.0

20:34:06.654 [12592.17416] <4> bpfis main: Overriding original online_util() status <12> with <1542>.

20:34:06.670 [12592.17416] <2> bpfis Exit: Failed to delete snapreplicapair file [C:\Program Files\VERITAS\NetBackup\online_util\fi_cntl\bpfis.fim.TOYFIN_1433356044.1.0.snapreplicapair] return value[-1]                                                                             error [2] [No such file or directory]

20:34:06.670 [12592.17416] <4> keep_alive_and_snap_control: Starting keep alive thread.

 

  • Also check VMs for pending consolidation tasks.  Sometimes the vSphere 'manage snapshots' view doesn't list snapshots, but they do still exist - if you look in the VMFS/NFS datastore folders... and the only way really to see these is either browse the datastore, or change your vSpehere management GUI 'view' to also include the column which shows pending consolidation.

    If you do see pending consolidation, but no 'snapshots' in the GUI, but you do see additional snapshot related VMDK files in the datastore folders... then... do NOT be tempted to delete the snapshot files unless you are absolutely 100% certain that such 'delete' action is approrpriate... BECAUSE...

    ...vSphere/ESXi 'snapshot files' are actually 'pending write logs' - and not at all like discardable VSS delta change logs - i.e. you need to remember that vSphere snaphots contain 'writes' which have not yet been 'consolidated/merged' into the main VMDK files.

    .

    And so, sometimes the pending consolidation stops NetBackup from being able to take its (additional?)  snapshot.

  • Hi,
        This looks like VM snapshot error. Are snapshots getting created properly?
    Kindly check if there are multiple snapshots for this particular VM. If there are many created, kindly consolidate those.

  • I agree - before trying to troubleshoot NBU, you need to check logs in vCentre. Can snapshot be taken in VMware?
  • Also check VMs for pending consolidation tasks.  Sometimes the vSphere 'manage snapshots' view doesn't list snapshots, but they do still exist - if you look in the VMFS/NFS datastore folders... and the only way really to see these is either browse the datastore, or change your vSpehere management GUI 'view' to also include the column which shows pending consolidation.

    If you do see pending consolidation, but no 'snapshots' in the GUI, but you do see additional snapshot related VMDK files in the datastore folders... then... do NOT be tempted to delete the snapshot files unless you are absolutely 100% certain that such 'delete' action is approrpriate... BECAUSE...

    ...vSphere/ESXi 'snapshot files' are actually 'pending write logs' - and not at all like discardable VSS delta change logs - i.e. you need to remember that vSphere snaphots contain 'writes' which have not yet been 'consolidated/merged' into the main VMDK files.

    .

    And so, sometimes the pending consolidation stops NetBackup from being able to take its (additional?)  snapshot.

  • I'll just add another comment about what happens if one were to manually delete the 'snapshot' files...

    ...is that, if the snapshot files are manually deleted... then you have just lost/removed/deleted all of the pending writes which were not consolidated into the virtual machine... i.e. what would have been achieved is to revert the virtual machine to its 'snapshot' state.

    This is why reverting a virtual machine to a previous snapshot is so quick... because all VMware does is to delete the pending writes (which have not yet been merged/consolidated)... and so effectively, reverts a virtual machine back to the 'snapshot' position.

    Which also explains why consolidating a virtual machine which has been in a snapshot state for a long time, may take a long time to consolidate, because there may be lots of 'writes' to merge (consolidate) back in to the base virtual machine VMDK files.

    Which also explains why snapshots can appear to grow so quickly, especially if one deletes a large file (which is not actually removed from the base virtual machine) because the write which is the delete rests in the 'snapshot' VMDK file, and any other new updates or deletes or writes and/or new files created are all also resting/waiting/pending in the 'snapshot' VMDK files.

    All of which just goes to show how the term snapshot could be considered as being a bit mis-leading when trying to work out what is actually going on, under the hood.

    .

    And remember:   VSS is different.  VSS can almost be considered as being the opposite of the above.