Forum Discussion

Nithin_S1
Level 3
11 years ago

Netbackup Restore priority problem over duplications

We are using SYMC Netbackup 7.5.0.4 and EMC Data Domain. Data from the EMC Data Domain gets duplicated to the Tape library for offsite purposes.

With just 10 drives, only 10 duplication jobs run in parallel and the others remain queued. When a restore request comes in during this time, it just sits in the queue and never gets the required priority. Even if I cancel one of the duplication jobs to release a drive, another queued duplication goes active and the restore stays queued. For some reason the restore is always given lower priority than the queued duplication jobs, which shouldn't be the case.

Any ideas? I tried suspending the SLP (nbstlutil inactive -lifecycle), but I don't want to kill/cancel all the duplications, as that's not a practical solution.

 

Regards

Nithin

  • Before canceling the duplication job, reduce the number of concurrent drives in the tape storage unit to 9 or fewer, so that once you cancel, the freed drive will not get assigned to another SLP job and is available for the restore.

    See the NetBackup Administrator's Guide, Volume II, to understand how EMM allocates resources to jobs.
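
The change described above is normally made in the Administration Console, but it can probably be scripted as well. A minimal sketch, assuming `bpsturep` is on the PATH of the master server and that its `-cj` option corresponds to the maximum concurrent write drives of a tape storage unit (worth checking against the Commands Reference Guide for your release). The label `tape_stu` and the `DRY_RUN` switch are placeholders for illustration; by default the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch: cap the number of drives a tape storage unit may use.
# Assumptions (not from the thread): bpsturep is on PATH, and -cj maps to
# "Maximum concurrent write drives" for this storage unit type.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

set_drive_limit() {
    # $1 = storage unit label (placeholder), $2 = drive limit
    case "$2" in
        ''|*[!0-9]*)
            echo "drive limit must be a non-negative integer" >&2
            return 1
            ;;
    esac
    run bpsturep -label "$1" -cj "$2"
}

set_drive_limit tape_stu 9
```

Run it once with the default dry-run setting to review the command, then with `DRY_RUN=0` to apply it, and raise the limit again the same way once the restore has finished.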

  • There is always a little confusion when high priority jobs sit queued while queued lower priority jobs appear to get precedence in resource allocation - an issue we've covered a few times on these pages.

    If resources are already in use (e.g. specific media & therefore tape drives) and there are jobs queued that can utilise those resources, then they will take precedence irrespective of the priority of any other queued job that requires some of those same resources (e.g. tape drive).

    NetBackup works this way so that it does not unnecessarily keep loading & unloading media - if queued jobs of a lower priority can utilise the loaded media then they will take precedence over a higher priority queued job that would require different media.

    Understanding the Job Priority setting on Windows
    http://www.symantec.com/business/support/index?page=content&id=HOWTO34237

    The only time I've personally seen a higher priority (restore) job 'jump in', as it were, is when the lower priority jobs required a media change at which point the higher priority job took control of the drive & the lower priority jobs waited until the drive (or another) became free.

    Do you try & mitigate this by wastefully always having one drive free just in case you get a restore request when all drives are being utilised, or do you deal with it as and when it occurs?

  • This is a known NetBackup behaviour. Keeping a tape drive writing seems to take priority over a restore request, which requires a dismount/mount operation.

    The workaround is to create a storage unit with only 9 drives in it - this reserves one tape drive for restores. There is no 1:1 relation between physical drives and storage units in NetBackup. I have done this myself.

  • Are you saying that you need to restore from the duplicate copy on tape?

    What is the retention on the DD? 
    One would think that with dedupe, you would be able to keep the backup copy on DD disk for longer, making restores from the DD a non-issue, right?

    Restore from physical tape should really only be needed in a DR environment or if you want to restore from old backups (like a year old...).

  • Thanks Nagalla, that worked:-)

     

    @Nicolai, we have 10 media servers and one drive per media server, so it would be 10 different storage units. The reason is that we back up different zones like DMZ/External, and they need media servers in the same subnet that can communicate with the clients.

    My main concern: until I upgraded to NetBackup 7.5.0.4 and started using Data Domain, restores had the highest priority. Earlier, the moment the previous job finished, the restore was automatically picked from the queue, which no longer happens.

     

    @Marianne, images are kept on the DD for only 2 weeks; if the restore request is for anything older than that, it has to come from the tapes we store offsite for DR reasons. In this case the backup was more than two weeks old, so we recalled the tape and initiated the restore. The restore just keeps waiting and only the duplications get priority.

    Nagalla's method worked, but it's definitely a workaround.

     

     

  • Two weeks seems very short for a dedupe device. Data Domain actually recommends 'more than 2 weeks' to ensure good dedupe rates.
    If this is a regular requirement (to restore backups older than 2 weeks), you may want to re-look at retention levels. Or else have a 'next working day' restore policy  for older backups...
    Purchasing another tape drive for restore purposes only may be an option if restores become a daily task.

    You may want to upgrade to NBU 7.6, where the entire SLP processing can be scheduled (and, if I remember correctly, suspended).

    Apologies - I realize that I am not addressing your priority issue - just wanted to give you additional options...

  • Thanks Andy!

    We don't have a dedicated drive for restores. I would need at least 4 if going with that option, as not all media servers and clients can communicate with each other due to firewall/security reasons.

    Ok, that clarifies why the restore was not getting priority: we had low priority duplication jobs that could make use of already mounted media.

    I can adjust the storage unit now to use "0" drives during the restore and cancel the duplication job using the drives for now.

     

     

  • Thanks Marianne, I was never aware that jobs using already mounted tapes get priority even when the job itself has low priority.

    That looks to be the case here: many of the active/queued disk -> tape jobs are set to write to the loaded tape until it's full. When you cancel one, the tape still has free space, so the jobs that can duplicate to that same tape keep getting prioritized and come out on top, per Andy's notes, while the restore stays in the queue.

    I temporarily suspended the duplications, changed the SU to use 0 drives, cancelled the active duplication and re-initiated the restore. It has now got the priority.

     

  • You can also inactivate/activate SLP processing:

    # nbstlutil inactive -wait -lifecycle name

    # nbstlutil active -wait -lifecycle name

    http://www.symantec.com/docs/HOWTO43731

    In 7.6 you can define SLP windows during which SLP operations are allowed to take place.
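
The two nbstlutil commands in the post above could be wrapped so that the SLP is always re-activated once the restore work is done. This is only a sketch: the nbstlutil arguments are copied from that post, while the SLP name `offsite_slp`, the `DRY_RUN` switch and the helper names are made up for illustration. It assumes `nbstlutil` is on the PATH and, by default, only prints what it would run.

```shell
#!/bin/sh
# Sketch: pause one SLP, run a command while it is paused, then resume it.
# The nbstlutil arguments are taken from the post above; "offsite_slp" and
# the DRY_RUN switch are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"

slp() {
    # $1 = inactive|active, $2 = SLP name
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: nbstlutil $1 -wait -lifecycle $2"
    else
        nbstlutil "$1" -wait -lifecycle "$2"
    fi
}

with_slp_paused() {
    # $1 = SLP name; remaining arguments = command to run while paused
    name="$1"
    shift
    slp inactive "$name" || return 1
    "$@"
    rc=$?
    slp active "$name"   # re-activate even if the command failed
    return "$rc"
}

with_slp_paused offsite_slp echo "restore runs here"
```

Note that, as reported at the top of this thread, suspending the SLP on its own did not free the drives: the restore only went through after the storage unit's drive count was reduced and an active duplication was cancelled.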