Forum Discussion

backup-botw
10 years ago

Slow Duplication Jobs Which Then Slow Other Jobs Down

Somewhat tough to explain what I am seeing, but basically, on the first weekend of each month we run what we call a Monthly Full. This is where our long-term retention requirements are met: we run full backups, replicate the data via SLP, and then duplicate those backups to tape via SLP, where they are retained for 1 to 7 years depending on requirements. Then, every Monday after that monthly full weekend, we see hundreds of jobs queued up, including the incrementals that should be running, and they start failing with status 196 (client backup was not attempted because backup window closed) because they have been queued for so long. Those backups are written to disk. This is a busy time, since there is obviously a lot of reading from and writing to that disk, but the replication SLP jobs still run quickly and without issue. I think what we are seeing is an issue with the duplication jobs. I have 4 dedicated LTO-6 drives that duplicate this data to tape, so we aren't fighting for resources between the backups and the duplications.

Not sure where to begin with this one, so I am all ears and looking everywhere. I have noticed that we don't have any lifecycle parameters set at all:

/usr/openv/netbackup/db/config ->more LIFECYCLE_PARAMETERS

#
# Beginning with the NetBackup 7.6 release, Storage Lifecycle configuration
# values are now stored as part of the NetBackup system configuration data
# and can be viewed or changed using the Storage Lifecycle Parameters node
# under Host Properties in the Administration Console or via the bpsetconfig
# and bpgetconfig commands. The prior contents of this file have been automatically
# migrated to the NetBackup system configuration storage. That prior content
# can be viewed in the LIFECYCLE_PARAMETERS.deprecated file for historical purposes.
#
# When modifying values via the bpsetconfig command, be aware that all
# Storage Lifecycle parameter names are now prepended by 'SLP.'. In
# addition, the following parameter names have been changed:
#
# CLEANUP_SESSION_INTERVAL_HOURS              is now SLP.CLEANUP_SESSION_INTERVAL
# IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS        is now SLP.IMAGE_EXTENDED_RETRY_PERIOD
# MAX_KB_SIZE_PER_DUPLICATION_JOB             is now SLP.MAX_SIZE_PER_DUPLICATION_JOB
# MAX_GB_SIZE_PER_DUPLICATION_JOB             is now SLP.MAX_SIZE_PER_DUPLICATION_JOB
# MAX_GB_SIZE_PER_BACKUP_REPLICATION_JOB      is now SLP.MAX_SIZE_PER_BACKUP_REPLICATION_JOB
# MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB is now SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB
# MIN_KB_SIZE_PER_DUPLICATION_JOB             is now SLP.MIN_SIZE_PER_DUPLICATION_JOB
# MIN_GB_SIZE_PER_DUPLICATION_JOB             is now SLP.MIN_SIZE_PER_DUPLICATION_JOB
# VERSION_CLEANUP_DELAY_HOURS                 is now SLP.VERSION_CLEANUP_DELAY
#
# Size and time values are now specified using units like 'minutes' or 'gigabytes'.
#
# The following parameters have been deprecated due to changes in SLP processing
# and are no longer recognized:
#
# = DUPLICATION_SESSION_INTERVAL_MINUTES
# = IMPORT_SESSION_TIMER
#
# See the NetBackup Adminstrator's Guide, Volume 1 for more information.
#
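
Since the header says every key now needs the SLP. prefix and lists the renamed and dropped names, a small pre-flight check on a bpsetconfig input file can catch old-style names before they are applied. This is a minimal sketch: the function name check_slp_file is hypothetical, and it only tests the two rules spelled out in the header above (prefix present, no pre-7.6 or deprecated name); it does not validate values or units.

```shell
# Hypothetical helper: flag lines in a bpsetconfig input file that break the
# two 7.6 naming rules quoted in the LIFECYCLE_PARAMETERS header.
# Prints one warning per problem and returns 1 if any were found.
check_slp_file() {
  awk '
    BEGIN {
      # Pre-7.6 names that were renamed, plus the two deprecated ones.
      s = "CLEANUP_SESSION_INTERVAL_HOURS IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS"
      s = s " MAX_KB_SIZE_PER_DUPLICATION_JOB MAX_GB_SIZE_PER_DUPLICATION_JOB"
      s = s " MAX_GB_SIZE_PER_BACKUP_REPLICATION_JOB"
      s = s " MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB"
      s = s " MIN_KB_SIZE_PER_DUPLICATION_JOB MIN_GB_SIZE_PER_DUPLICATION_JOB"
      s = s " VERSION_CLEANUP_DELAY_HOURS"
      s = s " DUPLICATION_SESSION_INTERVAL_MINUTES IMPORT_SESSION_TIMER"
      n = split(s, names, " ")
      for (i = 1; i <= n; i++) old[names[i]] = 1
    }
    # Skip blank lines and comments.
    NF == 0 || $1 ~ /^#/ { next }
    {
      # The key is everything before the first "=", with whitespace removed.
      split($0, kv, "=")
      key = kv[1]
      gsub(/[ \t]+/, "", key)
      if (key !~ /^SLP\./) {
        printf "line %d: %s is missing the SLP. prefix\n", NR, key
        bad = 1
      }
      sub(/^SLP\./, "", key)
      if (key in old) {
        printf "line %d: %s is a pre-7.6 or deprecated name\n", NR, key
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example, check_slp_file /usr/openv/var/global/nbcl.conf.big should print nothing and return 0 for the nbcl.conf.big profile posted later in this thread.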

I also noticed that our OFFSITE SLPs, which do the duplication, are not set to preserve multiplexing...

[Screenshot: mpx.png]

This is my current SLP backlog...

Backlog of incomplete SLP Copies
        In Process (Storage Lifecycle State: 2):
                Number of copies:       974
                Total expected size     114655594 MB

SLP Name: (state)                 Number of copies:         Size:
BR_OFFSITE_P09 (active)                          14        861128 MB
XP53TAPE008_SLP (active)                          9        758711 MB
XP53TAPE008_SLP_OFFSITE (active)                394      29658783 MB
XP53TAPE009_SLP (active)                         20       4380659 MB
XP53TAPE009_SLP_OFFSITE (active)                227      12631405 MB
XP53TAPE010_SLP (active)                         18       8309785 MB
XP53TAPE010_SLP_OFFSITE (active)                292      58055121 MB

Total:                                          974     114655592 MB

The _OFFSITE ones are the ones doing the duplication operations.
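
To put numbers on that backlog, here is a rough sketch that totals the report by SLP type and estimates a best-case drain time for the tape copies. Assumptions: report rows look like "NAME (state) COPIES SIZE MB", as above; only SLPs whose names end in _OFFSITE count as tape duplication (so BR_OFFSITE_P09 lands under "other"; widen the pattern if it duplicates to tape too); and every drive streams at a constant rate, defaulting to 160 MB/s, the LTO-6 native (uncompressed) rate. backlog_summary is a hypothetical name.

```shell
# Hypothetical helper: summarize an SLP backlog report read on stdin.
# Usage: backlog_summary [DRIVES] [MB_PER_SEC_PER_DRIVE]
# Assumes rows of the form "NAME (state) COPIES SIZE MB" and treats names
# ending in _OFFSITE as the tape duplication SLPs.
backlog_summary() {
  drives=${1:-4} mb_per_s=${2:-160}
  awk -v d="$drives" -v r="$mb_per_s" '
    # Per-SLP rows: second field is the "(state)" column, last is the unit.
    $2 ~ /^\(/ && $NF == "MB" {
      if ($1 ~ /_OFFSITE$/) { off_n += $3; off_mb += $4 }
      else                  { oth_n += $3; oth_mb += $4 }
    }
    END {
      printf "offsite: %d copies, %d MB\n", off_n, off_mb
      printf "other: %d copies, %d MB\n", oth_n, oth_mb
      if (off_mb + oth_mb > 0)
        printf "offsite share: %.1f%%\n", 100 * off_mb / (off_mb + oth_mb)
      # Best case: all drives streaming continuously at the given rate.
      printf "best-case drain: %.1f hours on %d drives\n", off_mb / (d * r) / 3600, d
    }
  '
}
```

Piping the report above through backlog_summary 4 should show 913 of the 974 copies (about 87.5% of the backlog by size) belonging to the _OFFSITE SLPs, and a best-case drain of roughly 43.6 hours across the 4 LTO-6 drives. Real throughput is usually lower, since the duplications read from the same disk the backups are writing to; at half the streaming rate the estimate doubles to about 87 hours.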

  • Since you are on 7.6, I would implement SLP windows so that duplication occurs during the daytime. I would also make sure backups have a higher priority than duplications. Writing and reading on the same disk will impact overall performance, so it is best to separate the two workloads.

    LIFECYCLE_PARAMETERS is deprecated in 7.6; the same settings can now be found under the master server settings. You won't find a setting that solves your issue, but tweaking the workload may ease it.

    There is a TechNote describing the meaning of the SLP settings. It was written for 7.5, which still uses the LIFECYCLE_PARAMETERS file; just ignore that part.

    http://www.veritas.com/docs/000023582

    Duplication from disk to tape is by definition non-multiplexed. Preserve multiplexing only applies when duplicating from tape.

  • Also worth noting: I don't see any data buffer files (such as SIZE_DATA_BUFFERS or NUMBER_DATA_BUFFERS) or anything similar in /usr/openv/netbackup/db/config on the media servers that have the disk pools attached.

    NetBackup Version: 7.6.0.2

    Media Server OS: Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

    Master Server OS: Solaris 10

  • Be aware: SLP windows don't just set start times; they also kill duplications that aren't complete by the end of the window, OW.

    You can still set SLP parameters at the OS level (at least I do, on Solaris with NBU 7.6.0.4).

    The files are located in /usr/openv/var/global/

    I set up aliases and files so I can change between configs on the fly.

    # cat /usr/openv/var/global/nbcl.conf.big

    SLP.MIN_SIZE_PER_DUPLICATION_JOB = 64 GB
    SLP.MAX_SIZE_PER_DUPLICATION_JOB = 512 GB
    SLP.JOB_SUBMISSION_INTERVAL = 20 minutes
    SLP.IMAGE_PROCESSING_INTERVAL = 20 minutes
    SLP.IMAGE_EXTENDED_RETRY_PERIOD = 1 hour
    SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB = 20 minutes

    # cat /usr/openv/var/global/nbcl.conf.small

    SLP.MIN_SIZE_PER_DUPLICATION_JOB = 32 GB
    SLP.MAX_SIZE_PER_DUPLICATION_JOB = 64 GB
    SLP.JOB_SUBMISSION_INTERVAL = 5 minutes
    SLP.IMAGE_PROCESSING_INTERVAL = 5 minutes
    SLP.IMAGE_EXTENDED_RETRY_PERIOD = 10 minutes
    SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB = 2 minutes

    With these aliases I can type slpbig or slpsmall and change the config on the fly:

    alias slpbig='/usr/openv/netbackup/bin/admincmd/bpsetconfig /usr/openv/var/global/nbcl.conf.big'

    alias slpsmall='/usr/openv/netbackup/bin/admincmd/bpsetconfig /usr/openv/var/global/nbcl.conf.small'

  • In my case I have an entire month before the duplication jobs queue up again, so if I can't get them all done in a week, it's no big deal if a few get stopped here and there.
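
The slpbig/slpsmall aliases above can also be folded into one function that refuses a missing profile and echoes back what the master reports afterwards. A minimal sketch, assuming the nbcl.conf.NAME naming used above and that bpgetconfig lists the SLP.* entries (the 7.6 LIFECYCLE_PARAMETERS header says the values can be viewed that way); slpprofile, SLP_CONF_DIR and ADMINCMD are hypothetical names.

```shell
# Hypothetical replacement for the slpbig/slpsmall aliases.
# Profiles are assumed to be named nbcl.conf.NAME in SLP_CONF_DIR.
SLP_CONF_DIR=${SLP_CONF_DIR:-/usr/openv/var/global}
ADMINCMD=${ADMINCMD:-/usr/openv/netbackup/bin/admincmd}

slpprofile() {
  conf="$SLP_CONF_DIR/nbcl.conf.$1"
  # Refuse to call bpsetconfig with a missing or unreadable profile.
  if [ -z "$1" ] || [ ! -r "$conf" ]; then
    echo "slpprofile: no readable profile at $conf" >&2
    return 1
  fi
  "$ADMINCMD/bpsetconfig" "$conf" || return 1
  # Show the SLP.* entries the master now reports, to confirm the switch.
  "$ADMINCMD/bpgetconfig" | grep '^SLP\.'
}
```

Then slpprofile big or slpprofile small does the same job as the two aliases, and because it is a function it can also be used from scripts or cron jobs, where bash does not expand aliases by default.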