Slow Duplication Jobs Which Then Slow Other Jobs Down
This is somewhat tough to explain, but basically on the first weekend of each month we run what we call a Monthly Full. This is where our long-term retention requirements are met: we run Full backups, replicate the data via SLP, and then duplicate those backups to tape via SLP, where they are retained for 1 to 7 years depending on requirements. Every Monday after this monthly full weekend we see hundreds of jobs queued up, including the Incrementals that should be running, and they then start failing with status 196 (backup window closed) because they have been queued for so long. Those backups are written to disk. This is a busy time, since there is obviously a lot of reading from and writing to this disk, but the replication SLP jobs still complete quickly and without issue. I think what we are seeing is a problem with the duplication jobs. I have 4 dedicated LTO-6 drives that duplicate this data to tape, so we aren't fighting for resources between the backups and the duplications.
Not sure where to begin with this one, so I am all ears and looking everywhere. I have noticed that we don't have any lifecycle parameters set at all:
/usr/openv/netbackup/db/config ->more LIFECYCLE_PARAMETERS
#
# Beginning with the NetBackup 7.6 release, Storage Lifecycle configuration
# values are now stored as part of the NetBackup system configuration data
# and can be viewed or changed using the Storage Lifecycle Parameters node
# under Host Properties in the Administration Console or via the bpsetconfig
# and bpgetconfig commands. The prior contents of this file have been automatically
# migrated to the NetBackup system configuration storage. That prior content
# can be viewed in the LIFECYCLE_PARAMETERS.deprecated file for historical purposes.
#
# When modifying values via the bpsetconfig command, be aware that all
# Storage Lifecycle parameter names are now prepended by 'SLP.'. In
# addition, the following parameter names have been changed:
#
# CLEANUP_SESSION_INTERVAL_HOURS is now SLP.CLEANUP_SESSION_INTERVAL
# IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS is now SLP.IMAGE_EXTENDED_RETRY_PERIOD
# MAX_KB_SIZE_PER_DUPLICATION_JOB is now SLP.MAX_SIZE_PER_DUPLICATION_JOB
# MAX_GB_SIZE_PER_DUPLICATION_JOB is now SLP.MAX_SIZE_PER_DUPLICATION_JOB
# MAX_GB_SIZE_PER_BACKUP_REPLICATION_JOB is now SLP.MAX_SIZE_PER_BACKUP_REPLICATION_JOB
# MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB is now SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB
# MIN_KB_SIZE_PER_DUPLICATION_JOB is now SLP.MIN_SIZE_PER_DUPLICATION_JOB
# MIN_GB_SIZE_PER_DUPLICATION_JOB is now SLP.MIN_SIZE_PER_DUPLICATION_JOB
# VERSION_CLEANUP_DELAY_HOURS is now SLP.VERSION_CLEANUP_DELAY
#
# Size and time values are now specified using units like 'minutes' or 'gigabytes'.
#
# The following parameters have been deprecated due to changes in SLP processing
# and are no longer recognized:
#
# = DUPLICATION_SESSION_INTERVAL_MINUTES
# = IMPORT_SESSION_TIMER
#
# See the NetBackup Adminstrator's Guide, Volume 1 for more information.
#
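If old values ever need to be carried over from LIFECYCLE_PARAMETERS.deprecated, the renames listed in that header can be captured in a small lookup table. This is only a sketch: `translate` is a hypothetical helper, not a NetBackup tool. It assumes legacy lines look like `NAME value` or `NAME = value`, and the `SLP.NAME = value unit` output form is an assumption to check against what `bpgetconfig` actually prints before anything is fed to `bpsetconfig`.

```python
from typing import Optional

# Renames listed in the 7.6 LIFECYCLE_PARAMETERS header. The unit is the one
# implied by the old name (KB, GB, HOURS, MINUTES); the exact unit spelling
# bpsetconfig expects is an ASSUMPTION here.
RENAMED = {
    "CLEANUP_SESSION_INTERVAL_HOURS": ("SLP.CLEANUP_SESSION_INTERVAL", "hours"),
    "IMAGE_EXTENDED_RETRY_PERIOD_IN_HOURS": ("SLP.IMAGE_EXTENDED_RETRY_PERIOD", "hours"),
    "MAX_KB_SIZE_PER_DUPLICATION_JOB": ("SLP.MAX_SIZE_PER_DUPLICATION_JOB", "kilobytes"),
    "MAX_GB_SIZE_PER_DUPLICATION_JOB": ("SLP.MAX_SIZE_PER_DUPLICATION_JOB", "gigabytes"),
    "MAX_GB_SIZE_PER_BACKUP_REPLICATION_JOB": ("SLP.MAX_SIZE_PER_BACKUP_REPLICATION_JOB", "gigabytes"),
    "MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB": ("SLP.MAX_TIME_TIL_FORCE_SMALL_DUPLICATION_JOB", "minutes"),
    "MIN_KB_SIZE_PER_DUPLICATION_JOB": ("SLP.MIN_SIZE_PER_DUPLICATION_JOB", "kilobytes"),
    "MIN_GB_SIZE_PER_DUPLICATION_JOB": ("SLP.MIN_SIZE_PER_DUPLICATION_JOB", "gigabytes"),
    "VERSION_CLEANUP_DELAY_HOURS": ("SLP.VERSION_CLEANUP_DELAY", "hours"),
}

# Parameters the header says are no longer recognized.
DEPRECATED = {"DUPLICATION_SESSION_INTERVAL_MINUTES", "IMPORT_SESSION_TIMER"}


def translate(line: str) -> Optional[str]:
    """Hypothetical helper: map one legacy setting line to its 7.6 name.

    Returns None for blank lines, comments and deprecated parameters.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Accept both "NAME = value" and "NAME value".
    if "=" in line:
        name, value = (part.strip() for part in line.split("=", 1))
    else:
        parts = line.split(None, 1)
        if len(parts) != 2:
            return None
        name, value = parts[0], parts[1].strip()
    if name in DEPRECATED:
        return None
    if name in RENAMED:
        new_name, unit = RENAMED[name]
        return f"{new_name} = {value} {unit}"
    # Every other SLP parameter just gains the 'SLP.' prefix; any unit it
    # now needs has to be added by hand.
    return f"SLP.{name} = {value}"


if __name__ == "__main__":
    for legacy in ("MIN_GB_SIZE_PER_DUPLICATION_JOB 8", "IMPORT_SESSION_TIMER 5"):
        print(legacy, "->", translate(legacy))
```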
Also, I noticed that our OFFSITE SLPs that do the duplication are not set to preserve multiplexing...
This is my current SLP backlog...
Backlog of incomplete SLP Copies
In Process (Storage Lifecycle State: 2):
Number of copies: 974
Total expected size 114655594 MB
SLP Name: (state) Number of copies: Size:
BR_OFFSITE_P09 (active) 14 861128 MB
XP53TAPE008_SLP (active) 9 758711 MB
XP53TAPE008_SLP_OFFSITE (active) 394 29658783 MB
XP53TAPE009_SLP (active) 20 4380659 MB
XP53TAPE009_SLP_OFFSITE (active) 227 12631405 MB
XP53TAPE010_SLP (active) 18 8309785 MB
XP53TAPE010_SLP_OFFSITE (active) 292 58055121 MB
Total: 974 114655592 MB
The _OFFSITE ones are the ones doing the duplication operations.
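To see where the backlog actually sits, the listing above can be totalled per SLP. This is a sketch: `parse` and `summarize` are hypothetical helpers, it assumes the rows keep the `NAME (state) copies size MB` layout shown, and counting only names that end in `_OFFSITE` as the duplication SLPs is an assumption (BR_OFFSITE_P09 is left out; widen the filter if it also duplicates to tape).

```python
import re
from dataclasses import dataclass

# Rows copied from the backlog listing (SLP name, state, copies, size in MB).
BACKLOG = """\
BR_OFFSITE_P09 (active) 14 861128 MB
XP53TAPE008_SLP (active) 9 758711 MB
XP53TAPE008_SLP_OFFSITE (active) 394 29658783 MB
XP53TAPE009_SLP (active) 20 4380659 MB
XP53TAPE009_SLP_OFFSITE (active) 227 12631405 MB
XP53TAPE010_SLP (active) 18 8309785 MB
XP53TAPE010_SLP_OFFSITE (active) 292 58055121 MB
"""

ROW = re.compile(r"^(\S+)\s+\((\w+)\)\s+(\d+)\s+(\d+)\s+MB$")


@dataclass
class SlpBacklog:
    name: str
    copies: int
    size_mb: int


def parse(text: str) -> list[SlpBacklog]:
    """Turn each matching report row into a SlpBacklog; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(SlpBacklog(m.group(1), int(m.group(3)), int(m.group(4))))
    return rows


def summarize(rows: list[SlpBacklog]) -> dict:
    """Overall totals plus totals for SLPs whose name ends in _OFFSITE."""
    offsite = [r for r in rows if r.name.endswith("_OFFSITE")]
    return {
        "total_copies": sum(r.copies for r in rows),
        "total_mb": sum(r.size_mb for r in rows),
        "offsite_copies": sum(r.copies for r in offsite),
        "offsite_mb": sum(r.size_mb for r in offsite),
    }


if __name__ == "__main__":
    rows = parse(BACKLOG)
    for r in rows:
        # Average size of one pending copy, in GB (1 GB = 1024 MB).
        print(f"{r.name:26} {r.copies:4} copies, avg {r.size_mb / r.copies / 1024:8.1f} GB")
    print(summarize(rows))
```

Summing the rows gives 974 copies and 114655592 MB, matching the Total line (the "Total expected size" header differs by 2 MB). The three _OFFSITE SLPs account for 913 of those copies and 100345309 MB, roughly 87% of the backlog, which fits the suspicion that the duplications rather than the replications are what is piling up.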
Since you are on 7.6, I would implement SLP windows so that duplication runs during the daytime. I would also make sure backups have a higher priority than duplications. Writing to and reading from the same disk at the same time will impact overall performance, so it is best to separate the two workloads.
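To put the 4 LTO-6 drives in perspective, here is a back-of-the-envelope estimate of how long the three _OFFSITE SLPs in the backlog listing would take to write, and what a daytime-only SLP window does to that figure. It assumes every drive streams continuously at the LTO-6 native rate of 160 MB/s, with no compression gains, tape mounts, positioning delays or disk-read bottleneck; the 12-hour window and the helper names are hypothetical. Real throughput will usually be lower, so treat the result as a lower bound.

```python
# Best-case estimate of tape time for the duplication backlog.
# ASSUMPTIONS: all drives stream at LTO-6 native speed the whole time; no
# compression gains, mounts, positioning or disk-read limits are modelled.

OFFSITE_BACKLOG_MB = 29_658_783 + 12_631_405 + 58_055_121  # the three *_OFFSITE rows
DRIVES = 4
LTO6_NATIVE_MB_PER_S = 160


def hours_to_drain(backlog_mb: float, drives: int, mb_per_s: float) -> float:
    """Wall-clock hours for `drives` drives writing in parallel at `mb_per_s` each."""
    return backlog_mb / (drives * mb_per_s) / 3600


def days_with_window(total_hours: float, window_hours_per_day: float) -> float:
    """Days needed if duplication only runs inside a daily SLP window."""
    return total_hours / window_hours_per_day


if __name__ == "__main__":
    hours = hours_to_drain(OFFSITE_BACKLOG_MB, DRIVES, LTO6_NATIVE_MB_PER_S)
    print(f"Continuous writing: {hours:.1f} h")
    # Hypothetical 12-hour daytime window (e.g. 08:00-20:00).
    print(f"With a 12 h window: {days_with_window(hours, 12):.1f} days")
```

Under these best-case assumptions the _OFFSITE backlog is roughly 44 hours of continuous writing on all four drives, and a little over three and a half days if duplication is confined to a 12-hour window. A window only helps if the backlog can drain between monthly fulls; otherwise the duplications will still be reading from the same disk while the backups are writing to it.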
LIFECYCLE_PARAMETERS is deprecated in 7.6. The same settings can now be found under the master server's Host Properties (Storage Lifecycle Parameters). You won't find a setting that solves your issue, but tweaking the workload may ease it.
There is a TN describing what each SLP setting means. It was written for 7.5, which still uses the LIFECYCLE_PARAMETERS file, so just ignore that part.
http://www.veritas.com/docs/000023582
Duplication from disk to tape is by definition non-multiplexed. Preserve multiplexing only applies when duplicating from tape.