Recovery Assistance with Rapid Recommendations

souvik-dutta
27 days ago

The hardest time to make decisions is when it matters the most. Informed, targeted recommendations help.

Making decisions is tough, especially when there are multiple options and an urgent need to act. This is exactly the situation when initiating a recovery after a cybersecurity breach, where swift, decisive action is essential and there is little room for hesitation or missteps. This blog explains how Veritas helps a backup administrator make this decision using recovery point recommendations, either on their own or as part of an automated recovery blueprint. In this blog, we review these FAQs and more:

  • How are recovery points recommended?
  • I noticed the recommended backup image change over time -- is this normal?
  • Are you taking into account attacks that are happening in real time?
  • As a user, what are the action items for me?

Multi-variate recommendations

Suppose you commute to work every day, and the shortest route from your home is 10 miles. Every morning you take that same route, which takes 15 minutes on good days but up to 40 minutes when traffic is heavy.
To remove guesswork from the commute, your Maps app solves for the least-time path. On some days it coincides with the shortest-distance path, but on other days it saves you time with a different, better choice.

Choosing an appropriate recovery point (RP) is surprisingly similar. Many recovery workflows today rely on selecting the RP with the last known good malware scan. This is analogous to selecting the shortest-distance path every time. However, combining multiple complementary security indicators on your data and metadata can reliably recommend RPs within seconds, potentially with a better recovery point objective (RPO). This is akin to the optimized least-time path. Sometimes the recommended RP may coincide with the last known good RP, but that is the worst-case scenario.

Consider the situation below: the RP on 10/22 is known to be infected, and the last RP with a clean malware scan is a week older (10/15). Suppose you also have an estimate of how much new data gets backed up per day, and therefore an estimate of how much data would potentially be lost on recovery. Which RP should you choose to recover from?

A typical set of recovery point candidates: the last "clean" image is shown in green and the recent infected image in red. Recovering from older images usually leads to a larger loss of data.
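To make the trade-off concrete, here is a minimal sketch of the data-loss estimate, assuming new data arrives at a constant daily rate. The 50 GB/day rate and the year are made-up values for illustration, and `estimated_data_loss_gb` is a hypothetical helper; only the dates 10/15 and 10/22 come from the example above.

```python
from datetime import date

def estimated_data_loss_gb(rp_date: date, incident_date: date, daily_new_gb: float) -> float:
    """Rough data loss when recovering from rp_date: days of lost changes times daily growth."""
    days_lost = (incident_date - rp_date).days
    if days_lost < 0:
        raise ValueError("recovery point is newer than the incident")
    return days_lost * daily_new_gb

incident = date(2024, 10, 22)         # infected RP from the example (year is assumed)
last_clean_scan = date(2024, 10, 15)  # last RP with a clean malware scan
daily_new_gb = 50.0                   # hypothetical daily growth

print(estimated_data_loss_gb(last_clean_scan, incident, daily_new_gb))     # 350.0
print(estimated_data_loss_gb(date(2024, 10, 20), incident, daily_new_gb))  # 100.0
```

Under these assumed numbers, every day closer to the incident saves roughly 50 GB of lost data, which is why a more recent trustworthy RP is worth looking for.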

One option is to scan RPs in reverse chronological order from the infection (10/21, 10/20, ...) and select the most recent RP that scans clean. But there are two glaring problems with this strategy:

  1. Malware scans are expensive -- when every minute counts during a cyber-incident, performing multiple malware scans costs precious time.
  2. 75% (and growing!) of all malware attacks are living-off-the-land (CrowdStrike 2024 report), so malware scans can also produce false negatives. What appears clean may not be clean after all, and complementary signals are required to be sure.
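The scan-until-clean strategy can be sketched as follows to show where the time goes. This is an illustration only: `scan_is_clean` is a hypothetical stand-in for a real malware scan, and both the 45-minute scan duration and the scan verdicts are made up for this sketch.

```python
from typing import Callable, Optional, Sequence, Tuple

def newest_clean_by_scanning(
    rps_newest_first: Sequence[str],
    scan_is_clean: Callable[[str], bool],
    minutes_per_scan: float,
) -> Tuple[Optional[str], float]:
    """Scan RPs newest-first; return the first RP that scans clean and the total scan time."""
    minutes_spent = 0.0
    for rp in rps_newest_first:
        minutes_spent += minutes_per_scan  # every scan costs time, clean or not
        if scan_is_clean(rp):
            return rp, minutes_spent
    return None, minutes_spent

# Made-up verdicts: only the 10/21 image is flagged by the scanner.
flagged = {"10/21"}
candidates = ["10/21", "10/20", "10/19", "10/18", "10/17", "10/16"]

rp, minutes = newest_clean_by_scanning(candidates, lambda r: r not in flagged, minutes_per_scan=45.0)
print(rp, minutes)  # 10/20 90.0
```

Even in this favorable case, two full scans are spent before recovery can start, and a "clean" verdict may still be a false negative.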

I: Sanctity confidence for recovery points

We resolve both concerns above by attaching a sanctity probability to every RP, expressed as a percentage between 0 and 100. These probabilities are obtained from a statistical model that considers trends across a wide variety of features, including granular file-entropy and job anomalies.

The same set of recovery point candidates, but with sanctity probabilities in [0, 100] indicated.

Note that, in this case, the malware-scanned RP on 10/15 has a high confidence (97.8%) of being pristine. But so do many of the other RPs. This lets us reduce our reliance on malware scans and gives us a larger set of images, potentially more recent, from which to choose the RP to recover from.

II: Incorporating RPO-RTO awareness

In Veritas Alta View, only RPs whose sanctity confidence is above 95% are shortlisted for further scrutiny. Users may be able to override this and select any other candidate if they wish. Over this filtered set of RPs, we factor in two other important metrics that we aim to minimize:

  • Time-to-recover: we use actual recovery and rehearsal statistics, storage media type, image location, etc., to estimate the time to perform a full recovery.
  • Data-loss upon recovery: we regress on prior image sizes and deduplication ratios to estimate the data loss after recovery is complete. Typically, recovering from older images leads to larger data loss.
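As a rough illustration of the time-to-recover idea, the sketch below derives an aggregate throughput from past rehearsal runs and applies it to a new image. This is a deliberate simplification and not the product's estimator: it ignores storage media type and image location, and the function names and rehearsal figures are hypothetical.

```python
from typing import List, Tuple

def observed_throughput_gb_per_min(rehearsals: List[Tuple[float, float]]) -> float:
    """Aggregate throughput over past (size_gb, minutes) rehearsal runs."""
    total_gb = sum(size_gb for size_gb, _ in rehearsals)
    total_min = sum(minutes for _, minutes in rehearsals)
    if total_min <= 0:
        raise ValueError("need at least one rehearsal with a positive duration")
    return total_gb / total_min

def estimate_recovery_minutes(image_gb: float, rehearsals: List[Tuple[float, float]]) -> float:
    """Estimate a full recovery time by scaling the image size by the observed throughput."""
    return image_gb / observed_throughput_gb_per_min(rehearsals)

# Hypothetical rehearsal history: (restored size in GB, duration in minutes)
rehearsals = [(400.0, 50.0), (600.0, 75.0)]
print(estimate_recovery_minutes(800.0, rehearsals))  # 100.0
```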

Below, we show the recovery point scores (normalized between 0 and 100) obtained by jointly considering all of the previous attributes. Note that the image from 10/20 is not malware-scanned, does not have the highest sanctity score, and is not the most recent. However, when these attributes are considered in tandem, this image comes out as the best.

The backup image on 10/20 has the best score, and will be ranked as #1.
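One way such a joint score could be put together is sketched below. It is an assumption-laden illustration, not the scoring formula used in Veritas Alta View: the min-max normalization, the equal weights, and the `Candidate` type are all choices made for this sketch, and every number except the 95% threshold and the 97.8% sanctity of the 10/15 image is hypothetical.

```python
from typing import Dict, NamedTuple

class Candidate(NamedTuple):
    sanctity_pct: float  # sanctity confidence, 0-100
    recovery_min: float  # estimated time to recover, in minutes
    data_loss_gb: float  # estimated data loss upon recovery, in GB

SANCTITY_THRESHOLD = 95.0  # shortlist threshold stated in the text

def _minmax(value: float, lo: float, hi: float) -> float:
    """Scale value into [0, 1]; a constant column maps to 0."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

def score_candidates(cands: Dict[str, Candidate]) -> Dict[str, float]:
    """Score shortlisted RPs in [0, 100]; higher is better."""
    short = {rp: c for rp, c in cands.items() if c.sanctity_pct > SANCTITY_THRESHOLD}
    if not short:
        return {}
    cols = list(zip(*short.values()))  # (sanctities, recovery times, data losses)
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    scores = {}
    for rp, c in short.items():
        s = _minmax(c.sanctity_pct, lo[0], hi[0])        # higher is better
        t = 1.0 - _minmax(c.recovery_min, lo[1], hi[1])  # lower is better
        d = 1.0 - _minmax(c.data_loss_gb, lo[2], hi[2])  # lower is better
        scores[rp] = 100.0 * (s + t + d) / 3.0           # equal weights (assumed)
    return scores

candidates = {
    "10/15": Candidate(97.8, 90.0, 350.0),   # last malware-scanned clean image
    "10/18": Candidate(98.4, 150.0, 200.0),  # highest sanctity, slower to restore
    "10/19": Candidate(91.0, 95.0, 150.0),   # below the threshold, filtered out
    "10/20": Candidate(97.1, 95.0, 100.0),
    "10/21": Candidate(95.6, 240.0, 50.0),   # most recent shortlisted image
    "10/22": Candidate(3.1, 95.0, 0.0),      # known infected, filtered out
}
scores = score_candidates(candidates)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['10/20', '10/18', '10/15', '10/21']
```

With these made-up inputs, 10/20 ranks first even though it is not the scanned image, the highest-sanctity image, or the most recent one, because it is close to the best on all three attributes at once.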

On Veritas Alta View, these recommendations are supplemented with some easy-to-interpret attributes, as pictured below, to help the user make a more judicious decision beyond our suggestion. Hovering over the 👍 displays a succinct explanation of the recommendation.

The most recent backup image is confirmed as infected. The anomaly risk and entropy score
are also elevated, as may be expected if the malware has started encrypting our data. The last scanned
clean copy is from Oct 11. Our recommendation engine suggests a more recent image (Oct 14) based on
several attributes, some of which are listed above (and expanded upon in FAQ #1).

FAQs

Q1: How are recovery points recommended?
A: We consider a wide range of features: security metrics (e.g., granular file-entropy, job anomalies), age of backup images, backup schedule details (e.g., frequency of full and incremental backups), estimated time to complete recovery, estimated data loss upon recovery, etc. In many cases the final recommendation is self-explanatory, but some cases may require more insight, which can be obtained by hovering over the 👍 icon.

Q2: I noticed the recommended backup image change over time -- is this normal?
A: Yes. Backup times (relative to the current time) and the frequency of backups are important factors in the recommendation. As such, the output of the model changes over time to reflect the best recommendation at the present moment.

Q3: Are you taking into account attacks that are happening in real time?
A: We are constantly monitoring CISA, FBI, MITRE and other sources for up-to-date information on ransomware lifecycles. We also test our anomaly detection features against live ransomware binaries in our RED-Lab, and make regular improvements to our algorithms to enable day-0 detection. These enhancements directly affect the sanctity probability of RPs, and thereby, our recommendations. 

Q4: As a user, what are the action items for me?
A: At recovery time, you may choose to perform a malware scan on the recommended recovery point for additional peace of mind. But you can rest assured that the RPO from this feature improves upon or (at worst) equals the RPO of the last known good malware-scanned copy.

Updated 26 days ago
Version 2.0