RPO and RTO

RPO is the “recovery point objective”: the acceptable amount of data loss, measured back from the point of failure.

RTO is the “recovery time objective”: the acceptable amount of time, measured from the point of failure, in which to restore data to within the RPO.

The RTO is often dictated by an SLA, a KPI or the like, and the figure is often unrealistic in a real disaster scenario.

The RPO is often dictated by the backup policy. It should instead be dictated by the SLA, as the level of data loss the business will accept. If a system is backed up once per day, then the RPO is automatically 24 hours.
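
The once-per-day example can be made concrete with a small sketch. It assumes the simplest case, where the worst-case data loss equals the interval between backups and nothing such as archive logs narrows the gap. The function names and the 4-hour SLA figure are made up for illustration.

```python
from datetime import timedelta


def effective_rpo(backup_interval: timedelta) -> timedelta:
    """Worst-case data loss implied by the backup schedule alone.

    Assumption: no archive logs or replication, so everything written
    since the last backup is lost.
    """
    return backup_interval


def policy_meets_sla(backup_interval: timedelta, sla_rpo: timedelta) -> bool:
    """True if the schedule can satisfy the data loss the business accepts."""
    return effective_rpo(backup_interval) <= sla_rpo


daily = timedelta(days=1)
print(effective_rpo(daily))                         # 1 day, 0:00:00
print(policy_meets_sla(daily, timedelta(hours=4)))  # False
```

Run the other way round, the same comparison tells you how often you would need to back up to honour the RPO the business has actually agreed to.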

Backup Basics (Don’t throw the baby out with the bath-water!)

In part 2 of my backup basics snippets, I introduce the concept of “not throwing the baby out with the bath-water”. In backup terms, this means that you do not delete your last good backups before you have secured your next backup.

For many it seems so tempting to just wrench the retention period down to 0. Doing so obliterates what may, in some cases, be the only valid recovery plan: if the previous backup failed, the copies kept by a non-zero retention are all you have.

The mitigation of this risk has a capacity consequence, which should be calculated at the time of system design. The mistake illustrated here is most frequently encountered on database systems, where there is often insufficient space to back up the database once it grows beyond 50% of its data LUN.

If the data set is expected to be up to s GB and c cycles are required, then the backup volume or device needs a capacity of at least (c * s) + s GB in order to hold sufficient copies of a backup, preferably on cheaper, albeit slower, storage.
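
As a quick worked example of the sizing rule, here is a minimal sketch. The function name and the 500 GB / 3 cycle figures are purely illustrative; the extra s in the formula is the headroom that lets the next backup land before the oldest one is deleted.

```python
def backup_capacity_gb(s: float, c: int) -> float:
    """Minimum backup volume size in GB, using the (c * s) + s rule.

    s: expected maximum size of the data set in GB
    c: number of backup cycles to retain
    """
    if s < 0 or c < 0:
        raise ValueError("s and c must not be negative")
    return (c * s) + s


# Hypothetical figures: a 500 GB data set with 3 cycles retained.
print(backup_capacity_gb(500, 3))  # 2000
```

So a 500 GB database with three retained cycles needs at least 2000 GB of backup storage, not 1500 GB.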

Do not skimp on backup storage if you want your DBAs to love you. They hate application-based backup agents and will give their left sock for a flat-file on-disk backup any day!

Most importantly, it will also improve your RTO and, if archive logs are available, your RPO too.

The thing is to think ahead and always provision at least (c * s) + s storage for the backup volume.

Keep this golden rule and your backup solution should remain fit for purpose given the finite storage resource available to the production data itself.

Good Backup (The basics)

Backup Blues

 

Tonight, I extol the virtue of good backups. It is not for me to say how frequently you should take a backup; that is your risk to assess. What I can say is that I believe a backup should meet certain criteria to be considered valid. I find the holistic view desperately lacking among backup admins, most of whom fail to consider the sum of the following as the minimum criteria for success:

  1. Was the data in a suitable state for backup prior to the backup taking place?
  2. Did the backup start on time as scheduled?
  3. Did the backup complete without error within its defined backup window, i.e. did it finish on time?
  4. Did the backup drop any files? See Dropped Files.
  5. Did the post-backup processes (if required) succeed?

Any failure of this 5-point plan constitutes a backup failure.

Only a backup which meets all of these criteria can be considered a good backup.
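
The all-or-nothing nature of the five points can be sketched in a few lines. This is only an illustration: the BackupRun record and its field names are hypothetical, and in practice the values would come from your backup tool's logs or reports.

```python
from dataclasses import dataclass


@dataclass
class BackupRun:
    data_ready: bool          # 1. data was in a suitable state beforehand
    started_on_time: bool     # 2. backup started as scheduled
    finished_in_window: bool  # 3. completed without error inside its window
    dropped_files: int        # 4. number of files the backup dropped
    post_processes_ok: bool   # 5. post-backup processes succeeded (True if none required)


def failed_checks(run: BackupRun) -> list[str]:
    """Return the name of every point of the 5-point plan that failed."""
    checks = {
        "data not in a suitable state": run.data_ready,
        "did not start on time": run.started_on_time,
        "did not finish cleanly in window": run.finished_in_window,
        "files dropped": run.dropped_files == 0,
        "post-backup processes failed": run.post_processes_ok,
    }
    return [name for name, passed in checks.items() if not passed]


def is_good_backup(run: BackupRun) -> bool:
    """A backup is good only if every one of the five points passed."""
    return not failed_checks(run)


# A run that did everything right except for two dropped files.
run = BackupRun(True, True, True, 2, True)
print(failed_checks(run))   # ['files dropped']
print(is_good_backup(run))  # False
```

Returning the list of failed points, rather than a bare yes or no, keeps the reason for the failure visible in whatever report the admin reads the next morning.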

 

Dropped Files

 

All dropped files should be investigated, and excluded if they are not required for a successful DR, so as not to encourage a culture of accepting dropped files as normal. Excluding files not required for DR also improves backup times, because excluding temporary or re-constructable data leaves less data to back up.

This is an iterative process which should tend to zero dropped files as sources of temporary files are identified.
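
One pass of that iteration might look like the sketch below. It assumes you can get the list of dropped files out of your backup report and that exclusions can be written as shell-style wildcard patterns. The paths and patterns are invented, and Python's fnmatch syntax is used only for illustration, not as any backup product's exclude format.

```python
from fnmatch import fnmatchcase


def still_to_investigate(dropped: list[str], excludes: list[str]) -> list[str]:
    """Dropped files not yet covered by an agreed exclusion pattern."""
    return [
        path
        for path in dropped
        if not any(fnmatchcase(path, pattern) for pattern in excludes)
    ]


# Hypothetical dropped-file report and exclusion list.
dropped = ["/var/tmp/job.lock", "/data/app/cache/idx.tmp", "/data/db/orders.dbf"]
excludes = ["/var/tmp/*", "*/cache/*"]
print(still_to_investigate(dropped, excludes))  # ['/data/db/orders.dbf']
```

Anything left in the result is either a new source of temporary files to exclude or a genuine failure that needs fixing.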

Some files may require that the backup is run at a different scheduled time, or that an application-based scheduled task is moved to a time outside the backup window, if the data cannot be quiesced.