DFS Backup

(version 0.1)

Mission Statement

Due to serious deficiencies in the standard DFS backup system, which make it unsuitable for CERN requirements, a new backup system will be developed locally at CERN. The new system will be flexible and easily maintainable, and will offer a simple interface for operator tasks and for end-user tasks. Procedures for dealing with unexpected situations will be included. It will be possible to reuse the package with a different underlying backup mechanism in the future. The alpha version of the backup system should be ready in September 97, followed by a beta version in October 97. The full package should be in operation by the end of 97.

Case Study

Normal Operation

Backup Initialisation: The Database is created, jobs are scheduled.
Backup Dump: The Database is dumped.
Backup Recovery: The corrupted Database is restored.
Backup: Backup runs unattended under normal circumstances. The only required operator actions are visually labeling new tapes and changing tapes at certain times.
Recent Restore: Restore of entities backed up within the current period works without operator attendance. The user reclaims those entities via a standard interface.
Old Restore: To restore entities from previous periods, the operator may be required to put certain tapes into the drive. For the user, this operation looks the same as Recent Restore.

Problems

Filesystem Crash: Restore after an aggregate or partition is lost is just a normal Restore.
Tape Unreadable: If any tape is unreadable, the restore will be done from the most recent available backup, and a new backup (new phase) will be started immediately to replace the bad tape.
Server Crash: If a server crashes, the full set of data will be restored on another (new) server.
Database Lost: If the Database is lost, one of the backup copies of that Database will be used.
Backup Failure: If any backup phase fails, it will be restarted as soon as possible. The next phase may be postponed until the failed phase has been performed.

Requirements

Definitions

Period: The time during which subsequent backups are placed on one set of tapes.
Current Period: The Period whose tapes are still inside the drives.
Entity: A set of data that is backed up or restored together.
Schema: The prescription of how and when certain Entities are backed up.
Database: The system that keeps all information about Schemas and performed backups.

Requirements

Functional Requirements

Standard Backup: Under normal circumstances, backup should run without operator intervention, except for periodic visual labeling and changing of tapes at the beginning of a period.
Restore of Recent Backup: Under normal circumstances, restore of entities backed up within the current period should run without operator attendance.
Restore of Old Backup: It should be possible to restore an entity that is not present in the current period (either not at all, or not in the required version). Operator attendance may be required.
Entity Description: Each Entity should have a unique description.
Database Lost: There should be at least two copies of the Database on different media.
Database Structure: The Database should be stored in a human-readable format.
Schema Change: It should be possible to change Schema between two Periods.
Backup Entities: These Entities should be available for Backup:

Fileset.

Restore Entities: These Entities should be available for Restore:

Single file.
Single directory.
Fileset.

Period Duration: Period should last at least one week.
Period Periodicity: All periods should last the same time.
Period Content: The set of tapes for one Period should contain at least one Full Backup of each backed-up Entity.
Backup UI: User interface for Backup operation should exist.
Restore UI: User interface for Restore operation should exist.
User Documentation: User Documentation should be available.
Operator Documentation: Operator Documentation should be available.
Developer Documentation: Developer Documentation should be available.
Database Backup: The backup information (Database) itself should be backed up using at least two different techniques.

System Properties
Constraints

Implementation Language: The system should be implemented in a scripting language.
Modularity: Each task should be performed by a single program unit.

Design

Design Choices

Database Content: There are two real databases:

Dump:

Items: record of tape of period
Attributes:

date
fileset
level: backup level (full, differential)

Entity:

Items:

fileset
directory of fileset
file of directory of fileset

Attributes:

dump: link to Dump database entry
children
parents
place: position of Entity in the original subsystem

Database Format: There are several ways to store the database:

Real database (relational or OO) programs:

+: Efficient.
+: Fast.
+: Small (compressed).
-: May cost money (if commercial).
-: May require significant amount of work (if home-made).
-: Not human readable.
-: Recovery after corruption problematic.

Plain file:

+: Easy to implement.
+: Human readable.
+: Recovery after corruption possible.
-: Inefficient.
-: Slow.
-: Quite big.

Directory structure:

+: Efficient.
+: Fast.
+: Easy to implement.
+: Human readable.
+: Recovery after corruption possible.
-: Quite big.

Dump:

Structure: <period>/<tape>/<record>
Attributes:

date:<yyyy:mm:dd:hh:mm:ss>
fileset:<xxx>
level:<aaa[:aaa]>

Example:

000

fileset:user.hrivnac
000

date:1997:09:05:02:00:00
level:full

001

date:1997:09:06:02:00:00
level:full:first

001
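The directory-structure layout above can be sketched as follows. This is a minimal illustration in Python (the project itself chooses Tcl), and it rests on assumptions not stated in the design: each record is taken to be a small text file at Dump/<period>/<tape>/<record> holding one `name:value` line per attribute, and the function names `write_dump_record` and `read_dump_record` are hypothetical.

```python
import tempfile
from pathlib import Path


def write_dump_record(db_root, period, tape, record, attrs):
    """Store one Dump record as Dump/<period>/<tape>/<record> under db_root.

    Assumption: a record is a plain text file with one name:value line
    per attribute (date, fileset, level).
    """
    path = Path(db_root) / "Dump" / period / tape / record
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{name}:{value}\n" for name, value in attrs.items()]
    path.write_text("".join(lines))
    return path


def read_dump_record(path):
    """Read a record file back into a dict of attributes."""
    attrs = {}
    for line in Path(path).read_text().splitlines():
        # Split on the first colon only: dates and levels contain colons.
        name, _, value = line.partition(":")
        attrs[name] = value
    return attrs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as db_root:
        record_path = write_dump_record(
            db_root, "000", "0", "001",
            {"date": "1997:09:06:02:00:00",
             "fileset": "user.hrivnac",
             "level": "full:first"},
        )
        print(read_dump_record(record_path))
```

Because every record is an ordinary file, the database stays readable with standard tools, and a corrupted record affects only its own file.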

Entity:

Structure: <fileset>[/<directory>[/<directory>]][/<file>]
Attributes (stored in .../):

dump:<period>:<tape>:<record> - link to Dump database
children - defined by structure
parents - defined by structure
place - defined by structure

Example:

user.hrivnac

...

dump:000:0:001 -> $dbHome/$dbName/Dump/000/0/001
dump:000:3:001 -> $dbHome/$dbName/Dump/000/3/001

work
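The `dump:<period>:<tape>:<record>` links in the example map directly onto paths in the Dump database. A minimal sketch of that mapping in Python (for illustration only; the function name `resolve_dump_link` is hypothetical, and `db_home` and `db_name` stand for the `$dbHome` and `$dbName` values shown above):

```python
from pathlib import PurePosixPath


def resolve_dump_link(link, db_home, db_name):
    """Turn 'dump:<period>:<tape>:<record>' into the Dump record path."""
    # Unpacking raises ValueError if the link does not have four fields.
    kind, period, tape, record = link.split(":")
    if kind != "dump":
        raise ValueError(f"not a dump link: {link!r}")
    return PurePosixPath(db_home) / db_name / "Dump" / period / tape / record


if __name__ == "__main__":
    # "backup" is a placeholder database name used only for this example.
    print(resolve_dump_link("dump:000:3:001", "/var/backupDB", "backup"))
```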

Schema Content:

backup:

when (what time)
what (what Entity)
how (what level)

command:

when (what time)
what (what to perform):

ChangeTape: instruction to operator

Schema Format: <cron time> [<fileset> <level>[/<level>]|command] [# comment]

Example:

02 10 * * 2 ChangeTapes

22 2 * * 3 user.hrivnac full

22 2 * * 4 user.hrivnac full:first

22 2 * * 5 user.hrivnac full:first:second

22 2 * * 6 user.hrivnac full:first

22 2 * * 0 user.hrivnac full:first:second

22 2 * * 1 user.hrivnac full:first

22 2 * * 2 user.hrivnac full:first:second

23 2 * * 4 user.rtb full

23 2 * * 5 user.rtb full:first

23 2 * * 6 user.rtb full:first:second

23 2 * * 0 user.rtb full:first

23 2 * * 1 user.rtb full:first:second

23 2 * * 2 user.rtb full:first

23 2 * * 3 user.rtb full:first:second
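To illustrate how such Schema lines could be read, here is a minimal parser sketch in Python (not the project's Tcl code). It assumes five whitespace-separated cron fields followed by either a single command word or a fileset and a level, with an optional trailing `#` comment; the level is kept as the raw string used in the examples. The names `SchemaEntry` and `parse_schema_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class SchemaEntry(NamedTuple):
    cron: str                # the five cron time fields, space-joined
    command: Optional[str]   # e.g. "ChangeTapes", or None for a backup line
    fileset: Optional[str]   # e.g. "user.hrivnac", or None for a command line
    level: Optional[str]     # e.g. "full:first", kept exactly as written


def parse_schema_line(line):
    """Parse one Schema line; return None for blank or comment-only lines."""
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    fields = body.split()
    cron = " ".join(fields[:5])
    if len(fields) == 6:   # cron time + command
        return SchemaEntry(cron, fields[5], None, None)
    if len(fields) == 7:   # cron time + fileset + level
        return SchemaEntry(cron, None, fields[5], fields[6])
    raise ValueError(f"malformed schema line: {line!r}")


if __name__ == "__main__":
    for text in ("02 10 * * 2 ChangeTapes",
                 "22 2 * * 5 user.hrivnac full:first:second"):
        print(parse_schema_line(text))
```

Keeping the cron fields untouched means a backup line can be handed to cron unchanged once the fileset and level have been turned into a BackupEngine call.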

Options:

Items:

dbHome: <directory>
tape: <dev-file>

Example:

dbHome: /var/backupDB

tape: /dev/rmt8
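Reading the Options text is a simple `name: value` split. The sketch below is in Python for illustration only, assumes one option per line with blank lines ignored, and uses the hypothetical function name `parse_options`.

```python
def parse_options(text):
    """Parse 'name: value' option lines into a dict."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed option line: {line!r}")
        options[name.strip()] = value.strip()
    return options


if __name__ == "__main__":
    print(parse_options("dbHome: /var/backupDB\n\ntape: /dev/rmt8\n"))
```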

Process Decomposition

BackupEngine: The process taking care of backup. Accepts calls, usually from cron.

updateDB(<entity>, <dbName>): Updates both databases. New information is valid only after certification.
certifyDB(<entity>, <dbName>): Certifies the information in both databases if the dump is successful.
dump(<entity>): Performs the dump.
reDump(<entity>): Redoes the dump in case of failure.

RestoreEngine: The program taking care of restore. Runs only on demand. Contains a GUI. Is available to end-users.

restore(<entity>, <dbName>): Performs the restore.
find(<entity>, <dbName>): Finds the Entity to restore.

DatabaseManager: The only program talking to databases. Usually not used directly by users.

create(<dbName>, <dbPlace>): Creates new database (both levels).
add(<entity>, <dbName>): Adds Entity.
remove(<entity>, <dbName>): Removes Entity.
find(<entity>, <dbName>): Finds Entity.
certify(<entity>, <dbName>): Certify Entity.

SystemInterface: The interface to system functions:

dump(<fileset>, <tape>, <version>): Performs dump.
restore(<fileset>, <tape>, <server>, <aggregate>): Performs restore.
tape(<number>): Puts tape into drive.

ConfigurationManager: The only program updating the configuration data. Contains a GUI. Stores all information in a standard place.

dataBase(<dbName>): Creates database.
setSchema(<dbName>): Reads the whole schema from STDIN.
getSchema(<dbName>): Writes the whole schema into STDOUT.
setOptions(<dbName>): Reads options from STDIN.
getOptions(<dbName>): Writes options into STDOUT.

Functional Decomposition

Backup Initialisation:

ConfigurationManager.setOptions(dbName) < Options.txt
ConfigurationManager.dataBase(dbName)

DatabaseManager.create(dbName, dbPlace)

ConfigurationManager.setSchema(dbName) < Schema.txt
ConfigurationManager.setPeriod(dbName) < Period.txt

Backup:

BackupEngine.updateDB(entity, dbName)

DatabaseManager.add(entity, dbName)

BackupEngine.dump(entity)
BackupEngine.certifyDB(entity, dbName)

DatabaseManager.certify(entity, dbName)
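The Backup sequence above records the dump first and certifies it only once the dump has succeeded. A minimal sketch of that ordering in Python (for illustration; the in-memory `DatabaseManager` class, the `backup` function and the `dump` callback are hypothetical stand-ins, and the reDump retry is not modelled):

```python
class DatabaseManager:
    """In-memory stand-in: tracks whether each entry has been certified."""

    def __init__(self):
        self.entries = {}  # (db_name, entity) -> "pending" or "certified"

    def add(self, entity, db_name):
        self.entries[(db_name, entity)] = "pending"

    def certify(self, entity, db_name):
        self.entries[(db_name, entity)] = "certified"


def backup(entity, db_name, db, dump):
    """Run one backup: update the database, dump, then certify on success."""
    db.add(entity, db_name)        # BackupEngine.updateDB
    if not dump(entity):           # BackupEngine.dump
        return False               # the entry stays uncertified
    db.certify(entity, db_name)    # BackupEngine.certifyDB
    return True


if __name__ == "__main__":
    db = DatabaseManager()
    backup("user.hrivnac", "backup", db, dump=lambda entity: True)
    backup("user.rtb", "backup", db, dump=lambda entity: False)
    print(db.entries)
```

An entry left in the pending state marks a dump that has to be redone, which is the case reDump is meant to handle.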

Implementation

Implementation Choices

Implementation Language: The system will be implemented in Tcl. This gives easy access to DCE/DFS functionality and makes it easy to add a GUI front-end later, based on the Tk package.

Project Diary

21Jul-25Jul: Initial Case Study and Requirements Definition (version 0.1)

General Case Study and URD created.

4Aug-8Aug: Initial Design (version 0.1)

Database format decided (filesystem).
Schema format decided.
Process and Functional Decomposition done.
Implementation language chosen (Tcl).
Process skeletons created.

Project Plan

21Jul-25Jul: Initial Case Study and Requirements Definition (version 0.1)
4Aug-8Aug: Initial Design (version 0.1)
25Aug-29Aug: Design Review (version 0.1)
September: Implementation (version 0.1 = alpha)
October: Full Cycle (version 0.2 = beta1)
November: Full Cycle (version 1.0)
December: Deployment.

J.Hrivnac, 6Oct97