DFS Backup
(version 0.2)
Mission Statement
The standard DFS backup system has serious deficiencies which make it
unsuitable for CERN requirements, so a new backup system will be developed
locally at CERN. The new backup system will be flexible and easily
maintainable, and will offer a simple interface for operator tasks and for
end-user tasks. Procedures for dealing with unexpected situations will be
included. It will be possible to reuse the package with a different
underlying backup mechanism in the future.
The alpha version of the backup
system should be ready in September 97, followed by a beta version in October
97. The full package should be in operation by the end of 97.
Case Study
-
Normal Operation
-
Backup Initialisation: The Database is created, jobs are scheduled.
-
Backup Dump: The Database is dumped.
-
Backup: Backup runs unattended under normal circumstances. The only
required operator actions are visually labeling new tapes and changing
tapes at certain times.
-
Recent Restore: Restore of entities backed up within the current period
works without operator attendance. The user reclaims those entities via
a standard interface.
-
Old Restore: To restore entities from previous periods, the operator
may be required to put certain tapes into the drive. For the user, this
operation looks the same as Recent Restore.
-
Problems
-
Backup Recovery: The corrupted Database is restored.
-
Filesystem Crash: Restore after an aggregate or partition is lost is
just a normal Restore.
-
Tape Unreadable: If any tape is unreadable, restore will be done
from the most recent available backup, and a new backup (new phase) will be
started immediately to replace the bad tape.
-
Server Crash: If a server crashes, the full set of data will be restored
on another (new) server.
-
Database Lost: If the Database is lost, one of the backup copies of
that database will be used. The database can also be recreated from
information stored on tapes.
-
Backup Failure: If any backup phase fails, it will be restarted
as soon as possible. The next phase may be postponed until the failed phase
has been performed.
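The Tape Unreadable case above amounts to picking the newest dump whose tape can still be read. A minimal sketch of that selection, in Python purely for illustration; the `Dump` type, the `latest_readable` function and the timestamp format are assumptions and not part of the design. The sketch ignores backup levels, so it does not check that a differential dump still has its full dump available.

```python
from typing import List, NamedTuple, Optional, Set


class Dump(NamedTuple):
    date: str   # sortable timestamp (hypothetical format)
    tape: str   # label of the tape holding this dump (hypothetical)


def latest_readable(dumps: List[Dump], unreadable: Set[str]) -> Optional[Dump]:
    """Return the most recent dump that is not stored on an unreadable tape."""
    usable = [d for d in dumps if d.tape not in unreadable]
    return max(usable, key=lambda d: d.date, default=None)


dumps = [
    Dump("1997-09-05 02:00", "A0"),
    Dump("1997-09-06 02:00", "A1"),
    Dump("1997-09-07 02:00", "A2"),
]
# Tape A2 cannot be read, so the restore falls back to the dump on A1.
chosen = latest_readable(dumps, unreadable={"A2"})
```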
Requirements
Definitions
-
Period: The time span during which successive backups are placed on one
set of tapes.
-
Current Period: The Period whose tapes are still inside the drives.
-
Entity: A set of data which is backed up or restored together.
-
Schema: The prescription of how and when certain Entities are backed up.
-
Database: The system which keeps all information about Schemas and
performed backups.
Requirements
-
Functional Requirements
-
Standard Backup: Under normal circumstances backup should run without
operator intervention, except for periodic visual labeling and changing of
tapes at the beginning of a period.
-
Restore of Recent Backup: Under normal circumstances restore of
entities backed up within the current period should run without operator
attendance.
-
Restore of Old Backup: It should be possible to restore an entity which
is not present in the current period (either not at all or not in the
required version). Operator attendance may be required.
-
Entity Description: Each Entity should have a unique description.
-
Database Lost: There should be at least two copies of the Database
on different media.
-
Database Structure: The Database should be structured in a human-readable
format.
-
Schema Change: It should be possible to change Schema between two
Periods.
-
Backup Entities: These Entities should be available for Backup:
-
Fileset.
-
Restore Entities: These Entities should be available for Restore:
-
Single directory.
-
Fileset.
-
Period Duration: A Period should last at least one week.
-
Period Periodicity: All Periods should have the same duration.
-
Period Content: The set of tapes for one Period should contain at least
one Full Backup of all backed-up Entities.
-
Backup UI: A user interface for the Backup operation should exist.
-
Restore UI: A user interface for the Restore operation should exist.
-
User Documentation: User Documentation should be available.
-
Operator Documentation: Operator Documentation should be available.
-
Developer Documentation: Developer Documentation should be available.
-
System Properties
-
Constraints
-
Implementation Language: The system should be implemented in a scripting
language.
-
Modularity: Each task should be performed by a single program unit.
Design
Design Choices
-
Database Content: There are two parts of the database:
-
Dump:
-
Items: record on a tape of a period
-
Attributes:
-
date
-
fileset
-
level: backup level (full, differential)
-
Entity:
-
Items:
-
fileset
-
directory of fileset
-
Attributes:
-
dump: connection to Dump database entry
-
children
-
parents
-
place: position of the Entity in the original subsystem
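The two parts of the database listed above can be pictured as two record types. The following is a minimal sketch in Python, used only for illustration since the system itself is planned in a scripting language; the class and field names are assumptions derived from the attribute lists, and a single `parent` stands in for the "parents" attribute as a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DumpRecord:
    """One record on a tape of a period (the Dump part)."""
    period: str    # e.g. "A"
    tape: int      # tape number within the period
    record: int    # record number on the tape
    date: str      # when the dump was taken
    fileset: str   # which fileset was dumped
    level: str     # backup level, e.g. "full"


@dataclass
class Entity:
    """A fileset or a directory of a fileset (the Entity part)."""
    place: str                                         # position in the original subsystem
    dumps: List[DumpRecord] = field(default_factory=list)
    parent: Optional["Entity"] = None                  # simplification: one parent only
    children: List["Entity"] = field(default_factory=list)

    def add_child(self, child: "Entity") -> None:
        """Link a child Entity to this one in both directions."""
        child.parent = self
        self.children.append(child)


# A fileset with one directory and one recorded dump (illustrative values).
fileset = Entity(place="user.hrivnac")
work = Entity(place="user.hrivnac/work")
fileset.add_child(work)
fileset.dumps.append(
    DumpRecord("A", 0, 0, "1997:09:05:02:00:00", "user.hrivnac", "full")
)
```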
-
Database Format: There are several possible ways to store the database:
-
Real database (relational or OO) programs:
-
+: Efficient.
-
+: Fast.
-
+: Small (compressed).
-
-: May cost money (if commercial).
-
-: May require significant amount of work (if home-made).
-
-: Not human readable.
-
-: Recovery after corruption problematic.
-
Plain file:
-
+: Easy to implement.
-
+: Human readable.
-
+: Recovery after corruption possible.
-
-: Inefficient.
-
-: Slow.
-
-: Quite big.
-
Directory structure:
-
+: Efficient.
-
+: Fast.
-
+: Easy to implement.
-
+: Human readable.
-
+: Recovery after corruption possible.
-
-: Quite big.
The directory structure format seems to satisfy our requirements best.
Support for other approaches may be added later, including functions
for transformations between the various formats.
The database is structured in the following way:
-
Dump:
-
Structure: <period>/<tape>/<record>
-
Attributes:
-
date:<yyyy:mm:dd:hh:mm:ss>
-
fileset:<xxx>
-
level:<aaa[:aaa]>
-
Example:
-
A
  0
    fileset:user.hrivnac
    000
      date:1997:09:05:02:00:00
      level:full
    001
      date:1997:09:06:02:00:00
      level:full:first
  1
B
-
Entity:
-
Structure: <fileset>[/<directory>[/<directory>]][/.../<attributes>]
-
Attributes:
-
<period>:<tape>:<record>:<date>:<level>
-
Example:
-
user.hrivnac
-
...
-
A:0:001:1997:09:05:02:00:00:full
-
A:3:001:1997:09:06:02:00:00:full:first
-
work
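Because the Entity attribute lines in the example use ":" both between fields and inside the timestamp, a reader has to split them by position. A minimal sketch of such a parser, in Python for illustration only; the type and function names are hypothetical, and it assumes the fourth to ninth fields form a yyyy:mm:dd:hh:mm:ss timestamp, as in the example lines, with everything after that being the level.

```python
from datetime import datetime
from typing import NamedTuple, Tuple


class EntityAttribute(NamedTuple):
    period: str
    tape: str
    record: str
    date: datetime
    level: Tuple[str, ...]   # e.g. ("full", "first")


def parse_entity_attribute(text: str) -> EntityAttribute:
    """Split '<period>:<tape>:<record>:<timestamp>:<level>' by field position."""
    parts = text.split(":")
    if len(parts) < 10:
        raise ValueError(f"too few fields in {text!r}")
    period, tape, record = parts[0:3]
    # Six numeric fields: year, month, day, hour, minute, second.
    date = datetime(*(int(p) for p in parts[3:9]))
    level = tuple(parts[9:])
    return EntityAttribute(period, tape, record, date, level)


attr = parse_entity_attribute("A:3:001:1997:09:06:02:00:00:full:first")
```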
Not all attributes are stored for all Entities. Some attributes may have
a different format depending on other attributes. If an entry does not
contain a needed attribute, the attribute situated higher in the hierarchy is
used. The absence of Entity children does not mean that they are not backed
up; it only means that they cannot be recovered as single Entities.
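The fallback to attributes higher in the hierarchy maps naturally onto a walk up the directory tree. The sketch below, in Python for illustration only, assumes that each attribute is stored as a small file named after the attribute with the value as its content; that storage layout and the function name `find_attribute` are assumptions, not something the design fixes.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_attribute(db_root: Path, entry: Path, name: str) -> Optional[str]:
    """Return attribute `name` for `entry`, searching upwards towards db_root."""
    current = entry
    while True:
        candidate = current / name
        if candidate.is_file():
            return candidate.read_text().strip()
        # Stop at the database root (or the filesystem root, as a safety net).
        if current == db_root or current.parent == current:
            return None
        current = current.parent


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    record = root / "A" / "0" / "001"          # <period>/<tape>/<record>
    record.mkdir(parents=True)
    (root / "A" / "0" / "fileset").write_text("user.hrivnac\n")
    (record / "level").write_text("full:first\n")

    fileset = find_attribute(root, record, "fileset")  # inherited from the tape level
    level = find_attribute(root, record, "level")      # stored on the record itself
    missing = find_attribute(root, record, "date")     # not stored anywhere
```

Here the record has no `fileset` entry of its own, so the value stored one level up is returned.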
-
Schema Content:
-
backup:
-
when (what time)
-
what (what Entity)
-
how (what level)
-
command:
-
when (what time)
-
what (what to perform)
-
Schema Format: <cron time> [<fileset> <level>[/<level>]|command]
[# comment]
The Schema is formatted like a standard crontab. This allows the
Schema to be used as input to cron for scheduling.
-
Example:
22 2 * * 3 BackupEngine dump user.a* full
22 2 * * 4 BackupEngine dump user.a* full:first
22 2 * * 5 BackupEngine dump user.a* full:first:second
22 2 * * 6 BackupEngine dump user.a* full:first
22 2 * * 0 BackupEngine dump user.a* full:first:second
22 2 * * 1 BackupEngine dump user.a* full:first
22 2 * * 2 BackupEngine dump user.a* full:first:second
23 2 * * 4 BackupEngine dump user.b* full
23 2 * * 5 BackupEngine dump user.b* full:first
23 2 * * 6 BackupEngine dump user.b* full:first:second
23 2 * * 0 BackupEngine dump user.b* full:first
23 2 * * 1 BackupEngine dump user.b* full:first:second
23 2 * * 2 BackupEngine dump user.b* full:first
23 2 * * 3 BackupEngine dump user.b* full:first:second
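Since each Schema line follows the crontab layout, it can be split into five time fields followed by a command. A minimal sketch of such a parser, in Python for illustration only; the names `SchemaEntry` and `parse_schema_line` are hypothetical, and treating everything after "#" as a comment is a simplification.

```python
from typing import NamedTuple, Optional, Tuple


class SchemaEntry(NamedTuple):
    cron_time: Tuple[str, ...]   # minute, hour, day of month, month, day of week
    command: str                 # everything after the five time fields
    fileset: Optional[str]       # set only for "BackupEngine dump" lines
    level: Optional[str]


def parse_schema_line(line: str) -> Optional[SchemaEntry]:
    """Parse one Schema line; return None for blank or comment-only lines."""
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    fields = text.split()
    if len(fields) < 6:
        raise ValueError(f"expected five time fields and a command: {line!r}")
    cron_time = tuple(fields[:5])
    rest = fields[5:]
    fileset = level = None
    # Recognise the dump form used in the example lines.
    if rest[:2] == ["BackupEngine", "dump"] and len(rest) == 4:
        fileset, level = rest[2], rest[3]
    return SchemaEntry(cron_time, " ".join(rest), fileset, level)


entry = parse_schema_line("22 2 * * 4 BackupEngine dump user.a* full:first")
```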
-
Options:
-
Items:
-
dbHome: <directory>
-
tape: <dev-file>
-
Example:
dbHome: /var/backupDB
tape: /dev/rmt8
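The options in the example are plain "name: value" lines, so reading them takes only a few lines of code. A minimal sketch in Python, for illustration only; the function name `parse_options` is hypothetical, and skipping blank lines is an assumption.

```python
from typing import Dict


def parse_options(text: str) -> Dict[str, str]:
    """Read 'name: value' lines into a dictionary."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in option line {raw!r}")
        options[name.strip()] = value.strip()
    return options


options = parse_options("dbHome: /var/backupDB\ntape: /dev/rmt8\n")
```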
Process Decomposition
-
BackupEngine: The process taking care of backup. Accepts
calls, usually from cron.
-
clone(<entity>): Creates backup clone.
-
updateDB(<entity>, <dbName>): Updates database.
-
prepareDump(<entity>): Prepares dump.
-
dump(<entity>, <level>): Performs the dump.
-
RestoreEngine: The program taking care of restore. Runs
only on demand. Contains a GUI. Is available to end-users.
-
restore(<entity>, <dbName>): Performs the restore.
-
find(<entity>, <dbName>): Finds the Entity to restore.
-
DatabaseManager: The only program talking to databases.
Usually not used directly by users.
-
create(<dbName>, <dbPlace>): Creates new database (both
levels).
-
add(<entity>, <dbName>): Adds Entity.
-
remove(<entity>, <dbName>): Removes Entity.
-
find(<entity>, <dbName>): Finds Entity.
-
certify(<entity>, <dbName>): Certifies Entity.
-
check(<dbName>, <period>:<tape>:<record>): Checks
information from tape against database.
-
update(<dbName>, <period>:<tape>:<record>): Updates
information in database from tape.
-
dump(<dbName>, <file>): Backs up the database itself.
-
restore(<dbName>, <file>): Restores database itself.
-
SystemInterface: The interface to system functions:
-
dump(<fileset>, <tape>, <version>): Performs dump.
-
restore(<fileset>, <tape>, <server>, <aggregate>):
Performs restore.
-
tape(<period>:<tape>:<record>): Puts the tape into the drive
and positions it. Asks the Operator if necessary.
-
TapeManager: The only program talking to tape drives.
Interface to tape drive functions. Usually not used directly by users.
-
tape(<tape>): Puts tape into drive.
-
label(<label>): Labels tape currently in drive.
-
position(<record>): Positions current tape.
-
record(): Finds where the current tape is positioned.
-
ConfigurationManager: The only program updating the configuration
data. Contains a GUI. Stores all information in a standard place.
-
dataBase(<dbName>): Creates database.
-
setSchema(<dbName>): Reads the whole schema from STDIN.
-
getSchema(<dbName>): Writes the whole schema into STDOUT.
-
setOptions(<dbName>): Reads options from STDIN.
-
getOptions(<dbName>): Writes options into STDOUT.
-
option(<dbName>, <optionName>): Gets single option.
-
setPeriod(<dbName>, <period>): Sets current period.
-
getPeriod(<dbName>): Gets current period.
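Several of the calls above take a <period>:<tape>:<record> locator, and the SystemInterface tape call is described as loading and then positioning a tape. One way this could delegate to the tape-handling program is sketched below, in Python for illustration only. `FakeTapeManager` is a stand-in that merely logs calls, and identifying a tape by "period:tape" is an assumption, since the design does not say how a tape is named.

```python
from typing import List, Tuple


def parse_locator(locator: str) -> Tuple[str, str, str]:
    """Split '<period>:<tape>:<record>' into its three parts."""
    period, tape, record = locator.split(":")
    return period, tape, record


class FakeTapeManager:
    """Stand-in for the tape-handling program; it only logs the calls it gets."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def tape(self, tape: str) -> None:
        self.calls.append(f"tape({tape})")

    def position(self, record: str) -> None:
        self.calls.append(f"position({record})")


def system_tape(locator: str, tape_manager: FakeTapeManager) -> None:
    """Sketch of the SystemInterface tape call: load the tape, then position it."""
    period, tape, record = parse_locator(locator)
    tape_manager.tape(f"{period}:{tape}")   # assumption: tape named by period and number
    tape_manager.position(record)


manager = FakeTapeManager()
system_tape("A:0:001", manager)
```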
Functional Decomposition
-
Backup Initialisation:
-
ConfigurationManager.setOptions(<dbName>) < Options.txt
-
ConfigurationManager.dataBase(<dbName>)
-
DatabaseManager.create(<dbName>, <dbPlace>)
-
ConfigurationManager.setSchema(<dbName>) < Schema.txt
-
ConfigurationManager.setPeriod(<dbName>, <period>)
-
Backup:
-
BackupEngine.updateDB(<entity>, <dbName>)
-
DatabaseManager.add(<entity>, <dbName>)
-
BackupEngine.dump(<entity>)
-
BackupEngine.certifyDB(<entity>, <dbName>)
-
DatabaseManager.certify(<entity>, <dbName>)
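The Backup sequence above can be read as three steps: register the entity in the database, dump it, then certify the database entry. The sketch below, in Python for illustration only, uses stub classes that just log their calls. The nesting (updateDB calling DatabaseManager.add, certifyDB calling DatabaseManager.certify), the `backup` driver method and the extra level argument to dump are assumptions.

```python
from typing import List

log: List[str] = []   # records the order of calls for inspection


class DatabaseManager:
    """Stub: the real program would update the directory-structure database."""

    def add(self, entity: str, db_name: str) -> None:
        log.append(f"DatabaseManager.add({entity}, {db_name})")

    def certify(self, entity: str, db_name: str) -> None:
        log.append(f"DatabaseManager.certify({entity}, {db_name})")


class BackupEngine:
    def __init__(self, db: DatabaseManager) -> None:
        self.db = db

    def update_db(self, entity: str, db_name: str) -> None:
        self.db.add(entity, db_name)

    def dump(self, entity: str, level: str) -> None:
        log.append(f"BackupEngine.dump({entity}, {level})")

    def certify_db(self, entity: str, db_name: str) -> None:
        self.db.certify(entity, db_name)

    def backup(self, entity: str, level: str, db_name: str) -> None:
        """Hypothetical driver: register, dump, then certify."""
        self.update_db(entity, db_name)
        self.dump(entity, level)
        self.certify_db(entity, db_name)   # reached only if the dump did not raise


BackupEngine(DatabaseManager()).backup("user.hrivnac", "full", "mainDB")
```

Certifying only after the dump returns means an entry left uncertified marks a backup that did not complete.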
Implementation
Implementation Choices
-
Implementation Language: The system will be implemented in the Tcl language.
This allows easy access to DCE/DFS functionality and makes it easy to add
a GUI front-end later, based on the Tk package.
Project Diary
-
21Jul-25Jul: Initial Case Study and Requirements Definition (version
0.1)
-
General Case Study and URD created.
-
4Aug-6Aug: Initial Design (version 0.1)
-
Database format decided (filesystem).
-
Schema format decided.
-
Process and Functional Decomposition done.
-
Implementation language chosen (Tcl).
-
Process skeletons created.
-
7Aug-8Aug: Initial Design (version 0.2)
-
Period renamed from number to letter.
-
Updating database from tape added.
-
Links from Entity database to Dump database replaced by full description.
-
TapeManager process added.
-
BackupEngine.certify erased.
-
Schedule reformatted to use the standard crontab format.
Project Plan
-
21Jul-25Jul: Initial Case Study and Requirements Definition (version
0.1)
-
4Aug-8Aug: Initial Design (version 0.2)
-
25Aug-29Aug: Design Review (version 0.2)
-
September: Implementation (alpha)
-
October: Full Cycle (beta)
-
November: Full Cycle (version 1.0)
-
December: Deployment.
previous versions: 0.1
J.Hrivnac, 6Oct97