$Id: README,v 1.1.1.1 1999/02/02 23:29:39 shmit Exp $

TICRA
-----
Brian Cully

Contents:
    0 - Introduction
    1 - Installation Instructions
        1.1 - Building the Package
        1.2 - Configuring
            1.2.1 - Server Configuration
                1.2.1.1 - Hostlist
                1.2.1.2 - Server.conf
            1.2.2 - Client Configuration
                1.2.2.1 - Disklist
                1.2.2.2 - Dumptypes
                1.2.2.3 - Client.conf
    2 - Common Problems
    3 - How it Works
        3.1 - Gathering Estimates
        3.2 - Gathering the Dumps
            3.2.1 - The Master
            3.2.2 - The Slaves
                3.2.2.1 - Run-dump
        3.3 - Putting It on to the Tape
            3.3.1 - Writing the Dump
        3.4 - Sending out the Reports
    4 - What Still Needs to be Done
    Appendix A - Example Configuration Files
    Appendix B - The Tape Format
    Appendix C - Design Decisions

Part 0: Introduction
--------------------

TICRA is a backup system. Wow. Toot toot toot.

Part 1: Installation Instructions
---------------------------------

Installation consists of three parts: building the package, installing
the binaries and sample data files, and configuring the client and
server. Before you build the package you should know whether the target
machine is a client or a server, as the build process is a bit different
depending on the target.

1.1: Building the Package
-------------------------

Before you build the package, you should poke through the Makefile and
set it up for your environment. In particular, you should verify that
the binaries are installed into the proper place, and that the OSLIBS
match the platform upon which you are trying to install.

You should also poke through config.h and make sure things are set up as
you would like there. Of particular interest is the PATH_RSH macro,
which you should set to point to either rsh or something equivalent
(like ssh).

Once that's set up, you can build the binaries. As stated above, you
build differently depending on whether you're installing the client or
server.
If you're installing the server, type:

    % make server

If you're installing the client, type:

    % make client

Of course, if you don't care, you can just type:

    % make

and both will be built. Watch the build process for errors and correct
them if you can.

Once everything has compiled properly, install the binaries and example
files. To do this for the server, type:

    % make install-server

To install the client, type:

    % make install-client

Finally, to install the example files:

    % make install-examples

If you want to install everything at once, you can just type:

    % make install

1.2: Configuring
----------------

Now that everything is installed you have to do the configuration. The
process is very different depending on whether you are installing a
client or a server, so they're covered in different sections.

All configuration files have fields separated on word boundaries, ignore
blank lines, and ignore everything after a `#'.

The one thing that must be taken care of for both client and server is
the setup of the backup account in the password file. The backup account
must be named `ticra' and must use the shell `smrsh', which was
installed into BINDIR as specified in the Makefile (probably
/usr/local/ticra/bin).

1.2.1: Server Configuration
---------------------------

To configure the server you must edit two files: server.conf and
hostlist. Hostlist contains a list of hosts to which the server will
connect and make backups. Server.conf contains the rest of the server
configuration.

1.2.1.1: Hostlist
-----------------

The hostlist file is simply a list of hosts to which the backup operator
should connect. It consists of one entry per line, each of which is a
hostname. The backup operator must be able to connect to each host in
the hostlist in order to back it up.
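
As a rough illustration of the file format just described, here is a
minimal sketch in Python (not TICRA's own code) of how a hostlist could
be read: everything after a `#' is dropped, blank lines are skipped, and
the first word on each remaining line is taken as the hostname. The
function name parse_hostlist and the sample host names are made up for
the example, and treating "word boundaries" as whitespace is an
assumption.

```python
def parse_hostlist(text):
    """Return the hostnames listed in hostlist-style text.

    Assumptions (not confirmed by TICRA's source): fields are separated
    by whitespace, and only the first field on a line is the hostname.
    """
    hosts = []
    for line in text.splitlines():
        # Ignore everything after a '#', per the config file rules.
        line = line.split("#", 1)[0]
        fields = line.split()
        if not fields:
            # Blank and comment-only lines are ignored.
            continue
        hosts.append(fields[0])
    return hosts


# Hypothetical hostlist contents, for illustration only.
sample = """\
# Hosts to back up.
localhost
fileserver   # the departmental file server

"""
print(parse_hostlist(sample))
```

The same comment and blank-line handling would apply to the other
configuration files, which simply carry more fields per line.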

The connection process uses rsh, or whatever was specified in the
config.h file, so you must make sure that the backup operator can
connect to the target host /before/ adding it to the hostlist file;
otherwise an error will occur during the backup.

1.2.1.2: Server.conf
--------------------

The server.conf file contains the real configuration information for the
server. The variables that can be set and their functions are specified
below:

manager:    The e-mail address of the backup operator. This address
            gets the reports generated by the reporter process.
logdir:     The directory that contains the info and error logs.
infolog:    The name of the log that contains informational output
            generated by the dumper process. It can be set to `-' to
            log to standard output instead of a file.
errorlog:   The name of the log that contains error output generated by
            the dumper process. It can be set to `-' to log to standard
            error instead of a file.
timeout:    How many seconds to wait when opening a connection before
            giving up.
hostlist:   The filename that contains the hostlist.
spooldir:   The directory that backups will be stored in before they
            are written to tape.
spoolsize:  The maximum amount of data that can be written to the
            spooldir, in megabytes.
tapedev:    The name of the tape device.
tapesize:   The size of the tape, in megabytes.
labelstr:   The label on the tape must start with this string in order
            for data to be written to it.

1.2.2: Client Configuration
---------------------------

The client end of things has three files that must be configured before
it can be backed up. These three files are `disklist', `dumptypes', and
`client.conf'. Disklist contains a list of disks to be backed up, what
kind of backup will be performed on each disk, whether or not
compression is enabled, and finally, what type of authentication should
be used on the disk. Client.conf describes what each kind of backup is
and what port will be used for unauthenticated backups.

Because TICRA uses rsh to communicate between the client and server, you
must make sure that the backup server can initiate a connection to the
client over rsh (or the equivalent that was specified in config.h).

1.2.2.1: Disklist
-----------------

The disklist contains one entry per line, in the following format:

    filesystem dumptype compression auth-type

These fields are described below:

filesystem:   The name of the device or directory to be backed up.
dumptype:     The name of the type of dump to use, the exact meaning of
              which is set in the `dumptypes' file.
compression:  `uncompressed' is the only type currently supported.
auth-type:    `noauth' is the only type currently supported.

1.2.2.2: Dumptypes
------------------

The dumptypes file contains the definitions for the dumps which are
specified in the disklist. It consists of one entry per line, in the
following format:

    dumpname dumpline estimateline regexp

These fields mean:

dumpname:      The name of the type of dump; this is what's used in
               `disklist'.
dumpline:      The command line that's used to dump this type onto
               standard output. Simple substitution is performed as
               follows:
                   %l - Dump level.
                   %v - Filesystem name (from disklist).
                   %% - A literal %.
estimateline:  The command line that's used to get a size estimate for
               a filesystem. Substitution is performed as in dumpline.
regexp:        The regular expression which extracts the size of a
               dump, in 1K blocks, from the output of the estimate
               command.

1.2.2.3: Client.conf
--------------------

The client.conf file specifies what port should be used for
unauthenticated backups. The fields are:

port:  The port that will be used for unauthenticated backups. Must be
       a number.

Part 2: Common Problems
-----------------------

No one uses this; how can there be common problems?

Part 3: How it Works
--------------------

3.1: Gathering Estimates
------------------------

The server machine first reads its hostlist and connects to each host
sequentially over an RSH pipe, asking it for a list of disks to back up,
what type of authentication to use (either `noauth' or `kerbV'), and an
estimate of how much space the dump will take up. It uses this
information to put together a schedule of which disks to dump and at
what level to dump them.

If the authentication type is `noauth' then the dumper asks which port
it should connect to for the dump.

3.2: Gathering the Dumps
------------------------

3.2.1: The Master
-----------------

Once the estimates have been gathered and the schedule has been worked
out, the master dumper process identifies itself as such via
setproctitle(3) (not available on all platforms).

The master then forks and execs the taper process, opening pipes to it
in the process. It saves the open file descriptors for use with the
slaves.

In order to do the actual dumping, the master then forks once for each
entry in the hostlist; these processes will be referred to as slaves.

Once all the slaves have finished, the master tells the taper that there
is no further data to be written to the tape and waits for the taper to
finish writing everything to the tape.

When the taper is done, the master finishes off by running the
report-generating script, `report'.

3.2.2: The Slaves
-----------------

The slaves do all the real work with regard to client communication.
They first open up an RSH pipe to their client, which is used as a
control session. The RSH pipe execs the run-dump process, and the two
(dumper and run-dump) exchange control information and error messages
over the pipe.

The slave tells the run-dump process on the client to open up a socket
on the port specified in the client's client.conf file (which was given
to the server during the estimate-gathering process).

The slave then goes through each disk it knows about, iteratively
requesting a dump from the run-dump on the client. For each disk it
opens a new connection to the client over the negotiated port and waits
for data to start coming down the socket (from now on, we'll refer to
this as the `data connection').

Normally, the slave will just put the data from the data connection into
a file in the spooling directory (as specified in the server.conf file)
named in the format:

    <host>:<disk>

where <host> is the name specified in the hostlist file on the server
and <disk> is the name specified in the disklist file on the client.
Once the dump is completely written to the spooling directory, the slave
tells the taper to write that file to tape (using the pipe passed down
from the master).

If, however, the estimate for the disk says it will be too big for the
spooling directory (whose size is also specified in the server.conf
file), then the slave will instead dump the data directly to the taper.

When all disks are finished, the slave tells run-dump that it's done and
closes the RSH control connection.

3.2.2.1: Run-dump
-----------------

Run-dump reads through the disklist and the client.conf file. It
responds to requests from the server to execute a dump and to open a
connection on a given port (specified in the client.conf file and passed
over to the server as part of the estimate process).

When a dump is requested for a disk, run-dump executes the process for
the dump type specified in the disklist. Run-dump does very crude syntax
substitution on the command line, allowing it to run arbitrary commands
to do the dump. This means that the only real difference between the
types `dump' and `tar' is that the type `dump' is capable of calculating
estimates and using leveled dumps, whereas `tar' is not.

The only other thing run-dump responds to is a port command. This tells
run-dump to open up a socket on a given port so that the server can
connect to it. This is the channel that is used to send the dump over to
the server.
For obvious reasons, the port command needs to come before the dump
command.

3.3: Putting It on to the Tape
------------------------------

Once the taper process is created, it rewinds the tape and checks the
label against the one in the server.conf file. If the label doesn't
match, it sends an error and dies.

If the tape matches the label, however, the taper will wait on the
control pipes to be told either to dump something to tape or to quit.

The taper can be told to write either a file or a stream of data to the
tape. In either event, it first writes a file header describing the data
and then writes the data.

When the taper is done writing to the tape, and the master dumper has
told the taper to quit, the taper will write an end of tape marker; in
the future this will be used to store more than one period's worth of
dumps on a single tape. The taper then exits.

3.3.1: Writing the Dump
-----------------------

It is expected that writing to tape will be slower than grabbing the
dump from the target machine (although this is certainly not always the
case!). Moreover, there will be multiple dumper processes all writing to
the spooling disk, but only one taper. Because of this, the taper has to
keep a queue of things to write to tape while it is busy actually
writing data to the tape.

To accomplish this, when the taper first receives a request to write
something to tape, it forks off a child to do the actual writing, while
the parent sits on the control connection waiting for more requests and
adding them to the queue.

When the child dies (after it's done writing to tape, or on an error),
the parent forks off another child to deal with the next item in the
queue. This goes on until the master dumper says there aren't going to
be any more things added to the queue.

When the master dumper says everything is finished, the taper stops
forking off children and goes through the rest of the queue iteratively.
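
The queueing behaviour described above can be sketched roughly as
follows. This is a toy model in Python, not the taper's actual code, and
it needs a Unix-like system for os.fork(). The class name TaperQueue,
its method names, the append_to_tape callback and the host:disk strings
are all made up for the example. A temporary file stands in for the tape
device, and the parent is told about the child's death by an explicit
on_child_exit() call rather than by a signal.

```python
import os
import tempfile
from collections import deque


class TaperQueue:
    """Toy model of the taper: at most one writer child at a time."""

    def __init__(self, write_item):
        self.write_item = write_item  # hypothetical "write to tape" callback
        self.queue = deque()          # requests not yet handed to a writer
        self.child = None             # pid of the running writer child, if any

    def request(self, item):
        """Handle a write request arriving on the control connection."""
        self.queue.append(item)
        if self.child is None:
            self._fork_writer()

    def _fork_writer(self):
        item = self.queue.popleft()
        pid = os.fork()
        if pid == 0:
            # Child: write exactly one item, then exit without returning.
            status = 1
            try:
                self.write_item(item)
                status = 0
            finally:
                os._exit(status)
        self.child = pid

    def on_child_exit(self):
        """Reap the writer child and start on the next queued item, if any."""
        _, status = os.waitpid(self.child, 0)
        self.child = None
        if self.queue:
            self._fork_writer()
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

    def finish(self):
        """No more requests are coming: stop forking and drain the queue."""
        if self.child is not None:
            os.waitpid(self.child, 0)
            self.child = None
        while self.queue:
            self.write_item(self.queue.popleft())


# A temporary file stands in for the tape device.
fd, tape_path = tempfile.mkstemp()
os.close(fd)


def append_to_tape(item):
    with open(tape_path, "a") as tape:
        tape.write(item + "\n")


taper = TaperQueue(append_to_tape)
taper.request("hostA:/var")    # forks a writer right away
taper.request("hostA:/tmp")    # queued behind the running writer
taper.request("hostB:/home")   # queued as well
taper.on_child_exit()          # first writer reaped, second one forked
taper.finish()                 # wait for it, then write the rest in-process

with open(tape_path) as tape:
    written = tape.read().splitlines()
os.remove(tape_path)
print(written)
```

Because only one writer runs at a time, the items reach the "tape" in
the order they were requested, even though requests keep arriving while
a child is still writing.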

3.4: Sending out the Reports
----------------------------

The reporter doesn't do much of anything fancy right now; it just mails
the error and info logs to the manager and renames the log files to
contain a date stamp.

Part 4: What Still Needs to be Done
-----------------------------------

Check out the file TODO in this directory for a list of things that
still need to be accomplished.

In particular, this document lies in many places. Notably, the estimate
process doesn't actually grab an estimate, there is no scheduling being
done, backup levels aren't being used (so, currently, there is no
difference between tar and dump), and there's no facility to dump
straight to tape (so you'd better make sure that none of your dumps are
bigger than the spooling directory).

Appendix A: Example Configuration Files
---------------------------------------

+-----------+
|server.conf|
+-----------+
# To whom mail should go.
manager shmit@erols.com

# Where to log informational and error messages.
logdir /usr/local/ticra/var/log
infolog -
errorlog -

# How many seconds to wait on a connection before giving up.
timeout 10

# List of machines to backup.
hostlist /usr/local/ticra/libdata/ticra/hostlist

# Where dumps will be spooled before being dumped to tape.
spooldir /var/holding
spoolsize 50

# The tape device on which to dump.
tapedev /dev/nrst0
tapesize 50
labelstr TEST00

+--------+
|hostlist|
+--------+
localhost

+-----------+
|client.conf|
+-----------+
# Port to use for non-authenticated dumps.
port 31337

# Syntax lines for dump types.
# Substitution rules are:
#     %l - dump level
#     %v - volume name
#     %% - %
dump "dump -%luf - %v"
tar "tar -clf - %v"

+--------+
|disklist|
+--------+
# Filesystem    dumptype    compression     auth-type
#---------------------------------------------------------
/tmp            tar         uncompressed    noauth
/var            dump        uncompressed    noauth

Appendix B: The Tape Format
---------------------------

Physical Start of Tape
TAPE LABEL: 8192 bytes
Tape EOF
START OF TAPE: 8192 bytes
Tape EOF
Repeat for each file on tape:
    FILE HEADER: 8192 bytes, containing:
        - Machine from which the disk came.
        - Name of disk on said machine.
        - Date of dump in DDMMYYYY format.
    Tape EOF
STOP OF TAPE: 8192 bytes
Tape EOF

Appendix C: Design Decisions
----------------------------

This package has been designed primarily to back up data in a Very Large
Organization, although it should work even at small companies without
any hassle.

When I thought `Very Large Organization', I saw an organization that
necessarily had many different departments, all of which want as much
control as possible over how their data is backed up. After all, if the
Head of Finance wants to change /var/db/account to be backed up with tar
instead of dump, he shouldn't have to call over to the backup operator
to do it. Moreover, I didn't want the backup operator to have to spend
his days toiling with lots of different hosts and departments trying to
get things done their way; he should be able to spend as little time as
possible dealing with other people and the lack of reasonable
communication which only complicates and confuses the matter.

Keep in mind that I work backups, and I'm lazy, so I wanted to make
things as easy as possible for me. However, I think you'll find that my
decisions don't make life any harder for anyone, but make it more
convenient (all true lazy bones try and make life easier for everyone).

To that end, I've tried to make as little as possible configurable on
the server end of things, instead pushing the responsibility over to the
client end.

I came from an AMANDA background, so I was fairly biased when I started
writing this software. I kept many of the ideas of AMANDA but thought
that a few things needed changing. In particular, I wanted to separate
the disks to be backed up from the hosts to be backed up, and keep the
disks in a separate file on the client so the client managers could
configure which disks are backed up and the type of backup that is used.

I also wanted to have the client define how the dump was done, instead
of the server; this simplified things in a larger environment for a
number of reasons:

* Across a diverse network with many different types of clients and
  file systems you can't be guaranteed that a program called `dump' or
  `tar' will exist or will do the right thing. Instead of having the
  backup operator know these details that are largely unimportant to
  him, I opted to have the person that runs the client in question fill
  in the blanks.

* The size of the server.conf file would become rather unwieldy if it
  had to contain entries for every possible type of dump that would be
  used in an organization.

* There is no need for the server to know what kind of data is in the
  backup stream.

The other main goal was to have software that actually worked. My
experience with AMANDA was mostly loathsome, mainly due to what I
considered really poor design decisions, like the use of UDP, embedded
data flow information, and the poor use of RSH. I've endeavored to fix
these design problems with what I consider to be better solutions.

Apropos of that, I also endeavored to make the code clean and used a
style that moves in that direction. If you're interested in it you
should read style(9) on any *BSD system. I also used ANSI prototypes and
function prefixes. I feel they're easy to read, and the main argument I
see against them (namely, compatibility) I don't find valid with this
software (I use so many POSIXisms that if you have them, you'll have a
compiler that can handle ANSI C).