Linux Archiving Techniques

Here's one possible approach to creating a backup of a directory tree (your home directory would be a good start). First, it takes advantage of the fact that the /tmp directory is writable by everyone, so no special permissions are needed to create files and directories there. By using /tmp you can manage all of this without being root.

Next it uses the find command to collect the names of all the files (with their relative paths) and feed them to tar. The tar command (Tape Archiver) will create an archive, optionally compress it, and store it in the /tmp directory. The -T - option tells tar to read the list of file names from its standard input, i.e. from find, and --no-recursion keeps tar from descending into each directory a second time on its own. The -X option instructs tar to skip any files or directories matching the patterns listed in the exclude file. This prevents archiving a bunch of useless stuff like the web browser's cache files, X-desktop settings and other things that either change often or are specific to this machine. If you want to archive everything then leave off the -X option.
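The exclude file itself is just a plain text file with one pattern per line. Here's a hypothetical example; these entries are only illustrations (and the exact pattern-matching rules vary a little between tar versions), so list whatever you don't want archived:

.netscape/cache*
.kde/share/cache*
*.tmp
core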

Notice that the script begins its search in the current directory (.). It assumes that you will first cd into the directory you want to back up. The reason is so that tar won't record the full path for each file. This way you can un-tar it in any directory and it will duplicate the structure of the original directory only. If you use a full path name like /home/acme, the tar file will preserve that path (GNU tar strips the leading / but keeps the rest), and when you un-tar the archive it will put everything back under the same path, overwriting the existing directories on whatever machine it is un-tarred on. Best bet: cd into the directory you want to archive first.

find . -depth | tar --no-recursion -X exclude -T - -zcvf /tmp/save.tar.gz
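Restoring is the reverse: cd into whatever directory you want the tree recreated in and extract. For example, to unpack a copy somewhere harmless for inspection:

mkdir /tmp/restore
cd /tmp/restore
tar -zxvf /tmp/save.tar.gz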

This next part will create a temporary .netrc file (assuming that the script is run in the home directory). The .netrc file is used to automatically log in to an ftp server someplace and, optionally, execute some commands. Here the entire .netrc file is echoed into place by the script.

Following the login information is a command to define a "macro definition" (macdef) with the name init. The init name is special: ftp understands it to mean, "Do this right after a successful login." So it'll log you in and immediately execute all the commands up to the first blank line.

First I tell it to turn off prompting, then I tell it to print a hash mark (#) for every 1KB of data transferred. Next it will cd into the www directory. Finally it will upload (mput) the file I archived above.

echo "machine acme.com login acme password whizbang
macdef init
prompt
hash
cd www
mput /tmp/save.tar.gz

Finally I issue the command to quit ftp and make sure to follow it with a blank line so ftp will know that this ends the macdef definition. Since all of this was being echoed, the output is re-directed to the .netrc file (I'm in my home directory of course).

quit

">.netrc

Now that there's a .netrc file I can go ahead and do the ftp ...
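One caveat first: most ftp clients will refuse to use a .netrc that contains passwords if the file is readable by anyone else, so tighten its permissions right after creating it:

chmod 600 .netrc

With that done, the host name alone is all ftp needs; the machine entry in .netrc supplies the login name and password: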

ftp acme.com

Let's remove the archive since it's not needed any more.

rm /tmp/save.tar.gz

To finish up I put another version of the .netrc file in place for other ftp sites ...

echo " machine freon.net login will password pollywop
machine faraway.net login edna password pressing
machine ftpsite.com login beme password pulpie
macdef init
hash
prompt
cd www
ls
">.netrc

The script is an example of how you can automate files transferred to another machine either for backup or because you want the files available for access elsewhere.

NOTE: Everything above that's set off as a command is part of a single shell script (or shell function), so you can just copy and paste it as is. Change the directory names, file names, host names and user names to match your setup.

Another example adds some stuff to give you more control over what gets archived and where it goes:

if [ -d /tmp/saved ]
then
    rm -r /tmp/saved
fi
mkdir /tmp/saved

The first part above checks to see if the directory exists and, if it does, removes it; then it re-creates it fresh. Note that the mkdir has to come after the fi, outside the test, so the directory gets created whether or not an old copy was there. Testing first just prevents rm from complaining if the directory doesn't exist (there are other ways to handle this, of course).
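One such alternative: rm -f never complains about a missing target, so the test can be dropped entirely:

rm -rf /tmp/saved
mkdir /tmp/saved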

Next I check to see if an earlier version of the archive exists and delete it if it does:

if [ -f /root/save.tar.gz ]
then
    rm /root/save.tar.gz
fi
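The same shortcut works here; rm -f quietly ignores a file that isn't there:

rm -f /root/save.tar.gz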

Here again I use the find command to collect all the file names along with the full path for each ...

find /root -depth | cpio -pdv /tmp/saved

This time I use cpio with the -p (pass-through) option to copy the files named by the find command into the directory I created in the previous step. The find command just prints file names with their path information, and cpio copies the named files to the destination; the -d option makes cpio create sub-directories as needed and -v lists each file as it's copied. With this command line you can copy entire directory trees to other locations. Next I cd into another directory and repeat the command,

cd /home
find acme -depth | cpio -pdv /tmp/saved

By first changing to the parent of the directory I want to copy, the find command returns path names relative to that parent for the files I'm copying. The advantage in this case is that the resulting directory structure will only have the stuff I care about. Since I'm copying to the /tmp/saved directory I want the shortest paths possible:

/tmp/saved/acme

Without the cd first, the path would be:

/tmp/saved/home/acme

I don't need that extra directory level, hence the cd command. Next I cd to the /tmp/saved directory where I just copied everything. I will create the compressed archive from here for the same reason as before: to simplify the path information.

cd /tmp/saved
tar -zcvf /root/save.tar.gz -X /root/exclude .
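Before throwing away the working copy, it doesn't hurt to confirm what actually went into the archive. The -t option lists an archive's contents without extracting anything:

tar -ztvf /root/save.tar.gz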

Notice that the archive was actually created in the /root directory. Now that I have the archive, I can cd to where the archive is located and delete the /tmp/saved directory and recover the disk space:

cd /root
rm -rf /tmp/saved

Now we can do something different with the archive file we've just created. Earlier we used ftp and the .netrc file to automatically upload the archive to another machine somewhere. This time we'll try something else.

Let's suppose that we set up this archive script on another machine and now we need to execute it there. Rather than logging in over there and running the script by hand, try this:

ssh other-host.work.net /root/cleanup
scp other-host.work.net:save.tar.gz /root

What this does is connect to the remote host and execute the archive script (cleanup) there. That will do all the stuff we talked about above, except this time the archive is created on the other machine. Since ssh doesn't return until the remote command has finished, the copy can follow immediately. There are other ways to do this, but the idea here is to present a couple of uses for ssh. The ssh utilities let you connect to other machines through an encrypted connection.

This means that no one can see what you're doing. While it's still possible for someone to intercept the data you're transmitting and receiving, they can't make any sense of it: it's encrypted and will look like indecipherable garbage. Only the machines at the two ends of the connection can decrypt the data. An ssh connection is about all most networks allow these days, so telnet and even plain ftp are rarely acceptable.

The second command above uses scp, the ssh version of rcp (remote copy), to copy files to and from other machines. There is also sftp for ftp-style connections. Yet another way to copy files around is:

rsync -az -e ssh acme@acme.work.nks.net:save.tar.gz .

This uses the rsync command to copy the archive from the remote host to this one, and the -e ssh option makes it run over an encrypted ssh connection. The rsync command is one of the handier ways to shuffle stuff around and it has a bunch of options and capabilities. One of its neater features is keeping files on two or more machines identical: files will only be copied if they are different, and even then only the differences are actually transferred. Since you don't have to copy entire files every time, the transfer times are usually quite small once the initial copy is made.
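Finally, ssh can also carry a tar stream directly, which skips the intermediate file on the remote machine altogether. A sketch, re-using the host name from above (the directory is just an example; archive whatever you like):

ssh other-host.work.net "cd /home/acme && tar -zcf - ." > /root/save.tar.gz

The archive lands on the local machine in one shot; there's no temporary file to copy over or clean up afterward.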
