Chapter 10. Data management

Table of Contents

10.1. Sharing, copying, and archiving
10.1.1. Archive and compression tools
10.1.2. Copy and synchronization tools
10.1.3. Idioms for the archive
10.1.4. Idioms for the copy
10.1.5. Idioms for the selection of files
10.1.6. Archive media
10.1.7. Removable storage device
10.1.8. Filesystem choice for sharing data
10.1.9. Sharing data via network
10.2. Backup and recovery
10.2.1. Backup utility suites
10.2.2. An example script for the system backup
10.2.3. A copy script for the data backup
10.3. Data security infrastructure
10.3.1. Key management for GnuPG
10.3.2. Using GnuPG on files
10.3.3. Using GnuPG with Mutt
10.3.4. Using GnuPG with Vim
10.3.5. The MD5 sum
10.4. Source code merge tools
10.4.1. Extracting differences for source files
10.4.2. Merging updates for source files
10.4.3. Updating via 3-way-merge
10.5. Version control systems
10.5.1. Comparison of VCS commands
10.6. Git
10.6.1. Configuration of Git client
10.6.2. Git references
10.6.3. Git commands
10.6.4. Git for the Subversion repository
10.6.5. Git for recording configuration history
10.7. CVS
10.7.1. Configuration of CVS repository
10.7.2. Local access to CVS
10.7.3. Remote access to CVS with pserver
10.7.4. Remote access to CVS with ssh
10.7.5. Importing a new source to CVS
10.7.6. File permissions in CVS repository
10.7.7. Work flow of CVS
10.7.8. Latest files from CVS
10.7.9. Administration of CVS
10.7.10. Execution bit for CVS checkout
10.8. Subversion
10.8.1. Configuration of Subversion repository
10.8.2. Access to Subversion via Apache2 server
10.8.3. Local access to Subversion by group
10.8.4. Remote access to Subversion via SSH
10.8.5. Subversion directory structure
10.8.6. Importing a new source to Subversion
10.8.7. Work flow of Subversion

This chapter describes tools and tips for managing binary and text data on the Debian system.

[Warning] Warning

Uncoordinated write access from multiple processes to actively accessed devices and files must be avoided, since it causes race conditions. File locking mechanisms using flock(1) may be used to avoid them.
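Below is a minimal sketch of flock(1) in use. It assumes the util-linux flock(1) command is installed. The lock file and data file names are hypothetical scratch files. Four background jobs each append a two-line record to the same file. Because each job holds an exclusive lock while it writes, no record is split by another job.

```shell
#!/bin/sh
# Minimal sketch: serialize writes from several processes with flock(1).
# File names are hypothetical; a temporary directory is used for the demo.
set -e
T=$(mktemp -d)
LOCK="$T/shared.lock"   # lock file (flock creates it if it is missing)
DATA="$T/shared.log"    # file written by several processes
: > "$DATA"
for i in 1 2 3 4; do
  # Each job holds an exclusive lock on "$LOCK" while its command runs,
  # so the "start"/"end" pair of one job is never split by another job.
  flock "$LOCK" sh -c "echo job-$i start >> '$DATA'; sleep 0.1; echo job-$i end >> '$DATA'" &
done
wait
cat "$DATA"
```

Without flock, the "start" and "end" lines of different jobs could interleave. Jobs may still finish in any order.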

The security of the data and its controlled sharing have several aspects.

  • The creation of data archives

  • The remote storage access

  • The duplication

  • The tracking of the modification history

  • The facilitation of data sharing

  • The prevention of unauthorized file access

  • The detection of unauthorized file modification

These can be realized by using some combination of tools.

  • Archive and compression tools

  • Copy and synchronization tools

  • Network filesystems

  • Removable storage media

  • The secure shell

  • The authentication system

  • Version control system tools

  • Hash and cryptographic encryption tools

Here is a summary of archive and compression tools available on the Debian system.

Table 10.1. List of archive and compression tools

package popcon size extension command comment
tar V:904, I:999 2749 .tar tar(1) the standard archiver (de facto standard)
cpio V:373, I:998 712 .cpio cpio(1) Unix System V style archiver, use with find(1)
binutils V:210, I:722 22155 .ar ar(1) archiver for the creation of static libraries
fastjar V:6, I:60 171 .jar fastjar(1) archiver for Java (zip like)
pax V:17, I:58 170 .pax pax(1) new POSIX standard archiver, compromise between tar and cpio
gzip V:873, I:999 225 .gz gzip(1), zcat(1), … GNU LZ77 compression utility (de facto standard)
bzip2 V:257, I:929 84 .bz2 bzip2(1), bzcat(1), … Burrows-Wheeler block-sorting compression utility with higher compression ratio than gzip(1) (slower than gzip with similar syntax)
lzma V:5, I:69 144 .lzma lzma(1) LZMA compression utility with higher compression ratio than gzip(1) (deprecated)
xz-utils V:321, I:952 511 .xz xz(1), xzdec(1), … XZ compression utility with higher compression ratio than bzip2(1) (slower than gzip but faster than bzip2; replacement for LZMA compression utility)
p7zip V:38, I:150 862 .7z 7zr(1), p7zip(1) 7-Zip file archiver with high compression ratio (LZMA compression)
p7zip-full V:188, I:529 4215 .7z 7z(1), 7za(1) 7-Zip file archiver with high compression ratio (LZMA compression and others)
lzop V:5, I:42 92 .lzo lzop(1) LZO compression utility with higher compression and decompression speed than gzip(1) (lower compression ratio than gzip with similar syntax)
zip V:49, I:389 572 .zip zip(1) InfoZIP: DOS archive and compression tool
unzip V:300, I:790 506 .zip unzip(1) InfoZIP: DOS unarchive and decompression tool

[Warning] Warning

Do not set the "$TAPE" variable unless you know what to expect. It changes tar(1) behavior.

[Note] Note

The gzipped tar(1) archive uses the file extension ".tgz" or ".tar.gz".

[Note] Note

The xz-compressed tar(1) archive uses the file extension ".txz" or ".tar.xz".

[Note] Note

The popular compression method in FOSS tools such as tar(1) has been moving as follows: gzip → bzip2 → xz

[Note] Note

cp(1), scp(1), and tar(1) may have limitations with special files. cpio(1) is the most versatile.

[Note] Note

cpio(1) is designed to be used with find(1) and other commands. It is suitable for creating backup scripts, since the file selection part of the script can be tested independently.

[Note] Note

The internal structure of LibreOffice data files is that of a ".jar" file.

Here is a summary of simple copy and backup tools available on the Debian system.


Copying files with rsync(8) offers richer features than the other tools.

  • delta-transfer algorithm that sends only the differences between the source files and the existing files in the destination

  • quick check algorithm (by default) that looks for files that have changed in size or in last-modified time

  • "--exclude" and "--exclude-from" options similar to tar(1)

  • "a trailing slash on the source directory" syntax that avoids creating an additional directory level at the destination.

[Tip] Tip

Running the bkup script mentioned in Section 10.2.3, “A copy script for the data backup” with the "-gl" option under cron(8) should provide functionality very similar to Plan9's dumpfs for the static data archive.

[Tip] Tip

The version control system (VCS) tools in Table 10.11, “List of version control system tools” can function as multi-way copy and synchronization tools.

Here are several ways to copy the entire content of the directory "./source" using different tools.

  • Local copy: "./source" directory → "/dest" directory

  • Remote copy: "./source" directory at local host → "/dest" directory at "user@host.dom" host

rsync(8):

# cd ./source; rsync -aHAXSv . /dest
# cd ./source; rsync -aHAXSv . user@host.dom:/dest

You can alternatively use "a trailing slash on the source directory" syntax.

# rsync -aHAXSv ./source/ /dest
# rsync -aHAXSv ./source/ user@host.dom:/dest

Alternatively, use the following.

# cd ./source; find . -print0 | rsync -aHAXSv0 --files-from=- . /dest
# cd ./source; find . -print0 | rsync -aHAXSv0 --files-from=- . user@host.dom:/dest

GNU cp(1) and OpenSSH scp(1):

# cd ./source; cp -a . /dest
# cd ./source; scp -pr . user@host.dom:/dest

GNU tar(1):

# (cd ./source && tar cf - . ) | (cd /dest && tar xvfp - )
# (cd ./source && tar cf - . ) | ssh user@host.dom '(cd /dest && tar xvfp - )'

cpio(1):

# cd ./source; find . -print0 | cpio -pvdm --null --sparse /dest

You can substitute "." with "foo" for all examples containing "." to copy files from "./source/foo" directory to "/dest/foo" directory.

You can substitute "." with the absolute path "/path/to/source/foo" for all examples containing "." to drop "cd ./source;". These copy files to different locations depending on tools used as follows.

  • "/dest/foo": rsync(8), GNU cp(1), and scp(1)

  • "/dest/path/to/source/foo": GNU tar(1) and cpio(1)
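The small sketch below shows the two destinations above when an absolute source path is given. It assumes GNU cp(1) and GNU tar(1) and uses hypothetical scratch paths under a temporary directory.

```shell
#!/bin/sh
# Sketch: where an absolute source path ends up with GNU cp(1) vs GNU tar(1).
# All paths are hypothetical scratch locations under a temporary directory.
set -e
T=$(mktemp -d)
mkdir -p "$T/path/to/source/foo" "$T/dest-cp" "$T/dest-tar"
echo hello > "$T/path/to/source/foo/file.txt"
# GNU cp(1) copies only the last path component: dest-cp/foo/file.txt
cp -a "$T/path/to/source/foo" "$T/dest-cp"
# GNU tar(1) stores the whole path (minus the leading "/"), so the path is
# recreated under the destination: dest-tar/<full path>/foo/file.txt
tar cf - "$T/path/to/source/foo" 2>/dev/null | (cd "$T/dest-tar" && tar xpf -)
find "$T/dest-cp" "$T/dest-tar" -type f
```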

[Tip] Tip

rsync(8) and GNU cp(1) have the option "-u" to skip files that are newer on the receiver.

find(1) is used to select files for archive and copy commands (see Section 10.1.3, “Idioms for the archive” and Section 10.1.4, “Idioms for the copy”) or for xargs(1) (see Section 9.3.9, “Repeating a command looping over files”). This can be enhanced by using its command arguments.

Basic syntax of find(1) can be summarized as the following.

  • Its conditional arguments are evaluated from left to right.

  • This evaluation stops once its outcome is determined.

  • "Logical OR" (specified by "-o" between conditionals) has lower precedence than "logical AND" (specified by "-a" or nothing between conditionals).

  • "Logical NOT" (specified by "!" before a conditional) has higher precedence than "logical AND".

  • "-prune" always returns logical TRUE and, if the file is a directory, stops the search from descending into it.

  • "-name" matches the base of the filename with shell glob (see 第 1.5.6 节 “Shell glob”) but it also matches its initial "." with metacharacters such as "*" and "?". (New POSIX feature)

  • "-regex" matches the full path with emacs style BRE (see Section 1.6.2, “Regular expressions”) by default.

  • "-size" matches the file based on the file size (a value preceded by "+" means larger, a value preceded by "-" means smaller).

  • "-newer" matches the file newer than the one specified in its argument.

  • "-print0" always returns logical TRUE and prints the full filename (null terminated) on the standard output.

find(1) is often used with an idiomatic style as the following.

# find /path/to \
    -xdev -regextype posix-extended \
    -type f -regex ".*\.cpio|.*~" -prune -o \
    -type d -regex ".*/\.git" -prune -o \
    -type f -size +99M -prune -o \
    -type f -newer /path/to/timestamp -print0

This means to do following actions.

  1. Search all files starting from "/path/to"

  2. Globally limit the search to its starting filesystem and use ERE (see Section 1.6.2, “Regular expressions”) instead of the default

  3. Exclude files matching the regex ".*\.cpio" or ".*~" by pruning them

  4. Exclude directories matching the regex ".*/\.git" by pruning them, so their contents are not searched

  5. Exclude files larger than 99 Megabytes (units of 1048576 bytes) by pruning them

  6. Print filenames which satisfy the above search conditions and are newer than "/path/to/timestamp"

Please note the idiomatic use of "-prune -o" to exclude files in the above example.
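The sketch below applies the "-prune -o" idiom to a small hypothetical scratch tree, assuming GNU find(1). It prunes a ".git" directory and an editor backup file "*~", then prints the remaining regular files.

```shell
#!/bin/sh
# Sketch of the "-prune -o" idiom on a hypothetical scratch tree.
set -e
T=$(mktemp -d)
mkdir -p "$T/tree/.git" "$T/tree/src"
touch "$T/tree/.git/config" "$T/tree/src/a.c" "$T/tree/src/a.c~" "$T/tree/src/b.c"
# Each "CONDITION -prune -o" pair drops matching entries; the final explicit
# "-print" applies only to what survives. Without it, find would also print
# the pruned entries, because the implicit -print covers the whole expression.
RESULT=$(find "$T/tree" -xdev -regextype posix-extended \
  -type d -regex ".*/\.git" -prune -o \
  -type f -regex ".*~" -prune -o \
  -type f -print | sort)
printf '%s\n' "$RESULT"
```

Only "src/a.c" and "src/b.c" are printed. "src/a.c~" and everything under ".git" are skipped.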

[Note] Note

On non-Debian Unix-like systems, some options may not be supported by find(1). In such a case, consider adjusting the matching methods and replacing "-print0" with "-print". You may need to adjust related commands too.

When choosing computer data storage media for important data archives, you should be careful about their limitations. For small personal data backups, I use CD-R and DVD-R from brand-name companies and store them in a cool, shaded, dry, clean environment. (Tape archive media seem to be popular for professional use.)

[Note] Note

A fire-resistant safe is meant for paper documents. Most computer data storage media have less temperature tolerance than paper. I usually rely on multiple secure encrypted copies stored in multiple secure locations.

Optimistic storage life of archive media as seen on the net (mostly from vendor info):

  • 100+ years : Acid free paper with ink

  • 100 years : Optical storage (CD/DVD, CD/DVD-R)

  • 30 years : Magnetic storage (tape, floppy)

  • 20 years : Phase change optical storage (CD-RW)

These figures do not account for mechanical failures due to handling, etc.

Optimistic write cycles of archive media as seen on the net (mostly from vendor info):

  • 250,000+ cycles : Harddisk drive

  • 10,000+ cycles : Flash memory

  • 1,000 cycles : CD/DVD-RW

  • 1 cycle : CD/DVD-R, paper

[Caution] Caution

The figures for storage life and write cycles given here should not be used for decisions on any critical data storage. Please consult the specific product information provided by the manufacturer.

[Tip] Tip

Since CD/DVD-R and paper have only 1 write cycle, they inherently prevent accidental data loss by overwriting. This is an advantage!

[Tip] Tip

If you need fast and frequent backups of a large amount of data, a hard disk on a remote host linked by a fast network connection may be the only realistic option.

Removable storage devices may be any one of the following.

They may be connected via any one of the following.

Modern desktop environments such as GNOME and KDE can mount these removable devices automatically without a matching "/etc/fstab" entry.

  • udisks package provides a daemon and associated utilities to mount and unmount these devices.

  • D-bus creates events to initiate automatic processes.

  • PolicyKit provides required privileges.

[Tip] Tip

Automounted devices may have the "uhelper=" mount option which is used by umount(8).

[Tip] Tip

Automounting under modern desktop environment happens only when those removable media devices are not listed in "/etc/fstab".

Mount point under modern desktop environment is chosen as "/media/<disk_label>" which can be customized by the following.

  • mlabel(1) for FAT filesystem

  • genisoimage(1) with "-V" option for ISO9660 filesystem

  • tune2fs(1) with "-L" option for ext2/ext3/ext4 filesystem

[Tip] Tip

The choice of encoding may need to be provided as a mount option (see Section 8.3.6, “Filename encoding”).

[Tip] Tip

The use of the GUI menu to unmount a filesystem may remove its dynamically generated device node such as "/dev/sdc". If you wish to keep its device node, unmount it with the umount(8) command from the shell prompt.

When sharing data with another system via a removable storage device, you should format it with a common filesystem supported by both systems. Here is a list of filesystem choices.


[Tip] Tip

See Section 9.8.1, “Removable disk encryption with dm-crypt/LUKS” for cross-platform sharing of data using device-level encryption.

The FAT filesystem is supported by almost all modern operating systems and is quite useful for the data exchange purpose via removable hard disk like media.

When formatting removable hard disk like devices for cross platform sharing of data with the FAT filesystem, the following should be safe choices.

When using the FAT or ISO9660 filesystems for sharing data, the following are safe practices.

  • Archiving files into an archive file first using tar(1), or cpio(1) to retain the long filename, the symbolic link, the original Unix file permission and the owner information.

  • Splitting the archive file into less than 2 GiB chunks with the split(1) command to protect it from the file size limitation.

  • Encrypting the archive file to secure its contents from the unauthorized access.
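The steps above can be sketched as follows, assuming GNU tar(1) and split(1) and using hypothetical file names. The encryption step appears only as a comment, because it needs an interactive passphrase. A deliberately tiny chunk size is used so the demo produces several pieces.

```shell
#!/bin/sh
# Sketch: archive, split, and reassemble data for a FAT-formatted medium.
# Paths and sizes are hypothetical; GNU tar(1) and split(1) are assumed.
set -e
T=$(mktemp -d)
mkdir -p "$T/data/sub"
printf 'some data\n' > "$T/data/sub/file.txt"
ln -s sub/file.txt "$T/data/link"   # FAT itself cannot store symlinks
cd "$T"
# 1. Archive: tar keeps the symlink, permissions, and ownership information.
tar -czf data.tgz data
# 2. Optionally encrypt the archive first, e.g. "gpg --symmetric data.tgz"
#    (not run here), and split the resulting file instead.
# 3. Split into chunks. For real use pick something like "-b 1900M" to stay
#    under 2 GiB; a tiny size is used here only to produce several pieces.
split -b 64 data.tgz data.tgz.part-
ls data.tgz.part-*
# Restore: concatenate the pieces in name order and extract.
mkdir restore
cat data.tgz.part-* | tar -xzf - -C restore
```

The default split suffixes ("aa", "ab", …) sort in the same order as they were written. A plain shell glob therefore reassembles the pieces correctly.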

[Note] Note

By design, the maximum file size on a FAT filesystem is (2^32 - 1) bytes = (4GiB - 1 byte). For some applications on older 32-bit OSes, the maximum file size was even smaller: (2^31 - 1) bytes = (2GiB - 1 byte). Debian does not suffer from the latter problem.

[Note] Note

Microsoft itself does not recommend using FAT for drives or partitions larger than 200 MB. Microsoft highlights its shortcomings, such as inefficient disk space usage, in their "Overview of FAT, HPFS, and NTFS File Systems". Of course, we should normally use the ext4 filesystem for Linux.

[Tip] Tip

For more on filesystems and accessing filesystems, please read "Filesystems HOWTO".

We all know that computers fail sometimes, and that human errors cause system and data damage. Backup and recovery operations are an essential part of successful system administration. Every possible failure mode will hit you some day.

[Tip] Tip

Keep your backup system simple and back up your system often. Having backup data is more important than how technically good your backup method is.

There are 3 key factors which determine actual backup and recovery policy.

  1. Knowing what to backup and recover.

    • Data files directly created by you: data in "~/"

    • Data files created by applications used by you: data in "/var/" (except "/var/cache/", "/var/run/", and "/var/tmp/")

    • System configuration files: data in "/etc/"

    • Local software: data in "/usr/local/" or "/opt/"

    • System installation information: a memo in plain text on key steps (partition, …)

    • Proven set of data: confirmed by experimental recovery operations in advance

  2. Knowing how to backup and recover.

    • Secure storage of data: protection from overwrite and system failure

    • Frequent backup: scheduled backup

    • Redundant backup: data mirroring

    • Foolproof process: easy single command backup

  3. Assessing risks and costs involved.

    • Value of data when lost

    • Required resources for backup: human, hardware, software, …

    • Failure modes and their likelihood

[Note] Note

Do not back up the pseudo-filesystem contents found on /proc, /sys, /tmp, and /run (see Section 1.2.12, “procfs and sysfs” and Section 1.2.13, “tmpfs”). Unless you know exactly what you are doing, their contents are huge and useless.

For secure storage, data should at least be on different disk partitions, and preferably on different disks and machines, to withstand filesystem corruption. Important data are best stored on write-once media such as CD/DVD-R to prevent overwrite accidents. (See Section 9.7, “Binary data” for how to write to the storage media from the shell command line. The GNOME desktop GUI environment gives you easy access via the menu: "Places→CD/DVD Creator".)

[Note] Note

You may wish to stop some application daemons such as the MTA (see Section 6.3, “Mail transport agent (MTA)”) while backing up data.

[Note] Note

You should pay extra care to the backup and restoration of identity-related data files such as "/etc/ssh/ssh_host_dsa_key", "/etc/ssh/ssh_host_rsa_key", "~/.gnupg/*", "~/.ssh/*", "/etc/passwd", "/etc/shadow", "/etc/fetchmailrc", "popularity-contest.conf", "/etc/ppp/pap-secrets", and "/etc/exim4/passwd.client". Some of this data cannot be regenerated by entering the same input string to the system.

[Note] Note

If you run a cron job as a user process, you must restore the files in the "/var/spool/cron/crontabs" directory and restart cron(8). See Section 9.3.14, “Scheduling tasks regularly” for cron(8) and crontab(1).

Here is a select list of notable backup utility suites available on the Debian system.


Backup tools have their specialized focuses.

  • Mondo Rescue is a backup system to facilitate restoration of complete system quickly from backup CD/DVD etc. without going through normal system installation processes.

  • sbackup and keep packages provide an easy GUI frontend for desktop users to make regular backups of user data. An equivalent function can be realized by a simple script (Section 10.2.2, “An example script for the system backup”) and cron(8).

  • Bacula, Amanda, and BackupPC are full featured backup suite utilities which are focused on regular backups over network.

The basic tools described in Section 10.1.1, “Archive and compression tools” and Section 10.1.2, “Copy and synchronization tools” can be used to facilitate system backup via custom scripts. Such scripts can be enhanced by the following.

  • The obnam package enables incremental (remote) backups.

  • The rdiff-backup package enables incremental (remote) backups.

  • The dump package helps to archive and restore the whole filesystem incrementally and efficiently.

[Tip] Tip

See files in "/usr/share/doc/dump/" and "Is dump really deprecated?" to learn about the dump package.

For a personal Debian desktop system running the unstable suite, I only need to protect personal and critical data. I reinstall the system once a year anyway. Thus I see no reason to back up the whole system or to install a full-featured backup utility.

I use a simple script to make a backup archive and burn it onto CD/DVD using a GUI. Here is an example script for this.

#!/bin/sh -e
# Copyright (C) 2007-2008 Osamu Aoki <osamu@debian.org>, Public Domain
BUUID=1000; USER=osamu # UID and name of a user who accesses backup files
BUDIR="/var/backups"
XDIR0=".+/Mail|.+/Desktop"
XDIR1=".+/\.thumbnails|.+/\.?Trash|.+/\.?[cC]ache|.+/\.gvfs|.+/sessions"
XDIR2=".+/CVS|.+/\.git|.+/\.svn|.+/Downloads|.+/Archive|.+/Checkout|.+/tmp"
XSFX=".+\.iso|.+\.tgz|.+\.tar\.gz|.+\.tar\.bz2|.+\.cpio|.+\.tmp|.+\.swp|.+~"
SIZE="+99M"
DATE=$(date --utc +"%Y%m%d-%H%M")
[ -d "$BUDIR" ] || mkdir -p "$BUDIR"
umask 077
dpkg --get-selections \* > /var/lib/dpkg/dpkg-selections.list
debconf-get-selections > /var/cache/debconf/debconf-selections

{
find /etc /usr/local /opt /var/lib/dpkg/dpkg-selections.list \
     /var/cache/debconf/debconf-selections -xdev -print0
find /home/$USER /root -xdev -regextype posix-extended \
  -type d -regex "$XDIR0|$XDIR1" -prune -o -type f -regex "$XSFX" -prune -o \
  -type f -size  "$SIZE" -prune -o -print0
find /home/$USER/Mail/Inbox /home/$USER/Mail/Outbox -print0
find /home/$USER/Desktop  -xdev -regextype posix-extended \
  -type d -regex "$XDIR2" -prune -o -type f -regex "$XSFX" -prune -o \
  -type f -size  "$SIZE" -prune -o -print0
} | cpio -ov --null -O $BUDIR/BU$DATE.cpio
chown $BUUID $BUDIR/BU$DATE.cpio
touch $BUDIR/backup.stamp

This is meant to be an example script executed as root.

I expect you to change and execute this as follows.

Keep it simple!

[Tip] Tip

You can recover debconf configuration data with "debconf-set-selections debconf-selections" and dpkg selection data with "dpkg --set-selections <dpkg-selections.list".

For the set of data under a directory tree, the copy with "cp -a" provides the normal backup.

For the set of large non-overwritten static data under a directory tree such as the one under the "/var/cache/apt/packages/" directory, hardlinks with "cp -al" provide an alternative to the normal backup with efficient use of the disk space.
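The sketch below contrasts the two forms with hypothetical names, assuming GNU cp(1). "cp -a" creates new inodes. "cp -al" creates only new directory entries that point at the same inodes as the originals.

```shell
#!/bin/sh
# Sketch: "cp -a" (real copy) vs "cp -al" (hardlinked tree), GNU cp(1) assumed.
# The directory and file names are hypothetical.
set -e
T=$(mktemp -d)
mkdir -p "$T/static"
printf 'large static file\n' > "$T/static/pkg.deb"
cp -a  "$T/static" "$T/static.copy"   # new inodes: uses extra disk space
cp -al "$T/static" "$T/static.link"   # same inodes: only new directory entries
# The hardlinked file shares its inode number with the original.
ls -i "$T/static/pkg.deb" "$T/static.copy/pkg.deb" "$T/static.link/pkg.deb"
```

A hardlinked file shares its data with the original, so editing either one in place changes both. This is why the hardlink approach suits only data that is never overwritten.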

Here is a copy script, which I named bkup, for the data backup. This script copies all (non-VCS) files under the current directory to a dated directory in the parent directory or on a remote host.

#!/bin/sh -e
# Copyright (C) 2007-2008 Osamu Aoki <osamu@debian.org>, Public Domain
fdot(){ find . -type d \( -iname ".?*" -o -iname "CVS" \) -prune -o -print0;}
fall(){ find . -print0;}
mkdircd(){ mkdir -p "$1";chmod 700 "$1";cd "$1">/dev/null;}
FIND="fdot";OPT="-a";MODE="CPIOP";HOST="localhost";EXTP="$(hostname -f)"
BKUP="$(basename $(pwd)).bkup";TIME="$(date  +%Y%m%d-%H%M%S)";BU="$BKUP/$TIME"
while getopts gcCsStrlLaAxe:h:T f; do case $f in
g)  MODE="GNUCP";; # cp (GNU)
c)  MODE="CPIOP";; # cpio -p
C)  MODE="CPIOI";; # cpio -i
s)  MODE="CPIOSSH";; # cpio/ssh
t)  MODE="TARSSH";; # tar/ssh
r)  MODE="RSYNCSSH";; # rsync/ssh
l)  OPT="-alv";; # hardlink (GNU cp)
L)  OPT="-av";;  # copy (GNU cp)
a)  FIND="fall";; # find all
A)  FIND="fdot";; # find non CVS/ .???/
x)  set -x;; # trace
e)  EXTP="${OPTARG}";; # hostname -f
h)  HOST="${OPTARG}";; # user@remotehost.example.com
T)  MODE="TEST";; # test find mode
\?) echo "use -x for trace."
esac; done
shift $(expr $OPTIND - 1)
if [ $# -gt 0 ]; then
  for x in "$@"; do cp $OPT "$x" "$x.$TIME"; done
elif [ $MODE = GNUCP ]; then
  mkdir -p "../$BU";chmod 700 "../$BU";cp $OPT . "../$BU/"
elif [ $MODE = CPIOP ]; then
  mkdir -p "../$BU";chmod 700 "../$BU"
  $FIND|cpio --null --sparse -pvd ../$BU
elif [ $MODE = CPIOI ]; then
  $FIND|cpio -ov --null | ( mkdircd "../$BU"&&cpio -i )
elif [ $MODE = CPIOSSH ]; then
  $FIND|cpio -ov --null|ssh -C $HOST "( mkdircd \"$EXTP/$BU\"&&cpio -i )"
elif [ $MODE = TARSSH ]; then
  (tar cvf - . )|ssh -C $HOST "( mkdircd \"$EXTP/$BU\"&& tar xvfp - )"
elif [ $MODE = RSYNCSSH ]; then
  rsync -aHAXSv ./ "${HOST}:${EXTP}-${BKUP}-${TIME}"
else
  echo "Any other idea to backup?"
  $FIND |xargs -0 -n 1 echo
fi

This is meant as a command example. Please read the script and edit it yourself before using it.

[Tip] Tip

I keep this bkup in my "/usr/local/bin/" directory. I issue this bkup command without any option in the working directory whenever I need a temporary snapshot backup.

[Tip] Tip

For making a snapshot history of a source file tree or a configuration file tree, it is easier and more space efficient to use git(7) (see Section 10.6.5, “Git for recording configuration history”).

The data security infrastructure is provided by the combination of data encryption tool, message digest tool, and signature tool.


See Section 9.8, “Data encryption tips” on dm-crypt and ecryptfs, which implement an automatic data encryption infrastructure via Linux kernel modules.

Here are GNU Privacy Guard commands for the basic key management.


Here is the meaning of the trust code.


The following uploads my key "1DD8D791" to the popular keyserver "hkp://keys.gnupg.net".

$ gpg --keyserver hkp://keys.gnupg.net --send-keys 1DD8D791

A good default keyserver set up in "~/.gnupg/gpg.conf" (or old location "~/.gnupg/options") contains the following.

keyserver hkp://keys.gnupg.net

The following obtains unknown keys from the keyserver.

$ gpg --list-sigs --with-colons | grep '^sig.*\[User ID not found\]' |\
  cut -d ':' -f 5| sort | uniq | xargs gpg --recv-keys

There was a bug in the OpenPGP Public Key Server (pre version 0.9.6) which corrupted keys with more than 2 subkeys. The newer gnupg (>1.2.1-2) package can handle these corrupted subkeys. See gpg(1) under the "--repair-pks-subkey-bug" option.

md5sum(1) provides a utility to make a digest file using the method in RFC 1321 and to verify each file with it.

$ md5sum foo bar >baz.md5
$ cat baz.md5
d3b07384d113edec49eaa6238ad5ff00  foo
c157a79031e1c40f85931829bc5fc552  bar
$ md5sum -c baz.md5
foo: OK
bar: OK

[Note] Note

The computation for the MD5 sum is less CPU intensive than the one for the cryptographic signature by GNU Privacy Guard (GnuPG). Usually, only the top level digest file is cryptographically signed to ensure data integrity.
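The sketch below builds a single top-level digest file for a hypothetical directory tree, assuming GNU coreutils and findutils. Only this one file would then need a GnuPG signature. The signing command appears only as a comment, because it requires a GnuPG key.

```shell
#!/bin/sh
# Sketch: one MD5 digest file for a whole (hypothetical) directory tree.
set -e
T=$(mktemp -d)
mkdir -p "$T/tree/sub"
echo foo > "$T/tree/foo"
echo bar > "$T/tree/sub/bar"
cd "$T/tree"
# Record one digest line for every regular file, using relative paths.
find . -type f -print0 | sort -z | xargs -0 md5sum > ../tree.md5
# Only this one file would then be signed, e.g. "gpg --clearsign ../tree.md5"
# (not run here; it requires a GnuPG key).
md5sum -c ../tree.md5
```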

There are many merge tools for source code. The following commands caught my eye.

Table 10.10. List of source code merge tools

package popcon size command description
diffutils V:847, I:973 1319 diff(1) compare files line by line
diffutils V:847, I:973 1319 diff3(1) compare and merge three files line by line
vim V:127, I:390 2366 vimdiff(1) compare 2 files side by side in vim
patch V:119, I:938 191 patch(1) apply a diff file to an original
dpatch V:1, I:18 191 dpatch(1) manage series of patches for Debian package
diffstat V:20, I:190 65 diffstat(1) produce a histogram of changes by the diff
patchutils V:19, I:182 189 combinediff(1) create a cumulative patch from two incremental patches
patchutils V:19, I:182 189 dehtmldiff(1) extract a diff from an HTML page
patchutils V:19, I:182 189 filterdiff(1) extract or exclude diffs from a diff file
patchutils V:19, I:182 189 fixcvsdiff(1) fix diff files created by CVS that patch(1) misinterprets
patchutils V:19, I:182 189 flipdiff(1) exchange the order of two patches
patchutils V:19, I:182 189 grepdiff(1) show which files are modified by a patch matching a regex
patchutils V:19, I:182 189 interdiff(1) show differences between two unified diff files
patchutils V:19, I:182 189 lsdiff(1) show which files are modified by a patch
patchutils V:19, I:182 189 recountdiff(1) recompute counts and offsets in unified context diffs
patchutils V:19, I:182 189 rediff(1) fix offsets and counts of a hand-edited diff
patchutils V:19, I:182 189 splitdiff(1) separate out incremental patches
patchutils V:19, I:182 189 unwrapdiff(1) demangle patches that have been word-wrapped
wiggle V:0, I:0 166 wiggle(1) apply rejected patches
quilt V:4, I:46 710 quilt(1) manage series of patches
meld V:11, I:42 3022 meld(1) compare and merge files (GTK)
dirdiff V:0, I:3 144 dirdiff(1) display differences and merge changes between directory trees
docdiff V:0, I:0 573 docdiff(1) compare two files word by word / char by char
imediff2 V:0, I:0 34 imediff2(1) interactive full screen 2-way merge tool
makepatch V:0, I:0 102 makepatch(1) generate extended patch files
makepatch V:0, I:0 102 applypatch(1) apply extended patch files
wdiff V:6, I:90 643 wdiff(1) display word differences between text files
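The most basic pair in this table is diff(1) and patch(1). The sketch below shows a minimal round trip with hypothetical file names. It makes a unified diff between two versions, then applies it to a fresh copy of the original.

```shell
#!/bin/sh
# Sketch: make a unified diff with diff(1) and apply it with patch(1).
# File names are hypothetical scratch files.
set -e
T=$(mktemp -d)
cd "$T"
printf 'line 1\nline 2\nline 3\n' > file.orig
printf 'line 1\nline 2 changed\nline 3\n' > file.new
# diff(1) exits with status 1 when the files differ, so "|| true" keeps
# "set -e" from aborting the script.
diff -u file.orig file.new > file.diff || true
# Apply the unified diff to a fresh copy of the original file.
cp file.orig file.work
patch file.work < file.diff
cat file.work
```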

Here is a summary of the version control systems (VCS) on the Debian system.

[Note] Note

If you are new to VCS systems, you should start learning with Git, which is growing fast in popularity.


VCS is sometimes known as revision control system (RCS), or software configuration management (SCM).

Distributed VCS such as Git is the tool of choice these days. CVS and Subversion may still be useful to join some existing open source program activities.

Debian provides free VCS services via Debian Alioth service. It supports practically all VCSs. Its documentation can be found at http://wiki.debian.org/Alioth .

There are a few basics for creating a shared access VCS archive.

Here is an oversimplified comparison of native VCS commands to provide the big picture. The typical command sequence may require options and arguments.


[Caution] Caution

Invoking a git subcommand directly as "git-xyz" from the command line has been deprecated since early 2006.

[Tip] Tip

If there is an executable file git-foo in the path specified by $PATH, entering "git foo" without a hyphen on the command line invokes this git-foo. This is a feature of the git command.
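The minimal sketch below shows this dispatch, assuming git is installed. The "git-hello" command is hypothetical. It exists only in a temporary directory prepended to $PATH.

```shell
#!/bin/sh
# Sketch: git runs "git-<name>" found in $PATH when you type "git <name>".
# "git-hello" is a hypothetical command created only for this demo.
set -e
T=$(mktemp -d)
cat > "$T/git-hello" <<'EOF'
#!/bin/sh
echo "hello from git-hello, args: $*"
EOF
chmod +x "$T/git-hello"
PATH="$T:$PATH"
export PATH
OUT=$(git hello a b)   # no hyphen: git looks up "git-hello" in $PATH
echo "$OUT"
```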

[Tip] Tip

GUI tools such as tkcvs(1) and gitk(1) really help you with tracking the revision history of files. The web interface provided by many public archives for browsing their repositories is also quite useful.

[Tip] Tip

Git can work directly with different VCS repositories, such as ones provided by CVS and Subversion, and provides a local repository for local changes with the git-cvs and git-svn packages. See git for CVS users, and Section 10.6.4, “Git for the Subversion repository”.

[Tip] Tip

Git has commands which have no equivalents in CVS and Subversion: "fetch", "rebase", "cherry-pick", …

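Git can do everything for both local and remote source code management. This means that you can record source code changes locally without needing a network connection to the remote repository.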

See the following.

The git-gui(1) and gitk(1) commands make using Git very easy.

[Warning] Warning

Do not use tag strings containing spaces. Even if some tools such as gitk(1) allow you to use them, such tags may break other git commands.

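Even if your upstream uses a different VCS, it is a good idea to use git(1) as the VCS for local activity. Git lets you manage your local copy of the source tree without a network connection to the upstream. Here are some packages and commands used with git(1).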


[Tip] Tip

With git(1), you work on a local branch with many commits and use something like "git rebase -i master" to reorganize change history later. This enables you to make clean change history. See git-rebase(1) and git-cherry-pick(1).

[Tip] Tip

When you want to go back to a clean working directory without losing the current state of the working directory, you can use "git stash". See git-stash(1).
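The sketch below runs "git stash" in a scratch repository, assuming git 2.13 or later for "git stash push". The repository, file, and identity are hypothetical. The identity is set through environment variables so that no git configuration is touched.

```shell
#!/bin/sh
# Sketch: set aside uncommitted work with "git stash" in a scratch repository.
# The repository, file, and identity below are hypothetical.
set -e
T=$(mktemp -d)
cd "$T"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"
git init -q
echo v1 > file.txt
git add file.txt
git commit -q -m "add file.txt"
echo "work in progress" >> file.txt
git stash push -q -m "wip"        # save the change; the tree is clean again
CLEAN=$(git status --porcelain)   # empty output means a clean working tree
git stash pop -q                  # bring the saved change back
cat file.txt
```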

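You can manually record the chronological history of configuration using Git tools. Here is a simple example for practicing recording the contents of "/etc/apt/".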

$ cd /etc/apt/
$ sudo git init
$ sudo chmod 700 .git
$ sudo git add .
$ sudo git commit -a

Commit the configuration with a description of the commit.

Make modifications to the configuration files.

$ cd /etc/apt/
$ sudo git commit -a

Commit the configuration with a description of the commit, and continue with your work.

$ cd /etc/apt/
$ sudo gitk --all

You have the complete configuration history.
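To practice without root privileges, the same steps can be run on a hypothetical scratch directory instead of "/etc/apt/". The sketch below sets the identity through environment variables and uses "git log" as a text substitute for "gitk --all".

```shell
#!/bin/sh
# Sketch: the same history-recording steps on a scratch config directory,
# so sudo(8) is not needed. Paths and identity are hypothetical.
set -e
T=$(mktemp -d)
mkdir -p "$T/conf"
echo 'setting = 1' > "$T/conf/example.conf"
cd "$T/conf"
export GIT_AUTHOR_NAME="Example Admin" GIT_AUTHOR_EMAIL="admin@example.org"
export GIT_COMMITTER_NAME="Example Admin" GIT_COMMITTER_EMAIL="admin@example.org"
git init -q
chmod 700 .git                       # keep the history private
git add .
git commit -q -m "initial configuration"
echo 'setting = 2' > example.conf    # a later configuration change
git commit -q -a -m "raise setting to 2"
git log --oneline                    # text view; "gitk --all" gives a GUI view
```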

[Note] Note

sudo(8) is needed to work with configuration data files that have arbitrary file permissions. For user configuration data, you may omit sudo.

[Note] Note

The "chmod 700 .git" command in the above example protects the archive data from unauthorized read access.

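[Tip] Tip

For a more complete setup for recording configuration history, please look at the etckeeper package: Section 9.2.10, “Recording changes in configuration files”.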

See the following.

  • cvs(1)

  • "/usr/share/doc/cvs/html-cvsclient"

  • "/usr/share/doc/cvs/html-info"

  • "/usr/share/doc/cvsbook"

  • "info cvs"

Many public CVS servers provide read-only remote access to them with account name "anonymous" via pserver service. For example, Debian web site contents are maintained by webwml project via CVS at Debian alioth service. The following sets up "$CVSROOT" for the remote access to this CVS repository.

$ export CVSROOT=:pserver:anonymous@anonscm.debian.org:/cvs/webwml
$ cvs login

[Note] Note

Since pserver is prone to eavesdropping attacks and is insecure, write access is usually disabled by server administrators.

The following sets up "$CVS_RSH" and "$CVSROOT" for the remote access to the CVS repository by webwml project with SSH.

$ export CVS_RSH=ssh
$ export CVSROOT=:ext:account@cvs.alioth.debian.org:/cvs/webwml

You can also use public key authentication for SSH which eliminates the remote password prompt.

Here is an example of typical work flow using CVS.

Check all available modules from CVS project pointed by "$CVSROOT" by the following.

$ cvs rls
CVSROOT
module1
module2
...

Checkout "module1" to its default directory "./module1" by the following.

$ cd ~/path/to
$ cvs co module1
$ cd module1

Make changes to the content as needed.

Check changes by making "diff -u [repository] [local]" equivalent by the following.

$ cvs diff -u

You find that you broke some file "file_to_undo" severely but other files are fine.

Overwrite "file_to_undo" file with the clean copy from CVS by the following.

$ cvs up -C file_to_undo

Save the updated local source tree to CVS by the following.

$ cvs ci -m "Describe change"

Create and add "file_to_add" file to CVS by the following.

$ vi file_to_add
$ cvs add file_to_add
$ cvs ci -m "Added file_to_add"

Merge the latest version from CVS by the following.

$ cvs up -d

Watch out for lines starting with "C filename", which indicate conflicting changes.

Look for unmodified code in ".#filename.version".

Search for "<<<<<<<" and ">>>>>>>" in files for conflicting changes.
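Before committing, it helps to confirm that no conflict markers were left behind. A minimal sketch as a shell function; the name find_conflicts is hypothetical, and it assumes GNU grep (for --exclude-dir) and that the markers start at the beginning of a line, which is where cvs puts them.

```shell
#!/bin/sh
# Sketch: list files that still contain conflict markers after "cvs up".
# find_conflicts is a hypothetical helper, not a cvs subcommand.
# Usage: find_conflicts [directory]   (defaults to the current directory)
find_conflicts() {
    # -r: recurse into subdirectories, -l: print only file names,
    # -E: extended regex; markers are matched only at the start of a line.
    # CVS and .svn metadata directories are skipped (GNU grep --exclude-dir).
    grep -rlE --exclude-dir=CVS --exclude-dir=.svn \
        '^(<<<<<<<|>>>>>>>)' "${1:-.}"
}
```

Run "find_conflicts" in the top directory of the working copy; it prints the names of affected files and returns exit status 1 when none are found. Because it also skips ".svn", the same function works in a Subversion working copy.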

Edit files to fix conflicts as needed.

Add a release tag "Release-1" by the following.

$ cvs ci -m "last commit for Release-1"
$ cvs tag Release-1

Edit further.

Remove the release tag "Release-1" by the following.

$ cvs tag -d Release-1

Check in changes to CVS by the following.

$ cvs ci -m "real last commit for Release-1"

Re-add the release tag "Release-1" to updated CVS HEAD of main by the following.

$ cvs tag Release-1

Create a branch with a sticky branch tag "Release-initial-bugfixes" from the original version pointed by the tag "Release-initial" and check it out to "~/path/to/old" directory by the following.

$ cvs rtag -b -r Release-initial Release-initial-bugfixes module1
$ cd ~/path/to
$ cvs co -r Release-initial-bugfixes -d old module1
$ cd old
[Tip] Tip

Use "-D 2005-12-20" (ISO 8601 date format) instead of "-r Release-initial" to specify a particular date as the branch point.

Work on this local source tree having the sticky tag "Release-initial-bugfixes" which is based on the original version.

Work on this branch by yourself … until someone else joins this "Release-initial-bugfixes" branch.

Sync with files modified by others on this branch while creating new directories as needed by the following.

$ cvs up -d

Edit files to fix conflicts as needed.

Check in changes to CVS by the following.

$ cvs ci -m "checked into this branch"

Update the local tree to HEAD of main, removing the sticky tag ("-A") and disabling keyword expansion ("-kk"), by the following.

$ cvs up -d -kk -A

Update the local tree (content = HEAD of main) by merging from the "Release-initial-bugfixes" branch and without keyword expansion by the following.

$ cvs up -d -kk -j Release-initial-bugfixes

Fix conflicts with an editor.

Check in changes to CVS by the following.

$ cvs ci -m "merged Release-initial-bugfixes"

Make archive by the following.

$ cd ..
$ mv old old-module1-bugfixes
$ tar -cvzf old-module1-bugfixes.tar.gz old-module1-bugfixes
$ rm -rf old-module1-bugfixes
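The last command deletes the working copy unconditionally. A slightly safer variant, sketched as a hypothetical shell function, reads the finished archive back with "tar -t" before removing anything; if archiving or reading fails, the directory is kept.

```shell
#!/bin/sh
# Sketch: archive a working copy and delete it only if the archive reads back.
# archive_then_remove is a hypothetical helper name.
# Usage: archive_then_remove DIRECTORY   (creates DIRECTORY.tar.gz)
archive_then_remove() {
    dir=${1%/}                                  # drop a trailing slash
    tar -czf "$dir.tar.gz" "$dir" || return 1   # same archive as tar -cvzf
    # Listing reads the whole compressed stream, so a truncated or corrupt
    # archive normally makes this step fail and the directory is kept.
    tar -tzf "$dir.tar.gz" >/dev/null || return 1
    rm -rf "$dir"
}
```

With it, the "tar" and "rm" commands above become "archive_then_remove old-module1-bugfixes", run after the "mv". The archive step of the Subversion work flow is identical, so the same function applies there.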
[Tip] Tip

The "cvs up" command can take the "-d" option to create new directories and the "-P" option to prune empty directories.

[Tip] Tip

You can check out only a subdirectory of "module1" by providing its path, as in "cvs co module1/subdir".


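Subversion is a recent-generation version control system replacing the older CVS. It has most of CVS's features, except that it has no dedicated tags and branches: both are plain directory copies made with "svn cp", as in the work flow below.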

You need to install subversion, libapache2-svn and subversion-tools packages to set up a Subversion server.

Here is an example of typical work flow using Subversion with its native client.

[Tip] Tip

Client commands offered by the git-svn package may provide an alternative Subversion work flow using the git command. See Section 10.6.4, “Git for the Subversion repository”.
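The work flow below assumes that a repository already exists at /srv/svn/project and contains "module1" with "trunk", "tags", and "branches" directories. A minimal sketch of creating that layout, using a scratch directory in place of /srv/svn/project (a repository shared on a server also needs the packages and access setup described in this section):

```shell
#!/bin/sh
# Sketch: create a Subversion repository with the module1/{trunk,tags,branches}
# layout assumed by the work flow. A scratch directory stands in for
# /srv/svn/project.
set -eu
repo=$(mktemp -d)/project
svnadmin create "$repo"               # empty repository at revision 0
url="file://$repo"
# One commit creating all three directories; --parents also creates module1,
# and -m supplies the log message so no editor is started.
svn mkdir -q --parents -m "Create module1 layout" \
    "$url/module1/trunk" "$url/module1/tags" "$url/module1/branches"
svn list "$url"                       # prints: module1/
```

Substituting the value of "$url" for "file:///srv/svn/project" in the commands that follow lets you run the whole work flow against this scratch repository.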

List all modules available in the Subversion project at the URL "file:///srv/svn/project" by the following.

$ svn list file:///srv/svn/project
module1
module2
...

Check out "module1/trunk" to a directory "module1" by the following.

$ cd ~/path/to
$ svn co file:///srv/svn/project/module1/trunk module1
$ cd module1

Make changes to the content as needed.

Check your changes with the equivalent of "diff -u [repository] [local]" by the following.

$ svn diff

You find that you broke some file "file_to_undo" severely but other files are fine.

Overwrite "file_to_undo" file with the clean copy from Subversion by the following.

$ svn revert file_to_undo

Save the updated local source tree to Subversion by the following.

$ svn ci -m "Describe change"

Create and add "file_to_add" file to Subversion by the following.

$ vi file_to_add
$ svn add file_to_add
$ svn ci -m "Added file_to_add"

Merge the latest version from Subversion by the following.

$ svn up

Watch out for lines starting with "C filename", which indicate conflicting changes.

Look for unmodified code in, e.g., "filename.r6", "filename.r9", and "filename.mine".

Search for "<<<<<<<" and ">>>>>>>" in files for conflicting changes.

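Edit files to fix conflicts as needed, then mark each fixed file as resolved with "svn resolve --accept working filename"; Subversion refuses to commit files that are still in the conflicted state.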

Add a release tag "Release-1" by the following.

$ svn ci -m "last commit for Release-1"
$ svn cp file:///srv/svn/project/module1/trunk file:///srv/svn/project/module1/tags/Release-1

Edit further.

Remove the release tag "Release-1" by the following.

$ svn rm file:///srv/svn/project/module1/tags/Release-1

Check in changes to Subversion by the following.

$ svn ci -m "real last commit for Release-1"

Re-add the release tag "Release-1" from updated Subversion HEAD of trunk by the following.

$ svn cp file:///srv/svn/project/module1/trunk file:///srv/svn/project/module1/tags/Release-1

Create a branch with a path "module1/branches/Release-initial-bugfixes" from the original version pointed by the path "module1/tags/Release-initial" and check it out to "~/path/to/old" directory by the following.

$ svn cp file:///srv/svn/project/module1/tags/Release-initial file:///srv/svn/project/module1/branches/Release-initial-bugfixes
$ cd ~/path/to
$ svn co file:///srv/svn/project/module1/branches/Release-initial-bugfixes old
$ cd old
[Tip] Tip

Use "module1/trunk@{2005-12-20}" (ISO 8601 date format) instead of "module1/tags/Release-initial" to specify a particular date as the branch point.

Work on this local source tree pointing to branch "Release-initial-bugfixes" which is based on the original version.

Work on this branch by yourself … until someone else joins this "Release-initial-bugfixes" branch.

Sync with files modified by others on this branch by the following.

$ svn up

Edit files to fix conflicts as needed.

Check in changes to Subversion by the following.

$ svn ci -m "checked into this branch"

Update the local tree with HEAD of trunk by the following.

$ svn switch file:///srv/svn/project/module1/trunk

Update the local tree (content = HEAD of trunk) by merging from the "Release-initial-bugfixes" branch by the following.

$ svn merge file:///srv/svn/project/module1/branches/Release-initial-bugfixes

Fix conflicts with an editor.

Check in changes to Subversion by the following.

$ svn ci -m "merged Release-initial-bugfixes"

Make archive by the following.

$ cd ..
$ mv old old-module1-bugfixes
$ tar -cvzf old-module1-bugfixes.tar.gz old-module1-bugfixes
$ rm -rf old-module1-bugfixes
[Tip] Tip

You can replace URLs such as "file:///…" with any other URL format, such as "http://…" or "svn+ssh://…".

[Tip] Tip

You can check out only a subdirectory of "module1" by providing its path, as in "svn co file:///srv/svn/project/module1/trunk/subdir module1/subdir".