<html>
<head>
<title>ssync</title>
</head>
<body>
<h2>ssync</h2>
<hr size="1">
<table border="1">
<tr><td><b>Author:</b></td><td>Michael W. Shaffer</td></tr>
<tr><td><b>Current Version:</b></td><td>2.3</td></tr>
<tr><td><b>Status:</b></td><td>Stable</td></tr>
<tr><td><b>Release Date:</b></td><td>2002-11-06</td></tr>
<tr><td><b>Source Archive:</b></td><td><a href="ssync-2.3.tar.gz">ssync-2.3.tar.gz</a></td></tr>
<tr><td><b>DEB Package (i386 woody):</b></td><td><a href="ssync_2.3-1_i386.deb">ssync_2.3-1_i386.deb</a></td></tr>
<tr><td><b>RPM Package (i386 RH6.2):</b></td><td><a href="ssync-2.3-1.i386.rpm">ssync-2.3-1.i386.rpm</a></td></tr>
<tr><td><b>SRPM Package:</b></td><td><a href="ssync-2.3-1.src.rpm">ssync-2.3-1.src.rpm</a></td></tr>
<tr><td><b>Mac OS X Binaries:</b></td><td><a href="ssync-2.3-osx.tar.gz">ssync-2.3-osx.tar.gz</a></td></tr>
<tr><td><b>Change Log:</b></td><td><a href="CHANGES">CHANGES</a></td></tr>
<tr><td><b>License:</b></td><td><a href="COPYING">GPL</a></td></tr>
</table>
<p>
<h3>Contents</h3>
<ul>
<li><a href="#WHAT">What is it?</a>
<li><a href="#WHY">Why another synchronization tool?</a>
<li><a href="#FEATURES">Features</a>
<li><a href="#LIMITATIONS">Limitations</a>
<li><a href="#INSTALLATION">Building and Installing</a>
<li><a href="#CONFIGURATION">Configuration</a>
</ul>
<p>
<a name="WHAT"><h3>What is it?</h3></a>
Ssync is a minimalistic tool for keeping filesystems in synchronization. My main goals in writing ssync were correctness,
simplicity, speed, low-resource consumption, and portability. It features a number of options to control how things are
synchronized and under what conditions, as well as useful dry-run and verbose modes.
<p>
Ssync has been compiled and is known to work on the following systems:
<ul>
<li>Debian GNU/Linux 2.2 (potato)
<li>Debian GNU/Linux 3.0 (woody)
<li>RedHat Linux 6.2
<li>RedHat Linux 7.3
<li>Yellow Dog Linux (PPC)
<li>HP-UX 10.20
<li>HP-UX 11.00
<li>FreeBSD
<li>SCO
<li>Mac OS X Server
</ul>
I use ssync in production on a number of systems, including some with moderately large filesystems
of around 400 GiB and 2 million+ files. So far it has worked well in my environment under very
heavy and constant usage. If you are using FreeBSD, ssync is included in the Ports collection of
recent releases.
<p>
It should build and function correctly on most UNIX-like platforms with a working ANSI or C89
compliant compiler using the default makefile, with appropriate tweaks to things like the CC
variables, etc. If you have problems building ssync on your particular UNIX platform, or if
you come up with a makefile to build it successfully, I would appreciate your feedback.
<p>
There are specific makefiles for HP-UX (using <tt>c89</tt>) and OS X, as well as for building
Debian .deb and RedHat .rpm packages. To use an alternate makefile, run make with the <tt>-f</tt>
option, as such:
<blockquote>
<tt>make -f makefile.osx</tt>
</blockquote>
<p>
<a name="WHY"><h3>Why another synchronization tool?</h3></a>
The name <tt>ssync</tt> is a contraction of [s]imple filesystem [sync]hronizer. It was designed to be an extremely
simple and reliable solution to a significant operational need. On the network I manage, I recently put into production
a pair of loosely coupled highly available <a href="http://www.linux.org/">Linux</a> file servers which run
<a href="http://www.samba.org/">Samba</a>, <tt>NFS</tt>, and <a href="http://uts.cc.utexas.edu/~foxx/dhttpd/">dhttpd</a> to
service the file sharing needs of about 500 users with client machines running Windows and various UNIX platforms. I chose
not to use any of the currently available HA packages to manage these systems for various reasons:
<ul>
<li>Most of the packages I looked at require the machines to be in very close proximity in order to
use some sort of dedicated serial inter-connection for monitoring. I wanted these machines to be widely
separated physically to reduce the possibility of both being destroyed or made inaccessible at the same
time in the event of a catastrophe affecting an entire building at our site.
<li>All of the packages I looked at require some sort of shared disk array, which in my opinion makes
them just one step up in availability from a single machine anyway. I wanted to have two completely separate,
duplicate machines, including completely separate disk storage systems for this application.
<li>For our user community and activity load, sub-second failover times are not necessary. Anything under
a minute for complete failover is completely satisfactory in the current environment.
</ul>
The actual monitoring and failover features are handled by a separate daemon I created called <a href="../peerd/index.html">peerd</a>.
Since the implementation does <i>not</i> rely on a shared disk subsystem, some means of keeping the two separate filesystems of the
peer machines in relatively close synchronization was needed. Originally, the solution to this requirement
was a shell script which ran various <a href="http://rsync.samba.org/"><tt>rsync</tt></a> commands, first using a connection to an
<tt>rsync</tt> server process on the master machine and later relying on a couple of <tt>NFS</tt> filesystems exported on the master
and mounted on the slave specifically for the replication. As it turned out, this solution was less than satisfactory
since <tt>rsync</tt> would randomly but fairly frequently fail to complete the synchronization of one or more  
directory trees by either hanging indefinitely or barfing out numerous puzzling and spurious errors. The more I thought
about it, the more I was convinced that what was needed was something much less complex and hopefully more
reliable than <tt>rsync</tt> seemed to be in this application, and thus was born <tt>ssync</tt> / <tt>ssyncd</tt>. I don't
pretend that this program is useful for anything besides the rather narrow mission for which it was designed
(and it may not even be useful for that). I do think, however, that it at least provides an alternative sync tool
for certain situations, and I was unable to find any viable alternative to <tt>rsync</tt> in the open source world
when I wrote this.
<p>
<a name="FEATURES"><h3>Features</h3></a>
<ul>
<li>Works well on large filesystems (up to 2 million+ files and 400GiB in production so far).
<li>Works with the LFS interface on systems which have this support for 64-bit file operations.
<li>Handles filesystem objects with weird and non-ASCII characters in their names.
<li>Correctly preserves all symbolic and hard links.
<li>Correctly preserves all mode bits (including SUID, SGID, and sticky bits).
<li>Optional control of sync behavior with <tt>no-sync-data</tt>, <tt>no-sync-time</tt>, and <tt>no-sync-meta</tt>
options.
<li>Can update only 'newer' content from source to destination with the <tt>update-only</tt> option. This option
allows something like a 'union' sync of two peer machines by perhaps cross-mounting and running one <tt>ssyncd</tt> 
on each, both set to <tt>update-only: y</tt>.
<li>Can be run at increased or decreased 'niceness' with the <tt>priority</tt> option.
<li>Can be run either as an interactive tool (<tt>ssync</tt>) or as a daemon (<tt>ssyncd</tt>) with identical
functionality. 
<li>Can accept the specification of paths to synchronize either from a text file or on the command line.
<li>Can provide several levels of logging output, either via <tt>syslog</tt>, to its own log file, or 
to <tt>stderr</tt>.
<li>Uses <i>a lot</i> less memory in general than <tt>rsync</tt>. Typical sizes for the <tt>ssyncd</tt>
process on my production servers are 500KiB to 3000KiB through all phases of operation, as compared to ~184<i>MiB</i>
peak sizes for <tt>rsync</tt> running against the same filesystems. I don't think that the memory usage of
<tt>rsync</tt> is a serious issue for most environments, and the algorithm it uses is probably better for
supporting many of its features; I just chose a simpler method which seems to work well
so far. I didn't really set out to minimize memory usage (since my production servers here have 4GiB of RAM
each), but it just sort of happened due to the method used to traverse the filesystem tree.
</ul>
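<p>
As a rough illustration of the <tt>update-only</tt> rule and the 'union' sync idea mentioned above, here is a small
Python sketch. <tt>ssync</tt> itself is written in C; the function names <tt>should_sync</tt> and
<tt>union_sync_plan</tt> are hypothetical, and the behavior without <tt>update-only</tt> (sync on any mtime
difference) is an assumption made for the example, not a description of the actual code.
<pre>
```python
def should_sync(src_mtime, dst_mtime, update_only=False):
    """Decide whether the destination should be refreshed from the source.

    Assumption: without update-only, any mtime difference triggers a sync;
    with update-only, only a strictly newer source does.
    """
    if update_only:
        return src_mtime > dst_mtime
    return src_mtime != dst_mtime


def union_sync_plan(a_mtime, b_mtime):
    """Return which direction(s) would copy when both peers run update-only."""
    plan = []
    if should_sync(a_mtime, b_mtime, update_only=True):
        plan.append("a->b")
    if should_sync(b_mtime, a_mtime, update_only=True):
        plan.append("b->a")
    return plan
```
</pre>
With both peers set to <tt>update-only: y</tt>, at most one direction copies for any given file, which is what makes
the cross-mounted arrangement behave like a union.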
<p>
<a name="LIMITATIONS"><h3>Limitations</h3></a>
The basic function of <tt>ssync</tt> is simply to make the directories, files, and links on a destination
filesystem match those on a source filesystem. The default behavior is to read a list of paths to sync from
a specified file and recursively process each of them. You may also specify the paths to sync with the 
(<tt>-f</tt> | <tt>--src-path</tt>) and (<tt>-t</tt> | <tt>--dst-path</tt>) command line options if you
just want to quickly sync two paths without bothering to create configuration and work files.
<ul>
<li>It only handles directories, regular files, hard links, and symbolic links. All other file types are just ignored.
The number of unknown files is reported, and the specific paths ignored can be as well at more verbose logging levels.
<li>It only syncs from one path <i>on a mounted filesystem</i> to another. There is no client / server mode and
operation over an <tt>rsh</tt> / <tt>ssh</tt> connection is not implemented at all. The primary intent was to address synchronization
of well connected machines on the same LAN, so the job of network transport is assumed to be performed
by <tt>NFS</tt> or some equivalent means.
<li>There is no form of encryption implemented, nor is any attempt made to optimize the copying of files other than
matching of I/O buffer sizes to the <tt>st_blksize</tt> parameters reported by <tt>stat(2)</tt> for filesystem objects.
Due to this, and the previous limitation, this utility will probably not work well (or in most cases even at all)
for slow or non-local network connections. The program <i>does</i> optimize copying to the extent that it only copies files when
the <tt>st_mtime</tt> or <tt>st_size</tt> parameters are different, but once it decides to copy a
file it just does the whole thing by brute force without any further contemplation. Also, no attempt is made to
compute a checksum or signature of the file contents, so it would be possible to 'fool' it if the file contents
were changed without altering either the times or the byte size. This is the one additional feature that I might
like to add at the moment, but it would slow things down quite a bit, and the current behavior seems to cover
all but the pathological cases.
<li>Since <tt>ssync</tt> uses a really simple hash table to load up the list of work items you want to sync, it will
wind up proceeding through them in more or less random fashion. This is not really a limitation, but might annoy
some users, so I'll point it out up front. I just coded it this way because it was simple and I had working code
for a minimalistic hash table implementation. The program will spit out the names of work items either to stderr
or the log file as it begins them, so you will still know what it has finished, what it is working on, and (more or
less) what is remaining.
</ul>
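<p>
The copy decision described in the list above can be sketched roughly as follows. This is Python rather than the C
that <tt>ssync</tt> is written in, the names <tt>needs_copy</tt> and <tt>copy_whole_file</tt> are hypothetical, and
comparing modification times in whole seconds and falling back to a 4096 byte buffer are assumptions made for the example.
<pre>
```python
import os


def needs_copy(src_path, dst_path):
    """Return True when the destination is missing or differs in mtime or size."""
    src = os.stat(src_path)
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True
    # Whole-second comparison is an assumption; no checksum is computed.
    return int(src.st_mtime) != int(dst.st_mtime) or src.st_size != dst.st_size


def copy_whole_file(src_path, dst_path):
    """Copy the entire file in buffers sized from st_blksize, then match times."""
    src = os.stat(src_path)
    # st_blksize is not available on every platform; 4096 is an assumed fallback.
    bufsize = getattr(src, "st_blksize", 0) or 4096
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        while True:
            chunk = fin.read(bufsize)
            if not chunk:
                break
            fout.write(chunk)
    # Carry atime / mtime over so the next run sees the two files as in sync.
    os.utime(dst_path, (src.st_atime, src.st_mtime))
```
</pre>
Because only the time and size are compared, a source file rewritten with the same length and then given the old
timestamp would not be copied again, which is the 'fooling' case described above.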
<p>
<a name="INSTALLATION"><h3>Building and Installing</h3></a>
As of version 1.8 there are now binary packages available. If you have a Linux system which uses either the <tt>.rpm</tt> or <tt>.deb</tt>
package formats, then all you have to do is install the package and edit the config files. I have tested and deployed <tt>ssync</tt> on both
RedHat and Debian Linux. I am not aware of any Linux specific features which it uses, so I think it will work fine on most other UNIX-like
platforms as well. As of the 2.0 release, I have eliminated what few GCC-isms the code contained and added the GCC
<tt>-ansi</tt> and <tt>-pedantic</tt> flags to the makefile, so I think it will now build and work on most UNIX systems
with a reasonably ANSI or C89 compliant compiler. With the GCC <tt>-ansi</tt> flag on, and because I did use
<tt>snprintf()</tt>, <tt>lstat()</tt>, <tt>lchown()</tt>, and a couple of other not-strictly-POSIX things, it does require
<tt>-D_BSD_SOURCE</tt> to build on Linux. If your platform does not have any of these functions for some reason, just
let me know and I'll see if there are any workarounds.
<p>
There is no <tt>configure</tt> script since I just didn't feel like writing one and I don't really
think one is necessary at this point. There may be one in the future. You may need to change the <tt>makefile</tt> if you don't
have <tt>gcc</tt> available. Otherwise, a plain old <nobr><tt>make</tt></nobr> should do it. The build will produce two binaries,
<tt>ssync</tt> (the interactive version), and <tt>ssyncd</tt>, the daemon. Also included is a rather generic <tt>ssyncd.init</tt> startup
script which can be copied to <tt>/etc/init.d</tt> or wherever your distribution puts startup files. Examples of the config files
<tt>/etc/ssyncd.conf</tt> and <tt>/etc/ssyncd.work</tt> are provided, and they should be edited as appropriate to your situation. If you
are running the interactive <tt>ssync</tt> version, it will obey whatever command line options you give as well as any
configuration it might find in a file called <tt>.ssyncrc</tt> in <i>the current directory</i>. I have not yet gotten around to
implementing any behavior for <tt>ssync</tt> to look for a <tt>.ssyncrc</tt> file in the user's home directory.
<p>
<a name="CONFIGURATION"><h3>Configuration</h3></a>
All of the available configuration options are shown in the example <tt>ssyncd.conf</tt> configuration file and can be set
either in this file (for <tt>ssyncd</tt>), in <tt>.ssyncrc</tt> (for <tt>ssync</tt>), or on the command line (for both). A
summary of config options is below. The <tt>-c</tt> option, of course, only makes sense on the command line (duh). You will
see a complete list of all updates, deletions, and exceptions at the default <tt>log-level</tt> of 0. If you want to suppress
everything except errors, set log level 3 (warn). Log level 2 (info) is probably what most people want.
<p>
<table border="1">
<tr><td><b>Config file</b></td><td><b>Long Option</b></td><td><b>Short Option</b></td><td><b>Comment</b></td></tr>
<tr><td>-</td><td>--help</td><td>-h</td><td>display usage message and version</td></tr>
<tr><td>conf-path</td><td>--conf-path</td><td>-c</td><td>read an alternative config file instead of the default</td></tr>
<tr><td>interval</td><td>--interval</td><td>-i</td><td>number of seconds to sleep between completing one run and starting the next</td></tr>
<tr><td>work-file</td><td>--work-file</td><td>-w</td><td>path for file containing work paths (see also <tt>src-path</tt> and <tt>dst-path</tt>)</td></tr>
<tr><td>src-path</td><td>--src-path</td><td>-f</td><td>alternative way to specify a single source path</td></tr>
<tr><td>dst-path</td><td>--dst-path</td><td>-t</td><td>alternative way to specify a single destination path</td></tr>
<tr><td>priority</td><td>--priority</td><td>-n</td><td>scheduling priority (-20 - +20), see renice(8)</td></tr>
<tr><td>no-detach</td><td>--no-detach</td><td>-F</td><td>do not daemonize (use with <tt>log-mode: stderr</tt>)</td></tr>
<tr><td>no-sync-data</td><td>--no-sync-data</td><td>-D</td><td>do not sync data (content) of files</td></tr>
<tr><td>no-sync-time</td><td>--no-sync-time</td><td>-T</td><td>do not sync atime / mtime</td></tr>
<tr><td>no-sync-meta</td><td>--no-sync-meta</td><td>-M</td><td>do not sync meta-data (uid / gid / mode)</td></tr>
<tr><td>update-only</td><td>--update-only</td><td>-U</td><td>only sync things if source mtime is > destination mtime</td></tr>
<tr><td>test</td><td>--test</td><td>-X</td><td>run sync procedure and collect statistics without actually modifying anything</td></tr>
<tr><td>pid-path</td><td>--pid-path</td><td>-p</td><td>path for pid file</td></tr>
<tr><td>log-mode</td><td>--log-mode</td><td>-m</td><td>[file|syslog|stderr] logging mode</td></tr>
<tr><td>log-path</td><td>--log-path</td><td>-l</td><td>path for log file if using file based logging</td></tr>
<tr><td>log-ident</td><td>--log-ident</td><td>-s</td><td>identification string if using syslog based logging</td></tr>
<tr><td>log-level</td><td>--log-level</td><td>-v</td><td>logging verbosity (0 - 5), lower levels are more verbose (2 is normal, 3 is errors only, 0 lists all updates and deletions)</td></tr>
</table>
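<p>
Since lower <tt>log-level</tt> values are more verbose, the setting acts as a threshold. A minimal Python sketch of
that idea, using the level names from the example <tt>ssyncd.conf</tt>, might look like the following. The function
name <tt>should_log</tt> is hypothetical, and the exact comparison used inside <tt>ssync</tt> is an assumption.
<pre>
```python
# Level numbers and names as listed in the example ssyncd.conf.
LEVELS = {0: "ALL", 1: "TRACE", 2: "INFO", 3: "WARN", 4: "SEVERE", 5: "FATAL"}


def should_log(message_level, configured_level):
    """Assumed rule: emit a message when its level is at or above the threshold."""
    return message_level >= configured_level
```
</pre>
Under this rule a <tt>log-level</tt> of 3 would drop INFO and TRACE messages but keep WARN, SEVERE, and FATAL.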
<p>
Here's the example <tt>ssyncd.conf</tt> file:
<pre>

#
# ssyncd.conf
#

interval:		300			# time between sync runs in seconds
work-file:		/etc/ssyncd.work	# list of paths to synchronize (you can also specify
                                                # a single source and destination in the config file
						# or on the command line with src-path and dst-path)
#src-path:		/src/path		# alternative specification of one source path
#dst-path:		/dst/path		# alternative specification of one destination path

priority:		0			# scheduling priority (range -20 - +20)
                                                # be careful with this! and read renice(8)
                                                # if you don't know what it means

#no-detach:             yes                     # [y|n] do not detach from terminal
#no-sync-data:		yes			# [y|n] do not sync data (file contents)
#no-sync-time:		yes			# [y|n] do not sync atime / mtime
#no-sync-meta:		yes			# [y|n] do not sync meta-data (uid / gid / mode)
#update-only:		yes			# [y|n] update only if source mtime > dest mtime
#test:			yes			# [y|n] test only (modify nothing in dest.)

pid-path:		/var/run/ssyncd.pid	# path for pid file

log-mode:               file                    # [file|syslog|stderr] logging mode
log-path:		/var/log/ssyncd.log	# path for file based logging
log-ident:		ssyncd			# id for syslog based logging
log-level:		2	# 0 - ALL
				# 1 - TRACE
				# 2 - INFO
				# 3 - WARN
				# 4 - SEVERE
				# 5 - FATAL

</pre>
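<p>
To make the file format concrete, here is a rough Python sketch of how <tt>key: value</tt> lines like those above could
be read. It is not the parser <tt>ssync</tt> uses (that one is written in C); the name <tt>parse_conf</tt> is
hypothetical, and treating everything after <tt>#</tt> as a comment is an assumption based on the example file.
<pre>
```python
def parse_conf(text):
    """Parse 'key: value  # comment' lines into a dict of strings."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]       # assumed: '#' starts a comment
        line = "".join(line.split())      # discard all whitespace
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key:
            options[key] = value
    return options
```
</pre>
Commented-out options such as <tt>#test: yes</tt> simply disappear, and every value comes back as a string for the
caller to interpret.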
<p>
The work file just contains a list of work items, one per line, in the form:
<pre>
/source/path | /destination/path
</pre>
The paths can be either files or directories, and <i>source</i> directories will be processed recursively. There is no form of substitution or
environment variable parsing, and there is no facility for excluding things. If the destination is a different type than the source
(i.e. source is a file and destination is a directory), then the program will unlink the destination object (recursively) and re-create
it as the new type. This means that if you wanted to sync a file <i>into</i> a directory, you should give the full path name of the
destination <i>including the file name</i>. This 'feature' might also have some disastrously unexpected effects if you tried to specify
a symlink to a directory or file as the source path and a real directory or file as the destination. The config file parsing routines
are really simple-minded and will just discard all whitespace in either config file (meaning paths with whitespace will not be parsed
correctly). If it causes a lot of issues, I may refine this behavior in the future. Here's the example <tt>ssyncd.work</tt> file:
<pre>
#
# ssyncd.work:   Example work file for ssync / ssyncd
#
# Each line must be of the form:
#
#   source path | destination path
#

# Individual files
/mnt/peer/etc/aliases          | /etc/aliases
/mnt/peer/etc/group            | /etc/group
/mnt/peer/etc/group-           | /etc/group-
/mnt/peer/etc/gshadow-         | /etc/gshadow-
/mnt/peer/etc/gshadow          | /etc/gshadow
/mnt/peer/etc/passwd           | /etc/passwd
/mnt/peer/etc/passwd-          | /etc/passwd-
/mnt/peer/etc/shadow-          | /etc/shadow-
/mnt/peer/etc/shadow           | /etc/shadow

# Directory trees
/mnt/peer/etc/cron.d           | /etc/cron.d
/mnt/peer/etc/cron.daily       | /etc/cron.daily
/mnt/peer/etc/cron.monthly     | /etc/cron.monthly
/mnt/peer/etc/cron.weekly      | /etc/cron.weekly
/mnt/peer/etc/init.d           | /etc/init.d
/mnt/peer/etc/logrotate.d      | /etc/logrotate.d
/mnt/peer/etc/rc0.d            | /etc/rc0.d
/mnt/peer/etc/rc1.d            | /etc/rc1.d
/mnt/peer/etc/rc2.d            | /etc/rc2.d
/mnt/peer/etc/rc3.d            | /etc/rc3.d
/mnt/peer/etc/rc4.d            | /etc/rc4.d
/mnt/peer/etc/rc5.d            | /etc/rc5.d
/mnt/peer/etc/rc6.d            | /etc/rc6.d
/mnt/peer/etc/rcS.d            | /etc/rcS.d
</pre>
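<p>
The work file format can be illustrated with a short Python sketch. The name <tt>parse_work_file</tt> is hypothetical
and this is not the actual C parser; it only mirrors the behavior described above, where all whitespace is discarded
before the line is split on <tt>|</tt>. Treating <tt>#</tt> as a comment marker and rejecting malformed lines with an
error are assumptions. The sketch also returns items in file order, whereas <tt>ssync</tt> processes them in hash
table order.
<pre>
```python
def parse_work_file(text):
    """Return (source, destination) pairs from 'src | dst' lines."""
    items = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]       # assumed: '#' starts a comment
        line = "".join(line.split())      # all whitespace is discarded
        if not line:
            continue
        src, sep, dst = line.partition("|")
        if not (sep and src and dst):
            raise ValueError("malformed work item: %r" % raw)
        items.append((src, dst))
    return items
```
</pre>
Note how a path such as <tt>/my docs</tt> would come out as <tt>/mydocs</tt>, which is why paths containing whitespace
cannot be used.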
<hr size="1">
<a href="mailto:mwshaffer@yahoo.com">mwshaffer@yahoo.com</a>
</body>
</html>