Match Task
Match searches directories for matching files. The tool can be configured with a variety of criteria to determine whether two files match, including file name, file size, last modified time, and file contents. For each criterion, files may be required to have the same value or to have different values. The tool is also available as a command-line application.
The tool reports two numbers: the number of matching files, and the number of matching file groups. The number of matching files is the count of individual files which have at least one other file that matches. The number of matching groups is the number of groups of files that match. For example, if the criteria are "same filename", and the directory trees being searched contain three files named README and two files named LICENSE with no other matching files, then the number of matching files would be five, and the number of matching groups would be two. These numbers are placed into properties for use later in the build file.
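As a minimal sketch of how the two counts might be captured and used, the build file below stores them in properties and echoes them. The property names dup.files and dup.groups, the target name, and the src directory are illustrative assumptions; the taskdef line is the one shown in this document's examples.

```xml
<project name="match-demo" default="count-matches" basedir=".">
  <!-- Task definition as given in this document's examples. -->
  <taskdef resource="matchtask.properties" classpath="match.jar"/>

  <target name="count-matches">
    <!-- dup.files and dup.groups are illustrative property names. -->
    <match names="same" fileproperty="dup.files" groupproperty="dup.groups">
      <fileset dir="src" includes="**/*"/>
    </match>
    <echo message="${dup.files} matching files in ${dup.groups} groups"/>
  </target>
</project>
```

For the README and LICENSE layout described above, the message would be expected to report five matching files in two groups.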
The match task can also produce a report of the matching files. It
does this using a formatter. Three formatters are supplied: a plain text
formatter, a summary formatter, and an XML formatter. The text and XML formatters
produce reports containing the full filenames of any matching files, organized
into groups. All three formatters include summary information (how many files
and how many groups were detected). Alternatively, a custom formatter
(implementing com.bennettconsulting.match.ResultFormatter)
can be used. If multiple formatters are supplied, each one will produce
its own report.
Central to this tool is the idea of "matching files." This is determined by comparing various aspects of the files: file name, file size, time of last modification, and file contents. According to the criteria selected, a file may match another file on an aspect if the two values are the same, or if the two values are different. Two files match if and only if all criteria are satisfied.
The criteria are specified in the task's attributes. Each aspect can be given one of three values: same, diff, or ignore. By default, each aspect is set to ignore; when set to ignore, that aspect plays no part in determining a match. When set to same, two files can match only if they have the same value for that aspect. When set to diff, two files can match only if they have different values for that aspect. If two files match on some aspects but do not match on others, then those two files do not match.
Consider these four files:
| Path | Size | Last Modified |
| dist/output.jar | 22438 bytes | 11:43:03, 12 July 2002 |
| build/output.jar | 22438 bytes | 11:42:36, 12 July 2002 |
| src/App.java | 4720 bytes | 22:00:00, 10 July 2002 |
| src/Engine.java | 6494 bytes | 22:00:00, 10 July 2002 |
If the criteria were names=same,
then the first two would match, since they both have the same name
(output.jar); the rest of the path is not relevant. With criteria
of names=same, sizes=same, the first two would still
match, since they also have the same size. However, with criteria of
names=same, sizes=diff, there would be no matches,
since no files that have the same name also have different sizes. With
criteria of sizes=diff, times=same, the last two
files would match, since they have the same timestamp but different sizes.
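The same/diff/ignore settings combine freely. As a hedged sketch, the fragment below looks for files with identical contents regardless of name or timestamp. It assumes the task has already been defined with taskdef; the property name dup.groups, the output file duplicates.txt, and the src directory are illustrative.

```xml
<!-- names and times are left at their default (ignore), so only size
     and contents decide whether two files match. -->
<match sizes="same" contents="same" groupproperty="dup.groups">
  <!-- duplicates.txt is an illustrative output file name. -->
  <formatter type="plain" file="duplicates.txt"/>
  <fileset dir="src" includes="**/*"/>
</match>
```

Setting sizes="same" alongside contents="same" is logically redundant, since files with identical contents always have identical sizes; it is written out here only to make the intent explicit.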
| Attribute | Description | Required |
| names | Controls how files' names are used to determine whether two files match (either same, diff, or ignore). The file name is the portion of the complete path following the final separator character; e.g., on a Unix machine, the path /usr/local/bin/tokay.jar has the file name tokay.jar. | At least one of names, sizes, times, and contents must be specified as either same or diff; default is ignore. |
| sizes | Controls how files' sizes (in bytes) are used to determine whether two files match (either same, diff, or ignore). | See names. |
| times | Controls how files' last modified times are used to determine whether two files match (either same, diff, or ignore). The tool compares the times with millisecond precision, although the operating system may store the times at lower precision. | See names. |
| contents | Controls how files' contents, byte for byte, are used to determine whether two files match (either same, diff, or ignore). | See names. |
| groupproperty | The name of a property to set with the number of groups of matching files. | No |
| fileproperty | The name of a property to set with the number of matching files. | No |
The match task supports any number of nested
<fileset>
elements to specify the files to be checked for matches.
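A sketch of a match over more than one fileset follows. It assumes the task has already been defined with taskdef, and that files from all nested filesets are pooled before comparison, which this document implies but does not state outright. The property name stale.files is illustrative.

```xml
<!-- Look for jars that share a name but differ in last modified time,
     for example a dist copy that is out of step with the build copy. -->
<match names="same" times="diff" fileproperty="stale.files">
  <fileset dir="build" includes="**/*.jar"/>
  <fileset dir="dist" includes="**/*.jar"/>
</match>
```

With the four sample files listed earlier, build/output.jar and dist/output.jar share a name and have different timestamps, so they would be expected to match under these criteria.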
The results of the comparisons can be printed in different formats. Output
is sent to the file named by the file attribute of
the <formatter>, or to standard output if that attribute is omitted. One match task can support any number
of formatters. If there are no formatters, then no report is produced.
There are three predefined formatters: one prints the match results in
XML format, the other two emit plain text. The formatter named
summary prints a summary of the results as ASCII text. The
formatter named plain prints the complete results as ASCII text.
A custom formatter, implementing com.bennettconsulting.match.ResultFormatter, can also be specified.
| Attribute | Description | Required |
| type | Use a predefined formatter (one of xml, summary, or plain). | Exactly one of type or classname. |
| classname | Name of a custom formatter class. | See type. |
| file | Name of file to write output to. | No; defaults to standard output. |
| header | A string, used as a header in the file. | No. |
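The fragment below sketches how a custom formatter might be combined with a predefined one. The class com.example.build.CsvResultFormatter is hypothetical; it is assumed to implement com.bennettconsulting.match.ResultFormatter and to be loadable by the task. How the task locates custom formatter classes is not described here, so check this against your installation.

```xml
<match names="same" sizes="diff">
  <!-- Hypothetical custom formatter class, named via classname
       instead of type. mismatches.csv is an illustrative file name. -->
  <formatter classname="com.example.build.CsvResultFormatter" file="mismatches.csv"/>
  <!-- No file attribute, so the summary goes to standard output. -->
  <formatter type="summary"/>
  <fileset dir="dist" includes="**/*"/>
</match>
```

Each formatter element produces its own report, so this would be expected to yield one file from the custom class and one summary on the console.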
<taskdef resource="matchtask.properties" classpath="match.jar"/>
Establishes the match task.
<match names="same" sizes="diff">
  <formatter type="summary" header="Possible version mismatches in"/>
  <formatter type="xml" file="versionmismatch.xml" header="Possible version mismatches:"/>
  <fileset dir="dist" includes="**/*"/>
</match>
Checks the directory named dist and all subdirectories of it
for files with the same file name but with different file sizes, and writes
a summary to the console and an XML-formatted report of those files to
a file named versionmismatch.xml.
<match fileproperty="matches" names="same">
  <fileset dir="${basedir}" includes="*"/>
</match>
<condition property="succeeded">
  <equals arg1="${matches}" arg2="0"/>
</condition>
<fail unless="succeeded" message="${matches} duplicates found!"/>
Checks the base directory for files with the same file name, and places
the count of those files into the property matches. The build
then fails if there are any matching files, with a message telling the user
how many files matched. No output is produced if the build succeeds.
This tool is distributed under the Apache 2.0 License. The license is also available on the web.
Copyright © 2004, 2005 Leif Bennett. All rights reserved.