.\" Automatically generated by Pandoc 2.0.6
.\"
.TH "SUBDIFF" "1" "June 2018" "User Manual" ""
.hy
.SH NAME
.PP
subdiff \- substring diff program
.SH SYNOPSIS
.PP
subdiff [\f[I]options\f[]] \f[I]old\-file\f[] \f[I]new\-file\f[]
.SH DESCRIPTION
.PP
\f[C]subdiff\f[] is entirely analogous to \f[C]diff\f[], except it can
be asked to only compare parts of the line selected by a regular
expression, but still output the original input lines after comparison.
Its output format faithfully follows that of \f[C]diff\f[], so that its
output can be passed to tools like \f[C]diffstat\f[] and
\f[C]colordiff\f[].
.SH OPTIONS
.TP
.B \-r \f[I]RE\f[], \-\-regex=RE
Specify a regular expression.
For lines which are matched by RE, only the parts of the line which are
matched by top\-level capture groups take part in the comparison.
A line that is not matched by any regular expression is compared as a
whole.
.RS
.PP
This option can be given multiple times.
It is the responsibility of the user to ensure that no input line is
matched by more than one regular expression \[en] a runtime error is
generated otherwise.
.PP
Multiple regular expressions are searched for in parallel (i.e.\ they are
compiled into the same automaton); if a single regular expression
matches, it is then re\-run by itself in order to build up the capture
groups.
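.PP
As a rough illustration of the selection rules above, the following
Python sketch computes the parts of a line that would take part in the
comparison.
It is not the actual implementation: the name \f[C]selected_part\f[] and
the sample log format are hypothetical, and the patterns are assumed to
contain no nested groups, so that every group is top\-level.
.IP
.nf
\f[C]
```python
import re


def selected_part(line, patterns):
    """Return the parts of `line` (bytes) that take part in the comparison."""
    matching = [p for p in patterns if p.search(line)]
    if len(matching) > 1:
        # Mirrors the documented runtime error for ambiguous input lines.
        raise ValueError("line matched by more than one regular expression")
    if not matching:
        # A line matched by no regular expression is compared as a whole.
        return (line,)
    # Assumption: no nested groups, so groups() yields only top-level ones.
    return matching[0].search(line).groups()


# Hypothetical log format: compare only the message, not the timestamp.
patterns = [re.compile(b"^[0-9:]+ (.*)$")]
old_key = selected_part(b"12:00:01 job started", patterns)
new_key = selected_part(b"12:07:45 job started", patterns)
```
\f[]
.fi
.PP
In this sketch the two lines yield the same key, so they would be
treated as unchanged even though their timestamps differ.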
.RE
.TP
.B \-i \f[I]RE\f[], \-\-ignore=RE
Specify character sequences that should be \f[I]ignored\f[].
The provided RE is only considered as a whole (i.e.\ individual
subgroups are not looked at).
This is run after any regular expressions specified by \f[C]\-r\f[] and
can match multiple times per line; any of its matches are removed from
the selected substring.
.RS
.PP
This option makes it easier to ignore variable parts of the input that
can also appear a variable number of times per line.
For example, it can exclude from comparison anything that looks like a
hexadecimal address.
It can also be used to ignore changes in the amount of whitespace.
.PP
As this RE is matched as a whole, this option can only be given a single
time.
To ignore more than one regular expression, the user should specify them
as alternatives, e.g.\ \f[C]"RE1|RE2"\f[].
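.PP
The effect of an ignore expression can be pictured with the following
Python sketch.
It only illustrates the documented behaviour under stated assumptions:
the pattern \f[C]IGNORE\f[], the helper \f[C]strip_ignored\f[] and the
sample input are hypothetical and are not taken from the program itself.
.IP
.nf
\f[C]
```python
import re

# Hypothetical ignore pattern: hexadecimal addresses or runs of spaces,
# written as alternatives because only one ignore expression is accepted.
IGNORE = re.compile(b"0x[0-9a-fA-F]+|[ ]+")


def strip_ignored(selected):
    """Remove every match of IGNORE from the selected bytes."""
    return IGNORE.sub(b"", selected)


old = strip_ignored(b"free(0x7f3a10)  called")
new = strip_ignored(b"free(0x55d2c8) called")
```
\f[]
.fi
.PP
Both inputs reduce to the same bytes here, so neither the address nor
the amount of spacing would count as a difference.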
.RE
.TP
.B \-c N, \-\-context=N
Number of context lines to be displayed.
.RS
.RE
.TP
.B \-\-context\-format=CTXFMT
Display format for any displayed context lines.
.RS
.PP
Context lines do not, by definition, have differences in the parts of
the line that were selected by a regular expression.
That said, they can still have differences in the parts that were not
selected (or that were ignored).
This option specifies the display style to use for such differences.
.IP \[bu] 2
\f[I]wdiff\f[] Word\-diff style.
This is the default.
It presents changes in the context lines using wdiff\-style
\f[C]{\-removed}{+added}\f[] markers.
.IP \[bu] 2
\f[I]ccwide\f[] Summarize by character class.
Whenever all the characters in consecutive changes can be said to belong
to one of the predefined character classes, do not print out the changes
themselves; instead, output the character class followed by the count of
the characters in the old and new version.
.RS 2
Character classes are printed out as either \\c{n}, when both the old
and new versions consist of \f[C]n\f[] characters of this class, or
\\c{o,n} when the old and new versions consist of \f[C]o\f[] and
\f[C]n\f[] characters, respectively.
The character classes are
.IP \[bu] 2
\\a alphabetic
.IP \[bu] 2
\\d digit
.IP \[bu] 2
\\w word (i.e.\ digit or alphabetic)
.IP \[bu] 2
\\s whitespace
.IP \[bu] 2
\&.
any
.RE
.IP \[bu] 2
\f[I]cc\f[] Aggressively summarize by character class.
This functions like \f[I]ccwide\f[] above, but will also pull in any
adjacent characters that are common between the two files (therefore the
output will be more \[lq]narrow\[rq], i.e.\ fewer characters).
Given that we are marking changes to parts of lines that the user
specifically ignored, this is the most appropriate option for
summarizing changes to values where \f[I]wdiff\f[] would produce too
much clutter, for example large numerical quantities such as timestamps,
execution times or dates.
.IP \[bu] 2
\f[I]new\f[] Use the corresponding line from the \f[C]new\f[] file.
This is useful when one is interested in where there were changes, but
needs accurate context information to make sense of the change.
.IP \[bu] 2
\f[I]old\f[] Use the corresponding line from the \f[C]old\f[] file.
See the description for \f[I]new\f[].
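.PP
The character\-class notation used by \f[I]ccwide\f[] and \f[I]cc\f[]
can be sketched in Python as follows.
This is an approximation and not the program's own code: the name
\f[C]summarize\f[] is hypothetical, the order in which classes are tried
is an assumption, and so is the rendering of the \[lq]any\[rq] class as
a dot followed by the counts.
.IP
.nf
\f[C]
```python
BACKSLASH = chr(92)  # backslash, built with chr() so no escape is needed

# Assumed precedence: the first class that covers every character wins.
CLASSES = [
    ("d", bytes.isdigit),
    ("a", bytes.isalpha),
    ("w", bytes.isalnum),
    ("s", bytes.isspace),
]


def summarize(old, new):
    """Summarize a change between two non-empty byte strings by class."""
    name = "."  # "any" is the fallback class
    for candidate, belongs in CLASSES:
        if belongs(old) and belongs(new):
            name = BACKSLASH + candidate
            break
    if len(old) == len(new):
        return "%s{%d}" % (name, len(old))
    return "%s{%d,%d}" % (name, len(old), len(new))


same_width = summarize(b"1529", b"1530")
grew = summarize(b"99", b"100")
```
\f[]
.fi
.PP
Under these assumptions a change from 99 to 100 is reported as the digit
class with counts 2 and 3, rather than as the digits themselves.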
.RE
.TP
.B \-\-context\-tokenization=CTOK
Select the tokenization rules for context lines.
These options apply to the \f[I]wdiff\f[], \f[I]cc\f[] and
\f[I]ccwide\f[] context formats.
Possible values are
.RS
.IP \[bu] 2
\f[I]word\f[] Tokenize by word boundaries (i.e.\ the regex anchor
\f[C]\\\\b\f[]).
This is the more readable choice and is therefore the default.
When tokenizing the line into words, the \f[I]wdiff\f[] format will
behave similarly to the \f[C]wdiff\f[] command.
Importantly, the \f[I]cc\f[] format will have a better chance of
ignoring insignificant edits in multiple parts of a large
\[lq]word\[rq], such as multiple digits of an address or floating\-point
number, and of producing more readable output.
.IP \[bu] 2
\f[I]char\f[] Consider each character as an individual token.
This will produce more accurate output which, however, is likely to be
too cluttered for general use.
.RE
.TP
.B \-\-mark\-changed\-context
Prefix each changed context line with a bang (\f[C]!\f[]) character.
This can be useful when using \f[C]\-\-context\-format=new\f[] (or
\f[C]old\f[]) to be informed of which context lines have changes between
files, even when those changes are not being displayed.
.RS
.RE
.TP
.B \-\-display\-selected
Output the parts of the input lines that were actually considered for
comparison, instead of outputting the corresponding lines from the input
files.
This is intended as a diagnostic option for debugging any unexpected
mismatches of the provided regular expressions.
.RS
.RE
.TP
.B \-V, \-\-version
Print version information.
.RS
.RE
.TP
.B \-h, \-\-help
Display usage.
.RS
.RE
.SH NOTES
.PP
Input is treated as arbitrary bytes.
That means that it does not need to be of a valid encoding.
Conversely, Unicode character classes are not available when specifying
a regular expression.
.PP
If neither \f[C]\-r\f[] nor \f[C]\-i\f[] is specified, \f[C]subdiff\f[]
behaves like \f[C]diff\f[].
.SH EXIT STATUS
.PP
Like \f[C]diff\f[], \f[C]subdiff\f[] terminates with exit code 0 if
there were no differences between the (selected parts of the) two files.
It exits with 1 if there were differences and with 2 if there was an
error.
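.PP
A calling script can branch on these exit codes.
The Python sketch below is one possible way to do so; the helper names
\f[C]describe_status\f[] and \f[C]compare\f[] are hypothetical, and
\f[C]compare\f[] assumes that \f[C]subdiff\f[] is installed and on the
search path.
.IP
.nf
\f[C]
```python
import subprocess


def describe_status(code):
    """Map a subdiff exit code to its documented meaning."""
    meanings = {0: "no differences", 1: "differences found", 2: "error"}
    return meanings.get(code, "unexpected exit code")


def compare(old_file, new_file, regex):
    """Run subdiff on two files; assumes `subdiff` is on the PATH."""
    result = subprocess.run(
        ["subdiff", "-r", regex, old_file, new_file],
        stdout=subprocess.PIPE,
    )
    return describe_status(result.returncode), result.stdout
```
\f[]
.fi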
.SH AUTHORS
Angelos Oikonomopoulos.