pub struct FrenchFormatter { /* private fields */ }
French typographic formatter.
The purpose of this struct is to try to make a text more typographically correct, according to French typographic rules. This means:
- making the spaces before `?`, `!` and `;` narrow non-breaking spaces;
- making the spaces before `:` non-breaking spaces;
- making the space after `—` (for dialog) a demi em space;
- making the spaces after `«` and before `»` non-breaking spaces or narrow non-breaking spaces, according to the circumstances (dialog or a few quoted words);
- making the spaces in numbers, e.g. `80 000` or `50 €`, narrow and non-breaking.
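The first two rules above can be illustrated with a minimal, standard-library-only sketch. The function `space_before_punctuation` is a hypothetical helper written for this illustration; it is not part of the crate and ignores the guessing the real formatter does.

```rust
/// Hypothetical helper (not the crate's implementation): swaps the ordinary
/// space that precedes `?`, `!`, `;` or `:` for the matching non-breaking space.
fn space_before_punctuation(input: &str) -> String {
    const NNBSP: char = '\u{202F}'; // narrow non-breaking space
    const NBSP: char = '\u{00A0}'; // non-breaking space

    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == ' ' {
            // Look at the character that follows the space, without consuming it.
            match chars.peek().copied() {
                Some('?') | Some('!') | Some(';') => out.push(NNBSP),
                Some(':') => out.push(NBSP),
                _ => out.push(c),
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Only a plain space directly followed by one of the four punctuation marks is touched; every other character is copied unchanged.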
Additionally, this struct uses functions that are “generic” (not specific to the French language) in order to:
- replace straight quotes (`'` and `"`) with curly, typographic ones;
- replace ellipsis (`...`) with the Unicode character (`…`).
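As a rough idea of what these generic replacements involve, here is a minimal sketch using only the standard library. Both function names are hypothetical, and the apostrophe rule (a straight quote between two letters) is a simplifying assumption, far less thorough than what a real formatter needs.

```rust
/// Hypothetical helper: replaces three consecutive dots with the
/// Unicode ellipsis character (U+2026).
fn ellipsis(input: &str) -> String {
    input.replace("...", "\u{2026}")
}

/// Hypothetical helper: replaces a straight apostrophe by a curly one
/// (U+2019), but only when it sits between two letters.
fn curly_apostrophes(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    for (i, &c) in chars.iter().enumerate() {
        let prev_is_letter = i > 0 && chars[i - 1].is_alphabetic();
        let next_is_letter = chars.get(i + 1).map_or(false, |n| n.is_alphabetic());
        if c == '\'' && prev_is_letter && next_is_letter {
            out.push('\u{2019}');
        } else {
            out.push(c);
        }
    }
    out
}
```

Opening and closing single quotes around a word are deliberately left alone in this sketch, since telling them apart from apostrophes is where the guessing starts.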
As some of these features sometimes require a bit of guessing, there are some parameters that can be set if you want better results.
Example
use crowbook_text_processing::FrenchFormatter;
let input = "Un texte à 'formater', n'est-ce pas ?";
let output = FrenchFormatter::new()
    .typographic_ellipsis(false) // don't replace ellipsis
    .format_tex(input); // format to TeX (so non-breaking
                        // spaces are visible in assert_eq!)
assert_eq!(&output, "Un texte à ‘formater’, n’est-ce pas\\,?");
Implementations
impl FrenchFormatter
pub fn threshold_currency(&mut self, t: usize) -> &mut Self
Sets the threshold for currency.
After that number of characters, assume it’s not a currency.
Default is `3`.
pub fn threshold_unit(&mut self, t: usize) -> &mut Self
Sets the threshold for unit.
After that number of characters, assume it’s not a unit.
Default is `2`.
pub fn threshold_quote(&mut self, t: usize) -> &mut Self
Sets the threshold for quote.
After that number of characters, assume it’s not a quote of a single word or a few words, but a dialog.
Default is `20`.
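To make the role of this threshold concrete, here is a minimal sketch of the kind of decision it could drive. The helper `space_for_quote` is hypothetical, and two points are assumptions rather than documented behavior: that a dialog gets a non-breaking space while a short quote gets a narrow one, and that a length exactly equal to the threshold still counts as a short quote.

```rust
/// Hypothetical helper: picks the space to put inside guillemets
/// from the length (in characters) of the quoted text.
fn space_for_quote(quoted: &str, threshold_quote: usize) -> char {
    if quoted.chars().count() > threshold_quote {
        '\u{00A0}' // assumed: dialog, non-breaking space
    } else {
        '\u{202F}' // assumed: a few quoted words, narrow non-breaking space
    }
}
```

Counting `chars()` rather than bytes matters here, since accented letters such as `é` take more than one byte in UTF-8.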
pub fn threshold_real_word(&mut self, t: usize) -> &mut Self
Sets the threshold for real word.
After that number of characters, assume it’s not an abbreviation but a real word (used to determine whether `.` marks the end of a sentence or just follows a title, as in `M. Dupuis`).
Default is `3`.
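The heuristic described here can be sketched as a simple length check. The helper `period_ends_sentence` is hypothetical, and treating a word whose length equals the threshold as a real word is an assumption; the documentation does not say which side of the boundary it falls on.

```rust
/// Hypothetical helper: guesses whether the `.` that follows `word`
/// ends a sentence (real word) or belongs to an abbreviation.
fn period_ends_sentence(word: &str, threshold_real_word: usize) -> bool {
    // Assumption: words shorter than the threshold are abbreviations.
    word.chars().count() >= threshold_real_word
}
```

With the default of 3, `M` in `M. Dupuis` would be treated as an abbreviation, while `Dupuis` followed by a period would be treated as the end of a sentence.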
pub fn typographic_quotes(&mut self, b: bool) -> &mut Self
Enables the typographic quotes replacement.
If true, `L'` will be replaced by `L’`.
Default is `true`.
pub fn typographic_ellipsis(&mut self, b: bool) -> &mut Self
Enables typographic ellipsis replacement.
If true, `...` will be replaced by `…`.
Default is `true`.
pub fn ligature_dashes(&mut self, b: bool) -> &mut Self
If set to true, replaces `--` with `–` and `---` with `—`.
Default is `false`.
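A replacement like this is order-sensitive, which the following minimal sketch shows. The function `replace_dashes` is a hypothetical stand-in built on `str::replace`, not the crate's own code.

```rust
/// Hypothetical helper: turns `---` into an em dash and `--` into an en dash.
fn replace_dashes(input: &str) -> String {
    // The longer pattern must be handled first; otherwise `---` would be
    // consumed as `--` followed by a stray `-`.
    input
        .replace("---", "\u{2014}")
        .replace("--", "\u{2013}")
}
```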
pub fn ligature_guillemets(&mut self, b: bool) -> &mut Self
If set to true, replaces `<<` with `«` and `>>` with `»`.
Default is `false`.
pub fn format<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>
(Try to) Format a string according to French typographic rules.
This method should be called for each paragraph, as it assumes that the beginning of the string is also the beginning of a line.
This method calls `remove_whitespaces` internally, as it relies on it.
Example
use crowbook_text_processing::FrenchFormatter;
let f = FrenchFormatter::new();
let s = f.format("« Est-ce bien formaté ? » se demandait-elle — les espaces \
insécables étaient tellement compliquées à gérer,
dans cette langue !");
println!("{}", s);
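Since the method is meant to be called once per paragraph, a longer text has to be split first. Here is a minimal sketch of one way to do that, assuming paragraphs are separated by blank lines; `format_paragraphs` is a hypothetical helper that takes the formatting step as a closure so that it stays independent of the crate.

```rust
/// Hypothetical helper: splits `text` on blank lines and applies
/// `format` to each non-empty paragraph separately.
fn format_paragraphs<F>(text: &str, format: F) -> Vec<String>
where
    F: Fn(&str) -> String,
{
    text.split("\n\n")
        .map(str::trim)
        .filter(|paragraph| !paragraph.is_empty())
        .map(|paragraph| format(paragraph))
        .collect()
}
```

With a `FrenchFormatter` named `f`, the closure could presumably be `|p| f.format(p).into_owned()`, so that each paragraph starts at what the formatter treats as the beginning of a line.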
pub fn format_tex<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>
(Try to) Format a string according to French typographic rules, escape the characters that need to be escaped in LaTeX (e.g. backslashes), and use TeX commands (`~`, `\enspace` and `\,`) for non-breaking spaces, so that it works correctly with some LaTeX versions (and makes the non-breaking space shenanigans more visible in most editors).
Example
use crowbook_text_processing::FrenchFormatter;
let f = FrenchFormatter::new();
let s = f.format_tex("« Est-ce bien formaté ? »");
assert_eq!(&s, "«~Est-ce bien formaté\\,?~»");
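The substitution of TeX commands for Unicode spaces can be pictured with the minimal sketch below. The function `nb_spaces_to_tex` is hypothetical, it does no LaTeX escaping, and the choice of U+2002 (en space) for the “demi em space” and of a trailing space after `\enspace` are assumptions.

```rust
/// Hypothetical helper: maps Unicode non-breaking spaces to TeX commands.
fn nb_spaces_to_tex(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '\u{00A0}' => out.push('~'),               // non-breaking space
            '\u{202F}' => out.push_str("\\,"),         // narrow non-breaking space
            '\u{2002}' => out.push_str("\\enspace "),  // assumed: en space
            _ => out.push(c),
        }
    }
    out
}
```

Applied to a string that already contains the Unicode spaces, this yields the same visible form as the assertion in the example above.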
pub fn format_html<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>
(Try to) Format a string according to French typographic rules, and escape the characters that need to be escaped in HTML (e.g. `&`). Also use HTML commands instead of Unicode for narrow non-breaking spaces; see `escape::nb_spaces_html`. It’s a bit of a hack to make it work in most browsers/ereaders.