Struct crowbook_text_processing::FrenchFormatter
pub struct FrenchFormatter { /* fields omitted */ }
French typographic formatter.
The purpose of this struct is to try to make a text more typographically correct, according to French typographic rules. This means:
- making spaces before ?, ! and ; narrow non-breaking spaces;
- making spaces before : non-breaking spaces;
- making the space after — (for dialog) a half-em space;
- making spaces after « and before » non-breaking or narrow non-breaking spaces, according to the circumstances (dialog or a few quoted words);
- making spaces in numbers, e.g. 80 000 or 50 €, narrow and non-breaking.
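To make the first two rules concrete, here is a minimal sketch in plain Rust (standard library only). It is not the crate's implementation: the function name space_punctuation is hypothetical, and the sketch only handles an ordinary space directly before ?, !, ; (replaced by a narrow no-break space, U+202F) or : (replaced by a no-break space, U+00A0). It ignores the context-dependent cases such as guillemets, dialog dashes and numbers.

```rust
/// Narrow no-break space (U+202F), used here before ?, ! and ;.
const NARROW_NBSP: char = '\u{202F}';
/// No-break space (U+00A0), used here before :.
const NBSP: char = '\u{A0}';

/// Hypothetical helper: replaces an ordinary space that directly precedes
/// ?, !, ; or : with the matching non-breaking space.
fn space_punctuation(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    for (i, &c) in chars.iter().enumerate() {
        if c != ' ' {
            out.push(c);
            continue;
        }
        // Decide based on the character that follows this space, if any.
        match chars.get(i + 1).copied() {
            Some('?') | Some('!') | Some(';') => out.push(NARROW_NBSP),
            Some(':') => out.push(NBSP),
            _ => out.push(' '),
        }
    }
    out
}

fn main() {
    let s = space_punctuation("Vraiment ? Oui : voilà !");
    // Show the invisible spaces as markers.
    // Prints: Vraiment<nnbsp>? Oui<nbsp>: voilà<nnbsp>!
    println!("{}", s.replace(NARROW_NBSP, "<nnbsp>").replace(NBSP, "<nbsp>"));
}
```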
Additionally, this struct uses functions that are "generic" (not specific to the French language) in order to:
- replace straight quotes (' and ") with curly, typographic ones;
- replace ellipsis (...) with the Unicode character (…).
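The two "generic" replacements can be sketched as follows. The quote rule here is an assumption, not necessarily the one the crate uses: a straight quote that follows a letter, a digit or closing punctuation becomes a closing quote (or apostrophe), and any other straight quote becomes an opening one. The function name typographic_generic is hypothetical.

```rust
/// Hypothetical helper: naive "generic" cleanup. Replaces "..." with '…'
/// and straight quotes with curly ones, choosing opening or closing quotes
/// by looking only at the previous character.
fn typographic_generic(input: &str) -> String {
    // Replace ellipses first; the result is a new String.
    let text = input.replace("...", "…");
    let mut out = String::with_capacity(text.len());
    let mut prev: Option<char> = None;
    for c in text.chars() {
        // After a letter, digit or closing punctuation, a quote is treated as
        // closing (or as an apostrophe); anywhere else as opening.
        let after_word = prev.map_or(false, |p| p.is_alphanumeric() || ".,!?…".contains(p));
        let replaced = match c {
            '\'' if after_word => '’',
            '\'' => '‘',
            '"' if after_word => '”',
            '"' => '“',
            other => other,
        };
        out.push(replaced);
        prev = Some(c);
    }
    out
}

fn main() {
    // Prints: Un texte à ‘formater’, n’est-ce pas…
    println!("{}", typographic_generic("Un texte à 'formater', n'est-ce pas..."));
}
```

A rule this simple reproduces the ‘formater’ and n’est-ce cases from the example, but it will guess wrong on unbalanced or nested quotes.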
As some of these features sometimes require a bit of guessing, there are some parameters that can be set if you want better results.
Example
use crowbook_text_processing::FrenchFormatter;

let input = "Un texte à 'formater', n'est-ce pas ?";
let output = FrenchFormatter::new()
    .typographic_ellipsis(false) // don't replace ellipsis
    .format_tex(input);          // format to tex (so non-breaking
                                 // spaces are visible in assert_eq!)
assert_eq!(&output, "Un texte à ‘formater’, n’est-ce pas~?");
Methods
impl FrenchFormatter
fn new() -> Self
Creates a new FrenchFormatter with default settings.
fn threshold_currency(&mut self, t: usize) -> &mut Self
Sets the threshold for currencies.
After that number of characters, assume it's not a currency.
Default is 3.
fn threshold_unit(&mut self, t: usize) -> &mut Self
Sets the threshold for units.
After that number of characters, assume it's not a unit.
Default is 2.
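threshold_currency and threshold_unit both bound the length of the word that follows a number. The sketch below shows that kind of length test with a single threshold parameter. The function names looks_like_unit and bind_units, the use of a narrow no-break space, and the word-splitting strategy are all assumptions for illustration, not the crate's code.

```rust
/// Narrow no-break space (U+202F).
const NARROW_NBSP: char = '\u{202F}';

/// Hypothetical heuristic: a word following a number is taken to be a unit
/// or currency symbol when it does not start with a digit and has at most
/// `threshold` characters (counted with chars(), so "€" counts as one).
/// An empty word never qualifies.
fn looks_like_unit(word: &str, threshold: usize) -> bool {
    let starts_with_digit = word.chars().next().map_or(true, |c| c.is_ascii_digit());
    let len = word.chars().count();
    !starts_with_digit && len <= threshold
}

/// Joins a number and the following word with a narrow no-break space when
/// that word looks like a unit or currency; other spaces are left alone.
fn bind_units(input: &str, threshold: usize) -> String {
    let words: Vec<&str> = input.split(' ').collect();
    let mut out = String::with_capacity(input.len());
    for (i, word) in words.iter().enumerate() {
        if i > 0 {
            let prev_is_number = words[i - 1]
                .chars()
                .last()
                .map_or(false, |c| c.is_ascii_digit());
            // Ignore trailing punctuation, so "km," is measured as "km".
            let core = word.trim_end_matches(|c: char| c.is_ascii_punctuation());
            if prev_is_number && looks_like_unit(core, threshold) {
                out.push(NARROW_NBSP);
            } else {
                out.push(' ');
            }
        }
        out.push_str(word);
    }
    out
}

fn main() {
    let s = bind_units("Ça coûte 50 € et pèse 10 kg.", 3);
    // Prints: Ça coûte 50<nnbsp>€ et pèse 10<nnbsp>kg.
    println!("{}", s.replace(NARROW_NBSP, "<nnbsp>"));
}
```

A pure length test can only guess: with a threshold of 2, a short ordinary word such as "et" in "3 et 4" would also be bound to the number.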
fn threshold_quote(&mut self, t: usize) -> &mut Self
Sets the threshold for quotes.
After that number of characters, assume it's not a quote of a single word or a few words, but a dialog.
Default is 20.
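A sketch of how a length threshold can separate a few quoted words from dialog when choosing the space inside guillemets. It assumes, reading the description at the top of this page in order, that dialog gets a no-break space (U+00A0) and a few quoted words get a narrow no-break space (U+202F); guillemet_space and quote are hypothetical names.

```rust
/// No-break space (U+00A0), assumed here for dialog.
const NBSP: char = '\u{A0}';
/// Narrow no-break space (U+202F), assumed here for a few quoted words.
const NARROW_NBSP: char = '\u{202F}';

/// Hypothetical helper: picks the space to put inside a pair of guillemets.
/// Quotations of at most `threshold` characters are treated as a few quoted
/// words; longer ones are treated as dialog.
fn guillemet_space(quoted: &str, threshold: usize) -> char {
    if quoted.trim().chars().count() <= threshold {
        NARROW_NBSP
    } else {
        NBSP
    }
}

/// Wraps `quoted` in guillemets, using the space chosen above on both sides.
fn quote(quoted: &str, threshold: usize) -> String {
    let sp = guillemet_space(quoted, threshold);
    format!("«{}{}{}»", sp, quoted.trim(), sp)
}

fn main() {
    let threshold = 20; // the documented default of threshold_quote
    for q in ["bof", "Je ne sais vraiment pas quoi en penser."] {
        let s = quote(q, threshold);
        println!("{}", s.replace(NARROW_NBSP, "<nnbsp>").replace(NBSP, "<nbsp>"));
    }
}
```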
fn threshold_real_word(&mut self, t: usize) -> &mut Self
Sets the threshold for real words.
After that number of characters, assume it's not an abbreviation but a real word (used to determine whether a . marks the end of a sentence or just follows a title, such as in M. Dupuis).
Default is 3.
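The abbreviation test can be sketched as a pure length check on the word before each period. is_abbreviation, word_before and sentence_ends are hypothetical helpers, and counting characters rather than bytes (so that "arrivé" counts as 6) is a choice made for this sketch.

```rust
/// Hypothetical helper: a word of at most `threshold` characters just before
/// a '.' is assumed to be an abbreviation (like "M" in "M. Dupuis").
fn is_abbreviation(word: &str, threshold: usize) -> bool {
    let len = word.chars().count();
    len > 0 && len <= threshold
}

/// Returns the word that ends right before byte index `dot` in `text`.
fn word_before(text: &str, dot: usize) -> &str {
    let head = &text[..dot];
    match head.rfind(char::is_whitespace) {
        // Skip the whitespace character itself (it may be several bytes).
        Some(i) => head[i..].trim_start(),
        None => head,
    }
}

/// Byte indices of the periods that are guessed to end a sentence.
fn sentence_ends(text: &str, threshold: usize) -> Vec<usize> {
    text.match_indices('.')
        .map(|(i, _)| i)
        .filter(|&i| !is_abbreviation(word_before(text, i), threshold))
        .collect()
}

fn main() {
    let text = "M. Dupuis est arrivé. Il dort.";
    // "M" is short enough to be an abbreviation; "arrivé" and "dort" are not.
    println!("{:?}", sentence_ends(text, 3)); // [21, 30]
}
```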
fn typographic_quotes(&mut self, b: bool) -> &mut Self
Enables typographic quote replacement.
If true, "L'" will be replaced by "L’".
Default is true.
fn typographic_ellipsis(&mut self, b: bool) -> &mut Self
Enables typographic ellipsis replacement.
If true, "..." will be replaced by "…".
Default is true.
fn ligature_dashes(&mut self, b: bool) -> &mut Self
If set to true, replaces -- with – and --- with —.
Default is false.
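Dash ligatures amount to two string replacements, as in this hypothetical ligature_dashes helper. The order matters: replacing --- before -- avoids turning --- into an en dash followed by a hyphen.

```rust
/// Hypothetical sketch of dash ligatures: "---" becomes an em dash (—) and
/// "--" becomes an en dash (–). "---" must be handled first; otherwise its
/// first two hyphens would already have become an en dash.
fn ligature_dashes(input: &str) -> String {
    input.replace("---", "—").replace("--", "–")
}

fn main() {
    // Prints: Pages 10–20 — enfin, à peu près.
    println!("{}", ligature_dashes("Pages 10--20 --- enfin, à peu près."));
}
```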
fn ligature_guillemets(&mut self, b: bool) -> &mut Self
If set to true, replaces << with « and >> with ».
Default is false.
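Guillemet ligatures can be sketched the same way. ligature_guillemets is a hypothetical name, and this version only substitutes the characters without adjusting any surrounding spaces.

```rust
/// Hypothetical sketch of guillemet ligatures: "<<" becomes « and ">>"
/// becomes ». Spacing inside the guillemets is not touched here.
fn ligature_guillemets(input: &str) -> String {
    input.replace("<<", "«").replace(">>", "»")
}

fn main() {
    // Prints: « Bonjour »
    println!("{}", ligature_guillemets("<< Bonjour >>"));
}
```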
fn format<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>
(Try to) Format a string according to French typographic rules.
This method should be called for each paragraph, as it assumes that the beginning of the string is also the beginning of a line.
This method calls remove_whitespaces internally, as it relies on it.
Example
use crowbook_text_processing::FrenchFormatter;

let f = FrenchFormatter::new();
let s = f.format("« Est-ce bien formaté ? » se demandait-elle — les espaces \
                  insécables étaient tellement compliquées à gérer, dans cette langue !");
println!("{}", s);
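Since format expects one paragraph at a time, a caller working on a whole text needs to split it first. This sketch assumes paragraphs are separated by a blank line ("\n\n"); format_paragraphs is a hypothetical helper that accepts any per-paragraph formatting closure, and the example uses a stand-in closure rather than the crate itself.

```rust
/// Hypothetical helper: applies `format_one` to each paragraph separately.
/// Paragraphs are assumed to be separated by one blank line ("\n\n"), so
/// every call starts at the beginning of a paragraph.
fn format_paragraphs<F>(text: &str, mut format_one: F) -> String
where
    F: FnMut(&str) -> String,
{
    text.split("\n\n")
        .map(|p| format_one(p))
        .collect::<Vec<String>>()
        .join("\n\n")
}

fn main() {
    let text = "Premier paragraphe.\n\nSecond paragraphe.";
    // Stand-in formatter that just marks where each paragraph starts.
    let out = format_paragraphs(text, |p| format!("¶ {}", p));
    println!("{}", out);
}
```

With the crate, the closure could plausibly wrap the formatter, e.g. |p| f.format(p).into_owned(), since format returns a Cow<str>; that adapter is an assumption, not something this page documents.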
fn format_tex<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>
(Try to) Format a string according to French typographic rules, using '~' for non-breaking spaces so the output works correctly with LaTeX.
Example
use crowbook_text_processing::FrenchFormatter;

let f = FrenchFormatter::new();
let s = f.format_tex("« Est-ce bien formaté ? »");
assert_eq!(&s, "«~Est-ce bien formaté~?~»");
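The visible effect of format_tex in the example above is that every non-breaking space becomes '~'. A minimal sketch of just that final substitution follows; nbsp_to_tex is a hypothetical name, the input string assumes which spaces the formatter inserted, and the sketch escapes no other LaTeX special characters.

```rust
/// Hypothetical helper: renders non-breaking spaces for LaTeX by turning
/// both U+00A0 and U+202F into '~', LaTeX's unbreakable space.
/// No other LaTeX special characters are escaped.
fn nbsp_to_tex(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{A0}' | '\u{202F}' => '~',
            other => other,
        })
        .collect()
}

fn main() {
    // Assumed formatter output, with narrow no-break spaces.
    let formatted = "«\u{202F}Est-ce bien formaté\u{202F}?\u{202F}»";
    // Prints: «~Est-ce bien formaté~?~»
    println!("{}", nbsp_to_tex(formatted));
}
```

In this sketch, mapping both U+00A0 and U+202F to '~' means the LaTeX output no longer distinguishes narrow from regular non-breaking spaces, since '~' is a normal-width unbreakable space.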