Struct crowbook_text_processing::FrenchFormatter

pub struct FrenchFormatter { /* fields omitted */ }

French typographic formatter.

This struct tries to make a text more typographically correct according to French typographic rules. This means:

  • making spaces before ?, ! and ; narrow non-breaking spaces;
  • making spaces before : non-breaking spaces;
  • making the space after the em dash that introduces dialog a half-em (en) space;
  • making spaces after « and before » non-breaking spaces or narrow non-breaking spaces, depending on the context (dialog or a few quoted words);
  • making spaces in numbers, e.g. 80 000 or 50 €, narrow and non-breaking.
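The space rules for ?, !, ; and : can be illustrated with a minimal sketch. It assumes the input already contains an ordinary space before the mark and only swaps that space for U+202F (narrow no-break space) or U+00A0 (no-break space). The function name is hypothetical; this is not the library's implementation, which handles many more cases.

```rust
/// U+00A0 NO-BREAK SPACE, used here before ':'.
const NBSP: char = '\u{A0}';
/// U+202F NARROW NO-BREAK SPACE, used here before '?', '!' and ';'.
const NNBSP: char = '\u{202F}';

/// Hypothetical helper (not part of crowbook_text_processing):
/// replaces each ordinary space that directly precedes one of the
/// punctuation marks with the matching no-break space.
fn fix_spaces_before_punctuation(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    for (i, &c) in chars.iter().enumerate() {
        // Look one character ahead to decide what this space becomes.
        let next = chars.get(i + 1).copied();
        match (c, next) {
            (' ', Some('?' | '!' | ';')) => out.push(NNBSP),
            (' ', Some(':')) => out.push(NBSP),
            _ => out.push(c),
        }
    }
    out
}
```

Spaces that are not followed by one of these marks are left untouched.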

Additionally, this struct uses functions that are "generic" (not specific to the French language) in order to:

  • replace straight quotes (' and ") with curly, typographic ones;
  • replace ellipses (...) with the Unicode character (…).
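A minimal sketch of these two generic replacements. The rule for choosing between an opening and a closing quote (opening at the start of the text, after whitespace or after an opening bracket; closing everywhere else, which also covers apostrophes) is an assumption made for illustration, since the library's own heuristics are not described here. Both function names are hypothetical.

```rust
/// Assumed rule: a straight quote opens a quotation at the start of
/// the text, after whitespace, or after an opening bracket.
fn is_opening_context(prev: Option<char>) -> bool {
    match prev {
        None => true,
        Some(p) => p.is_whitespace() || "([{".contains(p),
    }
}

/// Hypothetical helper: replaces "..." with U+2026 and straight
/// quotes with curly ones, using the assumed rule above.
fn curly_quotes_and_ellipsis(input: &str) -> String {
    // Ellipsis first: a plain substring replacement.
    let replaced = input.replace("...", "\u{2026}");
    let mut out = String::with_capacity(replaced.len());
    let mut prev: Option<char> = None;
    for c in replaced.chars() {
        let mapped = match (c, is_opening_context(prev)) {
            ('\'', true) => '\u{2018}',  // opening single quote
            ('\'', false) => '\u{2019}', // closing single quote or apostrophe
            ('"', true) => '\u{201C}',   // opening double quote
            ('"', false) => '\u{201D}',  // closing double quote
            (other, _) => other,
        };
        out.push(mapped);
        prev = Some(c);
    }
    out
}
```

Applied to the input of the example below, this sketch turns 'formater' into ‘formater’ and n'est into n’est, leaving the space before ? alone.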

As some of these features sometimes require a bit of guessing, there are some parameters that can be set if you want better results.

Example

use crowbook_text_processing::FrenchFormatter;
let input = "Un texte à 'formater', n'est-ce pas ?";
let output = FrenchFormatter::new()
             .typographic_ellipsis(false) // don't replace ellipsis
             .format_tex(input); // format to tex (so non-breaking
                                 // spaces are visible in assert_eq!)
assert_eq!(&output, "Un texte à ‘formater’, n’est-ce pas~?");

Methods

impl FrenchFormatter

Creates a new FrenchFormatter with default settings.
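To see the defaults documented for each setter in one place, here is a hypothetical settings struct. Its field and method names are invented for this sketch; only the default values come from this page. It also shows a common Rust builder shape (setters taking and returning &mut Self) that allows chained calls; whether FrenchFormatter's own setters are written exactly this way is an assumption.

```rust
/// Hypothetical struct (not the library's type) holding the default
/// values documented for FrenchFormatter's settings.
#[derive(Debug, Clone, PartialEq)]
struct FormatterSettings {
    threshold_currency: usize,
    threshold_unit: usize,
    threshold_quote: usize,
    threshold_real_word: usize,
    typographic_quotes: bool,
    typographic_ellipsis: bool,
    ligature_dashes: bool,
    ligature_guillemets: bool,
}

impl Default for FormatterSettings {
    fn default() -> Self {
        FormatterSettings {
            threshold_currency: 3,
            threshold_unit: 2,
            threshold_quote: 20,
            threshold_real_word: 3,
            typographic_quotes: true,
            typographic_ellipsis: true,
            ligature_dashes: false,
            ligature_guillemets: false,
        }
    }
}

impl FormatterSettings {
    fn new() -> Self {
        Self::default()
    }

    /// Setters return `&mut Self` so that calls can be chained.
    fn typographic_ellipsis(&mut self, enabled: bool) -> &mut Self {
        self.typographic_ellipsis = enabled;
        self
    }

    fn threshold_quote(&mut self, chars: usize) -> &mut Self {
        self.threshold_quote = chars;
        self
    }
}
```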

Sets the threshold for currencies.

After that number of characters, assume it's not a currency.

Default is 3.

Sets the threshold for units.

After that number of characters, assume it's not a unit.

Default is 2.
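The currency and unit thresholds rest on the same guess: is the word after a number a short symbol that must stay attached to it? A minimal sketch of that guess follows. The helper names, splitting on single spaces, and using U+202F for every glued space are assumptions; the library may well tell currencies and units apart in other ways.

```rust
/// U+202F NARROW NO-BREAK SPACE.
const NNBSP: char = '\u{202F}';

/// Hypothetical helper: a non-empty word of at most `threshold`
/// characters (Unicode scalar values) is treated as a symbol.
fn looks_like_symbol(word: &str, threshold: usize) -> bool {
    let n = word.chars().count();
    n > 0 && n <= threshold
}

/// Hypothetical helper: replaces a space that follows a digit with a
/// narrow no-break space when the next token is another digit group
/// (80 000) or a short symbol according to `threshold` (50 €).
fn glue_numbers(input: &str, threshold: usize) -> String {
    let tokens: Vec<&str> = input.split(' ').collect();
    let mut out = String::with_capacity(input.len());
    for (i, tok) in tokens.iter().enumerate() {
        out.push_str(tok);
        if let Some(next) = tokens.get(i + 1) {
            let prev_is_number = tok.chars().last().map_or(false, |c| c.is_ascii_digit());
            let next_is_number = next.chars().next().map_or(false, |c| c.is_ascii_digit());
            if prev_is_number && (next_is_number || looks_like_symbol(next, threshold)) {
                out.push(NNBSP);
            } else {
                out.push(' ');
            }
        }
    }
    out
}
```

With the documented defaults, one would pass 3 when testing for a currency and 2 for a unit. A short ordinary word after a number ("12 et") is a false positive of this kind of guess, which is why the thresholds are configurable.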

Sets the threshold for quotes.

After that number of characters, assume it's not a quote of a single word or a few words, but a dialog.

Default is 20.
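The quote threshold amounts to a length test on whatever sits between « and ». This sketch only classifies the quote; which kind of non-breaking space the library then picks for each case is not spelled out on this page, so the sketch does not choose one. The enum and function names are hypothetical.

```rust
/// Hypothetical classification of a guillemet-delimited passage.
#[derive(Debug, PartialEq)]
enum QuoteKind {
    /// A single word or a few words quoted inline.
    Short,
    /// A longer passage, treated as dialog.
    Dialog,
}

/// Hypothetical helper: a quote longer than `threshold` characters
/// (after trimming surrounding spaces) is considered dialog.
fn classify_quote(inner: &str, threshold: usize) -> QuoteKind {
    if inner.trim().chars().count() > threshold {
        QuoteKind::Dialog
    } else {
        QuoteKind::Short
    }
}

/// Returns the text between the first « and the following », if any.
fn first_quote(text: &str) -> Option<&str> {
    let start = text.find('«')? + '«'.len_utf8();
    let len = text[start..].find('»')?;
    Some(&text[start..start + len])
}
```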

Sets the threshold for real words.

After that number of characters, assume it's not an abbreviation but a real word (used to determine whether . marks the end of a sentence or just a title, as in M. Dupuis).

Default is 3.
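The real-word threshold can be read as a length test on the word just before a period. This is a minimal sketch following only the description above; the function name is hypothetical and the library may combine this test with other clues.

```rust
/// Hypothetical helper: given the text preceding a '.', guesses
/// whether that '.' ends a sentence. A last word longer than
/// `threshold` characters counts as a real word, so the '.' ends the
/// sentence; a shorter one is assumed to be an abbreviation or a
/// title such as "M." in "M. Dupuis".
fn dot_ends_sentence(before_dot: &str, threshold: usize) -> bool {
    match before_dot.split_whitespace().last() {
        Some(word) => word.chars().count() > threshold,
        None => false,
    }
}
```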

Enables typographic quote replacement.

If true, "L'" will be replaced by "L’"

Default is true.

Enables typographic ellipsis replacement.

If true, "..." will be replaced by "…"

Default is true.

If set to true, replaces -- with – (en dash) and --- with — (em dash).

Default is false.

If set to true, replaces << with « and >> with ».

Default is false.
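Both ligature options are plain substring replacements. This sketch assumes -- maps to an en dash (U+2013) and --- to an em dash (U+2014), the usual TeX-style convention; the function and its flags are hypothetical. The order of the dash replacements matters.

```rust
/// Hypothetical helper mirroring the two ligature options.
/// "---" must be replaced before "--"; otherwise "---" would turn
/// into an en dash followed by a leftover hyphen.
fn apply_ligatures(input: &str, dashes: bool, guillemets: bool) -> String {
    let mut s = input.to_string();
    if dashes {
        s = s.replace("---", "\u{2014}").replace("--", "\u{2013}");
    }
    if guillemets {
        s = s.replace("<<", "\u{AB}").replace(">>", "\u{BB}");
    }
    s
}
```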

(Try to) Format a string according to French typographic rules.

This method should be called on each paragraph separately, as it assumes that the beginning of the string is also the beginning of a line.

This method calls remove_whitespaces internally, as it relies on it.

Example

use crowbook_text_processing::FrenchFormatter;
let f = FrenchFormatter::new();
let s = f.format("« Est-ce bien formaté ? » se demandait-elle — les espaces \
                  insécables étaient tellement compliquées à gérer, \
                  dans cette langue !");
println!("{}", s);
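Since format assumes the start of its input is the start of a line, a longer document has to be cut into paragraphs first. A minimal sketch of such a driver, assuming paragraphs are separated by blank lines ("\n\n"); format_by_paragraph is a hypothetical name, and the closure stands in for the real formatter.

```rust
/// Hypothetical driver: splits `text` on blank lines and runs
/// `format` on each paragraph separately, so that every call starts
/// at the beginning of a line. Paragraphs are trimmed before
/// formatting and rejoined with a blank line.
fn format_by_paragraph<F>(text: &str, format: F) -> String
where
    F: Fn(&str) -> String,
{
    text.split("\n\n")
        .map(|p| format(p.trim()))
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

With the real formatter, a closure such as |p| f.format(p).to_string() could be passed; to_string works whether the method returns a String or a Cow<str>.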

(Try to) Format a string according to French typographic rules, and use '~' for non-breaking spaces so it works correctly with LaTeX output.

Example

use crowbook_text_processing::FrenchFormatter;
let f = FrenchFormatter::new();
let s = f.format_tex("« Est-ce bien formaté ? »");
assert_eq!(&s, "«~Est-ce bien formaté~?~»");
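In LaTeX, ~ is an unbreakable interword space (a tie), which is why the example above shows ~ where the plain formatter would emit Unicode no-break spaces. A minimal sketch of that final mapping, assuming both U+00A0 and U+202F become ~ as in the example; the function name is hypothetical. Note that this loses the distinction between narrow and regular widths.

```rust
/// Hypothetical helper: rewrites Unicode no-break spaces as LaTeX ties.
fn nbsp_to_tex(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            // U+00A0 NO-BREAK SPACE and U+202F NARROW NO-BREAK SPACE
            '\u{A0}' | '\u{202F}' => '~',
            other => other,
        })
        .collect()
}
```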

Trait Implementations

impl Debug for FrenchFormatter

Formats the value using the given formatter.

impl Default for FrenchFormatter

Returns the "default value" for a type.