pub struct FrenchFormatter { /* private fields */ }

French typographic formatter.

The purpose of this struct is to try to make a text more typographically correct, according to French typographic rules. This means:

  • making spaces before ?, ! and ; narrow non-breaking spaces;
  • making spaces before : non-breaking spaces;
  • making the space after — (for dialog) a demi em space;
  • making spaces after « and before » non-breaking spaces or narrow non-breaking spaces, according to the circumstances (dialog or a few quoted words);
  • making spaces in numbers, e.g. 80 000 or 50 €, narrow and non-breaking.
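
The first two rules can be illustrated with the standard library alone. This is only a minimal sketch: the function name `nb_spaces_before_punctuation` is hypothetical, and the real formatter handles many more cases (quotes, numbers, dialog) than this does.

```rust
/// Hypothetical helper, not part of the crate: replaces the ordinary space
/// that precedes ?, !, ; or : with the matching non-breaking space.
fn nb_spaces_before_punctuation(input: &str) -> String {
    const NNBSP: char = '\u{202F}'; // narrow non-breaking space
    const NBSP: char = '\u{00A0}'; // non-breaking space

    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == ' ' {
            // Look at the next character without consuming it.
            match chars.peek().copied() {
                Some('?') | Some('!') | Some(';') => out.push(NNBSP),
                Some(':') => out.push(NBSP),
                _ => out.push(c),
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Spaces that are not followed by one of these four characters are left untouched.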

Additionally, this formatter uses functions that are “generic” (not specific to the French language) in order to:

  • replace straight quotes (' and ") with curly, typographic ones;
  • replace ellipsis (...) with the unicode character (…).

As some of these features sometimes require a bit of guessing, there are some parameters that can be set if you want better results.

Example

use crowbook_text_processing::FrenchFormatter;
let input = "Un texte à 'formater', n'est-ce pas ?";
let output = FrenchFormatter::new()
             .typographic_ellipsis(false) // don't replace ellipsis
             .format_tex(input); // format to tex (so non-breaking
                                 // spaces are visible in assert_eq!)
assert_eq!(&output, "Un texte à ‘formater’, n’est-ce pas\\,?");
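
The chained call in the example works because each setter takes `&mut self` and returns `&mut Self`. Below is a minimal sketch of that pattern with a hypothetical `SketchFormatter` holding two of the settings. The field layout is an assumption, since the real struct's fields are private.

```rust
/// Hypothetical stand-in for the real formatter, for illustration only.
#[derive(Debug)]
struct SketchFormatter {
    typographic_ellipsis: bool,
    threshold_quote: usize,
}

impl SketchFormatter {
    /// Starts from the documented defaults (true and 20).
    fn new() -> Self {
        SketchFormatter {
            typographic_ellipsis: true,
            threshold_quote: 20,
        }
    }

    /// Each setter mutates in place and hands back `&mut Self`,
    /// which is what makes the calls chainable.
    fn typographic_ellipsis(&mut self, b: bool) -> &mut Self {
        self.typographic_ellipsis = b;
        self
    }

    fn threshold_quote(&mut self, t: usize) -> &mut Self {
        self.threshold_quote = t;
        self
    }
}
```

With this shape, `let mut f = SketchFormatter::new(); f.typographic_ellipsis(false).threshold_quote(10);` configures both settings on the same value.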

Implementations

impl FrenchFormatter

pub fn new() -> Self

Creates a new FrenchFormatter with default settings.

pub fn threshold_currency(&mut self, t: usize) -> &mut Self

Sets the threshold for currency.

After that number of characters, assume it’s not a currency.

Default is 3.

pub fn threshold_unit(&mut self, t: usize) -> &mut Self

Sets the threshold for unit.

After that number of characters, assume it’s not a unit.

Default is 2.
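
To make the idea of a length threshold concrete, here is a rough sketch of how a number could be tied to the short word that follows it. The function `join_number_and_unit` is hypothetical, and the inclusive comparison and digits-only number test are assumptions, not the crate's confirmed logic.

```rust
/// Hypothetical sketch: joins a number and the following word with a narrow
/// non-breaking space when that word is at most `threshold` characters long.
fn join_number_and_unit(input: &str, threshold: usize) -> String {
    const NNBSP: char = '\u{202F}'; // narrow non-breaking space

    let mut out = String::with_capacity(input.len());
    let mut previous: Option<&str> = None;
    for word in input.split(' ') {
        if let Some(prev) = previous {
            // Assumption: a "number" is a non-empty run of ASCII digits.
            let prev_is_number =
                !prev.is_empty() && prev.chars().all(|c| c.is_ascii_digit());
            let len = word.chars().count();
            let short_enough = len > 0 && len <= threshold;
            out.push(if prev_is_number && short_enough { NNBSP } else { ' ' });
        }
        out.push_str(word);
        previous = Some(word);
    }
    out
}
```

With a threshold of 2, "50 km" is joined while "50 mètres" is left alone; raising the threshold to 3 would also accept three-letter currency codes.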

pub fn threshold_quote(&mut self, t: usize) -> &mut Self

Sets the threshold for quote.

After that number of characters, assume it’s not a quote of a single word or a few words, but a dialog.

Default is 20.
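
As a rough illustration of how such a threshold could steer the choice of space inside guillemets, consider the sketch below. The name `space_inside_guillemets` is hypothetical, and both the mapping (longer quote: non-breaking space, shorter quote: narrow non-breaking space) and the strict comparison are assumptions.

```rust
/// Hypothetical sketch: picks the space to put after « and before »
/// depending on how long the quoted text is.
fn space_inside_guillemets(quoted: &str, threshold_quote: usize) -> char {
    if quoted.chars().count() > threshold_quote {
        '\u{00A0}' // assumed: dialog gets a non-breaking space
    } else {
        '\u{202F}' // assumed: a few quoted words get a narrow non-breaking space
    }
}
```

Lengths are counted in characters rather than bytes, so accented letters count once.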

pub fn threshold_real_word(&mut self, t: usize) -> &mut Self

Sets the threshold for real word.

After that number of characters, assume it’s not an abbreviation but a real word (used to determine if . marks the end of a sentence or just a title such as M. Dupuis).

Default is 3.

pub fn typographic_quotes(&mut self, b: bool) -> &mut Self

Enables the typographic quotes replacement.

If true, “L'” will be replaced by “L’”.

Default is true.

pub fn typographic_ellipsis(&mut self, b: bool) -> &mut Self

Enables typographic ellipsis replacement.

If true, “...” will be replaced by “…”.

Default is true.

pub fn ligature_dashes(&mut self, b: bool) -> &mut Self

If set to true, replaces -- with – and --- with —.

Default is false.

pub fn ligature_guillemets(&mut self, b: bool) -> &mut Self

If set to true, replaces << with « and >> with ».

Default is false.
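
The two ligature options amount to plain text substitutions. A minimal standalone sketch, with hypothetical function names that are not the crate's API:

```rust
/// Hypothetical sketch of the dash ligatures.
fn ligature_dashes_sketch(input: &str) -> String {
    // Replace the longer sequence first; otherwise "---" would turn into
    // an en dash followed by a stray hyphen.
    input.replace("---", "—").replace("--", "–")
}

/// Hypothetical sketch of the guillemet ligatures.
fn ligature_guillemets_sketch(input: &str) -> String {
    input.replace("<<", "«").replace(">>", "»")
}
```

The order of the two `replace` calls in the dash version matters, which is the main point of the example.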

pub fn format<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>

(Try to) Format a string according to French typographic rules.

This method should be called for each paragraph, as it assumes that the beginning of the string is also the beginning of a line.

This method calls remove_whitespaces internally, as it relies on it.

Example
use crowbook_text_processing::FrenchFormatter;
let f = FrenchFormatter::new();
let s = f.format("« Est-ce bien formaté ? » se demandait-elle — les espaces \
                  insécables étaient tellement compliquées à gérer,
                  dans cette langue !");
println!("{}", s);
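
The `Into<Cow<'a, str>>` parameter and `Cow<'a, str>` return type allow a function to hand back its input without allocating when nothing needs to change. The sketch below shows that general pattern on a simple ellipsis replacement. The name `typographic_ellipsis_sketch` is hypothetical, and whether the crate avoids allocation in exactly this way is an assumption.

```rust
use std::borrow::Cow;

/// Hypothetical sketch: accepts `&str` or `String`, and only allocates a new
/// `String` when a replacement is actually needed.
fn typographic_ellipsis_sketch<'a, S: Into<Cow<'a, str>>>(input: S) -> Cow<'a, str> {
    let input = input.into();
    if input.contains("...") {
        Cow::Owned(input.replace("...", "…"))
    } else {
        // Nothing to change: return the input as it came in.
        input
    }
}
```

Passing a `&str` that contains no `...` yields a `Cow::Borrowed`, so the caller gets the original slice back.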

pub fn format_tex<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>

(Try to) Format a string according to French typographic rules, escape the characters that need to be escaped in LaTeX (e.g. backslashes), and use TeX commands (“~”, “\enspace” and “\,”) for non-breaking spaces, so that it works correctly with some LaTeX versions (and makes the non-breaking spaces shenanigans more visible in most editors).

Example
use crowbook_text_processing::FrenchFormatter;
let f = FrenchFormatter::new();
let s = f.format_tex("« Est-ce bien formaté ? »");
assert_eq!(&s, "«~Est-ce bien formaté\\,?~»");
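
The mapping from Unicode spaces to TeX commands can be pictured as a character-by-character substitution. This is only a sketch: `nb_spaces_to_tex` is a hypothetical name, LaTeX escaping of other characters is left out, and the use of U+2002 (en space) for the demi em space is an assumption.

```rust
/// Hypothetical sketch: rewrites Unicode non-breaking spaces as TeX commands.
fn nb_spaces_to_tex(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '\u{00A0}' => out.push('~'),               // non-breaking space
            '\u{202F}' => out.push_str("\\,"),         // narrow non-breaking space
            '\u{2002}' => out.push_str("\\enspace "),  // assumed demi em space
            _ => out.push(c),
        }
    }
    out
}
```

All other characters, including ordinary spaces, pass through unchanged.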

pub fn format_html<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>

(Try to) Format a string according to French typographic rules, and escape the characters that need to be escaped in HTML (e.g. &). Also use HTML commands instead of unicode for narrow non-breaking spaces. See escape::nb_spaces_html. It’s a bit of a hack to make it work in most browsers/ereaders.

Trait Implementations

impl Debug for FrenchFormatter

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for FrenchFormatter

fn default() -> Self

Returns the “default value” for a type.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.