Struct crowbook_text_processing::FrenchFormatter
French typographic formatter.
The purpose of this struct is to try to make a text more typographically correct, according to French typographic rules. This means:
- making spaces before ?, ! and ; narrow non-breaking spaces;
- making spaces before : non-breaking spaces;
- making the space after — (for dialog) a half-em space;
- making spaces after « and before » non-breaking spaces or narrow non-breaking spaces, according to the circumstances (dialog or a few quoted words);
- making spaces in numbers, e.g. 80 000 or 50 €, narrow and non-breaking.
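The first two rules in the list above can be illustrated with a minimal, standalone sketch. It is not the crate's implementation: the function name `fix_spaces_before_punctuation` is hypothetical, and the sketch only handles a plain space that directly precedes the punctuation mark.

```rust
/// Narrow no-break space (U+202F), used before `?`, `!` and `;`.
const NNBSP: char = '\u{202F}';
/// No-break space (U+00A0), used before `:`.
const NBSP: char = '\u{00A0}';

/// Hypothetical helper (not part of the crate): replaces an ordinary space
/// that directly precedes French "high" punctuation with the matching
/// non-breaking space.
fn fix_spaces_before_punctuation(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == ' ' {
            // Look at the next character without consuming it.
            match chars.peek().copied() {
                Some('?') | Some('!') | Some(';') => out.push(NNBSP),
                Some(':') => out.push(NBSP),
                _ => out.push(c),
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

The real formatter has to deal with more cases (missing spaces, quotes, numbers), which is why it exposes the thresholds described below.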
Additionally, this formatter uses functions that are "generic" (not specific to the French language) in order to:
- replace straight quotes (' and ") with curly, typographic ones;
- replace ellipsis (...) with the Unicode character (…).
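As a rough illustration of these two generic replacements, here is a simplified sketch. The function `typographic_basics` is a hypothetical name, and the sketch assumes the easy case only: an apostrophe between two letters becomes a curly one. Deciding between opening and closing quotes takes more guessing and is left out.

```rust
/// Hypothetical helper (not the crate's code): replaces "..." with the
/// ellipsis character and an apostrophe between two letters with ’.
fn typographic_basics(input: &str) -> String {
    // Handle the ellipsis first, then walk the characters for apostrophes.
    let chars: Vec<char> = input.replace("...", "…").chars().collect();
    let mut out = String::with_capacity(input.len());
    for (i, &c) in chars.iter().enumerate() {
        let prev_alpha = i > 0 && chars[i - 1].is_alphabetic();
        let next_alpha = chars.get(i + 1).map_or(false, |n| n.is_alphabetic());
        if c == '\'' && prev_alpha && next_alpha {
            out.push('’');
        } else {
            out.push(c);
        }
    }
    out
}
```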
As some of these features sometimes require a bit of guessing, there are some parameters that can be set if you want better results.
Example
    use crowbook_text_processing::FrenchFormatter;
    let input = "Un texte à 'formater', n'est-ce pas ?";
    let output = FrenchFormatter::new()
        .typographic_ellipsis(false) // don't replace ellipsis
        .format_tex(input); // format to tex (so non-breaking
                            // spaces are visible in assert_eq!)
    assert_eq!(&output, "Un texte à ‘formater’, n’est-ce pas\\,?");
Methods
impl FrenchFormatter
pub fn new() -> Self
Create a new FrenchFormatter with default settings.
pub fn threshold_currency(&mut self, t: usize) -> &mut Self
Sets the threshold for currency.
After that number of characters, assume it's not a currency.
Default is 3.
pub fn threshold_unit(&mut self, t: usize) -> &mut Self
Sets the threshold for unit.
After that number of characters, assume it's not a unit.
Default is 2.
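The currency and unit thresholds both describe a length-based guess. The sketch below shows one way such a check could look. It is an assumption, not the crate's code: the name `is_short_symbol` is hypothetical, and treating a token whose length equals the threshold as still matching is a guess at what "after that number of characters" means.

```rust
/// Hypothetical helper: guesses whether the token following a number is a
/// currency or unit symbol, based only on its length in characters.
/// Assumption: a token of exactly `threshold` characters still counts.
fn is_short_symbol(token: &str, threshold: usize) -> bool {
    // Count Unicode scalar values, not bytes, so that "€" counts as 1.
    let n = token.chars().count();
    n > 0 && n <= threshold
}
```

With the default thresholds, a check of this kind would accept "€" or "EUR" as a currency (threshold 3) and "km" as a unit (threshold 2), but reject "euros" or "kilomètres".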
pub fn threshold_quote(&mut self, t: usize) -> &mut Self
Sets the threshold for quote.
After that number of characters, assume it's not a quote of a single word or a few words, but a dialog.
Default is 20.
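To make the quote threshold concrete, here is a small sketch of the decision it controls. It rests on assumptions: the name `quote_space` is hypothetical, short quotations are assumed to get the narrow non-breaking space and dialog the regular one, and the comparison is assumed to be strict.

```rust
/// Hypothetical helper: picks the space to put after « and before »,
/// depending on how long the quoted text is.
/// Assumption: longer than `threshold` characters means dialog.
fn quote_space(quoted: &str, threshold: usize) -> char {
    if quoted.chars().count() > threshold {
        '\u{00A0}' // no-break space, assumed for dialog
    } else {
        '\u{202F}' // narrow no-break space, assumed for a few quoted words
    }
}
```

This choice agrees with the format_tex example in this page, where the 21-character text "Est-ce bien formaté ?" is surrounded by "~" (a regular non-breaking space in TeX) under the default threshold of 20.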
pub fn threshold_real_word(&mut self, t: usize) -> &mut Self
Sets the threshold for real word.
After that number of characters, assume it's not an abbreviation
but a real word (used to determine if . marks the end of a sentence
or just follows a title, as in M. Dupuis).
Default is 3.
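The following sketch shows the kind of length check this threshold suggests. It is only an illustration: `is_probably_real_word` is a hypothetical name, and the strict comparison is an assumption about how the threshold is applied.

```rust
/// Hypothetical helper: guesses whether the word before a `.` is a real
/// word (so the dot ends a sentence) or an abbreviation such as "M".
/// Assumption: strictly more than `threshold` characters means real word.
fn is_probably_real_word(word_before_dot: &str, threshold: usize) -> bool {
    word_before_dot.chars().count() > threshold
}
```

Under this reading and the default of 3, "M" and "etc" are treated as abbreviations, while "Dupuis" is treated as a real word.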
pub fn typographic_quotes(&mut self, b: bool) -> &mut Self
Enables the typographic quotes replacement.
If true, "L'" will be replaced by "L’".
Default is true.
pub fn typographic_ellipsis(&mut self, b: bool) -> &mut Self
Enables typographic ellipsis replacement.
If true, "..." will be replaced by "…".
Default is true.
pub fn ligature_dashes(&mut self, b: bool) -> &mut Self
If set to true, replaces -- with – and --- with —.
Default is false.
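A minimal sketch of this replacement, written with the standard library only, is given below. The function name `replace_dash_ligatures` is hypothetical and this is not the crate's implementation. The point it shows is the ordering: the three-dash sequence has to be handled before the two-dash one.

```rust
/// Hypothetical helper: turns `---` into an em dash and `--` into an en dash.
fn replace_dash_ligatures(input: &str) -> String {
    // Replace the longer sequence first; otherwise "---" would become "–-".
    input.replace("---", "—").replace("--", "–")
}
```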
pub fn ligature_guillemets(&mut self, b: bool) -> &mut Self
If set to true, replaces << with « and >> with ».
Default is false.
pub fn format<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>
(Try to) Format a string according to French typographic rules.
This method should be called for each paragraph, because it assumes that the beginning of the string is also the beginning of a line.
This method calls remove_whitespaces internally, as it relies on it.
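Since the method expects one paragraph at a time, a caller holding a longer text needs to split it first. The sketch below shows one way to do that. It is an assumption-laden illustration: `format_paragraphs` is a hypothetical helper, paragraphs are assumed to be separated by blank lines, and the formatter is passed in as a closure so the block does not depend on the crate.

```rust
/// Hypothetical helper: splits `text` on blank lines and applies
/// `format_one` to each non-empty paragraph separately.
fn format_paragraphs<F>(text: &str, format_one: F) -> Vec<String>
where
    F: Fn(&str) -> String,
{
    text.split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(|p| format_one(p))
        .collect()
}
```

Given the signature above, a closure such as `|p| f.format(p).into_owned()` (with `f` a FrenchFormatter) would be the natural thing to pass as `format_one`.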
Example
    use crowbook_text_processing::FrenchFormatter;
    let f = FrenchFormatter::new();
    let s = f.format("« Est-ce bien formaté ? » se demandait-elle — les espaces \
                      insécables étaient tellement compliquées à gérer, dans cette langue !");
    println!("{}", s);
pub fn format_tex<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>
(Try to) Format a string according to French typographic rules, escape the characters that need to be escaped in LaTeX (e.g. backslashes), and use TeX commands ("~", "\enspace" and "\,") for non-breaking spaces, so that it works correctly with some LaTeX versions (and it makes the non-breaking spaces shenanigans more visible with most editors).
Example
    use crowbook_text_processing::FrenchFormatter;
    let f = FrenchFormatter::new();
    let s = f.format_tex("« Est-ce bien formaté ? »");
    assert_eq!(&s, "«~Est-ce bien formaté\\,?~»");
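The substitution of TeX commands for Unicode spaces can be sketched as a simple character mapping. This is an illustration under assumptions, not the crate's code: `nb_spaces_to_tex` is a hypothetical name, the mapping of the en space (U+2002) to "\enspace" is a guess, and escaping of LaTeX special characters is deliberately left out.

```rust
/// Hypothetical helper: replaces Unicode non-breaking spaces with TeX
/// commands. It does not escape LaTeX special characters.
fn nb_spaces_to_tex(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '\u{00A0}' => out.push('~'),               // no-break space
            '\u{202F}' => out.push_str("\\,"),         // narrow no-break space
            '\u{2002}' => out.push_str("\\enspace "),  // en space (assumed)
            _ => out.push(c),
        }
    }
    out
}
```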
pub fn format_html<'a, S: Into<Cow<'a, str>>>(&self, input: S) -> Cow<'a, str>
(Try to) Format a string according to French typographic rules, and escape the characters
that need to be escaped in HTML (e.g. &). Also use HTML commands instead
of Unicode for narrow non-breaking spaces. See escape::nb_spaces_html. It's a bit of a hack
to make it work in most browsers/ereaders.
Trait Implementations
impl Debug for FrenchFormatter
impl Default for FrenchFormatter
Auto Trait Implementations
impl RefUnwindSafe for FrenchFormatter
impl Send for FrenchFormatter
impl Sync for FrenchFormatter
impl Unpin for FrenchFormatter
impl UnwindSafe for FrenchFormatter
Blanket Implementations
impl<T> Any for T where
T: 'static + ?Sized,
impl<T> Borrow<T> for T where
T: ?Sized,
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> From<T> for T
impl<T, U> Into<U> for T where
U: From<T>,
impl<T, U> TryFrom<U> for T where
U: Into<T>,
type Error = Infallible
The type returned in the event of a conversion error.
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto<U> for T where
U: TryFrom<T>,