# BudouX Java Module
BudouX is a standalone, small, and language-neutral phrase segmenter tool that
provides beautiful and legible line breaks.
For more details about the project, please refer to the [project README](https://github.com/google/budoux/).
## Demo
<https://google.github.io/budoux>
## Usage
### Simple usage
You can get a list of phrases by feeding a sentence to the parser.
The easiest way is to get a parser is loading the default parser for each language.
```java
import com.google.budoux.Parser;
public class App
{
public static void main( String[] args )
{
Parser parser = Parser.loadDefaultJapaneseParser();
System.out.println(parser.parse("今日は良い天気ですね。"));
// [今日は, 良い, 天気ですね。]
}
}
```
#### Supported languages and their default parsers
- Japanese: `Parser.loadDefaultJapaneseParser()`
- Simplified Chinese: `Parser.loadDefaultSimplifiedChineseParser()`
- Traditional Chinese: `Parser.loadDefaultTraditionalChineseParser()`
- Thai: `Parser.loadDefaultThaiParser()`
### Working with HTML
If you want to use the result in a website, you can use the `translateHTMLString`
method to get an HTML string that wraps phrases with non-breaking markup,
speicifcally, zero-width space (U+200B).
```java
System.out.println(parser.translateHTMLString("今日は<strong>良い天気</strong>ですね。"));
//<span style="word-break: keep-all; overflow-wrap: anywhere;">今日は<strong>\u200b良い\u200b天気</strong>ですね。</span>
```
Please note that separators are denoted as `\u200b` in the example above for
illustrative purposes, but the actual output is an invisible string as it's a
zero-width space.
## Caveat
BudouX supports HTML inputs and outputs HTML strings with markup applied to wrap
phrases, but it's not meant to be used as an HTML sanitizer.
**BudouX doesn't sanitize any inputs.**
Malicious HTML inputs yield malicious HTML outputs.
Please use it with an appropriate sanitizer library if you don't trust the input.
## Author
[Shuhei Iitsuka](https://tushuhei.com)
## Disclaimer
This is not an officially supported Google product.