sakurs-cli
Fast, parallel sentence boundary detection for the command line.
Installation
After installation, the sakurs command will be available in your PATH.
Quick Start
# Process text files
# Process multiple files with a glob pattern
# Process from stdin
# Output as JSON
Features
- Parallel Processing: Automatically spreads work across multiple CPU cores
- Multiple Output Formats: Plain text, JSON, or quiet mode (sentence count only)
- Language Support: Built-in configurations for English and Japanese
Usage Examples
Basic File Processing
# Process a single file
# Process with a specific language
Batch Processing
# Process all text files in a directory
# Recursive processing with complex patterns
Output Formats
# Default format (human-readable)
# JSON format for programmatic use
# Quiet mode (only sentence count)
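Because quiet mode reports only the sentence count, its output can be fed straight into shell arithmetic. The sketch below defines a hypothetical helper, `count_sentences_per_file`, which is not part of sakurs. It assumes that `sakurs process -i FILE -f quiet` prints a single integer and nothing else.

```shell
# Hypothetical helper (not part of sakurs): print "<count><TAB><file>" for each
# file, then a final "<total><TAB>total" line.
# Assumption: `sakurs process -i FILE -f quiet` prints one integer to stdout.
count_sentences_per_file() {
  local total=0 file n
  for file in "$@"; do
    # Stop with a non-zero status if any sakurs invocation fails.
    n=$(sakurs process -i "$file" -f quiet) || return 1
    printf '%s\t%s\n' "$n" "$file"
    total=$((total + n))
  done
  printf '%s\ttotal\n' "$total"
}
```

For example, `count_sentences_per_file notes.txt report.txt` would print one tab-separated line per file followed by a total. Invoking sakurs once per file keeps the per-file counts separate, at the cost of one process start per file.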
Performance Tuning
For large files, you can tune the number of threads and the chunk size:
# Use 8 threads with 1MB chunks
# Sequential processing (useful for debugging)
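When debugging, one useful check is whether sequential and parallel runs agree on the same input. The hypothetical helper below, `check_parallel_consistency`, runs the documented `--sequential` and `--parallel` modes on one file and compares the two outputs byte for byte with `cmp`. It assumes both modes are meant to produce identical output for the same options.

```shell
# Hypothetical helper (not part of sakurs): compare sequential and parallel
# output for one file.
# Returns 0 if identical, 1 on a mismatch, 2 if mktemp or sakurs fails.
check_parallel_consistency() {
  local file=$1 seq_out par_out status
  seq_out=$(mktemp) || return 2
  par_out=$(mktemp) || { rm -f "$seq_out"; return 2; }

  # Run both documented processing modes on the same input.
  if ! sakurs process -i "$file" --sequential > "$seq_out" ||
     ! sakurs process -i "$file" --parallel > "$par_out"; then
    rm -f "$seq_out" "$par_out"
    return 2
  fi

  # cmp -s is silent and exits 0 only when the files are byte-identical.
  if cmp -s "$seq_out" "$par_out"; then
    echo "consistent: $file"
    status=0
  else
    echo "MISMATCH: $file" >&2
    status=1
  fi
  rm -f "$seq_out" "$par_out"
  return "$status"
}
```

Writing to temporary files rather than using bash process substitution keeps the sketch usable from a plain POSIX-style `sh`.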
Command Reference
sakurs process [OPTIONS]
OPTIONS:
  -i, --input <INPUT>         Input file(s) or '-' for stdin
  -o, --output <OUTPUT>       Output file (default: stdout)
  -f, --format <FORMAT>       Output format [default: text]
                              [possible values: text, json, quiet]
  -l, --language <LANGUAGE>   Language for sentence detection [default: en]
                              [possible values: en, ja, english, japanese]
      --sequential            Force sequential processing
      --parallel              Force parallel processing (default: auto)
      --threads <N>           Number of threads (default: CPU count)
      --chunk-kb <SIZE>       Chunk size in KB [default: 256]
  -h, --help                  Print help
  -V, --version               Print version
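To reason about how `--chunk-kb` interacts with `--threads`, it can help to estimate how many chunks a file yields. The sketch below assumes the input is divided into pieces of roughly `chunk-kb × 1024` bytes; the real splitter may shift boundaries, so treat the result as an approximation. `estimate_chunks` is a hypothetical helper, not a sakurs command.

```shell
# Hypothetical helper (not part of sakurs): rough chunk count for a given
# input size in bytes and a --chunk-kb value.
# Assumption: chunks are about CHUNK_KB * 1024 bytes each.
estimate_chunks() {
  local bytes=$1 chunk_kb=${2:-256}   # 256 matches the documented default
  local chunk_bytes=$((chunk_kb * 1024))
  # Ceiling division: a partial final chunk still counts as one chunk.
  echo $(( (bytes + chunk_bytes - 1) / chunk_bytes ))
}
```

Usage: `estimate_chunks "$(wc -c < big.txt)" 1024`. Under this assumption, a 10 MiB file (10485760 bytes) gives 10 chunks at `--chunk-kb 1024` and 40 at the default 256, so if chunks are distributed across threads, `--threads 8` would have at least one chunk per thread in either case.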
Examples
Processing Japanese Text
Analyzing Code Documentation
# Extract sentences from all README files
Pipeline Integration
# Count sentences in git commit messages
# Extract sentences from specific files
License
MIT License. See LICENSE for details.