markdown-ai-cite-remove 0.3.0

High-performance removal of AI-generated citations and annotations from Markdown text
Documentation
# Frequently Asked Questions (FAQ)

## Installation & Setup

### Q: What version of Rust do I need?

**A:** Rust 1.70 or later. Check your version with:
```bash
rustc --version
```

If you need to update:
```bash
rustup update stable
```

### Q: How do I install the CLI tool?

**A:** Two options:

```bash
# From crates.io
cargo install markdown-ai-cite-remove

# From source
git clone https://github.com/opensite-ai/markdown-ai-cite-remove.git
cd markdown-ai-cite-remove
cargo install --path .
```

### Q: The `mdcr` command isn't found after installation

**A:** Add Cargo's bin directory to your PATH:

```bash
# Add to ~/.bashrc, ~/.zshrc, or equivalent
export PATH="$HOME/.cargo/bin:$PATH"

# Reload your shell
source ~/.bashrc  # or source ~/.zshrc
```

### Q: Do I need Gnuplot installed?

**A:** No, it's optional. Gnuplot is only used for benchmark visualization. Benchmarks work fine without it, but you'll see a harmless warning message. To install:

```bash
# macOS
brew install gnuplot

# Ubuntu/Debian
sudo apt-get install gnuplot
```

## Testing & Benchmarking

### Q: Why do tests show as "ignored" when I run `cargo bench`?

**A:** This is **completely normal** Rust behavior. When running benchmarks, the test harness automatically skips regular tests to avoid interfering with timing measurements. All tests still pass when you run `cargo test`.

To verify:
```bash
cargo test  # All 58 tests should pass
```

### Q: What are "outliers" in benchmark results?

**A:** Outliers are measurements that deviate from the typical execution time. They're caused by:
- Operating system scheduling
- CPU frequency scaling
- Background processes
- Cache effects

**An outlier rate of 3-13% is completely normal** and doesn't indicate a problem. Criterion detects outliers and reports them as mild or severe, but it keeps them in the data set rather than dropping them from the statistics.
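
Criterion's documentation describes its outlier classification as a variant of Tukey's fences. The sketch below shows the basic idea only and is not Criterion's code: the function names and the interpolated quartile calculation are illustrative assumptions.

```rust
/// Linearly interpolated percentile of an already sorted, non-empty slice
/// (`p` in 0.0..=1.0). Illustrative helper, not part of Criterion.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Returns `(mild, severe)` outlier counts using Tukey's fences:
/// mild lies beyond 1.5 * IQR from the quartiles, severe beyond 3 * IQR.
fn count_outliers(samples: &[f64]) -> (usize, usize) {
    if samples.is_empty() {
        return (0, 0);
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));

    let q1 = percentile(&sorted, 0.25);
    let q3 = percentile(&sorted, 0.75);
    let iqr = q3 - q1;

    let (mut mild, mut severe) = (0, 0);
    for &x in &sorted {
        if x < q1 - 3.0 * iqr || x > q3 + 3.0 * iqr {
            severe += 1;
        } else if x < q1 - 1.5 * iqr || x > q3 + 1.5 * iqr {
            mild += 1;
        }
    }
    (mild, severe)
}
```

A handful of slow samples caused by scheduling or frequency scaling lands outside these fences, which is why a small outlier percentage shows up in almost every run.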

### Q: How do I know if my changes improved performance?

**A:** Use baseline comparison:

```bash
# Save current performance
cargo bench -- --save-baseline before

# Make your changes...

# Compare
cargo bench -- --baseline before
```

Look for the "change" line in output. Changes with p < 0.05 are statistically significant.

### Q: Can I run benchmarks faster?

**A:** Yes, use quick mode:

```bash
cargo bench -- --quick
```

This reduces sample size for faster results (less accurate but good for development).

### Q: How do I view benchmark visualizations?

**A:** After running `cargo bench`, Criterion generates HTML reports with charts:

```bash
# View main report
open target/criterion/report/index.html        # macOS
xdg-open target/criterion/report/index.html    # Linux
start target/criterion/report/index.html       # Windows
```

The reports include:
- Line charts showing performance over time
- Violin plots showing distribution
- Statistical analysis (PDF/CDF plots)
- Comparison charts (if using baselines)

**Note**: Gnuplot is optional. Without it, Criterion falls back to its plotters backend and still generates the reports. See installation instructions in the main README.

## Usage

### Q: How do I process a file with auto-generated output?

**A:** Just provide the input filename:

```bash
mdcr ai_response.md
# Creates: ai_response__cite_removed.md
```
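
If you call the library from your own tool and want the same naming convention, the rule can be sketched as below. This is inferred only from the `ai_response__cite_removed.md` example above; `default_output_path` is a hypothetical helper, not a function exported by the crate.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `<stem>__cite_removed.<ext>` next to the input file.
fn default_output_path(input: &Path) -> PathBuf {
    // Fall back to a fixed stem if the path has no usable file name.
    let stem = input
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("output");

    let name = match input.extension().and_then(|e| e.to_str()) {
        Some(ext) => format!("{stem}__cite_removed.{ext}"),
        None => format!("{stem}__cite_removed"),
    };

    // Keep the original directory, replace only the file name.
    input.with_file_name(name)
}
```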

### Q: How do I clean a file in-place?

**A:** Use a temporary file:

```bash
mdcr input.md -o temp.md && mv temp.md input.md
```

Or in a script:
```bash
#!/bin/bash
for file in *.md; do
  # Replace the original only if mdcr succeeded
  mdcr "$file" -o "$file.tmp" && mv "$file.tmp" "$file"
done
```
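
The same write-then-rename pattern can be sketched for library users. The `clean` closure stands in for whatever cleaning function you call, and `rewrite_in_place` is a hypothetical helper name, not part of the crate.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Rewrites `path` in place. The cleaned text is written to a sibling
/// `.tmp` file first, and the original is replaced only after that
/// write succeeds, so a failure leaves the original untouched.
fn rewrite_in_place(path: &Path, clean: impl Fn(&str) -> String) -> io::Result<()> {
    let original = fs::read_to_string(path)?;

    // Build "<path>.tmp" without assuming the path is valid UTF-8.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    fs::write(&tmp, clean(&original))?;
    fs::rename(&tmp, path)
}
```

Writing to a temporary file first means an interrupted run cannot leave a half-written document behind.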

### Q: Can I process multiple files at once?

**A:** Yes, several ways:

```bash
# Auto-generated outputs (easiest!)
for file in *.md; do
  mdcr "$file"
done

# Custom naming
for file in *.md; do
  mdcr "$file" -o "cleaned_${file}"
done

# Find
find . -name "*.md" -exec mdcr {} \;

# GNU parallel (if installed); passing the glob directly avoids parsing ls output
parallel mdcr {} ::: *.md
```

See [CLI_GUIDE.md](CLI_GUIDE.md) for more examples.

### Q: Does it work with stdin/stdout?

**A:** Yes! Perfect for piping:

```bash
# From stdin
echo "Text[1] here." | mdcr

# Pipe from file
cat document.md | mdcr

# Chain commands
cat document.md | mdcr | pandoc -f markdown -t html
```

### Q: How do I verify citations were removed?

**A:** Use verbose mode:

```bash
mdcr input.md --verbose
```

This shows input/output sizes. Or compare files:
```bash
diff input.md input__cite_removed.md
```

## Library Usage

### Q: How do I use this in my Rust project?

**A:** Add to `Cargo.toml`:

```toml
[dependencies]
markdown-ai-cite-remove = "0.3"
```

Then in your code:
```rust
use markdown_ai_cite_remove::remove_citations;

let result = remove_citations("Text[1] here.");
```

### Q: Can I customize what gets removed?

**A:** Yes, use custom configuration:

```rust
use markdown_ai_cite_remove::{CitationRemover, RemoverConfig};

// Remove only inline citations
let config = RemoverConfig::inline_only();
let remover = CitationRemover::with_config(config);
let result = remover.remove_citations("Text[1].\n\n[1]: https://example.com");

// Custom configuration
let config = RemoverConfig {
    remove_inline_citations: true,
    remove_reference_links: false,
    // ... other options
    ..Default::default()
};
```

### Q: Is it thread-safe?

**A:** Yes! The remover is stateless and can be safely shared across threads:

```rust
use std::sync::Arc;
use markdown_ai_cite_remove::CitationRemover;

let remover = Arc::new(CitationRemover::new());

// Use in multiple threads
let remover_clone = Arc::clone(&remover);
std::thread::spawn(move || {
    let result = remover_clone.remove_citations("Text[1]");
});
```

### Q: Can I reuse a remover instance?

**A:** Yes, and it's recommended for batch processing:

```rust
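use markdown_ai_cite_remove::CitationRemover;

// Build the remover once and reuse it for every document
let remover = CitationRemover::new();

for document in documents {
    let result = remover.remove_citations(&document);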
    // Process cleaned document...
}
```

## Performance

### Q: How fast is it?

**A:** Very fast! Typical performance:
- Simple documents: ~580 ns (sub-microsecond)
- Complex documents: ~2-20 μs
- Large documents (50+ KB): ~100-300 μs
- Throughput: 100-650 MB/s

### Q: Does it allocate memory?

**A:** Minimal allocations. The regex patterns are compiled once and reused. Each citation-removal call allocates only for the output string.

### Q: Can it handle large files?

**A:** Yes! Performance scales linearly with file size. A 50 KB document processes in ~250 μs.
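
As a quick sanity check on these figures, throughput is just bytes divided by elapsed time. The helper below is an illustrative sketch and assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes).

```rust
use std::time::Duration;

/// Throughput in decimal megabytes per second (1 MB = 1_000_000 bytes).
fn throughput_mb_per_s(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / elapsed.as_secs_f64() / 1_000_000.0
}
```

Under those assumptions, 50 KB in 250 μs works out to about 200 MB/s, which sits inside the 100-650 MB/s range quoted above.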

### Q: Is it faster than other solutions?

**A:** Yes, significantly. Rust's zero-cost abstractions and optimized regex engine make it much faster than Python/JavaScript alternatives.

## Features & Limitations

### Q: What citation formats are supported?

**A:** All common AI citation formats:
- Inline numeric: `[1][2][3]`
- Named: `[source:1][ref:2][cite:3][note:4]`
- Reference links: `[1]: https://...`
- Reference headers: `## References`, `# Citations`, etc.
- Bibliographic entries: `[1] Author (2024). Title...`
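
To make the first format concrete, here is a toy, standard-library-only sketch of what stripping inline numeric citations means. It is not the crate's implementation (which uses regex patterns and covers the other formats too); `strip_numeric_citations` is a hypothetical name used for illustration.

```rust
/// Toy illustration: removes every `[digits]` group and leaves all other
/// bracketed text (links, `[source:1]`, `arr[i]`) untouched.
fn strip_numeric_citations(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;

    while let Some(open) = rest.find('[') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];

        match after.find(']') {
            // Non-empty, digits only: treat it as a citation and skip it.
            Some(close)
                if close > 0 && after[..close].bytes().all(|b| b.is_ascii_digit()) =>
            {
                rest = &after[close + 1..];
            }
            // Anything else: keep the bracket and continue scanning after it.
            _ => {
                out.push('[');
                rest = after;
            }
        }
    }

    out.push_str(rest);
    out
}
```

Even this toy version shows why numeric brackets inside code, such as `a[0]`, are hard to tell apart from citations.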

### Q: Does it preserve markdown formatting?

**A:** Yes! It preserves:
- Bold, italic, code
- Links and images
- Lists and tables
- Headings
- Code blocks

### Q: Does it remove citations from code blocks?

**A:** Currently yes (known limitation in v0.1). This is acceptable for most use cases since citations in code blocks are rare. Future versions may add code block detection.
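
If you need fenced code left untouched today, one possible workaround is to run the cleaner only on the text between fences. The sketch below is an assumption-laden illustration, not a crate feature: `clean_outside_fences` is a hypothetical helper, the `clean` closure stands in for your cleaning call, and only backtick fences are recognized (no tilde fences or indented code).

```rust
/// Applies `clean` to prose segments only; lines inside backtick fences
/// (and the fence lines themselves) are copied through unchanged.
fn clean_outside_fences(markdown: &str, clean: impl Fn(&str) -> String) -> String {
    // Three backticks, built at runtime so this example can sit inside a fence.
    let fence = "`".repeat(3);

    let mut out = String::with_capacity(markdown.len());
    let mut prose = String::new();
    let mut in_fence = false;

    for line in markdown.split_inclusive('\n') {
        let is_fence = line.trim_start().starts_with(fence.as_str());

        if is_fence && !in_fence && !prose.is_empty() {
            // Entering a fence: flush the accumulated prose through the cleaner.
            out.push_str(&clean(&prose));
            prose.clear();
        }

        if in_fence || is_fence {
            out.push_str(line);
        } else {
            prose.push_str(line);
        }

        if is_fence {
            in_fence = !in_fence;
        }
    }

    // Flush any prose after the last fence.
    if !prose.is_empty() {
        out.push_str(&clean(&prose));
    }
    out
}
```

Because each prose segment is cleaned separately, a cleaner that trims or normalizes surrounding whitespace may change the spacing around fences, so check the output on a sample document first.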

### Q: What about citations in inline code?

**A:** They're removed. If you need to preserve `[1]` in inline code, this is a limitation of v0.1.

### Q: Can I add custom citation patterns?

**A:** Not in v0.1, but this is planned for future releases. Current patterns cover all major AI tools (ChatGPT, Claude, Perplexity, Gemini).

## Troubleshooting

### Q: I'm getting compilation errors

**A:** Check your Rust version:
```bash
rustc --version  # Should be 1.70+
rustup update stable
```

### Q: Tests are failing

**A:** Make sure you're running the full test suite:
```bash
cargo test --all-features
```

If tests still fail, please open an issue with the error output.

### Q: Benchmarks show inconsistent results

**A:** This is normal. For more consistent results:
1. Close unnecessary applications
2. Ensure laptop is plugged in
3. Let system cool down between runs
4. Run multiple times and compare

### Q: The output looks wrong

**A:** Please open an issue with:
1. Input markdown
2. Expected output
3. Actual output
4. Command or code used

## Contributing

### Q: How can I contribute?

**A:** Contributions welcome! You can:
- Report bugs
- Suggest features
- Submit pull requests
- Improve documentation
- Add test cases

See the main [README.md](../../README.md) for development setup.

### Q: What's the development workflow?

**A:**
```bash
# Clone and setup
git clone https://github.com/opensite-ai/markdown-ai-cite-remove.git
cd markdown-ai-cite-remove

# Make changes...

# Run tests
cargo test

# Run benchmarks
cargo bench

# Format code
cargo fmt

# Run linter
cargo clippy

# Build docs
cargo doc --open
```

### Q: How do I add a new test?

**A:** Add to `tests/integration_tests.rs`:

```rust
// Assumes `use markdown_ai_cite_remove::remove_citations;` at the top of the file.
#[test]
fn test_my_new_case() {
    let input = "Your test input[1]";
    let expected = "Your expected output";
    assert_eq!(remove_citations(input), expected);
}
```

Then run:
```bash
cargo test test_my_new_case
```

## Getting Help

### Q: Where can I get more help?

**A:**
- **Documentation**: [README.md](../../README.md), [CLI_GUIDE.md](CLI_GUIDE.md), [BENCHMARKING.md](../performance/BENCHMARKING.md)
- **API Docs**: Run `cargo doc --open`
- **Examples**: Check the `examples/` directory
- **Issues**: Open an issue on GitHub
- **Email**: contact@opensite.ai

### Q: How do I report a bug?

**A:** Open a GitHub issue with:
1. Description of the problem
2. Steps to reproduce
3. Expected vs actual behavior
4. Your environment (OS, Rust version)
5. Minimal code example

### Q: Is there a Discord/Slack?

**A:** Not yet, but we're considering it based on community interest. For now, use GitHub issues for questions and discussions.