pdf_oxide 0.3.24

The fastest Rust PDF library with text extraction: 0.8ms mean, 100% pass rate on 3,830 PDFs. 5× faster than pdf_extract, 17× faster than oxidize_pdf. Extract, create, and edit PDFs.
# C# Extension Methods Guide

PdfOxide provides 100+ extension methods for convenient PDF operations. All extension methods are in the `PdfOxide.Extensions` namespace.

## Document Extension Methods (PdfDocumentExtensions)

Shortcuts to create managers from documents:

```csharp
using var doc = PdfDocument.Open("document.pdf");

var pages    = doc.Pages();     // PageManager
var search   = doc.Search();    // SearchManager
var extract  = doc.Extract();   // ExtractionManager
var forms    = doc.Forms();     // FormManager
var security = doc.Security();  // SecurityManager
var outlines = doc.Outlines();  // OutlineManager
var layers   = doc.Layers();    // LayerManager
var metadata = doc.Metadata();  // MetadataManager
```

### Text Operations

```csharp
// Extract all text as single string
string allText = doc.ExtractAllText();

// Extract text from specific page
string pageText = doc.ExtractPageText(0);

// Search for text
var results = doc.SearchDocument("keyword");

// Check if text exists
bool found = doc.ContainsText("specific text");
```

### Content Operations

```csharp
// Extract all images
var images = doc.ExtractAllImages();

// Get form field names
var fields = doc.GetFormFieldNames();

// Check for forms
bool hasForms = doc.HasForms();

// Get bookmarks
var bookmarks = doc.GetOutlines();

// Check for bookmarks
bool hasOutlines = doc.HasOutlines();

// Get layer names
var layers = doc.GetLayerNames();

// Check for layers
bool hasLayers = doc.HasLayers();
```

### Metadata Operations

```csharp
// Get document title
string title = doc.GetTitle();

// Get document author
string author = doc.GetAuthor();

// Get metadata manager for more details
var metadata = doc.Metadata();
```

### Security Operations

```csharp
// Check encryption status
bool isEncrypted = doc.IsEncrypted();

// Check permissions
bool canPrint = doc.CanPrint();
bool canCopy = doc.CanCopy();
bool canModify = doc.CanModify();
bool canFillForms = doc.CanFillForms();
bool canAnnotate = doc.CanAnnotate();

// Check view-only status
bool isViewOnly = doc.IsViewOnly();

// Get security manager for detailed info
var security = doc.Security();
```

### Page Information

```csharp
// Check if empty
bool isEmpty = doc.IsEmpty();

// Check for multiple pages
bool hasMultiple = doc.HasMultiplePages();

// Get page information
var allPages = doc.GetPageInfo();

// Get first page index
int first = doc.GetFirstPageIndex();

// Get last page index
int last = doc.GetLastPageIndex();

// Get middle page index
int middle = doc.GetMiddlePageIndex();

// Validate page index
bool isValid = doc.IsValidPage(5);

// Access PageManager
var pages = doc.Pages();
```

## Page Extension Methods (PdfPageExtensions)

### Manager Shortcuts

```csharp
var page = doc.GetPage(0);

var annotations = page.Annotations();  // AnnotationManager for this page
var content = page.Content();          // ContentManager for this page
```

### Content Information

```csharp
// Get dimension summary string
string dims = page.GetDimensions();

// Check if page has content
bool hasContent = page.HasContent();

// Check if page is blank
bool isBlank = page.IsBlank();

// Get content types
var types = page.GetContentTypes();  // Returns StringCollection

// Get content summary
string summary = page.GetContentSummary();

// Get complexity score (0-100)
int complexity = page.GetComplexity();
```

### Content Detection

```csharp
// Check for forms
bool likelyHasForms = page.LikelyHasForms();

// Check for tables
bool likelyHasTables = page.LikelyHasTables();

// Check for images
bool likelyHasImages = page.LikelyHasImages();
```

### Annotation Operations

```csharp
// Get annotation count
int count = page.GetAnnotationCount();

// Check if has annotations
bool hasAnnotations = page.HasAnnotations();

// Get annotation types
var types = page.GetAnnotationTypes();  // Returns StringCollection
```

### Dimension Methods

```csharp
// Get width in points
float width = page.GetWidth();

// Get height in points
float height = page.GetHeight();

// Check orientation
bool isPortrait = page.IsPortrait();
bool isLandscape = page.IsLandscape();

// Get aspect ratio
float ratio = page.GetAspectRatio();

// Get area
float area = page.GetArea();

// Get standard size name
string size = page.GetStandardSize();  // "Letter", "A4", etc.

// Compare with another page (otherPage is any other page, e.g. doc.GetPage(1))
bool sameSize = page.SameSizeAs(otherPage);
```
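
To make the idea behind `GetStandardSize()` concrete, here is a minimal sketch of how a page's width and height in points might be matched against a table of well-known sizes. It is not PdfOxide's implementation: the `PageSizeSketch` class, the `Classify` method, the one-point tolerance, and the `"Custom"` fallback are all assumptions made for illustration. The dimensions themselves are the standard values at 72 points per inch.

```csharp
using System;

// Hypothetical sketch; PdfOxide's own size table and tolerance may differ.
public static class PageSizeSketch
{
    // Common page sizes in PDF points (1 pt = 1/72 inch), portrait orientation.
    private static readonly (string Name, float Width, float Height)[] Known =
    {
        ("Letter", 612f, 792f),
        ("Legal", 612f, 1008f),
        ("A4", 595.28f, 841.89f),
        ("A3", 841.89f, 1190.55f),
    };

    public static string Classify(float width, float height, float tolerance = 1f)
    {
        // Normalize to portrait so landscape pages match the same table.
        float shortSide = Math.Min(width, height);
        float longSide = Math.Max(width, height);

        foreach (var (name, w, h) in Known)
        {
            if (Math.Abs(shortSide - w) <= tolerance &&
                Math.Abs(longSide - h) <= tolerance)
            {
                return name;
            }
        }

        // Assumed fallback label for sizes outside the table.
        return "Custom";
    }
}
```

Comparing with a tolerance rather than exact equality matters because many producers round page dimensions, so an A4 page is often stored as 595 x 842 points.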

## LINQ Collection Extension Methods (PdfLinqExtensions)

### String Extensions

```csharp
IEnumerable<string> strings = new[] { "Hello", "Help", "Goodbye" };

// Case-insensitive substring search
var results = strings.ContainsSubstring("hel");

// Filter non-empty/whitespace strings
var nonEmpty = strings.WhereNotEmpty();
```
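
For readers who want to see what helpers like these look like under the hood, the following is a minimal sketch of two equivalent extension methods built on standard LINQ. The names `StringSequenceSketch`, `ContainsSubstringSketch`, and `WhereNotEmptySketch` are hypothetical stand-ins, and the choice of ordinal case-insensitive comparison is an assumption; PdfOxide's own methods may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; not PdfOxide's actual source.
public static class StringSequenceSketch
{
    // Keeps the strings that contain `value`, ignoring case.
    public static IEnumerable<string> ContainsSubstringSketch(
        this IEnumerable<string> source, string value) =>
        source.Where(s => s != null &&
                          s.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0);

    // Drops null, empty, and whitespace-only strings.
    public static IEnumerable<string> WhereNotEmptySketch(
        this IEnumerable<string> source) =>
        source.Where(s => !string.IsNullOrWhiteSpace(s));
}
```

Applied to the sample list above, `ContainsSubstringSketch("hel")` yields "Hello" and "Help". Both methods are lazy, like any `Where`-based filter, so nothing is evaluated until the result is enumerated.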

### Search Result Extensions

```csharp
IEnumerable<SearchResult> results = searchCollection;

// Filter by page
var page5 = results.OnPage(5);

// Filter by page range
var pages5to10 = results.OnPageRange(5, 10);

// Get unique pages
var uniquePages = results.GetUniquePages();

// Sort by page and position
var sorted = results.ByPageAndPosition();

// Group by page
var grouped = results.GroupByPage();
```
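
The page filters and grouping above map directly onto standard LINQ operators. The sketch below shows one way they could be written against a stand-in type. `Hit`, `HitSketch`, and the `...Sketch` method names are hypothetical, and treating the page range as inclusive at both ends is an assumption, not documented PdfOxide behavior.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for a search hit; PdfOxide's SearchResult has its own shape.
public sealed class Hit
{
    public Hit(int pageIndex, string text)
    {
        PageIndex = pageIndex;
        Text = text;
    }

    public int PageIndex { get; }
    public string Text { get; }
}

public static class HitSketch
{
    // Keeps hits whose page lies in [first, last] (inclusive range is an assumption).
    public static IEnumerable<Hit> OnPageRangeSketch(
        this IEnumerable<Hit> hits, int first, int last) =>
        hits.Where(h => h.PageIndex >= first && h.PageIndex <= last);

    // Distinct page indices in ascending order.
    public static IEnumerable<int> UniquePagesSketch(this IEnumerable<Hit> hits) =>
        hits.Select(h => h.PageIndex).Distinct().OrderBy(p => p);

    // One group per page, keyed by page index.
    public static IEnumerable<IGrouping<int, Hit>> GroupByPageSketch(
        this IEnumerable<Hit> hits) =>
        hits.GroupBy(h => h.PageIndex);
}
```

Note that `GroupBy` returns groups in the order their keys first appear, so sort the input (or the groups) first if page order matters.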

### Image Extensions

```csharp
IEnumerable<ExtractedImage> images = imageCollection;

// Filter by format
var jpegs = images.WithFormat("JPEG");

// Filter by page
var page3Images = images.FromPage(3);

// Filter by size
var large = images.MinimumSize(1000, 1000);
var small = images.MaximumSize(500, 500);

// Filter by DPI
var highRes = images.MinimumDpi(300);
var lowRes = images.MaximumDpi(150);

// Sort by size (largest first)
var bySize = images.BySize();

// Group by page
var byPage = images.GroupByPage();

// Group by format
var byFormat = images.GroupByFormat();

// Calculate total size
long bytes = images.TotalSizeBytes();
double mb = images.TotalSizeMb();
```

### Page Extensions

```csharp
IEnumerable<PageInfo> pages = pageCollection;

// Get page range
var range = pages.InRange(5, 10);

// Get every nth page
var everyOther = pages.EveryNthPage(2);
```

### Layer Extensions

```csharp
IEnumerable<LayerVisibility> layers = layerCollection;

// Filter visible
var visible = layers.WhereVisible();

// Filter hidden
var hidden = layers.WhereHidden();

// Filter by name
var matching = layers.WithNameContaining("layer");

// Sort
var sorted = layers.OrderByName();
var sortedByVis = layers.OrderByVisibility();
```

### Outline Extensions

```csharp
IEnumerable<OutlineItem> outlines = outlineCollection;

// Filter by page
var page10 = outlines.TargetingPage(10);

// Filter by page range
var pages5to15 = outlines.TargetingPageRange(5, 15);

// Filter with children
var parents = outlines.WhereHasChildren();

// Filter leaf nodes
var leaves = outlines.WhereLeaf();

// Filter by expansion state
var expanded = outlines.WhereExpanded();
var collapsed = outlines.WhereCollapsed();

// Sort
var byPage = outlines.OrderByPage();
var byTitle = outlines.OrderByTitle();

// Group
var grouped = outlines.GroupByPage();
```

## Fluent Query Extensions (FluentQueryExtensions)

### Custom Filtering

```csharp
var images = doc.ExtractAllImages();

// Custom where clause
var filtered = images.Where(img => img.Dpi > 200 && img.Width > 500);

// Projection (Select)
var sizes = images.Select(img => new { img.Width, img.Height });
```

### Aggregates

```csharp
var results = doc.Search().SearchAll("text");

// Check if any match condition
bool hasLarge = results.Any(r => r.Text.Length > 100);

// Count matching
int largeCount = results.Count(r => r.Text.Length > 100);
```

### Pagination

```csharp
var images = doc.ExtractAllImages();

// Skip elements
var skipped = images.Skip(5);

// Take elements
var first10 = images.Take(10);

// Combined
var page2 = images.Skip(10).Take(10);
```
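
The combined `Skip(...).Take(...)` pattern generalizes to any page number. As a minimal sketch, it could be wrapped in a small helper like the one below. `PagingSketch` and `Paginate` are hypothetical names that are not part of PdfOxide, and the 1-based page numbering is an illustrative choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of PdfOxide.
public static class PagingSketch
{
    // Returns the items on the given 1-based page of results.
    public static IEnumerable<T> Paginate<T>(
        this IEnumerable<T> source, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        // Skip the items on earlier pages, then take one page's worth.
        return source.Skip((pageNumber - 1) * pageSize).Take(pageSize);
    }
}
```

With this helper, `images.Paginate(2, 10)` would be equivalent to `images.Skip(10).Take(10)`. A page past the end of the sequence simply comes back empty or short rather than throwing.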

### Statistics

```csharp
var images = doc.ExtractAllImages()
    .WithFormat("PNG")
    .MinimumDpi(150);

var stats = images.GetStatistics();
Console.WriteLine($"Count: {stats.TotalCount}");
Console.WriteLine($"Size: {stats.TotalSizeMb:F2}MB");
Console.WriteLine($"Avg DPI: {stats.AverageDpi:F0}");
Console.WriteLine($"Formats: {stats.UniqueFormats}");
Console.WriteLine($"Pages: {stats.UniquePages}");

// Search result statistics
var searchStats = doc.Search()
    .SearchAll("important")
    .GetStatistics();
Console.WriteLine($"Results: {searchStats.TotalResults}");
Console.WriteLine($"Pages: {searchStats.UniquePages}");
```

## Real-World Examples

### Analyze Document Content

```csharp
using (var doc = PdfDocument.Open("report.pdf"))
{
    // Overview
    Console.WriteLine($"Pages: {doc.PageCount}");
    Console.WriteLine($"Title: {doc.GetTitle()}");
    Console.WriteLine($"Author: {doc.GetAuthor()}");
    Console.WriteLine($"Encrypted: {doc.IsEncrypted()}");
    Console.WriteLine($"Can Print: {doc.CanPrint()}");

    // Find specific content
    bool hasKeyword = doc.ContainsText("critical");
    Console.WriteLine($"Mentions 'critical': {hasKeyword}");
    var imageStats = doc.ExtractAllImages()
        .GetStatistics();
    Console.WriteLine($"Images: {imageStats.TotalCount}");

    // Analyze pages
    var pages = doc.GetPageInfo();
    var firstPage = pages.FirstPage;
    var lastPage = pages.LastPage;
    Console.WriteLine($"Range: {firstPage.Number}-{lastPage.Number}");
}
```

### Extract and Process Images

```csharp
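var images = doc.ExtractAllImages()
    .WithFormat("JPEG")
    .MinimumDpi(300)
    .BySize();  // largest first

// Save the five largest images. The counter keeps file names unique
// when several images come from the same page.
int n = 0;
foreach (var image in images.Take(5))
{
    Console.WriteLine($"Page {image.PageIndex}: {image.Width}x{image.Height}@{image.Dpi}DPI");
    File.WriteAllBytes($"image_{image.PageIndex}_{n++}.jpg", image.Data);
}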
```

### Search and Analyze

```csharp
var stats = doc.SearchDocument("important")
    .Where(r => r.Text.Length > 10)
    .GetStatistics();

Console.WriteLine($"Found {stats.TotalResults} results on {stats.UniquePages} pages");

var byPage = doc.SearchDocument("term")
    .GroupByPage();
foreach (var pageGroup in byPage)
{
    Console.WriteLine($"Page {pageGroup.Key}: {pageGroup.Count()} occurrences");
}
```

### Analyze Structure

```csharp
if (doc.HasOutlines())
{
    var outline = doc.GetOutlines()
        .Flattened()
        .Where(o => o.TargetPageIndex >= 0);  // zero-based, so the first page is index 0

    foreach (var item in outline)
    {
        Console.WriteLine($"{item.Title} → Page {item.TargetPageIndex + 1}");
    }
}

if (doc.HasLayers())
{
    var layers = doc.Layers()
        .GetAllLayerVisibility()
        .WhereVisible()
        .OrderByName();

    foreach (var layer in layers)
    {
        Console.WriteLine($"✓ {layer.Name}");
    }
}
```

## See Also

- [LINQ Support Guide](LINQ_SUPPORT.md)
- [Collections Reference](COLLECTIONS_REFERENCE.md)
- [API Reference](API_REFERENCE.md)
- [Migration Guide](MIGRATION_GUIDE.md)