url-preview 0.6.0

High-performance URL preview generator for messaging and social media applications
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
# Security Features Documentation

## Overview

The `url-preview` library implements comprehensive security measures to protect against common web scraping vulnerabilities and ensure safe URL preview generation. This document details all security features available in version 0.5.0 and above.

## Table of Contents

1. [URL Validation]#url-validation
2. [Content Security]#content-security
3. [Network Security]#network-security
4. [Error Handling]#error-handling
5. [Usage Examples]#usage-examples
6. [Best Practices]#best-practices

## URL Validation

### Overview

The library provides robust URL validation through the `UrlValidator` class, which prevents access to potentially dangerous URLs and implements SSRF (Server-Side Request Forgery) protection.

### Features

#### 1. **Scheme Validation**
- Default allowed schemes: `http`, `https`
- Blocks dangerous schemes like `file://`, `ftp://`, `javascript:`, etc.
- Configurable through `UrlValidationConfig`

#### 2. **Private IP Blocking**
- Blocks access to private IP ranges (RFC 1918)
- Prevents SSRF attacks targeting internal networks
- Includes IPv4 ranges:
  - 10.0.0.0/8
  - 172.16.0.0/12
  - 192.168.0.0/16
  - 169.254.0.0/16 (link-local)
  - 100.64.0.0/10 (carrier-grade NAT)
- Includes IPv6 ranges:
  - fe80::/10 (link-local)
  - fc00::/7 (unique local)

#### 3. **Localhost Blocking**
- Blocks `localhost`, `127.0.0.1`, `::1`
- Prevents access to local services

#### 4. **Domain Filtering**
- **Blacklist**: Block specific domains or subdomains
- **Whitelist**: Only allow specific domains (when configured)
- Subdomain matching included

#### 5. **Redirect Control**
- Configurable maximum redirect count (default: 10)
- Prevents redirect loops and excessive redirections

### Configuration

```rust
use url_preview::{UrlValidationConfig, UrlValidator};
use std::collections::HashSet;

let mut config = UrlValidationConfig::default();

// Customize allowed schemes
config.allowed_schemes.insert("https".to_string());
config.allowed_schemes.remove("http"); // HTTPS only

// Add blocked domains
config.blocked_domains.insert("malicious.com".to_string());
config.blocked_domains.insert("phishing.site".to_string());

// Or use whitelist mode
config.allowed_domains.insert("trusted.com".to_string());
config.allowed_domains.insert("mycompany.com".to_string());

// Adjust security settings
config.block_private_ips = true;
config.block_localhost = true;
config.max_redirects = 5;

let validator = UrlValidator::new(config);
```

## Content Security

### Overview

The library implements content security measures to prevent resource exhaustion and ensure safe content processing.

### Features

#### 1. **Content Size Limits**
- Default maximum: 10MB
- Prevents memory exhaustion from large files
- Enforced both via Content-Length header and streaming

#### 2. **Download Time Limits**
- Default maximum: 30 seconds
- Prevents slow-loris style attacks
- Ensures timely response

#### 3. **Content Type Filtering**
- Default allowed types:
  - `text/html`
  - `application/xhtml+xml`
  - `text/plain`
  - `application/json`
- Prevents downloading binary files or unexpected content

### Configuration

```rust
use url_preview::{ContentLimits, FetcherConfig};
use std::collections::HashSet;

let mut content_limits = ContentLimits::default();

// Adjust size limit (5MB)
content_limits.max_content_size = 5 * 1024 * 1024;

// Adjust time limit (15 seconds)
content_limits.max_download_time = 15;

// Customize allowed content types
content_limits.allowed_content_types.clear();
content_limits.allowed_content_types.insert("text/html".to_string());
content_limits.allowed_content_types.insert("application/json".to_string());

let config = FetcherConfig {
    content_limits,
    ..Default::default()
};
```

## Network Security

### Built-in Features

#### 1. **Timeout Protection**
- Default request timeout: 10 seconds
- Twitter-specific timeout: 30 seconds
- Configurable per-request

#### 2. **Connection Pooling**
- Maximum 10 idle connections per host
- Prevents connection exhaustion

#### 3. **Retry Logic**
- Maximum 3 retries with exponential backoff
- Only retries on server errors (5xx) and timeouts
- Prevents amplification attacks

#### 4. **Concurrent Request Limiting**
- Default limit: 500 concurrent requests
- Prevents resource exhaustion
- Configurable via `PreviewServiceConfig`

### SSL/TLS Security

The library uses `reqwest`'s default TLS configuration, which includes:
- TLS 1.2+ support
- Certificate validation
- Hostname verification

## Error Handling

### Security-Specific Error Types

The library provides detailed error types for security-related issues:

```rust
pub enum PreviewError {
    // URL validation errors
    InvalidUrlScheme(String),
    InvalidUrl(String),
    DomainNotAllowed(String),
    DomainBlocked(String),
    LocalhostBlocked,
    PrivateIpBlocked(String),
    
    // Content security errors
    ContentSizeExceeded { size: usize, limit: usize },
    DownloadTimeExceeded { elapsed: u64, limit: u64 },
    ContentTypeNotAllowed(String),
    
    // Other security-relevant errors
    TimeoutError(String),
    DnsError(String),
    ConnectionError(String),
}
```

### Error Logging

With the `logging` feature enabled, all security events are logged appropriately:
- URL validation failures: `WARN` level
- Content security violations: `WARN` level
- Network errors: `ERROR` level

## Usage Examples

### Basic Usage with Default Security

```rust
use url_preview::{PreviewService, PreviewError};

#[tokio::main]
async fn main() -> Result<(), PreviewError> {
    // Default configuration includes all security features
    let service = PreviewService::new();
    
    // This will fail due to localhost blocking
    match service.generate_preview("http://localhost:8080").await {
        Err(PreviewError::LocalhostBlocked) => {
            println!("Localhost access blocked as expected");
        }
        _ => unreachable!(),
    }
    
    // This will fail due to private IP blocking
    match service.generate_preview("http://192.168.1.1").await {
        Err(PreviewError::PrivateIpBlocked(_)) => {
            println!("Private IP access blocked as expected");
        }
        _ => unreachable!(),
    }
    
    Ok(())
}
```

### Custom Security Configuration

```rust
use url_preview::{PreviewService, PreviewServiceConfig, FetcherConfig};
use url_preview::{UrlValidationConfig, ContentLimits};
use std::collections::HashSet;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create custom URL validation
    let mut url_validation = UrlValidationConfig::default();
    url_validation.allowed_domains.insert("github.com".to_string());
    url_validation.allowed_domains.insert("twitter.com".to_string());
    url_validation.max_redirects = 3;
    
    // Create custom content limits
    let mut content_limits = ContentLimits::default();
    content_limits.max_content_size = 5 * 1024 * 1024; // 5MB
    content_limits.max_download_time = 20; // 20 seconds
    
    // Configure fetcher
    let fetcher_config = FetcherConfig {
        url_validation,
        content_limits,
        ..Default::default()
    };
    
    // Create service with custom config
    let service_config = PreviewServiceConfig {
        fetcher_config: Some(fetcher_config),
        ..Default::default()
    };
    
    let service = PreviewService::with_config(service_config);
    
    // Only github.com and twitter.com URLs will work
    let preview = service.generate_preview("https://github.com/rust-lang/rust").await?;
    println!("Title: {:?}", preview.title);
    
    Ok(())
}
```

### Handling Security Errors

```rust
use url_preview::{PreviewService, PreviewError};

async fn safe_preview(url: &str) -> Result<String, String> {
    let service = PreviewService::new();
    
    match service.generate_preview(url).await {
        Ok(preview) => Ok(preview.title.unwrap_or_else(|| "No title".to_string())),
        Err(e) => match e {
            PreviewError::InvalidUrlScheme(scheme) => {
                Err(format!("Unsafe URL scheme: {}", scheme))
            }
            PreviewError::PrivateIpBlocked(ip) => {
                Err(format!("Access to private IP {} is blocked", ip))
            }
            PreviewError::LocalhostBlocked => {
                Err("Access to localhost is blocked".to_string())
            }
            PreviewError::DomainBlocked(domain) => {
                Err(format!("Domain {} is blocked", domain))
            }
            PreviewError::ContentSizeExceeded { size, limit } => {
                Err(format!("Content too large: {} bytes (limit: {})", size, limit))
            }
            PreviewError::ContentTypeNotAllowed(ct) => {
                Err(format!("Content type {} not allowed", ct))
            }
            _ => Err(format!("Preview failed: {}", e)),
        }
    }
}
```

## Best Practices

### 1. **Always Use HTTPS When Possible**

```rust
let mut config = UrlValidationConfig::default();
config.allowed_schemes.clear();
config.allowed_schemes.insert("https".to_string());
```

### 2. **Implement Domain Whitelisting for Production**

```rust
let mut config = UrlValidationConfig::default();
// Only allow specific trusted domains
config.allowed_domains.insert("trusted-news-site.com".to_string());
config.allowed_domains.insert("official-blog.com".to_string());
```

### 3. **Adjust Limits Based on Use Case**

```rust
// For previewing only lightweight pages
let mut limits = ContentLimits::default();
limits.max_content_size = 1 * 1024 * 1024; // 1MB
limits.max_download_time = 10; // 10 seconds

// For specific content types only
limits.allowed_content_types.clear();
limits.allowed_content_types.insert("text/html".to_string());
```

### 4. **Monitor Security Events**

```rust
use url_preview::setup_logging;

// Enable logging to monitor security events
setup_logging(None);

// Security events will be logged at appropriate levels
```

### 5. **Regular Security Updates**

- Keep the library updated to get latest security patches
- Review and update domain lists periodically
- Monitor for new attack vectors and adjust configuration

## Security Considerations

### What This Library Protects Against

1. **SSRF (Server-Side Request Forgery)**
   - Private IP filtering
   - Localhost blocking
   - Scheme validation

2. **Resource Exhaustion**
   - Content size limits
   - Download time limits
   - Connection pooling
   - Concurrent request limiting

3. **Malicious Redirects**
   - Redirect count limiting
   - Each redirect is validated

4. **Unexpected Content**
   - Content type filtering
   - UTF-8 validation

### What This Library Does NOT Protect Against

1. **JavaScript Execution**
   - The library only parses HTML, it doesn't execute JavaScript
   - Dynamic content loaded by JavaScript won't be captured

2. **Advanced SSRF Techniques**
   - DNS rebinding attacks
   - Time-of-check to time-of-use (TOCTOU) attacks
   - IPv6 address confusion

3. **Content Validation**
   - The library doesn't sanitize extracted content
   - XSS prevention is the responsibility of the consuming application

4. **Rate Limiting Per Domain**
   - The library limits total concurrent requests but not per-domain
   - Implement additional rate limiting if needed

## Conclusion

The `url-preview` library provides comprehensive security features suitable for most URL preview generation use cases. By following the configuration guidelines and best practices outlined in this document, you can ensure safe and secure preview generation in your applications.

For additional security requirements or custom implementations, consider extending the validation logic or implementing additional layers of security in your application.