// SPDX-License-Identifier: Apache-2.0
// Copyright © 2025 Au-Zone Technologies. All Rights Reserved.
//! Retry policies with URL-based classification for EdgeFirst Studio Client.
//!
//! # Overview
//!
//! This module implements intelligent retry logic that classifies requests into
//! two categories:
//!
//! - **StudioApi**: EdgeFirst Studio JSON-RPC API calls
//! (`*.edgefirst.studio/api`)
//! - **FileIO**: File upload/download operations (AWS S3 pre-signed URLs,
//! CloudFront, etc.)
//!
//! # Motivation
//!
//! Different types of operations have different failure characteristics and
//! retry requirements:
//!
//! ## Studio API Requests
//!
//! - **Low concurrency**: Sequential JSON-RPC method calls
//! - **Fast-fail desired**: Authentication failures should not retry
//! - **Predictable errors**: HTTP 401/403 indicate auth issues, not transient
//! failures
//! - **User experience**: Users expect quick feedback on invalid credentials
//!
//! ## File I/O Operations (S3, CloudFront)
//!
//! - **High concurrency**: Parallel uploads/downloads of dataset files (100+
//! files)
//! - **Transient failures common**: S3 rate limiting, network congestion,
//! timeouts
//! - **Retry-safe**: Idempotent operations (pre-signed URLs, multipart uploads)
//! - **Robustness critical**: Dataset operations must complete reliably despite
//! temporary issues
//!
//! # Classification Strategy
//!
//! URLs are classified by inspecting the host and path:
//!
//! - **StudioApi**: host `edgefirst.studio` or any `*.edgefirst.studio`
//! subdomain, with a path that is exactly `/api` or starts with `/api/`
//! - **FileIO**: Everything else (S3, CloudFront, or any non-API Studio path)
//!
//! # Retry Behavior
//!
//! Both scopes use the same configurable retry count (`EDGEFIRST_MAX_RETRIES`,
//! default: 5) but differ in which errors they treat as retryable.
//!
//! ## Environment Variables
//!
//! - `EDGEFIRST_MAX_RETRIES`: Maximum number of retries for failed requests
//! (default: 5)
//! - `MAX_TASKS`: Maximum concurrent upload/download tasks (default: half of
//! CPU cores, min 2, max 8). Lower values (2-8) work better for large files
//! to avoid timeouts. Higher values (16-32) are better for many small files.
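//!
//! As a rough illustration of the defaults listed above, the sketch below
//! resolves both settings from optional raw values. The helper names
//! (`resolve_max_retries`, `resolve_max_tasks` and the `*_from_env` wrappers)
//! are hypothetical and not part of this crate. Falling back to the default
//! on unparsable or zero input is an assumption.
//!
//! ```rust
//! /// Hypothetical helper: parse `EDGEFIRST_MAX_RETRIES`, defaulting to 5.
//! fn resolve_max_retries(raw: Option<&str>) -> u32 {
//!     raw.and_then(|v| v.trim().parse::<u32>().ok()).unwrap_or(5)
//! }
//!
//! /// Hypothetical helper: parse `MAX_TASKS`, defaulting to half of the CPU
//! /// cores clamped to the 2..=8 range. An explicit value is not clamped.
//! fn resolve_max_tasks(raw: Option<&str>, cpu_cores: usize) -> usize {
//!     raw.and_then(|v| v.trim().parse::<usize>().ok())
//!         .filter(|n| *n > 0)
//!         .unwrap_or_else(|| (cpu_cores / 2).clamp(2, 8))
//! }
//!
//! /// Reads the retry count from the process environment.
//! fn max_retries_from_env() -> u32 {
//!     resolve_max_retries(std::env::var("EDGEFIRST_MAX_RETRIES").ok().as_deref())
//! }
//!
//! /// Reads the task limit from the process environment.
//! fn max_tasks_from_env() -> usize {
//!     // Assume 2 cores when the core count cannot be determined.
//!     let cores = std::thread::available_parallelism()
//!         .map(|n| n.get())
//!         .unwrap_or(2);
//!     resolve_max_tasks(std::env::var("MAX_TASKS").ok().as_deref(), cores)
//! }
//! ```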
//!
//! ## StudioApi Error Classification
//!
//! - **Never retry**: 401 Unauthorized, 403 Forbidden (auth failures)
//! - **Always retry**: 408 Timeout, 429 Too Many Requests, 5xx Server Errors
//! - **Retry transport errors**: Connection failures, DNS errors, timeouts
//!
//! ## FileIO Error Classification
//!
//! - **Always retry**: 408 Timeout, 409 Conflict, 423 Locked, 429 Too Many
//! Requests, 5xx Server Errors
//! - **Retry transport errors**: Connection failures, DNS errors, timeouts
//! - **No auth bypass**: All HTTP errors (including 401/403) are retried for S3
//! URLs
//!
//! # Configuration
//!
//! - `EDGEFIRST_MAX_RETRIES`: Maximum retry attempts per request (default: 5)
//! - `EDGEFIRST_TIMEOUT`: Request timeout in seconds (default: 30)
//!
//! **For bulk file operations**, increase retry count for better resilience:
//! ```bash
//! export EDGEFIRST_MAX_RETRIES=10 # More retries for S3 operations
//! export EDGEFIRST_TIMEOUT=60 # Longer timeout for large files
//! ```
//!
//! # Examples
//!
//! ```rust
//! use edgefirst_client::{RetryScope, classify_url};
//!
//! // Studio API calls
//! assert_eq!(
//!     classify_url("https://edgefirst.studio/api"),
//!     RetryScope::StudioApi
//! );
//! assert_eq!(
//!     classify_url("https://test.edgefirst.studio/api/datasets.list"),
//!     RetryScope::StudioApi
//! );
//!
//! // File I/O operations
//! assert_eq!(
//!     classify_url("https://s3.amazonaws.com/bucket/file.bin"),
//!     RetryScope::FileIO
//! );
//! assert_eq!(
//!     classify_url("https://d123abc.cloudfront.net/dataset.zip"),
//!     RetryScope::FileIO
//! );
//! ```
use url::Url;
/// Retry scope classification for URL-based retry policies.
///
/// Determines whether a request is a Studio API call or a File I/O operation,
/// enabling different error handling strategies for each category.

/// Classifies a URL to determine which retry policy to apply.
///
/// This function performs URL-based classification to differentiate between
/// EdgeFirst Studio API calls and File I/O operations (S3, CloudFront, etc.).
///
/// # Classification Algorithm
///
/// 1. Parse URL using proper URL parser (handles ports, query params,
/// fragments)
/// 2. Check protocol: Only HTTP/HTTPS are classified as StudioApi (all others →
/// FileIO)
/// 3. Check host: Must be `edgefirst.studio` or `*.edgefirst.studio`
/// 4. Check path: Must start with `/api` (exact match or `/api/...`)
/// 5. If all conditions met → `StudioApi`, otherwise → `FileIO`
///
/// # Edge Cases Handled
///
/// - **Port numbers**: `https://test.edgefirst.studio:8080/api` → StudioApi
/// - **Trailing slashes**: `https://edgefirst.studio/api/` → StudioApi
/// - **Query parameters**: `https://edgefirst.studio/api?foo=bar` → StudioApi
/// - **Subdomains**: `https://ocean.edgefirst.studio/api` → StudioApi
/// - **Similar domains**: `https://edgefirst.studio.com/api` → FileIO (not
/// exact match)
/// - **Path injection**: `https://evil.com/edgefirst.studio/api` → FileIO (host
/// mismatch)
/// - **Non-API paths**: `https://edgefirst.studio/download` → FileIO
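///
/// The algorithm and edge cases above can be sketched as follows. This is an
/// illustrative outline rather than the crate's actual implementation: the
/// names `Scope` and `classify_sketch` are hypothetical, it assumes the `url`
/// crate for parsing, and treating unparsable input as File I/O is an
/// assumption.
///
/// ```rust
/// use url::Url;
///
/// /// Hypothetical stand-in for `RetryScope`.
/// #[derive(Debug, Clone, Copy, PartialEq, Eq)]
/// enum Scope {
///     StudioApi,
///     FileIo,
/// }
///
/// fn classify_sketch(url: &str) -> Scope {
///     // Step 1: parse; unparsable input falls back to FileIo (assumption).
///     let Ok(parsed) = Url::parse(url) else {
///         return Scope::FileIo;
///     };
///     // Step 2: only HTTP and HTTPS can be Studio API calls.
///     if !matches!(parsed.scheme(), "http" | "https") {
///         return Scope::FileIo;
///     }
///     // Step 3: the host must be the apex domain or one of its subdomains.
///     let host_ok = parsed
///         .host_str()
///         .is_some_and(|h| h == "edgefirst.studio" || h.ends_with(".edgefirst.studio"));
///     // Step 4: the path must be exactly `/api` or live under `/api/`.
///     let path = parsed.path();
///     let path_ok = path == "/api" || path.starts_with("/api/");
///     // Step 5: both checks must pass for a Studio API classification.
///     if host_ok && path_ok {
///         Scope::StudioApi
///     } else {
///         Scope::FileIo
///     }
/// }
/// ```
///
/// Matching on the parsed host with a leading-dot suffix check is what keeps
/// look-alike hosts such as `edgefirst.studio.com` out of the Studio API scope.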
///
/// # Security
///
/// The function relies on a real URL parser to prevent domain spoofing. The
/// domain check is applied only to the parsed host, never to the path, so a
/// URL such as `https://attacker.com/edgefirst.studio/api` is classified as
/// FileIO.
///
/// # Examples
///
/// ```rust
/// use edgefirst_client::{RetryScope, classify_url};
///
/// // Studio API URLs
/// assert_eq!(
///     classify_url("https://edgefirst.studio/api"),
///     RetryScope::StudioApi
/// );
/// assert_eq!(
///     classify_url("https://test.edgefirst.studio/api/datasets"),
///     RetryScope::StudioApi
/// );
/// assert_eq!(
///     classify_url("https://test.edgefirst.studio:443/api?token=abc"),
///     RetryScope::StudioApi
/// );
///
/// // File I/O URLs (S3, CloudFront, etc.)
/// assert_eq!(
///     classify_url("https://s3.amazonaws.com/bucket/file.bin"),
///     RetryScope::FileIO
/// );
/// assert_eq!(
///     classify_url("https://d123abc.cloudfront.net/dataset.zip"),
///     RetryScope::FileIO
/// );
/// assert_eq!(
///     classify_url("https://edgefirst.studio/download_model"),
///     RetryScope::FileIO // Non-API path
/// );
/// ```

/// Creates a retry policy with URL-based classification.
///
/// This function builds a reqwest retry policy that inspects each request URL
/// and applies different error classification rules based on whether it's a
/// Studio API call or a File I/O operation.
///
/// # Retry Configuration
///
/// - **Max retries**: Configurable via `EDGEFIRST_MAX_RETRIES` (default: 5)
/// - **Timeout**: Configurable via `EDGEFIRST_TIMEOUT` (default: 30 seconds)
///
/// # Error Classification by Scope
///
/// ## StudioApi (`*.edgefirst.studio/api`)
///
/// Optimized for fast-fail on authentication errors:
///
/// | HTTP Status | Action | Rationale |
/// |-------------|--------|-----------|
/// | 401, 403 | Never retry | Authentication failure - user action required |
/// | 408, 429 | Retry | Timeout, rate limiting - transient |
/// | 5xx | Retry | Server error - may recover |
/// | Connection errors | Retry | Network issues - transient |
///
/// ## FileIO (S3, CloudFront, etc.)
///
/// Optimized for robustness under high concurrency:
///
/// | HTTP Status | Action | Rationale |
/// |-------------|--------|-----------|
/// | 408, 429 | Retry | Timeout, rate limiting - common with S3 |
/// | 409, 423 | Retry | Conflict, locked - S3 eventual consistency |
/// | 5xx | Retry | Server error - S3 transient issues |
/// | Connection errors | Retry | Network issues - common in parallel uploads |
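///
/// The two tables can be condensed into a single decision function, sketched
/// below. The `Scope` and `Action` types and the `classify_status` function
/// are hypothetical names used for illustration only. A status of `None`
/// stands for a connection-level error with no HTTP response. Statuses that
/// appear in neither table map to `Action::Unlisted`, since the tables do not
/// say how they are handled.
///
/// ```rust
/// /// Hypothetical stand-in for `RetryScope`.
/// #[derive(Debug, Clone, Copy, PartialEq, Eq)]
/// enum Scope {
///     StudioApi,
///     FileIo,
/// }
///
/// /// Hypothetical outcome of classifying one response.
/// #[derive(Debug, Clone, Copy, PartialEq, Eq)]
/// enum Action {
///     Retry,
///     NeverRetry,
///     Unlisted,
/// }
///
/// fn classify_status(scope: Scope, status: Option<u16>) -> Action {
///     match (scope, status) {
///         // Connection errors carry no status and are retried in both scopes.
///         (_, None) => Action::Retry,
///         // Studio API: authentication failures fail fast.
///         (Scope::StudioApi, Some(401 | 403)) => Action::NeverRetry,
///         // Timeouts, rate limiting, and server errors are retried everywhere.
///         (_, Some(408 | 429 | 500..=599)) => Action::Retry,
///         // File I/O additionally retries conflict and locked responses.
///         (Scope::FileIo, Some(409 | 423)) => Action::Retry,
///         // Anything else is not covered by the tables.
///         (_, Some(_)) => Action::Unlisted,
///     }
/// }
/// ```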
///
/// # Usage Recommendations
///
/// **For dataset downloads/uploads** (many concurrent S3 operations):
/// ```bash
/// export EDGEFIRST_MAX_RETRIES=10 # More retries for robustness
/// export EDGEFIRST_TIMEOUT=60 # Longer timeout for large files
/// ```
///
/// **For testing** (fast failure detection):
/// ```bash
/// export EDGEFIRST_MAX_RETRIES=1 # Minimal retries
/// export EDGEFIRST_TIMEOUT=10 # Quick timeout
/// ```
///
/// # Implementation Notes
///
/// Due to reqwest retry API limitations, both StudioApi and FileIO use the
/// same `max_retries_per_request` value. The differentiation is in error
/// classification only (which errors trigger retries), not retry count.
///
/// For operations requiring different retry counts, use separate Client
/// instances with different `EDGEFIRST_MAX_RETRIES` configuration.