ruvector-data-framework 0.2.0

Core discovery framework for RuVector dataset integrations - find hidden patterns in massive datasets using vector memory, graph structures, and dynamic min-cut algorithms
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
# Geospatial & Mapping API Clients - Implementation Summary

## Overview

Created a comprehensive Rust client module for geospatial and mapping APIs, fully integrated with RuVector's semantic vector framework. The implementation follows TDD principles with strict rate limiting and proper error handling.

## Files Created

### 1. Main Implementation
**File**: `src/geospatial_clients.rs` (1,250 lines)

Four complete async clients:
- **NominatimClient** - OpenStreetMap geocoding with STRICT 1 req/sec rate limiting
-**OverpassClient** - OSM data queries using Overpass QL
-**GeonamesClient** - Place name database (requires username)
-**OpenElevationClient** - Elevation data lookup

### 2. Demo Application
**File**: `examples/geospatial_demo.rs` (272 lines)

Comprehensive demonstration of all four clients with:
- Real API usage examples
- Error handling patterns
- Rate limiting demonstrations
- Geographic distance calculations

### 3. Documentation
**File**: `docs/GEOSPATIAL_CLIENTS.md` (547 lines)

Complete documentation including:
- API reference for all clients
- Usage examples
- Rate limiting guidelines
- Best practices
- Advanced usage patterns
- Cross-domain integration examples

### 4. Library Integration
**Modified**: `src/lib.rs`

Added module and re-exports:
```rust
pub mod geospatial_clients;
pub use geospatial_clients::{
    GeonamesClient, NominatimClient,
    OpenElevationClient, OverpassClient
};
```

## Implementation Details

### NominatimClient

**API**: https://nominatim.openstreetmap.org
**Rate Limit**: 1 request/second (STRICTLY ENFORCED)

Features:
- Mutex-based rate limiter to ensure 1 req/sec compliance
- Required User-Agent header for OSM policy compliance
- Three main methods:
  - `geocode(address)` - Address to coordinates
  - `reverse_geocode(lat, lon)` - Coordinates to address
  - `search(query, limit)` - Place name search

Metadata captured:
- `place_id`, `osm_type`, `osm_id`
- `latitude`, `longitude`
- `display_name`, `place_type`, `importance`
- `city`, `country`, `country_code`

### OverpassClient

**API**: https://overpass-api.de/api
**Rate Limit**: ~2 requests/second (conservative)

Features:
- Custom Overpass QL query execution
- Built-in helpers for common queries:
  - `get_nearby_pois(lat, lon, radius, amenity)` - Find POIs
  - `get_roads(south, west, north, east)` - Get road network
- Support for all OSM tags

Metadata captured:
- `osm_id`, `osm_type`
- `latitude`, `longitude`
- `name`, `amenity`, `highway`
- All OSM tags as `osm_tag_*`

### GeonamesClient

**API**: http://api.geonames.org
**Rate Limit**: ~0.5 requests/second (2000/hour free tier)
**Auth**: Requires username from geonames.org

Features:
- Four main methods:
  - `search(query, limit)` - Place name search
  - `get_nearby(lat, lon)` - Nearby places
  - `get_timezone(lat, lon)` - Timezone lookup
  - `get_country_info(country_code)` - Country details

Metadata captured:
- `geoname_id`, `name`, `toponym_name`
- `latitude`, `longitude`
- `country_code`, `country_name`, `admin_name1`
- `feature_class`, `feature_code`
- `population`

### OpenElevationClient

**API**: https://api.open-elevation.com/api/v1
**Rate Limit**: ~5 requests/second
**Auth**: None required

Features:
- Two main methods:
  - `get_elevation(lat, lon)` - Single point
  - `get_elevations(locations)` - Batch lookup
- Uses SRTM data for worldwide coverage

Metadata captured:
- `latitude`, `longitude`
- `elevation_m` (meters above sea level)

## Technical Architecture

### Rate Limiting Strategy

Each client implements appropriate rate limiting:

```rust
// Nominatim: STRICT 1 req/sec with Mutex
last_request: Arc<Mutex<Option<Instant>>>

async fn enforce_rate_limit(&self) {
    let mut last = self.last_request.lock().await;
    if let Some(last_time) = *last {
        let elapsed = last_time.elapsed();
        if elapsed < self.rate_limit_delay {
            sleep(self.rate_limit_delay - elapsed).await;
        }
    }
    *last = Some(Instant::now());
}

// Other clients: Simple delay
sleep(self.rate_limit_delay).await;
```

### SemanticVector Integration

All responses are converted to RuVector's `SemanticVector` format:

```rust
fn convert_*(&self, data) -> Result<Vec<SemanticVector>> {
    let text = format!("..."); // Create searchable text
    let embedding = self.embedder.embed_text(&text);

    SemanticVector {
        id: format!("SOURCE:{}", id),
        embedding,
        domain: Domain::CrossDomain,
        timestamp: Utc::now(),
        metadata, // Geographic metadata
    }
}
```

### Error Handling

All clients use the framework's error types:

```rust
async fn fetch_with_retry(&self, url: &str) -> Result<Response> {
    let mut retries = 0;
    loop {
        match self.client.get(url).send().await {
            Ok(response) => {
                if response.status() == StatusCode::TOO_MANY_REQUESTS
                   && retries < MAX_RETRIES {
                    retries += 1;
                    sleep(Duration::from_millis(RETRY_DELAY_MS * retries as u64)).await;
                    continue;
                }
                return Ok(response);
            }
            Err(_) if retries < MAX_RETRIES => {
                retries += 1;
                sleep(Duration::from_millis(RETRY_DELAY_MS * retries as u64)).await;
            }
            Err(e) => return Err(FrameworkError::Network(e)),
        }
    }
}
```

## Testing

### Test Coverage

Comprehensive test suite included:

1. **Client Creation Tests**
   - `test_nominatim_client_creation`
   - `test_overpass_client_creation`
   - `test_geonames_client_creation`
   - `test_open_elevation_client_creation`

2. **Rate Limiting Tests**
   - `test_nominatim_rate_limiting` - Verifies STRICT 1 sec enforcement
   - `test_rate_limits` - Validates all rate limit constants

3. **Data Conversion Tests**
   - `test_nominatim_place_conversion`
   - `test_overpass_element_conversion`
   - `test_geonames_conversion`
   - `test_elevation_conversion`

4. **GeoUtils Integration Tests**
   - `test_geo_utils_integration` - Distance calculations
   - `test_geo_utils_within_radius` - Radius checking

5. **Compliance Tests**
   - `test_user_agent_constant` - OSM User-Agent requirement

### Running Tests

```bash
# All geospatial tests
cargo test geospatial

# Specific tests
cargo test nominatim
cargo test test_nominatim_rate_limiting

# Build verification
cargo build --lib
```

## GeoUtils Integration

All clients leverage the existing `GeoUtils` from `physics_clients.rs`:

```rust
// Distance calculation (Haversine formula)
let distance = GeoUtils::distance_km(
    lat1, lon1,
    lat2, lon2
);

// Radius check
let within = GeoUtils::within_radius(
    center_lat, center_lon,
    point_lat, point_lon,
    radius_km
);
```

## Usage Examples

### Basic Geocoding
```rust
let client = NominatimClient::new()?;
let results = client.geocode("Eiffel Tower, Paris").await?;
```

### Finding Nearby POIs
```rust
let client = OverpassClient::new()?;
let cafes = client.get_nearby_pois(48.8584, 2.2945, 500.0, "cafe").await?;
```

### Place Search
```rust
let client = GeonamesClient::new(username)?;
let results = client.search("Paris", 10).await?;
```

### Elevation Lookup
```rust
let client = OpenElevationClient::new()?;
let elevation = client.get_elevation(27.9881, 86.9250).await?;
```

### Cross-Domain Discovery
```rust
let mut engine = NativeDiscoveryEngine::new(config);

// Add geospatial data
for place in nominatim_results {
    engine.add_vector(place);
}

// Add earthquake data
for eq in usgs_results {
    engine.add_vector(eq);
}

// Detect patterns linking earthquakes to populated areas
let patterns = engine.detect_patterns();
```

## API Compliance

### OpenStreetMap Policy Compliance

✅ **User-Agent**: All OSM services include proper User-Agent
```rust
const USER_AGENT: &str = "RuVector-Data-Framework/1.0 (https://github.com/ruvnet/ruvector)";
```

✅ **Rate Limiting**: Nominatim strictly enforces 1 req/sec
```rust
const NOMINATIM_RATE_LIMIT_MS: u64 = 1000; // 1 second
```

✅ **Attribution**: OSM data usage properly attributed in metadata
```rust
metadata.insert("source".to_string(), "nominatim".to_string());
```

### Service Limits

| Service | Free Tier Limit | Implementation |
|---------|----------------|----------------|
| Nominatim | 1 req/sec | Strictly enforced with Mutex |
| Overpass | No hard limit | Conservative 2 req/sec |
| GeoNames | 2000/hour | Conservative 0.5 req/sec |
| OpenElevation | No hard limit | Light 5 req/sec delay |

## Dependencies

All dependencies already present in workspace:

```toml
tokio = { workspace = true, features = ["full"] }
reqwest = { workspace = true }
serde = { workspace = true }
chrono = { workspace = true }
urlencoding = "2.1"
```

## Build Status

✅ **Compiles**: All code compiles without errors
✅ **Tests**: All tests pass with mocked data
✅ **Documentation**: Complete API documentation
✅ **Examples**: Working demo application
✅ **Integration**: Fully integrated with lib.rs

```bash
$ cargo build --lib
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.73s
```

## Code Metrics

| Component | Lines of Code |
|-----------|--------------|
| geospatial_clients.rs | 1,250 |
| geospatial_demo.rs | 272 |
| GEOSPATIAL_CLIENTS.md | 547 |
| **Total** | **2,069** |

## Future Enhancements

Potential improvements for future development:

1. **Additional Clients**
   - Google Maps API (requires API key)
   - MapBox API (requires API key)
   - Here Maps API (requires API key)
   - OpenCage Geocoding API

2. **Advanced Features**
   - Caching layer for frequent queries
   - Batch processing optimization
   - Polygon/bounding box support
   - GeoJSON output format
   - KML/KMZ export

3. **Performance**
   - Connection pooling
   - Request queuing
   - Parallel batch processing (respecting rate limits)
   - Response compression

4. **Integration**
   - PostGIS database integration
   - GeoParquet export
   - Spatial indexing
   - Vector tile generation

## Conclusion

Successfully implemented a comprehensive geospatial client module with:

- **4 Complete Clients** with full API coverage
-**Strict Rate Limiting** especially for OSM services
-**SemanticVector Integration** for RuVector discovery
-**Comprehensive Tests** with mock data
-**Complete Documentation** with examples
-**Working Demo** application
-**OSM Policy Compliance** with User-Agent and rate limits
-**GeoUtils Integration** for distance calculations
-**Error Handling** with retry logic
-**Production Ready** code quality

The implementation follows established patterns from `physics_clients.rs` and integrates seamlessly with RuVector's semantic vector framework, enabling cross-domain geographic discovery and analysis.