ruvector-data-framework 0.3.0

Core discovery framework for RuVector dataset integrations - find hidden patterns in massive datasets using vector memory, graph structures, and dynamic min-cut algorithms
# RuVector MCP (Model Context Protocol) Server

Comprehensive MCP server implementation for the RuVector data discovery framework, following the Anthropic MCP specification (2024-11-05).

## Overview

The RuVector MCP server exposes 22+ data sources across research, medical, economic, climate, and knowledge domains through a standardized JSON-RPC 2.0 interface. It supports both STDIO and SSE (Server-Sent Events) transports for integration with AI assistants and automation tools.

## Features

### Transport Layers
- **STDIO**: Standard input/output transport for CLI integration
- **SSE**: HTTP-based Server-Sent Events for web applications (requires `sse` feature)

### Data Sources and Tools

#### Research Tools
1. `search_openalex` - Search OpenAlex for research papers
2. `search_arxiv` - Search arXiv preprints
3. `search_semantic_scholar` - Search Semantic Scholar database
4. `get_citations` - Get paper citations
5. `search_crossref` - Search CrossRef DOI database
6. `search_biorxiv` - Search bioRxiv preprints
7. `search_medrxiv` - Search medRxiv medical preprints

#### Medical Tools
8. `search_pubmed` - Search PubMed literature
9. `search_clinical_trials` - Search ClinicalTrials.gov
10. `search_fda_events` - Search FDA adverse event reports

#### Economic Tools
11. `get_fred_series` - Get Federal Reserve Economic Data
12. `get_worldbank_indicator` - Get World Bank indicators

#### Climate Tools
13. `get_noaa_data` - Get NOAA climate data

#### Knowledge Tools
14. `search_wikipedia` - Search Wikipedia articles
15. `query_wikidata` - Query Wikidata SPARQL endpoint

#### Discovery Tools
16. `run_discovery` - Multi-source pattern discovery
17. `analyze_coherence` - Vector coherence analysis
18. `detect_patterns` - Pattern detection in signals
19. `export_graph` - Export graphs (GraphML, DOT, CSV)

### Resources

Access discovered data and analysis results:

- `discovery://patterns` - Current discovered patterns
- `discovery://graph` - Coherence graph structure
- `discovery://history` - Historical coherence data

### Pre-built Prompts

Ready-to-use discovery workflows:

1. **cross_domain_discovery** - Multi-source pattern finding
2. **citation_analysis** - Build and analyze citation networks
3. **trend_detection** - Temporal pattern analysis

## Installation

```bash
# From the root of the ruvector repository
cd examples/data/framework
cargo build --bin mcp_discovery --release
```

For SSE support:
```bash
cargo build --bin mcp_discovery --release --features sse
```

## Usage

### STDIO Mode (Default)

```bash
# Run the server
cargo run --bin mcp_discovery

# Or with compiled binary
./target/release/mcp_discovery
```

### SSE Mode (HTTP Streaming)

```bash
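# Run on port 3000 (SSE needs the `sse` feature, also with cargo run)
cargo run --bin mcp_discovery --features sse -- --sse --port 3000

# Bind to all interfaces on a custom port (no authentication; see Known Limitations)
cargo run --bin mcp_discovery --features sse -- --sse --endpoint 0.0.0.0 --port 8080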
```
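In the MCP 2024-11-05 HTTP+SSE transport, the server streams messages in the standard `text/event-stream` format: each event is a group of `field: value` lines ended by a blank line, and consecutive `data:` lines are joined with newlines. The specification has the server first send an `endpoint` event (the URI the client POSTs requests to) and then `message` events carrying JSON-RPC payloads. The minimal parser below is a sketch of the basic SSE rules using only the standard library; the endpoint path and session id in the sample stream are made up, since this document does not state the server's actual URLs.

```python
import json

def parse_sse(text):
    """Parse a text/event-stream body into a list of (event, data) tuples.

    Basic SSE rules: lines starting with ':' are comments, 'data:' lines
    accumulate (joined by newlines), a blank line dispatches the event, and
    the event type defaults to 'message'.
    """
    events = []
    event_type, data_lines = "message", []
    for line in text.splitlines():
        if line == "":
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    return events

# Hypothetical stream: an endpoint announcement, then one JSON-RPC message
stream = (
    "event: endpoint\n"
    "data: /message?sessionId=abc\n"
    "\n"
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 1, "result": {}}\n'
    "\n"
)
for event, data in parse_sse(stream):
    payload = json.loads(data) if event == "message" else data
    print(event, payload)
```

An incomplete event at the end of the stream (no trailing blank line) is deliberately not emitted, matching how browsers treat a truncated SSE stream.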

### Configuration Options

```bash
mcp_discovery [OPTIONS]

OPTIONS:
    --sse                            Use SSE transport instead of STDIO
    --port <PORT>                    Port for the SSE endpoint (default: 3000)
    --endpoint <ENDPOINT>            Bind address for the SSE endpoint (default: 127.0.0.1)
    -c, --config <FILE>              Configuration file path
    --min-edge-weight <F64>          Minimum edge weight (default: 0.5)
    --similarity-threshold <F64>     Similarity threshold (default: 0.7)
    --cross-domain                   Enable cross-domain discovery (default: true)
    --window-seconds <I64>           Temporal window size in seconds (default: 3600)
    --hnsw-m <USIZE>                 HNSW M parameter (default: 16)
    --hnsw-ef-construction <USIZE>   HNSW ef_construction (default: 200)
    --dimension <USIZE>              Vector dimension (default: 384)
    -v, --verbose                    Enable verbose logging
```

### Configuration File Example

```json
{
  "min_edge_weight": 0.5,
  "similarity_threshold": 0.7,
  "mincut_sensitivity": 0.1,
  "cross_domain": true,
  "window_seconds": 3600,
  "hnsw_m": 16,
  "hnsw_ef_construction": 200,
  "hnsw_ef_search": 50,
  "dimension": 384,
  "batch_size": 1000,
  "checkpoint_interval": 10000,
  "parallel_workers": 4
}
```
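A wrapper script may want to check a configuration file before passing it to `mcp_discovery -c`. The sketch below merges a user-supplied JSON file over the defaults shown in the example above and rejects unknown keys. The range checks (thresholds in [0, 1], positive integers elsewhere) are assumptions about sensible values, not rules the server is documented to enforce, and `load_config` is a hypothetical helper.

```python
import json

# Defaults as shown in the configuration example above
DEFAULTS = {
    "min_edge_weight": 0.5,
    "similarity_threshold": 0.7,
    "mincut_sensitivity": 0.1,
    "cross_domain": True,
    "window_seconds": 3600,
    "hnsw_m": 16,
    "hnsw_ef_construction": 200,
    "hnsw_ef_search": 50,
    "dimension": 384,
    "batch_size": 1000,
    "checkpoint_interval": 10000,
    "parallel_workers": 4,
}

def load_config(text):
    """Merge a JSON config string over DEFAULTS and run basic sanity checks.

    The checks are illustrative assumptions, not server-enforced rules.
    """
    user = json.loads(text)
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    config = {**DEFAULTS, **user}
    for key in ("min_edge_weight", "similarity_threshold", "mincut_sensitivity"):
        if not 0.0 <= config[key] <= 1.0:
            raise ValueError(f"{key} should be in [0, 1], got {config[key]}")
    for key in ("window_seconds", "hnsw_m", "hnsw_ef_construction",
                "hnsw_ef_search", "dimension", "batch_size",
                "checkpoint_interval", "parallel_workers"):
        if not isinstance(config[key], int) or config[key] <= 0:
            raise ValueError(f"{key} should be a positive integer")
    return config

cfg = load_config('{"similarity_threshold": 0.8, "dimension": 768}')
print(cfg["similarity_threshold"], cfg["dimension"], cfg["hnsw_m"])
```

Rejecting unknown keys catches typos such as `hnsw_ef_serach` that would otherwise be silently ignored.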

## MCP Protocol

### Initialize

Request:
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": { "name": "example-client", "version": "0.1.0" }
  }
}
```

Response:
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "serverInfo": {
      "name": "ruvector-discovery-mcp",
      "version": "1.0.0"
    },
    "capabilities": {
      "tools": { "list_changed": false },
      "resources": { "list_changed": false, "subscribe": false },
      "prompts": { "list_changed": false }
    }
  }
}
```
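The handshake can be wrapped in two small helpers: one builds the `initialize` request, including the `clientInfo` object that the 2024-11-05 specification asks clients to send, and one checks that the server answered with the same protocol version. This is a sketch; the helper names and the client name and version strings are placeholders.

```python
PROTOCOL_VERSION = "2024-11-05"

def make_initialize(request_id, client_name="example-client", client_version="0.1.0"):
    """Build an MCP initialize request as a Python dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }

def check_initialize_response(response, request_id):
    """Return the server's capabilities, or raise if the handshake failed."""
    if response.get("id") != request_id:
        raise ValueError("response id does not match request id")
    if "error" in response:
        raise RuntimeError(f"initialize failed: {response['error']}")
    result = response["result"]
    if result.get("protocolVersion") != PROTOCOL_VERSION:
        raise RuntimeError(f"unsupported protocol version: {result.get('protocolVersion')}")
    return result.get("capabilities", {})

# The example response shown above, as a Python dict
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "serverInfo": {"name": "ruvector-discovery-mcp", "version": "1.0.0"},
        "capabilities": {"tools": {"list_changed": False}},
    },
}
caps = check_initialize_response(response, 1)
print(sorted(caps))
```

Per the specification, once the response is accepted the client sends a `notifications/initialized` notification (a message with no `id`) before issuing other requests.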

### List Tools

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/list"
}
```
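In the MCP specification, a `tools/list` result holds a `tools` array whose entries carry a `name`, a `description`, and an `inputSchema` (a JSON Schema object with `properties` and `required`). A client can index that list and catch missing required arguments before sending `tools/call`. The response below is hypothetical: its shape follows the specification and its schema mirrors the `search_openalex` parameters in the Tool Reference, but the server's actual schemas may differ.

```python
def index_tools(list_response):
    """Map tool name -> tool definition from a tools/list response."""
    return {tool["name"]: tool for tool in list_response["result"]["tools"]}

def missing_arguments(tool, arguments):
    """Return required parameters (per the tool's inputSchema) absent from arguments."""
    required = tool.get("inputSchema", {}).get("required", [])
    return [name for name in required if name not in arguments]

# Hypothetical tools/list response, shaped per the MCP specification
list_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "tools": [
            {
                "name": "search_openalex",
                "description": "Search OpenAlex for research papers",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"},
                        "limit": {"type": "integer"},
                    },
                    "required": ["query"],
                },
            }
        ]
    },
}

tools = index_tools(list_response)
print(missing_arguments(tools["search_openalex"], {"limit": 5}))
```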

### Call Tool

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "search_openalex",
    "arguments": {
      "query": "machine learning",
      "limit": 10
    }
  }
}
```
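In the MCP specification, a `tools/call` result carries a `content` array (text items look like `{"type": "text", "text": ...}`) and an optional `isError` flag for failures inside the tool. Protocol-level failures come back as a JSON-RPC `error` object instead. The helper below assumes the server follows that shape; how each tool serializes its results inside the text is not documented here, so the sample payload is made up.

```python
def tool_text(response):
    """Concatenate the text content items of a tools/call response.

    Raises on JSON-RPC errors and on tool-level errors flagged with isError.
    """
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    result = response["result"]
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error: " + "\n".join(texts))
    return "\n".join(texts)

# Hypothetical response; the text payload is made up for illustration
response = {
    "jsonrpc": "2.0",
    "id": 3,
    "result": {
        "content": [{"type": "text", "text": '{"results": []}'}],
        "isError": False,
    },
}
print(tool_text(response))
```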

### Read Resource

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "resources/read",
  "params": {
    "uri": "discovery://patterns"
  }
}
```

### Get Prompt

```json
{
  "jsonrpc": "2.0",
  "id": 5,
  "method": "prompts/get",
  "params": {
    "name": "cross_domain_discovery",
    "arguments": {
      "domains": "research,medical,climate",
      "query": "COVID-19 impact"
    }
  }
}
```
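The MCP specification defines prompt `arguments` as a map from string keys to string values, which is why `domains` is passed as `"research,medical,climate"` rather than as an array. The sketch below builds a `prompts/get` request and joins list values with commas, assuming (as the example suggests) that the server splits comma-separated lists itself; `make_prompt_request` is a hypothetical helper.

```python
def make_prompt_request(request_id, name, **arguments):
    """Build a prompts/get request with string-only argument values.

    Lists and tuples are joined into comma-separated strings; everything
    else is converted with str().
    """
    args = {}
    for key, value in arguments.items():
        args[key] = ",".join(value) if isinstance(value, (list, tuple)) else str(value)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "prompts/get",
        "params": {"name": name, "arguments": args},
    }

req = make_prompt_request(5, "cross_domain_discovery",
                          domains=["research", "medical", "climate"],
                          query="COVID-19 impact")
print(req["params"]["arguments"])
```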

## Tool Reference

### search_openalex

Search OpenAlex for scholarly works.

**Parameters:**
- `query` (string, required): Search query
- `limit` (integer, optional): Maximum results (default: 10)

**Example:**
```json
{
  "query": "vector databases",
  "limit": 5
}
```

### search_arxiv

Search arXiv preprint repository.

**Parameters:**
- `query` (string, required): Search query
- `category` (string, optional): arXiv category (e.g., "cs.AI", "physics.gen-ph")
- `limit` (integer, optional): Maximum results (default: 10)

### get_citations

Get citations for a paper.

**Parameters:**
- `paper_id` (string, required): Paper ID or DOI

### run_discovery

Run multi-source discovery.

**Parameters:**
- `sources` (array of strings, required): Names of the data sources to query
- `query` (string, required): Discovery query

**Example:**
```json
{
  "sources": ["openalex", "semantic_scholar", "pubmed"],
  "query": "CRISPR gene editing"
}
```

### export_graph

Export coherence graph.

**Parameters:**
- `format` (string, required): Format ("graphml", "dot", or "csv")

## Rate Limiting

Default rate limit: 100 requests per minute per tool.
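A client issuing many calls can stay under this limit by throttling itself. The sketch below uses a sliding-window log keyed by tool name: a call is allowed only if fewer than `limit` calls to the same tool happened in the last `window` seconds. This is a client-side illustration; the server's own limiting algorithm (fixed or sliding window) is not specified here, and the class name and injectable clock are hypothetical.

```python
import time
from collections import defaultdict, deque

class PerToolRateLimiter:
    """Sliding-window limiter: at most `limit` calls per tool in any `window` seconds.

    A client-side sketch for staying under the documented default; it is not
    the server's implementation.
    """

    def __init__(self, limit=100, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = defaultdict(deque)  # tool name -> timestamps of recent calls

    def try_acquire(self, tool):
        """Record a call and return True if `tool` is under its limit, else False."""
        now = self.clock()
        recent = self.calls[tool]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # drop calls that have left the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True

limiter = PerToolRateLimiter()
print(limiter.try_acquire("search_openalex"))
```

Injecting the clock keeps the limiter testable without sleeping; in normal use the default `time.monotonic` is unaffected by wall-clock adjustments.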

## Error Codes

Standard JSON-RPC 2.0 error codes:

- `-32700` Parse error
- `-32600` Invalid request
- `-32601` Method not found
- `-32602` Invalid params
- `-32603` Internal error
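A client usually needs to distinguish these protocol errors from other failures. The helper below maps the five standard codes to their names and treats the -32000 to -32099 range, which JSON-RPC 2.0 reserves for implementation-defined server errors, separately. Whether this server uses that range is not documented here; `describe_error` is a hypothetical helper.

```python
JSONRPC_ERRORS = {
    -32700: "Parse error",
    -32600: "Invalid request",
    -32601: "Method not found",
    -32602: "Invalid params",
    -32603: "Internal error",
}

def describe_error(response):
    """Return a readable description of a JSON-RPC error response, or None."""
    error = response.get("error")
    if error is None:
        return None
    code = error.get("code")
    if code in JSONRPC_ERRORS:
        kind = JSONRPC_ERRORS[code]
    elif isinstance(code, int) and -32099 <= code <= -32000:
        kind = "Server error"  # range reserved by JSON-RPC 2.0 for implementations
    else:
        kind = "Application error"
    return f"{kind} ({code}): {error.get('message', '')}"

print(describe_error({"jsonrpc": "2.0", "id": 7,
                      "error": {"code": -32601, "message": "Method not found"}}))
```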

## Architecture

```
┌─────────────────────────────────────────┐
│         MCP Discovery Server            │
├─────────────────────────────────────────┤
│  JSON-RPC 2.0 Message Handler           │
├─────────────────┬───────────────────────┤
│ STDIO Transport │ SSE Transport (HTTP)  │
├─────────────────┴───────────────────────┤
│      Data Source Clients (22+)          │
│  ┌────────────┬──────────┬──────────┐   │
│  │  Research  │ Medical  │ Economic │   │
│  │  OpenAlex  │ PubMed   │   FRED   │   │
│  │  ArXiv     │ Clinical │ WorldBank│   │
│  │  Scholar   │   FDA    │          │   │
│  └────────────┴──────────┴──────────┘   │
├─────────────────────────────────────────┤
│    Native Discovery Engine              │
│  ┌────────────────────────────────────┐ │
│  │  Vector Storage (HNSW)             │ │
│  │  Graph Coherence (Min-Cut)         │ │
│  │  Pattern Detection                 │ │
│  └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
```

## Integration Examples

### Claude Desktop App

Add the server to the `mcpServers` section of the Claude Desktop configuration file (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "ruvector-discovery": {
      "command": "/path/to/mcp_discovery",
      "args": []
    }
  }
}
```
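Claude Desktop starts the server process itself, so API keys exported in an interactive shell are not necessarily visible to it. Claude Desktop server entries accept an `env` map for this purpose. The sketch below adds such an entry to an existing configuration dict, using the variable names listed under Troubleshooting; `add_server` is a hypothetical helper and the key values are placeholders.

```python
import json

def add_server(config, name, command, env=None):
    """Return a copy of a Claude Desktop config with an MCP server entry added."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    entry = {"command": command, "args": []}
    if env:
        entry["env"] = dict(env)  # passed to the server process as environment variables
    updated.setdefault("mcpServers", {})[name] = entry
    return updated

config = add_server(
    {},
    "ruvector-discovery",
    "/path/to/mcp_discovery",
    env={"FRED_API_KEY": "your_key", "NOAA_API_TOKEN": "your_token"},
)
print(json.dumps(config, indent=2))
```

Returning a copy leaves the caller's existing configuration untouched until the result is written back to disk.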

### Python Client

```python
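import json
import subprocess

# Start the MCP server; it speaks newline-delimited JSON-RPC over STDIO
proc = subprocess.Popen(
    ['./target/release/mcp_discovery'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def send(message):
    """Write one JSON-RPC message as a single line."""
    proc.stdin.write(json.dumps(message) + '\n')
    proc.stdin.flush()

def request(request_id, method, params=None):
    """Send a request and return the response with the matching id."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    send(message)
    while True:
        line = proc.stdout.readline()
        if not line:
            raise RuntimeError("server closed its output stream")
        if not line.strip():
            continue  # ignore blank lines
        response = json.loads(line)
        if response.get("id") == request_id:
            return response

# Initialize the session, then send the initialized notification
print(request(1, "initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},
}))
send({"jsonrpc": "2.0", "method": "notifications/initialized"})

# Call a tool and print the raw response
print(request(2, "tools/call", {
    "name": "search_openalex",
    "arguments": {"query": "vector search", "limit": 5},
}))

# Close stdin so the server can exit, then wait for it
proc.stdin.close()
proc.wait()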
```

## Development

### Project Structure

```
framework/
├── src/
│   ├── mcp_server.rs        # MCP server implementation
│   ├── bin/
│   │   └── mcp_discovery.rs # Binary entry point
│   ├── api_clients.rs       # OpenAlex, NOAA clients
│   ├── arxiv_client.rs      # ArXiv client
│   ├── semantic_scholar.rs  # Semantic Scholar client
│   ├── medical_clients.rs   # PubMed, ClinicalTrials, FDA
│   ├── economic_clients.rs  # FRED, WorldBank
│   ├── wiki_clients.rs      # Wikipedia, Wikidata
│   └── ruvector_native.rs   # Discovery engine
└── docs/
    └── MCP_SERVER.md        # This file
```

### Adding New Tools

1. Add the client to `DataSourceClients`
2. Create the tool definition in a `tool_*` method
3. Implement execution in an `execute_*` method
4. Register the tool in the `handle_tool_call` dispatcher
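The same definition/execution/dispatch split can be sketched in Python to show how the pieces fit together. This is a hypothetical mirror, not the server's Rust code: `tool_echo`, `execute_echo`, and `dispatch_tool_call` are illustrative names, there is no external client (step 1), and answering an unknown tool with `-32602` follows the examples in the MCP specification rather than confirmed server behavior.

```python
# Hypothetical Python mirror of the steps above; the real server is Rust.

def tool_echo():
    """Step 2: a tool definition in the shape returned by tools/list."""
    return {
        "name": "echo",
        "description": "Echo the input text",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    }

def execute_echo(arguments):
    """Step 3: run the tool and wrap its output as MCP text content."""
    return {"content": [{"type": "text", "text": arguments["text"]}], "isError": False}

# Step 4: the dispatcher maps tool names to (definition, executor) pairs.
TOOLS = {"echo": (tool_echo, execute_echo)}

def dispatch_tool_call(request_id, params):
    """Route a tools/call request to its executor, validating required arguments."""
    entry = TOOLS.get(params.get("name"))
    if entry is None:
        return {"jsonrpc": "2.0", "id": request_id,
                "error": {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}}
    definition, execute = entry
    arguments = params.get("arguments", {})
    missing = [k for k in definition()["inputSchema"]["required"] if k not in arguments]
    if missing:
        return {"jsonrpc": "2.0", "id": request_id,
                "error": {"code": -32602, "message": f"Missing arguments: {missing}"}}
    return {"jsonrpc": "2.0", "id": request_id, "result": execute(arguments)}

print(dispatch_tool_call(1, {"name": "echo", "arguments": {"text": "hi"}}))
```

Keeping the definition and the executor side by side in one registry means a new tool cannot be listed without also being callable.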

### Testing

```bash
# Unit tests
cargo test --lib

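# Smoke test: send a single initialize request over STDIO
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.1.0"}}}' | cargo run --bin mcp_discovery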
```

## Known Limitations

- Some client constructors return a `Result` and can fail (for example, when a required API key is not set), so initialization errors must be handled
- SSE transport requires `sse` feature flag
- Rate limiting is per-session, not persistent
- No authentication/authorization (local use only)

## Troubleshooting

### "SSE transport requires the 'sse' feature"

Rebuild with SSE support:
```bash
cargo build --bin mcp_discovery --features sse
```

### Client initialization errors

Some clients require API keys via environment variables:
- `FRED_API_KEY` - Federal Reserve Economic Data
- `NOAA_API_TOKEN` - NOAA Climate Data
- `SEMANTIC_SCHOLAR_API_KEY` - Semantic Scholar (optional)

Set these before running:
```bash
export FRED_API_KEY="your_key"
export NOAA_API_TOKEN="your_token"
./mcp_discovery
```
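A launcher script can check these variables up front instead of discovering a missing key through a client initialization error. The sketch below treats `FRED_API_KEY` and `NOAA_API_TOKEN` as required for their clients and `SEMANTIC_SCHOLAR_API_KEY` as optional, following the list above; `check_api_keys` is a hypothetical helper and empty values count as unset.

```python
import os

REQUIRED = ["FRED_API_KEY", "NOAA_API_TOKEN"]
OPTIONAL = ["SEMANTIC_SCHOLAR_API_KEY"]

def check_api_keys(environ=os.environ):
    """Return (missing_required, missing_optional) lists of unset or empty variables."""
    missing_required = [name for name in REQUIRED if not environ.get(name)]
    missing_optional = [name for name in OPTIONAL if not environ.get(name)]
    return missing_required, missing_optional

missing, optional = check_api_keys()
for name in missing:
    print(f"warning: {name} is not set; the corresponding client may fail to initialize")
for name in optional:
    print(f"note: {name} is not set (optional)")
```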

## License

Part of the RuVector project. See main repository for license information.

## Contributing

See main RuVector repository for contribution guidelines.

## References

- [MCP Specification](https://spec.modelcontextprotocol.io/)
- [JSON-RPC 2.0](https://www.jsonrpc.org/specification)
- [RuVector Documentation](https://github.com/ruvnet/ruvector)