# Text-to-Cypher Best Practices
This guide provides best practices for using and optimizing the text-to-cypher system based on current research and industry standards.
## Table of Contents
1. [Prompt Engineering](#prompt-engineering)
2. [Schema Design](#schema-design)
3. [Query Optimization](#query-optimization)
4. [Error Handling](#error-handling)
5. [Production Deployment](#production-deployment)
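6. [Monitoring and Maintenance](#monitoring-and-maintenance)
7. [Testing Strategies](#testing-strategies)
8. [Troubleshooting](#troubleshooting)
9. [Resources](#resources)
10. [Contributing](#contributing)
11. [Conclusion](#conclusion)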
## Prompt Engineering
### Writing Effective Natural Language Queries
#### DO ✅
**Be Specific and Clear**
```
Good: "Find all customers who purchased more than $1000 worth of products in 2024"
Bad: "Show me customers"
```
**Use Domain Terminology**
```
Good: "Which actors appeared in movies directed by Christopher Nolan?"
Bad: "People in things made by that guy"
```
**Specify Desired Output**
```
Good: "Return the names and ages of all employees in the Engineering department"
Bad: "Get engineering people"
```
**Include Constraints**
```
Good: "Find products with price between $50 and $100 sorted by rating"
Bad: "Find products"
```
#### DON'T ❌
**Avoid Ambiguous References**
```
Bad: "Show me the ones from yesterday"
Good: "Show me orders created on 2024-01-15"
```
**Don't Mix Multiple Questions**
```
Bad: "Show me customers and their orders and products they bought and the reviews"
Good: "Show me customers with their orders and the products in each order"
```
**Avoid Vague Quantifiers**
```
Bad: "Find people with lots of friends"
Good: "Find people with more than 10 connections"
```
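The DO and DON'T guidelines above can be checked before a question ever reaches the model. The following is a minimal sketch of such a pre-check; `lint_question` and its word lists are hypothetical and not part of the text-to-cypher codebase, and the heuristics are deliberately crude.

```rust
/// Hypothetical pre-check: flags wording that the guidelines above advise against.
pub fn lint_question(question: &str) -> Vec<&'static str> {
    let lower = question.to_lowercase();
    // Split on anything that is not a letter or digit so punctuation does not hide a word.
    let words: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect();

    let mut warnings = Vec::new();
    if words.iter().any(|w| ["yesterday", "today", "recently"].contains(w)) {
        warnings.push("relative date: give an explicit date such as 2024-01-15");
    }
    if words.iter().any(|w| ["lots", "few", "several"].contains(w)) {
        warnings.push("vague quantifier: give a number such as 'more than 10'");
    }
    if words.len() < 4 {
        warnings.push("very short question: name the entity, filter, and output fields");
    }
    warnings
}
```

A caller could show the warnings to the user and ask for a rephrased question instead of spending a model call on an ambiguous one.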
### System Prompt Optimization
The system prompt in `templates/system_prompt.txt` follows these principles:
1. **Clear Task Definition**: Explicitly states the task is to generate Cypher
2. **Schema Context**: Includes the ontology/schema in the prompt
3. **Constraints**: Lists what the model MUST and MUST NOT do
4. **Examples**: Provides representative examples
5. **Validation Checklist**: Guides the model through validation steps
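
To make the five principles concrete, here is a rough sketch of how a prompt with those parts could be assembled. The function name `build_system_prompt`, the wording of each section, and the example format are assumptions for illustration; the actual contents of `templates/system_prompt.txt` may differ.

```rust
/// Hypothetical helper: assembles a system prompt from the five parts listed above.
pub fn build_system_prompt(schema: &str, examples: &[(&str, &str)]) -> String {
    let mut prompt = String::new();

    // 1. Clear task definition
    prompt.push_str("Task: translate the user's question into a single Cypher query.\n\n");

    // 2. Schema context
    prompt.push_str("Schema:\n");
    prompt.push_str(schema.trim());
    prompt.push_str("\n\n");

    // 3. Constraints (illustrative wording, not the project's actual rules)
    prompt.push_str("You MUST use only labels, relationship types, and properties from the schema.\n");
    prompt.push_str("You MUST NOT generate queries that modify or delete data.\n\n");

    // 4. Examples as question/answer pairs
    prompt.push_str("Examples:\n");
    for (question, cypher) in examples {
        prompt.push_str(&format!("Q: {question}\nA: {cypher}\n"));
    }

    // 5. Validation checklist
    prompt.push_str("\nBefore answering, check that:\n");
    prompt.push_str("- parentheses and brackets are balanced\n");
    prompt.push_str("- the query ends with a RETURN clause\n");

    prompt
}
```

Keeping the sections in a fixed order makes it easier to compare prompts across versions when accuracy changes.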
## Schema Design
### Best Practices for Graph Schemas
#### 1. Consistent Naming Conventions
```cypher
// Good: PascalCase labels, UPPER_SNAKE_CASE relationship types
(:Person)-[:KNOWS]->(:Person)
(:Company)-[:EMPLOYS]->(:Person)
// Bad: Inconsistent naming
(:person)-[:knows]->(:PERSON)
(:company)-[:Employs]->(:Person)
```
#### 2. Descriptive Relationship Types
```cypher
// Good: Specific relationship types
(:Person)-[:WORKS_FOR]->(:Company)
(:Person)-[:MANAGES]->(:Department)
// Bad: Generic relationships
(:Person)-[:RELATED_TO]->(:Company)
(:Person)-[:HAS]->(:Department)
```
#### 3. Meaningful Property Names
```cypher
// Good: Clear property names
CREATE (p:Person {
firstName: "John",
lastName: "Doe",
dateOfBirth: date("1990-01-01"),
email: "john.doe@example.com"
})
// Bad: Cryptic property names
CREATE (p:Person {
fn: "John",
ln: "Doe",
dob: "1990-01-01",
em: "john.doe@example.com"
})
```
#### 4. Property Value Consistency
```cypher
// Good: Consistent value formats
CREATE (p:Person {name: "John Doe"})
CREATE (p:Person {name: "Jane Smith"})
// Bad: Inconsistent formats
CREATE (p:Person {name: "John Doe"})
CREATE (p:Person {name: "JANE SMITH"})
CREATE (p:Person {name: "bob-jones"})
```
### Schema Enhancement Tips
1. **Add Example Values**: The system now collects examples automatically
2. **Document Value Ranges**: Use constraints or properties to define valid ranges
3. **Use Indexes**: Create indexes on frequently queried properties
4. **Normalize Names**: Store canonical forms and use `toLower()` for matching
```cypher
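// Create an index for faster lookups.
// Neo4j 5 syntax; FalkorDB omits the index name: CREATE INDEX FOR (p:Person) ON (p.name)
CREATE INDEX person_name_index FOR (p:Person) ON (p.name)
// Create a uniqueness constraint for data quality.
// Neo4j 5 syntax; FalkorDB defines constraints with the GRAPH.CONSTRAINT CREATE command instead
CREATE CONSTRAINT person_email_unique FOR (p:Person) REQUIRE p.email IS UNIQUE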
```
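For the "Normalize Names" tip, one option is to compute a canonical form in application code before writing to the graph. This is only a sketch: `canonical_name` is a hypothetical helper, and the choice of separators (whitespace, `-`, `_`) is an assumption that may not suit every dataset.

```rust
/// Hypothetical helper: trims, collapses separators, and lowercases a name
/// so that "JANE SMITH", "jane-smith", and " Jane  Smith " compare equal.
pub fn canonical_name(raw: &str) -> String {
    raw.split(|c: char| c.is_whitespace() || c == '-' || c == '_')
        .filter(|part| !part.is_empty())
        .map(|part| part.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}
```

The canonical value could be stored next to the display name, so generated queries can match on the normalized property while results still show the original spelling.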
## Query Optimization
### Writing Efficient Natural Language Queries
#### 1. Limit Result Sets
```
Good: "Find the top 10 highest-rated movies"
Average: "Find highly-rated movies" (might return too many)
```
#### 2. Use Specific Filters
```
Good: "Find customers in California who made purchases in 2024"
Average: "Find customers who made purchases"
```
#### 3. Avoid Overly Complex Single Queries
```
Bad: "Find all customers, their orders, products, suppliers, reviews, and related recommendations with detailed analytics"
Good: Break into multiple queries:
1. "Find customers with orders in 2024"
2. "For customer John Doe, show order details and products"
3. "Show reviews for product X"
```
### Understanding Generated Queries
The system generates Cypher based on your schema. Here's what to expect:
**Simple Entity Lookup**
```
Input: "Find person named John"
Output: MATCH (p:Person) WHERE toLower(p.name) = 'john' RETURN p
```
**Relationship Traversal**
```
Input: "Who are John's friends?"
Output: MATCH (p:Person {name: 'John'})-[:KNOWS]->(friend:Person) RETURN friend
```
**Aggregation**
```
Input: "How many orders does each customer have?"
Output: MATCH (c:Customer)-[:PLACED]->(o:Order) RETURN c.name, count(o) AS orderCount
```
## Error Handling
### Common Errors and Solutions
#### 1. Property Not Found
**Error**: `Property 'name' not found on node type 'Person'`
**Solutions**:
- Check schema for correct property name
- Verify data exists with examples
- Use `WHERE n.property IS NOT NULL` to check that the property is set (Neo4j 5 no longer accepts the older `EXISTS(n.property)` form)
#### 2. Validation Errors
The system now validates queries before execution. Common validation errors:
**Unbalanced Parentheses**
```
Bad: MATCH (p:Person WHERE p.name = 'John' RETURN p
Fixed: MATCH (p:Person) WHERE p.name = 'John' RETURN p
```
**Missing RETURN Clause**
```
Bad: MATCH (p:Person) WHERE p.age > 30
Fixed: MATCH (p:Person) WHERE p.age > 30 RETURN p
```
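The two checks above are simple enough to sketch. The function below is an illustration only: `structural_errors` is a hypothetical name, the project's `CypherValidator` may work differently, and the sketch assumes read-only queries (write-only queries such as a bare `CREATE` legitimately have no `RETURN`). It skips over quoted strings but does not handle escaped quotes.

```rust
/// Hypothetical structural checks: balanced delimiters and a RETURN clause.
pub fn structural_errors(query: &str) -> Vec<String> {
    let mut errors = Vec::new();
    let mut stack: Vec<char> = Vec::new();
    // Quote character of the string literal we are currently inside, if any.
    let mut in_string: Option<char> = None;
    let mut balanced = true;

    for c in query.chars() {
        if let Some(quote) = in_string {
            if c == quote {
                in_string = None;
            }
            continue; // ignore delimiters inside string literals
        }
        match c {
            '\'' | '"' => in_string = Some(c),
            '(' | '[' | '{' => stack.push(c),
            ')' | ']' | '}' => {
                let expected = match c {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                if stack.pop() != Some(expected) {
                    balanced = false;
                }
            }
            _ => {}
        }
    }
    if !balanced || !stack.is_empty() {
        errors.push("unbalanced parentheses, brackets, or braces".to_string());
    }

    // Assumes a read-only query, which must return something.
    let has_return = query
        .split_whitespace()
        .any(|word| word.eq_ignore_ascii_case("RETURN"));
    if !has_return {
        errors.push("missing RETURN clause".to_string());
    }
    errors
}
```

Returning every error at once, instead of stopping at the first, gives a regeneration step more context to work with.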
#### 3. Self-Healing
When a query fails, the system automatically attempts to fix it:
1. **Validation Failures**: Regenerates with validation errors as context
2. **Execution Failures**: Regenerates with execution error feedback
3. **Fallback**: Reports error if self-healing fails
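
The flow above amounts to a retry loop that feeds the last error back into generation. Here is a minimal synchronous sketch; `generate_with_healing` and its closure parameters are hypothetical stand-ins for the model call and for validation plus execution, not the project's actual API.

```rust
/// Hypothetical retry loop: `generate` receives the question plus feedback from the
/// previous failed attempt; `run` validates and executes the query.
pub fn generate_with_healing<G, R>(
    question: &str,
    max_attempts: usize,
    mut generate: G,
    mut run: R,
) -> Result<String, String>
where
    G: FnMut(&str, Option<&str>) -> String,
    R: FnMut(&str) -> Result<(), String>,
{
    let mut feedback: Option<String> = None;
    for _ in 0..max_attempts {
        let query = generate(question, feedback.as_deref());
        match run(&query) {
            Ok(()) => return Ok(query),
            // Keep the error so the next attempt can see what went wrong.
            Err(error) => feedback = Some(error),
        }
    }
    // Fallback: report the last error once all attempts are used up.
    Err(feedback.unwrap_or_else(|| "no attempts were made".to_string()))
}
```

Capping `max_attempts` keeps latency and model cost bounded when a question cannot be answered against the current schema.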
### Handling Failed Queries
If self-healing fails:
1. **Review the Schema**: Ensure your graph matches the expected schema
2. **Simplify the Query**: Try a simpler, more specific question
3. **Check Examples**: Verify example values match your data
4. **Clear Cache**: Use `/clear_schema_cache/{graph_name}` if schema changed
## Production Deployment
### Configuration Best Practices
#### 1. Environment Variables
```bash
# Required
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_KEY=your-api-key
# Optional but recommended
FALKORDB_CONNECTION=falkor://127.0.0.1:6379
REST_PORT=8080
MCP_PORT=3001
```
#### 2. Schema Caching
The system caches schemas for performance. Consider:
- **Cache Size**: Default is 100 graphs (configurable)
- **Cache Invalidation**: Call `/clear_schema_cache/{graph_name}` when a graph's schema changes
- **Cold Start**: First query per graph discovers schema (slower)
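
The interaction between cache size, invalidation, and cold starts can be illustrated with a small bounded cache. This is a sketch under assumptions: `SchemaCache` is a hypothetical type, schemas are represented as plain strings, and the oldest-first eviction policy is chosen for simplicity; the real cache may evict differently.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical bounded schema cache keyed by graph name (oldest entry evicted first).
pub struct SchemaCache {
    capacity: usize,
    entries: HashMap<String, String>,
    order: VecDeque<String>, // insertion order, oldest at the front
}

impl SchemaCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns the cached schema, or runs `discover` on a cold start and stores the result.
    pub fn get_or_discover(&mut self, graph: &str, discover: impl FnOnce() -> String) -> String {
        if let Some(schema) = self.entries.get(graph) {
            return schema.clone();
        }
        let schema = discover();
        if self.capacity == 0 {
            return schema; // caching disabled
        }
        if self.entries.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(graph.to_string(), schema.clone());
        self.order.push_back(graph.to_string());
        schema
    }

    /// Drops one graph's schema so the next query rediscovers it.
    pub fn invalidate(&mut self, graph: &str) {
        self.entries.remove(graph);
        self.order.retain(|name| name.as_str() != graph);
    }
}
```

With this shape, a schema change only costs one slow rediscovery for the affected graph, while other graphs keep their cached entries.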
#### 3. Rate Limiting
Implement rate limiting at the API level:
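```nginx
# nginx: allow 10 requests/second per client IP, with bursts of up to 20
# In the http block:
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
# In the server or location block that proxies to the API:
limit_req zone=api burst=20;
```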
#### 4. Model Selection
Choose models based on your requirements:
| Model | Speed | Accuracy | Cost | Best For |
|-------|-------|----------|------|----------|
| gpt-4o-mini | Fast | Good | Low | Development, simple queries |
| gpt-4o | Medium | Better | Medium | Production, complex queries |
| gpt-4 | Slow | Best | High | Critical queries, complex schemas |
### Security Considerations
#### 1. Query Validation
The validator checks for dangerous operations:
- `DROP` statements
- `DELETE` without constraints
- Unbounded operations
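
A keyword-level version of the first two checks might look like the sketch below. `dangerous_operations` is a hypothetical function and the project's validator may apply different rules. The sketch treats a missing `WHERE` as "without constraints", does not recognise inline property filters such as `{id: 1}`, can be fooled by keywords inside string literals, and leaves unbounded operations out entirely.

```rust
/// Hypothetical keyword check for destructive Cypher operations.
pub fn dangerous_operations(query: &str) -> Vec<&'static str> {
    let upper = query.to_uppercase();
    // Tokenise on anything that is not part of an identifier.
    let words: Vec<&str> = upper
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|w| !w.is_empty())
        .collect();
    let has = |keyword: &str| words.iter().any(|w| *w == keyword);

    let mut findings = Vec::new();
    if has("DROP") {
        findings.push("DROP statement");
    }
    if has("DELETE") && !has("WHERE") {
        findings.push("DELETE without a WHERE constraint");
    }
    findings
}
```

Matching whole tokens instead of substrings avoids flagging identifiers such as `dropped_at` or `isDeleted`.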
#### 2. Input Sanitization
Always validate user input:
```rust
// The system does this automatically
let validation = CypherValidator::validate(query);
if !validation.is_valid {
return Err("Invalid query".into());
}
```
#### 3. API Key Management
- Never commit API keys to version control
- Use environment variables or secrets management
- Rotate keys regularly
- Use separate keys for dev/staging/prod
#### 4. Access Control
Implement at the API level:
```rust
// Example middleware
async fn auth_middleware(req: Request) -> Result<Request> {
verify_api_token(req.headers().get("Authorization"))?;
Ok(req)
}
```
## Monitoring and Maintenance
### Key Metrics to Track
#### 1. Query Success Rate
Monitor the percentage of successful query executions:
```
success_rate = successful_queries / total_queries
```
**Target**: >95% success rate
#### 2. Self-Healing Effectiveness
Track how often self-healing succeeds:
```
self_healing_rate = queries_fixed_by_healing / failed_queries
```
**Target**: >60% of failures fixed
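
Both ratios can be computed from a handful of counters. The sketch below is illustrative: `QueryCounters` and its field names are hypothetical, and it assumes `failed` counts queries whose first attempt failed, whether or not self-healing later fixed them.

```rust
/// Ratio helper that returns None instead of dividing by zero.
fn ratio(numerator: u64, denominator: u64) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(numerator as f64 / denominator as f64)
    }
}

/// Hypothetical counters collected by the service.
pub struct QueryCounters {
    pub total: u64,
    pub successful: u64,
    pub failed: u64,           // first attempt failed (assumption)
    pub fixed_by_healing: u64, // failed at first, then fixed by self-healing
}

impl QueryCounters {
    /// successful_queries / total_queries
    pub fn success_rate(&self) -> Option<f64> {
        ratio(self.successful, self.total)
    }

    /// queries_fixed_by_healing / failed_queries
    pub fn self_healing_rate(&self) -> Option<f64> {
        ratio(self.fixed_by_healing, self.failed)
    }

    /// True when both targets (>95% success, >60% healed) are met.
    /// A rate with no data yet is treated as meeting its target.
    pub fn meets_targets(&self) -> bool {
        self.success_rate().map_or(true, |r| r > 0.95)
            && self.self_healing_rate().map_or(true, |r| r > 0.60)
    }
}
```

Returning `None` for an empty denominator keeps dashboards from showing a misleading 0% or a NaN right after a restart.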
#### 3. Query Latency
Monitor end-to-end latency:
- Schema discovery: <500ms (cached) / <5s (uncached)
- Query generation: <2s (simple) / <5s (complex)
- Query execution: <100ms (simple) / <1s (complex)
- Total: <3s (simple) / <10s (complex)
#### 4. Validation Failures
Track validation failure types:
- Syntax errors
- Dangerous operations
- Missing clauses
- Unbalanced syntax
### Logging Best Practices
Enable structured logging:
```rust
tracing::info!(
query = %cypher_query,
graph = %graph_name,
duration_ms = %duration.as_millis(),
"Query executed successfully"
);
```
### Regular Maintenance
#### 1. Schema Updates
When your graph schema changes:
```bash
# Clear cache for specific graph
curl -X POST http://localhost:8080/clear_schema_cache/my_graph
# Or clear entire cache by restarting service
docker restart text-to-cypher
```
#### 2. Model Updates
When switching models:
1. Test with sample queries first
2. Monitor success rates
3. Adjust system prompts if needed
4. Roll back if issues occur
#### 3. Example Value Refresh
Example values are collected during schema discovery. To refresh:
1. Clear schema cache
2. Next query will rediscover schema with new examples
### Performance Tuning
#### 1. Schema Discovery
Adjust sample size based on data volume:
```rust
// Default is 100, increase for better examples
Schema::discover_from_graph(&mut graph, 200).await
```
#### 2. Concurrent Requests
The system uses async processing. Configure based on load:
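- CPU-bound workloads: set concurrency to the number of cores
- I/O-bound workloads (waiting on the LLM or the database): go higher, roughly 10x the number of cores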
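
As a starting point, the sizing guidance above can be turned into a small helper. `Workload`, `concurrency_limit`, and `detected_cores` are hypothetical names, and the 10x multiplier is the rule of thumb from the list, not a measured optimum.

```rust
use std::thread;

#[derive(Clone, Copy)]
pub enum Workload {
    CpuBound,
    IoBound,
}

/// Hypothetical sizing rule: one task per core for CPU-bound work,
/// ten per core for I/O-bound work.
pub fn concurrency_limit(workload: Workload, cores: usize) -> usize {
    let cores = cores.max(1); // never return a limit of zero
    match workload {
        Workload::CpuBound => cores,
        Workload::IoBound => cores * 10,
    }
}

/// Number of cores visible to this process, falling back to 1 if it cannot be determined.
pub fn detected_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

For example, `concurrency_limit(Workload::IoBound, detected_cores())` gives an initial limit that load testing can then adjust.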
#### 3. Database Connection Pooling
Configure FalkorDB connection pool:
```rust
FalkorClientBuilder::new_async()
.with_max_connections(10)
.build()
```
## Testing Strategies
### 1. Unit Testing
Test individual components:
```rust
#[test]
fn test_query_validation() {
let query = "MATCH (n:Person) RETURN n";
let result = CypherValidator::validate(query);
assert!(result.is_valid);
}
```
### 2. Integration Testing
Test end-to-end flows:
```bash
# Test query generation
curl -X POST http://localhost:8080/text_to_cypher \
-H "Content-Type: application/json" \
-d '{
"graph_name": "test",
"chat_request": {
"messages": [{"role": "user", "content": "Find all persons"}]
}
}'
```
### 3. Schema Testing
Verify schema discovery:
```bash
curl http://localhost:8080/get_schema/test_graph
```
### 4. Load Testing
Use tools like Apache Bench or k6:
```bash
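# The endpoint expects a POST with a JSON body; payload.json (a file you create) holds the request body
ab -n 100 -c 10 -p payload.json -T application/json http://localhost:8080/text_to_cypher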
```
## Troubleshooting
### Common Issues
#### Issue: Slow Query Generation
**Possible Causes**:
- Large schema
- Complex question
- Slow LLM response
**Solutions**:
- Reduce schema sample size
- Simplify question
- Use faster model
- Implement request timeout
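
For the request-timeout suggestion, here is a standard-library sketch of the idea. `with_timeout` is a hypothetical helper; an async service would more likely use its runtime's own timeout facility, which this sketch does not cover.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on a helper thread and gives up waiting after `limit`.
/// Note: the helper thread is not cancelled; it keeps running until `work` returns.
pub fn with_timeout<T, F>(limit: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out; ignore that error.
        let _ = sender.send(work());
    });
    receiver.recv_timeout(limit).ok()
}
```

A `None` result can be mapped to a clear "generation timed out" error so callers are not left waiting on a slow model response.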
#### Issue: Inaccurate Queries
**Possible Causes**:
- Unclear question
- Missing schema information
- Insufficient examples
**Solutions**:
- Rephrase question more clearly
- Ensure schema is up to date
- Add more example values
- Use better model
#### Issue: High Failure Rate
**Possible Causes**:
- Schema mismatch
- Data quality issues
- Model hallucination
**Solutions**:
- Verify schema matches data
- Improve data consistency
- Enable validation and self-healing
- Use better model
## Resources
### Documentation
- [Main README](../readme.md)
- [Improvements Guide](./IMPROVEMENTS.md)
- [Docker Release Guide](./DOCKER_RELEASE.md)
### External Resources
- [Neo4j Cypher Manual](https://neo4j.com/docs/cypher-manual/)
- [FalkorDB Documentation](https://docs.falkordb.com/)
- [Text2Cypher Research Paper](https://arxiv.org/abs/2412.10064)
### Community
- [GitHub Issues](https://github.com/FalkorDB/text-to-cypher/issues)
- [FalkorDB Discord](https://discord.gg/falkordb)
## Contributing
Contributions are welcome! When contributing:
1. Follow these best practices in your code
2. Add tests for new features
3. Update documentation
4. Consider backward compatibility
## Conclusion
Following these best practices will help you:
- Generate more accurate queries
- Achieve better performance
- Handle errors gracefully
- Deploy reliably to production
- Maintain the system effectively
For questions or issues, please open a GitHub issue or reach out to the community.