# DataForge


[![crates.io](https://img.shields.io/badge/version-0.1.0-yellow)](https://crates.io/crates/dataforge)

[//]: # ([![Documentation](https://img.shields.io/docsrs/dataforge)](https://docs.rs/dataforge))


[![](https://img.shields.io/circleci/project/github/badges/shields/master)](build_status)

[![license](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)](https://opensource.org/licenses/MIT)

[![Website](https://img.shields.io/badge/官网-whosly-lightgrey?style=social&logo=world&logoColor=blue)](https://baidu.com)

**High-performance Data Forge Workshop** - Random data generation and database population solution for Rust developers

## 📋 Prerequisites

```
Nightly Rust compiler

$ rustc --version
rustc 1.85.1 (4eb161250 2025-03-15)
```

## ✨ Features


- **High-performance Data Generation**
  - Rust-based high-performance random number generation engine
  - Multi-threaded parallel generation (powered by rayon)
  - Memory pool optimization technology

- **Database Support**
  - Support for MySQL, PostgreSQL, SQLite databases
  - Automatic Schema inference and matching
  - Bulk insert optimization

- **Rich Data Generators**
  - Name generators (Chinese, English, Japanese)
  - Address generators (supports Chinese regional data)
  - Network data generators (email, URL, IP, etc.)
  - Date and time generators
  - Number generators (phone numbers, ID cards, etc.)

- **Flexible Generation Methods**
  - Support for regular expression pattern generation
  - Convenient macro interface
  - Support for custom generator extensions
  - Multi-language data support

## 🚀 Quick Start


### Installation

```toml
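[dependencies]
dataforge = "0.1.0"

# Or, to enable optional features, use this line instead of the one above
# (a key may appear only once per TOML table):
# dataforge = { version = "0.1.0", features = ["database"] }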
```

### Basic Usage


```rust
use dataforge::generators::*;
use dataforge::forge;
use serde_json::json;

// Generate test user data
let user = forge!({
    "id" => uuid_v4(),
    "name" => name::zh_cn_fullname(),
    "age" => number::adult_age(),
    "email" => internet::email(),
    "phone" => number::phone_number_cn(),
    "address" => json!({
        "province": address::zh_province(),
        "city": "北京市",
        "street": address::zh_address()
    }),
    "created_at" => datetime::iso8601()
});

println!("{}", serde_json::to_string_pretty(&user).unwrap());
```

### Using Macros to Generate Data


```rust
use dataforge::{pattern, rand_num, datetime};

// Generate using patterns
let phone = pattern!("1[3-9]\\d{9}");

// Generate random numbers
let age = rand_num!(18, 65);

// Generate date and time
let timestamp = datetime!("timestamp");
let iso_date = datetime!("iso");
```
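
To make the pattern above concrete, the following std-only sketch hand-expands the single shape `1[3-9]\d{9}` (a literal `1`, one digit from 3 to 9, then nine digits). It does not use DataForge: `XorShift` and `cn_mobile_like` are hypothetical names introduced for illustration, and the real `pattern!` macro presumably accepts general patterns rather than one hard-coded shape.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// (Hypothetical helper; not part of DataForge.)
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so fall back to a fixed constant
        XorShift(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Integer in `lo..=hi` (modulo bias is ignored in this sketch).
    pub fn range(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }
}

/// Hand-expands the shape `1[3-9]\d{9}`: a literal '1', one digit in 3..=9,
/// then nine arbitrary digits.
pub fn cn_mobile_like(rng: &mut XorShift) -> String {
    let mut out = String::with_capacity(11);
    out.push('1');
    out.push(char::from(b'0' + rng.range(3, 9) as u8));
    for _ in 0..9 {
        out.push(char::from(b'0' + rng.range(0, 9) as u8));
    }
    out
}
```

Calling `cn_mobile_like(&mut XorShift::new(42))` yields an 11-digit string of that shape; a fixed seed makes the output reproducible, which can be handy in tests.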

### Core Engine Usage


```rust
use dataforge::core::{CoreEngine, GenConfig, GenerationStrategy};

// `?` needs an enclosing function that returns `Result`; the boxed error
// type is an assumption about the crate's error type.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = GenConfig {
        batch_size: 1000,
        strategy: GenerationStrategy::Random,
        null_probability: 0.05,
        ..Default::default()
    };

    let engine = CoreEngine::new(config);
    let data = engine.generate_batch(100)?;

    // Get performance metrics
    let metrics = engine.metrics();
    println!(
        "Generated: {}, Errors: {}",
        metrics.generated_count(),
        metrics.error_count()
    );
    Ok(())
}
```
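
The `null_probability` field presumably sets how often a generated value is replaced by a null. The README does not spell out the semantics, so the std-only sketch below shows one plausible reading: draw a uniform number in [0, 1) and emit `None` when it falls below the threshold. `XorShift` and `maybe_null` are hypothetical names, not DataForge APIs.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// (Hypothetical helper; not part of DataForge.)
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so fall back to a fixed constant
        XorShift(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Float in [0, 1) built from the top 53 bits of the next output.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Returns `None` with probability `null_probability`, otherwise `Some(make())`.
/// The value is only built when it is actually needed.
pub fn maybe_null<T>(
    rng: &mut XorShift,
    null_probability: f64,
    make: impl FnOnce() -> T,
) -> Option<T> {
    if rng.next_f64() < null_probability {
        None
    } else {
        Some(make())
    }
}
```

With a probability of `0.0` the function never returns `None`, and with `1.0` it always does, because the drawn number is always at least 0 and strictly below 1.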

### Database Population


```rust
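use dataforge::db::DatabaseForge;
use dataforge::generators::*; // brings uuid_v4, name, and internet into scope

// `?` needs an enclosing function that returns `Result`; the boxed error
// type is an assumption about the crate's error type.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create database filler
    let forge = DatabaseForge::new("mysql://user:pass@localhost/db");

    // Configure table and fill data
    let result = forge
        .table("users", 1000, |t| {
            t.field("id", || uuid_v4())
             .field("name", || name::zh_cn_fullname())
             .field("email", || internet::email())
        })
        .fill_sync()?;

    println!("Filled {} records", result);
    Ok(())
}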
```

### Custom Generators


```rust
use dataforge::{DataForge, Language};
use serde_json::Value;

// Create data generator
let mut forge = DataForge::new(Language::ZhCN);

// Register custom generator
forge.register("product_id", || {
    serde_json::json!(format!("PROD-{:06}", rand::random::<u32>() % 1000000))
});

// Use custom generator
let product_id = forge.generate("product_id");
```
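
The register-then-generate flow above can be pictured as a map from names to closures. The sketch below is a std-only illustration of that idea, not DataForge's implementation: `Registry` is a hypothetical type, and it returns `String` values rather than `serde_json` values to stay dependency-free.

```rust
use std::collections::HashMap;

/// A boxed closure that produces one value per call.
type Generator = Box<dyn Fn() -> String>;

/// Minimal name-to-generator registry (hypothetical; not DataForge's type).
#[derive(Default)]
pub struct Registry {
    generators: HashMap<String, Generator>,
}

impl Registry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Stores `f` under `name`, replacing any earlier generator with that name.
    pub fn register<F>(&mut self, name: &str, f: F)
    where
        F: Fn() -> String + 'static,
    {
        self.generators.insert(name.to_string(), Box::new(f));
    }

    /// Runs the named generator, or returns `None` if nothing is registered.
    pub fn generate(&self, name: &str) -> Option<String> {
        self.generators.get(name).map(|g| g())
    }
}
```

Returning `Option` makes a lookup of an unregistered name explicit; whether DataForge's `generate` behaves the same way is not stated in this README.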

## Generator Types


### Name Generators

- `name::zh_cn_fullname()` - Chinese full name
- `name::en_us_fullname()` - English full name
- `name::ja_jp_fullname()` - Japanese full name

### Address Generators

- `address::zh_province()` - Chinese province
- `address::zh_address()` - Chinese address
- `address::us_state()` - US state name
- `address::us_city()` - US city

### Network Data Generators

- `internet::email()` - Email address
- `internet::url()` - Website URL
- `internet::ip_address()` - IP address
- `internet::mac_address()` - MAC address
- `internet::user_agent()` - User agent string

### Number Generators

- `number::phone_number_cn()` - Chinese mobile number
- `number::id_card_cn()` - Chinese ID card number
- `number::credit_card_number()` - Credit card number
- `number::adult_age()` - Adult age
- `number::currency(min, max)` - Currency amount
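
For context on ID card numbers: an 18-character Chinese resident ID ends in a check character derived from the first 17 digits (a weighted sum modulo 11, as defined in GB 11643-1999). The std-only sketch below computes that character. Whether `number::id_card_cn()` produces a valid check character is not stated here, and `id_card_check_char` is a hypothetical helper name.

```rust
/// Computes the check character for the first 17 digits of an 18-character
/// Chinese resident ID number (weighted sum modulo 11).
/// Returns `None` if `body` is not exactly 17 ASCII digits.
pub fn id_card_check_char(body: &str) -> Option<char> {
    const WEIGHTS: [u32; 17] = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
    const CHECK: [char; 11] = ['1', '0', 'X', '9', '8', '7', '6', '5', '4', '3', '2'];

    if body.len() != 17 || !body.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Multiply each digit by its positional weight and add everything up.
    let sum: u32 = body
        .bytes()
        .zip(WEIGHTS.iter())
        .map(|(b, &w)| u32::from(b - b'0') * w)
        .sum();

    // The remainder modulo 11 indexes into the table of check characters.
    Some(CHECK[(sum % 11) as usize])
}
```

For the commonly cited sample body `11010519491231002`, the weighted sum is 167, the remainder modulo 11 is 2, and the check character is `X`.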

### Date and Time Generators

- `datetime::iso8601()` - ISO8601 format date
- `datetime::timestamp()` - Timestamp
- `datetime::birthday()` - Birthday date
- `datetime::work_time()` - Work time

## Advanced Features


### Parallel Generation

```rust
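use dataforge::core::{CoreEngine, GenConfig, GenerationStrategy};

// `?` needs an enclosing function that returns `Result`; the boxed error
// type is an assumption about the crate's error type.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = GenConfig {
        batch_size: 1000,
        strategy: GenerationStrategy::Random,
        parallelism: 4, // number of parallel workers
        ..Default::default()
    };

    let engine = CoreEngine::new(config);
    let results = engine.generate_batch(10000)?;
    Ok(())
}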
```
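
As a rough picture of what a `parallelism` setting implies, the sketch below splits a batch into contiguous chunks and fills each chunk on its own thread. It uses `std::thread::scope` instead of rayon to stay dependency-free, so it illustrates the general idea rather than DataForge's engine; `generate_parallel` is a hypothetical name.

```rust
use std::thread;

/// Splits `total` items across `workers` scoped threads and concatenates
/// the results in worker order. `make(i)` builds the item with global index `i`.
pub fn generate_parallel<T, F>(total: usize, workers: usize, make: F) -> Vec<T>
where
    T: Send,
    F: Fn(usize) -> T + Sync,
{
    let workers = workers.max(1);
    let chunk = (total + workers - 1) / workers; // ceiling division
    let make = &make;

    thread::scope(|s| {
        // Spawn every worker first so they run concurrently.
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let start = (w * chunk).min(total);
                let end = ((w + 1) * chunk).min(total);
                s.spawn(move || (start..end).map(make).collect::<Vec<T>>())
            })
            .collect();

        // Join in spawn order so the output keeps index order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

For example, `generate_parallel(10, 4, |i| i * 2)` returns the ten even numbers from 0 to 18 in order, with the work divided into chunks of 3, 3, 3, and 1.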

### Memory Optimization

```rust
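use dataforge::memory::{MemoryPool, MemoryPoolConfig};

// `?` needs a function returning `Result`; the error type is an assumption.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = MemoryPoolConfig::default();
    let mut pool = MemoryPool::new(config);
    let buffer = pool.allocate(1024)?;
    Ok(())
}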
```
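
The general idea behind a memory pool is to keep released buffers around and hand them out again instead of asking the allocator each time. The std-only sketch below shows that idea in its simplest form; `BufferPool` is a hypothetical type and says nothing about how DataForge's `MemoryPool` is actually built.

```rust
/// Minimal byte-buffer pool: released buffers are kept and handed out again,
/// so repeated allocations can reuse existing heap capacity.
#[derive(Default)]
pub struct BufferPool {
    free: Vec<Vec<u8>>,
}

impl BufferPool {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns a zero-filled buffer of `len` bytes, reusing a pooled buffer if any.
    pub fn allocate(&mut self, len: usize) -> Vec<u8> {
        let mut buf = self.free.pop().unwrap_or_default();
        buf.clear();
        buf.resize(len, 0);
        buf
    }

    /// Hands a buffer back to the pool for later reuse.
    pub fn release(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }

    /// Number of idle buffers currently held.
    pub fn idle(&self) -> usize {
        self.free.len()
    }
}
```

A buffer that is released and then requested again at the same or a smaller size keeps its original heap allocation, which is where the savings come from.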

### Rule Engine

```rust
use dataforge::rules::{RuleEngine, Rule, RuleType};

let mut engine = RuleEngine::new();
engine.add_rule(Rule {
    name: "adult_user".to_string(),
    rule_type: RuleType::Condition,
    condition: "age >= 18".to_string(),
    action: "generate_adult_data".to_string(),
});
```
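
The rule above stores its condition as a string such as `"age >= 18"`. How DataForge evaluates such strings is not documented here; the sketch below assumes a very small grammar (`<field> <op> <integer>`, separated by whitespace) and evaluates it against a record of integer fields. `eval_condition` is a hypothetical function.

```rust
use std::collections::HashMap;

/// Evaluates a condition of the form `<field> <op> <integer>`, e.g. "age >= 18",
/// against a record of integer fields. Returns `None` when the condition cannot
/// be parsed or the field is missing.
pub fn eval_condition(cond: &str, record: &HashMap<String, i64>) -> Option<bool> {
    let mut parts = cond.split_whitespace();
    let field = parts.next()?;
    let op = parts.next()?;
    let rhs: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // trailing tokens are not supported in this sketch
    }

    let lhs = *record.get(field)?;
    match op {
        ">=" => Some(lhs >= rhs),
        ">" => Some(lhs > rhs),
        "<=" => Some(lhs <= rhs),
        "<" => Some(lhs < rhs),
        "==" => Some(lhs == rhs),
        "!=" => Some(lhs != rhs),
        _ => None,
    }
}
```

A rule engine could run the rule's action only when this returns `Some(true)`, and treat `None` as a configuration error rather than as a failed match.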

## Configuration File Support


DataForge supports TOML and YAML configuration files:

```toml
# dataforge.toml

[generation]
batch_size = 1000
strategy = "Random"
null_probability = 0.05

[database]
url = "mysql://user:pass@localhost/db"
batch_size = 5000
```
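
The `batch_size` under `[database]` suggests that rows are written in groups rather than one at a time. The sketch below shows one common way to do that, by building a multi-row `INSERT` per chunk. It is an illustration under that assumption, not DataForge's code; `batched_inserts` is a hypothetical function, and it expects values that are already valid SQL literals.

```rust
/// Builds one multi-row `INSERT` statement per chunk of at most `batch_size` rows.
/// Values are assumed to be already-escaped SQL literals; real code should use
/// bound parameters instead of string concatenation.
pub fn batched_inserts(
    table: &str,
    columns: &[&str],
    rows: &[Vec<String>],
    batch_size: usize,
) -> Vec<String> {
    rows.chunks(batch_size.max(1)) // a batch size of 0 is treated as 1
        .map(|chunk| {
            let values: Vec<String> = chunk
                .iter()
                .map(|row| format!("({})", row.join(", ")))
                .collect();
            format!(
                "INSERT INTO {} ({}) VALUES {};",
                table,
                columns.join(", "),
                values.join(", ")
            )
        })
        .collect()
}
```

With three rows and a batch size of 2, this produces two statements: one carrying two rows and one carrying the remaining row.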

## Performance Features


- **Multi-threaded Parallelism**: Efficient parallel processing based on rayon
- **Memory Pool**: Reduce memory allocation overhead
- **Batch Operations**: Optimize database insert performance
- **Lazy Loading**: Load data files on demand
- **Zero Copy**: Reduce unnecessary memory copying
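
As an illustration of the lazy-loading point, the sketch below defers building a data table until it is first requested and then reuses it, using `std::sync::OnceLock` (Rust 1.70 or later). The inline string stands in for a data file, and `provinces` and `load_count` are hypothetical names; how DataForge loads its own data is not described in this README.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the data set was actually loaded.
static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Holds the data once it has been loaded; empty until first use.
static PROVINCES: OnceLock<Vec<&'static str>> = OnceLock::new();

/// Returns the province list, loading it on the first call only.
pub fn provinces() -> &'static [&'static str] {
    PROVINCES.get_or_init(|| {
        LOADS.fetch_add(1, Ordering::SeqCst);
        // Stand-in for reading and parsing a data file.
        "北京市\n上海市\n广东省".lines().collect()
    })
}

/// Number of loads performed so far (0 before first use, 1 afterwards).
pub fn load_count() -> usize {
    LOADS.load(Ordering::SeqCst)
}
```

Programs that never ask for province data never pay for loading it, and repeated calls share the same in-memory list.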

## Project Structure


```
dataforge/
├── src/
│   ├── core.rs              # Core engine
│   ├── generators/          # Data generators
│   ├── regions/             # Regional data
│   ├── filling/             # Database filling
│   ├── multithreading/      # Multi-threaded processing
│   ├── memory/              # Memory management
│   ├── customization/       # User customization
│   ├── generation/          # Data generation
│   ├── db/                  # Database related
│   │   └── schema.rs        # Schema parsing
│   ├── config.rs            # Configuration management
│   ├── rules/               # Rule engine
│   └── macros.rs            # Macro definitions
├── data/                    # External data files
├── tests/                   # Test files
└── doc/                     # Documentation
```

## 📚 Ecosystem

- `dataforge-faker`: Ruby Faker-compatible syntax
- `dataforge-sqlx`: Async database support via sqlx
- `dataforge-cli`: Command-line data generation tool

## License


This project is dual-licensed under MIT or Apache-2.0.

## Contributing


Issues and pull requests are welcome!