vcs2git 0.4.0 - Docs.rs

# Solution Proposals for Known Issues

## Critical Bug Fixes (High Priority)

### 1. Fix Logic Error in Disjoint Check
**Issue**: `check_disjoint` function has inverted logic in `src/main.rs:225`  
**Severity**: High (functional bug)

**Current Code**:
```rust
fn check_disjoint<T>(lset: &HashSet<T>, rset: &HashSet<T>) -> Result<()>
where
    T: Eq + Hash + Debug,
{
    if lset.is_disjoint(rset) {  // ← BUG: This is backwards
        let inter: Vec<_> = lset.intersection(rset).collect();
        bail!("Repositories cannot be selected and skipped at the same time: {inter:?}");
    }
    Ok(())
}
```

**Proposed Fix**:
```rust
fn check_disjoint<T>(lset: &HashSet<T>, rset: &HashSet<T>) -> Result<()>
where
    T: Eq + Hash + Debug,
{
    if !lset.is_disjoint(rset) {  // ← FIXED: Check if NOT disjoint
        let inter: Vec<_> = lset.intersection(rset).collect();
        bail!("Repositories cannot be selected and skipped at the same time: {inter:?}");
    }
    Ok(())
}
```

**Impact**: Fixes critical logic error that could allow invalid repository selections

### 2. Fix Misleading Error Message
**Issue**: Error message incorrectly states unsupported repository types are "supported"  
**Severity**: Low (cosmetic)

**Current Code**:
```rust
RepoType::Unknown(ty) => {
    bail!("Repository type '{ty}' is supported");  // ← BUG: Should be "not supported"
}
```

**Proposed Fix**:
```rust
RepoType::Unknown(ty) => {
    bail!("Repository type '{ty}' is not supported. Only 'git' repositories are currently supported.");
}
```

**Impact**: Clearer error messages reduce user confusion

## Enhanced Error Handling and Recovery

### 3. Implement Commit-Tracking Rollback Mechanism
**Issue**: Failed operations leave repository in partial state  
**Complexity**: Medium  
**Approach**: Track original submodule commits and restore on failure

**Design Overview**:
- Require clean repository state (validated in pre-flight checks)
- Track original commit SHA for each submodule before modifications
- On failure, restore all submodules to their original commits
- All-or-nothing approach: if any operation fails, restore everything

**Proposed Solution**:
```rust
/// Tracks original submodule states for rollback
#[derive(Debug)]
struct SubmoduleStateTracker {
    original_states: HashMap<String, SubmoduleState>,
}

#[derive(Debug, Clone)]
struct SubmoduleState {
    name: String,
    path: PathBuf,
    commit: git2::Oid,
    url: String,
}

impl SubmoduleStateTracker {
    /// Create tracker and capture current state of all submodules
    fn new(repo: &Repository) -> Result<Self> {
        let mut original_states = HashMap::new();
        
        for submodule in repo.submodules()? {
            let name = submodule.name()
                .ok_or_else(|| anyhow!("Submodule without name"))?
                .to_string();
            
            let state = SubmoduleState {
                name: name.clone(),
                path: submodule.path().to_path_buf(),
                commit: submodule.workdir_id()
                    .ok_or_else(|| anyhow!("Submodule {} has no workdir commit", name))?,
                url: submodule.url()
                    .ok_or_else(|| anyhow!("Submodule {} has no URL", name))?
                    .to_string(),
            };
            
            original_states.insert(name, state);
        }
        
        Ok(Self { original_states })
    }
    
    /// Restore all submodules to their original commits
    fn rollback(&self, repo: &Repository) -> Result<()> {
        println!("Rolling back submodule changes...");
        
        for (name, state) in &self.original_states {
            println!("  Restoring {} to commit {}", name, state.commit);
            
            let submodule = repo.find_submodule(&name)?;
            let sub_repo = submodule.open()?;
            
            // Checkout the original commit
            let obj = sub_repo.find_object(state.commit, None)?;
            sub_repo.checkout_tree(&obj, None)?;
            sub_repo.set_head_detached(state.commit)?;
            
            // Update the superproject's index
            repo.add_to_index(&state.path)?;
        }
        
        println!("Rollback complete. All submodules restored to original state.");
        Ok(())
    }
}

/// Process repositories with atomic rollback on failure
fn process_repositories_with_rollback(
    root_repo: &mut Repository,
    new_repos: Vec<(&Path, &Repo)>,
    update_repos: Vec<(&Path, (&str, &Repo))>,
    opts: &Opts,
) -> Result<()> {
    // Capture original state before any modifications
    let tracker = SubmoduleStateTracker::new(root_repo)?;
    
    // Track which operations we've completed
    let mut completed_new = Vec::new();
    let mut completed_updates = Vec::new();
    
    // Process new repositories
    for (path, info) in &new_repos {
        match add_submodule(root_repo, path, info, opts) {
            Ok(_) => {
                println!("✓ Added {}", path.display());
                completed_new.push(path);
            }
            Err(e) => {
                eprintln!("✗ Failed to add {}: {}", path.display(), e);
                
                // Rollback everything
                eprintln!("\nOperation failed. Rolling back all changes...");
                
                // Remove any newly added submodules
                for added_path in completed_new {
                    if let Err(e) = remove_submodule(root_repo, added_path) {
                        eprintln!("Warning: Failed to remove {}: {}", added_path.display(), e);
                    }
                }
                
                // Restore original states
                tracker.rollback(root_repo)?;
                
                return Err(anyhow!("Operation failed and was rolled back"));
            }
        }
    }
    
    // Process updates to existing submodules
    if opts.update {
        for (path, (name, info)) in &update_repos {
            match update_submodule(root_repo, name, info, opts) {
                Ok(_) => {
                    println!("✓ Updated {}", path.display());
                    completed_updates.push((path, name));
                }
                Err(e) => {
                    eprintln!("✗ Failed to update {}: {}", path.display(), e);
                    
                    // Rollback everything
                    eprintln!("\nOperation failed. Rolling back all changes...");
                    
                    // Remove newly added submodules
                    for added_path in completed_new {
                        if let Err(e) = remove_submodule(root_repo, added_path) {
                            eprintln!("Warning: Failed to remove {}: {}", added_path.display(), e);
                        }
                    }
                    
                    // Restore all original states (including partially updated ones)
                    tracker.rollback(root_repo)?;
                    
                    return Err(anyhow!("Operation failed and was rolled back"));
                }
            }
        }
    }
    
    println!("\nAll operations completed successfully!");
    Ok(())
}

/// Remove a submodule (for rollback of new additions)
fn remove_submodule(repo: &Repository, path: &Path) -> Result<()> {
    let path_str = path.to_string_lossy();
    
    // Use git commands for clean removal
    Command::new("git")
        .args(&["submodule", "deinit", "-f", &path_str])
        .status()?;
    
    Command::new("git")
        .args(&["rm", "-f", &path_str])
        .status()?;
    
    // Clean up .git/modules directory
    let modules_path = PathBuf::from(".git/modules").join(path);
    if modules_path.exists() {
        fs::remove_dir_all(&modules_path)?;
    }
    
    Ok(())
}

/// Add a new submodule
fn add_submodule(
    repo: &mut Repository,
    path: &Path,
    info: &Repo,
    opts: &Opts,
) -> Result<()> {
    let mut submod = repo.submodule(info.url.as_str(), path, true)?;
    let subrepo = submod.open()?;
    fetch(&subrepo, "origin", &info.version)?;
    checkout_to_version(&subrepo, &info.version, !opts.no_checkout)?;
    submod.add_finalize()?;
    Ok(())
}

/// Update an existing submodule
fn update_submodule(
    repo: &mut Repository,
    name: &str,
    info: &Repo,
    opts: &Opts,
) -> Result<()> {
    repo.submodule_set_url(name, info.url.as_str())?;
    let mut submod = repo.find_submodule(name)?;
    let subrepo = submod.open()?;
    fetch(&subrepo, "origin", &info.version)?;
    checkout_to_version(&subrepo, &info.version, !opts.no_checkout)?;
    submod.add_finalize()?;
    Ok(())
}
```

**Key Design Features**:
1. **Pre-operation State Capture**: Record all submodule commits before any changes
2. **Atomic Operations**: All succeed or all fail with complete rollback
3. **Clean State Requirement**: Repository must be clean before operations
4. **Comprehensive Rollback**: Restores exact commit SHAs, not just branches

**Usage in Main Flow**:
```rust
fn main() -> Result<()> {
    let opts = Opts::parse();
    let mut root_repo = Repository::open(".")?;
    
    // 1. Parse repos file
    let repos_list = parse_repos_file(&opts.repo_file)?;
    
    // 2. Validate clean state (includes submodule state validation)
    perform_preflight_checks(&opts, &repos_list.repositories, &root_repo)?;
    
    // 3. Categorize repositories
    let (new_repos, update_repos, _) = categorize_repositories(&opts, &repos_list, &root_repo)?;
    
    // 4. Process with rollback on failure
    process_repositories_with_rollback(&mut root_repo, new_repos, update_repos, &opts)?;
    
    Ok(())
}
```

**Advantages of This Approach**:
- **Data Safety**: Never loses user work due to clean state requirement
- **Predictable**: Either all operations succeed or none do
- **Git-Native**: Uses Git's own mechanisms for state management
- **Simple Recovery**: Users know exactly what state they're in after failure

### 4. Add Input Validation
**Issue**: Multiple validation gaps (duplicate submodule names, invalid paths, unclean submodule states)  
**Complexity**: Low

**Proposed Solution**:
```rust
fn validate_repositories(
    repos: &IndexMap<PathBuf, Repo>, 
    prefix: &Path,
    root_repo: &Repository,
) -> Result<()> {
    let mut seen_names = HashSet::new();
    let mut seen_paths = HashSet::new();
    
    for (path, repo) in repos {
        // Validate submodule name uniqueness
        let full_path = prefix.join(path);
        let name = full_path.to_string_lossy().to_string();
        
        if !seen_names.insert(name.clone()) {
            bail!("Duplicate submodule name: {}", name);
        }
        
        // Validate path uniqueness
        if !seen_paths.insert(full_path.clone()) {
            bail!("Duplicate submodule path: {}", full_path.display());
        }
        
        // Validate URL format
        if repo.url.scheme() != "git" && repo.url.scheme() != "ssh" && repo.url.scheme() != "https" {
            bail!("Invalid repository URL scheme: {}", repo.url);
        }
        
        // Validate path safety
        if path.is_absolute() {
            bail!("Repository path must be relative: {}", path.display());
        }
        
        if path.components().any(|c| c == std::path::Component::ParentDir) {
            bail!("Repository path cannot contain '..' components: {}", path.display());
        }
    }
    
    Ok(())
}

/// Validate that existing submodules are in a clean state
fn validate_submodule_states(root_repo: &Repository) -> Result<()> {
    for submodule in root_repo.submodules()? {
        let name = submodule.name()
            .ok_or_else(|| anyhow!("Submodule without name"))?;
        let path = submodule.path();
        
        // Check if submodule is initialized
        if submodule.workdir_id().is_none() {
            bail!(
                "Submodule '{}' at {} is not initialized. \
                Please run 'git submodule update --init' first.",
                name, path.display()
            );
        }
        
        // Open the submodule repository
        let sub_repo = match submodule.open() {
            Ok(repo) => repo,
            Err(_) => {
                bail!(
                    "Cannot open submodule '{}' at {}. \
                    It may be deinitialized or corrupted.",
                    name, path.display()
                );
            }
        };
        
        // Check for uncommitted changes
        let statuses = sub_repo.statuses(None)?;
        if !statuses.is_empty() {
            let modified_count = statuses.iter()
                .filter(|s| {
                    let flags = s.status();
                    flags.contains(git2::Status::WT_MODIFIED) ||
                    flags.contains(git2::Status::INDEX_MODIFIED) ||
                    flags.contains(git2::Status::WT_NEW) ||
                    flags.contains(git2::Status::INDEX_NEW)
                })
                .count();
            
            if modified_count > 0 {
                bail!(
                    "Submodule '{}' at {} has uncommitted changes. \
                    Please commit or stash changes before running vcs2git.",
                    name, path.display()
                );
            }
        }
        
        // Check if HEAD is detached (normal for submodules) but ensure it matches expected commit
        let head_oid = sub_repo.head()?.target()
            .ok_or_else(|| anyhow!("Submodule HEAD has no target"))?;
        let expected_oid = submodule.workdir_id()
            .ok_or_else(|| anyhow!("No workdir commit for submodule"))?;
        
        if head_oid != expected_oid {
            bail!(
                "Submodule '{}' at {} is checked out to a different commit than expected. \
                Expected: {}, Actual: {}. \
                Please run 'git submodule update' to synchronize.",
                name, path.display(), expected_oid, head_oid
            );
        }
    }
    
    Ok(())
}

/// Enhanced pre-flight validation
fn perform_preflight_checks(
    opts: &Opts,
    repos: &IndexMap<PathBuf, Repo>,
    root_repo: &Repository,
) -> Result<()> {
    // Validate repository definitions
    validate_repositories(repos, &opts.prefix, root_repo)?;
    
    // Validate existing submodule states
    println!("Checking existing submodule states...");
    validate_submodule_states(root_repo)?;
    
    // Additional check for the working tree of the main repository
    let statuses = root_repo.statuses(None)?;
    let has_staged_changes = statuses.iter().any(|s| {
        s.status().contains(git2::Status::INDEX_NEW) ||
        s.status().contains(git2::Status::INDEX_MODIFIED) ||
        s.status().contains(git2::Status::INDEX_DELETED)
    });
    
    if has_staged_changes {
        bail!(
            "The repository has staged changes. \
            Please commit or reset staged changes before running vcs2git."
        );
    }
    
    println!("All validation checks passed.");
    Ok(())
}
```

**Usage in Main Flow**:
```rust
fn main() -> Result<()> {
    let opts = Opts::parse();
    let root_repo = Repository::open(".")?;
    let repos_list = parse_repos_file(&opts.repo_file)?;
    
    // Perform all validation before any modifications
    perform_preflight_checks(&opts, &repos_list.repositories, &root_repo)?;
    
    // Continue with processing...
}
```

## Performance Improvements

### 5. Implement Parallel Processing
**Issue**: Sequential processing is slow for large repository lists  
**Complexity**: High

**Proposed Solution**:
```rust
use tokio::task::JoinSet;
use std::sync::Arc;

async fn process_repositories_parallel(
    repos: Vec<(PathBuf, &Repo)>,
    root_repo_path: PathBuf,
    max_concurrent: usize,
) -> Result<Vec<ProcessResult>> {
    let mut set = JoinSet::new();
    let semaphore = Arc::new(tokio::sync::Semaphore::new(max_concurrent));
    
    for (path, repo) in repos {
        let permit = Arc::clone(&semaphore);
        let root_path = root_repo_path.clone();
        let repo_clone = repo.clone();
        
        set.spawn(async move {
            let _permit = permit.acquire().await.unwrap();
            process_single_repository(path, repo_clone, root_path).await
        });
    }
    
    let mut results = Vec::new();
    while let Some(res) = set.join_next().await {
        results.push(res??);
    }
    
    Ok(results)
}

async fn process_single_repository(
    path: PathBuf,
    repo: Repo,
    root_repo_path: PathBuf,
) -> Result<ProcessResult> {
    // Spawn blocking task for Git operations
    tokio::task::spawn_blocking(move || {
        let mut root_repo = Repository::open(&root_repo_path)?;
        // ... existing Git operations ...
        Ok(ProcessResult::Success(path))
    }).await?
}
```

**CLI Integration**:
```rust
#[derive(Parser)]
struct Opts {
    // ... existing fields ...
    
    /// Maximum number of concurrent repository operations
    #[clap(long, default_value = "4")]
    pub max_concurrent: usize,
}
```

### 6. Add Configurable Timeouts
**Issue**: No timeouts for Git operations  
**Complexity**: Medium

**Proposed Solution**:
```rust
use std::time::Duration;

fn fetch_with_timeout(
    repo: &Repository,
    remote: &str,
    version: &str,
    timeout: Duration,
) -> Result<(), git2::Error> {
    // Set up timeout callback
    let mut cb = RemoteCallbacks::new();
    cb.credentials(|_url, username, _allowed_types| {
        Cred::ssh_key_from_agent(username.unwrap())
    });
    
    // Add progress callback with timeout check
    let start_time = std::time::Instant::now();
    cb.progress(move |progress| {
        if start_time.elapsed() > timeout {
            // Signal timeout - libgit2 will abort
            false
        } else {
            true
        }
    });
    
    let mut fetch_opts = FetchOptions::new();
    fetch_opts.remote_callbacks(cb);
    
    repo.find_remote(remote)?
        .fetch(&[version], Some(&mut fetch_opts), None)?;
    
    Ok(())
}
```

## Authentication Enhancements

### 7. Multiple Authentication Methods
**Issue**: Only SSH agent authentication supported  
**Complexity**: High

**Proposed Solution**:
```rust
#[derive(Debug, Clone)]
enum AuthMethod {
    SshAgent,
    SshKey { private_key: PathBuf, public_key: Option<PathBuf> },
    HttpsToken { token: String },
    HttpsUserPass { username: String, password: String },
}

impl AuthMethod {
    fn create_credential(&self, url: &str, username: Option<&str>) -> Result<Cred, git2::Error> {
        match self {
            AuthMethod::SshAgent => {
                Cred::ssh_key_from_agent(username.unwrap_or("git"))
            }
            AuthMethod::SshKey { private_key, public_key } => {
                let public_key = public_key.as_ref().map(|p| p.as_path());
                Cred::ssh_key(username.unwrap_or("git"), public_key, private_key, None)
            }
            AuthMethod::HttpsToken { token } => {
                Cred::userpass_plaintext(username.unwrap_or("git"), token)
            }
            AuthMethod::HttpsUserPass { username: user, password } => {
                Cred::userpass_plaintext(user, password)
            }
        }
    }
}

fn create_auth_callback(auth_methods: &[AuthMethod]) -> RemoteCallbacks {
    let mut cb = RemoteCallbacks::new();
    let methods = auth_methods.to_vec();
    
    cb.credentials(move |url, username, allowed_types| {
        for method in &methods {
            if let Ok(cred) = method.create_credential(url, username) {
                return Ok(cred);
            }
        }
        Err(git2::Error::from_str("No valid authentication method found"))
    });
    
    cb
}
```

**CLI Integration**:
```rust
#[derive(Parser)]
struct Opts {
    // ... existing fields ...
    
    /// SSH private key file for authentication
    #[clap(long)]
    pub ssh_key: Option<PathBuf>,
    
    /// Personal access token for HTTPS authentication
    #[clap(long)]
    pub token: Option<String>,
    
    /// Username for HTTPS authentication
    #[clap(long)]
    pub username: Option<String>,
    
    /// Password for HTTPS authentication (prompt if not provided)
    #[clap(long)]
    pub password: Option<String>,
}
```

## User Experience Improvements

### 8. Add Progress Reporting
**Issue**: No feedback during long-running operations  
**Complexity**: Low

**Proposed Solution**:
```rust
use indicatif::{ProgressBar, ProgressStyle};

fn create_progress_bar(total: usize) -> ProgressBar {
    let pb = ProgressBar::new(total as u64);
    pb.set_style(
        ProgressStyle::default_bar()
            .template("[{elapsed_precise}] {bar:40.cyan/blue} {pos}/{len} {msg}")
            .unwrap()
            .progress_chars("##-"),
    );
    pb
}

fn process_repositories_with_progress(repos: Vec<(PathBuf, &Repo)>) -> Result<()> {
    let pb = create_progress_bar(repos.len());
    
    for (i, (path, repo)) in repos.iter().enumerate() {
        pb.set_message(format!("Processing {}", path.display()));
        
        // Process repository...
        match process_repository(path, repo) {
            Ok(_) => pb.println(format!("✓ Added {}", path.display())),
            Err(e) => pb.println(format!("✗ Failed {}: {}", path.display(), e)),
        }
        
        pb.inc(1);
    }
    
    pb.finish_with_message("Complete!");
    Ok(())
}
```

### 9. Add Dry-Run Mode
**Issue**: No way to preview operations  
**Complexity**: Low

**Proposed Solution**:
```rust
#[derive(Parser)]
struct Opts {
    // ... existing fields ...
    
    /// Show what would be done without making changes
    #[clap(long)]
    pub dry_run: bool,
}

fn main() -> Result<()> {
    let opts = Opts::parse();
    
    if opts.dry_run {
        println!("DRY RUN MODE - No changes will be made");
        return dry_run_analysis(&opts);
    }
    
    // ... normal processing ...
}

fn dry_run_analysis(opts: &Opts) -> Result<()> {
    // Parse and validate without making changes
    let repos_list: ReposFile = parse_repos_file(&opts.repo_file)?;
    let selected_repos = filter_repositories(&repos_list, opts)?;
    
    println!("Would process {} repositories:", selected_repos.len());
    for (path, repo) in selected_repos {
        println!("  + {} ({}@{})", path.display(), repo.url, repo.version);
    }
    
    Ok(())
}
```

## Testing and Quality Assurance

### 10. Add Comprehensive Test Suite
**Issue**: Limited test coverage  
**Complexity**: Medium

**Proposed Solution**:
```rust
#[cfg(test)]
mod tests {
    use super::*;
    use tempfile::TempDir;
    
    #[test]
    fn test_check_disjoint_logic() {
        let set1: HashSet<i32> = [1, 2, 3].iter().cloned().collect();
        let set2: HashSet<i32> = [3, 4, 5].iter().cloned().collect();
        
        // Should fail because sets are not disjoint
        assert!(check_disjoint(&set1, &set2).is_err());
        
        let set3: HashSet<i32> = [6, 7, 8].iter().cloned().collect();
        
        // Should succeed because sets are disjoint
        assert!(check_disjoint(&set1, &set3).is_ok());
    }
    
    #[test]
    fn test_repository_validation() {
        let mut repos = IndexMap::new();
        repos.insert(
            PathBuf::from("test/repo1"),
            Repo {
                r#type: RepoType::Git,
                url: "https://github.com/test/repo1".parse().unwrap(),
                version: "main".to_string(),
            },
        );
        
        assert!(validate_repositories(&repos, &PathBuf::from("src")).is_ok());
    }
    
    #[test]
    fn test_integration_with_temp_repo() {
        let temp_dir = TempDir::new().unwrap();
        let repo_path = temp_dir.path().join("test_repo");
        
        // Create temporary git repository
        let repo = Repository::init(&repo_path).unwrap();
        
        // Test operations...
    }
}
```

## Configuration and Flexibility

### 11. Add Configuration File Support
**Issue**: No persistent configuration options  
**Complexity**: Medium

**Proposed Solution**:
```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
struct Config {
    pub max_concurrent: usize,
    pub timeout_seconds: u64,
    pub default_auth_method: String,
    pub ssh_key_path: Option<PathBuf>,
    pub progress_bar: bool,
    pub default_prefix: Option<PathBuf>,
}

impl Default for Config {
    fn default() -> Self {
        Self {
            max_concurrent: 4,
            timeout_seconds: 300,
            default_auth_method: "ssh-agent".to_string(),
            ssh_key_path: None,
            progress_bar: true,
            default_prefix: None,
        }
    }
}

fn load_config() -> Result<Config> {
    let config_path = dirs::config_dir()
        .ok_or_else(|| anyhow!("Could not find config directory"))?
        .join("vcs2git")
        .join("config.toml");
    
    if config_path.exists() {
        let content = fs::read_to_string(&config_path)?;
        Ok(toml::from_str(&content)?)
    } else {
        Ok(Config::default())
    }
}
```

## Implementation Priority Matrix

| Issue                    | Priority | Complexity | Impact | Timeline  |
|--------------------------|----------|------------|--------|-----------|
| Fix disjoint check logic | Critical | Low        | High   | Immediate |
| Fix error messages       | High     | Low        | Medium | v0.3.1    |
| Add input validation     | High     | Low        | High   | v0.3.1    |
| Add progress reporting   | Medium   | Low        | High   | v0.4.0    |
| Add dry-run mode         | Medium   | Low        | Medium | v0.4.0    |
| Implement rollback       | Medium   | Medium     | High   | v0.4.0    |
| Add parallel processing  | Medium   | High       | High   | v0.4.0    |
| Multiple auth methods    | Low      | High       | Medium | v0.5.0    |
| Configuration file       | Low      | Medium     | Low    | v0.5.0    |
| Comprehensive tests      | High     | Medium     | High   | Ongoing   |

## Recommended Implementation Order

1. **Immediate (v0.3.1)**: Critical bug fixes and validation
2. **Short-term (v0.4.0)**: UX improvements and basic parallelization
3. **Medium-term (v0.5.0)**: Advanced features and configuration
4. **Long-term (v1.0.0)**: Comprehensive testing and polish