Crate helia_unixfs

§Helia UnixFS

A Rust implementation of the IPFS UnixFS filesystem, providing file and directory operations with content-addressed storage.

§Overview

UnixFS is a protobuf-based format for representing files and directories on IPFS. This crate provides a high-level interface for:

  • File Operations: Store and retrieve files with automatic chunking for large files
  • Directory Operations: Create, modify, and traverse directory structures
  • Metadata Support: Unix-style permissions (mode) and modification times (mtime)
  • Content Addressing: All operations return CIDs (Content Identifiers)
  • Efficient Chunking: Automatic chunking for files larger than the configured chunk size (256KB by default)

§Core Concepts

§Content Addressing

Every file and directory is identified by a CID, ensuring:

  • Immutability: Content cannot be changed without changing the CID
  • Deduplication: Identical content has the same CID
  • Verification: Content can be verified against its CID
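These properties can be illustrated with a plain hash function. This is only a sketch: real CIDs are built from a codec identifier plus a cryptographic multihash (typically SHA-256), not Rust's `DefaultHasher`, and `pseudo_cid` is a hypothetical name, not part of this crate's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for CID computation: any deterministic hash of the content.
// Real CIDs combine codec metadata with a cryptographic multihash.
fn pseudo_cid(content: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = pseudo_cid(b"Hello, IPFS!");
    let b = pseudo_cid(b"Hello, IPFS!");
    let c = pseudo_cid(b"Hello, IPFS?");

    assert_eq!(a, b); // identical content -> identical identifier (deduplication)
    assert_ne!(a, c); // changed content -> different identifier (immutability)
    println!("a = {a:016x}, c = {c:016x}");
}
```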

§DAG-PB vs Raw Blocks

  • Small files (<256KB): Can be stored as either DAG-PB or raw blocks
  • Large files (>256KB): Automatically chunked and stored as DAG-PB with links
  • Directories: Always stored as DAG-PB with links to entries

§Chunking Strategy

Large files are split into chunks for efficient storage and retrieval:

  • Default chunk size: 262,144 bytes (256KB)
  • Configurable: Set chunk_size in AddOptions
  • Merkle DAG: Chunks are organized in a balanced tree structure
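The chunk arithmetic above can be sketched directly: a file splits into ceiling(file_size / chunk_size) leaf blocks, which the balanced tree then links together. The `chunk_count` helper below is illustrative only, not part of the crate's API.

```rust
// Number of leaf chunks a file splits into for a given chunk size
// (ceiling division).
fn chunk_count(file_size: u64, chunk_size: u64) -> u64 {
    file_size.div_ceil(chunk_size)
}

fn main() {
    const DEFAULT_CHUNK: u64 = 262_144; // 256KB default chunk size

    // A 5MB file under the default chunk size:
    assert_eq!(chunk_count(5_000_000, DEFAULT_CHUNK), 20);

    // The same file with 512KB chunks, as set via AddOptions.chunk_size:
    assert_eq!(chunk_count(5_000_000, 524_288), 10);

    // Files at or below the chunk size stay in a single block:
    assert_eq!(chunk_count(200_000, DEFAULT_CHUNK), 1);
}
```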

§Usage Examples

§Basic File Operations

use std::sync::Arc;
use rust_helia::create_helia_default;
use helia_unixfs::{UnixFS, UnixFSInterface, AddOptions};
use bytes::Bytes;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a Helia node
    let helia = create_helia_default().await?;
    let fs = UnixFS::new(Arc::new(helia));
     
    // Add a small file
    let data = Bytes::from("Hello, IPFS!");
    let cid = fs.add_bytes(data, None).await?;
    println!("File CID: {}", cid);
     
    // Read the file back
    let content = fs.cat(&cid, None).await?;
    println!("Content: {:?}", content);
     
    // Add with options
    let data2 = Bytes::from("Important data");
    let cid2 = fs.add_bytes(data2, Some(AddOptions {
        pin: true,  // Pin for persistence
        raw_leaves: true,  // Use raw blocks for leaves
        ..Default::default()
    })).await?;
     
    Ok(())
}

§File with Metadata

// Create file with Unix permissions and timestamp
use helia_unixfs::{FileCandidate, UnixFSTime};

let file = FileCandidate {
    path: "document.txt".to_string(),
    content: Bytes::from("Important document"),
    mode: Some(0o644),  // rw-r--r--
    mtime: Some(UnixFSTime::now()),
};

let cid = fs.add_file(file, None).await?;

§Directory Operations

// Create an empty directory
let dir_cid = fs.add_directory(None, None).await?;

// Add a file to the directory
let file_data = Bytes::from("README content");
let file_cid = fs.add_bytes(file_data, None).await?;
let updated_dir = fs.cp(&file_cid, &dir_cid, "README.md", None).await?;

// Create a subdirectory
use helia_unixfs::MkdirOptions;
let dir_with_subdir = fs.mkdir(
    &updated_dir,
    "docs",
    Some(MkdirOptions {
        mode: Some(0o755),  // rwxr-xr-x
        ..Default::default()
    })
).await?;

// List directory contents
let entries = fs.ls(&dir_with_subdir, None).await?;
// Iterate through entries...

§Large File Handling

// Large files are automatically chunked
let large_data = Bytes::from(vec![0u8; 5_000_000]); // 5MB

let cid = fs.add_bytes(large_data, Some(AddOptions {
    chunk_size: Some(524_288), // 512KB chunks
    ..Default::default()
})).await?;

// Read with offset and length for efficient partial reads
use helia_unixfs::CatOptions;
let partial = fs.cat(&cid, Some(CatOptions {
    offset: Some(1_000_000),  // Start at 1MB
    length: Some(100_000),     // Read 100KB
})).await?;

§Working with Statistics

use helia_unixfs::{UnixFSStat, FileStat, DirectoryStat};

let stats = fs.stat(&cid, None).await?;

match stats {
    UnixFSStat::File(file_stats) => {
        println!("File size: {} bytes", file_stats.size);
        println!("Blocks: {}", file_stats.blocks);
        if let Some(mode) = file_stats.mode {
            println!("Mode: {:o}", mode);
        }
    }
    UnixFSStat::Directory(dir_stats) => {
        println!("Directory with {} entries", dir_stats.entries);
        println!("Total size: {} bytes", dir_stats.size);
    }
}

§Performance Characteristics

§File Size Guidelines

  • < 256KB: Single block, fast add/retrieve
  • > 256KB: Chunked into blocks of the default 256KB chunk size and linked via DAG-PB
  • Very large (>100MB): Efficient streaming with a balanced Merkle tree

§Memory Usage

  • Small files: Loaded entirely into memory
  • Large files: Chunked streaming, constant memory usage
  • Directories: Efficient lazy evaluation of entries

§Operation Complexity

  • add_bytes(): O(n) where n = file size
  • cat(): O(n) where n = bytes read
  • ls(): O(m) where m = number of entries
  • cp(): O(m) where m = directory size
  • stat(): O(1) - constant time

§Thread Safety

All UnixFS operations are thread-safe:

  • Uses Arc<dyn Helia> for shared access
  • All methods use &self (immutable borrow)
  • Safe to share UnixFS instance across threads
  • Concurrent operations are supported
For example, a single UnixFS instance can be shared across tasks:

let helia = create_helia_default().await?;
let fs = Arc::new(UnixFS::new(Arc::new(helia)));

// Clone and use in multiple tasks
let fs1 = Arc::clone(&fs);
let fs2 = Arc::clone(&fs);

tokio::spawn(async move {
    // Use fs1 in this task
});

tokio::spawn(async move {
    // Use fs2 in this task
});

§Error Handling

All operations return Result<T, UnixFSError>:

use helia_unixfs::UnixFSError;

match fs.cat(&cid, None).await {
    Ok(data) => println!("Read {} bytes", data.len()),
    Err(UnixFSError::NotAFile { cid }) => {
        println!("Not a file: {}", cid);
    }
    Err(UnixFSError::NotUnixFS { cid }) => {
        println!("Not a UnixFS node: {}", cid);
    }
    Err(e) => println!("Error: {}", e),
}

§Limitations

§Current Limitations

  • Symlinks: Not yet implemented (operations return an error)
  • HAMTs: Large directories (>10,000 entries) are not yet sharded or optimized
  • Inline CIDs: Very small files are not inlined into their parent blocks
  • Trickle DAG: Only the balanced DAG structure is supported

§Future Enhancements

  • Support for UnixFS v2 features
  • HAMT-sharded directories for very large directories
  • Trickle DAG option for better streaming
  • More compression options

§Compatibility

This implementation is compatible with:

  • go-ipfs/Kubo: Full compatibility with standard IPFS nodes
  • js-ipfs: Compatible with JavaScript IPFS implementations
  • @helia/unixfs: Compatible with TypeScript Helia implementation

§Examples Directory

See the examples/ directory for more usage examples:

  • 01_simple_file.rs - Basic file operations
  • 02_large_file.rs - Chunked file handling
  • 03_directories.rs - Directory operations
  • 04_metadata.rs - Working with permissions and times

Re-exports§

pub use chunker::*;
pub use dag_pb::*;
pub use errors::*;
pub use unixfs::*;

Modules§

chunker
dag_pb
data
Nested message and enum types in Data.
errors
UnixFS-specific error types
unixfs
UnixFS implementation for Helia
unixfs_pb

Structs§

AddOptions
Options for adding content
CatOptions
Options for reading content
CpOptions
Options for copying content
Data
UnixFS data structure
DirectoryCandidate
Directory candidate for adding to UnixFS
DirectoryStat
Directory statistics
FileCandidate
File candidate for adding to UnixFS
FileStat
File statistics
LsOptions
Options for listing directory contents
Metadata
Metadata for files/directories
MkdirOptions
Options for making directories
RmOptions
Options for removing content
StatOptions
Options for file/directory statistics
UnixFSEntry
UnixFS directory entry
UnixFSTime
UnixFS timestamp
UnixTime
Unix timestamp

Enums§

UnixFSStat
Union type for file and directory statistics
UnixFSType
UnixFS entry types

Traits§

UnixFSInterface
Main UnixFS interface trait

Functions§

create_unixfs
Create a UnixFS instance from a Helia node