sfid : Distributed Snowflake ID Generator with Auto-Allocated Machine ID

Features
Installation
Quick Start
API Reference
ID Structure
Architecture
Machine ID Allocation
Tech Stack
Project Structure
Why "Process ID" Instead of "Machine ID"?
History

Features

Lock-free atomic ID generation
Redis-based automatic machine ID allocation
Heartbeat mechanism with auto-release on crash
Tolerance of backward clock jumps (sequence borrowing)
Sequence exhaustion handling (timestamp advance)
Configurable epoch
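The lock-free generation, clock-rollback tolerance and sequence-exhaustion handling listed above can be illustrated with a single `AtomicU64` updated by a compare-and-swap loop. This is a minimal sketch of one plausible approach, not sfid's actual code. The `Generator` type and `next_with` method are hypothetical. The current time is passed in as a millisecond offset from the epoch so the behavior is easy to test. The real crate presumably reads the clock itself, since coarsetime is in its tech stack.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const SEQ_BITS: u32 = 12;
const PID_BITS: u32 = 10;
const SEQ_MASK: u64 = (1 << SEQ_BITS) - 1;

/// Hypothetical generator: `state` packs (last timestamp offset << 12) | last sequence.
struct Generator {
    state: AtomicU64,
    pid: u64,
}

impl Generator {
    fn new(pid: u64) -> Self {
        assert!(pid < (1 << PID_BITS), "pid must fit in 10 bits");
        Generator { state: AtomicU64::new(0), pid }
    }

    /// `now_ms` is the current time as an offset from the epoch (passed in for testability).
    fn next_with(&self, now_ms: u64) -> u64 {
        let mut cur = self.state.load(Ordering::Relaxed);
        loop {
            let last_ts = cur >> SEQ_BITS;
            let last_seq = cur & SEQ_MASK;
            let next = if now_ms > last_ts {
                // Clock moved forward: start a new millisecond at sequence 0
                now_ms << SEQ_BITS
            } else if last_seq < SEQ_MASK {
                // Same millisecond, or the clock went backwards: borrow the next sequence on last_ts
                cur + 1
            } else {
                // Sequence exhausted: advance the timestamp by 1 ms
                (last_ts + 1) << SEQ_BITS
            };
            match self.state.compare_exchange_weak(cur, next, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => {
                    let ts = next >> SEQ_BITS;
                    let seq = next & SEQ_MASK;
                    return (ts << (PID_BITS + SEQ_BITS)) | (self.pid << SEQ_BITS) | seq;
                }
                // Another thread won the race (or a spurious failure): retry with the fresh state
                Err(actual) => cur = actual,
            }
        }
    }
}
```

Because timestamp and sequence share one word, a single successful CAS publishes both, so no lock is needed. When the clock goes backwards, the sketch keeps issuing IDs on the last timestamp it saw instead of failing. The sketch does not enforce the 41-bit timestamp limit.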

Installation

cargo add sfid

With specific features:

cargo add sfid -F snowflake,auto_pid,parse

Quick Start

Manual Machine ID

use sfid::{Snowflake, EPOCH};

fn main() {
  // Machine ID 1 is chosen by hand; each running process needs a distinct ID (0-1023)
  let sf = Snowflake::new(EPOCH, 1);
  let id = sf.next();
  println!("{id}");
}

Auto-Allocated Machine ID (Redis)

use sfid::{Snowflake, EPOCH};

#[tokio::main]
async fn main() -> sfid::Result<()> {
  let sf = Snowflake::auto("myapp", EPOCH).await?;
  let id = sf.next();
  println!("{id}");
  Ok(())
}

Parse ID

use sfid::{parse, Snowflake, EPOCH};

fn main() {
  let id = Snowflake::new(EPOCH, 1).next();
  let parsed = parse(id);
  println!("ts: {}, pid: {}, seq: {}", parsed.ts, parsed.pid, parsed.seq);
}

API Reference

Constants

Name	Type	Description
`EPOCH`	`u64`	Default epoch: 2025-12-22 00:00:00 UTC
`MAX_PID`	`u32`	Number of machine IDs (1024, so valid IDs are 0-1023)
`PID_BITS`	`u32`	Machine ID bits (10)

Structs

`Snowflake`

ID generator with atomic state.

Method	Description
`new(epoch, pid)`	Create with manual machine ID
`auto(app, epoch)`	Create with Redis-allocated machine ID
`next()`	Generate next ID

`Pid`

Machine ID handle with heartbeat. Stops heartbeat on drop.

Method	Description
`id()`	Get allocated machine ID

`ParsedId`

Parsed ID components.

Field	Type	Description
`ts`	`u64`	Timestamp offset from epoch (ms)
`pid`	`u16`	Machine ID
`seq`	`u16`	Sequence number

Functions

Name	Description
`allocate(app)`	Allocate machine ID from Redis
`parse(id)`	Parse ID into components

ID Structure

64-bit signed integer:

┌───────┬──────────────────────────┬────────────┬──────────────┐
│ 1 bit │        41 bits           │  10 bits   │   12 bits    │
│ sign  │     timestamp (ms)       │ machine ID │   sequence   │
│  (0)  │   (offset from epoch)    │  (0-1023)  │   (0-4095)   │
└───────┴──────────────────────────┴────────────┴──────────────┘

Timestamp: ~69 years from epoch
Machine ID: 1024 instances
Sequence: 4096 IDs per millisecond per instance
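The layout can be checked with a few shifts and masks. The sketch below is minimal and assumes the field widths in the diagram. `compose`, `decompose`, `to_unix_ms` and `EXAMPLE_EPOCH_MS` are hypothetical helpers, not sfid's API; use `parse` for real decoding. `EXAMPLE_EPOCH_MS` assumes the default epoch (2025-12-22 00:00:00 UTC) is expressed in Unix milliseconds.

```rust
// Bit widths from the layout above (1 sign + 41 + 10 + 12 = 64)
const TS_BITS: u32 = 41;
const PID_BITS: u32 = 10;
const SEQ_BITS: u32 = 12;

const TS_MASK: u64 = (1 << TS_BITS) - 1;
const PID_MASK: u64 = (1 << PID_BITS) - 1; // 1023
const SEQ_MASK: u64 = (1 << SEQ_BITS) - 1; // 4095

// Assumption: 2025-12-22T00:00:00Z as Unix milliseconds
const EXAMPLE_EPOCH_MS: u64 = 1_766_361_600_000;

/// Packs the three fields into a non-negative i64 (the sign bit stays 0).
fn compose(ts_offset_ms: u64, pid: u64, seq: u64) -> i64 {
    assert!(ts_offset_ms <= TS_MASK && pid <= PID_MASK && seq <= SEQ_MASK);
    ((ts_offset_ms << (PID_BITS + SEQ_BITS)) | (pid << SEQ_BITS) | seq) as i64
}

/// Splits an ID into (timestamp offset in ms, machine ID, sequence).
fn decompose(id: i64) -> (u64, u64, u64) {
    let raw = id as u64;
    (raw >> (PID_BITS + SEQ_BITS), (raw >> SEQ_BITS) & PID_MASK, raw & SEQ_MASK)
}

/// Converts an ID's timestamp field back to Unix milliseconds.
fn to_unix_ms(id: i64, epoch_ms: u64) -> u64 {
    decompose(id).0 + epoch_ms
}

/// Years covered by a 41-bit millisecond counter: 2^41 ms is roughly 69.7 years.
fn timestamp_span_years() -> f64 {
    (1u64 << TS_BITS) as f64 / (365.25 * 24.0 * 3600.0 * 1000.0)
}
```

Setting every field to its maximum yields exactly `i64::MAX`, which confirms that the 63 payload bits never touch the sign bit.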

Architecture

graph TD
  A[Application] --> B[Snowflake]
  B --> C{auto_pid?}
  C -->|Yes| D[allocate]
  D --> E[Redis]
  E --> F[Pid + Heartbeat]
  F --> B
  C -->|No| G[Manual PID]
  G --> B
  B --> H[next]
  H --> I[Atomic State]
  I --> J[ID]

Machine ID Allocation

Flow

graph TD
  A[Start] --> B[Generate random start]
  B --> C[Try SET NX key]
  C --> D{Success?}
  D -->|Yes| E[Start heartbeat]
  D -->|No| F{Already owned?}
  F -->|Yes| E
  F -->|No| G[Try next ID]
  G --> C
  E --> H[Return Pid]
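The flow above is a linear probe over the 1024 IDs that begins at a random offset. The sketch below is minimal and std-only. An in-memory `MockRedis` stands in for Redis and mimics only `SET key value NX` and `GET`. The owner token is any string; since uuid is in the tech stack, sfid may use a UUID, but that is an assumption. The start offset is a parameter rather than random so the result is deterministic. Unlike the diagram, the loop is bounded and returns `None` once every ID is taken. `MockRedis`, `set_nx` and `allocate` here are hypothetical and are not sfid's `allocate(app)`.

```rust
use std::collections::HashMap;

const MAX_PID: u16 = 1024;

/// Hypothetical in-memory stand-in for Redis: maps a pid to the token of the process holding it.
struct MockRedis {
    keys: HashMap<u16, String>,
}

impl MockRedis {
    fn new() -> Self {
        MockRedis { keys: HashMap::new() }
    }

    /// Mimics `SET key value NX`: writes only if the key is absent and reports whether it did.
    fn set_nx(&mut self, pid: u16, token: &str) -> bool {
        if self.keys.contains_key(&pid) {
            return false;
        }
        self.keys.insert(pid, token.to_owned());
        true
    }

    fn get(&self, pid: u16) -> Option<&str> {
        self.keys.get(&pid).map(String::as_str)
    }
}

/// Probes pids from `start`, wrapping at MAX_PID. Returns the first pid that was free
/// (and is now claimed) or is already held by `token`; None if all 1024 are taken.
fn allocate(redis: &mut MockRedis, token: &str, start: u16) -> Option<u16> {
    for step in 0..MAX_PID {
        let pid = (start % MAX_PID + step) % MAX_PID;
        // "Success?" branch, then the "Already owned?" branch from the flowchart
        if redis.set_nx(pid, token) || redis.get(pid) == Some(token) {
            return Some(pid);
        }
    }
    None
}
```

Random start points spread concurrent allocators across the ID space, so they rarely collide on the same key at startup.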

Redis Key Format

sfid:{app}:{pid_le_bytes}
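The format leaves the byte width of the pid implicit. The sketch below assumes a `u16` encoded little-endian. It builds the key as raw bytes, which works because Redis keys are binary-safe. `pid_key` is a hypothetical helper, not part of sfid.

```rust
/// Hypothetical helper: builds `sfid:{app}:{pid_le_bytes}`, assuming the pid is a u16.
fn pid_key(app: &str, pid: u16) -> Vec<u8> {
    let mut key = Vec::with_capacity(5 + app.len() + 1 + 2);
    key.extend_from_slice(b"sfid:");
    key.extend_from_slice(app.as_bytes());
    key.push(b':');
    // Little-endian: least significant byte first
    key.extend_from_slice(&pid.to_le_bytes());
    key
}
```

For example, pid 258 (`0x0102`) ends the key with the bytes `0x02 0x01`.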

Heartbeat

Interval: 3 minutes
Expiration: 10 minutes
Auto-release: dropping the `Pid` handle (Drop trait) stops the heartbeat, and if the process crashes, its key expires within 10 minutes
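The "stops heartbeat on drop" behavior of `Pid` can be sketched with a background worker that refreshes a lease on a fixed interval and shuts down when its handle is dropped. This is a minimal illustration, not sfid's implementation. It uses a std thread and channel instead of tokio. The `refresh` closure is a hypothetical stand-in for the Redis call that resets the 10-minute expiry, and `Heartbeat` is a hypothetical type.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle: calls `refresh` every `interval` until dropped.
struct Heartbeat {
    stop: Option<Sender<()>>,
    worker: Option<JoinHandle<()>>,
}

impl Heartbeat {
    fn start<F>(interval: Duration, mut refresh: F) -> Self
    where
        F: FnMut() + Send + 'static,
    {
        let (tx, rx) = mpsc::channel::<()>();
        let worker = thread::spawn(move || loop {
            match rx.recv_timeout(interval) {
                // No stop signal within the interval: renew the lease
                Err(RecvTimeoutError::Timeout) => refresh(),
                // Sender dropped (or a message sent): stop beating
                _ => break,
            }
        });
        Heartbeat { stop: Some(tx), worker: Some(worker) }
    }
}

impl Drop for Heartbeat {
    fn drop(&mut self) {
        // Dropping the sender disconnects the channel and wakes the worker
        drop(self.stop.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

With a 3-minute interval and a 10-minute expiry, a key survives two consecutive missed heartbeats. A third miss lets it expire, which is how a crashed process's ID is eventually freed.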

Tech Stack

Crate	Purpose
coarsetime	Fast timestamp retrieval
fred	Redis client
tokio	Async runtime
uuid	Unique identifier generation
thiserror	Error handling

Project Structure

sfid/
├── src/
│   ├── lib.rs       # Module exports
│   ├── snowflake.rs # ID generator
│   ├── pid.rs       # Machine ID allocation
│   ├── bits.rs      # Bit constants
│   ├── parse.rs     # ID parsing
│   └── error.rs     # Error types
├── tests/
│   └── main.rs      # Integration tests
└── Cargo.toml

Why "Process ID" Instead of "Machine ID"?

Traditional Snowflake implementations use "machine ID" or "worker ID", assuming one generator per physical machine. This assumption breaks in modern deployments:

Containers: Multiple instances on same host
Kubernetes: Pods scale dynamically
Serverless: No persistent machine identity
Microservices: Multiple services per node

"Process ID" (pid) better reflects reality: each running process needs a unique identifier, regardless of its physical location. This naming:

Avoids confusion with OS-level machine identifiers
Accurately describes the allocation granularity
Works naturally with container orchestration
Supports multiple generators per host

The 10-bit limit (1024) applies to concurrent processes, not machines.

History

In 2010, Twitter faced a scaling crisis. Their MySQL-based ID generation couldn't keep up with the explosive growth of tweets. The auto-increment approach required coordination between database shards, creating bottlenecks and single points of failure. They were migrating to Cassandra and sharded MySQL (using Gizzard), neither of which provided built-in unique ID generation.

Twitter's requirements were demanding: tens of thousands of IDs per second, high availability, rough time-ordering (tweets posted around the same time should have proximate IDs), and everything had to fit in 64 bits. They evaluated MySQL ticket servers (like Flickr's), UUIDs (which need 128 bits), and ZooKeeper sequential nodes (whose coordination overhead hurt availability).

In June 2010, Twitter announced Snowflake, an uncoordinated approach that composes a timestamp, a worker number, and a sequence number. Worker numbers were assigned via ZooKeeper at startup. The implementation went live in October 2010.

The bit allocation was carefully chosen:

41 bits for timestamp: ~69 years of operation
10 bits for machine ID: 1024 concurrent generators (Twitter splits this into 5-bit datacenter ID + 5-bit worker ID)
12 bits for sequence: 4096 IDs per millisecond per generator

Twitter open-sourced Snowflake in Scala, and the design spread rapidly:

Discord adopted it in 2015 (epoch: 2015-01-01)
Instagram modified the format: 41-bit timestamp, 13-bit shard ID, 10-bit sequence
Mastodon uses 48-bit timestamp (UNIX epoch) + 16-bit sequence
Sony's Sonyflake adjusted bit allocation for longer lifespan

The name "Snowflake" captures the essence: like snowflakes in nature, each ID is unique, yet they all share the same elegant structure. Today, Snowflake-style IDs are ubiquitous in distributed systems — from social media to databases to message queues.

About

This project is an open-source component of js0.site ⋅ Refactoring the Internet Plan.

We are redefining the development paradigm of the Internet in a componentized way. Welcome to follow us:

sfid 0.1.1
