cipher_identifier
A Rust library and CLI tool for identifying classical ciphers based on statistical analysis of ciphertext. It calculates various statistical metrics and compares them against known patterns for different cipher types to determine the most likely cipher used.
Features
- Analyzes ciphertext using multiple statistical tests
- Identifies the most likely cipher from 58 different classical cipher types
- Provides detailed statistical information about the ciphertext
- Command-line interface for easy use
- Can be used as a library in other Rust projects
Supported Ciphers
The tool can identify the following 58 classical cipher types:
Cipher Types | | | |
---|---|---|---|
6x6bifid | 6x6playfair | Autokey | Bazeries |
Beaufort | CONDI | Grandpre | Grandpre10x10 |
Gromark | NihilistSub6x6 | Patristocrat | Quagmire I |
Quagmire II | Quagmire III | Quagmire IV | Slidefair |
Swagman | Variant | Vigenere | amsco |
bifid | cadenus | checkerboard | cmBifid |
columnar | compressocrat | digrafid | foursquare |
fractionatedMorse | grille | homophonic | keyphrase |
monomeDinome | morbit | myszkowski | nicodemus |
nihilistSub | nihilistTramp | numberedKey | periodicGromark |
phillips | playfair | pollux | porta |
portax | progressiveKey | ragbaby | redefence |
routeTramp | runningKey | sequenceTramp | seriatedPlayfair |
simplesubstitution | syllabary | tridigital | trifid |
trisquare | twosquare |
Installation
Prerequisites
- Rust and Cargo (install from rustup.rs)
Building from Source
# Clone the repository
# Build the project
cargo build --release
# The binary will be available at target/release/cipher_identifier
Usage
Command Line Interface
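# Analyze ciphertext provided directly
cipher_identifier --text "<ciphertext>"
# Analyze ciphertext from a file
cipher_identifier --file <path-to-file>
# Highlight a specific cipher in the results
cipher_identifier --text "<ciphertext>" --cipher <cipher-name>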
Command Line Options
- --text, -t: The ciphertext to analyze
- --file, -f: Input file containing ciphertext
- --number, -n: The top n most likely ciphers to display (default: 5)
- --cipher, -c: Highlight a specific cipher in the list
- --verbose, -v: Increase verbosity level
- --help, -h: Display help information
API Documentation
Library Overview
The cipher_identifier library provides an API for analyzing ciphertext and identifying classical ciphers. It can be integrated into other Rust projects to add cipher identification capabilities.
Key Components
- Cipher Identification: Core functionality to identify the most likely cipher type
- Statistical Tests: Various algorithms to analyze text patterns
- Cipher Type Definitions: Data structures with cipher metadata
API Output Format
Cipher Identification
The main function identify_cipher returns a vector of CipherScore pairs, which are tuples of (String, f64) representing the cipher name and its score:
// Type definition
pub type CipherScore = (String, f64);
// Example output
The scores represent the "distance" between the statistical properties of the input text and the expected properties of each cipher type. Lower scores indicate better matches.
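To make the "lower is better" convention concrete, here is a minimal sketch of how a distance score could be computed and ranked. It is not the library's actual implementation: the `Profile` struct, the `distance` and `rank` functions, and the choice of a sum of squared z-scores are all assumptions made for illustration. Only the `(String, f64)` shape of `CipherScore` comes from the description above.

```rust
use std::collections::HashMap;

/// Same shape as the `CipherScore` alias described above.
pub type CipherScore = (String, f64);

/// Hypothetical reference profile for one cipher type:
/// statistic name -> (expected mean, expected standard deviation).
pub struct Profile {
    pub name: String,
    pub expected: HashMap<String, (f64, f64)>,
}

/// Assumed distance metric: the sum of squared z-scores between the observed
/// statistics and the profile. Statistics missing from `observed`, or with a
/// non-positive standard deviation, are skipped.
pub fn distance(observed: &HashMap<String, f64>, profile: &Profile) -> f64 {
    profile
        .expected
        .iter()
        .filter_map(|(stat, &(mean, std_dev))| {
            let value = observed.get(stat)?;
            if std_dev <= 0.0 {
                return None;
            }
            let z = (value - mean) / std_dev;
            Some(z * z)
        })
        .sum()
}

/// Score every profile and return the `n` closest ones, best match first.
pub fn rank(
    observed: &HashMap<String, f64>,
    profiles: &[Profile],
    n: usize,
) -> Vec<CipherScore> {
    let mut scores: Vec<CipherScore> = profiles
        .iter()
        .map(|p| (p.name.clone(), distance(observed, p)))
        .collect();
    // Lower distance means a better match, so sort ascending.
    scores.sort_by(|a, b| a.1.total_cmp(&b.1));
    scores.truncate(n);
    scores
}
```

Calling `rank(&observed, &profiles, 5)` would mirror the default of five results used by the command-line tool. Dividing by the standard deviation keeps statistics with large numeric ranges from dominating the total.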
Statistical Tests
The get_all_stats function returns a HashMap<String, f64> containing the results of all statistical tests:
// Example output
Cipher Type Information
The CipherType struct provides metadata about each cipher type:
// Example output for a single cipher type
CipherType
Using the API
Identifying Ciphers
use identify_cipher;
Getting Statistical Information
use get_all_stats;
Using the CipherAnalyzer
use CipherAnalyzer;
Loading Cipher Type Definitions
use ;
Integration Example
Here's a complete example of how to integrate the cipher_identifier library into another project:
use ;
Statistical Tests
The tool uses the following statistical tests to analyze ciphertext:
- IoC (Index of Coincidence): Measures the probability of two randomly selected letters being the same
- MIC (Maximum Index of Coincidence): Measures the maximum periodic index of coincidence across candidate periods
- MKA (Mean Kappa Test): Measures the average kappa value for the text
- DIC (Digraphic Index of Coincidence): Measures the frequency of digraphs (pairs of letters)
- EDI (Even Distribution Index): Measures how evenly distributed the digraphs are
- LR (Length Ratio): Measures the ratio of unique n-grams to total possible n-grams
- ROD (Repeat Order Distribution): Measures the distribution of repeated patterns
- LDI (Letter Distribution Index): Measures how closely the letter distribution matches expected frequencies
- SDD (Standard Deviation Distribution): Measures the standard deviation of letter frequencies
- Shannon Entropy: Measures the information content or randomness of the text
- Binary Random Test: Tests whether the text appears random when converted to binary
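As an illustration of three of the tests listed above, the following sketch computes the index of coincidence, a maximum periodic index of coincidence, and the Shannon entropy using their textbook definitions. The function names, the normalization to uppercase ASCII letters, and the `max_period` parameter are assumptions for this example. The library's own implementations may scale or normalize the values differently.

```rust
/// Keep only ASCII letters, folded to uppercase (an assumed normalization).
fn letters(text: &str) -> Vec<u8> {
    text.bytes()
        .filter(|b| b.is_ascii_alphabetic())
        .map(|b| b.to_ascii_uppercase())
        .collect()
}

/// Count occurrences of each letter A-Z in a normalized slice.
fn counts(symbols: &[u8]) -> [usize; 26] {
    let mut counts = [0usize; 26];
    for &s in symbols {
        counts[(s - b'A') as usize] += 1;
    }
    counts
}

/// Probability that two letters drawn without replacement are equal.
fn ic_of(symbols: &[u8]) -> f64 {
    let n = symbols.len();
    if n < 2 {
        return 0.0;
    }
    let same: usize = counts(symbols)
        .iter()
        .map(|&c| c * c.saturating_sub(1))
        .sum();
    same as f64 / (n * (n - 1)) as f64
}

/// Index of coincidence of the letters in `text`.
pub fn index_of_coincidence(text: &str) -> f64 {
    ic_of(&letters(text))
}

/// For each period in 1..=max_period, split the text into `period`
/// interleaved columns, average their IoC, and return the largest average.
pub fn max_periodic_ic(text: &str, max_period: usize) -> f64 {
    let symbols = letters(text);
    (1..=max_period)
        .map(|period| {
            let total: f64 = (0..period)
                .map(|offset| {
                    let column: Vec<u8> = symbols
                        .iter()
                        .skip(offset)
                        .step_by(period)
                        .copied()
                        .collect();
                    ic_of(&column)
                })
                .sum();
            total / period as f64
        })
        .fold(0.0, f64::max)
}

/// Shannon entropy of the letter distribution, in bits per letter.
pub fn shannon_entropy(text: &str) -> f64 {
    let symbols = letters(text);
    if symbols.is_empty() {
        return 0.0;
    }
    let n = symbols.len() as f64;
    counts(&symbols)
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}
```

A periodic cipher such as Vigenere tends to show a low overall index of coincidence but a noticeably higher value once the text is split at the correct period, which is why the two measures are useful together.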
Benchmarking
The library includes a benchmarking module to test the accuracy of the cipher identification algorithm:
use run_benchmark;
The test data should be a JSON file with each line containing a test case in the format: