pydocstring

A zero-dependency Rust parser for Python docstrings (Google / NumPy style).

Produces a unified syntax tree with byte-precise source locations on every token — designed as infrastructure for linters and formatters.

Python bindings are also available as pydocstring-rs.

Features

Full syntax tree — builds a complete AST, not just extracted fields; traverse it with the built-in Visitor + walk
Typed nodes per style — style-specific accessors like GoogleArg, NumPyParameter with full type safety
Byte-precise source locations — every token carries its exact byte range for pinpoint diagnostics
Zero dependencies — pure Rust, no external crates, no regex
Error-resilient — never panics; malformed input still yields a best-effort tree
Style auto-detection — hand it a docstring, get back Style::Google, Style::NumPy, or Style::Plain

Installation

[dependencies]
pydocstring = "0.1.13"

Usage

Parsing

use pydocstring::parse::google::{parse_google, GoogleDocstring, GoogleSectionKind};

let input = "Summary.\n\nArgs:\n    x (int): The value.\n    y (int): Another value.";
let result = parse_google(input);
let doc = GoogleDocstring::cast(result.root()).unwrap();

println!("{}", doc.summary().unwrap().text(result.source()));

for section in doc.sections() {
    if section.section_kind(result.source()) == GoogleSectionKind::Args {
        for arg in section.args() {
            println!("{}: {}",
                arg.name().text(result.source()),
                arg.r#type().map(|t| t.text(result.source())).unwrap_or(""));
        }
    }
}

NumPy style works the same way — use parse_numpy / NumPyDocstring instead.

Style Auto-Detection

use pydocstring::parse::{detect_style, Style};

assert_eq!(detect_style("Summary.\n\nArgs:\n    x: Desc."), Style::Google);
assert_eq!(detect_style("Summary.\n\nParameters\n----------\nx : int"), Style::NumPy);
assert_eq!(detect_style("Just a summary."), Style::Plain);

Style::Plain covers docstrings with no recognised section markers: summary-only, summary + extended summary, and unrecognised styles such as Sphinx.

Unified Auto-Detecting Parser

Use parse() to let the library detect the style and parse in one step:

use pydocstring::parse::parse;
use pydocstring::syntax::SyntaxKind;

let result = parse("Summary.\n\nArgs:\n    x: Desc.");
assert_eq!(result.root().kind(), SyntaxKind::GOOGLE_DOCSTRING);

let result = parse("Just a summary.");
assert_eq!(result.root().kind(), SyntaxKind::PLAIN_DOCSTRING);

Source Locations

Every token carries byte offsets for precise diagnostics:

use pydocstring::parse::google::{parse_google, GoogleDocstring, GoogleSectionKind};

let result = parse_google("Summary.\n\nArgs:\n    x (int): The value.");
let doc = GoogleDocstring::cast(result.root()).unwrap();

for section in doc.sections() {
    if section.section_kind(result.source()) == GoogleSectionKind::Args {
        for arg in section.args() {
            let name = arg.name();
            println!("'{}' at byte {}..{}",
                name.text(result.source()), name.range().start(), name.range().end());
        }
    }
}

Syntax Tree

The parse result is a tree of SyntaxNode (branches) and SyntaxToken (leaves), each tagged with a SyntaxKind. Use pretty_print() to visualize:

use pydocstring::parse::google::parse_google;

let result = parse_google("Summary.\n\nArgs:\n    x (int): The value.");
println!("{}", result.pretty_print());

GOOGLE_DOCSTRING@0..42 {
  SUMMARY: "Summary."@0..8
  GOOGLE_SECTION@10..42 {
    GOOGLE_SECTION_HEADER@10..15 {
      NAME: "Args"@10..14
      COLON: ":"@14..15
    }
    GOOGLE_ARG@20..42 {
      NAME: "x"@20..21
      OPEN_BRACKET: "("@22..23
      TYPE: "int"@23..26
      CLOSE_BRACKET: ")"@26..27
      COLON: ":"@27..28
      DESCRIPTION: "The value."@29..39
    }
  }
}

Visitor Pattern

Walk the tree with the Visitor trait for style-agnostic analysis:

use pydocstring::syntax::{Visitor, walk, SyntaxToken, SyntaxKind};
use pydocstring::parse::google::parse_google;

struct NameCollector<'a> {
    source: &'a str,
    names: Vec<String>,
}

impl Visitor for NameCollector<'_> {
    fn visit_token(&mut self, token: &SyntaxToken) {
        if token.kind() == SyntaxKind::NAME {
            self.names.push(token.text(self.source).to_string());
        }
    }
}

let result = parse_google("Summary.\n\nArgs:\n    x: Desc.\n    y: Desc.");
let mut collector = NameCollector { source: result.source(), names: vec![] };
walk(result.root(), &mut collector);
assert_eq!(collector.names, vec!["Args", "x", "y"]);

Style-Independent Model (IR)

Convert any parsed docstring into a style-independent intermediate representation for analysis or transformation:

use pydocstring::parse::google::{parse_google, to_model::to_model};

let parsed = parse_google("Summary.\n\nArgs:\n    x (int): The value.\n");
let doc = to_model(&parsed).unwrap();

assert_eq!(doc.summary.as_deref(), Some("Summary."));
for section in &doc.sections {
    if let pydocstring::model::Section::Parameters(params) = section {
        assert_eq!(params[0].names, vec!["x"]);
        assert_eq!(params[0].type_annotation.as_deref(), Some("int"));
    }
}

Emitting (Code Generation)

Re-emit a Docstring model in any style — useful for style conversion or formatting:

use pydocstring::model::{Docstring, Section, Parameter};
use pydocstring::emit::google::emit_google;
use pydocstring::emit::numpy::emit_numpy;

let doc = Docstring {
    summary: Some("Brief summary.".into()),
    sections: vec![Section::Parameters(vec![Parameter {
        names: vec!["x".into()],
        type_annotation: Some("int".into()),
        description: Some("The value.".into()),
        is_optional: false,
        default_value: None,
    }])],
    ..Default::default()
};

let google = emit_google(&doc);
assert!(google.contains("Args:"));

let numpy = emit_numpy(&doc);
assert!(numpy.contains("Parameters\n----------"));

Combine parsing and emitting to convert between styles:

use pydocstring::parse::google::{parse_google, to_model::to_model};
use pydocstring::emit::numpy::emit_numpy;

let parsed = parse_google("Summary.\n\nArgs:\n    x (int): The value.\n");
let doc = to_model(&parsed).unwrap();
let numpy_text = emit_numpy(&doc);
assert!(numpy_text.contains("Parameters\n----------"));

Supported Sections

Both styles support the following section categories. Typed accessor methods are available on each style's section node.

Category	Google	NumPy
Parameters	`args()` → `GoogleArg`	`parameters()` → `NumPyParameter`
Returns	`returns()` → `GoogleReturns`	`returns()` → `NumPyReturns`
Yields	`yields()` → `GoogleYields`	`yields()` → `NumPyYields`
Raises	`exceptions()` → `GoogleException`	`exceptions()` → `NumPyException`
Warns	`warnings()` → `GoogleWarning`	`warnings()` → `NumPyWarning`
See Also	`see_also_items()` → `GoogleSeeAlsoItem`	`see_also_items()` → `NumPySeeAlsoItem`
Attributes	`attributes()` → `GoogleAttribute`	`attributes()` → `NumPyAttribute`
Methods	`methods()` → `GoogleMethod`	`methods()` → `NumPyMethod`
Free text (Notes, Examples, etc.)	`body_text()`	`body_text()`

Root-level accessors: summary(), extended_summary() (NumPy also has deprecation()). PlainDocstring exposes only summary() and extended_summary().

Development

cargo build
cargo test
cargo run --example parse_auto
cargo run --example parse_google
cargo run --example parse_numpy

pydocstring 0.1.13