Crate feedparser_rs


§feedparser-rs: High-performance RSS/Atom/JSON Feed parser

A pure Rust implementation of feed parsing that is API-compatible with Python’s feedparser library. It is designed to parse feeds 10-100x faster than the Python library while matching its behavior.

§Quick Start

use feedparser_rs::parse;

let xml = r#"
    <?xml version="1.0"?>
    <rss version="2.0">
        <channel>
            <title>Example Feed</title>
            <link>https://example.com</link>
            <item>
                <title>First Post</title>
                <link>https://example.com/post/1</link>
            </item>
        </channel>
    </rss>
"#;

let feed = parse(xml.as_bytes()).unwrap();
assert!(!feed.bozo);
assert_eq!(feed.feed.title.as_deref(), Some("Example Feed"));
assert_eq!(feed.entries.len(), 1);

§Supported Formats

Format      Versions                 Detection
RSS         0.90, 0.91, 0.92, 2.0    <rss> element
RSS 1.0     RDF-based                <rdf:RDF> with RSS namespace
Atom        0.3, 1.0                 <feed> with Atom namespace
JSON Feed   1.0, 1.1                 version field starting with https://jsonfeed.org
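
The detection rules in the table can be illustrated with a small, stdlib-only sketch. This is not the crate's detect_format implementation: SniffedFormat and sniff_format are hypothetical names, and the sketch uses plain substring checks while skipping the namespace checks listed for RSS 1.0 and Atom.

```rust
/// Feed families from the table (illustrative enum, not the crate's `FeedVersion`).
#[derive(Debug, PartialEq, Eq)]
enum SniffedFormat {
    Rss,
    Rss10,
    Atom,
    JsonFeed,
    Unknown,
}

/// Guess the feed family from raw bytes.
/// Hypothetical helper: substring checks only, no real XML or JSON parsing.
fn sniff_format(data: &[u8]) -> SniffedFormat {
    let text = String::from_utf8_lossy(data);
    let trimmed = text.trim_start();

    if trimmed.starts_with('{') {
        // JSON Feed: the version field is a URL under https://jsonfeed.org
        if trimmed.contains("\"version\"") && trimmed.contains("https://jsonfeed.org") {
            return SniffedFormat::JsonFeed;
        }
        return SniffedFormat::Unknown;
    }
    // Check the RDF root before <rss>, since RSS 1.0 is RDF-based.
    if trimmed.contains("<rdf:RDF") {
        return SniffedFormat::Rss10;
    }
    if trimmed.contains("<rss") {
        return SniffedFormat::Rss;
    }
    if trimmed.contains("<feed") {
        return SniffedFormat::Atom;
    }
    SniffedFormat::Unknown
}
```

A production detector would parse the root element and verify its namespace, so that an unrelated element whose name merely begins with "feed" is not mistaken for Atom.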

§Namespace Extensions

The parser supports common feed extensions:

  • iTunes/Podcast (itunes:) - Podcast metadata, categories, explicit flags
  • Podcast 2.0 (podcast:) - Transcripts, chapters, funding, persons
  • Dublin Core (dc:) - Creator, date, rights, subject
  • Media RSS (media:) - Thumbnails, content, descriptions
  • Content (content:encoded) - Full HTML content
  • Syndication (sy:) - Update frequency hints
  • GeoRSS (georss:) - Geographic coordinates
  • Creative Commons (cc:, creativeCommons:) - License information

§Type-Safe URL and MIME Handling

The library uses semantic newtypes for improved type safety:

use feedparser_rs::{Url, MimeType, Email};

// Url - wraps URL strings without validation (bozo-compatible)
let url = Url::new("https://example.com/feed.xml");
assert_eq!(url.as_str(), "https://example.com/feed.xml");
assert!(url.starts_with("https://")); // Deref to str

// MimeType - uses Arc<str> for efficient cloning
let mime = MimeType::new("application/rss+xml");
let clone = mime.clone(); // Cheap: just increments refcount

// Email - wraps email addresses
let email = Email::new("author@example.com");

These types implement Deref<Target=str>, so string methods work directly:

use feedparser_rs::Url;

let url = Url::new("https://example.com/path?query=1");
assert!(url.contains("example.com"));
assert_eq!(url.len(), 32);
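
To show why Deref<Target=str> makes string methods available and why cloning an Arc<str> is cheap, here is a minimal stdlib-only sketch of the newtype pattern. Mime is a hypothetical stand-in and is not the crate's MimeType definition.

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical stand-in for a string newtype such as `MimeType`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Mime(Arc<str>);

impl Mime {
    /// Wrap the string without validating it.
    fn new(value: &str) -> Self {
        Mime(Arc::from(value))
    }

    fn as_str(&self) -> &str {
        &self.0
    }
}

impl Deref for Mime {
    type Target = str;

    // Auto-deref lets callers use `str` methods (`len`, `starts_with`, ...) on `Mime`.
    fn deref(&self) -> &str {
        &self.0
    }
}
```

Because the inner value is an Arc<str>, the derived Clone copies a pointer and bumps a reference count instead of reallocating the string.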

§The Bozo Pattern

Following Python feedparser’s philosophy, this library never panics on malformed input. Instead, it sets the bozo flag and continues parsing:

use feedparser_rs::parse;

// U+FFFF is not a legal XML character, so this character reference is malformed input
let xml_with_entity = b"<rss version='2.0'><channel><title>Test &#xFFFF;</title></channel></rss>";

let feed = parse(xml_with_entity).unwrap();
// Parser handles invalid characters gracefully
assert!(feed.feed.title.is_some());

The bozo flag indicates the feed had issues but was still parseable.
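
The idea behind the pattern can be sketched without the crate: a lenient routine keeps whatever it recovered and records the problem instead of returning an error. LenientResult and lenient_titles below are hypothetical names for illustration, and the bozo_exception field is borrowed from Python feedparser rather than confirmed for this crate.

```rust
/// Hypothetical result type: always returned, even for damaged input.
#[derive(Debug, Default)]
struct LenientResult {
    titles: Vec<String>,
    bozo: bool,
    bozo_exception: Option<String>,
}

/// Collect the text of every `<title>` element, recording problems instead of failing.
fn lenient_titles(input: &str) -> LenientResult {
    let mut result = LenientResult::default();
    let mut rest = input;

    while let Some(start) = rest.find("<title>") {
        let after_open = &rest[start + "<title>".len()..];
        match after_open.find("</title>") {
            Some(end) => {
                result.titles.push(after_open[..end].trim().to_string());
                rest = &after_open[end + "</title>".len()..];
            }
            None => {
                // Unclosed element: keep what was recovered and flag the input.
                result.bozo = true;
                result.bozo_exception = Some("unclosed <title> element".to_string());
                break;
            }
        }
    }
    result
}
```

Callers can then decide for themselves whether a flagged result is good enough to use, which is the trade-off the bozo flag is meant to expose.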

§Resource Limits

Protect against malicious feeds with ParserLimits:

use feedparser_rs::{parse_with_limits, ParserLimits};

// Customize limits for untrusted input
let limits = ParserLimits {
    max_entries: 100,
    max_text_length: 50_000,
    ..Default::default()
};

let xml = b"<rss version='2.0'><channel><title>Safe</title></channel></rss>";
let feed = parse_with_limits(xml, limits).unwrap();
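
As a rough illustration of how such limits might be enforced, the stdlib-only sketch below checks an entry count and a text length before accepting an item. Limits, LimitViolation, and push_entry are hypothetical names, the default values are placeholders, and rejecting (rather than truncating) is an assumption of this sketch, not documented crate behavior.

```rust
/// Hypothetical limits struct mirroring the two fields used in the example.
#[derive(Debug, Clone)]
struct Limits {
    max_entries: usize,
    max_text_length: usize,
}

impl Default for Limits {
    fn default() -> Self {
        // Placeholder defaults chosen for this sketch, not the crate's values.
        Limits { max_entries: 10_000, max_text_length: 1_000_000 }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum LimitViolation {
    TooManyEntries { max: usize },
    TextTooLong { length: usize, max: usize },
}

/// Append an entry title only if both limits still hold.
fn push_entry(
    entries: &mut Vec<String>,
    title: &str,
    limits: &Limits,
) -> Result<(), LimitViolation> {
    if entries.len() >= limits.max_entries {
        return Err(LimitViolation::TooManyEntries { max: limits.max_entries });
    }
    if title.len() > limits.max_text_length {
        return Err(LimitViolation::TextTooLong {
            length: title.len(),
            max: limits.max_text_length,
        });
    }
    entries.push(title.to_string());
    Ok(())
}
```

The `..Default::default()` struct update syntax works the same way here as in the example: override only the fields that matter for untrusted input and inherit the rest.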

§HTTP Fetching

With the http feature (enabled by default), fetch feeds from URLs:

use feedparser_rs::parse_url;

// Simple fetch (the `?` operator assumes an enclosing function that returns a compatible Result)
let feed = parse_url("https://example.com/feed.xml", None, None, None)?;

// With conditional GET for caching
let feed2 = parse_url(
    "https://example.com/feed.xml",
    feed.etag.as_deref(),      // ETag from previous fetch
    feed.modified.as_deref(),  // Last-Modified from previous fetch
    Some("MyApp/1.0"),         // Custom User-Agent
)?;

if feed2.status == Some(304) {
    println!("Feed not modified since last fetch");
}
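
For readers unfamiliar with conditional GET, the sketch below shows the general HTTP mechanism: the saved ETag is sent back as If-None-Match, the saved Last-Modified value as If-Modified-Since, and a 304 response means the cached copy is still current. CachedFeed, conditional_headers, and resolve_body are hypothetical names; how the crate's HTTP client builds its requests internally is not shown here.

```rust
/// Validators and body saved from a previous response (hypothetical cache record).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct CachedFeed {
    etag: Option<String>,
    modified: Option<String>,
    body: String,
}

/// Build the conditional request headers from the saved validators.
fn conditional_headers(cache: &CachedFeed) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(etag) = &cache.etag {
        headers.push(("If-None-Match", etag.clone()));
    }
    if let Some(modified) = &cache.modified {
        headers.push(("If-Modified-Since", modified.clone()));
    }
    headers
}

/// Pick the body to use once the response status is known.
fn resolve_body(cache: &CachedFeed, status: u16, new_body: Option<String>) -> String {
    match (status, new_body) {
        // 304 Not Modified: the server sent no body, so reuse the cached one.
        (304, _) => cache.body.clone(),
        (_, Some(body)) => body,
        (_, None) => cache.body.clone(),
    }
}
```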

§Core Types

§Module Structure

  • types - All data structures for parsed feeds
  • namespace - Handlers for namespace extensions (iTunes, Podcast 2.0, etc.)
  • util - Helper functions for dates, HTML sanitization, encoding
  • compat - Python feedparser API compatibility layer
  • http - HTTP client for fetching feeds (requires http feature)

Re-exports§

pub use types::Content;
pub use types::Email;
pub use types::Enclosure;
pub use types::Entry;
pub use types::FeedMeta;
pub use types::FeedVersion;
pub use types::Generator;
pub use types::Image;
pub use types::ItunesCategory;
pub use types::ItunesEntryMeta;
pub use types::ItunesFeedMeta;
pub use types::ItunesOwner;
pub use types::LimitedCollectionExt;
pub use types::MediaContent;
pub use types::MediaThumbnail;
pub use types::MimeType;
pub use types::ParsedFeed;
pub use types::Person;
pub use types::PodcastChapters;
pub use types::PodcastEntryMeta;
pub use types::PodcastFunding;
pub use types::PodcastMeta;
pub use types::PodcastPerson;
pub use types::PodcastSoundbite;
pub use types::PodcastTranscript;
pub use types::PodcastValue;
pub use types::PodcastValueRecipient;
pub use types::Source;
pub use types::Tag;
pub use types::TextConstruct;
pub use types::TextType;
pub use types::Url;
pub use types::parse_duration;
pub use types::parse_explicit;
pub use namespace::syndication::SyndicationMeta;
pub use namespace::syndication::UpdatePeriod;
pub use http::FeedHttpClient;
pub use http::FeedHttpResponse;

Modules§

compat
Compatibility utilities for the Python feedparser API
http
HTTP client module for fetching feeds from URLs
namespace
Namespace handlers for extended feed formats
types
Type definitions for feed data structures
util
Utility functions for feed parsing

Structs§

ParseOptions
Parser configuration options
ParserLimits
Parser limits for protecting against denial-of-service attacks

Enums§

FeedError
Feed parsing errors
LimitError
Errors that occur when parser limits are exceeded

Functions§

detect_format
Auto-detect feed format from raw data
parse
Parse feed from raw bytes
parse_url
Parse feed from HTTP/HTTPS URL
parse_url_with_limits
Parse feed from URL with custom parser limits
parse_with_limits
Parse feed with custom parser limits

Type Aliases§

Result
Result type for feed parsing operations