pkuinfo-spider 0.1.3

An article-crawler CLI based on hyperlink search in the WeChat Official Account backend

pkuinfo-spider

A command-line crawler for WeChat Official Account (微信公众号) articles.


Search for WeChat Official Accounts, list their articles, and scrape individual articles through the Official Accounts Platform (MP, mp.weixin.qq.com) backend. Scraped articles are converted to clean Markdown for archiving or further processing.

Install

cargo install pkuinfo-spider

Or install the complete PKU toolkit:

cargo install pku-cli

Setup

[!NOTE] Unlike other PKU tools, this crawler uses WeChat QR-code login, not IAAA. You need a WeChat account with access to the Official Account platform (a test account works).

info-spider login                # Scan QR code with WeChat
info-spider status               # Verify session

Usage

Search Official Accounts

info-spider search "人民日报"
# Returns the fakeid of each matching account

Options:

  • -n <count> — Number of results (1-20, default 5)
  • --format table|json — Output format
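The documented contract for `search` can be expressed as a small validation step. This is a minimal sketch, not the crate's actual argument parser: `SearchFormat`, `resolve_count`, and `parse_format` are hypothetical names, and only the 1-20 range, the default of 5, and the two format values come from the options above.

```rust
/// Output formats listed for `search --format` (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchFormat {
    Table,
    Json,
}

/// Hypothetical helper: apply the documented default of 5 for `-n`
/// and reject values outside the documented range 1..=20.
pub fn resolve_count(n: Option<u32>) -> Result<u32, String> {
    let n = n.unwrap_or(5);
    if (1..=20).contains(&n) {
        Ok(n)
    } else {
        Err(format!("-n must be between 1 and 20, got {}", n))
    }
}

/// Hypothetical helper: map a `--format` string to `SearchFormat`.
/// `jsonl` is only listed for `articles`, so it is rejected here.
pub fn parse_format(s: &str) -> Result<SearchFormat, String> {
    match s {
        "table" => Ok(SearchFormat::Table),
        "json" => Ok(SearchFormat::Json),
        other => Err(format!("unsupported --format for search: {}", other)),
    }
}
```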

List Articles

info-spider articles --name "人民日报"
# Or use fakeid directly (skips one search step):
info-spider articles --fakeid <FAKEID>

Options:

  • --begin <offset> — Pagination start
  • --count <n> — Articles per page (default 5, max ~20)
  • -l, --limit <n> — Total articles to fetch across pages
  • --delay-ms <ms> — Random delay between requests (anti-crawler, default 1500)
  • --format table|json|jsonl — Output format
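One way to read how `--begin`, `--count`, and `--limit` combine is as a list of page requests. The sketch below assumes each request asks for up to `count` articles starting at an offset, and that `--limit` caps the total across pages; `page_plan` is a hypothetical helper, not part of the crate.

```rust
/// Hypothetical pagination plan: one `(offset, size)` pair per request.
/// Assumes offsets advance by the number of articles requested and that
/// `limit` caps the total number of articles across all pages.
pub fn page_plan(begin: u32, count: u32, limit: u32) -> Vec<(u32, u32)> {
    assert!(count > 0, "--count must be positive");
    let mut plan = Vec::new();
    let mut offset = begin;
    let mut remaining = limit;
    while remaining > 0 {
        // The last page only asks for what is still needed.
        let size = remaining.min(count);
        plan.push((offset, size));
        offset += size;
        remaining -= size;
    }
    plan
}
```

Under these assumptions, `--begin 0 --count 5 --limit 12` becomes three requests, `(0, 5)`, `(5, 5)`, and `(10, 2)`, each separated by the `--delay-ms` jitter. A real crawler would also stop early once a page returns fewer articles than requested.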

Scrape Single Article

info-spider scrape https://mp.weixin.qq.com/s/xxxxx
info-spider scrape <url> -o article.md

Converts the article to clean Markdown, preserving text, links, and image references.
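The conversion keeps three kinds of content: text, links, and image references. A minimal sketch of the emitting side follows, assuming the article body has already been parsed into a flat list of inline nodes. `Node` and `to_markdown` are hypothetical, HTML parsing is out of scope, and Markdown special characters are not escaped.

```rust
/// Hypothetical inline nodes extracted from an article body.
pub enum Node {
    Text(String),
    Link { text: String, href: String },
    Image { alt: String, src: String },
    /// Paragraph boundary.
    Break,
}

/// Render nodes as Markdown. Does not escape Markdown metacharacters.
pub fn to_markdown(nodes: &[Node]) -> String {
    let mut out = String::new();
    for node in nodes {
        match node {
            Node::Text(t) => out.push_str(t),
            Node::Link { text, href } => out.push_str(&format!("[{}]({})", text, href)),
            // Images stay as references to their URL; nothing is downloaded.
            Node::Image { alt, src } => out.push_str(&format!("![{}]({})", alt, src)),
            Node::Break => out.push_str("\n\n"),
        }
    }
    out
}
```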

How It Works

The crawler mimics normal user behavior: login → new article → hyperlink panel → search account → list articles. Configurable random delays between requests reduce the chance of triggering WeChat's anti-crawler risk controls.
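The flow above is linear, and the `articles --fakeid` shortcut skips the account search. A conceptual model of that flow as a Rust enum is sketched below; the step names follow the sentence above, while the `Step` type and the `sequence` function are hypothetical and say nothing about how the crate actually issues its HTTP requests.

```rust
/// The steps of the crawl flow, in order (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Login,
    NewArticle,
    HyperlinkPanel,
    SearchAccount,
    ListArticles,
}

impl Step {
    /// The step that follows this one, or `None` after the last step.
    pub fn next(self) -> Option<Step> {
        match self {
            Step::Login => Some(Step::NewArticle),
            Step::NewArticle => Some(Step::HyperlinkPanel),
            Step::HyperlinkPanel => Some(Step::SearchAccount),
            Step::SearchAccount => Some(Step::ListArticles),
            Step::ListArticles => None,
        }
    }
}

/// Full step sequence; a known fakeid skips the account search.
pub fn sequence(fakeid_known: bool) -> Vec<Step> {
    let mut current = Step::Login;
    let mut steps = vec![current];
    while let Some(next) = current.next() {
        steps.push(next);
        current = next;
    }
    if fakeid_known {
        steps.retain(|s| *s != Step::SearchAccount);
    }
    steps
}
```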

Session data (token, fingerprint, bizuin) is stored in ~/.config/info-spider/.
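For troubleshooting `status`, it can help to know where that directory resolves. A minimal sketch follows, assuming a Unix-style `$HOME`; the `Session` struct only mirrors the three field names listed above (their types and on-disk format are assumptions), and `session_dir` / `session_dir_from_env` are hypothetical helpers.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical in-memory form of the stored session fields.
/// Field types are assumed; the crate's on-disk format is not documented here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Session {
    pub token: String,
    pub fingerprint: String,
    pub bizuin: String,
}

/// Build `<home>/.config/info-spider/` from a given home directory.
pub fn session_dir(home: &Path) -> PathBuf {
    home.join(".config").join("info-spider")
}

/// Resolve the session directory from `$HOME`, if it is set.
pub fn session_dir_from_env() -> Option<PathBuf> {
    env::var_os("HOME").map(|h| session_dir(Path::new(&h)))
}
```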

Anti-Crawler Notes

  • Use --delay-ms to add random jitter between requests (default 1500ms)
  • Avoid running long article crawls back-to-back; WeChat may rate-limit or block
  • If you hit a block, wait a few hours before retrying
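Random jitter can be sketched with the standard library alone. This is an illustration under an assumption: the delay is drawn from roughly ±50% around the `--delay-ms` value, i.e. from base/2 to base/2 + base milliseconds. The crate's actual distribution is not documented here, and `jittered_delay` is a hypothetical helper. Randomness comes from a freshly keyed `RandomState` hasher, which is adequate for jitter but not for anything security-sensitive.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::Duration;

/// Hypothetical jitter: a delay drawn from [base/2, base/2 + base] ms.
/// Uses a randomly keyed SipHash as a std-only source of pseudo-random bits.
pub fn jittered_delay(base_ms: u64) -> Duration {
    if base_ms == 0 {
        return Duration::ZERO;
    }
    let low = base_ms / 2;
    let r = RandomState::new().build_hasher().finish();
    Duration::from_millis(low + r % (base_ms + 1))
}
```

In a request loop, calling `std::thread::sleep(jittered_delay(1500))` between requests approximates the default setting; for 1500 ms this sketch sleeps between 750 and 2250 ms.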

Links