pkuinfo-spider
A command-line crawler for WeChat Official Account (微信公众号) articles.
Search, list, and scrape articles from WeChat Official Accounts via the MP backend. Convert articles to clean Markdown for archival or further processing.
Install
Or install the complete PKU toolkit:
Setup
[!NOTE] Unlike other PKU tools, this crawler uses WeChat QR login, not IAAA. You need a WeChat account with access to the Official Account platform (a test account works).
Usage
Search Official Accounts
# Returns: fakeid list for matching accounts
Options:
- -n <count>: Number of results (1-20, default 5)
- --format table|json: Output format
List Articles
# Or use fakeid directly (skips one search step):
Options:
- --begin <offset>: Pagination start
- --count <n>: Articles per page (default 5, max ~20)
- -l, --limit <n>: Total articles to fetch across pages
- --delay-ms <ms>: Random delay between requests (anti-crawler, default 1500 ms)
- --format table|json|jsonl: Output format
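To illustrate how the begin, count, and limit options could interact, here is a minimal Python sketch of a paging plan. It is not the crawler's actual implementation: the function name plan_pages is hypothetical, and it assumes that --begin is an article offset and that the last page is shortened to respect --limit.

```python
def plan_pages(begin=0, count=5, limit=None):
    """Yield (offset, page_size) pairs covering `limit` articles.

    Hypothetical helper: assumes `begin` is an article offset and that
    a missing limit means a single page of `count` articles.
    """
    if limit is None:
        limit = count
    fetched = 0
    while fetched < limit:
        # Shrink the final page so the total never exceeds `limit`.
        size = min(count, limit - fetched)
        yield begin + fetched, size
        fetched += size


if __name__ == "__main__":
    for offset, size in plan_pages(begin=0, count=5, limit=12):
        print(f"request offset={offset} size={size}")
```

Under these assumptions, a limit of 12 with 5 articles per page takes three requests, and the configured delay would apply between each of them.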
Scrape Single Article
Converts the article to clean Markdown, preserving text, links, and image references.
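As a rough illustration of what such a conversion involves, the following Python sketch turns a small HTML fragment into Markdown while keeping text, links, and image references. It is a toy built on the standard library's html.parser, not the converter this tool uses; the class and function names are hypothetical, and reading the image URL from data-src first is an assumption about how article pages lazy-load images.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Toy converter: handles paragraphs, links, and images only."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.href = attrs.get("href")
            self.parts.append("[")
        elif tag == "img":
            # Assumption: prefer data-src (lazy loading), fall back to src.
            src = attrs.get("data-src") or attrs.get("src") or ""
            alt = attrs.get("alt") or ""
            self.parts.append(f"![{alt}]({src})")

    def handle_endtag(self, tag):
        if tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None
        elif tag == "p":
            # Separate paragraphs with a blank line.
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    sample = '<p>See <a href="https://example.com">the site</a>.</p>'
    print(html_to_markdown(sample))
```

A real converter also has to deal with headings, lists, nested formatting, and embedded media, which this sketch deliberately ignores.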
How It Works
The crawler mimics normal user behavior: login → new article → hyperlink panel → search account → list articles. Configurable random delays between requests make the traffic less likely to trigger WeChat's anti-crawl risk controls.
Session data (token, fingerprint, bizuin) is stored in ~/.config/info-spider/.
Anti-Crawler Notes
- Use --delay-ms to add random jitter between requests (default 1500 ms)
- Avoid running long article crawls back-to-back; WeChat may rate-limit or block
- If you hit a block, wait a few hours before retrying
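The idea behind a randomized delay can be sketched in a few lines of Python. How the crawler actually derives its jitter from --delay-ms is not documented here, so the uniform spread of 50% around the base value, and the names jittered_delay_ms and polite_sleep, are assumptions for illustration only.

```python
import random
import time


def jittered_delay_ms(base_ms=1500, spread=0.5, rng=random):
    """Return a delay drawn uniformly from base_ms * (1 ± spread).

    Assumption: the real tool may use a different distribution.
    """
    low = base_ms * (1 - spread)
    high = base_ms * (1 + spread)
    return rng.uniform(low, high)


def polite_sleep(base_ms=1500):
    """Sleep for a jittered interval between two requests."""
    time.sleep(jittered_delay_ms(base_ms) / 1000)


if __name__ == "__main__":
    for _ in range(3):
        print(f"next request in {jittered_delay_ms():.0f} ms")
```

Varying the interval avoids the fixed request rhythm that makes automated traffic easy to spot, at the cost of a less predictable total run time.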
Links
- Repository: github.com/pkuinfo/pkucli
- Full documentation: See the main README
- Flow notes: docs/wechat-mp-flow.md
- Claude Code Skill: pku-info-spider on Clawhub