Uninews
Uninews is a smart, universal news scraper written in Rust.
It downloads a news article from a given URL, cleans the HTML content, and uses CloudLLM (via OpenAI) to convert the content into Markdown with minimal loss.
Uninews can also translate articles into other languages while preserving their formatting, which makes it useful for multilingual content processing.
The final output (via the API) is a JSON object containing the article's title, the Markdown-formatted content (translated, if a language is specified), and a featured image URL.
It can be used both as a library and as a command-line tool on Linux, macOS and Windows.
When used as a command-line tool, it outputs the final Markdown with the contents of the news article or blog post in the requested language.
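As a rough illustration of that JSON output, the sketch below models the three documented pieces (title, Markdown content, featured image URL) and serializes them with a small hand-written escaper. The struct name ScrapedPost and the keys title, content and featured_image_url are assumptions for illustration, not necessarily the exact keys uninews emits; a real project would more likely derive serde::Serialize.

```rust
/// Hypothetical shape of the JSON result. The field names are assumptions.
pub struct ScrapedPost {
    pub title: String,
    pub content: String, // Markdown body, possibly translated
    pub featured_image_url: String,
}

/// Escape a string so it can sit inside a JSON string literal (RFC 8259).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ScrapedPost {
    /// Serialize to a compact JSON object without external crates.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"title\":\"{}\",\"content\":\"{}\",\"featured_image_url\":\"{}\"}}",
            json_escape(&self.title),
            json_escape(&self.content),
            json_escape(&self.featured_image_url)
        )
    }
}
```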
Usage
uninews --help
A universal news scraper for extracting content from various news blogs and news sites.
Usage: uninews [OPTIONS] <URL>
Arguments:
<URL> The URL of the news article to scrape
Options:
-l, --language <LANGUAGE> Optional output language (default: english) [default: english]
-j, --json Output the result as JSON instead of human-readable text
-h, --help Print help
-V, --version Print version
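The flags above map onto a small settings struct. Below is a minimal std-only sketch of that mapping, assuming the semantics shown in the help text: one positional URL, -l/--language defaulting to english, and -j/--json as a boolean switch. CliOptions and parse_args are hypothetical names; this is not how the uninews binary itself parses its arguments.

```rust
/// Hypothetical settings mirroring the documented CLI flags.
#[derive(Debug, PartialEq)]
pub struct CliOptions {
    pub url: String,
    pub language: String,
    pub json: bool,
}

/// Parse arguments (without the program name) into CliOptions.
pub fn parse_args<I, S>(args: I) -> Result<CliOptions, String>
where
    I: IntoIterator<Item = S>,
    S: Into<String>,
{
    let args: Vec<String> = args.into_iter().map(Into::into).collect();
    let mut iter = args.into_iter();
    let mut url: Option<String> = None;
    let mut language = String::from("english"); // documented default
    let mut json = false;

    while let Some(arg) = iter.next() {
        if arg == "-l" || arg == "--language" {
            // The language value is the next argument.
            language = iter.next().ok_or("missing value for --language")?;
        } else if arg == "-j" || arg == "--json" {
            json = true;
        } else if arg.starts_with('-') {
            return Err(format!("unknown option: {arg}"));
        } else if url.is_none() {
            url = Some(arg);
        } else {
            return Err(format!("unexpected argument: {arg}"));
        }
    }

    let url = url.ok_or("missing required <URL> argument")?;
    Ok(CliOptions { url, language, json })
}
```

In a binary, this would be called as parse_args(std::env::args().skip(1)).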
Features
- Scraping & Cleaning: Extracts the main content of a news article by targeting the <article> tag (or falling back to <body>) and removing unwanted elements.
- Markdown Conversion: Uses Model::GPT54 (gpt-5.4) through the CloudLLM Rust API to convert the cleaned HTML content into near-lossless Markdown.
- X.com / Twitter Support: Reads individual tweets and full X threads via the X API v2, assembling each thread chronologically before converting it to Markdown.
- Reusable Library: The universal_scrape function is exposed for easy integration into other Rust projects.
- Multilanguage Support: The universal_scrape function accepts an optional language parameter that sets the output language of the Markdown (translating the article if needed); otherwise it defaults to English.
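The "<article> first, <body> as fallback" selection can be sketched with plain string search. This is a deliberately naive illustration under stated assumptions: tags are lowercase, the first opening tag and the last closing tag delimit the content, and "removing unwanted elements" is reduced to dropping whole blocks such as <script> or <style>. extract_main_html and strip_elements are hypothetical helpers; a production scraper should use a real HTML parser.

```rust
/// Return the inner HTML of <article>, falling back to <body>.
pub fn extract_main_html(html: &str) -> Option<&str> {
    inner_of(html, "article").or_else(|| inner_of(html, "body"))
}

/// Content between the first `<tag ...>` and the last `</tag>`.
fn inner_of<'a>(html: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{tag}");
    let close = format!("</{tag}>");
    let mut search_from = 0;
    // Accept `<tag` only when followed by '>' or whitespace, so that
    // `<body` does not match a longer tag name such as `<bodyx`.
    let start = loop {
        let idx = search_from + html[search_from..].find(&open)?;
        let after = idx + open.len();
        match html[after..].chars().next() {
            Some(c) if c == '>' || c.is_whitespace() => break idx,
            _ => search_from = after,
        }
    };
    let content_start = start + html[start..].find('>')? + 1;
    let end = html.rfind(&close)?;
    (end >= content_start).then(|| &html[content_start..end])
}

/// Remove every `<tag ...>...</tag>` block (e.g. "script" or "style").
pub fn strip_elements(html: &str, tag: &str) -> String {
    let open = format!("<{tag}");
    let close = format!("</{tag}>");
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find(&open) {
        out.push_str(&rest[..start]);
        match rest[start..].find(&close) {
            Some(rel_end) => rest = &rest[start + rel_end + close.len()..],
            None => {
                rest = ""; // unterminated block: drop the remainder
                break;
            }
        }
    }
    out.push_str(rest);
    out
}
```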
Installation
You need to have Rust and Cargo installed on your system.
If you do have Rust installed, follow these steps:
- Install Uninews:
If you don't have Rust installed, follow these steps to install Rust and build from source:
- Install Rust:
On Unix/macOS:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Verify the installation:
rustc --version
cargo --version
- Clone the Project:
- Build & Install the Project:
- Run it in the command line:
# make sure to either export the OPEN_AI_SECRET token before running it,
# or set it in the same statement without exporting it
OPEN_AI_SECRET=sk-xxxxxxxxxxxxxxxxxxxxxxxxxx uninews <URL>
X.com / Twitter Support
To read tweets and X threads, set:
- X_API_KEY as your X App Consumer Key
- X_API_SECRET as your X App Consumer Secret
uninews will exchange them for an app-only bearer token automatically.
You can obtain both values from your X App dashboard under Keys and tokens.
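Trading the key and secret for an app-only bearer token follows the OAuth 2.0 client-credentials pattern: the two values are joined as key:secret, Base64-encoded into an HTTP Basic Authorization header, and sent with the form body grant_type=client_credentials. The sketch below only builds those two pieces (no HTTP call, no third-party crates). bearer_token_request is a hypothetical name. X's documentation for this flow also describes URL-encoding the key and secret before joining them; this sketch skips that step, which makes no difference for purely alphanumeric credentials.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 (RFC 4648) with '=' padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group (missing bytes are zero).
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Build the Authorization header value and form body for the
/// client-credentials token request. Sending it is left to an HTTP client.
pub fn bearer_token_request(api_key: &str, api_secret: &str) -> (String, &'static str) {
    let credentials = format!("{api_key}:{api_secret}");
    let auth_header = format!("Basic {}", base64_encode(credentials.as_bytes()));
    (auth_header, "grant_type=client_credentials")
}
```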
Environment variables for X.com
| Variable | Required | Description |
|---|---|---|
| X_API_KEY | Yes | X App Consumer Key / API Key from the Keys and tokens page. |
| X_API_SECRET | Yes | X App Consumer Secret / API Secret from the same Keys and tokens page. |
| UNINEWS_CHROME_USER_DATA_DIR | No | Chrome user-data directory for the secondary X Article browser fallback, if X withholds the article body from its web GraphQL payload and guest HTML. |
| UNINEWS_CHROME_PROFILE_DIR | No | Chrome profile directory name such as Default or Profile 1, used with UNINEWS_CHROME_USER_DATA_DIR. |
| UNINEWS_CHROME_BINARY | No | Override the Chrome/Chromium executable used for the secondary X Article browser fallback. |
When a URL starts with https://x.com/ or https://twitter.com/, uninews will:
- Extract the tweet ID from the URL.
- Fetch the tweet (and its author info) via the X API v2.
- If the post is only sharing an external article link, follow the expanded article URL and scrape the linked article directly.
- If the post is only sharing an X Article link (x.com/i/article/...), fetch the article body from X's web GraphQL tweet payload.
- Only if X still withholds the article body there, fall back to the linked article URL / browser fallback path.
- Otherwise, attempt to retrieve the full thread from the same author using the recent-search endpoint (covers the last 7 days).
- Sort all thread tweets chronologically (oldest → newest).
- Pass the assembled content through the AI formatter, preserving the scraped article wording and structure as closely as possible.
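Several of these steps are plain string and list handling. The sketch below shows the URL check, tweet ID extraction, X Article link detection, and the oldest-to-newest ordering, assuming tweet URLs of the form https://x.com/<user>/status/<id> and X API v2 created_at timestamps in a fixed ISO 8601 UTC format (for example 2025-01-01T10:00:00.000Z), which sort correctly as plain strings. The Tweet struct and all function names are hypothetical stand-ins, not uninews types.

```rust
/// Does the URL use one of the two documented X / Twitter prefixes?
pub fn is_x_url(url: &str) -> bool {
    url.starts_with("https://x.com/") || url.starts_with("https://twitter.com/")
}

/// Is this an X Article link (x.com/i/article/...)?
pub fn is_x_article_link(url: &str) -> bool {
    is_x_url(url) && url.contains("/i/article/")
}

/// Extract the numeric ID from `.../<user>/status/<id>`, ignoring any
/// query string, fragment, or trailing path such as `/photo/1`.
pub fn extract_tweet_id(url: &str) -> Option<&str> {
    if !is_x_url(url) {
        return None;
    }
    let path = url.split(|c: char| c == '?' || c == '#').next()?;
    let mut segments = path.split('/');
    segments.find(|s| *s == "status")?; // advance past "status"
    let id = segments.next()?;
    (!id.is_empty() && id.bytes().all(|b| b.is_ascii_digit())).then_some(id)
}

/// Minimal stand-in for an X API v2 tweet object.
#[derive(Debug, Clone)]
pub struct Tweet {
    pub id: String,
    pub created_at: String, // e.g. "2025-01-01T10:00:00.000Z"
    pub text: String,
}

/// Sort oldest to newest and join the texts into one document.
pub fn assemble_thread(mut tweets: Vec<Tweet>) -> String {
    tweets.sort_by(|a, b| a.created_at.cmp(&b.created_at));
    tweets.iter().map(|t| t.text.as_str()).collect::<Vec<_>>().join("\n\n")
}
```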
For x.com/i/article/... links, uninews first asks X's web GraphQL endpoint for the article title and body text tied to the linking tweet. If X withholds the article body there, uninews automatically tries a local headless Chrome fallback. If that fallback still gets X's guest wall, point UNINEWS_CHROME_USER_DATA_DIR at a logged-in Chrome user-data directory and, optionally, set UNINEWS_CHROME_PROFILE_DIR.
When those variables are set, uninews clones the selected Chrome profile into a temporary directory before launching headless Chrome, so your normal Chrome session can stay open and the live profile lock is not touched.
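Cloning a profile before launching headless Chrome amounts to a recursive directory copy into a fresh temporary location. A minimal std-only sketch, assuming the profile lives at <user-data-dir>/<profile> (for example .../Chrome/Default) and that copying regular files and directories is enough; symlinks are skipped, and clone_profile and its run_id parameter are hypothetical. uninews' actual copy logic may select files differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively copy `src` into `dst`, creating directories as needed.
/// Symlinks and other special files are skipped (a simplification).
fn copy_dir_recursive(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        let target = dst.join(entry.file_name());
        if file_type.is_dir() {
            copy_dir_recursive(&entry.path(), &target)?;
        } else if file_type.is_file() {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}

/// Copy `<user_data_dir>/<profile>` under a new directory in the system
/// temp dir and return that directory for use as Chrome's user-data dir.
/// The live profile is only read, never written.
pub fn clone_profile(user_data_dir: &Path, profile: &str, run_id: &str) -> io::Result<PathBuf> {
    let temp_user_data = std::env::temp_dir().join(format!("uninews-chrome-{run_id}"));
    copy_dir_recursive(&user_data_dir.join(profile), &temp_user_data.join(profile))?;
    Ok(temp_user_data)
}
```

The returned directory and the profile name would then be handed to Chrome through its --user-data-dir and --profile-directory switches.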
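Example on macOS:
export UNINEWS_CHROME_USER_DATA_DIR="$HOME/Library/Application Support/Google/Chrome"
export UNINEWS_CHROME_PROFILE_DIR="Default"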
If either X_API_KEY or X_API_SECRET is missing, a clear error message is returned instead of silently failing.
This is not OAuth 1.0a user-context authentication. uninews uses your Consumer Key and Consumer Secret to obtain an OAuth 2.0 app-only bearer token for read-only X API requests.
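The missing-credentials check can be sketched as a function that looks up both variables and names every one that is absent. To keep it testable, the lookup is passed in as a closure (in production, |k| std::env::var(k).ok()). x_credentials is a hypothetical name, and treating an empty or whitespace-only value as missing is an assumption of this sketch.

```rust
/// Fetch X_API_KEY and X_API_SECRET through `lookup`, or return an error
/// that lists every missing variable instead of failing silently.
pub fn x_credentials<F>(lookup: F) -> Result<(String, String), String>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: an empty or whitespace-only value counts as missing.
    let get = |name: &str| lookup(name).filter(|v| !v.trim().is_empty());
    match (get("X_API_KEY"), get("X_API_SECRET")) {
        (Some(key), Some(secret)) => Ok((key, secret)),
        (key, secret) => {
            let mut missing = Vec::new();
            if key.is_none() {
                missing.push("X_API_KEY");
            }
            if secret.is_none() {
                missing.push("X_API_SECRET");
            }
            Err(format!(
                "missing required environment variable(s): {}",
                missing.join(", ")
            ))
        }
    }
}
```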
Integrating it with your Rust project
uninews requires the OPEN_AI_SECRET environment variable to be set; you can set it in your code before calling the universal_scrape function.
If you've loaded your OPEN_AI_SECRET from a file or some other source, set it like this so uninews can find it:
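// On the Rust 2024 edition, std::env::set_var is an unsafe fn; wrap this call in `unsafe { ... }` there.
std::env::set_var("OPEN_AI_SECRET", my_open_ai_secret);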
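use uninews::universal_scrape;
#[tokio::main] // assumes tokio with the "macros" and "rt-multi-thread" features
async fn main() {
    // Scrape the URL and convert its content to Markdown in the requested language.
    // By default, uninews uses cloudllm::clients::openai::Model::GPT54 for near-lossless formatting.
    // Assumed call shape: universal_scrape(url, language), returning a struct with title, content and error fields.
    let post = universal_scrape("https://example.com/news-article", "english").await;
    if !post.error.is_empty() {
        eprintln!("Error during scraping: {}", post.error);
        return;
    }
    // Print the title and Markdown-formatted content.
    println!("{}\n\n{}", post.title, post.content);
}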
Licensed under the MIT License.
Copyright (c) 2026 Ángel León