/*!
BGPKIT Parser aims to provide the most ergonomic Rust API for parsing MRT, BGP, and BMP messages.

BGPKIT Parser has the following features:
- **performant**: comparable to C-based implementations like `bgpdump` or `bgpreader`.
- **actively maintained**: we consistently introduce feature updates and bug fixes, and support most of the relevant BGP RFCs.
- **ergonomic API**: a three-line `for` loop is all you need to get started.
- **batteries included**: handles remote or local, bzip2 or gzip compressed data files out of the box.

# Examples

For complete examples, check out the [examples folder](https://github.com/bgpkit/bgpkit-parser/tree/main/examples).

## Parsing a single MRT file

Let's say we want to print out all the BGP announcements and withdrawals from a single MRT file, located either remotely or locally.
Here is an example that does so.

```no_run
use bgpkit_parser::BgpkitParser;
let parser = BgpkitParser::new("http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2").unwrap();
for elem in parser {
    println!("{}", elem);
}
```

Yes, it is this simple!

You can also chain iterator adapters to make the code even shorter.
For example, counting the number of announcements and withdrawals in that file:
```no_run
use bgpkit_parser::BgpkitParser;
let url = "http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2";
let count = BgpkitParser::new(url).unwrap().into_iter().count();
println!("total: {}", count);
```

which prints:
```text
total: 255849
```

## Parsing multiple MRT files with BGPKIT Broker

The [BGPKIT Broker][broker-repo] library provides a search API for all RouteViews and RIPE RIS MRT data files. Using the
broker's Rust API ([`bgpkit-broker`][broker-crates-io]), we can easily compile a list of the MRT files we are interested
in for any time period and any data type (`update` or `rib`). This lets users gather data without needing to
know where specific data files are located.

[broker-repo]: https://github.com/bgpkit/bgpkit-broker
[broker-crates-io]: https://crates.io/crates/bgpkit-broker

The example below is a slightly more involved one that does the following:
- query the broker for the MRT files at Unix timestamp 1634693400
- take the first two files returned
- keep only the elems originated from AS13335
- log the number of matching elems for each file

```no_run
use bgpkit_parser::{BgpkitParser, BgpElem};

let broker = bgpkit_broker::BgpkitBroker::new()
    .ts_start("1634693400")
    .ts_end("1634693400")
    .page(1);

for item in broker.into_iter().take(2) {
    log::info!("downloading MRT file: {}", &item.url);
    let parser = BgpkitParser::new(item.url.as_str()).unwrap();

    log::info!("parsing MRT file");
    // iterate through the parser; the iterator yields one `BgpElem` at a time.
    // keep only the elems whose origin ASNs include AS13335.
    let elems = parser
        .into_elem_iter()
        .filter(|elem| {
            elem.origin_asns
                .as_ref()
                .map_or(false, |origins| origins.contains(&13335.into()))
        })
        .collect::<Vec<BgpElem>>();
    log::info!("{} elems matched", elems.len());
}
```

## Filtering BGP Messages

BGPKIT Parser also has a built-in [Filter] mechanism. After creating a new [`BgpkitParser`] instance,
one can call its `add_filter` function to customize the parser so that it only yields matching
[BgpElem]s during iteration.

For all types of filters, check out the [Filter] enum documentation.

```no_run
use bgpkit_parser::BgpkitParser;

// This example shows how to parse an MRT file and filter by prefix.
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();

log::info!("downloading updates file");

// create a parser for a remote MRT file and keep only the elems for one prefix
let parser = BgpkitParser::new("http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2").unwrap()
    .add_filter("prefix", "211.98.251.0/24").unwrap();

log::info!("parsing updates file");
// iterate through the parser; the iterator yields one `BgpElem` at a time.
for elem in parser {
    log::info!("{}", &elem);
}
log::info!("done");
```
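Conceptually, you can think of each filter as a predicate checked against every elem, with only the passing elems yielded. The standard-library-only sketch below illustrates that idea. `Elem`, `SimpleFilter`, and `apply_filters` are hypothetical simplifications, not the crate's `Filter` implementation. Combining several filters with logical AND is also an assumption made for this sketch.

```rust
use std::net::Ipv4Addr;

/// Hypothetical, simplified stand-in for `BgpElem` (IPv4 only).
#[derive(Debug, Clone, PartialEq)]
pub struct Elem {
    pub prefix: (Ipv4Addr, u8),
    pub origin_asns: Option<Vec<u32>>,
    pub is_announcement: bool,
}

/// Hypothetical filter variants loosely mirroring a few parser filter types.
#[derive(Debug, Clone)]
pub enum SimpleFilter {
    OriginAsn(u32),
    Prefix(Ipv4Addr, u8),
    AnnouncementsOnly,
}

impl SimpleFilter {
    /// Returns `true` if `elem` satisfies this single filter.
    pub fn matches(&self, elem: &Elem) -> bool {
        match self {
            // elems without origin information never match an origin filter
            SimpleFilter::OriginAsn(asn) => elem
                .origin_asns
                .as_ref()
                .map_or(false, |origins| origins.contains(asn)),
            // exact prefix match only in this sketch
            SimpleFilter::Prefix(addr, len) => elem.prefix == (*addr, *len),
            SimpleFilter::AnnouncementsOnly => elem.is_announcement,
        }
    }
}

/// Keeps only the elems that satisfy every filter (logical AND, an assumption here).
pub fn apply_filters(elems: Vec<Elem>, filters: &[SimpleFilter]) -> Vec<Elem> {
    elems
        .into_iter()
        .filter(|elem| filters.iter().all(|f| f.matches(elem)))
        .collect()
}
```

Modeling filters as an enum keeps each condition cheap to evaluate and easy to plug into `Iterator::filter`.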

## Parsing Real-time Data Streams

BGPKIT Parser also provides parsing functionality for real-time data streams, including [RIS-Live][ris-live-url]
and [BMP][bmp-rfc]/[OpenBMP][openbmp-url] messages. See the examples below and the documentation for more.

### Parsing Messages From RIS-Live

Here is an example of handling RIS-Live message streams. After connecting to the websocket server,
we need to subscribe to a specific data stream. In this example, we subscribe to the data stream
from one collector (`rrc21`). We can then loop and read messages from the websocket.

```no_run
# #[cfg(feature = "rislive")]
use bgpkit_parser::parse_ris_live_message;
use serde_json::json;
use tungstenite::{connect, Message};

const RIS_LIVE_URL: &str = "ws://ris-live.ripe.net/v1/ws/?client=rust-bgpkit-parser";

/// This is an example of subscribing to RIS-Live's streaming data from one host (`rrc21`).
///
/// For more RIS-Live details, check out their documentation at https://ris-live.ripe.net/manual/
fn main() {
    // connect to RIPE RIS Live websocket server
    let (mut socket, _response) =
        connect(RIS_LIVE_URL)
            .expect("Can't connect to RIS Live websocket server");

    // subscribe to messages from one collector
    let msg = json!({"type": "ris_subscribe", "data": {"host": "rrc21"}}).to_string();
    socket.send(Message::Text(msg)).unwrap();

    loop {
        let msg = socket.read().expect("Error reading message").to_string();
#       #[cfg(feature = "rislive")]
        if let Ok(elems) = parse_ris_live_message(msg.as_str()) {
            for elem in elems {
                println!("{}", elem);
            }
        }
    }
}
```

### Parsing OpenBMP Messages From RouteViews Kafka Stream

[RouteViews](http://www.routeviews.org/routeviews/) provides a real-time Kafka stream of the OpenBMP
data received from their collectors. Below is a partial example of how we handle the raw bytes
received from the Kafka stream. For full examples, check out the [examples folder on GitHub](https://github.com/bgpkit/bgpkit-parser/tree/main/examples).

```ignore
// runs inside the Kafka consumer loop; `m` is the Kafka message just received
let bytes = m.value;
let mut reader = Cursor::new(Vec::from(bytes));
// parse the OpenBMP header first, then the BMP message that follows it
let header = parse_openbmp_header(&mut reader).unwrap();
let bmp_msg = parse_bmp_msg(&mut reader);
match bmp_msg {
    Ok(msg) => {
        let timestamp = header.timestamp;
        let per_peer_header = msg.per_peer_header.unwrap();
        match msg.message_body {
            // this example only converts Route Monitoring messages, which carry BGP UPDATEs
            MessageBody::RouteMonitoring(m) => {
                for elem in Elementor::bgp_to_elems(
                    m.bgp_message,
                    timestamp,
                    &per_peer_header.peer_ip,
                    &per_peer_header.peer_asn
                )
                {
                    info!("{}", elem);
                }
            }
            _ => {}
        }
    }
    Err(_e) => {
        // log the undecodable bytes as hex and stop consuming
        let hex = hex::encode(bytes);
        error!("{}", hex);
        break
    }
}
```
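Every BMP message starts with the common header defined in RFC 7854. The header has a one-byte version (3), a four-byte big-endian message length that includes the header itself, and a one-byte message type. The std-only sketch below decodes that header from a byte slice. `BmpCommonHeader`, `parse_bmp_common_header`, and `bmp_msg_type_name` are hypothetical names for illustration, not the crate's `parse_bmp_msg` API. The sketch also does not handle the OpenBMP header that precedes the BMP data in the Kafka stream.

```rust
/// Hypothetical representation of the 6-byte BMP common header.
#[derive(Debug, PartialEq)]
pub struct BmpCommonHeader {
    pub version: u8,
    /// Total message length in bytes, including this header.
    pub msg_len: u32,
    pub msg_type: u8,
}

/// Maps a BMP message type code to the name used in RFC 7854.
pub fn bmp_msg_type_name(msg_type: u8) -> &'static str {
    match msg_type {
        0 => "Route Monitoring",
        1 => "Statistics Report",
        2 => "Peer Down Notification",
        3 => "Peer Up Notification",
        4 => "Initiation",
        5 => "Termination",
        6 => "Route Mirroring",
        _ => "Unknown",
    }
}

/// Hypothetical parser for the common header at the start of `data`.
pub fn parse_bmp_common_header(data: &[u8]) -> Result<BmpCommonHeader, String> {
    if data.len() < 6 {
        return Err(format!("need 6 bytes, got {}", data.len()));
    }
    let version = data[0];
    if version != 3 {
        return Err(format!("unsupported BMP version {}", version));
    }
    // bytes 1..5 hold the message length in network (big-endian) byte order
    let msg_len = u32::from_be_bytes([data[1], data[2], data[3], data[4]]);
    if msg_len < 6 {
        return Err(format!("message length {} is shorter than the header", msg_len));
    }
    Ok(BmpCommonHeader { version, msg_len, msg_type: data[5] })
}
```

Checking the declared length against the header size up front lets a caller reject truncated or corrupt frames before reading the per-peer header.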

[ris-live-url]: https://ris-live.ripe.net
[bmp-rfc]: https://datatracker.ietf.org/doc/html/rfc7854
[openbmp-url]: https://www.openbmp.org/

## Archiving filtered MRT records to a new MRT file on disk

The example below downloads one MRT file from RouteViews, filters out all BGP elems that
are not originated from AS3356, and writes the re-encoded MRT records to a compressed file on disk. It then re-parses the
filtered MRT file and prints out the number of BGP elems it contains.

```no_run
use std::io::Write;

let mut updates_encoder = bgpkit_parser::encoder::MrtUpdatesEncoder::new();

// keep only the elems originated from AS3356 and feed them to the encoder
bgpkit_parser::BgpkitParser::new(
    "http://archive.routeviews.org/bgpdata/2023.10/UPDATES/updates.20231029.2015.bz2",
).unwrap()
    .add_filter("origin_asn", "3356").unwrap()
    .into_iter()
    .for_each(|elem| {
        updates_encoder.process_elem(&elem);
    });

// write the encoded MRT bytes to a gzip-compressed file; dropping the writer finishes the file
let mut mrt_writer = oneio::get_writer("as3356_mrt.gz").unwrap();
mrt_writer.write_all(updates_encoder.export_bytes().as_ref()).unwrap();
drop(mrt_writer);

// re-parse the filtered file and count the BGP elems it contains
let count = bgpkit_parser::BgpkitParser::new("as3356_mrt.gz").unwrap().into_iter().count();
println!("elems in filtered file: {}", count);
```
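Each record in an MRT file is framed by the 12-byte MRT common header from RFC 6396. The header holds a 4-byte timestamp, a 2-byte type, a 2-byte subtype, and a 4-byte length of the message body that follows, all in network byte order. For BGP messages with four-byte ASNs, type 16 (`BGP4MP`) with subtype 4 (`BGP4MP_MESSAGE_AS4`) applies. The sketch below frames an opaque payload with that header. `encode_mrt_record` is a hypothetical helper, not part of `MrtUpdatesEncoder`, and the payload stands in for an already-encoded BGP4MP body.

```rust
/// MRT type code for BGP4MP records (RFC 6396).
pub const MRT_TYPE_BGP4MP: u16 = 16;
/// BGP4MP subtype for BGP messages carrying 4-byte AS numbers.
pub const BGP4MP_MESSAGE_AS4: u16 = 4;

/// Hypothetical helper: prepends the 12-byte MRT common header to `payload`.
pub fn encode_mrt_record(timestamp: u32, mrt_type: u16, subtype: u16, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(12 + payload.len());
    out.extend_from_slice(&timestamp.to_be_bytes());
    out.extend_from_slice(&mrt_type.to_be_bytes());
    out.extend_from_slice(&subtype.to_be_bytes());
    // the length field counts only the message body, not the common header
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}
```

Because each length field covers only its own body, a reader can skip from record to record without decoding the BGP content.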

# Command Line Tool

`bgpkit-parser` is bundled with a command-line tool of the same name, built when the `cli` feature is enabled.

## Installation

### Install compiled binaries

You can install the compiled `bgpkit-parser` CLI binaries with the following methods:
- **Homebrew** (macOS): `brew install bgpkit/tap/bgpkit-parser`
- [**Cargo binstall**](https://github.com/cargo-bins/cargo-binstall): `cargo binstall bgpkit-parser`

### From source

You can install the tool by running
```bash
cargo install bgpkit-parser --features cli
```
or check out this repository and run
```bash
cargo install --path . --features cli
```

## Usage

Run `bgpkit-parser --help` to see the full list of options.

```text
MRT/BGP/BMP data processing library

Usage: bgpkit-parser [OPTIONS] <FILE>

Arguments:
  <FILE>  File path to a MRT file, local or remote

Options:
  -c, --cache-dir <CACHE_DIR>    Set the cache directory for caching remote files. Default behavior does not enable caching
      --json                     Output as JSON objects
      --psv                      Output as full PSV entries with header
      --pretty                   Pretty-print JSON output
  -e, --elems-count              Count BGP elems
  -r, --records-count            Count MRT records
  -o, --origin-asn <ORIGIN_ASN>  Filter by origin AS Number
  -p, --prefix <PREFIX>          Filter by network prefix
  -4, --ipv4-only                Filter by IPv4 only
  -6, --ipv6-only                Filter by IPv6 only
  -s, --include-super            Include super-prefix when filtering
  -S, --include-sub              Include sub-prefix when filtering
  -j, --peer-ip <PEER_IP>        Filter by peer IP address
  -J, --peer-asn <PEER_ASN>      Filter by peer ASN
  -m, --elem-type <ELEM_TYPE>    Filter by elem type: announce (a) or withdraw (w)
  -t, --start-ts <START_TS>      Filter by start unix timestamp inclusive
  -T, --end-ts <END_TS>          Filter by end unix timestamp inclusive
  -a, --as-path <AS_PATH>        Filter by AS path regex string
  -h, --help                     Print help
  -V, --version                  Print version

```
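The `--include-super` and `--include-sub` options widen a `--prefix` filter to covering (less-specific) and covered (more-specific) prefixes, respectively. Below is a minimal IPv4-only sketch of the containment test behind that behavior. `Ipv4Prefix` and `prefix_matches` are hypothetical helpers, not the CLI's actual code. IPv6 would follow the same masking idea with `u128`.

```rust
use std::net::Ipv4Addr;

/// Hypothetical IPv4 prefix type used only for this illustration.
#[derive(Debug, Clone, Copy)]
pub struct Ipv4Prefix {
    pub addr: Ipv4Addr,
    /// Prefix length; must be at most 32.
    pub len: u8,
}

impl Ipv4Prefix {
    /// Network mask for a prefix length, e.g. /24 -> 0xFFFF_FF00.
    fn mask(len: u8) -> u32 {
        if len == 0 { 0 } else { u32::MAX << (32 - len as u32) }
    }

    /// True if `self` covers `other`, i.e. `other` is equal to or more specific than `self`.
    pub fn contains(&self, other: &Ipv4Prefix) -> bool {
        if other.len < self.len {
            return false;
        }
        let m = Self::mask(self.len);
        (u32::from(self.addr) & m) == (u32::from(other.addr) & m)
    }
}

/// Hypothetical matcher: exact match, optionally widened to super- and/or sub-prefixes.
pub fn prefix_matches(filter: &Ipv4Prefix, candidate: &Ipv4Prefix, include_super: bool, include_sub: bool) -> bool {
    // exact match: same length and same network bits
    let exact = filter.len == candidate.len && filter.contains(candidate);
    exact
        || (include_super && candidate.contains(filter))
        || (include_sub && filter.contains(candidate))
}
```

Comparing masked integers avoids string handling and treats host bits beyond the prefix length as irrelevant.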

# Data Representation

There are two key data structures to understand for the parsing results: [MrtRecord] and [BgpElem].

## `MrtRecord`: unmodified MRT information representation

The [MrtRecord] is the data structure that holds the unmodified, complete information parsed from the MRT data file.

```ignore
pub struct MrtRecord {
    pub common_header: CommonHeader,
    pub message: MrtMessage,
}

pub enum MrtMessage {
    TableDumpMessage(TableDumpMessage),
    TableDumpV2Message(TableDumpV2Message),
    Bgp4Mp(Bgp4Mp),
}
```

The `MrtRecord` representation is concise and storage-efficient, but often less convenient to use. For example, when
trying to find specific BGP announcements for a certain IP prefix, we often need to go through nested layers of
internal data structures (NLRI, announced prefixes, or even the peer index table for the Table Dump V2 format), which
may be irrelevant to what users actually want to do.
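The peer index table is one example of that indirection. In TABLE_DUMP_V2 RIB records (RFC 6396), each RIB entry identifies its peer only by a 2-byte index into the PEER_INDEX_TABLE record at the start of the dump. The peer's IP address and ASN have to be looked up there. The sketch below models that lookup with hypothetical, heavily simplified types (`PeerEntry`, `PeerIndexTable`, `RibEntry`). The crate's real models carry many more fields.

```rust
use std::net::IpAddr;

/// Hypothetical, simplified peer record from a PEER_INDEX_TABLE.
#[derive(Debug, Clone, PartialEq)]
pub struct PeerEntry {
    pub peer_ip: IpAddr,
    pub peer_asn: u32,
}

/// Hypothetical peer index table: the position in `peers` is the peer index.
pub struct PeerIndexTable {
    pub peers: Vec<PeerEntry>,
}

/// Hypothetical RIB entry; only the 16-bit index identifies the peer
/// (timestamps and path attributes are omitted here).
pub struct RibEntry {
    pub peer_index: u16,
}

impl PeerIndexTable {
    /// Resolves a RIB entry's peer; `None` if the index is out of range.
    pub fn resolve(&self, entry: &RibEntry) -> Option<&PeerEntry> {
        self.peers.get(entry.peer_index as usize)
    }
}
```

Returning `Option` makes a dangling index in a malformed dump an explicit case for the caller instead of a panic.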

## [BgpElem]: per-prefix BGP information, MRT-format-agnostic

To facilitate simpler analysis of BGP data, we defined a new data structure called [BgpElem] in this crate. Each
[BgpElem] contains a self-contained piece of BGP information about one single IP prefix.
For example, when a bundled announcement of three prefixes P1, P2, P3 that share the same AS path is processed, we break
the single record into three different [BgpElem] objects, each representing one prefix.

```ignore
pub struct BgpElem {
    pub timestamp: f64,
    pub elem_type: ElemType,
    pub peer_ip: IpAddr,
    pub peer_asn: Asn,
    pub prefix: NetworkPrefix,
    pub next_hop: Option<IpAddr>,
    pub as_path: Option<AsPath>,
    pub origin_asns: Option<Vec<Asn>>,
    pub origin: Option<Origin>,
    pub local_pref: Option<u32>,
    pub med: Option<u32>,
    pub communities: Option<Vec<Community>>,
    pub atomic: Option<AtomicAggregate>,
    pub aggr_asn: Option<Asn>,
    pub aggr_ip: Option<IpAddr>,
}
```

The main benefit of using [BgpElem] is that the analysis can be executed on a per-prefix basis, regardless of the
underlying MRT data format (bgp4mp, tabledumpv1, tabledumpv2, etc.). The obvious drawback is that shared information has to be
duplicated in every elem, which consumes more memory.
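The flattening and its memory cost can be sketched as follows. One UPDATE with several announced prefixes and a shared AS path becomes one element per prefix, and the AS path is cloned into every announce element. Withdrawn prefixes become withdraw elements with no path. `Update`, `Elem`, `ElemKind`, and `flatten_update` are hypothetical simplified types, not the crate's `Elementor` API. Prefixes are plain strings to keep the example short.

```rust
/// Hypothetical element kind: announcement or withdrawal.
#[derive(Debug, Clone, PartialEq)]
pub enum ElemKind {
    Announce,
    Withdraw,
}

/// Hypothetical, simplified BGP UPDATE: shared attributes plus prefix lists.
pub struct Update {
    pub timestamp: f64,
    pub as_path: Vec<u32>,
    pub announced: Vec<String>,
    pub withdrawn: Vec<String>,
}

/// Hypothetical per-prefix element; withdrawals carry no AS path.
#[derive(Debug, Clone, PartialEq)]
pub struct Elem {
    pub timestamp: f64,
    pub kind: ElemKind,
    pub prefix: String,
    pub as_path: Option<Vec<u32>>,
}

/// Splits one update into one element per prefix (withdrawals first).
pub fn flatten_update(update: &Update) -> Vec<Elem> {
    let mut elems = Vec::new();
    for prefix in &update.withdrawn {
        elems.push(Elem {
            timestamp: update.timestamp,
            kind: ElemKind::Withdraw,
            prefix: prefix.clone(),
            as_path: None,
        });
    }
    for prefix in &update.announced {
        elems.push(Elem {
            timestamp: update.timestamp,
            kind: ElemKind::Announce,
            prefix: prefix.clone(),
            // the shared attribute is duplicated into every element
            as_path: Some(update.as_path.clone()),
        });
    }
    elems
}
```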

# RFCs Support

We support most of the relevant RFCs and plan to continue adding support for more recent RFCs in the future.
Here is a list of relevant RFCs that we support or plan to support.

If you would like support for a specific RFC, please submit an issue on GitHub.

## BGP

- [X] [RFC 2042](https://datatracker.ietf.org/doc/html/rfc2042): Registering New BGP Attribute Types
- [X] [RFC 3392](https://datatracker.ietf.org/doc/html/rfc3392): Capabilities Advertisement with BGP-4
- [X] [RFC 4271](https://datatracker.ietf.org/doc/html/rfc4271): A Border Gateway Protocol 4 (BGP-4)
- [X] [RFC 4456](https://datatracker.ietf.org/doc/html/rfc4456): BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)
- [X] [RFC 4724](https://datatracker.ietf.org/doc/html/rfc4724): Graceful Restart Mechanism for BGP
- [X] [RFC 5065](https://datatracker.ietf.org/doc/html/rfc5065): Autonomous System Confederations for BGP
- [X] [RFC 6793](https://datatracker.ietf.org/doc/html/rfc6793): BGP Support for Four-Octet Autonomous System (AS) Number Space
- [X] [RFC 7911](https://datatracker.ietf.org/doc/html/rfc7911): Advertisement of Multiple Paths in BGP (ADD-PATH)
- [ ] [RFC 8950](https://datatracker.ietf.org/doc/html/rfc8950): Advertising IPv4 Network Layer Reachability Information (NLRI) with an IPv6 Next Hop
- [X] [RFC 9072](https://datatracker.ietf.org/doc/html/rfc9072): Extended Optional Parameters Length for BGP OPEN Message (updates RFC 4271)
- [X] [RFC 9234](https://datatracker.ietf.org/doc/html/rfc9234): Route Leak Prevention and Detection Using Roles in UPDATE and OPEN Messages

## MRT

- [X] [RFC 6396](https://datatracker.ietf.org/doc/html/rfc6396): Multi-Threaded Routing Toolkit (MRT) Routing Information Export Format
- [ ] [RFC 6397](https://datatracker.ietf.org/doc/html/rfc6397): Multi-Threaded Routing Toolkit (MRT) Border Gateway Protocol (BGP) Routing Information Export Format with Geo-Location Extensions
- [X] [RFC 8050](https://datatracker.ietf.org/doc/html/rfc8050): Multi-Threaded Routing Toolkit (MRT) Routing Information Export Format with BGP Additional Path Extensions

## BMP

- [X] [RFC 7854](https://datatracker.ietf.org/doc/html/rfc7854): BGP Monitoring Protocol (BMP)
- [X] [RFC 8671](https://datatracker.ietf.org/doc/html/rfc8671): Support for Adj-RIB-Out in the BGP Monitoring Protocol (BMP)
- [X] [RFC 9069](https://datatracker.ietf.org/doc/html/rfc9069): Support for Local RIB in the BGP Monitoring Protocol (BMP)

## Communities

We support standard communities, extended communities, and large communities.

- [X] [RFC 1997](https://datatracker.ietf.org/doc/html/rfc1997): BGP Communities Attribute
- [X] [RFC 4360](https://datatracker.ietf.org/doc/html/rfc4360): BGP Extended Communities Attribute
- [X] [RFC 5668](https://datatracker.ietf.org/doc/html/rfc5668): 4-Octet AS Specific BGP Extended Community
- [X] [RFC 5701](https://datatracker.ietf.org/doc/html/rfc5701): IPv6 Address Specific BGP Extended Community Attribute
- [X] [RFC 7153](https://datatracker.ietf.org/doc/html/rfc7153): IANA Registries for BGP Extended Communities (updates RFC 4360 and RFC 5701)
- [X] [RFC 8097](https://datatracker.ietf.org/doc/html/rfc8097): BGP Prefix Origin Validation State Extended Community
- [X] [RFC 8092](https://datatracker.ietf.org/doc/html/rfc8092): BGP Large Communities
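A standard community (RFC 1997) is a 32-bit value conventionally written as `ASN:value`, with the high 16 bits holding an AS number. A few reserved values are well-known, such as `NO_EXPORT` (0xFFFFFF01), `NO_ADVERTISE` (0xFFFFFF02), and `NO_EXPORT_SUBCONFED` (0xFFFFFF03). A large community (RFC 8092) instead carries three 32-bit fields, written `global:local1:local2`, so four-byte ASNs fit in the global administrator field. The sketch below formats both from raw values. The function names are hypothetical and do not reflect the crate's community models.

```rust
/// Formats a raw standard community as `ASN:value`, or its well-known name.
pub fn format_standard_community(raw: u32) -> String {
    match raw {
        0xFFFF_FF01 => "NO_EXPORT".to_string(),
        0xFFFF_FF02 => "NO_ADVERTISE".to_string(),
        0xFFFF_FF03 => "NO_EXPORT_SUBCONFED".to_string(),
        // high 16 bits: AS number; low 16 bits: operator-defined value
        _ => format!("{}:{}", raw >> 16, raw & 0xFFFF),
    }
}

/// Decodes one 12-byte large community (three big-endian u32 fields).
pub fn format_large_community(bytes: &[u8; 12]) -> String {
    let field = |i: usize| u32::from_be_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
    format!("{}:{}:{}", field(0), field(4), field(8))
}
```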

## FlowSpec

- [ ] [RFC 8955](https://datatracker.ietf.org/doc/html/rfc8955): Dissemination of Flow Specification Rules
- [ ] [RFC 8956](https://datatracker.ietf.org/doc/html/rfc8956): Dissemination of Flow Specification Rules for IPv6
- [ ] [RFC 9117](https://datatracker.ietf.org/doc/html/rfc9117): Revised Validation Procedure for BGP Flow Specifications (updates RFC 8955)

# Built with ❤️ by BGPKIT Team

<a href="https://bgpkit.com"><img src="https://bgpkit.com/Original%20Logo%20Cropped.png" alt="https://bgpkit.com/favicon.ico" width="200"/></a>
*/

#![doc(
    html_logo_url = "https://raw.githubusercontent.com/bgpkit/assets/main/logos/icon-transparent.png",
    html_favicon_url = "https://raw.githubusercontent.com/bgpkit/assets/main/logos/favicon.ico"
)]
#![allow(clippy::new_without_default)]
#![allow(clippy::needless_range_loop)]

#[cfg(feature = "parser")]
pub mod encoder;
#[cfg(feature = "parser")]
pub mod error;
pub mod models;
#[cfg(feature = "parser")]
pub mod parser;

pub use models::BgpElem;
pub use models::MrtRecord;
#[cfg(feature = "parser")]
pub use parser::*;