User Agent Parser
This module implements the browserscope / uap standard for Rust, allowing the extraction of various metadata from user agents.
The browserscope standard is data-oriented, with regexes.yaml
specifying the matching and extraction from user-agent strings. This
library implements the matching protocols and provides various types to
make loading the dataset easier. However, it does not provide the
data itself, to avoid depending on serialization libraries or
constraining how the data is loaded.
Dataset loading
The crate does not provide any sort of precompiled data file or
dedicated loader. However, [Regexes] implements [serde::Deserialize]
and can load a regexes.yaml file or any format-preserving conversion
thereof (e.g. loading from JSON or CBOR might be preferred if the
application already depends on one of those):
# let ua_str = "";
let f = std::fs::File::open("regexes.yaml")?;
let regexes: ua_parser::Regexes = serde_yaml::from_reader(f)?;
let extractor = ua_parser::Extractor::try_from(regexes)?;
# Ok::<(), Box<dyn std::error::Error>>(())
All the data-description structures are also Plain Old Data, so they can be embedded in the application directly e.g. via a build script:
let parsers = vec!;
Extraction
The crate provides the ability to either extract individual information sets (user agent — browser, OS, and device) or extract all three in a single call.
The three infosets are independent and non-overlapping, so while the full extractor may be convenient, a complete extraction is unnecessary overhead if only one infoset is needed. The extractors themselves are also somewhat costly to create and take up memory.
Complete Extractor
The complete extractor is simply converted from the [Regexes]
structure. The resulting [Extractor] embeds all three module-level
extractors as attributes, and [Extractor::extract]-s into a 3-tuple
of ValueRefs.
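To make the shape of the complete extractor concrete, here is a minimal std-only sketch of one struct embedding three sub-extractors and returning a 3-tuple. It is not the crate's implementation: `SubExtractor`, `FullExtractor`, and `sample_extractor` are hypothetical names, substring markers stand in for the regexes, and wrapping each slot in an `Option` is an assumption of this sketch.

```rust
/// Hypothetical stand-in for one module-level extractor: each rule maps a
/// marker substring to a label (the real extractors are regex-based).
struct SubExtractor {
    rules: Vec<(&'static str, &'static str)>,
}

impl SubExtractor {
    /// Returns the label of the first rule whose marker occurs in `ua`.
    fn extract(&self, ua: &str) -> Option<&'static str> {
        self.rules
            .iter()
            .find(|(marker, _)| ua.contains(*marker))
            .map(|(_, label)| *label)
    }
}

/// Hypothetical complete extractor: embeds the three sub-extractors as fields.
struct FullExtractor {
    user_agent: SubExtractor,
    os: SubExtractor,
    device: SubExtractor,
}

impl FullExtractor {
    /// Runs all three extractions on the same string. Each slot is computed
    /// independently, so one of them can be `None` while the others match.
    fn extract(
        &self,
        ua: &str,
    ) -> (Option<&'static str>, Option<&'static str>, Option<&'static str>) {
        (
            self.user_agent.extract(ua),
            self.os.extract(ua),
            self.device.extract(ua),
        )
    }
}

/// Builds a tiny illustrative rule set (made up for this example).
fn sample_extractor() -> FullExtractor {
    FullExtractor {
        user_agent: SubExtractor { rules: vec![("Firefox", "Firefox"), ("Chrome", "Chrome")] },
        os: SubExtractor { rules: vec![("Linux", "Linux"), ("Windows", "Windows")] },
        device: SubExtractor { rules: vec![("iPhone", "iPhone")] },
    }
}
```

Because the three slots never depend on one another, a caller that needs only one of them can hold a single sub-extractor and skip the other two.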
Individual Extractors
The individual extractors are in the [user_agent], [os], and [device]
modules. The three modules follow the exact same model:
- a Parser struct which specifies individual parser configurations, used as inputs to the Builder
- a Builder, into which the relevant parsers can be push-ed
- an Extractor created from the Builder, from which the user can extract a ValueRef
- the ValueRef result of data extraction, which may borrow from (and is thus lifetime-bound to) the Parser substitution data and the user agent string it was extracted from
- for convenience, an owned Value variant of the ValueRef
use ;
let e = new
.push?
.push?
.push?
.push?
.push?
.push?
.push?
.build?;
assert_eq!;
assert_eq!;
assert_eq!;
# Ok::
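The Parser / Builder / Extractor / ValueRef / Value model described above can be illustrated with a small std-only sketch. This is not the crate's code: every type below is a hypothetical, simplified stand-in, a literal `needle` substring replaces the regex, and the `BuildError` type and `into_owned` method are assumptions made for the example. What it does show is why the borrowed result is lifetime-bound to both the parser data and the input string.

```rust
/// Hypothetical parser configuration; `needle` stands in for the regex.
#[derive(Clone, Debug)]
struct Parser {
    needle: String,
    /// Fixed family name; when `None`, the matched text itself is used.
    family_replacement: Option<String>,
}

#[derive(Debug, PartialEq)]
enum BuildError {
    EmptyNeedle,
}

#[derive(Default)]
struct Builder {
    parsers: Vec<Parser>,
}

impl Builder {
    fn new() -> Self {
        Self::default()
    }

    /// Validates and appends a parser; fallible, so calls chain with `?`.
    fn push(mut self, p: Parser) -> Result<Self, BuildError> {
        if p.needle.is_empty() {
            return Err(BuildError::EmptyNeedle);
        }
        self.parsers.push(p);
        Ok(self)
    }

    fn build(self) -> Result<Extractor, BuildError> {
        Ok(Extractor { parsers: self.parsers })
    }
}

struct Extractor {
    parsers: Vec<Parser>,
}

/// Borrowed result: `family` points either into the parser's replacement
/// data or into the user agent string, hence the shared lifetime `'a`.
#[derive(Debug, PartialEq)]
struct ValueRef<'a> {
    family: &'a str,
}

/// Owned variant, free of any lifetime.
#[derive(Debug, PartialEq)]
struct Value {
    family: String,
}

impl Extractor {
    /// Returns the result of the first parser that matches, if any.
    fn extract<'a>(&'a self, ua: &'a str) -> Option<ValueRef<'a>> {
        self.parsers.iter().find_map(|p| {
            let start = ua.find(p.needle.as_str())?;
            let family = match &p.family_replacement {
                Some(r) => r.as_str(),
                None => &ua[start..start + p.needle.len()],
            };
            Some(ValueRef { family })
        })
    }
}

impl<'a> ValueRef<'a> {
    /// Copies the borrowed data into an owned `Value`.
    fn into_owned(self) -> Value {
        Value { family: self.family.to_string() }
    }
}

fn example() -> Result<(), BuildError> {
    let e = Builder::new()
        .push(Parser { needle: "Firefox".into(), family_replacement: None })?
        .push(Parser {
            needle: "Chrome".into(),
            family_replacement: Some("Chromium-based".into()),
        })?
        .build()?;
    let ua = String::from("Mozilla/5.0 Chrome/120.0");
    // Borrowed result: only valid while both `e` and `ua` are alive.
    let borrowed = e.extract(&ua).expect("the Chrome rule should match");
    assert_eq!(borrowed.family, "Chromium-based");
    // Owned copy: survives after the input string is dropped.
    let owned = borrowed.into_owned();
    drop(ua);
    assert_eq!(owned.family, "Chromium-based");
    Ok(())
}
```

The borrowed form avoids allocating on every extraction, which matters when parsing many user agents; the owned form is the one to store or send elsewhere.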
Performance
The package has not been profiled or optimised yet, but it seems rather competitive with uap-cpp (tested on an M1 Pro MBP):
> ./UaParserBench
> ./UaParserBench
> target/release/examples/bench
> target/release/examples/bench