popgetter-cli
Library and associated command-line application for exploring and fetching popgetter data.
Quickstart
- Install Rust
- Install CLI:
- Run the CLI with e.g.:
Examples
List countries with the countries subcommand
Get a list of available data:
Searching metadata with the metrics subcommand
Summarising and specific metadata fields
Get a summary of all data:
Get a summary of data for a given country:
Get the list of metadata fields:
Get a list of geometry levels for a given country:
Searching metrics
An example search using a regex for search text combined with a given country and geometry level:
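To illustrate how the three criteria combine, here is a minimal Rust sketch of such a filter. It is not the popgetter implementation: the `MetricRecord` struct, its fields, the `search` function, and the sample values are all hypothetical, and a case-insensitive substring match stands in for the regex because the Rust standard library has no regex engine.

```rust
/// Hypothetical metadata record; the field names are assumptions for illustration.
struct MetricRecord {
    description: &'static str,
    country: &'static str,
    geometry_level: &'static str,
}

/// Keep records whose description contains `text` (case-insensitive substring,
/// standing in for a regex) and whose country and geometry level match exactly.
fn search<'a>(
    records: &'a [MetricRecord],
    text: &str,
    country: &str,
    geometry_level: &str,
) -> Vec<&'a MetricRecord> {
    let needle = text.to_lowercase();
    records
        .iter()
        .filter(|r| {
            r.description.to_lowercase().contains(&needle)
                && r.country == country
                && r.geometry_level == geometry_level
        })
        .collect()
}

fn main() {
    // Made-up sample data, not real popgetter metadata.
    let records = [
        MetricRecord { description: "Total population", country: "Country A", geometry_level: "level_1" },
        MetricRecord { description: "Total population", country: "Country B", geometry_level: "level_1" },
        MetricRecord { description: "Median age", country: "Country A", geometry_level: "level_1" },
    ];
    for hit in search(&records, "population", "Country A", "level_1") {
        println!("{} ({}, {})", hit.description, hit.country, hit.geometry_level);
    }
}
```

All three conditions must hold for a record to be returned, so adding a country or geometry level only ever narrows the text search.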
Downloading data
An example download using a regex for search text combined with a given country and geometry level:
where the --dev flag enables output with the CRS transformed to EPSG:4326, since all data is provided here in EPSG:4326.
Downloading data with recipes
Recipe files provide an alternative to using the command line flags. An example recipe can be downloaded with:
LLM integration (experimental)
It is also possible to search and generate data requests with the support of LLMs.
The below steps are required for this experimental functionality implemented in the popgetter-llm crate.
- Install with the llm feature:
- Set up two Azure LLM endpoints for:
  - Text embeddings (text-embedding-3-small)
  - Text generation (gpt-4o)
- Assign the API key for the two endpoints to the following environment variable, with e.g.:
Note: currently only Azure endpoints are supported.
- Install and run Docker
- Initialize the Qdrant database:
- Construct the database with embeddings derived from metadata using the popgetter CLI:
This process will take several hours to run and will construct the Qdrant database for all the metadata (around 3GB total size).
- With the database populated, search queries can be performed using the embeddings to:
  - Return search results based on embedding similarity
  - Generate a data request specification directly from the query
- For search results based on embedding similarity, e.g.:
- With output-format set to --output-format SearchResultsToRecipe, the metric IDs from the search results are included in a recipe:
- With output-format set to --output-format DataRequestSpec, the data request specification is produced directly from the search results through a second prompt:
RUST_LOG=info
Note: This output format is highly experimental and may produce incorrect data request specifications.
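As background for the embedding-similarity search described above, the following is a minimal Rust sketch of ranking stored vectors against a query vector. It is not the popgetter-llm or Qdrant implementation: cosine similarity is assumed as the metric (the source does not state which one is configured), and `cosine_similarity`, `top_k`, the metric IDs, and the three-dimensional vectors are hypothetical stand-ins for real embeddings.

```rust
/// Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Return the ids of the `k` stored vectors most similar to `query`, best first.
fn top_k<'a>(query: &[f32], stored: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(&str, f32)> = stored
        .iter()
        .map(|(id, v)| (*id, cosine_similarity(query, v)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

fn main() {
    // Toy vectors standing in for metadata embeddings (hypothetical ids).
    let stored = vec![
        ("metric_a", vec![1.0_f32, 0.0, 0.0]),
        ("metric_b", vec![0.0_f32, 1.0, 0.0]),
        ("metric_c", vec![0.9_f32, 0.1, 0.0]),
    ];
    let query = [1.0_f32, 0.0, 0.0];
    println!("{:?}", top_k(&query, &stored, 2)); // prints ["metric_a", "metric_c"]
}
```

A vector database such as Qdrant answers the same kind of nearest-neighbour query with an index rather than the linear scan used here, which matters once the collection holds embeddings for all the metadata.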