Module chewdata::connector::bucket_select
Filter data files with S3 Select queries and read the results from an AWS S3 or MinIO bucket. To write into the bucket, use the Bucket connector.
§Configuration
key | alias | Description | Default Value | Possible Values |
---|---|---|---|---|
type | - | Required in order to use this connector | bucket_select | bucket_select |
metadata | meta | Override metadata information | null | crate::Metadata |
endpoint | - | Endpoint of the connector | null | String |
access_key_id | - | The access key used for authentication | null | String |
secret_access_key | - | The secret access key used for authentication | null | String |
region | - | The bucket’s region | us-east-1 | String |
bucket | - | The bucket name | null | String |
path | key | The path of the resource. A * wildcard can be used to read multiple files with the same content type | null | String |
parameters | params | The parameters used to replace variables in the path | null | Object or Array of objects |
query | - | S3 select query | select * from s3object | See AWS S3 select |
limit | - | Maximum number of files to read when the path uses a wildcard | null | Unsigned number |
skip | - | Number of files to skip before reading when the path uses a wildcard | null | Unsigned number |
§Examples
[
    {
        "type": "r",
        "connector": {
            "type": "bucket_select",
            "bucket": "my-bucket",
            "path": "data/my_file.jsonl",
            "endpoint": "{{ BUCKET_ENDPOINT }}",
            "access_key_id": "{{ BUCKET_ACCESS_KEY_ID }}",
            "secret_access_key": "{{ BUCKET_SECRET_ACCESS_KEY }}",
            "region": "{{ BUCKET_REGION }}",
            "query": "select * from s3object[*].results[*] r where r.number = 20"
        },
        "document": {
            "type": "jsonl"
        }
    }
]
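
The configuration table also describes a wildcard mode for `path`, together with the `skip` and `limit` options. The following is a minimal sketch of how these keys might be combined, not a verified configuration. The path `data/*.jsonl`, the values `2` and `10`, and the `number` field in the query are illustrative assumptions. How `skip` and `limit` interact, and the order in which matching files are listed, are not specified here and should be checked against the connector's behavior.

```json
[
    {
        "type": "r",
        "connector": {
            "type": "bucket_select",
            "bucket": "my-bucket",
            "path": "data/*.jsonl",
            "endpoint": "{{ BUCKET_ENDPOINT }}",
            "access_key_id": "{{ BUCKET_ACCESS_KEY_ID }}",
            "secret_access_key": "{{ BUCKET_SECRET_ACCESS_KEY }}",
            "region": "{{ BUCKET_REGION }}",
            "query": "select * from s3object s where s.number = 20",
            "skip": 2,
            "limit": 10
        },
        "document": {
            "type": "jsonl"
        }
    }
]
```

With this sketch, the same query would be applied to each file matched by the wildcard, so all matched files should share the content type declared in `document`.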