# Database schema for an unnamed crawler
## Data structures
### Origin
An origin uniquely identifies a web service. It consists of a scheme (usually http or https), a domain name, and a port number (which should make it useful for your own private services too).
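
As a minimal sketch of how an origin could be derived from a URL, the following uses Python's standard `urllib.parse`. The class name `Origin`, the method `from_url`, and the `DEFAULT_PORTS` table are assumptions made for illustration and are not part of this design.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Assumption: only the two usual schemes get an implicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass(frozen=True)
class Origin:
    """Hypothetical value type: scheme, domain name and port."""
    scheme: str
    domain_name: str
    port: int

    @classmethod
    def from_url(cls, url: str) -> "Origin":
        parts = urlsplit(url)
        scheme = parts.scheme.lower()
        if not parts.hostname:
            raise ValueError(f"no host in URL: {url!r}")
        # Use the explicit port if present, else the scheme's default.
        port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
        if port is None:
            raise ValueError(f"no port given and no default for scheme {scheme!r}")
        # urlsplit already lowercases the hostname.
        return cls(scheme, parts.hostname, port)
```

Because the dataclass is frozen, two URLs on the same service compare equal as origins and can be used as dictionary keys.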
## Database
### Origins
* origin_id
* scheme
* domain_name
* port
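
One possible way to realise this table, sketched with Python's built-in `sqlite3`. The choice of SQLite, the column types, the uniqueness constraint, and the helper names are all assumptions; the column holding the URL scheme is named `scheme` here, following the definition of an origin above.

```python
import sqlite3


def create_origins_table(conn: sqlite3.Connection) -> None:
    # Assumption: one row per (scheme, domain_name, port) combination.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS origins (
            origin_id   INTEGER PRIMARY KEY,
            scheme      TEXT    NOT NULL,
            domain_name TEXT    NOT NULL,
            port        INTEGER NOT NULL,
            UNIQUE (scheme, domain_name, port)
        )
        """
    )


def get_or_create_origin_id(conn, scheme: str, domain_name: str, port: int) -> int:
    # Insert the origin if it is new; the UNIQUE constraint makes repeats a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO origins (scheme, domain_name, port) VALUES (?, ?, ?)",
        (scheme, domain_name, port),
    )
    row = conn.execute(
        "SELECT origin_id FROM origins"
        " WHERE scheme = ? AND domain_name = ? AND port = ?",
        (scheme, domain_name, port),
    ).fetchone()
    return row[0]
```

Other tables can then refer to an origin by `origin_id` alone.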
### Ratelimit
Maps an origin to ratelimit information (crawl delay, last request)
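
A small sketch of how the two stored values might be used to decide when the next request to an origin is allowed. The function name and the convention that a missing `last_request` means "never requested" are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def seconds_until_allowed(
    crawl_delay: float,
    last_request: Optional[datetime],
    now: datetime,
) -> float:
    """Return how long to wait before the origin may be requested again."""
    if last_request is None:
        # Assumption: an origin that was never requested has no waiting time.
        return 0.0
    next_allowed = last_request + timedelta(seconds=crawl_delay)
    # Never return a negative wait once the delay has already passed.
    return max(0.0, (next_allowed - now).total_seconds())
```

A worker could call this before sending a request and sleep, or pick a different origin, when the result is greater than zero.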
### Requests
Contains information on ongoing and completed requests
* request_id
* worker_id
* origin_id
* url
* result (unreachable, timeout, request successful)
* time_request_sent
* request_duration
* command_id
### Commands
* command_id
* url
* command (check, discover, index, preview, robotstxt, …)
* causal_parent_command_id
* causal_parent_request_id
* time_requested
* time_finished
* requesting_worker_id
* executing_worker_id
* status (waiting, running, finished, failed, …)
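
The `causal_parent_command_id` field lets a command be traced back to whatever triggered it. The sketch below shows one way to follow that chain in memory; the `Command` class only carries a subset of the fields listed above, and `causal_chain` is a hypothetical helper (`causal_parent_request_id` is left out for brevity).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Command:
    """Illustrative subset of the Commands fields."""
    command_id: int
    url: str
    command: str
    causal_parent_command_id: Optional[int] = None
    status: str = "waiting"


def causal_chain(commands: Dict[int, Command], command_id: int) -> List[int]:
    """Return command ids from the given command back to its root cause."""
    chain: List[int] = []
    seen = set()
    current: Optional[int] = command_id
    # Stop at a command without a parent; `seen` guards against cycles.
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = commands[current].causal_parent_command_id
    return chain
```

For example, an `index` command caused by a `discover` command that was itself caused by a `robotstxt` command yields a chain of three ids, newest first.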
### File Results
* file_id
* http_status_code // or HTTP equivalent
* request_id
* mimetype
* filesize
* date_fetched
* canonical_url
* date_created
* date_last_modified
### Events
* file_id
* date
* event_type (file_updated, file_published, date_of_event_represented_by_file)
### Content
* file_id
* index // number that orders the entries by occurrence
* text
* context (body, article, main, footer, header, metadata)
* element_type (title, description, headline, paragraph)
* element_level
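
To illustrate what `index` is for, here is a sketch that puts the text entries of one file back in document order, optionally restricted to one `context`. Rows are modelled as plain tuples; the tuple layout and the function name are assumptions, and `element_level` is omitted.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed row layout: (file_id, index, text, context, element_type)
ContentRow = Tuple[int, int, str, str, str]


def text_in_order(
    rows: Iterable[ContentRow],
    file_id: int,
    context: Optional[str] = None,
) -> List[str]:
    """Return the texts of one file sorted by their index field."""
    selected = [
        row for row in rows
        if row[0] == file_id and (context is None or row[3] == context)
    ]
    # The index field orders entries by where they occur in the file.
    selected.sort(key=lambda row: row[1])
    return [row[2] for row in selected]
```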
### Links
* file_id
*