# Scraping cheatsheet
The `xan` scraping language should feel very reminiscent of CSS/SCSS syntax,
as it follows the same selection principles. When using --evaluate-file, it
is probably useful to save your scrapers on disk with the `.css`, `.sass`
or `.scss` extension to get proper syntax highlighting.
This language is able to:
1. perform complex element selection using nested CSS selectors
and/or custom expressions
2. extract and process data from selected elements
For instance, here is a simple example selecting links that are direct
children of an h2 tag:
```scss
h2 > a {
  title: text;
  url: attr("href");
}
```
The above scraper will extract a "title" column containing the text
of the selected tag and a "url" column containing its "href" attribute value.
Each inner directive is understood as:
`<column-name>: <extractor-function>;`
A full list of extractor functions can be found at the end of this help.
The extracted value can also be processed with a moonblade expression, in
which `value` refers to the extractor function's output:
```scss
h2 > a {
  title: text, lower(value)[10:];
  url: attr("href");
}
```
In which case, inner directives will be understood as:
`<column-name>: <extractor-function>, <processing-expression>;`
Multiple selection rules can be given per scraper, like in a CSS stylesheet:
```scss
[data-id=45] {
  title: text;
}
script[type="application/ld+json"] {
  data: json_ld("NewsArticle");
}
```
Selections can be nested:
```scss
.main-content {
  h2 {
    title: text;
  }
  a.main-link {
    url: attr("href");
  }
}
```
Selections can also use expressions to navigate freely through the DOM (a
comprehensive list of selector functions can be found at the end of this help):
```scss
first("h2", containing="Summary").parent() {
  title: text;
}
main > p {
  all("a") {
    urls: attr("href");
  }
}
```
`:scope` or `&` can be used to ease nested selection (here, they let us
select only the links that are direct children of `main > p`):
```scss
main > p {
  & > a {
    url: attr("href");
  }
}
```
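Since `:scope` is described as an alternative to `&`, the same scraper can
presumably be written as follows. This is only a minimal sketch, assuming both
forms are interchangeable in this position:
```scss
main > p {
  /* assumed to be equivalent to `& > a` */
  :scope > a {
    url: attr("href");
  }
}
```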
`:scope` and `&` are also useful when using `xan scrape --foreach`
because we sometimes need a way to select from the scope of an already
selected element.
The following example assumes we gave --foreach "h2 > a" to `xan scrape`:
```scss
& {
  title: text;
  url: attr("href");
}
```
For more examples of real-life scrapers, check out this link:
https://github.com/medialab/xan/tree/master/docs/scrapers