Capricorn
Parse HTML according to a YAML configuration.
Example:
Parse HTML with test.yml:
```yaml
last:
  selects:
    - '.a'
  nodes:
    last: true
  # text_attr_html: # defaults to text; can be omitted
  #   text: true
last1:
  selects:
    - '.aa'
  nodes:
    last: true
```
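To illustrate roughly what a field such as `last` expresses, the sketch below models it as a node filter applied to the elements its selectors matched: `last: true` keeps only the final match in document order. `NodeFilter` and `apply_filter` are hypothetical, std-only stand-ins, not capricorn's real types, and the "keep everything when `nodes` is omitted" case is an assumption.

```rust
// Hypothetical model of a field's `nodes` option; not capricorn's real API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeFilter {
    All,  // assumed behavior when `nodes` is omitted: keep every match
    Last, // `nodes: { last: true }`: keep only the last match
}

// `matched` holds the texts of the elements the selectors matched, in document order.
fn apply_filter<'a>(matched: &[&'a str], filter: NodeFilter) -> Vec<&'a str> {
    match filter {
        NodeFilter::All => matched.to_vec(),
        NodeFilter::Last => matched.last().copied().into_iter().collect(),
    }
}

fn main() {
    // Hypothetical texts matched by the selector '.a'.
    let matched = ["one", "two", "three"];
    println!("{:?}", apply_filter(&matched, NodeFilter::All)); // ["one", "two", "three"]
    println!("{:?}", apply_filter(&matched, NodeFilter::Last)); // ["three"]
}
```

An empty match list stays empty under `Last`, since `last()` returns `None`.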
Parse HTML with multi-version regex matching (regexes_match_parse_html.yml):
```yaml
regexes_match_parse_html:
  - regex: 'error'
    version: 1
    err: "" # custom error message; if this regex matches, the message is returned directly as the error
    fields:
      last:
        selects:
          - '.a'
        nodes:
          last: true
        # text_attr_html: # defaults to text; can be omitted
        #   text: true
  - regex: '.*?'
    version: 1
    fields:
      last:
        selects:
          - '.a'
        nodes:
          last: true
        # text_attr_html: # defaults to text; can be omitted
        #   text: true
      last1:
        selects:
          - '.aa'
        nodes:
          last: true
      text:
        selects:
          - '.b'
```
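One plausible reading of regexes_match_parse_html.yml is an ordered list of versions: each `regex` is tested against the HTML, the first matching entry is used, an entry that sets `err` returns that message as an error, and a catch-all such as `'.*?'` placed last acts as the default. The std-only sketch below models that dispatch. `Entry`, `dispatch`, and the two predicate functions are hypothetical, a substring check stands in for a real regex, and the first-match ordering rule is an assumption rather than confirmed capricorn behavior.

```rust
// Hypothetical model of one entry in regexes_match_parse_html; not capricorn's real API.
struct Entry {
    matches: fn(&str) -> bool, // stand-in for the entry's `regex`
    err: Option<&'static str>, // stand-in for the entry's `err`
}

// Stand-in for `regex: 'error'`: a plain substring test instead of a regex.
fn looks_like_error(html: &str) -> bool {
    html.contains("error")
}

// Stand-in for `regex: '.*?'`, which matches any input.
fn matches_anything(_html: &str) -> bool {
    true
}

// Try entries in order; return the index of the first matching entry,
// or that entry's `err` message if it defines one.
fn dispatch(entries: &[Entry], html: &str) -> Result<usize, String> {
    for (i, entry) in entries.iter().enumerate() {
        if (entry.matches)(html) {
            return match entry.err {
                Some(msg) => Err(msg.to_string()),
                // A real implementation would parse this entry's `fields` here.
                None => Ok(i),
            };
        }
    }
    Err("no regex matched".to_string())
}

fn main() {
    let entries = [
        Entry { matches: looks_like_error, err: Some("custom error") }, // hypothetical message
        Entry { matches: matches_anything, err: None },
    ];
    println!("{:?}", dispatch(&entries, "<p class=\"a\">ok</p>")); // Ok(1)
    println!("{:?}", dispatch(&entries, "<p>error</p>")); // Err("custom error")
}
```

Under this reading the catch-all must come last: if `'.*?'` were first, every page would take that branch and the error entry would never be reached.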
test.html contains elements with the following text, in document order: Title, first, 111111, last, bbb, fffddffeddggdd, fffddffeddggdd, last, bbb, last, bbb, parent, prev, children1, children2, next.
Rust code to parse the HTML with test.yml:
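```rust
// Load the selector configuration and deserialize it.
let yml = std::fs::read_to_string("./test_html/test.yml").unwrap();
let params: parse::HashMapSelectParams = serde_yaml::from_str(&yml).unwrap();
// Load the HTML and apply the configuration to it.
let html = std::fs::read_to_string("./test_html/test.html").unwrap();
let r = parse::parse_html(&params, &html);
```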
Rust code for multi-version regex matching:
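```rust
// Load the multi-version configuration and the HTML to parse.
let yml = std::fs::read_to_string("./test_html/regexes_match_parse_html.yml").unwrap();
let v: match_html::MatchHtmlVec = serde_yaml::from_str(&yml).unwrap();
let html = std::fs::read_to_string("./test_html/test.html").unwrap();
// `?` requires an enclosing function that returns a compatible Result.
let r = v.regexes_match_parse_html(html)?;
```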