Shaman supports a wide range of CSS selectors, including custom syntaxes designed to handle most real-world web pages.

Standard CSS selectors

Selector Description
* All elements
div Elements with the specified tag name
#id Elements with the specified id
.class Elements with the specified class
[attr] Elements with the specified attribute defined
[attr='value'] Elements with the specified attribute name and value
[attr~='word'] Attribute includes the specified word (whitespace-separated)
[attr!='value'] Elements with attribute not equal to value (or without attribute)
[attr^='prefix'] Attribute starts with 'prefix'
[attr$='suffix'] Attribute ends with 'suffix'
[attr*='search'] Attribute contains 'search'
:first-child Elements that are the first child of their parent
:last-child Elements that are the last child of their parent
:nth-child(n) Elements that are the n-th child of their parent (1-based)
:nth-last-child(n) Elements that are the n-th child of their parent, counting from the last child (1-based)
:only-child Elements that are the only child of their parent
:empty Elements that have no children
div > p Selects the <p> elements that are direct children of a <div>
div p Selects the <p> elements that are descendants of a <div>
prev + next Selects all next elements matching "next" that are immediately preceded by a sibling "prev"
prev ~ siblings Selects all sibling elements that follow after the "prev" element, have the same parent, and match the filtering "siblings" selector.
:has(selector) Elements that contain an element that matches the sub-expression
:not(selector) Elements that do not match the specified sub-expression
:contains('text') Elements whose InnerText contains the specified text
:eq(n) Selects the n-th matched element (zero based)

Non-standard selectors

Selector Description
:select-parent Selects the parent(s) of the matched node(s)
div[attr%='[0-9]*'] Elements whose attr attribute matches the specified regex
span:matches('ab?') Elements whose inner text matches the specified regex
/ (leading slash) Performs the initial selection at the top level of the search context instead of among its descendant nodes.
For example, node.QuerySelector("/:select-parent") == node.ParentNode.
Without the slash, the result would be "the parent of the first descendant", which is probably not what you want.
body:split-after(hr) Groups the children of <body> into a pseudo-element every time a <hr> is found.
Each <hr> will be the first child of its own group.
Nodes before the first <hr> will be ignored.
Note that the sub-selector (hr) must only match direct children of the context node.
You may want to use body:split-after(/* > hr) to force this behavior (see the previous selector)
body:split-before(hr) Similar to the previous one, except that every <hr> will be the last of its own group.
Nodes after the last <hr> will be ignored.
body:split-between(hr) Similar to the previous one, except that only content between two <hr>s will be included. <hr>s themselves won't be part of the groups.
body:split-all(hr) Similar to the previous one, except that content before the first <hr> and after the last <hr> will be included too.
.main:before(hr) Selects the children of .main preceding the first <hr> child, and groups them into a single pseudo-element (<hr> is excluded).
.main:after(h1) Selects the children of .main following the first <h1> child, and groups them into a single pseudo-element (<h1> is excluded).
.main:between(h1; hr) Selects the children of .main between the first <h1> child and the first following <hr> (possibly the same element), grouping them into a single pseudo-element. <h1> and <hr> are not part of the group. Note the semicolon ( ; ) used to separate the two parameters.
:last Selects the last matched element
:heading-content(h2:contains('Users')) Groups the next siblings of the specified <h2> node into a new pseudo-element, up to the following <h2> or <h1> (if any)
tr:nth-cell(3) Returns the nth cell (zero based) of a table row, taking colspan attributes into account.
li:skip(2) Skips the first 2 matched nodes.
tr:skip-last(2) Skips the last 2 matched nodes.
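
To make the colspan handling of tr:nth-cell concrete, here is a minimal sketch of one way a zero-based column index could be mapped to a cell. It is not Shaman's implementation: the NthCellSketch class and the CellIndexForColumn method are hypothetical names, and the sketch assumes that a column covered by a wider cell resolves to that cell.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only (not part of Shaman).
public static class NthCellSketch
{
    // colspans[i] is the colspan of the i-th cell in the row (1 when absent).
    // Returns the index of the cell covering the zero-based column,
    // or -1 when the row has no cell at that column.
    public static int CellIndexForColumn(IReadOnlyList<int> colspans, int column)
    {
        if (column < 0) return -1;

        int nextColumn = 0; // first column not yet covered by the visited cells
        for (int i = 0; i < colspans.Count; i++)
        {
            nextColumn += Math.Max(1, colspans[i]); // treat invalid spans as 1
            if (column < nextColumn) return i;
        }
        return -1;
    }
}
```

For a row whose cells have colspans 1, 2 and 1, CellIndexForColumn(new[] { 1, 2, 1 }, 3) picks the third cell (index 2), because the second cell already occupies columns 1 and 2.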

:split-* selectors

:split-* selectors are often useful to extract items that span across multiple nodes and are not clearly separated from each other. For example:
<div class="main">

    <b>Kate Havnevik</b> -
    <span>Nowhere warm</span><br>

    <b>Poets of the Fall</b> -
    <span>Where do we draw the line</span><br>
</div>

We can use .main:split-after(b) as a selector for the ListExtraction attribute. The resulting nodes will be:
<fizzler-node-group>
    <b>Kate Havnevik</b> -
    <span>Nowhere warm</span><br>
</fizzler-node-group>
<fizzler-node-group>
    <b>Poets of the Fall</b> -
    <span>Where do we draw the line</span><br>
</fizzler-node-group>
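
The grouping rules of :split-after and :split-before can be sketched on a plain list of child nodes. This is only an illustration of the semantics described above, not Shaman's actual code: the SplitSketch class and its methods are hypothetical names, and nodes are represented as simple strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only (not part of Shaman).
public static class SplitSketch
{
    // Like :split-after: every separator starts a new group and is its first item.
    // Items before the first separator are ignored.
    public static List<List<T>> SplitAfter<T>(IEnumerable<T> children, Func<T, bool> isSeparator)
    {
        var groups = new List<List<T>>();
        foreach (var child in children)
        {
            if (isSeparator(child)) groups.Add(new List<T>());
            if (groups.Count > 0) groups[groups.Count - 1].Add(child);
        }
        return groups;
    }

    // Like :split-before: every separator closes its group and is its last item.
    // Items after the last separator are ignored.
    public static List<List<T>> SplitBefore<T>(IEnumerable<T> children, Func<T, bool> isSeparator)
    {
        var groups = new List<List<T>>();
        var current = new List<T>();
        foreach (var child in children)
        {
            current.Add(child);
            if (isSeparator(child))
            {
                groups.Add(current);
                current = new List<T>();
            }
        }
        return groups;
    }
}
```

With the children of .main written as "<b>", "-", "<span>", "<br>", "<b>", "-", "<span>", "<br>", calling SplitSketch.SplitAfter(children, n => n == "<b>") yields two groups of four items each, matching the two node groups shown above.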

JSON selectors

It is also possible to navigate JSON structures using selectors. Note, however, that tag names (that is, JSON property names) must be written in lower case in the selector, regardless of their casing in the actual JSON. The values you need are often found inside data-* attributes or JavaScript event handlers.

The following selectors make it possible to navigate inside the JSON structures of such nodes:

<a href="#" onclick="showDetails({name: 'John Doe', info: {phone: '555-1212'}})">Show details</a>
[Extraction(DetailsLevel.D, "a:json-attr-token('onclick', 'showDetails(') > name")]
public string FirstName;

[Extraction(DetailsLevel.D, "a:json-attr-token('onclick', 'showDetails(') > info > phone")]
public string Phone;
The showDetails( token is searched for textually inside the specified attribute. The JSON that follows it is then parsed until its end is detected, and any remaining JavaScript code is ignored.
Additionally, script:json-token('var data =') can be used to extract JSON structures from <script> nodes, and div:json-attr('data-info') can be used when the attribute value itself is already a valid JSON structure. If the element directly contains JSON data, use script:reparse-json > val (where val is a property of that JSON).
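
The token-then-parse behaviour can be approximated as follows. This is a simplified sketch under stated assumptions, not Shaman's parser: JsonTokenSketch and ExtractAfterToken are hypothetical names, and the method only locates the end of the object or array literal by counting brackets outside string literals, without actually parsing the JSON.

```csharp
using System;

// Hypothetical helper, for illustration only (not part of Shaman).
public static class JsonTokenSketch
{
    // Finds `token` in `source` and returns the balanced {...} or [...] literal
    // that follows it, or null when no complete literal is found.
    public static string ExtractAfterToken(string source, string token)
    {
        int tokenPos = source.IndexOf(token, StringComparison.Ordinal);
        if (tokenPos < 0) return null;

        int start = tokenPos + token.Length;
        while (start < source.Length && char.IsWhiteSpace(source[start])) start++;
        if (start >= source.Length || (source[start] != '{' && source[start] != '[')) return null;

        int depth = 0;
        char quote = '\0'; // current string delimiter, '\0' when outside a string
        for (int i = start; i < source.Length; i++)
        {
            char c = source[i];
            if (quote != '\0')
            {
                if (c == '\\') i++;                 // skip the escaped character
                else if (c == quote) quote = '\0';  // end of the string literal
            }
            else if (c == '"' || c == '\'') quote = c;
            else if (c == '{' || c == '[') depth++;
            else if (c == '}' || c == ']')
            {
                depth--;
                if (depth == 0) return source.Substring(start, i - start + 1);
            }
        }
        return null; // the literal never closes
    }
}
```

Applied to the onclick value above with the token showDetails(, the method returns the object literal starting at {name: and ending at the matching closing brace; the trailing ) is left out, in the same way the remaining JavaScript code is ignored.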

HTML reparsing

Sometimes HTML code is embedded inside an HTML attribute, or inside a JSON string. In this case, you can navigate the inner HTML using :reparse-html (or :reparse-html-attr for an attribute value).
<img title="<div class=popup><span class=votes>5</span></div>" src="/images/1975691.jpg">
img:reparse-html-attr('title') > .votes

{ "myjson": { "description": "<h1 class=title>Introduction</h1>" } }
myjson > description:reparse-html > h1.title