Often URLs alone are not enough to retrieve the data you need from a web server.

Metaparameters are inserted in the fragment (#) of the URL, and make it possible to specify additional information.
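
As a rough illustration of how a meta-URL decomposes, the sketch below separates the plain URL from the metaparameters carried in its fragment. The helper name split_meta_url is hypothetical and this is not Shaman's own parser; it only assumes the fragment holds &-separated name=value pairs whose names start with $.

```python
from urllib.parse import urlsplit, parse_qsl

def split_meta_url(meta_url):
    """Return (plain_url, metaparameters) for a meta-URL.

    Hypothetical helper: assumes metaparameters are '&'-separated
    name=value pairs in the fragment, each name starting with '$'.
    """
    parts = urlsplit(meta_url)
    # Drop the fragment to obtain the URL that is actually requested.
    plain_url = parts._replace(fragment="").geturl()
    # parse_qsl also decodes '+' and %XX escapes in names and values.
    pairs = parse_qsl(parts.fragment, keep_blank_values=True)
    meta = [(name, value) for name, value in pairs if name.startswith("$")]
    return plain_url, meta

url, meta = split_meta_url(
    "http://www.example.com/ajax/req.php#$post-action=comments&$post-article=42"
)
print(url)   # the URL without its fragment
print(meta)  # list of (name, value) pairs
```

Since browsers and HTTP clients never send the fragment to the server, keeping the metaparameters there leaves the requested URL itself unchanged.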

Note: a tool is included to easily create simple and concise meta-URLs starting from the text of a raw HTTP request.

JavaScript evaluation

By default, Shaman does not execute the JavaScript code it finds in pages. However, you can ask Shaman to load the page in a real, headless browser and let it run the page's JavaScript. Once the page is loaded, a snapshot of the HTML is captured from the document DOM, and extraction proceeds as usual. To enable JavaScript evaluation, add the $js=1 metaparameter. This preprocesses the page using a headless WebKit engine and waits for any XMLHttpRequest to complete. In the current version, it is not possible to specify additional metaparameters (such as $cookie- and $post-) if you enable JavaScript.

POST parameters

It is possible to perform a POST using $post-fieldName, for example:

http://www.example.com/ajax/req.php#$post-action=comments&$post-article={Id}

In the case of JSON fields, you can split the JSON structure into multiple metaparameters: for example, the following POST request:

POST /req.php

id=5&
user={"name": "John Doe", "info": {"phone": "555-1212"}}

can be written as:

http://www.example.com/req.php#
    $post-id=5&
    $json-post-user.name=John+Doe&
    $json-post-user.info.phone=555-1212

Values will be reserialized as strings by default, unless you add a tilde (~) at the end of the field name: $json-post-example.all~=false.
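
The mapping from dotted metaparameter names to a nested JSON field can be pictured with the sketch below. It is only an approximation built from the description above: the function name build_post_fields is hypothetical, and Shaman's actual serialization may differ in details such as spacing or key order.

```python
import json

def build_post_fields(meta):
    """Group $post-* and $json-post-* metaparameters into form fields.

    Sketch only: dotted paths become nested objects, and a trailing '~'
    on the name keeps the JSON type of the value instead of a string.
    """
    fields = {}
    json_roots = set()  # top-level fields that must be sent as JSON text
    for name, value in meta:
        if name.startswith("$json-post-"):
            path = name[len("$json-post-"):]
            keep_type = path.endswith("~")
            if keep_type:
                path = path[:-1]
            keys = path.split(".")
            json_roots.add(keys[0])
            node = fields
            for key in keys[:-1]:
                node = node.setdefault(key, {})
            node[keys[-1]] = json.loads(value) if keep_type else value
        elif name.startswith("$post-"):
            fields[name[len("$post-"):]] = value
    # JSON fields are serialized to text; plain fields are left untouched.
    return {key: json.dumps(val) if key in json_roots else val
            for key, val in fields.items()}

form = build_post_fields([
    ("$post-id", "5"),
    ("$json-post-user.name", "John Doe"),
    ("$json-post-user.info.phone", "555-1212"),
])
typed = build_post_fields([("$json-post-example.all~", "false")])
print(form["user"])
print(typed["example"])
```

Here the values are assumed to be already URL-decoded, so John+Doe arrives as John Doe.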

Regardless of the kind of response (e.g., HTML, JSON or XML), the resulting node is still navigable using CSS selector syntax, where the JSON field names become element names. Array items are represented by item pseudo-elements.
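
To make the JSON-to-element mapping concrete, here is a minimal sketch that converts parsed JSON into an element tree. It is not Shaman's implementation: the function name json_to_element and the root element name are assumptions, and the lookups use ElementTree path syntax rather than CSS selectors (the CSS equivalent of the second query would be something like phones > item).

```python
import xml.etree.ElementTree as ET

def json_to_element(name, value):
    """Convert parsed JSON into an element tree.

    Object fields become child elements named after the field; array
    entries become 'item' elements. Assumes field names are usable as
    element names; the root name is arbitrary.
    """
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(json_to_element(key, child))
    elif isinstance(value, list):
        for child in value:
            element.append(json_to_element("item", child))
    elif value is not None:
        element.text = str(value)
    return element

root = json_to_element("root", {
    "user": {"name": "John Doe"},
    "phones": ["555-1212", "555-3434"],
})
names = [e.text for e in root.findall("./user/name")]
phones = [e.text for e in root.findall("./phones/item")]
print(names, phones)
```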

Cookies

Additional cookies can be specified using $cookie-cookieName metaparameters, for example:

http://www.example.com/products#$cookie-haspicture=1

Method

By default, GET is used unless a $post-* or $json-post-* parameter is present, in which case POST is used. You can force a particular method with the $method metaparameter, for example $method=PUT.
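
The selection rule can be summarized in a few lines. This is a sketch of the rule as described here, with a hypothetical function name, operating on (name, value) metaparameter pairs.

```python
def choose_method(meta):
    """Pick the HTTP method from a list of (name, value) metaparameters."""
    params = dict(meta)
    # An explicit $method always wins.
    if "$method" in params:
        return params["$method"]
    # Any $post-* or $json-post-* parameter implies POST.
    if any(name.startswith(("$post-", "$json-post-")) for name in params):
        return "POST"
    return "GET"

print(choose_method([("$cookie-haspicture", "1")]))
print(choose_method([("$post-id", "5")]))
print(choose_method([("$post-id", "5"), ("$method", "PUT")]))
```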

Headers

Additional HTTP headers can be specified using $header-Header-Name metaparameters, for example:

http://www.example.com/products#$header-X-Requested-With=XMLHttpRequest
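
For illustration, the sketch below shows one plausible way the $header- and $cookie- metaparameters could translate into request headers. The function name build_headers is hypothetical, and folding all cookies into a single Cookie header is an assumption about the outgoing request, not documented Shaman behavior.

```python
def build_headers(meta):
    """Turn $header-* and $cookie-* metaparameters into HTTP headers."""
    headers = {}
    cookies = []
    for name, value in meta:
        if name.startswith("$header-"):
            headers[name[len("$header-"):]] = value
        elif name.startswith("$cookie-"):
            cookies.append(name[len("$cookie-"):] + "=" + value)
    if cookies:
        # Assumption: cookies are joined into one Cookie header.
        headers["Cookie"] = "; ".join(cookies)
    return headers

headers = build_headers([
    ("$header-X-Requested-With", "XMLHttpRequest"),
    ("$cookie-haspicture", "1"),
])
print(headers)
```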

Additional metaparameters

$json-token=onload%3D"initializePage( returns the JSON data following the specified JavaScript code.

$allow-redir=0 allows (1) or forbids (0) redirects. The default behavior is to allow (and follow) redirects.

$timeout=8000 specifies a custom amount of time to wait before timing out the request (in milliseconds).

$formbutton=input[type='submit'] performs a form post before returning the result.

$assert-selector=h1:contains('Results') ensures that the returned page contains at least one element that matches the specified selector. Otherwise, an error is thrown.

$forbid-selector=.overquota ensures that no elements in the page match the specified selector. Otherwise, an error is thrown.

$response-encoding=ISO-8859-1 ignores the encoding suggested by the web server and uses the specified one when reading the response.

$content-type=application/json ignores the content type suggested by the web server and uses the specified one when parsing the response. In most cases, this is unnecessary and Shaman will parse JSON even if it is returned as text/html.

$assume-text=1 specifies that no attempt should be made to parse the response as JSON, HTML or XML. The returned object will be an HTML document with a single, unparsed text element.

$assume-html=1 specifies that the response should be parsed as HTML, even if it looks like something other than HTML.

$json-wrapped-html=response specifies that the actual HTML response is wrapped inside a JSON object. The value is a selector for the JSON field that holds the HTML, which is extracted and parsed as the page:

{ "response": "<div class=results>...</div>" }
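
The unwrapping step can be pictured as follows. This sketch assumes the selector is a plain top-level field name, as in the example above; the function name unwrap_html is hypothetical and Shaman may accept richer selectors.

```python
import json

def unwrap_html(body, field):
    """Return the HTML string stored in one field of a JSON response."""
    return json.loads(body)[field]

html = unwrap_html('{ "response": "<div class=results>...</div>" }', "response")
print(html)
```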

$content-type=text/html overrides the response content type returned by the server.