Introduction

The following snippet provides a short example of a very simple connector.

[ListExtraction("#products > li", DetailsLevel.L, "http://www.example.com/products")]
public class Product : Entity
{

    [Extraction(DetailsLevel.L, "h3")]
    public string Name;

    [Extraction(DetailsLevel.L, "a", "href")]
    public Uri Url;

}

Each field is marked with an attribute that specifies a CSS selector, and the ListExtraction attribute on the class defines how to select the individual items within the products page.

Connectors can be written by hand or generated by the automatic analyzer. Generated connectors can later be refined manually in the editor.

Detail levels

Data can often be retrieved in multiple ways, for example from a results page or from a details page. The same class can capture both by using different DetailsLevel values:

[ListExtraction("#products > li", DetailsLevel.L, "http://www.example.com/products")]
public class Product : Entity
{

    [Extraction(DetailsLevel.L, "h3")]
    public string Name;

    [Extraction(DetailsLevel.L, "a", "href")]
    [DetailsUrl(DetailsLevel.D)]
    public Uri Url;

    [Extraction(DetailsLevel.D, "#full_description")]
    public string Description;
}

Multiple list extractions can also be specified (for example, a list page and a search results page) by using DetailsLevel.L2, L3 and so on.

Note: the values of the DetailsLevel enumeration are just mnemonic names. Although L and D stand for "List" and "Details", they are fully interchangeable.

Entity relationships

It is possible to define one-to-one and one-to-many relationships between entities. In the example below, the details page (D) of a User contains a list of products, and each item in that list is read using the L2 extraction of Product (here, its Name).

[ListExtraction("#products > li", DetailsLevel.L, "http://www.example.com/products")]
public class Product : Entity
{
    [Extraction(DetailsLevel.L, "a.user", "href")]
    public User PostedBy;

    [Extraction(DetailsLevel.L2, ".smalltext")]
    public string Name;
}

public class User : Entity
{
    [DetailsUrl(DetailsLevel.D)]
    public Uri Url;

    [Inverse("PostedBy")]
    [ListExtraction(".product", DetailsLevel.L2, DetailsLevel.D)]
    public IPagedEnumerable<Product> PostedProducts;
}

Pagination and sorting

Multiple ListExtraction attributes can be specified to represent the different ways in which items can be retrieved, including results for particular search queries:

[ListExtraction("#products > li", DetailsLevel.L, "http://www.example.com/products/")]
[ListExtraction("#products > li", DetailsLevel.L, "http://www.example.com/products/?q={:SearchTerms}")]
[ListExtraction("#products > li", DetailsLevel.L, "http://www.example.com/products/?sort=oldest", SortOrder=SortOrder.Oldest)]
[ListExtraction("#products > li", DetailsLevel.L, "http://www.example.com/products/best/week", SortOrder=SortOrder.BestLastWeek)]

If every list extraction represents items using the same HTML markup, it is possible to reuse the same DetailsLevel identifier. Pagination can be represented in two different ways:

[ListExtraction("#products > li", DetailsLevel.L, "http://www.example.com/products/{@Page}")]
[ListExtraction("#products > li", DetailsLevel.L, "http://www.example.com/products/", NextPageLinkSelector="a[rel='next']")]

In the first case, every URL is built from the specified template. In the second case, only the URL of the first page is specified, together with a CSS selector for the "Next page" link. The special parameters available for URL templates include:
  • {@Page}: the page number, 1-based
  • {@Page0}: the page number, 0-based
  • {@Start}: the index of the first item, 1-based
  • {@Start0}: the index of the first item, 0-based
If the parameter name ends with a minus sign, the parameter is omitted for the first page (e.g. /products/{@Page-}).
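
The placeholder rules above can be sketched as a small helper. This is only an illustration of the described behavior, not the library's implementation: PageUrlTemplate, Expand, and the pageSize argument are hypothetical names. The sketch also assumes that a trailing minus sign drops only the placeholder itself on the first page.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the library): expands the pagination
// placeholders {@Page}, {@Page0}, {@Start} and {@Start0} for a given page.
public static class PageUrlTemplate
{
    // Group 1: "Page" or "Start"; group 2: optional "0" (0-based);
    // group 3: optional "-" (omit on the first page).
    private static readonly Regex Placeholder =
        new Regex(@"\{@(Page|Start)(0?)(-?)\}");

    // pageIndex is 0-based. pageSize (items per page) is an assumed input,
    // needed only to compute {@Start} and {@Start0}.
    public static string Expand(string template, int pageIndex, int pageSize)
    {
        if (template == null) throw new ArgumentNullException(nameof(template));
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return Placeholder.Replace(template, m =>
        {
            bool isPage = m.Groups[1].Value == "Page";
            bool zeroBased = m.Groups[2].Value == "0";
            bool omitOnFirstPage = m.Groups[3].Value == "-";

            // Assumption: "omitted" means the placeholder expands to nothing.
            if (omitOnFirstPage && pageIndex == 0) return string.Empty;

            // Page number, or index of the first item on the page (0-based).
            int value = isPage ? pageIndex : pageIndex * pageSize;
            if (!zeroBased) value += 1; // switch to 1-based numbering
            return value.ToString(CultureInfo.InvariantCulture);
        });
    }
}
```

Under these assumptions, PageUrlTemplate.Expand("http://www.example.com/products/{@Page-}", 0, 20) yields http://www.example.com/products/ for the first page, and the same call with page index 1 yields http://www.example.com/products/2.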

If the ListExtraction is applied to a field (and not to the class itself), it is possible to refer to fields of the same class:

public class User : Entity
{
    public int Id;

    [ListExtraction("#userInfo .product", DetailsLevel.L, "http://www.example.com/users/{Id}/products/")]
    public IPagedEnumerable<Product> PostedProducts;
}
public class Product : Entity
{
    [Extraction(DetailsLevel.L, ".product-name")]
    public string Name;
}
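
One way to picture how a placeholder such as {Id} could be resolved against the owning object is the reflection-based sketch below. It is an assumption about the behavior, not the library's actual code: FieldUrlTemplate, Expand, and SampleUser are hypothetical names, and the sketch only handles public instance fields.

```csharp
using System;
using System.Globalization;
using System.Reflection;
using System.Text.RegularExpressions;

// Stand-in for an entity such as User; only the Id field matters here.
public class SampleUser
{
    public int Id;
}

// Hypothetical helper (not part of the library): replaces {FieldName}
// placeholders with the value of the matching public field of 'owner'.
public static class FieldUrlTemplate
{
    // Matches plain identifiers only, so special placeholders such as
    // {@Page} or {:SearchTerms} are left untouched.
    private static readonly Regex FieldPlaceholder =
        new Regex(@"\{([A-Za-z_][A-Za-z0-9_]*)\}");

    public static string Expand(string template, object owner)
    {
        if (template == null) throw new ArgumentNullException(nameof(template));
        if (owner == null) throw new ArgumentNullException(nameof(owner));

        return FieldPlaceholder.Replace(template, m =>
        {
            string name = m.Groups[1].Value;
            FieldInfo field = owner.GetType().GetField(
                name, BindingFlags.Public | BindingFlags.Instance);
            if (field == null)
                throw new InvalidOperationException("No public field named " + name);

            // Format culture-independently, then escape for use in a URL.
            string text = Convert.ToString(
                field.GetValue(owner), CultureInfo.InvariantCulture) ?? string.Empty;
            return Uri.EscapeDataString(text);
        });
    }
}
```

With a SampleUser whose Id is 42, FieldUrlTemplate.Expand("http://www.example.com/users/{Id}/products/", user) produces http://www.example.com/users/42/products/.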