Class: HtmlDocument

jala.HtmlDocument(source)

This class provides easy access to the elements of an arbitrary HTML document. Using TagSoup, Dom4J and Jaxen, it can parse even invalid HTML into an object tree, which can then be processed with XPath expressions.

Constructor

new HtmlDocument(source)

Construct a new HTML document.
Parameters:
Name Type Description
source String The HTML source code.
Returns:
A new HTML document.
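
A minimal sketch of constructing a document from an HTML string. It assumes the code runs in an environment where the `jala` namespace is already loaded; `openDocument` is a hypothetical helper name, not part of the class.

```javascript
// Hypothetical helper (not part of jala): wraps the constructor and fails
// with a clear message when the jala namespace is not available.
function openDocument(source) {
  if (typeof jala === "undefined" || !jala.HtmlDocument) {
    throw new Error("jala.HtmlDocument is not available in this environment");
  }
  // The constructor takes the HTML source code as a string.
  return new jala.HtmlDocument(source);
}
```

A call such as `openDocument("<p>unclosed paragraph")` would then return a document object exposing the methods listed below.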

Methods

getAll(elementName)

Retrieves all elements with the given name from the document. The returned object structure is compatible with jala.XmlWriter.
Parameters:
Name Type Description
elementName String The name of the desired element
Returns:
The list of available elements in the document
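
As an illustration of `getAll`, the sketch below counts how many elements of each given name a document contains. It assumes the returned list exposes a `length` property like a JavaScript array; `countElements` is a hypothetical helper name.

```javascript
// Hypothetical helper (not part of jala): counts elements per name.
// Assumption: doc.getAll(name) returns a list with a `length` property.
function countElements(doc, elementNames) {
  var counts = {};
  for (var i = 0; i < elementNames.length; i += 1) {
    var name = elementNames[i];
    counts[name] = doc.getAll(name).length;
  }
  return counts;
}
```

Calling `countElements(doc, ["h1", "img"])` would return an object keyed by element name, with one count per requested name.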

getLinks()

Get all link elements of the HTML document.
Returns:
A list of link elements.

scrape(xpathExpr)

Get all document nodes from an XPath expression.
Parameters:
Name Type Description
xpathExpr String An XPath expression.
Returns:
A list of HTML elements.
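
The reference does not state the concrete type of the list returned by `scrape`. The sketch below therefore copies the result into a plain JavaScript array, handling both an array-like result and a Java-style list with `size()` and `get(i)`; both shapes are assumptions, and `scrapeToArray` is a hypothetical helper name.

```javascript
// Hypothetical helper (not part of jala): normalizes the result of
// doc.scrape() into a plain JavaScript array.
function scrapeToArray(doc, xpathExpr) {
  var list = doc.scrape(xpathExpr);
  var result = [];
  if (typeof list.size === "function" && typeof list.get === "function") {
    // Assumption: a java.util.List-style result with size() and get(i).
    for (var i = 0; i < list.size(); i += 1) {
      result.push(list.get(i));
    }
  } else {
    // Assumption: an array-like result with length and index access.
    for (var j = 0; j < list.length; j += 1) {
      result.push(list[j]);
    }
  }
  return result;
}
```

The XPath expression itself depends on the parsed tree. Whether element names need a namespace prefix is not specified here, so check it against the document being processed.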

toString()

Get a string representation of the HTML document.
Returns:
A string representation of the HTML document.