KRL makes extensive use of jQuery selectors in actions to position, modify, and insert elements into web pages. The query() operator lets you use jQuery selectors inside a KRL ruleset to extract data from web pages. (This is often called screen scraping.)
The query() operator works on HTML strings or on arrays of HTML strings. The HTML is usually loaded by the ruleset using a dataset declaration:
dataset r_html:HTML <- "http://www.htmldog.com/examples/darwin.html"
dataset q_html:HTML <- "http://www.htmldog.com/examples/tablelayout1.html"
The :HTML after the name of the dataset is a hint to KRL that it can skip the JSON parsing stage, which is the default when reading datasets.
The query() operator takes one argument: a jQuery selector string, a comma-separated string of jQuery selectors, or an array of jQuery selector strings. For now, query() supports only a subset of the jQuery selectors:

  • element
  • #id
  • .class
  • [attr]
  • [attr=value]

I'll describe the use of each of these in the sections that follow.

element

An element selector is denoted by the element name. An element selector matches all elements of a particular type. For example:
r_html.query("h1"); // returns an array of all <h1> elements
r_html.query("caption,h1");
// returns an array of all <caption> or <h1> elements

#id

An #id selector, denoted by the ID value with the pound sign (#) prepended, matches all elements with a specific ID. For example:
q_html.query("#c_link");
// returns array of all elements like <... id="c_link">
#id selectors can be compounded with element selectors as follows:
q_html.query("a#c_link");
// returns array of all elements like <a id="c_link">

.class

A .class selector, denoted by the class value with a period (.) prepended, matches all elements with a specific class. For example:
q_html.query(".header");
// returns array of elements like <... class="header">
Again, .class selectors can be compounded with element selectors as follows:
q_html.query("p.header");
// returns array of elements like <p class="header">
Or you can combine #id and .class selectors:
q_html.query("#c_link.header");
// returns array of elements like <... id="c_link" class="header">

[attr]

An [attr], or attribute, selector is denoted by the attribute name enclosed in square brackets. The [attr] selector matches all elements with an attribute, even if the attribute value is empty.
q_html.query("[style]");
// returns array of elements like <... style="...">
Combinations of [attr] and other selectors work as you'd expect:
q_html.query("td[style]");
// returns array of elements like <td style="...">

[attr=value]

An [attr=value], or attribute value, selector is denoted by the attribute name and value (as they would appear in the HTML) enclosed in square brackets. The [attr=value] selector matches all elements with an attribute set to a specific value.
q_html.query("[align=center]");
// returns array of elements like <... align="center">
q_html.query("td[align=center]");
// returns array of elements like <td align="center">
Again, you can combine more than one [attr=value] specification:
q_html.query("[align=center][colspan=2]");
// returns array of elements like <... align="center" colspan="2">
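
The compounding rules from the sections above (element, #id, .class, [attr], and [attr=value] written together without spaces) can be modeled outside KRL. The following is a minimal Python sketch, not KRL's actual implementation: the names parse_compound and matches are hypothetical, and an element is represented simply as a tag name plus a dictionary of attributes. It also assumes attribute values are written without quotes, as in the examples above.

```python
import re

# Hypothetical model of the selector subset described above; not KRL's code.
# One token is "#id", ".class", "[attr]" or "[attr=value]".
TOKEN = re.compile(r"#([\w-]+)|\.([\w-]+)|\[([\w-]+)(?:=([^\]]*))?\]")


def parse_compound(selector):
    """Split a compound selector such as 'td.header[align=center]' into parts."""
    tag_match = re.match(r"[A-Za-z][\w-]*", selector)
    tag = tag_match.group(0).lower() if tag_match else None
    rest = selector[tag_match.end():] if tag_match else selector
    parts = {"tag": tag, "id": None, "classes": [], "attrs": {}}
    pos = 0
    while pos < len(rest):
        m = TOKEN.match(rest, pos)
        if m is None:
            raise ValueError(f"unsupported selector syntax at {rest[pos:]!r}")
        ident, cls, attr, value = m.groups()
        if ident is not None:
            parts["id"] = ident
        elif cls is not None:
            parts["classes"].append(cls)
        else:
            # value is None for [attr]: the attribute only has to be present
            parts["attrs"][attr] = value
        pos = m.end()
    return parts


def matches(parts, tag, attrs):
    """Return True if the element satisfies every part of the selector."""
    if parts["tag"] is not None and parts["tag"] != tag.lower():
        return False
    if parts["id"] is not None and attrs.get("id") != parts["id"]:
        return False
    element_classes = attrs.get("class", "").split()
    if any(c not in element_classes for c in parts["classes"]):
        return False
    for name, value in parts["attrs"].items():
        if name not in attrs:
            return False
        if value is not None and attrs[name] != value:
            return False
    return True
```

For instance, matches(parse_compound("td[style]"), "td", {"style": ""}) is true in this model because [attr] only requires the attribute to be present, even with an empty value.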

Multiple Selectors

You can stack selectors. The examples so far contained no spaces, so each one selected a single element that satisfies every part of the selector. If you put spaces between the selectors, they match separate, nested elements, each one enclosed within the element matched by the selector before it. For example:
q_html.query("div#header span p[align=center]");
// returns array of elements like
// <div id="header">...<span>...<p align="center"/>...</span>...</div>
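
The nesting rule for space-separated selectors can be illustrated with a small Python model built on the standard library's html.parser. This is a sketch under simplifying assumptions, not how KRL evaluates query(): the chain is limited to bare element names, and the names NestedFinder and find_nested are hypothetical. Each hit is reported as the path of open tags leading to the matching element.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class NestedFinder(HTMLParser):
    """Finds elements matching a chain of element names, outermost first."""

    def __init__(self, chain):
        super().__init__()
        self.chain = [name.lower() for name in chain]
        self.stack = []  # tags of the elements that are currently open
        self.hits = []   # path of tags for every matching element

    def handle_starttag(self, tag, attrs):
        path = self.stack + [tag]
        if self._matches(path):
            self.hits.append(path)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to (and including) the nearest matching open tag.
            while self.stack.pop() != tag:
                pass

    def _matches(self, path):
        # The last name in the chain must be the element itself.
        if path[-1] != self.chain[-1]:
            return False
        # The earlier names must appear, in order, among its ancestors.
        ancestors = iter(path[:-1])
        return all(any(a == want for a in ancestors) for want in self.chain[:-1])


def find_nested(html, chain):
    """Return the tag path of each element matched by the chain."""
    if not chain:
        return []
    finder = NestedFinder(chain)
    finder.feed(html)
    return finder.hits
```

In this model the chain ["div", "span", "p"] matches a <p> nested anywhere inside a <span> that is itself inside a <div>; intermediate elements between them do not prevent the match.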
If query() is applied to an array of strings, the selector will be applied to each array element:
html_arr = [q_html,r_html];
combo_arr = html_arr.query("a");
// returns array of elements like <a> from both q_html and r_html
You can join multiple selectors together as one string separated by commas or as an array of selector strings:
r_html.query("caption,h1");
r_html.query(["caption","h1"]);
Note that these are different from the following:
r_html.query(["caption h1"]);
The first two expressions match either <caption> or <h1> elements, whereas the last matches only <h1> elements enclosed within <caption> elements.
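
The difference between commas, arrays, and spaces can be made concrete with a short Python sketch. It is an illustration only, and the function name normalize_selectors is hypothetical: it reduces any of the three argument forms to a list of alternatives, where each alternative is a chain of nested selectors. It assumes no selector contains a comma or space inside an attribute value.

```python
def normalize_selectors(arg):
    """Return a list of alternatives; each alternative is a chain of
    nested selectors, outermost first.

    Assumption: attribute values contain no commas or spaces.
    """
    items = arg if isinstance(arg, list) else [arg]
    alternatives = []
    for item in items:
        # Commas separate alternatives ("either of these").
        for piece in item.split(","):
            # Spaces separate nested selectors ("this inside that").
            chain = piece.split()
            if chain:
                alternatives.append(chain)
    return alternatives
```

Under this model, "caption,h1" and ["caption","h1"] both normalize to two single-selector alternatives, while ["caption h1"] normalizes to one alternative with two nested selectors.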
query() returns an empty array if no HTML matches the selector or if the selector syntax is invalid.
