The query() operator is overloaded and has two functions: querying persistent variables and querying HTML.

Querying Persistent Variables

The query() operator can be used to search structured persistent variables. The syntax is:

<persistent variable>.query(<hashpath>, {
  'requires' : <join operator>,
  'conditions' : [
     { 'search_key' : <path_to_field>,
       'operator' : <mongo $ comparison op>,
       'value' : <value> },
     ...
  ]
  },
  <extended result>
  );

<hashpath> will be the empty array, [], if the key is at root.

search_key is the path from the <hashpath> to the field that you want to compare. If that path does not exist for an entry, the entry is not considered at all; the missing field is not treated as a null.

The operation searches the given path in each value in the hash for those that meet the comparison test given by the operator and value. Multiple conditions can be given, and they are joined using either $and or $or.

For example, to search a Twitter timeline whose entries have each been assigned a unique key (transforming the array into a hash):

ent:tweets.query([],{
  'requires' :  '$and', 
  'conditions'   : [
    {
      'search_key' : ['retweeted_status', 'favorite_count'],
      'operator' : '$gt',
      'value' : 5
    },
    {
      'search_key' : ['retweeted_status', 'favorite_count'],
      'operator' : '$lt',
      'value' : 200
    }
  ]})

This will return an array of <hashpaths> (an array of arrays), which is essentially an array of the keys to the matching values:

[
  ['a32'],
  ['a31'],
  ['a30']
]

You can then use these paths to get the entire element, for example with ent:tweets{[ 'a32' ]}.

If the third argument to query() is not null, the actual results are returned rather than the paths to them.

The following function from Fuse uses query() to return trips by their end date, given a start and end date. 

trips = function(start, end){
  utc_start = common:convertToUTC(start);
  utc_end = common:convertToUTC(end);

  ent:trip_summaries.query([],
    {
      'requires' : '$and',
      'conditions' : [
        {
          'search_key' : [ 'endWaypoint', 'timestamp' ],
          'operator' : '$gte',
          'value' : utc_start
        },
        {
          'search_key' : [ 'endWaypoint', 'timestamp' ],
          'operator' : '$lte',
          'value' : utc_end
        }
      ]
    },
    "return_values"
  )
};

Note that KRL stores data in persistent variables in a "flat format"; therefore, all comparisons are string or numeric comparisons. The query() operator doesn't know about dateTime and other special formats for values.
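Because timestamps are compared as plain strings, a range query like the one above only behaves chronologically if the stored strings sort the same way the dates do. The Python sketch below illustrates the point. The helper to_sortable and the timestamp format it produces are assumptions made for illustration; this page does not say what format common:convertToUTC returns.

```python
from datetime import datetime, timezone


def to_sortable(dt):
    """Format an aware datetime as a fixed-width UTC string (format is an assumption)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


earlier = datetime(2013, 12, 1, 8, 30, tzinfo=timezone.utc)
later = datetime(2014, 3, 9, 17, 0, tzinfo=timezone.utc)

# Fixed-width strings with the most significant field first sort chronologically.
print(to_sortable(earlier) < to_sortable(later))  # True

# A month-first format does not: "12/01/2013" sorts after "03/09/2014".
us_earlier = earlier.strftime("%m/%d/%Y")
us_later = later.strftime("%m/%d/%Y")
print(us_earlier < us_later)  # False
```

If your stored values are not in a string-sortable form, normalizing them before storing them (as the Fuse function does for the query bounds) keeps $gte and $lte meaningful.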

Querying HTML 

KRL makes extensive use of jQuery selectors in actions to position, modify, and insert elements into web pages. The query() operator allows jQuery selectors to be used inside a KRL ruleset to extract data from web pages. (This is often called screen scraping.)

The query() operator works on HTML strings or on arrays of HTML strings. The HTML is usually loaded by the ruleset using a dataset declaration:

dataset r_html:HTML <- "http://www.htmldog.com/examples/darwin.html"
dataset q_html:HTML <- "http://www.htmldog.com/examples/tablelayout1.html"

The :HTML after the name of the dataset is a hint to KRL that it can skip the JSON parsing stage that is the default when reading datasets.

The query() operator takes an argument that is a jQuery selector string, a comma-separated string of jQuery selectors, or an array of jQuery selector strings. For now, query() supports only a subset of the jQuery selectors:

  • element
  • #id
  • .class
  • [attr]
  • [attr=value]

I'll describe the use of each of these in the sections that follow.

element

An element selector is denoted by the element name. An element selector matches all elements of a particular type. For example:
r_html.query("h1"); // returns an array of all <h1> elements
r_html.query("caption,h1");
//returns an array of all <caption> or <h1> elements

#id

An #id selector, denoted by the ID value with the pound sign (#) prepended, matches all elements with a specific ID. For example:
q_html.query("#c_link");
// returns array of all elements like <... id="c_link">
#id selectors can be compounded with element selectors as follows:
q_html.query("a#c_link");
// returns array of all elements like <a id="c_link">

.class

A .class selector, denoted by the class value with a period (.) prepended, matches all elements with a specific class. For example:
q_html.query(".header");
// returns array of elements like <... class="header">
Again, .class selectors can be compounded with element selectors as follows:
q_html.query("p.header");
// returns array of elements like <p class="header">
Or you can combine #id and .class selectors:
q_html.query("#c_link.header");
// returns array of elements like <... id="c_link" class="header">

[attr]

An [attr], or attribute, selector is denoted by the attribute name enclosed in square brackets. The [attr] selector matches all elements with an attribute, even if the attribute value is empty.
q_html.query("[style]");
// returns array of elements like <... style="...">
Combinations of [attr] and other selectors work as you'd expect:
q_html.query("td[style]");
// returns array of elements like <td style="...">

[attr=value]

An [attr=value], or attribute value, selector is denoted by the attribute name and value (as they would appear in the HTML) enclosed in square brackets. The [attr=value] selector matches all elements with an attribute set to a specific value.
q_html.query("[align=center]");
// returns array of elements like <... align="center">
q_html.query("td[align=center]");
// returns array of elements like <td align="center">
Again, you can combine more than one [attr=value] specification:
q_html.query("[align=center][colspan=2]");
// returns array of elements like <... align="center" colspan="2">

Multiple Selectors

You can stack selectors. The examples you've seen had no spaces and thus selected single elements that satisfy every part of the selector (element name, ID, class, and attributes). If you put spaces between the selectors, they select separate, nested elements: each selector matches an element enclosed within the element matched by the selector before it. For example:
q_html.query("div#header span p[align=center]");
// returns array of elements like
// <div id="header">...<span>...<p align="center"/>...</span>...</div>
If query() is applied to an array of strings, the selector will be applied to each array element:
html_arr = [q_html,r_html];
combo_arr = html_arr.query("a");
// returns array of elements like <a> from both q_html and r_html
You can join multiple selectors together as one string separated by commas or as an array of selector strings:
r_html.query("caption,h1");
r_html.query(["caption","h1"]);
Note that these are different from the following:
r_html.query(["caption h1"]);
The former expressions match either <caption> or <h1>, whereas the latter matches <h1> elements enclosed within <caption> elements.
query() returns an empty array if no HTML matches the selector or if the selector syntax is invalid.
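The grouping rules for commas, arrays, and spaces can be made concrete with a small Python sketch that breaks a selector argument into its parts. This is an illustrative model of the rules described on this page, not how KRL parses selectors; parse_selector and parse_compound are hypothetical names, and only the five selector forms listed earlier are recognized.

```python
import re

# One piece of a compound selector: #id, .class, [attr] or [attr=value].
PART = re.compile(
    r"#(?P<id>[\w-]+)"
    r"|\.(?P<cls>[\w-]+)"
    r"|\[(?P<attr>[\w-]+)(?:=(?P<val>[^\]]*))?\]"
)


def parse_compound(text):
    """Parse a selector with no spaces, such as 'a#c_link' or 'td[align=center]'."""
    m = re.match(r"[A-Za-z][\w-]*", text)
    result = {"element": m.group(0) if m else None,
              "id": None, "classes": [], "attrs": {}}
    pos = m.end() if m else 0
    while pos < len(text):
        part = PART.match(text, pos)
        if part is None:
            raise ValueError("unsupported selector syntax: " + text)
        if part.group("id") is not None:
            result["id"] = part.group("id")
        elif part.group("cls") is not None:
            result["classes"].append(part.group("cls"))
        else:
            # A value of None means "attribute present, any value".
            result["attrs"][part.group("attr")] = part.group("val")
        pos = part.end()
    return result


def parse_selector(selector):
    """Return a list of alternatives; each is a chain from ancestor to descendant."""
    items = [selector] if isinstance(selector, str) else list(selector)
    alternatives = [alt for item in items for alt in item.split(",")]
    return [[parse_compound(step) for step in alt.split()] for alt in alternatives]


print(len(parse_selector("caption,h1")))    # 2 alternatives
print(len(parse_selector(["caption h1"])))  # 1 alternative (a two-step chain)
```

In this model, "caption,h1" and ["caption","h1"] parse to the same two alternatives, while ["caption h1"] is a single alternative that requires an <h1> nested inside a <caption>.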