Importing data from other systems
Everything depends on the format in which the other system can export its internal data. This page discusses techniques for parsing various data interchange formats.
JSON would be preferable, because it can easily be converted to a KRL map: use the string operator decode() on the JSON string to produce a KRL map.
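As a minimal sketch (the function name and the JSON value here are hypothetical), a function accepting the exported JSON as a string could be as simple as:

import_json = function(json_string) {
  json_string.decode()
}

// for example, a made-up export like <<{"fn":"Ann","ln":"Smith","id":431}>>
// decodes to the KRL map {"fn":"Ann","ln":"Smith","id":431}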
CSV
Comma-separated values is a venerable data interchange format, and can be easily parsed with KRL functions. We will use the string operators split, replace, and extract, the array operators tail, filter, and map, and the map operator filter.
For example, here are a few lines of a spreadsheet containing data that we might want to import into a pico.
The first step is to have the spreadsheet export the data in CSV format. This same data, in CSV format, shows some of the challenges involved in parsing it. For this sample, inspection of the data showed no commas within the first and last name fields, but commas did appear for some records in the hall of fame column. In CSV format, such values are surrounded by double quotes.
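Purely for illustration (the names and the header line here are made up, not the actual sample), rows of that shape might look like:

First name,Last name,Page,Row,Hall of fame
Ann Marie,Smith,43,1,
Robert,Jones,43,2,"Dean's List, Honor Roll"

Note the double quotes around the fifth value of the last record, protecting the comma it contains.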
Dividing a file into lines
We'll start with a function to import the file and split it into lines.
import = function(content) {
  newline = (13.chr() + "?" + 10.chr()).as("RegExp");
  content.split(newline)
    .tail()
    .filter(function(s){s})
}
Line 1 defines the import function, which expects the entire exported file as a single string. Line 2 defines a regular expression which will match the end of a line within the string. Line 3 splits the entire content into an array of lines, each one a string. Line 4 keeps all of the lines except for the first one, which is a header line and not actual data. Finally, line 5 throws away any line which is the empty string (there will be one such after the last newline sequence in the data file), using the fact that KRL treats an empty string as a falsy value.
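As a quick sanity check (the CSV string here is made up rather than a real export), the function behaves like this:

content = "First name,Last name,Page,Row,Hall of fame" + 13.chr() + 10.chr()
        + "Ann Marie,Smith,43,1," + 13.chr() + 10.chr();
// import(content) returns ["Ann Marie,Smith,43,1,"]
// the header line and the trailing empty string have both been dropped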
Alternative way to divide a file into lines
import = function(content) {
  content.extract(re#(.+)#g)
    .tail()
}
This technique lets the regular expression, applied "globally" (i.e. as many times as needed) to the content, do the work of splitting the file into lines. The String operator extract() will return all matches of "one character (or more) on a line" as an array of strings. We still discard the first (header) line, but no longer need to filter out empty lines.
Obtaining an input file
While it would be possible to make a query to the import function and pass it the entire exported file as a string, we found it preferable to have the function obtain the data file from a URL.
import = function(url) {
  newline = (13.chr() + "?" + 10.chr()).as("RegExp");
  http:get(url){"content"}.split(newline)
    .tail()
    .filter(function(s){s})
}
With this change, we can call the import function, passing it the URL from which it will get the contents of the exported data and split it into lines.
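Calling it now looks like this (the URL is hypothetical; http:get returns a response map, and its "content" key holds the body of the file):

lines = import("https://example.com/graduands.csv")
// lines is an array of CSV data lines, one string per record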
Parsing a line
Having a file divided into lines may be enough for some pico-based applications. In the case we are discussing, there is some structure, a pattern, to each line.
graduand_map = function(line) {
  line.extract(re#^([\w ]+),(\w+),(\d+),(\d+),(.*)$#)
}
import = function(url) {
  newline = (13.chr() + "?" + 10.chr()).as("RegExp");
  http:get(url){"content"}.split(newline)
    .tail()
    .filter(function(s){s})
    .map(function(s){graduand_map(s)})
}
New line 9 in the import function maps each line into whatever is returned by the new function graduand_map, to which it passes each line. Line 2 in that function divides each line into an array of five parts by pattern matching. So, now, the import function returns an array of arrays.
Modifying the graduand_map function as follows will make the import function return an array of maps instead, with each map having the keys "fn", "ln", "id", and "hf".
graduand_map = function(line) {
  parts = line.extract(re#^([\w ]+),(\w+),(\d+),(\d+),(.*)$#);
  { "fn": parts[0],
    "ln": parts[1],
    "id": parts[2].as("Number")*10 + parts[3].as("Number"),
    "hf": parts[4] }
}
In this specific case, it was desirable to combine the values in the third and fourth columns into a single identifier.
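For instance (a made-up record, shown only to illustrate the resulting shape):

graduand_map("Ann Marie,Smith,43,1,Honor Roll")
// returns {"fn": "Ann Marie", "ln": "Smith", "id": 431, "hf": "Honor Roll"}
// the "id" 431 combines the third column (43) and the fourth column (1)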
Dealing with double quoted values
We add a new function to handle the value from the fifth column by simply removing the double quotes (if present), and splitting it on commas.
hall_of_fame = function(hf) {
  hf.replace(re#"#g,"").split(re#, #)
}
graduand_map = function(line) {
  parts = line.extract(re#^([\w ]+),(\w+),(\d+),(\d+),(.*)$#);
  { "fn": parts[0],
    "ln": parts[1],
    "id": parts[2].as("Number")*10 + parts[3].as("Number"),
    "hf": hall_of_fame(parts[4]) }
}
The splitting, which produces an array of strings for the fifth column ("hf") value, is done in line 2. Be sure to notice the "g" after the regular expression: it instructs the replace operator to remove not just the first instance of a double quote character, but all of them.
The hall_of_fame function is called for each line of the data file, in line 9.
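For example, with a hypothetical quoted value from the fifth column:

hall_of_fame(<<"Dean's List, Honor Roll">>)
// returns the array ["Dean's List", "Honor Roll"]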
Dealing with infrequent values
We modify the graduand_map function as shown.
graduand_map = function(line) {
  parts = line.extract(re#^([\w ]+),(\w+),(\d+),(\d+),(.*)$#);
  { "fn": parts[0],
    "ln": parts[1],
    "id": parts[2].as("Number")*10 + parts[3].as("Number"),
    "hf": parts[4] => hall_of_fame(parts[4]) | null
  }.filter(function(v,k){v})
}
Notice the use of a ternary expression in line 6, so that we don't call the hall_of_fame function when the fifth column has no value (i.e. is the empty string), but rather put a null in the "hf" field of the returned map. Finally, line 7 filters out any null values from the returned map, which means that records with no fifth column string will have a map without the "hf" key.
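For a made-up record whose fifth column is empty, the result is a map with no "hf" key at all:

graduand_map("Robert,Jones,43,2,")
// returns {"fn": "Robert", "ln": "Jones", "id": 432}
// the ternary put null in "hf", and the filter then dropped that key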
Saving the imported data
What we will need is a rule (two rules, in our case) to react to the availability of the imported data.
rule initialization {
  select when graduands_collection csv_available where ent:graduands.isnull()
  fired {
    ent:graduands := {};
  }
}
rule import_graduands_collection {
  select when graduands_collection csv_available
  foreach import(event:attr("url")) setting(map)
  pre {
    key = "g" + (map{"id"}.as("String"));
  }
  fired {
    ent:graduands{key} := map;
  }
}
The initialization rule will fire first, if applicable, making ent:graduands an empty map.
The import_graduands_collection rule will then fire. Line 9 specifies that the remainder of the rule will repeat for each map in the array returned by the import function. It constructs a key for each of these, and uses that key to store the map into ent:graduands in line 14.
After importing the sample data file, the map stored in ent:graduands contains one such entry for each record in the exported file.
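With made-up values (a sketch of the shape only, not the actual sample data), the stored map looks something like this:

{
  "g431": { "fn": "Ann Marie", "ln": "Smith", "id": 431, "hf": ["Honor Roll"] },
  "g432": { "fn": "Robert", "ln": "Jones", "id": 432 },
  ...
}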
Importing corrected data files
Data changes; it's just a fact of life. One way to deal with this is to delete the entity variable (manually, in the "Rulesets" tab for the pico), and send the graduands_collection:csv_available event again. Having deleted the entity variable, the initialization rule will fire again before the import begins.
Computation within foreach
In a related project, the raw data didn't include something that could be used as a key. Instead of a unique position within a page, we only have a row number:
First,Last name,Page,Row
Paul,Blogorodow,43,1
David,Braun,43,1
Brian,Duell,43,1
Earl,Hall,43,1
Rodney,Haynes,43,2
Cheryl,Jensen,43,2
Laurie,Jensen,43,2
Gordon,Layton,43,2
Duncan,Loree,43,3
Shirley,O'Donnell,43,3
Logan,Porter,43,3
Bob,Vik,43,3
Blair,Valgardson,43,4
Linda,Wall,43,4
We'll use an import function similar to the one used previously (tweaked to allow for a surname like "O'Donnell").
Then, in the rule which does the import from a CSV file into an entity variable, we'll use logic to compute a number within each row, and use that to make our key unique.
rule import_undergraduates_collection {
  select when undergraduates_collection csv_available
  foreach import(event:attr("url")) setting(map)
  pre {
    row = map{"id"};
    num_on_row = row == ent:last_row => ent:last_num_on_row + 1 | 1;
    key = "u" + row + num_on_row;
  }
  fired {
    ent:undergraduates{key} := map;
    ent:last_row := row;
    ent:last_num_on_row := num_on_row;
  }
}
The logic requires keeping track of the last row value seen and the last number seen within the current row. Notice that we can't keep track of these with names bound to values in the prelude (those are re-bound on each iteration of the foreach), but must use entity variables, updated in the postlude.
The first time we reference ent:last_row, its value will be null (which will not be equal to the first row value we see). The first time we actually reference ent:last_num_on_row (for the second record within a row), its value will be 1, having been set by the previous iteration.
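Tracing the first few sample rows (and assuming, as before, that map{"id"} combines the page and row numbers, so that page 43, row 1 gives 431):

// Paul (first record of row 431): ent:last_row is null, so num_on_row = 1, key "u4311"
// David (still row 431): row == ent:last_row, so num_on_row = 1 + 1 = 2, key "u4312"
// Brian and Earl follow, with keys "u4313" and "u4314"
// Rodney starts row 432, so num_on_row resets to 1, giving key "u4321", and so on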
If we want to clean up these entity variables, we can either do it manually in the UI after completing the import, or use the on final guard condition in the postlude. The postlude would then look like this:
fired {
  ent:undergraduates{key} := map;
  ent:last_row := row;
  ent:last_num_on_row := num_on_row;
  clear ent:last_row on final;
  clear ent:last_num_on_row on final;
}
The clearing will then only happen on the last iteration of the foreach, when we know that we won't be needing these values again.