Decoding WWW-Form Encoded Content or Query Parameters

This function decodes form-encoded content (e.g. "q=5&value=%20space%20"):

    decode_content = function(content) {
      content.split(re#&#).map(function(x){x.split(re#=#)}).collect(function(a){a[0]}).map(function(v,k){a = v[0];a[1]})
    }

The function decode_content, given the sample value above, will produce this map:

    { "q": "5", "value": "%20space%20" }

Explanation of the chain of operators

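Here is essentially the same function, spread out over named helper functions and with added comments. One difference: the name-value split here uses extract rather than split, so a value that itself contains "=" is kept whole. The comments refer to generic query parameters, "n0=v0&n1=v1&...nk=vk", and show intermediate values during the chain of operations.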

    nv_split = function(x){x.extract(re#([^=]*)=?(.*)#)}  // name-value split
    frst = function(a){a.head()}                          // first element of an array
    dup_1st = function(v){ v[0][1] }          // second element of first embedded array
    decode_content = function(content) {
      content        //"n0=v0&n1=v1&...nk=vk"
      .split(re#&#)  //["n0=v0","n1=v1",..."nk=vk"]
      .map(nv_split) //[["n0","v0"],["n1","v1"],...["nk","vk"]]
      .collect(frst) //{"n0":[["n0","v0"]],"n1":[["n1","v1"]],..."nk":[["nk","vk"]]}
      .map(dup_1st)  //{"n0":"v0","n1":"v1",..."nk":"vk"}
    }

The string operator split in line 6 produces an array of substrings, one for each piece of the content between "&" separators.

The array operator map in line 7 applies the function (in this case, nv_split, defined in line 1) to each element of the array to produce a new array of the same size. The string operator extract in line 1 matches the string against the given regular expression and produces an array with an element for each of the capture groups, in this case, two elements corresponding to the name and the value of one of the input parameters.

The array operator collect in line 8 applies the function (in this case, frst, defined in line 2) to each element and produces a map. The keys of the map are the values computed by the function, and each key collects the elements that produced it into a sub-array. The array operator head in line 2 produces the first element of the array, and is equivalent to a[0].

The map operator map in line 9 applies the function (in this case, dup_1st, defined in line 3) to each key-value entry in the subject map to produce a new map with the same keys and the computed values.
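As a concrete trace, here is a sketch of the same chain applied one step at a time to the sample string from the top of the page. It assumes the helper functions nv_split, frst, and dup_1st defined above; the names s0 to s4 are hypothetical, introduced only to label each intermediate value.

    s0 = "q=5&value=%20space%20";
    s1 = s0.split(re#&#);  // ["q=5","value=%20space%20"]
    s2 = s1.map(nv_split); // [["q","5"],["value","%20space%20"]]
    s3 = s2.collect(frst); // {"q":[["q","5"]],"value":[["value","%20space%20"]]}
    s4 = s3.map(dup_1st);  // {"q":"5","value":"%20space%20"}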

Duplicate names

What happens when a parameter name is duplicated in the input string? The code shown above keeps only the first of such duplicate values.

Here is an alternative which keeps all of the values: a name that appears once maps to its single value, while a duplicated name maps to an array of its values.

    dup_all = function(v){
      v.length() == 1 => v[0][1]
                       | v.map(function(w){w[1]})
    }
    decode_content = function(content) {
      content        //"n0=v0&n0=v1&...nk=vk"
      .split(re#&#)  //["n0=v0","n0=v1",..."nk=vk"]
      .map(nv_split) //[["n0","v0"],["n0","v1"],...["nk","vk"]]
      .collect(frst) //{"n0":[["n0","v0"],["n0","v1"]],..."nk":[["nk","vk"]]}
      .map(dup_all)  //{"n0":["v0","v1"],..."nk":"vk"}
    }
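
For illustration, here is how the two versions might differ on an input with a repeated name, assuming the functions defined above (the sample string is made up for this sketch):

    decode_content("a=1&a=2&b=3")
    // with dup_1st: {"a":"1","b":"3"}
    // with dup_all: {"a":["1","2"],"b":"3"}

Since dup_all yields either a string or an array, a caller may want to normalize the result. One possible way, using a hypothetical helper named as_array and the typeof operator:

    // hypothetical helper: wrap a single value so callers always see an array
    as_array = function(v){ v.typeof() == "Array" => v | [v] }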

Decoding special characters

The alert reader will have noticed the entry "value": "%20space%20" in the resulting map. The sequence %20 encodes a space character, so the entry should read "value": " space " instead.

This can be done with KRL code like this:

    decode_val = function(s) {
      s.replace(re#[+]#g," ")
       .replace(re#%(..)#g,hex2char)
    }
    dup_1st = function(v){ decode_val(v[0][1]) }

The string operator replace finds an occurrence of its first argument, a regular expression, in the string and replaces it with its second argument, a string value. When the regular expression has the modifier g, all matching occurrences are replaced. According to a post on Stack Overflow, "Encoders used the "+" character as a replacement for a space character in the early days", so we replace plus signs first (line 2), and then handle percent-encoded sequences such as %20 (line 3). Actual plus signs would appear as %2B (or %2b).
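
To see why the order of the two replacements matters, here is a sketch that traces a made-up string containing both a "+" and an encoded plus sign. It assumes the function hex2char shown later on this page; the names s0 to s2 are hypothetical, and the character class [+] is just one way to match a literal plus sign in a regular expression.

    s0 = "a+b%2Bc";
    s1 = s0.replace(re#[+]#g," ");        // "a b%2Bc"
    s2 = s1.replace(re#%(..)#g,hex2char); // "a b+c"

Done in the opposite order, %2B would first become "+" and then be turned into a space, giving "a b c" and losing the real plus sign.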

Recent feature

Starting with v0.45.5 of the engine, the operator replace accepts a function as its second argument. The function is passed the match and then the capture group, and returns a string to be used as the replacement. Because of the g in the regular expression, it is called once for each match.

The function hex2char could be written as:

    hex2char = function(encoded,hex) { // string len 2 -> character or encoded
      val = hex2dec(hex);
      hex.length() == 2 && val && val < 128
        => (val).chr()
         | encoded
    }

Line 2 gets the decimal equivalent of the string passed in, which should be in hexadecimal. Line 3 verifies that everything is valid, and line 4 then returns a string with a single character, using the number operator chr. If something is wrong, line 5 instead returns the value that we were asked to replace (in this case, %20), leaving that part of the caller's string unchanged.
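
A few sample calls may make the checks in line 3 concrete. This is a sketch assuming hex2char as written above and hex2dec as shown below; the arguments are made up for illustration.

    hex2char("%20","20") // " "   (hexadecimal 20 is 32)
    hex2char("%2b","2b") // "+"   (hexadecimal 2b is 43)
    hex2char("%zz","zz") // "%zz" (not hexadecimal, so left unchanged)
    hex2char("%E9","E9") // "%E9" (233 is not below 128, so left unchanged)

The last case shows a consequence of the val < 128 test: only sequences for ASCII characters are decoded, and anything else passes through as it was.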

The function hex2dec could be written as:

    hex2dec = function(hex) { // string len k -> number in [0,16^k-1] or null
      hex like re#^[0-9A-Fa-f]*$#
        => hex.split("")
              .map(function(h){hexDigit2dec(h)})
              .reduce(function(a,h){a*16+h},0)
         | null
    }

This function converts a string in hexadecimal notation to the number it represents. Line 2 verifies that the input string consists solely of hexadecimal digits, using the like infix operator.

If the string named hex represents a valid hexadecimal number, the string operator split produces an array of single characters, in line 3.

In line 4, the function passed into the array operator map produces an array with the decimal value of each of the hexadecimal digits.

The array operator reduce in line 5 converts the array of decimal values into the decimal number corresponding to the input.

Line 6 returns null in the case where the input is not a valid hexadecimal number.
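
The following sketch traces the split, map, and reduce steps for a few made-up inputs, assuming hex2dec as written above and hexDigit2dec as shown below. Each step of the reduce multiplies the running total by 16 and adds the next digit.

    hex2dec("2B") // ["2","B"] -> [2,11]  -> (0*16 + 2)*16 + 11  = 43
    hex2dec("ff") // ["f","f"] -> [15,15] -> (0*16 + 15)*16 + 15 = 255
    hex2dec("zz") // null, since "z" is not a hexadecimal digit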

The function hexDigit2dec could be written as:

    o0 = "0".ord(); // 48
    oa = "a".ord(); // 97
    hexDigit2dec = function(h){ // character -> number in [0,15]
      o = h.lc().ord(); // number in [48,57] or [97,102]
      h like re#^[0-9A-Fa-f]$#
        => (o < oa
             => o - o0       // number in [0,9]
              | o - oa + 10) // number in [10,15]
         | 0
    }

KRL uses the ASCII character set in strings. The codes for the decimal digits correspond to the numbers 48 to 57, while the codes for the first six lowercase letters correspond to the numbers 97 to 102. We map the name o0 to the code value for "0" and the name oa to the code value for "a" in lines 1 and 2, respectively. We use the string operator ord to obtain the code values.

Line 4 applies the string operator lc to get the lowercase version of the string value passed in, and then obtains its code value, which will be a decimal number in one of the ranges shown in the comment.

Line 5 is a sanity check to verify that we were indeed passed a valid hexadecimal digit, once again using the infix operator like.

If all is well, we determine (line 6) whether the character is in the range "0" to "9". If it is, we subtract the code for "0" from its code to produce a number in the range 0 to 9. Otherwise, we subtract the code for "a" from its code and add ten to produce a number in the range 10 to 15. One of these two numbers is returned as the value of the hexDigit2dec function.

We return the value zero if we were given a character which is not a hexadecimal digit.
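
As a sketch of the arithmetic, here are three made-up calls, assuming hexDigit2dec as written above and the ASCII codes given in its comments:

    hexDigit2dec("7") // code 55; 55 - 48 = 7
    hexDigit2dec("F") // lowercased to "f", code 102; 102 - 97 + 10 = 15
    hexDigit2dec("g") // 0, since "g" is not a hexadecimal digit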

Ruleset

A ruleset with this code can be found at https://raw.githubusercontent.com/Picolab/pico_lessons/master/decode_content/edu.byu.picos.www.krl

Copyright Picolabs | Licensed under Creative Commons.