Tuesday, September 3, 2013

Going Functional

Local functions, like the map, concat and bang operators, originally appeared in MarkLogic 6, but even when working heavily with the technology, it is easy to miss that they were released. If you are familiar with JavaScript, local functions should make perfect sense, but if you come from other languages, local functions (and functions as arguments) are fairly advanced features.
Anyone who has worked with XQuery knows how to create a module function. First you define a namespace:
declare namespace foo = "http://www.myexample.com/xmlns/foo";
or, if you’re actually creating a library module:
module namespace foo = "http://www.myexample.com/xmlns/foo";
The namespace groups all of the functions declared with the “foo:” prefix.
Once this declaration is made, you can declare functions within the module namespace. For instance, suppose that you have a function called foo:title-case that takes a string (or a node with text content) and returns a result in which the first letter of each word is upper case and all other letters in the word are lower case:
declare function foo:title-case($expr as item()) as xs:string {
  (: split on whitespace, capitalize the first letter of each word, lower-case the rest :)
  fn:string-join(
    fn:tokenize(fn:string($expr), "\s+") !
      (fn:upper-case(fn:substring(., 1, 1)) || fn:lower-case(fn:substring(., 2))),
    " ")
};
Then, in a different XQuery script, you can import the foo library and use the function:
import module namespace foo = "http://www.myexample.com/xmlns/foo" at "/lib/core/foo.xq";
 let $title := xdmp:get-request-field("title")
 return <div>{foo:title-case($title)}</div>
So far, so good. This approach is very useful for establishing formal APIs, and when designing your applications you should first look to modularize your code like this. Functions declared this way are public interfaces.
Not all functions, however, need to be (or even should be) public or global. MarkLogic 6.0 implemented a new kind of function, one that looks a lot more like a JavaScript function than a traditional XQuery declaration. It uses the function keyword to create an anonymous (inline) function, and has the general form:
let $f := function($param1 as typeIn1,$param2 as typeIn2) as typeOut {
      (: function body :)
      }
For instance, suppose that you wanted a function that gave you the distance between two points, each represented as a sequence of two or three coordinates. This could be written as:
let $distance := function($pt1 as xs:double*, $pt2 as xs:double*) as xs:double {
      if (fn:count($pt1) = fn:count($pt2)) then
             (: square the difference along each axis; math:pow takes a single value, not a sequence :)
             let $difsquare := for $index in (1 to fn:count($pt1))
                               return math:pow($pt2[$index] - $pt1[$index], 2)
             (: the distance is the square root of the summed squares :)
             return math:pow(fn:sum($difsquare), 0.5)
      else fn:error(xs:QName("xdmp:custom-error"), "Error: Mismatch of coordinates",
             "The number of coordinates in $pt1 does not match that in $pt2")
      }
let $point1 := (0, 0)
let $point2 := (50, 40)
return $distance($point1, $point2)
There are several things to note here. The first is that when a function is assigned to a variable, that variable can then be invoked like a function by supplying arguments, e.g. $distance($point1,$point2). One implication is that a function assigned to a local variable has only local scope: it cannot be called by that name outside the scope in which the variable is bound. The second is that a variable declared globally in a module can hold a function as well, so a declaration such as
declare variable $my:distance := function(..){..};
is perfectly valid, and from a module that imports it, the function would be invoked as
$my:distance($pt1,$pt2)
Of course at that point there’s probably not much benefit to declaring the function as a variable rather than simply as a function, but it is possible.
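For illustration, here is the global form written out in full as a minimal main module. It is a sketch that assumes an XQuery 3.1 processor, where the math: prefix is predeclared (MarkLogic's own math: namespace also provides math:pow and math:sqrt); the my: namespace URI and the my:mismatch error code are placeholders. In practice the declaration would live in a library module, and callers would import that module before invoking $my:distance.

```xquery
declare namespace my = "http://example.com/xmlns/my";  (: placeholder namespace :)

(: A function item bound to a global variable. The type annotation documents
   the signature, much as a declared function's signature would. :)
declare variable $my:distance as function(xs:double*, xs:double*) as xs:double :=
  function($pt1 as xs:double*, $pt2 as xs:double*) as xs:double {
    if (fn:count($pt1) = fn:count($pt2)) then
      math:sqrt(fn:sum(
        for $i in 1 to fn:count($pt1)
        return math:pow($pt2[$i] - $pt1[$i], 2)))
    else fn:error(xs:QName("my:mismatch"), "Mismatch of coordinates")
  };

(: Called exactly like a declared function, but through the variable. :)
$my:distance((0, 0), (3, 4))  (: returns 5 :)
```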
A third point is that you can raise errors within such a locally defined function in exactly the same way you would in a module function, with the fn:error() function. Finally, an inline function expression does not end with a semicolon, unlike a declared function.
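Because the error is an ordinary dynamic error, the caller can trap it. A minimal sketch using XQuery 3.0 try/catch syntax (MarkLogic's 1.0-ml dialect writes the handler as catch ($e) instead); the local:mismatch error code is a hypothetical name chosen for the example.

```xquery
let $distance := function($pt1 as xs:double*, $pt2 as xs:double*) as xs:double {
  if (fn:count($pt1) = fn:count($pt2)) then
    math:sqrt(fn:sum(for $i in 1 to fn:count($pt1) return math:pow($pt2[$i] - $pt1[$i], 2)))
  else fn:error(xs:QName("local:mismatch"), "Error: Mismatch of coordinates")
}
return
  try {
    (: a 2-D point and a 3-D point: the inline function raises local:mismatch :)
    $distance((0, 0), (1, 2, 3))
  } catch local:mismatch {
    (: $err:description is bound automatically inside a catch clause :)
    "caught: " || $err:description
  }
(: returns "caught: Error: Mismatch of coordinates" :)
```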
Neither of the functions given above really needs to be local. Where local functions do come in handy is when they employ a principle called closure: the function captures variables from the scope in which it is created, encapsulating a bit of state that persists as long as the function itself does. A simple hello-world function provides a good example of this.
let $greeting := "Hello"
let $hello-world := function($name as xs:string?) as xs:string {
     $greeting || ", " || (if (fn:empty($name)) then "World" else $name)
     }
return $hello-world("Kurt")
==> "Hello, Kurt"
In this case, the variable $greeting is not passed to the function as a parameter. Because it is already defined in the enclosing scope, the function can reference it directly, effectively saving that state.
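One way to see that the function captures the binding in force when it is created, rather than looking the name up when it is called, is to rebind $greeting afterwards. A minimal sketch in the same syntax as above: the second let introduces a new variable that shadows the first, and the function still returns the original greeting.

```xquery
let $greeting := "Hello"
let $hello-world := function($name as xs:string?) as xs:string {
  (: $greeting is captured from the enclosing scope when the function is created :)
  $greeting || ", " || (if (fn:empty($name)) then "World" else $name)
}
(: this shadows the earlier $greeting, but the closure keeps the old value :)
let $greeting := "Goodbye"
return ($hello-world("Kurt"), $hello-world(()), $greeting)
(: returns ("Hello, Kurt", "Hello, World", "Goodbye") :)
```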
On its own, this is still fairly cumbersome. Closures begin to make a real difference when a function generates another function as its output. We can then create a whole set of generators:
declare namespace my = "http://example.com/xmlns/my";
declare function my:generate-greeting($greet-term as xs:string) as xdmp:function {
      function($name as xs:string?) as xs:string {$greet-term || ", " || 
          (if (fn:empty($name)) then "World" else $name)}
      };
let $greet1 := my:generate-greeting("Hello")
let $greet2 := my:generate-greeting("Welcome")
return ($greet1("Kurt"),$greet2("Kurt"))
The xdmp:function identifies the output of the function as a function itself. The two variables $greet1 and $greet2 then become functions that take a name and generate the appropriate output message. Using a function to create a set of functions is well known in programming, and is identified as the factory pattern – each factory creates multiple functions or objects based upon some input parameter.
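Because a factory's output is an ordinary item, it can be stored in a sequence and applied in bulk. A minimal sketch, written with the standard XQuery 3.0 function type function(xs:string?) as xs:string in place of MarkLogic's xdmp:function, and with local:generate-greeting as a hypothetical stand-in for my:generate-greeting:

```xquery
(: Hypothetical stand-in for my:generate-greeting, typed with a standard
   function test instead of xdmp:function. :)
declare function local:generate-greeting($greet-term as xs:string)
  as function(xs:string?) as xs:string {
  function($name as xs:string?) as xs:string {
    $greet-term || ", " || (if (fn:empty($name)) then "World" else $name)
  }
};

(: One factory call per term yields a sequence of function items... :)
let $greeters := ("Hello", "Welcome", "Good morning") ! local:generate-greeting(.)
(: ...each of which can then be called with the same argument. :)
return for $greet in $greeters return $greet("Kurt")
(: returns ("Hello, Kurt", "Welcome, Kurt", "Good morning, Kurt") :)
```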
In working with the MarkLogic semantics capability (in MarkLogic 7), this factory pattern can prove quite useful. Consider, for instance, a SPARQL query builder. Certain queries occur often enough that it makes sense to save them as files; the query used below, for example, retrieves the names of all items that link to a specific sem:iri. To run such a query you need to specify all of the prefixes used within it, concatenate them cleanly with the query text, set the sort order and limit size, and then, if desired, serialize the results to different formats. Because this is a remarkably common query that you may run many times, it is a good candidate for a factory. The following illustrates one such factory:
import module namespace sem="http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";
declare namespace sp="http://example.com/xmlns/sparql";
declare function sp:sparql-factory($namespace-map as item(),$query as xs:string?,$file-path as xs:string?) as xdmp:function {
     (: use the inline query unless a non-empty file path is supplied :)
     let $query := if (fn:empty($file-path) or $file-path = "") then $query else fn:unparsed-text($file-path)
      let $prefixes := fn:string-join(for $key in map:keys($namespace-map) return ("prefix "||$key||": <"||
           map:get($namespace-map,$key) || ">"),"&#13;")||"&#13;"
      return function($arg-map as item()) as item()* {sem:sparql($prefixes||$query||" ",$arg-map)} 
 };

declare function sp:curie-factory($ns-map as item()) as xdmp:function {
    function($curie as xs:string) as sem:iri {sem:curie-expand($curie,$ns-map)}
 }; 

declare function sp:merge-maps($ns-maps as item()*) as item(){
    if (fn:count($ns-maps)=1) then 
         $ns-maps
   else 
         let $map := map:map()
         let $_ := for $ns-map in $ns-maps return for $key in map:keys($ns-map) return map:put($map,$key,map:get($ns-map,$key))
         return $map
 };
declare variable $sp:ns-map := map:new((
map:entry("rdf","http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
map:entry("context","http://disney.com/xmlns/context"),
map:entry("semantics","http://marklogic.com/semantics"),
map:entry("fn","http://www.w3.org/2005/xpath-functions#"),
map:entry("xs","http://www.w3.org/2001/XMLSchema#"),
map:entry("rdfs","http://www.w3.org/2000/01/rdf-schema#"),
map:entry("owl","http://www.w3.org/2002/07/owl#"),
map:entry("skos","http://www.w3.org/2004/02/skos/core#"),
map:entry("xdmp","http://marklogic.com/xdmp#"),
map:entry("entity","http://example.com/xmlns/class/Entity/"),
map:entry("class","http://example.com/xmlns/class/Class/"),
map:entry("map","http://marklogic.com/map#")));
let $curie := sp:curie-factory($sp:ns-map)
let $links-query := sp:sparql-factory($sp:ns-map,"","/sparql/linkList.sp")
return $links-query(map:entry("s",$curie("class:Person"))) ! map:get(.,"name")
This code actually defines two factories. The first, sp:curie-factory, takes a map of prefixes and namespaces and binds it into a function that attempts to match a CURIE's prefix with one of those in the map. If the prefix is there, the function generates the appropriate sem:iri from the shortened CURIE form.
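To show what the closure inside such a factory is doing, here is a hypothetical, simplified stand-in written with XQuery 3.1 maps rather than MarkLogic's map:map and sem:curie-expand(). The local:curie-factory function captures the prefix map and returns a function that splits a CURIE at its first colon and looks up the prefix. It returns a plain string rather than a sem:iri, and the local:unknown-prefix error code is illustrative.

```xquery
declare function local:curie-factory($ns-map as map(*))
  as function(xs:string) as xs:string {
  (: the returned function closes over $ns-map :)
  function($curie as xs:string) as xs:string {
    let $prefix := fn:substring-before($curie, ":")
    let $ns := map:get($ns-map, $prefix)
    return
      if (fn:exists($ns)) then $ns || fn:substring-after($curie, ":")
      else fn:error(xs:QName("local:unknown-prefix"),
                    "No namespace bound to prefix '" || $prefix || "'")
  }
};

let $curie := local:curie-factory(map {
  "class": "http://example.com/xmlns/class/Class/",
  "rdfs":  "http://www.w3.org/2000/01/rdf-schema#"
})
return $curie("class:Person")
(: returns "http://example.com/xmlns/class/Class/Person" :)
```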
The second, sp:sparql-factory, takes the map of namespaces indexed by prefix and uses it to generate the prefix headers for the query. This becomes significant when the number of namespaces is high, and it is a small step from this to saving these namespaces in a file and updating them when new namespaces are added.
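The prefix-header step can be sketched on its own in the same way. This is a hypothetical local:prefix-header function using XQuery 3.1 maps rather than MarkLogic's map:map. It sorts the prefixes because map:keys() makes no promise about ordering, which keeps the generated header stable from one call to the next, and it separates lines with a newline (&#10;) where the factory above uses a carriage return (&#13;).

```xquery
declare function local:prefix-header($ns-map as map(*)) as xs:string {
  fn:string-join(
    for $prefix in map:keys($ns-map)
    (: map:keys() returns keys in no guaranteed order; sorting keeps the header stable :)
    order by $prefix
    return "prefix " || $prefix || ": <" || map:get($ns-map, $prefix) || ">",
    "&#10;"
  ) || "&#10;"
};

local:prefix-header(map {
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "owl":  "http://www.w3.org/2002/07/owl#"
})
```

The resulting string would then be prepended to the query text before it is handed to sem:sparql(), as sp:sparql-factory does.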
The factory in turn generates a new function, using either the query supplied as text or one loaded from a text file stored in the data directory. The newly created function then takes a map of parameters and returns an associated JSON map or a list of sem:triple values. In this case the output is a list of names that satisfy the query.
One final note on maps: you can assign a function to a map, which can then be persisted and retrieved. In the example above, for instance, you could have persisted the functions:
xdmp:document-insert("/functions/named-functions",
        document { map:new((map:entry("curie",$curie),map:entry("links",$links-query))) })
In another routine, you could then retrieve them:
let $fn-map := map:map(fn:doc("/functions/named-functions")/map:map)
let $links-fn := map:get($fn-map,"links")
let $curie-fn := map:get($fn-map,"curie")
let $links := $links-fn(map:entry("s",$curie-fn("class:Person")))
return $links ! map:get(.,"name")
This becomes especially useful when tracking “named queries” produced by users.
There are other design patterns that also use functions as arguments (most notably the decorator pattern), but in general the principle is the same: local functions make it possible to write general functions that either produce or consume functions of their own, giving you considerably more power in writing code.
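A decorator takes a function and returns a new function that wraps it with extra behavior, leaving the original untouched. A minimal sketch, where the hypothetical local:normalized decorator normalizes whitespace in its string argument before delegating to whatever function it wraps:

```xquery
(: Hypothetical decorator: wraps any string-to-string function so that its
   argument is whitespace-normalized before the wrapped function sees it. :)
declare function local:normalized($f as function(xs:string) as xs:string)
  as function(xs:string) as xs:string {
  function($s as xs:string) as xs:string { $f(fn:normalize-space($s)) }
};

let $shout := function($s as xs:string) as xs:string { fn:upper-case($s) || "!" }
let $clean-shout := local:normalized($shout)
return ($shout("  hi  there "), $clean-shout("  hi  there "))
(: returns ("  HI  THERE !", "HI THERE!") :)
```

Because the decorator both consumes and produces a function, decorators can be stacked, e.g. local:normalized(local:normalized($shout)), without either layer knowing about the other.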