Sunday, October 9, 2016

Simple decision trees using xml in F# and Fsharp.Data.Xsd package

This post is about new xml based "Open" and "Save" features, and is the follow up of my latest post about "animal quiz" game
 (http://tonyxzt.blogspot.it/2016/09/adding-simple-gtk-gui-in-console-f.html). The "game" is a simple "dialog based yes-no decision tree builder" based on console or on gtk (enabled by the "gui" parameter on command line) written in F# (under Mac Os X using Mac/Xamarin/Visual Studio Code+Ionide).

Sources are on github: https://github.com/tonyx/fsharpanimalquizgtk

Here I share some ideas about the solution I implemented for those "Open" and "Save" features.

I used an Xml/Xsd  package available via nuget PM> Install-Package FSharp.Data.Xsd

(webpage https://giacomociti.github.io/FSharp.Data.Xsd )

For me a convenient way to represent the tree in Xml for me is represented by the following examples.

Single node decision tree containing just one element

<node>
    <animal>
        elephant
    </animal>
</node>

A two leaves decision tree with a single yes-no question on the root to distinguish those two leaves (i.e. single animal) nodes:


<node>
    <question>
        is it big?
    </question>       
    <yesBranch>
       <node>
          <animal>
             elephant
          </animal>
       </node>
    </yesBranch>
    <noBranch>
        <node>
           <animal>
              cat
           </animal>
        </node>
    </noBranch>
</node>


The following xml schema can validate the previous instances (and more of course, that I verified by some NUnit based tests )


<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
    attributeFormDefault="unqualified">
    <xs:element name="node">
        <xs:complexType>
            <xs:sequence>
                <xs:choice>
                    <xs:element name="animal" type="xs:string"/>
                    <xs:sequence>
                        <xs:element name="question" type="xs:string"/>
                        <xs:element name="yesBranch">
                            <xs:complexType>
                                <xs:sequence>
                                    <xs:element ref="node"/>
                                </xs:sequence>
                            </xs:complexType>
                        </xs:element>
                        <xs:element name="noBranch">
                            <xs:complexType>
                                <xs:sequence>
                                    <xs:element ref="node"/>
                                </xs:sequence>
                            </xs:complexType>
                        </xs:element>
                    </xs:sequence>
                </xs:choice>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>


(I don't know if this implementation is the simplest possible, but I know that it works)

According to this schema, a node is a sequence of one of the two choices:
a single element named animal, or a sequence of following three element:
-a question (to distinguish categories of animals by yes or not)
-a "yesBranch" (which contains the subtree related to the "yes" answer to that question)
- a "noBranch" (which contains the subtree related to the "no" answer to that question

yesBranch and noBranch of course use recursivity so they contain again the "node" element, which is the root of this xml type.

I need to wrap xml tree from and to the internal representation of the tree. They   are logically equivalent (though field names may slightly differ from the corresponding xml elements, and that is still ok for me):

type KnowledgeTree = AnimalName of string | SubTree of Tree
and Tree = {QuestionstringYesBranchKnowledgeTreeNoBranchKnowledgeTree}



Wrapping internal tree representation to the corresponding xml string is based on producing plain xml text recursively:

let rec treeToXml tree =
    match tree with 
        | AnimalName name -> "<node><animal>" + name + "</animal></node>"
        | SubTree {Question=questionYesBranch=yBranchNoBranch=nBranch } ->  
           "<node><question>" + question + "</question>" + "<yesBranch>" + treeToXml yBranch + "</yesBranch><noBranch>" + treeToXml nBranch + "</noBranch></node>"



(this function does not depend on the Schema: if I want change the schema, I need to change treeToXml accordingly, few lines of code anyway)

To wrap the xml representation to an object, I followed the official documentation which introduces a XmlProvider type which needs to see the schema.

For instance I have

 type KnowledgeBaseXmlSchema = XmlProvider<Schema="""
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified" attributeFormDefault="unqualified">
            <xs:element name = "node">
                <xs:complexType>      

  ....


and then use the Parse method from the "KnowledgeBaseXmlSchema" as follows:

let xmlWrapped = KnowlegeBaseXmlSchema.Parse """<node>
....


Now the xmlWrapped object is coherent, in term of its fields and their types, with the xml datatype as defined by the schema. Particularly optional nodes are represented by option types.

Let's see go a little bit through this:
The Animal got from the KnowledgeBaseSchema.Parse is option because it follows the element in the schema which says it is optional as well. The editor Visual Studio Code+Ionide help detects statically this kind of things.





The other way around is true. If an element is mandatory, it is not "wrapped" into an option anymore:
so in the following lines of code, the myNode is not option, but its type is
XmlProvider<...>.Node,  and in fact is a mandatory element according to the xml schema.






All that said, let's take a look to the xmlToTree function:

let rec xmlToTree (tree:KnowledgeBaseXmlSchema.Node) =      
    match tree.Animal with
    | Some x -> AnimalName x
    | None ->  match (tree.Question,tree.YesBranch,tree.NoBranch) with 
      | (Some y1,Some y2,Some y3) -> SubTree {Question=y1YesBranch=xmlToTree y2.Node ; NoBranch = xmlToTree y3.Node }
      | _ -> failwith "error wrapping xml tree"  
    



I had to specify the type (KnowledgeBaseXmlSchema.Node) of the parameter, tree, because the type inferences does not work in some cases (and this is one of these cases)

(such thing is not terrible, because in the rare case type inferences won't work, I can just inspect the parameter type we are going to pass to it and then modify the function signature accordingly)

As I mentioned, the "Animal" is an instance of an option type so I have to pattern match it by the "Some X" syntax, and if it the match ha success, then the corresponding internal tree can be simply instantiated by:

AnimalName 

(single node tree)

If the root is not an animal (leaf node), then it should match the "None" part of the pattern matching instead, and in that case I assume that there must be a sequence of the following elements: a Question, a YesBranch and a Nobranch which are options as well.
So I try to match them as a triplet like (Some y1, Some y2, Some y3) and simply build a "SubTree" represented by the Question and the recursive call of the same function to the yesBranch and noBranch subnodes.
(note that this implementation is recursive, but it is not tail-recursive)

(Nunit tests are available in this file on github if needed to play more with what I share in this post https://github.com/tonyx/fsharpanimalquizgtk/blob/master/fsharpAnimalQuizKata/XmlTests.fs)

Conclusion is:
- options types are useful
- the fact that optional nodes are represented by option types by using an xml/xsd processing library is very useful
-kudos to static analysis tool provide by editors. They helps checking those things before compiling or testing.

About possible new feature I may add to this toy project:
-adding scrollbars on the main window (to easily visualize trees when they will grow)
-indenting xml output files, to make them more human readable
-better error handling in xml processing, and asking confirmation in case of re-writing existing file, for obvious reasons.

Thanks for reading. Enjoy.
Toni.




No comments: