February 7, 2017

Quick Recursive XML Parsing

Just this week, some friends, family, and I needed a common place to upload photos from my son's birthday party, something like a shared Google Drive or Dropbox account.

Not everyone has a Google account, and not everyone has a Dropbox account. And some people's email providers wouldn't let them send large files.

So I started thinking about how I might use AWS S3. It turns out that it's possible to build an HTML form that lets people post files directly into S3 buckets. If you're interested in the details, here's a good overview.

Of course, after getting a basic static HTML form working, I immediately started coding up a solution using ClojureScript (of course)!

S3 sends its responses to POST requests as XML. For example, when you try submitting multiple files in one request, here's the response:

<Error>
  <Code>InvalidArgument</Code>
  <Message>POST requires exactly one file upload per request.</Message>
  <ArgumentName>file</ArgumentName>
  <ArgumentValue>0</ArgumentValue>
  <RequestId>461ZZBA822906D222</RequestId>
  <HostId>92kNasuSIh9i/Wbbec3OGOOY4Jfag643G9dyZ8wUy08uSCrBdS7bNq6uhtw7OHLY1waOb1LSg=</HostId>
</Error>

Ideally, I wanted to be able to do something like this to check whether there were any errors in the response from AWS:

(get-in response ["Error" "Message"])

By the way, if you ever need to recursively walk tree-like data structures in Clojure(Script), take a look at tree-seq and at clojure.walk's walk, postwalk, and postwalk-demo. Also search around to learn about zippers. If you've never heard of them before, prepare for your mind to explode!
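
As a quick taste of tree-seq, here's a minimal sketch that flattens a nested map into a sequence of its nodes. The sample map and the names sample, nodes, and leaves are made up for illustration; they're shaped like the parsed result shown later in this post.

```clojure
;; Hypothetical nested map, for illustration only.
(def sample
  {"root" {"Error" {"Code" "InvalidArgument"
                    "ArgumentName" "file"}}})

;; tree-seq takes a branch? predicate, a children function, and a root node,
;; and returns a lazy depth-first sequence of every node in the tree.
(def nodes (tree-seq map? vals sample))

;; Keep only the leaves, i.e. the values that are not maps.
(def leaves (remove map? nodes))
;; => ("InvalidArgument" "file")
```

Here map? plays the role of branch? and vals plays the role of children, which is the same split the solution in this post uses for DOM nodes.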

For this case, I didn't need zippers, but I still think they're really neat. Anyway, here's my solution. This was really fun to write:

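(defn xml->clj
  "Convert an XML DOM document (or element) into a nested Clojure map.
  Note: repeated sibling tags collapse to the last one, since map keys are unique."
  [elem]
  (let [;; a node is a branch if it has at least one child element
        branch?  (fn [o] (> (.-childElementCount o) 0))
        ;; .-children is an array-like collection, so turn it into a seq
        children (fn [o] (array-seq (.-children o)))
        ;; the Document node has no tagName, so it is keyed as "root"
        walk     (fn walk [node]
                   {(or (.-tagName node) "root")
                    (if (branch? node)
                      (into {} (mapcat walk (children node)))
                      (.-innerHTML node))})]
    (walk elem)))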
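
If you'd like to experiment with this at a browser REPL without making a real request to S3, one option is to build a Document from a string with the browser's DOMParser. This is only a sketch: parse-xml, xml-str, and doc are names I made up for illustration, and it assumes a browser environment where js/DOMParser exists.

```clojure
;; Hypothetical helper: parse an XML string into a DOM Document.
;; DOMParser is a standard browser API, not part of ClojureScript itself.
(defn parse-xml [s]
  (.parseFromString (js/DOMParser.) s "application/xml"))

;; A trimmed-down version of the error response shown above.
(def xml-str
  "<Error><Code>InvalidArgument</Code><ArgumentName>file</ArgumentName></Error>")

(def doc (parse-xml xml-str))

;; The root element of the parsed document.
(.-tagName (.-documentElement doc))
```

The resulting doc can then be handed to xml->clj in place of a real response.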

The response returned from AWS is an XML Document, so I can simply pass it into the xml->clj function shown above. xml->clj takes the XML Document and recursively transforms it into a ClojureScript map with tag names as the keys and innerHTML as the values. Here's an example of the result:

{"root" 
 {"Error" 
  {"Code" "InvalidArgument", 
   "Message" "POST requires exactly one file upload per request.", 
   "ArgumentName" "file", 
   "ArgumentValue" "0", 
   "RequestId" "345EF00ABC2997D29AB", 
   "HostId" "L012rOT/sgCzQ6hUCliLbys9Jrnn6bsGz4Kgh8+FM8scflXH4uTA1RL6sDhaPFJ3hO7ABCOjs="}}}

And now I can use get-in to inspect whatever I need:

(get-in result ["root" "Error" "Message"])

Exactly what I needed.
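
One nice property of get-in is that it returns nil when any key along the path is missing, so the same lookup doubles as an "is there an error?" check. Here's a small sketch of that idea; error-message is a hypothetical helper, and the "Ok" map is an invented stand-in for a non-error response, not something S3 actually returns.

```clojure
;; Sample parsed response, trimmed from the result shown above.
(def result
  {"root" {"Error" {"Code" "InvalidArgument"
                    "Message" "POST requires exactly one file upload per request."}}})

;; Hypothetical helper: returns the error message, or nil if there is none.
(defn error-message [parsed]
  (get-in parsed ["root" "Error" "Message"]))

(error-message result)
;; => "POST requires exactly one file upload per request."

;; Invented non-error shape: no "Error" key on the path, so get-in returns nil.
(error-message {"root" {"Ok" {}}})
;; => nil
```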

Happy recursive XML parsing, everyone.

Tags: clojure clojurescript