Wednesday, September 29, 2010

Lesson 6 - Parsing the HTTP Request in Clojure

In our previous post we managed to dump the HTTP request headers back to the browser. For our server to do anything more useful, we need to parse the request headers into a Clojure data structure so that we can query the individual components of the request. If you are not familiar with HTTP requests, have another look at the output of your last program. The first line of the HTTP request has three parts: a method "GET", a path "/", and the protocol "HTTP/1.1". Each of the remaining lines is a name-value pair separated by a ":". We will parse these strings into a Clojure key-value map and dump the map to our browser.


(use '[clojure.contrib.str-utils2 :only (split)])
(use 'clojure.contrib.server-socket)
(import  '(java.io BufferedReader InputStreamReader PrintWriter))



(defn process-request-first-line []
  (zipmap
    [:method :path :protocol]
    (split (read-line) #"\s")))


(defn process-request []
  (loop
    [ result (process-request-first-line)
      line (read-line)]
    (if (empty? line)
      result
      (recur
        (assoc
          result
          (keyword (first (split line #":\s+")))
          (last (split line #":\s+")))
        (read-line)))))


(create-server
  8080
  (fn [in out]
    (binding
      [ *in* (BufferedReader. (InputStreamReader. in))
        *out* (PrintWriter. out)]
      (println "HTTP/1.0 200 OK")
      (println "Content-Type: text/html")
      (println "")
      (println (process-request))
      (flush))))


Run the program above and open localhost:8080 in your browser. You should see a dump of a key value map. My output was something like this.
{:path /, :protocol HTTP/1.1, :Accept application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5, :Accept-Encoding gzip,deflate,sdch, :method GET, :User-Agent Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.55 Safari/534.3, :Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.3, :Host localhost:8080, :Accept-Language en-US,en;q=0.8, :Connection keep-alive}


If the code and output above look a bit cryptic at the moment, don't worry; I will explain everything by the end of this lesson. The output above is a Clojure map of keyword-value pairs. println prints strings without their quotes, so written out as Clojure data the map looks like this:
{:path "/", :protocol "HTTP/1.1" .... etc etc }
:path is the first keyword, whose value is "/"; next is :protocol, with the value "HTTP/1.1"; and so on.
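
Once the request is a map, querying it is straightforward. The following is a small sketch using a hand-written map (the name "request" and its contents are made up for illustration, not produced by the server):

```clojure
;; A hand-written stand-in for the parsed request map.
(def request
  {:method "GET" :path "/" :protocol "HTTP/1.1" :Host "localhost:8080"})

;; A keyword can be called like a function to look itself up in a map.
(def path (:path request))

;; get does the same lookup with the map as the first argument.
(def host (get request :Host))

;; A second argument supplies a default for a missing key.
(def cookie (:Cookie request "none"))
```

Here "path" is "/", "host" is "localhost:8080", and "cookie" falls back to "none" because the map has no :Cookie key.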


Now let us look at the new source code we added.
(use '[clojure.contrib.str-utils2 :only (split)])
We first bring the split function from clojure.contrib.str-utils2 into our namespace. The :only keyword tells use to refer only the split function, rather than every public function in that library.
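
If you want to try :only at the REPL without clojure.contrib on the classpath, here is a minimal sketch. It assumes clojure.string (part of Clojure since 1.2) as a stand-in; its split takes the string and the regular expression in the same order as the one used in this lesson.

```clojure
;; Assumption: clojure.string is substituted for clojure.contrib.str-utils2.
;; Only split is referred into the current namespace.
(use '[clojure.string :only (split)])

;; split can now be called without a namespace prefix.
(def request-line-parts (split "GET / HTTP/1.1" #"\s"))
```

Other functions in the library, such as join, are not referred and would still need their full name (clojure.string/join).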


(defn process-request-first-line []
  (zipmap
    [:method :path :protocol]
    (split (read-line) #"\s")))


We then define a function called process-request-first-line, which processes the first line of the HTTP request and returns a key-value map of its three components. The zipmap function takes a collection of keys, [:method :path :protocol], and a collection of values, here the one returned by
(split (read-line) #"\s")
and returns a map with each key mapped to the corresponding value. Here read-line reads the first line of the request, and split breaks it into three values wherever the regular expression #"\s" matches; \s matches any single whitespace character, which in the request line is a space. In our case the returned map will be
{:method "GET" :path "/" :protocol "HTTP/1.1"}
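
To see the two steps separately, here is a small sketch that works on a fixed string instead of read-line. It assumes clojure.string/split in place of the str-utils2 split, and the sample request line is made up.

```clojure
(require '[clojure.string :as str])

;; Step 1: split a sample request line on single whitespace characters.
(def parts (str/split "GET /index.html HTTP/1.1" #"\s"))

;; Step 2: pair each key with the value at the same position.
(def first-line (zipmap [:method :path :protocol] parts))
```

"parts" is the vector ["GET" "/index.html" "HTTP/1.1"], and "first-line" is the map {:method "GET", :path "/index.html", :protocol "HTTP/1.1"}. zipmap stops at the shorter of its two collections, so a malformed request line with fewer than three parts simply produces a map with fewer keys.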


Let us look at the process-request function. It loops through the remaining lines of the HTTP request and fills up the map with keyword-value pairs. First, loop binds two local names, "result" and "line". "result" is bound to the value returned by process-request-first-line, which is the key-value map above. "line" is bound to the second line of the HTTP request; remember that the first line was already read by process-request-first-line in the previous binding.


With the bindings in place, we evaluate the body of the loop. First we check whether "line" is empty. The header section of an HTTP request ends with an empty line, so if it is, we are done and return "result". If not, recur is called. recur rebinds "result" and "line" to new values and passes control back to the top of the loop.
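
The same loop/recur shape can be tried on its own, without a server. In this sketch the function name read-until-blank is hypothetical, and with-in-str is used to feed a made-up string to read-line in place of a real connection.

```clojure
(defn read-until-blank
  "Reads lines from *in* until an empty line (or end of input)
  and returns the lines read so far as a vector."
  []
  (loop [lines []
         line  (read-line)]
    (if (empty? line)
      lines
      ;; Rebind lines and line, then jump back to the top of the loop.
      (recur (conj lines line) (read-line)))))

;; with-in-str binds *in* to a reader over the given string.
(def header-lines
  (with-in-str "Host: localhost:8080\nConnection: keep-alive\n\nignored"
    (read-until-blank)))
```

"header-lines" holds the two lines before the blank line; the text after it is never read. read-line returns nil at end of input, and (empty? nil) is true, so the loop also stops if the input ends without a blank line.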


        (assoc
          result
          (keyword (first (split line #":\s+")))
          (last (split line #":\s+")))


The assoc function takes the map "result" and returns a new map with one more keyword-value pair added. In this case the keyword is the part of "line" before the ":".
(split line #":\s+")
This expression splits "line" into two parts using the regular expression #":\s+", which matches a colon followed by one or more whitespace characters.
(first (split line #":\s+"))
The "first" function takes the first element of the split.
(keyword (first (split line #":\s+")))
The "keyword" function converts the string returned by "first" into a keyword.
(last (split line #":\s+"))
The "last" function returns the last part of the split, which is the value for the keyword built in the previous expression.
So "result" is now rebound to the original map with this new key-value pair added. In the last expression,
        (read-line)))))
recur binds "line" to the next line read, and we go back to the top of the loop.
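
Here is the header-parsing step in isolation, as a sketch. It assumes clojure.string/split in place of the str-utils2 split; the helper name parse-header and the X-Note header are made up for illustration. It also shows one edge case: a value that itself contains a colon followed by a space is split into more than two parts, so "last" would keep only the tail. Passing a limit of 2 to split is one way to avoid that.

```clojure
(require '[clojure.string :as str])

;; The approach used in this lesson, on a single sample line.
(def line "Host: localhost:8080")
(def with-host
  (assoc {:method "GET"}
         (keyword (first (str/split line #":\s+")))
         (last (str/split line #":\s+"))))

;; Hypothetical helper: split once, at the first colon-plus-whitespace only.
(defn parse-header [header-line]
  (let [[header-name value] (str/split header-line #":\s+" 2)]
    [(keyword header-name) value]))

(def tricky (parse-header "X-Note: a: b"))
```

"with-host" is {:method "GET", :Host "localhost:8080"}; the colon inside localhost:8080 is not followed by whitespace, so it does not cause a split. "tricky" is [:X-Note "a: b"], whereas the first/last version would have returned "b" as the value.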

We did not really need the process-request-first-line function; I used it to make the code easier to follow. I will now combine the two functions, and you can see the final source code below.


(use '[clojure.contrib.str-utils2 :only (split)])
(use 'clojure.contrib.server-socket)
(import  '(java.io BufferedReader InputStreamReader PrintWriter))


(defn process-request []
  (loop
    [ result
        (zipmap
          [:method :path :protocol]
          (split (read-line) #"\s"))
      line (read-line)]
    (if (empty? line)
      result
      (recur
        (assoc
          result
          (keyword (first (split line #":\s+")))
          (last (split line #":\s+")))
        (read-line)))))


(create-server
  8080
  (fn [in out]
    (binding
      [ *in* (BufferedReader. (InputStreamReader. in))
        *out* (PrintWriter. out)]
      (println "HTTP/1.0 200 OK")
      (println "Content-Type: text/html")
      (println "")
      (println (process-request))
      (flush))))
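
If you would like to check the parser without starting the server or opening a browser, the sketch below feeds it a made-up request through with-in-str. It assumes clojure.string/split in place of the str-utils2 split, and the sample request uses plain "\n" line endings for simplicity.

```clojure
(require '[clojure.string :as str])

;; Same logic as process-request above, with clojure.string/split swapped in.
(defn process-request []
  (loop [result (zipmap [:method :path :protocol]
                        (str/split (read-line) #"\s"))
         line   (read-line)]
    (if (empty? line)
      result
      (recur (assoc result
                    (keyword (first (str/split line #":\s+")))
                    (last (str/split line #":\s+")))
             (read-line)))))

;; A made-up request: request line, two headers, then the blank line.
(def sample-request
  (str "GET / HTTP/1.1\n"
       "Host: localhost:8080\n"
       "Connection: keep-alive\n"
       "\n"))

(def parsed (with-in-str sample-request (process-request)))
```

"parsed" contains :method, :path and :protocol from the first line, plus :Host and :Connection from the headers.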


Let's move on to learning about variables in Clojure
