How to handle multiple variables in a Clojure algorithm implementation?

Question

I'm new to Clojure and trying to learn by implementing some algorithms in it. The algorithm I'm writing is for calculating the node betweenness centrality metric for a graph data structure.

The function in the algorithm (Brandes algorithm) I'm trying to implement goes like this:

Here, V are the vertex of the graph and s is the starting node from which we are trying to calculate and return the shortest path metrics S, Pred and sigma

This is what I have managed to come up by using loom to create the initial graph g for each starting node start:

 (defn ss-shortest-path     
   [g start]   
    (let [nodeset (disj (nodes g) start)
        pred (apply assoc {} (interleave (nodes g) (repeat nil)))
        dist (apply assoc {start 0} (interleave nodeset (repeat -1)))
        sigma (apply assoc {start 1} (interleave nodeset (repeat 0)))
        stack []]
    (loop [queue (conj clojure.lang.PersistentQueue/EMPTY start)]
      (if (empty? queue)
        {:sigma sigma
         :pred pred
         :stack stack}
        (let [v (peek queue)
              stack (conj stack v)]
          (doseq [w (successors g v)]
            (when (= (dist w) -1)
              (do
                (conj queue w)
                (assoc dist w (+ 1 (dist v)))))
            (when (= (dist w) (+ 1 (dist v)))
                  (do
                    (assoc sigma w (+ (sigma w) (sigma v)))
                    (assoc pred w v))))
            (recur (pop queue)))))))

I know that Clojure data structures are immutable so each time I call conj or assoc in the variables pred, sigma, stack, dist a new copy is created and the original variables remain as they are.

But, I don't want to use mutable states like atoms, refs, as I have a feeling that'd be simply copying the imperative style that I already know.

So, I'm seeking the help of some experienced Clojurists to help me create this function in an idiomatic style.

Thanks in advance.

Rulle · Accepted Answer · 2018-10-31 20:47:38Z

There are two main things I would do: First, the algorithm has a state consisting of multiple "variables" (queue, stack, etc.). I would first construct a function that represents the algorithmic state using an immutable map, like

(defn initialize-state [g start]
  (let [nodeset (disj (nodes g) start)]
    {:g g
     :nodeset nodeset
     :pred (apply assoc {} (interleave (nodes g) (repeat nil)))
     :dist (apply assoc {start 0} (interleave nodeset (repeat -1)))
     :sigma (apply assoc {start 1} (interleave nodeset (repeat 0)))
     :stack []
     :queue (conj clojure.lang.PersistentQueue/EMPTY start)
     :current-vertex nil}))

I would then, in the REPL, test that this map is initialized properly for various choices of g and start.

Second, I would break down the algorithm into multiple small functions that take a state as input and returns a state as output, like this (this code will not work, you have to fill in the missing parts):

(defn next-vertex [state]
  {:pre [(state? state)]
   :post [(state? %)]}
  (let [v (peek (:queue state))]
    (-> state
        (update :stack conj v)
        (assoc :current-vertex v))))

(defn process-successor [state w]
  (let [dist-w (dist w)]
    (cond
      ;; fill in...
      )))

(defn process-successors [state]
  {:pre [(state? state)]
   :post [(state? %)]}
  (reduce
   process-successor
   state
   (successors (:g state) (:current-vertex state))))

(defn pop-queue [state]
  {:pre [(state? state)]
   :post [(state? %)]}
  (update state :queue pop))

The maps with :pre and :post keys are so called pre- and post conditions, and the state?-function could be implemented, for instance, as (defn state? [x] (and (map? x) (contains? x :queue))), just as a sanity check.

Note that for every function you write, you can test it in the REPL with some data to make sure that it works, before writing the next function. All these functions can now be wrapped together into a full state transition using comp:

(def next-state (comp pop-queue process-successors next-vertex))

Now the final algorithm reads something like this:

(defn ss-shortest-path [g start]
  (loop [state (initialize-state g start)]
    (if (empty? (:queue state))
      state
      (recur (next-state state)))))

So to sum up, implementing an algorithm is a lot easier if you break it down into smaller parts that can be developed and validated individually.

this was pretty great. exactly what I was looking for. I was able to finish the code and it worked ! just one more question though about the ":pre" and ":post" maps. what are those used for and how do I define the function "state?". would appreciate that if you could explain that a bit. thanks again for the great answer. — sfactor, Commented Oct 31, 2018 at 19:37
Glad I could help! The "{:pre ... :post ...}" are so called pre- and post-conditions, explained here: clojure.org/reference/… . And "state?" is just a function that checks that the data corresponds to our chosen representation of a "state", for instance you could implement it as (defn state? [x] (and (map? x) (contains? x :queue)). You don't have to use pre- and post-conditions, many people don't. But I have found that using assertions and pre- and post-conditions helps detecting and localizing bugs. — Rulle, Commented Oct 31, 2018 at 20:39

Carcigenicate · Accepted Answer · 2018-10-31 11:38:10Z

Neither of the other answers say this explicitly, so I thought I'd clear up the "dealing with immutability" part.

I'd say that loop is the correct construct to use here. The problem with your set up is the only accumulator your loop has is queue. Every bit of data that changes from one iteration to the next should be part of the accumulators of the loop.

In your case here, dist, sigma, pred and stack are all data that has the potential to change from one loop iteration to the next, so they should all be declared in the square brackets of the loop. Then, when you need to update one of the pieces of data, you update what's given to recur:

(loop [queue (conj clojure.lang.PersistentQueue/EMPTY start)
       pred (apply assoc {} (interleave (nodes g) (repeat nil)))
       dist (apply assoc {start 0} (interleave nodeset (repeat -1)))
       sigma (apply assoc {start 1} (interleave nodeset (repeat 0)))
       stack []]

  (if (empty? queue)
    (Return everything)
    (recur (conj queue some-data)
           (assoc pred :some-key some-data)
           (assoc dist :another-key other-data)
           (assoc sigma :key data)
           (conj stack stack-data))))

Everything given to recur (the updated immutable structures in this case) will be what's available to loop on the next iteration.

I agree with @Rulle here though that you have so many accumulators, it's much neater to package them all into their own structure instead of manually dealing with the all each iteration.

Alan Thompson · Accepted Answer · 2018-10-31 04:48:30Z

Background: here is a java version of the alg: https://github.com/jgrapht/jgrapht/blob/master/jgrapht-core/src/main/java/org/jgrapht/alg/scoring/BetweennessCentrality.java

First off, You need to define s, Pred, sigma. You should also define format for g, v, start, etc.

Secondly, I'm not sure this is the best learning exercise. You could replace Java while, for, etc with Clojure loop/recur, doseq, & others, but it still feels like a "force-fit". You could probably learn a lot more a lot faster (& a lot deeper!) by reading good books like "Clojure for the Brave & True", "Getting Clojure", etc.

The idea is that small, self-contained practice problems are enormously more efficient for learning than a single gargantuan problem.

And don't forget to bookmark:

The Clojure CheatSheet
ClojureDocs.org
clojure-doc.org (yes, different despite similar name)
The Clojure Cookbook
My personal favorite library ;)

Collectives™ on Stack Overflow

How to handle multiple variables in a Clojure algorithm implementation?

3 Answers 3

And don't forget to bookmark:

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

And don't forget to bookmark:

Related