Simple expression evaluator comparison between Haskell, Rust, and Common Lisp

Consider a simple expression language that consists only of the four basic mathematical operations – addition, subtraction, multiplication, and division. The idea of this exercise is to implement an evaluator for such a language in Haskell, and then compare it with literal translations (as far as possible) into Rust and Common Lisp. This should be interesting.

First off, the Haskell version:

module Main where

data Expr = Val Int
          | App Op Expr Expr
          deriving (Eq, Show, Ord)

data Op = Add | Sub | Mul | Div deriving (Eq, Show, Ord)


eval :: Expr -> Maybe Int
eval (Val n) = Just n
eval (App o e1 e2) = do m <- eval e1
                        n  Just (m + n)
                            Sub -> Just (m - n)
                            Mul -> Just (m * n)
                            Div -> safediv m n
                            

safediv :: Int -> Int -> Maybe Int
safediv m 0 = Nothing
safediv m n = Just $ m `div` n


e1 :: Expr 
e1 = App Add (Val 2) (App Mul (Val 3) (Val 6))


e2 :: Expr
e2 = App Mul (App Add (Val 1) (Val 3)) (App Div (Val 10) (Val 0))


main :: IO ()
main = do putStrLn $ show (eval e1)
          putStrLn $ show (eval e2)        

As simple as it gets! The `safediv` idea is directly implemented from Professor Graham Hutton’s book, “Programming in Haskell” (2nd Edition) from the chapter on “Functors, Applicatives, and Monads”. We simply use the `Maybe` monad to indicate potential for failure.

Let’s run it:

$ghc -O2 --make *.hs -o main -threaded -rtsopts
[1 of 1] Compiling Main             ( main.hs, main.o )
Linking main ...
$main
Just 20
Nothing

Excellent. For the expression `1 + 3 * 6`, we get `Just 20`, and for `10 / 0`, we get `Nothing`. As expected.

Now let’s see the Rust version:

#[derive(Debug, PartialEq, PartialOrd)]
pub enum Op {
    Add,
    Sub,
    Mul,
    Div
}

use self::Op::*;


#[derive(Debug, PartialEq, PartialOrd)]
pub enum Expr {
    Val(isize),
    App(Op, Box, Box)
}

use self::Expr::*;

impl Expr {
    fn eval(&self) -> Option {
        match *self {
            Val(ref n) => Some(*n),
            App(ref o, ref e1, ref e2) => {
                if let Some(m) = e1.eval() {
                    if let Some(n) = e2.eval() {
                        match *o {
                            Add => return Some(m + n),
                            Sub => return Some(m - n),
                            Mul => return Some(m * n),
                            Div => if n == 0 { 
                                        return None; 
                                    } else { 
                                        return Some (m / n) 
                                   }
                        }
                    } 
                }   
                None
            }
        }
    }
}

fn main() {
    let e1 = Box::new(App(Add, Box::new(Val(2)), 
                      Box::new(App(Mul, Box::new(Val(3)), Box::new(Val(6))))));
    
    let e2 = Box::new(App(Div, Box::new(Val(10)), Box::new(Val(0))));
    
    println!("{:?}, {:?}", e1.eval(), e2.eval());
}

As can be seen, it is an almost identical translation from Haskell into Rust. Algebraic Data Type (ADT) support in Rust makes it a very powerful language, and pattern matching in Rust is as (if not more) powerful as in Haskell. The only difference is the syntax, and I must say that the Haskell version has a lot less noise than the Rust version, even though they are conceptually identical.

Running it:

   Compiling playground v0.0.1 (file:///playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.02s
     Running `target/debug/playground`

Some(20), None

Excellent!

And finally, for the fun part – Common Lisp. Of course, this is not idiomatic Common Lisp by any stretch of the imagination, and the idea is to try and preserve the essence of the Haskell version whilst still being runnable Lisp. Common Lisp is, of course, a dynamically-typed language, and it doesn’t have a real equivalent of ADTs. However, the bigger discomfort, in my opinion, is that
Common Lisp’s pattern matching is almost non-existent (unlike Racket).

Nevertheless, this is a simple enough example that it does not cause any real problems. The interesting part is how the representation of the expression changes here – due to the lack of a sort of “schema” for the structure of the data, I am forced to include a dummy `nil` argument for the `Val` constructor.
Of course, I could have used varargs to handle that, but that would have led to more verbose code for little ROI. In any case, here is the code:

(defun evaluate (e)
    (let ((op (car e)))
        (cond ((eql op 'Val) (cadr e))
              (t (let ((e1 (evaluate (cadr e)))
                       (e2 (evaluate (caddr e))))
                (cond ((eql op 'Add) (+ e1 e2))
                    ((eql op 'Sub) (- e1 e2))
                    ((eql op 'Mul) (* e1 e2))
                    ((eql op 'Div) (if (= e2 0)
                                    nil
                                    (/ e1 e2)))
                    (t (error "invalid operation"))))))))                    
           
(defun main ()
    (format t "~s~%" (evaluate '(Add (Val 2 nil) (Mul (Val 3 nil) (Val 6 nil)))))
    (format t "~s~%" (evaluate '(Div (Val 10 nil) (Val 0 nil)))))
    

(main)    

And running it:

CL-USER> (main)
20
NIL
NIL

Implementing Nat in Rust (a la Haskell)

The power of Algebraic Data Types comes to the fore when dealing with complex problem domains. Most modern languages tend towards supporting some form of ADTs, but the complexity of expressing such paradigms is not completely lost when compared to languages that were designed with such support in mind.

Case in point – defining the prototypical example of an algebraic data type, defining the Natural Number type (Nat). Let’s compare the naive implementation in two languages that do have varying support for ADTs – Haskell, and Rust.

First, the Haskell version. Haskell, just like ML and other members of this family, is eminently suited for describing such types almost effortlessly:

module NatDemo where

import Test.HUnit

data Nat = Zero | Succ Nat deriving (Show, Eq)

nat2int :: Nat -> Int
nat2int Zero = 0
nat2int (Succ n) = 1 + nat2int n

int2nat :: Int -> Nat
int2nat 0 = Zero
int2nat n = Succ (int2nat (n-1))

add :: Nat -> Nat -> Nat
add Zero n = n
add (Succ m) n = Succ (add m n)


-- test cases
test1 = TestCase (assertEqual "nat to int" (nat2int (Succ (Succ Zero))) 2)
test2 = TestCase (assertEqual "int to nat" (int2nat 5) (Succ (Succ (Succ (Succ (Succ Zero))))))
test3 = TestCase (assertEqual "add two nats" (add (int2nat 5) (int2nat 6)) (int2nat 11))

tests = TestList [test1, test2, test3]

run = runTestTT tests

Running it to confirm:


Macushla:Nat-in-Rust z0ltan$ ghci
GHCi, version 8.0.2: http://www.haskell.org/ghc/  😕 for help
Loaded GHCi configuration from /Users/z0ltan/.ghci

Prelude> :l Nat.hs
[1 of 1] Compiling NatDemo          ( Nat.hs, interpreted )
Ok, modules loaded: NatDemo.

*NatDemo> run
Cases: 3  Tried: 3  Errors: 0  Failures: 0
Counts {cases = 3, tried = 3, errors = 0, failures = 0}

As can be seen, it’s almost trivial to define the Nat type in Haskell, as well as write functions that operate on this type. It reads almost exactly like the mathematical version of the concept. Simply beautiful.

Now let’s show the Rust version first, and then perhaps a comment or two on the implementation.

#[derive(Debug, PartialEq)]
enum Nat {
    Zero,
    Succ(Box<Nat>),
}

/// open it up for code readability
use Nat::{Zero, Succ};

fn nat_to_int(n: Nat) -> i32 {
    match n {
        Zero => 0,
        Succ(n) => 1 + nat_to_int(*n),
    }
}


fn int_to_nat(n: i32) -> Nat {
    match n {
        0 => Zero,
        n => Succ(Box::new(int_to_nat(n-1))),
    }
}


fn add(m: Nat, n: Nat) -> Nat {
    match (m, n) {
        (Zero, n) => n,
        (Succ(m), n) => Succ(Box::new(add(*m, n))),
    }
}

fn main() {
    assert_eq!(nat_to_int(Succ(Box::new(Succ(Box::new(Succ(Box::new(Zero))))))), 3);
    assert_eq!(int_to_nat(5), Succ(Box::new(Succ(Box::new(Succ(Box::new(Succ(Box::new(Succ(Box::new(Zero)))))))))));
    assert_eq!(add(int_to_nat(5), int_to_nat(11)), int_to_nat(16));
}

Let’s make sure that it’s all in working condition:

Macushla:Nat-in-Rust z0ltan$ rustc nat.rs && ./nat
Macushla:Nat-in-Rust z0ltan$

Excellent! It all just works. However, a couple of points to notice – first of all, because Rust doesn’t really have a runtime, we need to wrap the recursive part in a Box (amongst other choices) so that the compiler knows what size the data is (a box, being a pointer, is of a fixed size determinable at compile time). Of course, the test code is positively ghastly because we have to wrap all calls with Box::new, not to mention to remember to dereference the boxed value inside the functions.

Secondly, the code is surprisingly clean for the most part. However, writing the code in not nearly as intuitive as writing the same in Haskell (or ML for that matter). I can only imagine how much more daunting the task would be in a language without pattern matching (amongst other higher-level features) such as Java. Of course we would make it work, but the distance between the intent of the code and the code itself would be much much higher. Syntax does matter, folks!

Well, that was a nice little experiment of an evening.

Mutual Recursion demo in Rust and Racket (inspired by Haskell)

This is a quick post on a Rust version of the Haskell evens and odds program demonstrating mutual recursion (as shown in Hutton’s book).

First off, the Haskell code:

  evens :: [a] -> [a]
  evens [] = []
  evens (x:xs) = x : odds xs

  odds :: [a] -> [a]
  odds [] = []
  odds (_:xs) = evens xs

A sample run:

*Main> let string = "abcde"
*Main> evens string
"ace"
*Main> odds string
"bd"
*Main> string
"abcde"

So the whole ideas is to have the evens functions display all the characters in even positions (starting from 0), and then odds function likewise display all the characters in odd positions.

The evens function acts as the actual accumulator whilst odds is only used as a trampoline for continuing the recursion.

Now, for a rough Rust version of it (preserving the original character array):


fn main() {
    fn evens<T: Copy>(xs: &[T]) -> Vec<T> {
        if xs.is_empty() {
            Vec::new()
        } else {
            cons(&xs[0], &odds(&xs[1..]))
        }
    }

    fn odds<T: Copy>(xs: &[T]) -> Vec<T> {
        if xs.is_empty() {
            Vec::new()
        } else {
            evens(&xs[1..])
        }
    }

    fn cons<T: Copy>(x: &T, xs: &[T]) -> Vec<T> {
        let mut vec = Vec::new();

        vec.push(*x);

        for e in xs.iter() {
            vec.push(*e);
        }
        vec
    }

    let string = String::from("abcde");

    println!("{}",
             String::from_utf8(evens(&string.clone().into_bytes())).unwrap());
    println!("{}",
             String::from_utf8(odds(&string.clone().into_bytes())).unwrap());

    println!("{}", string);
}

And a quick run:

Macushla:EvensAndOdds z0ltan$ rustc evens_and_odds.rs
Macushla:EvensAndOdds z0ltan$ ./evens_and_odds
ace
bd
abcde

So, as can be clearly seen, the original string is left unmodified. Of course this version looks quite dirty, but the nice bit is that &[T] accepts parameters of type Vec (or reference variants) and vice-versa. This enables using slicing extensively and naturally inside the functions. The vector copying code could, of course, be made to work with an API call, but I feel this is much better in its explicit form.

The Racket version looks much nicer, being more amenable to functional constructs than Rust:

#lang racket

(define (evens xs)
  (if (null? xs)
      '()
      (cons (car xs) (odds (cdr xs)))))

(define (odds xs)
  (if (null? xs)
      '()
      (evens (cdr xs))))

(define (main)
  (let ([string "abcde"])
    (displayln (list->string (evens (string->list string))))
    (displayln (list->string (odds (string->list string))))
    (displayln string)))

And a final run for the Racket version:

evens-and-odds.rkt> (main)
ace
bd
abcde

The Observer pattern in Rust

Here is a simple implementation of the Observer pattern in Rust. First, the code, and then a small bit of explanation of the implementation itself.

The project structure:

Macushla:rust-observer z0ltan$ tree
.
├── Cargo.lock
├── Cargo.toml
└── src
    ├── lib.rs
    └── main.rs

1 directory, 4 files

The client code:

extern crate rust_observer;

use rust_observer::*;

fn main() {
    let mut foo = EvenCounter::new();
    let (bar, baz, quux) = (Box::new(EvenObserver::new("bar".to_string())),
                            Box::new(EvenObserver::new("baz".to_string())),
                            Box::new(EvenObserver::new("quux".to_string())));

    foo.register(bar);
    foo.register(baz);
    foo.register(quux);

    foo.run();
}

Nothing special here – just creating an Observable object of type EvenCounter, and then three instances of EvenObserver, which implements the Observer type.

The idea is to have the observers get notified whenever the observable object’s counter is an even number (it runs the counter in an infinite loop).

And finally, the actual implementation:

/// the observable type
pub trait Observable<T> {
    fn register(&mut self, observer: Box<Observer<Item = T>>);
}

pub trait Observer {
    type Item;

    fn notify(&self, val: &Self::Item);
}


/// the actual structs which implement the Observable and Observer
/// traits

/// the specific Observable
pub struct EvenCounter {
    counter: u32,
    observers: Vec<Box<Observer<Item = u32>>>,
}

impl EvenCounter {
    pub fn new() -> Self {
        EvenCounter {
            counter: 0u32,
            observers: Vec::new(),
        }
    }

    pub fn run(&mut self) {
        loop {
            use std::thread;
            use std::time::Duration;

            thread::sleep(Duration::from_millis(self.get_rand_duration()));

            self.counter += 1;

            if self.counter % 2 == 0 {
                for observer in self.observers.iter() {
                    observer.notify(&self.counter);
                }
            }
        }
    }

    fn get_rand_duration(&self) -> u64 {
        if cfg!(target_os = "windows") {
            500u64
        } else {
            use std::process::Command;
            use std::str::FromStr;

            let rand_cmd = Command::new("sh")
                .arg("-c")
                .arg("echo $(( $RANDOM%1000 + 1 ))")
                .output()
                .expect("failed to get OS RNG");

            u64::from_str(&String::from_utf8(rand_cmd.stdout).unwrap().trim()).unwrap()
        }
    }
}

impl Observable<u32> for EvenCounter {
    fn register(&mut self, observer: Box<Observer<Item = u32>>) {
        self.observers.push(observer);
    }
}

/// the specific Observer type
pub struct EvenObserver {
    name: String,
}

impl EvenObserver {
    pub fn new(name: String) -> Self {
        EvenObserver { name: name }
    }

    fn name(&self) -> &str {
        &self.name
    }
}

impl Observer for EvenObserver {
    type Item = u32;

    fn notify(&self, val: &Self::Item) {
        println!("{} got {}", self.name(), val);
    }
}

As you can see, the types themselves are implemented as traits, and then the specific concrete types used for the example implement these traits.

The Observable type is quite simple – it is generic over the type that its observers can get notified about. Registration of observers is done through the required method, register. Also note the Box type being used here instead of a real trait object. Sometimes it’s much easier just to box things up than bothering about lifetimes and such.

The Observer type is more interesting. It has a single method, notify which the observable object uses to call the observers. The type of value that is retrieved from the observable object is made an “Associate Type” of the Observer trait. I find this to be a much cleaner approach than defining the trait itself to be parameterized over a generic type.

A simple test run:

Macushla:rust-observer z0ltan$ cargo run
   Compiling rust_observer v0.1.0 (file:///Users/z0ltan/Projects/Playground/DesignPatterns/Observer/Rust/rust-observer)
    Finished dev [unoptimized + debuginfo] target(s) in 0.61 secs
     Running `target/debug/rust_observer`
bar got 2
baz got 2
quux got 2
bar got 4
baz got 4
quux got 4
bar got 6
baz got 6
quux got 6
bar got 8
baz got 8
quux got 8
bar got 10
baz got 10
quux got 10
bar got 12
baz got 12
quux got 12
bar got 14
baz got 14
quux got 14
^C

Very satisfying indeed!

A visitor pattern implementation in Rust

I have recently been studying the ANTLR parser-generator framework, and one of its core approaches is separating the core parsing logic from implementation details. It achieves this using two major patterns – the Listener pattern and the Visitor pattern. This is easily achieved in an Object-oriented language like Java. I was curious to see how that would translate into a language like Rust, hence this post.

Rust, while not quite being an Object-oriented language, does support static and dynamic dispatch. In this case, we are interested in dynamic dispatch, and “Trait Objects” are the main way in which Rust support this feature. However, one problem with using trait objects is that the runtime really doesn’t directly provide us with any way of retrieving the specific type at runtime. This means that calling any struct specific methods on the fly isn’t really that straightforward. The other problem (specific to the Visitor pattern) is that Rust doesn’t have the notion of method overloading. This means that the only sane approach to implementing the pattern in Rust would be to pattern match against the trait object and invoke the appropriate logic, which leads us back to the first point.

A bit of Googling leads us to a practical (if not so elegant) approach of determining the concrete type that implemented the trait at runtime by using std::any::Any. This approach entails providing a method that returns a trait object of type Any for each concrete type that implements the trait, and also explicitly checking the concrete type during pattern matching – there really is no way of extracting the concrete type directly. Instead, we have to try downcasting the trait object to each concrete type and checking if we have a match or not. While this approach works fine for our custom types (the types and count of which we know), this clearly wouldn’t work in a scenario where we are working with third-party types.

In any case, here is the code implementing a simple Visitor pattern where we visit each component of a House type, and perform some custom processing independent of the actual data structures.

Here is the overall layout of the code:

Macushla:rust_visitor z0ltan$ tree
.
├── Cargo.lock
├── Cargo.toml
└── src
    ├── lib.rs
    ├── main.rs
    └── visitors.rs

1 directory, 5 files

Here is the client code, main.rs:

extern crate rust_visitor;

use rust_visitor::*;

fn main() {
    let house = House::new();

    // simply print out the house elements
    house.accept(&visitors::HouseElementListVisitor::new());
   
    println!();
     
    // do something with the elements of a house
    house.accept(&visitors::HouseElementDemolishVisitor::new());
}

So we can see that we create the House element, and then pass in two different implementations of the HouseElementVisitor trait.

Here is the code defining the components of a House:

use std::any::Any;

pub mod visitors;

use visitors::HouseElementVisitor;

/// This represents and element of a house.
/// Each type that implements this trait
/// supports being processed by a visitor.
pub trait HouseElement {
    fn accept(&self, visitor: &HouseElementVisitor);
    fn as_any(&self) -> &Any;
}


/// Define the house entity and its elements

pub struct House {
    components: Vec<Box<HouseElement>>,
}

impl House {
    pub fn new() -> Self {
        House {
            components: vec![Box::new(Livingroom::new()), Box::new(Bedroom::new()), 
                            Box::new(Bathroom::new()), Box::new(Kitchen::new())],
        }
    }
}

impl HouseElement for House {
    fn accept(&self, visitor: &HouseElementVisitor) {
        for component in self.components.iter() {
            component.accept(visitor);
        }

        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}

struct Livingroom {}

impl Livingroom {
    fn new() -> Self {
        Livingroom{}
    }
}

impl HouseElement for Livingroom {
    fn accept(&self, visitor: &HouseElementVisitor) {
        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}


struct Bedroom {}

impl Bedroom {
    fn new() -> Self {
        Bedroom {}
    }
}

impl HouseElement for Bedroom {
    fn accept(&self, visitor: &HouseElementVisitor) {
        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}


struct Bathroom {}

impl Bathroom {
    fn new() -> Self {
        Bathroom {}
    }
}

impl HouseElement for Bathroom {
    fn accept(&self, visitor: &HouseElementVisitor) {
        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}


pub struct Kitchen {}

impl Kitchen {
    fn new() -> Self {
        Kitchen {}
    }
}

impl HouseElement for Kitchen {
    fn accept(&self, visitor: &HouseElementVisitor) {
        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}

As can be seen, each implementation of the HouseElement trait defines a method, as_any that returns self as the specific type of the trait object, &Any. This will help the visitor implementation match and find the actual concrete type to do the processing specific to that visitor and that particular HouseElement component.

Finally, here is the actual implementation of the visitors:

//! This module defines the `HouseElementVisitor` trait which defines the capabilities
//! of a type to process the components of a `HouseElement` entity, and also defines
//! a couple of visitor types that do completely different processing.

use ::*;

pub trait HouseElementVisitor {
    fn visit(&self, element: &HouseElement);
}

pub struct HouseElementListVisitor {}

impl HouseElementListVisitor {
    pub fn new() -> Self {
        HouseElementListVisitor {}
    }
}

impl HouseElementVisitor for  HouseElementListVisitor {
    fn visit(&self, element: &HouseElement) {
        if element.as_any()
            .downcast_ref::<House>()
            .is_some() {
            println!("Visiting the House...");
        }

        if element.as_any()
            .downcast_ref::<Livingroom>()
            .is_some() {
            println!("Visiting the Living Room...");
        }

        if element.as_any()
            .downcast_ref::<Bedroom>()
            .is_some() {
            println!("Visiting the bedroom...");
        }

        if element.as_any()
            .downcast_ref::<Kitchen>()
            .is_some() {
                println!("Visiting the kitchen...");
        }
    }
}


pub struct HouseElementDemolishVisitor {}

impl HouseElementDemolishVisitor {
    pub fn new() -> Self {
        HouseElementDemolishVisitor {}
    }
}

impl HouseElementVisitor for HouseElementDemolishVisitor {
    fn visit(&self, element: &HouseElement) {
        if element.as_any()
            .downcast_ref::<House>()
            .is_some() {
            println!("Annihilating the House...!!!");
        }

        if element.as_any()
            .downcast_ref::<Livingroom>()
            .is_some() {
            println!("Bombing the Living Room...!!!");
        }

        if element.as_any()
            .downcast_ref::<Bedroom>()
            .is_some() {
            println!("Decimating the bedroom...!!!");
        }

        if element.as_any()
            .downcast_ref::<Kitchen>()
            .is_some() {
                println!("Destroying the kitchen...!!!");
        }
    }
}

Dirty, but at least it works. The problem, as mentioned before, is that in order to perform specific processing using dynamic dispatch, we use the trait object &HouseElement. However, in order to match against the concrete type, we need to try and match the downcasted reference against each concrete type. This is what the element.as_any().downcast_ref::() code does.

Let’s try it out!

Macushla:rust_visitor z0ltan$ cargo run
   Compiling rust_visitor v0.1.0 (file:///Users/z0ltan/Projects/Blog/Visitor_in_Rust/rust_visitor)
    Finished dev [unoptimized + debuginfo] target(s) in 0.56 secs
     Running `target/debug/rust_visitor`
Visiting the Living Room...
Visiting the bedroom...
Visiting the kitchen...
Visiting the House...

Bombing the Living Room...!!!
Decimating the bedroom...!!!
Destroying the kitchen...!!!
Annihilating the House...!!!

Excellent! Clunky and inelegant, but at least it works.

UPDATE: The implementations of the HouseElementVisitor trait can be simplified into a more elegant match expression block and using the is method of the Any trait as shown below:

Updated HouseElementListVisitor:

impl HouseElementVisitor for  HouseElementListVisitor {
    fn visit(&self, element: &HouseElement) {
        match element.as_any() {
            house if house.is::<House>() => println!("Visiting the house..."),
            living if living.is::<Livingroom>() => println!("Visiting the Living room..."),
            bedroom if bedroom.is::<Bedroom>() => println!("Visiting the bedroom..."),
            kitchen if kitchen.is::<Kitchen>() => println!("Visiting the kitchen..."),
            _ => {}
        }
    }
}

And the HouseElementDemolishVisitor implementation:

impl HouseElementVisitor for HouseElementDemolishVisitor {
    fn visit(&self, element: &HouseElement) {
        match element.as_any() {
            house if house.is::<House>() => println!("Annihilating the house...!!!"),
            living if living.is::<Livingroom>() => println!("Bombing the Living room...!!!"),
            bedroom if bedroom.is::<Bedroom>() => println!("Decimating the bedroom...!!!"),
            kitchen if kitchen.is::<Kitchen>() => println!("Destroying the kitchen...!!!"),
            _ => {}
        }
    }
}

Much more readable in my opinion, and it also hides the somewhat ugly downcast_ref::() bit!

A basic multimap prototype in Rust

Recently, I have been working on a couple of projects, and in one of them, the logic required the use of a multimap. Now, this project is in Rust, and the standard Rust map types, `HashMap` and `BTreeMap` do not support duplicate keys.
Of course, there are full-blown crates available that provide multimap functionality.

However, this being a dead simple project, I decided to roll one out myself. This version simply wraps a `HashMap<K, Vec>` instance inside the custom data structure, but the advantage is that the client API remains consistent and never the wiser about the innards of the actual multimap type. Since I just needed `insert`, `get`, and `iter`, I implemented those for the custom multimap. If needed, the entire API of the `HashMap` type could be implemented for `MHashMap` as well.

Here is the code. It is simple enough that no explanation is really required:

#![allow(dead_code)]

/// A simple wrapper around a HashMap to simulate the basic functionality
/// of a multimap.

use std::collections::HashMap;
use std::hash::Hash;
use std::collections::hash_map::Iter;


#[derive(Debug)]
struct MHashMap<K: Eq + PartialEq + Hash, V>{
    map: HashMap<K, Vec<V>>,
}

impl<K, V> MHashMap<K, V> 
    where K : Eq + PartialEq + Hash
{
    fn new() -> Self {
        MHashMap {
            map: HashMap::new(),
        }
    }

    fn iter(&self) -> Iter<K, Vec<V>> {
        self.map.iter()
        
    }

    fn insert(&mut self, k: K, v: V) {
        if self.map.contains_key(&k) {
                self.map.get_mut(&k).unwrap().push(v);
        } else {
            self.map.insert(k, vec![v]);
        }
    }

    fn get(&self, k: &K) -> Option<&Vec<V>> {
        self.map.get(&k)
    }

    fn get_mut(&mut self, k: &K) -> Option<&mut Vec<V>> {
        self.map.get_mut(&k)
    }
}


fn main() {
    let mut mmap = MHashMap::new();

    mmap.insert(1, "One");
    mmap.insert(1, "Uno");
    mmap.insert(1, "Ein");
    mmap.insert(2, "Two");
    mmap.insert(3, "Three");

    println!("{:?}", mmap);

    let key = 1;

    match mmap.get(&key) {
        Some(vals) => println!("Got {:?} for key {}", vals, key),
        None => println!("Key {} not found!", key),
    }

    for entry in mmap.iter() {
        println!("{} = {:?}", entry.0, entry.1);
    }
}

A small test run:

Macushla:tcp_study z0ltan$ rustc mhash.rs
Macushla:tcp_study z0ltan$ ./mhash
MHashMap { map: {3: ["Three"], 1: ["One", "Uno", "Ein"], 2: ["Two"]} }
Got ["One", "Uno", "Ein"] for key 1
3 = ["Three"]
1 = ["One", "Uno", "Ein"]
2 = ["Two"]

Short and simple!

A quick comparison of Euclid’s Algorithm in Haskell, Rust, and D

I recently procured a copy of Graham Hutton’s most excellent book, “Programming in Haskell” (2nd Edition). I had earlier worked through the first edition, and that is the book that really opened my eyes to the power of Haskell! Having become a bit rusty with my Haskell, I decided to work through the second edition book (which is considerably larger and more comprehensive). As part of that exercise, I decided to use a lazy weekend afternoon to code up Euclid’s GCD Algorithm in Haskell, Rust and in D. Just for a quick visual comparison of how the languages look. Here’s how they look:

First off, the Haskell version, which is the best looking one in my opinion:

Macushla:Playground z0ltan$ cat Euclid.hs
module Main where

euclid :: Int -> Int -> Int
euclid m n | m <= 0 && n <= 0 = error "GCD works for positive numbers only"
           | m == n = m
           | m < n = euclid m (n-m)
           | otherwise = euclid (m-n) n

main :: IO ()
main = do putStrLn "Enter the first number: "
          x <- getLine
          putStrLn "Enter the second number: "
          y <- getLine
          let x' = read x :: Int
          let y' = read y :: Int

          putStrLn $ "The GCD of " ++ x
                     ++ " and " ++ y
                     ++ " is " ++ show (euclid x' y')

Macushla:Playground z0ltan$ ghc Euclid.hs
[1 of 1] Compiling Main             ( Euclid.hs, Euclid.o )
Linking Euclid ...
Macushla:Playground z0ltan$ ./Euclid
Enter the first number:
12
Enter the second number:
18
The GCD of 12 and 18 is 6

Here’s the Rust version (a bit uglier, but the Pattern Matching is quite nice):

Macushla:Playground z0ltan$ cat euclid.rs
use std::io;
use std::str::FromStr;
use std::cmp::Ordering;

fn get_number(prompt: &str) -> u32 {
    println!("{}", prompt);

    let mut input = String::new();

    io::stdin().read_line(&mut input)
        .expect("no input!");

    u32::from_str(input.trim()).unwrap()
}

fn main() {
    let x = get_number("Enter the first number: ");
    let y = get_number("Enter the second number: ");

    println!("The GCD of {} and {} is {}", x, y, euclid(x, y));
}

fn euclid(m: u32, n: u32) -> u32 {
    assert!(m > 0 && n > 0);

    match m.cmp(&n) {
        Ordering::Equal => m,
        Ordering::Less => euclid(m, n-m),
        Ordering::Greater => euclid(m-n, n),
        }
}
Macushla:Playground z0ltan$ rustc euclid.rs && ./euclid
Enter the first number:
12
Enter the second number:
18
The GCD of 12 and 18 is 6

And finally, here is the D version – clean, succinct, and a pleasure to read as always:

Macushla:Playground z0ltan$ cat euclid.d
import std.stdio: readln, writeln, writefln;

uint get_number(string prompt) {
    writeln(prompt);

    import std.conv: to;
    import std.string: chomp;

    return readln().chomp().to!(uint);
}

void main() {
    uint x = get_number("Enter the first number: ");
    uint y = get_number("Enter the second number: ");

    writefln("The GCD of %s and %s is %s", x, y, euclid(x, y));
}

uint euclid(uint m, uint n) {
    assert(m > 0 && n > 0);

    if (m < n) {
        return euclid(m, n-m);
    } else if (m == n) {
        return m;
    } else {
        return euclid(m-n, n);
    }
}
Macushla:Playground z0ltan$ dmd -run euclid.d
Enter the first number:
12
Enter the second number:
18
The GCD of 12 and 18 is 6

One thing is for sure. Aside from syntactic differences, most modern languages are all converging in terms of paradigms and features. Hell, even keeping languages like Haskell and Idris aside, most mainstream languages are also converging on the syntactical front! Anyway, this was a fun little exercise on a slow afternoon. I personally like the Haskell version best , followed by the D version, and then finally the Rust version (ruined by the somewhat ugly I/O syntax). What do you think?

A simple demo of using channels in Rust

Continuing with my study of Rust, I ventured into concurrency using Rust at long last!

The standard library in Rust provides a few basic concurrency primitives, but of them all, my favourite has to be the support for channels. I decided to try out a small program where the factorial of a number is calculated in another thread, and the result piped back into the calling function.

The code is presented as follows:

/// Factorial using channel

use std::io;
use std::str::FromStr;
use std::thread;
use std::sync::mpsc::{channel, Sender};


fn factorial(tx: Sender<u64>, n: u64) {
    fn helper(acc: u64, n: u64) -> u64 {
        if n == 0 { acc } else { helper(acc * n, n - 1) }
    }

    thread::spawn(move || { tx.send(helper(1u64, n)).unwrap(); });
}


fn get_number() -> u64 {
    println!("Enter a number");

    let mut input = String::new();

    io::stdin()
        .read_line(&mut input)
        .expect("could not read input");

    match u64::from_str(input.trim()) {
        Ok(n) => n,
        Err(e) => panic!(e.to_string()),
    }
}


fn main() {
    let n = get_number();

    let (tx, rx) = channel();

    factorial(tx, n);

    println!("The factorial of {} is {}", n, rx.recv().unwrap());
}

Pretty straightforward code. We create a channel using std::sync::mspsc::channel, which returns a (Sender, Receiver) tuple. As the names suggest, Sender is used for transmitting data on the channel, and Receiver is used for receiving data on the same channel.

This approach also precludes the need for waiting on the thread explicitly.

Let’s take it for a spin:

bash-3.2$  rustc channel_factorial.rs && ./channel_factorial
Enter a number
20
The factorial of 20 is 2432902008176640000

Short and sweet!

The contrived example without using a channel might look something like so:

/// concurrent factorial using plain threads

use std::thread;
use std::sync::{Arc, Mutex};


fn factorial(n: u64) -> u64 {
    fn helper(acc: u64, n: u64) -> u64 {
        if n == 0 { acc } else { helper(acc * n, n - 1) }
    }

    helper(1u64, n)
}


fn main() {
    let data = Arc::new(Mutex::new(0u64));

    let data_clone = data.clone(); // bump the RC count

    let handle = thread::spawn(move || {
        let mut fact_cell = data_clone.lock().unwrap();
        *fact_cell = factorial(10);
    });

    match handle.join() {
        Ok(_) => println!("Factorial of 10 = {:?}", data),
        Err(e) => panic!(e),
    }
}

Not too bad, eh? The only difference is that we wrap the actual data inside a Mutex, which is itself wrapped inside an Arc object. This is to allow for shared mutable data using reference counting (since every object has exactly one owner at any point of time).

Inside the spawned thread, we can then get a lock on the cloned data object, and then we update the wrapped data with the factorial value. The problem with this approach is that it is not trivial to extract the value out. Taking it for a test run:

bash-3.2$ rustc concurrent_factorial.rs && ./concurrent_factorial
Factorial of 10 = Mutex { data: 3628800 }

As we can see, we do get the correct result, but unfortunately it’s wrapped up inside the mutex object, and getting it out is not easy since the API itself doesn’t seem to provide any way of doing so. Suggestions on StackOverflow seem to imply wrapping the data inside an Option object beforehand (as in something like

Arc<Mutex<Option<T>>>

), but that road didn’t lead me anywhere nice and safe either. I probably need to learn more about this approach in addition to perusing the documentation for a nice and elegant way out of this jumble of wrapped generic types (if there is one). No wonder I’d rather have the nice Go-esque channels!

UPDATE: Thanks to the good folks over at #rust, I got a simple and elegant solution for the unwrapping issue! Here is the update code:

/// concurrent factorial using plain threads

use std::thread;
use std::sync::{Arc, Mutex};


fn factorial(n: u64) -> u64 {
    fn helper(acc: u64, n: u64) -> u64 {
        if n == 0 { acc } else { helper(acc * n, n - 1) }
    }

    helper(1u64, n)
}


fn main() {
    let data = Arc::new(Mutex::new(0u64));

    let data_clone = data.clone(); // bump the RC count

    let handle = thread::spawn(move || {
        let mut fact_cell = data_clone.lock().unwrap();
        *fact_cell = factorial(10);
    });

    match handle.join() {
        Ok(_) => println!("Factorial of 10 = {:?}", *data.lock().unwrap()),
        Err(e) => panic!(e),
    }
}

The change is this: *data.lock().unwrap() inside the match in main. The explanation given was this: since the data itself is wrapped inside a Mutex within an Arc, it needs to be unwrapped in the reverse order – dereference the Arc to get the Mutex, and then dereference that to get the data! The unwrap is necessary since lock() returns a LockResult object.

The Rust community is indeed one of the most helpful communities on IRC. The problem which had me vexed for an hour or so was solved within minutes!

A Line number closure in Rust, and an interesting battle with the Rust compiler

It had been a while since I’d played around with Rust, so I decided to implement one of my two favourite programs when learning new languages viz., a line number closure (with mutation, of course) as shown by Hoyte in his seminal book, “Let Over Lambda”. The other program is, of course, implementing a functioning Rational number class (as in the previous blog post, for instance).

Well, with Rust, it’s a very…interesting experience to say the least. The Rust Borrow Checker is a very strange beast indeed. So, let’s have a look at how we can arrive at a working version of the program.

Background

Rust does have powerful support for closures (what respectable language does not today?). However, the interesting part is that closures in Rust are simply syntactic sugar for traits. To be more precise, there are three available traits that a closure can belong to:

  • the Fn trait – this is the least permissive of all, and it allows a read-only closure (in effect). This is ideal when we want to simply have a self-contained immutable closure. For instance,
    fn square<F>(n: i32, callback: F)
        where
        F: Fn(i64) ->() {
        callback(n as i64 * n as i64);
    }
    
    fn main() {
        square(100, |n| {  println!("Received {}", n); });
    }
    

    This duly prints out:

    Received: 10000
    

    This is quite straightforward. As we can see, The type of the callback, which is the closure, is Fn(i64) -> (), which means that it takes a 64-bit integer and does some side-effect (printing to screen in this case). No mutation anywhere, no capturing of environments, and so the Fn trait works superbly here.

  • the FnMut trait is more interesting. This is needed when we need to mutate the environment captured by the closure. As a small example:
    fn process_mutating_closure<F>(mut closure: F)
        where
        F: FnMut(i32) -> () {
        closure(100);
    }
    
    fn main() {
        let mut local = 100;
    
        process_mutating_closure(|x| { 
                  local += 100; 
                  println!("Value = {}", x + local); 
        });
    
    

    which produces:

    Value = 300
    

    All right, this is a bit more complicated than the previous version. So, let’s break it down. What we intend to do is this – the closure that’s created in the main function captures a mutable variable, local, in the enclosing lexical scope. The closure then modifies this captured variable, and uses it for some dummy processing before printing the final value out.

    Since a captured variable is being modified, the type of the closure will be FnMut. This is exactly what’s reflected in the type signature when the closure is passed as an argument to process_mutating_closure.

  • , and finally, the FnOnce trait is the most restrictive of all. It basically transfers ownership of the capture to the closure itself. If you are not familiar with Rust’s wonderful (sarcasm) Ownership and Borrowing world, consider yourself lucky.
    fn move_closure_demo<F: FnOnce(i32) -> ()>(closure: F) {
        closure(100)
    }
    
    fn main() {
        let mut local = 100;
    
        move_closure_demo(move |x| {
            local += 100;
            println!("Result = {}", x + local);
        });
    
        println!("{}", local);
    }
    
    Result = 300
    

    Eh? This code looks almost identical to the previous one (except for the change in the function signature of the closure from FnMut to FnOnce and the move in front of the closure). Well, if you notice, you will also see that the function var, closure in the function move_closure_demo also does not require mut unlike in the previous case.

    What’s happening here is that because we have “moved” the closure, the closure is now the owner of the captured variable, local. Wait a minute, so how come we can still use local in main after it’s been moved? It turns out that because the binding of local is to a Copy-able type (i32 in this case), a copy of the binding’s value is actually moved. However, this won’t work for types that do not implement the Copy trait. Confused? Well, you should be. I mean, wtf?

All right, now let’s get on with the meat of the actual matter.

First attempt

Let’s put our knowledge of how closures are actually syntactic sugar for the aforementioned trifecta of traits to use. Here’s the first shot:

fn gen_line_number() -> (FnMut() -> ()) {
    let mut line_no = 0;

    || {
        line_no += 1;

        println!("Current line number: {}", line_no);
    }
}

fn main() {
    let foo = gen_line_number();

    for _ in 0..5 {
        foo();
    }
    
    println!("");

    let bar = gen_line_number();

    for _ in 0..5 {
        bar();
    }
}

Since we’re modifying the capture var, line_no, we need the closure to be of the trait type, FnMut. Looks okay, let’s take it for a spin:

bash-3.2$ rustc hoyte-blog.rs 
error[E0277]: the trait bound `std::ops::FnMut() + 'static: std::marker::Sized` is not satisfied
 --> hoyte-blog.rs:1:26
  |
1 | fn gen_line_number() -> (FnMut() -> ()) {
  |                          ^^^^^^^^^^^^^ the trait `std::marker::Sized` is not implemented for `std::ops::FnMut() + 'static`
  |
  = note: `std::ops::FnMut() + 'static` does not have a constant size known at compile-time
  = note: the return type of a function must have a statically known size

error[E0308]: mismatched types
 --> hoyte-blog.rs:4:5
  |
4 |     || {
  |     ^ expected trait std::ops::FnMut, found closure
  |
  = note: expected type `std::ops::FnMut() + 'static`
  = note:    found type `[closure@hoyte-blog.rs:4:5: 8:6 line_no:_]`

error[E0277]: the trait bound `std::ops::FnMut(): std::marker::Sized` is not satisfied
  --> hoyte-blog.rs:12:9
   |
12 |     let foo = gen_line_number();
   |         ^^^ the trait `std::marker::Sized` is not implemented for `std::ops::FnMut()`
   |
   = note: `std::ops::FnMut()` does not have a constant size known at compile-time
   = note: all local variables must have a statically known size

error[E0277]: the trait bound `std::ops::FnMut(): std::marker::Sized` is not satisfied
  --> hoyte-blog.rs:20:9
   |
20 |     let bar = gen_line_number();
   |         ^^^ the trait `std::marker::Sized` is not implemented for `std::ops::FnMut()`
   |
   = note: `std::ops::FnMut()` does not have a constant size known at compile-time
   = note: all local variables must have a statically known size

error: aborting due to 4 previous errors

Whoa! Okay, let’s read the error messages in order. Maybe that will help us fix the issue better. Right, so the first one says something about std::marker::Sized not being implemented… blah blah blah. What all this verbosity actually means is that we are returning a closure from the function, and it needs to know the size of the closure at compile time itself.

Fair enough. However, why does it work when we’re passing a closure into a function the same way (instead of returning the closure from it)? Who the hell really knows? Well, the basic problem here (as I figure it) is that we need to wrap the whole damned thing inside a heap pointer (aka the Box type) since we’re returning from the function, and the stack frame for that function is going to get de-allocated, we need to return a heap pointer to the object so that it can be used even after the current function frame’s destroyed.

This is the reason why code like the following works because the callback function, closure cannot outlive the function where the closure was generated (main):

fn foo<F: Fn() -> ()>(closure: F) {
    closure();
}

fn main() {
    foo(|| { println!("Hello, world!");});
}
Hello, world!

Okay, let’s proceed.

Second Attempt

Having gained that great insight from the first version, here’s how the second version looks like:

fn gen_line_number() -> Box<FnOnce() -> ()> {
    let mut line_no = 0;

    Box::new(move || {
        line_no += 1;

        println!("Current line number: {}", line_no);
    })
}

fn main() {
    let foo = gen_line_number();

    for _ in 0..5 {
        foo();
    }
    
    println!("");

    let bar = gen_line_number();

    for _ in 0..5 {
        bar();
    }
}

and…

bash-3.2$ rustc hoyte-blog.rs 
error[E0161]: cannot move a value of type std::ops::FnOnce(): the size of std::ops::FnOnce() cannot be statically determined
  --> hoyte-blog.rs:15:9
   |
15 |         foo();
   |         ^^^

error[E0161]: cannot move a value of type std::ops::FnOnce(): the size of std::ops::FnOnce() cannot be statically determined
  --> hoyte-blog.rs:23:9
   |
23 |         bar();
   |         ^^^

error[E0382]: use of moved value: `*foo`
  --> hoyte-blog.rs:15:9
   |
15 |         foo();
   |         ^^^ value moved here in previous iteration of loop
   |
   = note: move occurs because `*foo` has type `std::ops::FnOnce()`, which does not implement the `Copy` trait

error[E0382]: use of moved value: `*bar`
  --> hoyte-blog.rs:23:9
   |
23 |         bar();
   |         ^^^ value moved here in previous iteration of loop
   |
   = note: move occurs because `*bar` has type `std::ops::FnOnce()`, which does not implement the `Copy` trait

error: aborting due to 4 previous errors

Sigh. Just when things were looking up… theoretically. All right, before we lose our sanity, let’s read all the bloody error messages and try to make some sense of them. Okay, ignoring the first two about statically size something, we see that the other errors are complaining about foo and bar being already moved in the previous iteration of the loop. At least that makes some sort of sense – since we have the closure passed to main and assigned to foo and bar, and, as the error message says, FnOnce does not implement the Copy trait, we cannot change the owner of the closure in subsequent iterations of the for loop.

Let’s rectify that by using the FnMut trait instead, which is actually what we want since we are not using any ownership changes, and we’re mutating the captured local variable, it sounds like the perfect candidate for our use case.

In case you’re wondering about modifying the main function instead to use references inside the for loops, I assure you that it’s not an oversight on my part – down that road lies perdition! (Reflect upon the fact that I have a working copy, and I’m actually working backwards through some sane progression amongst the many that I had to endure along the way!)

Third attempt

Here’s the version with the closure type changed to FnMut:

fn gen_line_number() -> Box<FnMut() -> ()> {
    let mut line_no = 0;

    Box::new(move || {
        line_no += 1;

        println!("Current line number: {}", line_no);
    })
}

fn main() {
    let foo = gen_line_number();

    for _ in 0..5 {
        foo();
    }
    
    println!("");

    let bar = gen_line_number();

    for _ in 0..5 {
        bar();
    }
}

And yes, you guessed it, it simply doesn’t work, but you’ll love the error messages:

bash-3.2$ rustc hoyte-blog.rs 
error: cannot borrow immutable `Box` content `*foo` as mutable
  --> hoyte-blog.rs:15:9
   |
15 |         foo();
   |         ^^^

error: cannot borrow immutable `Box` content `*bar` as mutable
  --> hoyte-blog.rs:23:9
   |
23 |         bar();
   |         ^^^

error: aborting due to 2 previous errors

Hmmm… hmmm? It mentions that when I call foo and bar, it’s trying to make something mutable that’s actually immutable. Eh? Okay, let’s see. The local captured variable was “moved” to live beyond the function’s stack frame, so that’s fine. I’m using FnMut so that I can actually mutate the captured variable, that’s fine. I’m wrapping the whole damned thing inside a Box so that the whole closure can persist beyond the function call, and finally, all I’m trying to do is to increment the mutable captured line_no var inside my for loops. That also sounds fine. So when did the contents of the box become immutable?

All right, let’s see the final version first.

Final version

fn gen_line_number() -> Box<FnMut() -> ()> {
    let mut line_no = 0;

    Box::new(move || {
        line_no += 1;

        println!("Current line number: {}", line_no);
    })
}

fn main() {
    let mut foo = gen_line_number();

    for _ in 0..5 {
        foo();
    }
    
    println!("");

    let mut bar = gen_line_number();

    for _ in 0..5 {
        bar();
    }
}

and running it,

bash-3.2$ rustc hoyte-blog.rs 
bash-3.2$ ./hoyte-blog 
Current line number: 1
Current line number: 2
Current line number: 3
Current line number: 4
Current line number: 5

Current line number: 1
Current line number: 2
Current line number: 3
Current line number: 4
Current line number: 5

Wow! Glory at last! Heh. Well, if you have the eyes of a hawk, you’ll observe that the only change was adding the mut qualifier to the closure handles in main. So what that extremely useful, intuitive, and helpful error message was telling us is that we have a boxed closure containing a captured mutable local var which we’re updating in every iteration of the respective for loops, but unfortunately we have assigned the closures to immutable handles, and so the whole thing just won’t work.

Whew. And seriously, wtf?

Rust appears to be great fun when things go well, but when things go badly (and they will, of course), it can be quite a PITA. Granted, I am not an expert or even an intermediate Rust programmer, but perhaps people’s complaints about the learning curve being seriously fucked up aren’t really exaggerated. If the learning curve had been at least logical, it would be a different matter. As things stand though, the more I venture into the language, the more “patched-together” it appears to be. Also, all the various exceptions to the rules in addition to the myriad rules about exceptional scenarios aren’t really helping, folks. I doubt anything can really be done about it now, though. Heh.