# An inefficient Haskell implementation of the Integer Partitioning problem

This is my first shot at implementing a solution to the integer partitioning problem in Haskell. It is a very inefficient algorithm – for inputs beyond, say, 20, it would probably either hang forever or crash with a core dump – but it is a first step towards an efficient implementation of a problem that appears to be egregiously difficult to implement in idiomatic Haskell as compared to, say, Java.

This implementation also generates the partitions in lexicographic order.

Here is the code:

```
module Partitions where

isort :: Ord a => (a -> a -> Bool) -> [a] -> [a]
isort _ [] = []
isort f (x:xs) = insert f x (isort f xs)
  where
    insert _ x [] = [x]
    insert f x (y:ys) | f x y = x : (y:ys)
                      | otherwise = y : insert f x ys

lexical :: Ord a => [a] -> [a] -> Bool
lexical [] [] = True
lexical [] ys = True
lexical xs [] = True
lexical (x:xs) (y:ys) | x > y = False
                      | x < y = True
                      | otherwise = lexical xs ys

clean :: Ord a => [[a]] -> [[a]]
clean xss = h (map (isort (<)) xss) []
  where
    h [] acc = acc
    h [xs] acc = if elem xs acc then acc else (xs:acc)
    h (xs:xss) acc = if elem xs acc then h xss acc
                     else h xss (xs:acc)

partitions :: Int -> [[Int]]
partitions 0 = []
partitions n = isort lexical $ clean $ f start [start]
  where
    start = replicate n 1
    f [_] acc = acc
    f xs acc = f (normalise xs) (acc ++ cs) ++ clean (concat (map (\ys -> f ys []) cs))
      where
        normalise [] = []
        normalise [x, y] = [x+y]
        normalise (x:xs) = x : normalise xs

        combinations [] = []
        combinations [x] = [[x]]
        combinations xs = g xs (length xs - 2)
          where
            g xs s | s < 0 = []
                   | otherwise = [take s xs ++ [xs!!s + xs!!(s+1)] ++ drop (s+2) xs] ++
                                 g xs (s-1)

        cs = clean $ combinations xs
```

Running some sample tests:

```
$ ghci
Prelude> :l Partitions.hs
[1 of 1] Compiling Partitions       ( Partitions.hs, interpreted )
*Partitions> partitions 0
[]
*Partitions> partitions 4
[[1,1,1,1],[1,1,2],[1,3],[2,2],[4]]
*Partitions> partitions 9
[[1,1,1,1,1,1,1,1,1],[1,1,1,1,1,1,1,2],[1,1,1,1,1,1,3],[1,1,1,1,1,2,2],[1,1,1,1,1,4],[1,1,1,1,2,3],[1,1,1,1,5],[1,1,1,2,2,2],[1,1,1,2,4],[1,1,1,3,3],[1,1,1,6],[1,1,2,2,3],[1,1,2,5],[1,1,3,4],[1,1,7],[1,2,2,2,2],[1,2,2,4],[1,2,3,3],[1,2,6],[1,3,5],[1,4,4],[1,8],[2,2,2,3],[2,2,5],[2,3,4],[2,7],[3,3,3],[3,6],[4,5],[9]]
*Partitions> partitions 12
[[1,1,1,1,1,1,1,1,1,1,1,1],[1,1,1,1,1,1,1,1,1,1,2],[1,1,1,1,1,1,1,1,1,3],[1,1,1,1,1,1,1,1,2,2],[1,1,1,1,1,1,1,1,4],[1,1,1,1,1,1,1,2,3],[1,1,1,1,1,1,1,5],[1,1,1,1,1,1,2,2,2],[1,1,1,1,1,1,2,4],[1,1,1,1,1,1,3,3],[1,1,1,1,1,1,6],[1,1,1,1,1,2,2,3],[1,1,1,1,1,2,5],[1,1,1,1,1,3,4],[1,1,1,1,1,7],[1,1,1,1,2,2,2,2],[1,1,1,1,2,2,4],[1,1,1,1,2,3,3],[1,1,1,1,2,6],[1,1,1,1,3,5],[1,1,1,1,4,4],[1,1,1,1,8],[1,1,1,2,2,2,3],[1,1,1,2,2,5],[1,1,1,2,3,4],[1,1,1,2,7],[1,1,1,3,3,3],[1,1,1,3,6],[1,1,1,4,5],[1,1,1,9],[1,1,2,2,2,2,2],[1,1,2,2,2,4],[1,1,2,2,3,3],[1,1,2,2,6],[1,1,2,3,5],[1,1,2,4,4],[1,1,2,8],[1,1,3,3,4],[1,1,3,7],[1,1,4,6],[1,1,5,5],[1,1,10],[1,2,2,2,2,3],[1,2,2,2,5],[1,2,2,3,4],[1,2,2,7],[1,2,3,3,3],[1,2,3,6],[1,2,4,5],[1,2,9],[1,3,3,5],[1,3,4,4],[1,3,8],[1,4,7],[1,5,6],[1,11],[2,2,2,2,2,2],[2,2,2,2,4],[2,2,2,3,3],[2,2,2,6],[2,2,3,5],[2,2,4,4],[2,2,8],[2,3,3,4],[2,3,7],[2,4,6],[2,5,5],[2,10],[3,3,3,3],[3,3,6],[3,4,5],[3,9],[4,4,4],[4,8],[5,7],[6,6],[12]]
```
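For contrast, here is a hedged sketch of the standard recursive formulation (this is *not* the algorithm above, and the name `partitionsFast` is mine). It also emits the partitions in lexicographic order, and it comfortably handles inputs far beyond 20:

```haskell
-- A standard recursive formulation, shown only for comparison.
-- `go m k` lists the partitions of m whose parts are all >= k, so parts
-- never decrease and no duplicates are ever produced.
partitionsFast :: Int -> [[Int]]
partitionsFast 0 = []          -- matching the convention used above
partitionsFast n = go n 1
  where
    go 0 _ = [[]]
    go m k = [x : rest | x <- [k .. m], rest <- go (m - x) x]
```

Because each recursive call only ever extends a partition with parts at least as large as the last one, no sorting or deduplication pass is needed at all.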

An explanation will be provided when the efficient solution has been implemented!

# An absolute beginner’s guide to folding in Haskell

(This is a small write-up I did on a forum for a Haskell MOOC).

`foldr` and `foldl` are simply means of reducing/accumulating/folding over values of a sequence into a single value. That’s basically it!

To make it more concrete, let’s analyse a specific implementation of `foldr` and `foldl` for Lists:

The easier one to understand is actually `foldl` since it is more intuitive and natural:

```
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f acc [] = acc
foldl f acc (x:xs) = foldl f (f acc x) xs
```

The first thing to do is look at the type. How does it read? In plain English, you can read that as “foldl is a function that takes three arguments: the first argument is a function that takes a value of type b and a value of type a and produces a value of type b, the second argument is a value of type b (this is the accumulator), and the final argument is a list of values of type a. The overall result is a single value of type b”. Makes sense?

With that intuition, let’s look at an example and map that to the definition itself:

Suppose you have a list, `xs = [1, 2, 3, 4, 5]` and you want to find the sum of all the elements of this list. Then you basically want to reduce the list into a single element under addition, right? So you can define the whole operation as: `foldl (+) 0 xs`.

Now see how that maps to the definition. The first argument to `foldl`, which is a function, is `+`, and this makes sense since `+` is a binary function that takes two numerical values and produces their sum. The second argument is the “accumulator” (basically the value that keeps accumulating the results as we traverse the list), and we want to start off with 0 here since that is the identity for addition. Finally, the final argument is the list of values itself, `xs`.

Now look at the body of `foldl`. It has two patterns to match the inputs against:

1). `foldl f acc [] = acc`

So when we run out of values in our input list, we simply return the accumulator, which had been dutifully collecting the running sum of the elements of the list. So in our case, this would be analogous to something like `foldl (+) acc [] = acc`.

2). `foldl f acc (x:xs) = foldl f (f acc x) xs`

This is the more interesting case. Translating it to our example, we might have something like `foldl (+) acc (x:xs) = foldl (+) (acc + x) xs`. This is the crucial part – note the `acc + x` bit. In the recursive call, the next value of `acc` will be `acc + x`, right? So we are collecting, in this example, the running sum of the elements of the list in the variable `acc`. Now, the most important bit – note that `acc` is always the left operand of the function, as in `acc + x` (which, in function application form, would be `(+) acc x`). This is the reason why we call it “foldl” or “fold left” – we are simply reducing the list starting from its left end!

So, for example:
foldl (+) 0 [1, 2, 3, 4, 5] can be expanded (conceptually) as:

foldl (+) 0 [1, 2, 3, 4, 5]
= foldl (+) (0+1) [2, 3, 4, 5]
= foldl (+) ((0+1)+2) [3, 4, 5]
= foldl (+) (((0+1)+2)+3) [4, 5]
= foldl (+) ((((0+1)+2)+3)+4) [5]
= foldl (+) (((((0+1)+2)+3)+4)+5) []
= (((((0+1)+2)+3)+4)+5) – from the first pattern in the definition of `foldl`.
= 15

As you can see, we move from left to right across the list, and keep accumulating the values as we go along. Because the accumulator is updated before each recursive call, `foldl` is tail-recursive, and in a strict language it runs in constant stack space, unlike `foldr` (as we will see). One Haskell-specific caveat, though: the lazy `foldl` shown here actually defers the `f acc x` computations as unevaluated thunks, so for large lists the strict variant `foldl'` from `Data.List` is the one to reach for in practice.

Now, onto `foldr`. `foldr` is very similar to `foldl`, but whereas `foldl` folds from left to right, `foldr` folds from right to left! First, again, let’s look at the definition:

```
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f acc [] = acc
foldr f acc (x:xs) = f x (foldr f acc xs)
```

The type in plain English: “foldr is a function that takes three arguments – the first argument is a binary function that takes a value of type a and a value of type b, and produces a value of type b, the second argument is the “accumulator” of type b, and the final argument is the list of values of type a. The overall result is a single value of type b”.

Now taking the same example, let’s analyse the body:

1). `foldr f acc [] = acc`

This is identical to the first pattern in the body of `foldl`. This makes sense since once we have exhausted the list, we simply need to return the final result, which is stored in the running accumulator, `acc`.

2). `foldr f acc (x:xs) = f x (foldr f acc xs)`

Now, this is the tricky part and needs to be analysed carefully. The analogue of this definition for our sum example would be `foldr (+) acc (x:xs) = x + (foldr (+) acc xs)`. What does this mean?

Well, since we want to process the list starting from the rightmost end of the list, we are forced to update the accumulator from the right and move towards the beginning of the list. This is what that definition basically means. So if you observe carefully, the value of the accumulator is actually given by the expression `(foldr f acc xs)` (or `(foldr (+) acc xs)` in our example). Remember that in the case of `foldl`, the accumulator was always the left operand. Likewise, in the case of `foldr`, the accumulator is always the right operand. This is why in the expression `f x (foldr f acc xs)`, the second operand is the running state of the accumulator. Note that the full call itself cannot be fully evaluated until the `(foldr f acc xs)` part has been evaluated. This means that we keep building up a stack of pending applications, and only when the entire list has been consumed can we begin actually combining the values. This is why `foldr` with a strict operator like `+` needs stack space proportional to the length of the list, which `foldl` avoids. (To be fair to `foldr`: with an operator that is lazy in its second argument, such as `(:)` or `(&&)`, it can stop early and even process infinite lists, so neither fold is universally better.)

To use the same example, let’s evaluate the call `foldr (+) 0 [1, 2, 3, 4, 5]`:

foldr (+) 0 [1, 2, 3, 4, 5]
= 1 + (foldr (+) 0 [2, 3, 4, 5])
= 1 + (2 + (foldr (+) 0 [3, 4, 5]))
= 1 + (2 + (3 + (foldr (+) 0 [4, 5])))
= 1 + (2 + (3 + (4 + (foldr (+) 0 [5]))))
= 1 + (2 + (3 + (4 + (5 + (foldr (+) 0 []))))) — we now match with the first pattern in the definition of `foldr`
= 1 + (2 + (3 + (4 + (5 + 0)))) — we now keep updating the accumulator all the way from right to left
= 1 + (2 + (3 + (4 + 5)))
= 1 + (2 + (3 + 9))
= 1 + (2 + 12)
= 1 + 14
= 15

If you observe carefully, the expressions are parenthesised from right to left whereas in the case of `foldl`, they were from left to right.

Now, you see that both `foldl` and `foldr` gave the same result, 15, for our example. However, this is only because addition is associative and commutative (just like multiplication), which is why it doesn’t matter whether we fold from the left or from the right. So multiplication would also give the same results for both `foldl` and `foldr`, but subtraction and division would not, since they are neither associative nor commutative.

To confirm this assertion, just compare the outputs of the following expressions:

```
Prelude> foldr (+) 0 [1..5]
15
Prelude> foldl (+) 0 [1..5]
15

Prelude> foldr (*) 1 [1..5]
120
Prelude> foldl (*) 1 [1..5]
120

Prelude> foldr (-) 0 [1..5]
3
Prelude> foldl (-) 0 [1..5]
-15

Prelude> foldr (/) 1.0 [1.0, 2.0, 3.0, 4.0, 5.0]
1.875
Prelude> foldl (/) 1.0 [1.0, 2.0, 3.0, 4.0, 5.0]
8.333333333333333e-3
```

Let’s just analyse the subtraction example:

foldr (-) 0 [1..5]
= 1 - (foldr (-) 0 [2, 3, 4, 5])
= 1 - (2 - (foldr (-) 0 [3, 4, 5]))
= 1 - (2 - (3 - (foldr (-) 0 [4, 5])))
= 1 - (2 - (3 - (4 - (foldr (-) 0 [5]))))
= 1 - (2 - (3 - (4 - (5 - (foldr (-) 0 [])))))
= 1 - (2 - (3 - (4 - (5 - 0))))
= 1 - (2 - (3 - (4 - 5)))
= 1 - (2 - (3 - (-1)))
= 1 - (2 - 4)
= 1 - (-2)
= 3.

and

foldl (-) 0 [1..5]
= foldl (-) (0-1) [2, 3, 4, 5]
= foldl (-) ((0-1)-2) [3, 4, 5]
= foldl (-) (((0-1)-2)-3) [4, 5]
= foldl (-) ((((0-1)-2)-3)-4) [5]
= foldl (-) (((((0-1)-2)-3)-4)-5) []
= (((((0-1)-2)-3)-4)-5)
= -15.

Et voila!
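To cement the intuition, it is a nice exercise to write a few standard list functions as folds. A quick sketch (the `my*` names are mine, not from the Prelude):

```haskell
-- A left fold is natural when the result is built up "as we go":
mySum :: Num a => [a] -> a
mySum = foldl (+) 0

-- reversing with foldl: each element is consed onto the front of the
-- accumulator, so the first element ends up last.
myReverse :: [a] -> [a]
myReverse = foldl (flip (:)) []

-- A right fold is natural when the result has the same shape as the list:
myLength :: [a] -> Int
myLength = foldr (\_ n -> 1 + n) 0

myMap :: (a -> b) -> [a] -> [b]
myMap f = foldr (\x acc -> f x : acc) []
```

Working out one of these by hand, just like the expansions above, is a great way to decide which fold fits a given problem.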

### Optional

In case you are familiar with C, the functions may be rendered thus, for instance:

```
// The generic foldr function
void *foldr(void *(*fptr)(void*, void*), void *acc, void *arr, size_t n, size_t delta)
{
    if (!n) {
        return acc;
    } else {
        return fptr(arr, foldr(fptr, acc, arr+delta, n-1, delta));
    }
}

// The generic foldl function
void *foldl(void *(*fptr)(void*, void*), void *acc, void *arr, size_t n, size_t delta)
{
    if (!n) {
        return acc;
    } else {
        return foldl(fptr, fptr(acc, arr), arr+delta, n-1, delta);
    }
}
```

Note how they map to the Haskell definitions directly.

If you are interested, here is a full program that you can run to observe the behaviour and compare it with Haskell (note that this is only meant to simulate the Haskell behaviour as closely as possible – it is not the idiomatic way to do these operations in C, and it leaks memory as well, which we won’t bother about for this demo):

The program:

```
#include <stdio.h>
#include <stdlib.h>

void *add_int(void *x, void *y)
{
    int *res = malloc(sizeof(int));
    *res = *((int*) x) + *((int*) y);

    return (void*) res;
}

void *multiply_int(void *x, void *y)
{
    int *res = malloc(sizeof(int));
    *res = *((int*) x) * *((int*) y);

    return (void*) res;
}

void *subtract_int(void *x, void *y)
{
    int *res = malloc(sizeof(int));
    *res = *((int*) x) - *((int*) y);

    return (void*) res;
}

void *divide_double(void *x, void *y)
{
    double *res = malloc(sizeof(double));
    *res = *((double*) x) / *((double*) y);

    return (void*) res;
}

// The generic foldr function
void *foldr(void *(*fptr)(void*, void*), void *acc, void *arr, size_t n, size_t delta)
{
    if (!n) {
        return acc;
    } else {
        return fptr(arr, foldr(fptr, acc, arr+delta, n-1, delta));
    }
}

// The generic foldl function
void *foldl(void *(*fptr)(void*, void*), void *acc, void *arr, size_t n, size_t delta)
{
    if (!n) {
        return acc;
    } else {
        return foldl(fptr, fptr(acc, arr), arr+delta, n-1, delta);
    }
}

int sum_foldr(int a[], size_t n)
{
    int sum = 0, acc = 0;
    void *ret = foldr(add_int, &acc, a, n, sizeof(int));
    if (ret) {
        sum = *((int*) ret);
        free(ret);
    }

    return sum;
}

int product_foldr(int a[], size_t n)
{
    int prod = 1, acc = 1;
    void *ret = foldr(multiply_int, &acc, a, n, sizeof(int));
    if (ret) {
        prod = *((int*) ret);
        free(ret);
    }

    return prod;
}

int sum_foldl(int a[], size_t n)
{
    int sum = 0, acc = 0;
    void *ret = foldl(add_int, &acc, a, n, sizeof(int));
    if (ret) {
        sum = *((int*) ret);
        free(ret);
    }

    return sum;
}

int product_foldl(int a[], size_t n)
{
    int prod = 1, acc = 1;
    void *ret = foldl(multiply_int, &acc, a, n, sizeof(int));
    if (ret) {
        prod = *((int*) ret);
        free(ret);
    }

    return prod;
}

int sub_foldr(int a[], size_t n)
{
    int diff = 0, acc = 0;
    void *ret = foldr(subtract_int, &acc, a, n, sizeof(int));
    if (ret) {
        diff = *((int*) ret);
        free(ret);
    }

    return diff;
}

int sub_foldl(int a[], size_t n)
{
    int diff = 0, acc = 0;
    void *ret = foldl(subtract_int, &acc, a, n, sizeof(int));
    if (ret) {
        diff = *((int*) ret);
        free(ret);
    }

    return diff;
}

double div_foldr(double a[], size_t n)
{
    double div = 1.0, acc = 1.0;
    void *ret = foldr(divide_double, &acc, a, n, sizeof(double));
    if (ret) {
        div = *((double*) ret);
        free(ret);
    }

    return div;
}

double div_foldl(double a[], size_t n)
{
    double div = 1.0, acc = 1.0;
    void *ret = foldl(divide_double, &acc, a, n, sizeof(double));
    if (ret) {
        div = *((double*) ret);
        free(ret);
    }

    return div;
}

int main(int argc, char **argv)
{
    int a[] = { 1, 2, 3, 4, 5 };
    size_t n = sizeof(a)/sizeof(a[0]);

    double b[] = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    size_t m = sizeof(b)/sizeof(b[0]);

    printf("sum_foldr = %d\n", sum_foldr(a, n));
    printf("product_foldr = %d\n", product_foldr(a, n));

    printf("sum_foldl = %d\n", sum_foldl(a, n));
    printf("product_foldl = %d\n", product_foldl(a, n));

    printf("sub_foldr = %d\n", sub_foldr(a, n));
    printf("div_foldr = %lf\n", div_foldr(b, m));

    printf("sub_foldl = %d\n", sub_foldl(a, n));
    printf("div_foldl = %lf\n", div_foldl(b, m));

    return 0;
}
```

Running it:

```
$ gcc -Wall -O2 -o fold fold.c && ./fold
sum_foldr = 15
product_foldr = 120
sum_foldl = 15
product_foldl = 120
sub_foldr = 3
div_foldr = 1.875000
sub_foldl = -15
div_foldl = 0.008333
```

Exactly the same result as in the case of Haskell.

The basic idea is this – take a simple example, and work through it with pen and paper to really understand how it works!

# Implementing Monad for a custom List type in Haskell

As an extension to the previous blog, we finally reach the highest level of abstraction yet – the Monad. Let’s see how we can make our custom `List` type an instance of the `Monad` typeclass. Here is the definition of the `Monad` typeclass as given in the book:

```
class Applicative m => Monad m where
  return :: a -> m a

  (>>=) :: m a -> (a -> m b) -> m b

  return = pure
```

This means that in order to make our `List` type an instance of `Monad`, we will have to make it an instance of `Functor` as well as `Applicative`.

Here is a simple but comprehensive implementation:

```
data List a = Nil | Cons a (List a) deriving Show

append :: List a -> List a -> List a
append Nil ys = ys
append (Cons x xs) ys = Cons x (append xs ys)

instance Functor List where
  -- fmap :: (a -> b) -> List a -> List b
  fmap _ Nil = Nil
  fmap f (Cons x xs) = Cons (f x) (fmap f xs)

instance Applicative List where
  -- pure :: a -> List a
  pure x = Cons x Nil

  -- (<*>) :: List (a -> b) -> List a -> List b
  Nil <*> _ = Nil
  (Cons g gs) <*> xs = fmap g xs `append` (gs <*> xs)

instance Monad List where
  -- (>>=) :: List a -> (a -> List b) -> List b
  Nil >>= _ = Nil
  (Cons x xs) >>= f = (f x) `append` (xs >>= f)
```

We also define some test data to work with, and a function called `pairs` that simply returns the cross-product of the two input lists.

```
-- test data
l1 :: List Int
l1 = Cons 1 (Cons 2 (Cons 3 (Cons 4 (Cons 5 Nil))))

l2 :: List String
l2 = Cons "Hello" (Cons "World" (Cons "we" (Cons "meet" (Cons "again" Nil))))

pairs :: Monad m => m a -> m b -> m (a, b)
pairs xs ys = xs >>= \x ->
              ys >>= \y ->
              return (x, y)
```

Note that the `pairs` function could also have been written using the `do` notation as follows:

```
pairs :: Monad m => m a -> m b -> m (a, b)
pairs xs ys = do x <- xs
                 y <- ys
                 return (x, y)
```

This is identical to the previous version since the `do` notation is simply syntactic sugar for the bind operator.
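If you want to convince yourself that the two versions really do coincide, here is a self-contained sketch (the `toList` helper and the `pairsBind`/`pairsDo` names are mine, added only for comparison):

```haskell
data List a = Nil | Cons a (List a) deriving (Eq, Show)

append :: List a -> List a -> List a
append Nil ys = ys
append (Cons x xs) ys = Cons x (append xs ys)

instance Functor List where
  fmap _ Nil = Nil
  fmap f (Cons x xs) = Cons (f x) (fmap f xs)

instance Applicative List where
  pure x = Cons x Nil
  Nil <*> _ = Nil
  (Cons g gs) <*> xs = fmap g xs `append` (gs <*> xs)

instance Monad List where
  Nil >>= _ = Nil
  (Cons x xs) >>= f = f x `append` (xs >>= f)

-- explicit bind, exactly as in the first version of `pairs`
pairsBind :: Monad m => m a -> m b -> m (a, b)
pairsBind xs ys = xs >>= \x -> ys >>= \y -> return (x, y)

-- the same thing in do notation
pairsDo :: Monad m => m a -> m b -> m (a, b)
pairsDo xs ys = do
  x <- xs
  y <- ys
  return (x, y)

-- helper for easy comparison against built-in lists (not in the post)
toList :: List a -> [a]
toList Nil = []
toList (Cons x xs) = x : toList xs
```

Both produce the same cross-product, element for element, since `do` desugars into exactly those nested `>>=` calls.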

Sample test run:

```
*Main> :t l1
l1 :: List Int
*Main> l2
Cons "Hello" (Cons "World" (Cons "we" (Cons "meet" (Cons "again" Nil))))
*Main> pairs l1 l2
Cons (1,"Hello") (Cons (1,"World") (Cons (1,"we") (Cons (1,"meet") (Cons (1,"again") (Cons (2,"Hello") (Cons (2,"World") (Cons (2,"we") (Cons (2,"meet") (Cons (2,"again") (Cons (3,"Hello") (Cons (3,"World") (Cons (3,"we") (Cons (3,"meet") (Cons (3,"again") (Cons (4,"Hello") (Cons (4,"World") (Cons (4,"we") (Cons (4,"meet") (Cons (4,"again") (Cons (5,"Hello") (Cons (5,"World") (Cons (5,"we") (Cons (5,"meet") (Cons (5,"again") Nil))))))))))))))))))))))))
```

# Implementing Applicative for a custom List type in Haskell

Continuing with the Haskell theme, this is a small implementation of `Applicative` for a custom `List` type in Haskell.

Here is the general definition of `Applicative` as given in Professor Graham Hutton’s exemplary book, “Programming in Haskell” (2nd Edition) – one of the best books on Haskell:

```
class Functor f => Applicative f where
  pure :: a -> f a

  (<*>) :: f (a -> b) -> f a -> f b
```

This means that we have to make our custom `List` type into an instance of `Functor` before we can make it an instance of `Applicative`. Here is the definition of `Functor` from the same book:

```
class Functor f where
  fmap :: (a -> b) -> f a -> f b
```

Okay, enough preamble, let’s get started with it. First, we define our custom `List` type:

```
data List a = Nil | Cons a (List a) deriving (Eq, Ord, Show, Read)
```

Now we define a function called `append`. This is our analogue of the built-in `++` generic append operator in Haskell. This will be required when implementing the `Applicative` code:

```
-- append function for our custom List type
append :: List a -> List a -> List a
append Nil ys = ys
append (Cons x xs) ys = Cons x (append xs ys)
```

Almost there! Let’s make our `List` type a `Functor` type:

```
-- make it an instance of the Functor typeclass
instance Functor List where
  -- fmap :: (a -> b) -> List a -> List b
  fmap _ Nil = Nil
  fmap g (Cons x xs) = Cons (g x) (fmap g xs)
```

Simple enough – we simply implement the `fmap` function using Pattern Matching. Now, finally, let’s make our type an `Applicative Functor`!

```
-- now make the List type an instance of Applicative
instance Applicative List where
  -- pure :: a -> List a
  pure x = Cons x Nil

  -- (<*>) :: List (a -> b) -> List a -> List b
  Nil <*> _ = Nil
  (Cons g gs) <*> xs = append (fmap g xs) (gs <*> xs)
```

That’s it! We’re all done. Again, we use Pattern Matching in the implementation of `<*>`, the applicative operator. As promised earlier, we make use of both `fmap` and `append` in this definition.
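As a quick sanity check that this instance mirrors the behaviour of the built-in list applicative, we can convert between the two representations (the `toList`/`fromList` helpers are mine, not part of the post):

```haskell
data List a = Nil | Cons a (List a) deriving Show

append :: List a -> List a -> List a
append Nil ys = ys
append (Cons x xs) ys = Cons x (append xs ys)

instance Functor List where
  fmap _ Nil = Nil
  fmap g (Cons x xs) = Cons (g x) (fmap g xs)

instance Applicative List where
  pure x = Cons x Nil
  Nil <*> _ = Nil
  (Cons g gs) <*> xs = append (fmap g xs) (gs <*> xs)

-- conversion helpers, purely for comparison against built-in lists
toList :: List a -> [a]
toList Nil = []
toList (Cons x xs) = x : toList xs

fromList :: [a] -> List a
fromList = foldr Cons Nil
```

Applying a lifted function through our instance and through the built-in one should now give identical results.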

Finally, let’s define some test data:

```
-- test data
l1 :: List Int
l1 = Cons 1 (Cons 2 (Cons 2 Nil)) -- [1, 2, 2]

l2 :: List Int
l2 = Cons 10 (Cons 20 Nil) -- [10, 20]

l3 :: List Int
l3 = Nil -- []
```

Let’s try out a few tests to make sure that it works correctly!

```
*Main> l1
Cons 1 (Cons 2 (Cons 2 Nil))

*Main> l2
Cons 10 (Cons 20 Nil)

*Main> l3
Nil

*Main> pure (+1) <*> l1
Cons 2 (Cons 3 (Cons 3 Nil))

*Main> pure (+) <*> l1 <*> l2
Cons 11 (Cons 21 (Cons 12 (Cons 22 (Cons 12 (Cons 22 Nil)))))

*Main> pure (\x -> \y -> \z -> x * y * z) <*> l1 <*> l2 <*> l1
Cons 10 (Cons 20 (Cons 20 (Cons 20 (Cons 40 (Cons 40 (Cons 20 (Cons 40 (Cons 40 (Cons 40 (Cons 80 (Cons 80 (Cons 20 (Cons 40 (Cons 40 (Cons 40 (Cons 80 (Cons 80 Nil)))))))))))))))))

*Main> pure (\x -> \y -> \z -> x * y * z) <*> l1 <*> l2 <*> Nil
Nil

*Main> pure (^200) <*> l3
Nil

*Main> pure (\x y z w -> (x+y) * (z+w)) <*> l1 <*> l2 <*> l1 <*> l2
Cons 121 (Cons 231 (Cons 132 (Cons 242 (Cons 132 (Cons 242 (Cons 231 (Cons 441 (Cons 252 (Cons 462 (Cons 252 (Cons 462 (Cons 132 (Cons 252 (Cons 144 (Cons 264 (Cons 144 (Cons 264 (Cons 242 (Cons 462 (Cons 264 (Cons 484 (Cons 264 (Cons 484 (Cons 132 (Cons 252 (Cons 144 (Cons 264 (Cons 144 (Cons 264 (Cons 242 (Cons 462 (Cons 264 (Cons 484 (Cons 264 (Cons 484 Nil)))))))))))))))))))))))))))))))))))
```

Most satisfying indeed!

# A simple Guessing Game implementation in Haskell

Just a quick program since it’s been some time since I updated my blog!

```
module Main where

import System.Random

main :: IO ()
main = do putStrLn "Try to guess the secret word ([1-100])..."
          secret <- randomRIO (1, 100)
          play secret

play :: Int -> IO ()
play secret = do guesses <- playGame secret 0
                 putStrLn $ "You win! You took " ++ show guesses ++ " guesses!"

playGame :: Int -> Int -> IO Int
playGame secret guesses = do putStr "? "
                             input <- getLine
                             let guess = read input :: Int
                             if guess == secret then
                                 return (guesses + 1)
                             else if guess < secret then
                                 do putStrLn "Too small!"
                                    playGame secret (guesses + 1)
                             else do putStrLn "Too big!"
                                     playGame secret (guesses + 1)
```

The code is pretty straightforward – the only tricky part I faced was implementing the random number generation (quite a complicated topic in Haskell).
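One way to tame that complexity, at least while testing, is to draw from an explicit, seeded generator instead of the global one, which makes runs reproducible. A hedged sketch (the `secretFromSeed` name is mine; this uses the same `random` package as the program above):

```haskell
import System.Random  -- from the `random` package

-- Build a generator from a fixed seed and draw one number from it.
-- The same seed always yields the same "secret", which is handy for
-- testing the game logic deterministically.
secretFromSeed :: Int -> Int
secretFromSeed seed = fst (randomR (1, 100) (mkStdGen seed))
```

Swapping `randomRIO` for a draw like this is a common trick when you want the rest of the program to stay pure and testable.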

Running it,

```
$ ghc guessingGame.hs
[1 of 1] Compiling Main             ( guessingGame.hs, guessingGame.o )

$ ./guessingGame
Try to guess the secret word ([1-100])...
50
? Too small!
75
? Too big!
63
? Too big!
56
? Too small!
59
? Too small!
61
? You win! You took 6 guesses!
```

Exciting, eh? 🙂

# Changing the Java language using OpenJDK – a small experiment

Of late, I have been fascinated with languages, compilers, and specifically parsing and code generation. To that end, I started looking into how the Java language itself might be modified. It turns out that Java uses a handwritten lexer and parser with a very well-defined tree structure for the AST. That being said, good documentation is hard to come by.

I chanced upon this gem of a blog post – Making Java esoteric – which shows a simple example of how to add a `maybe` keyword to Java so that it may randomly execute statements and/or blocks.

I was inspired enough to try out my first hack on OpenJDK. To that end, I decided to implement a `when` statement that would essentially act as its Common Lisp equivalent.

My attempt at implementing the `when` keyword didn’t go down so well, since it appears that that word is already used in umpteen places in the JDK itself. All right, let’s just use the Russian equivalent, `kogda`, which means essentially the same thing.

My basic logic was to define the new keyword, `kogda` that would work with block statements or simple statements, and then offload the actual work to the `if` statement. So here were the code changes in order:

First off, the lexer (`/langtools/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/Tokens.java`):

```
public enum TokenKind implements Formattable, Filter<TokenKind> {
    ...
    KOGDA("kogda"),
    ...
}
```

And then the parser, `/langtools/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java`:

```
List<JCStatement> blockStatement() {
    ...
    case KOGDA:
    ...
    case ASSERT:
        return List.of(parseSimpleStatement());
    ...
}
```

and

```
public JCStatement parseSimpleStatement() {
    ...
    case KOGDA: {
        nextToken();
        JCExpression cond = parExpression();
        JCStatement body = parseStatementAsBlock();
        return F.at(pos).Kogda(cond, body);
    }
    ...
}
```

Nothing special here – we simply read in the next token, get the conditional part, and then get the body of the `kogda` statement.

And then finally, `langtools/src/jdk.compiler/share/classes/com/sun/tools/javac/tree/TreeMaker.java`:

```
public JCIf Kogda(JCExpression cond, JCStatement body) {
    JCIf tree = new JCIf(cond, body, null);
    tree.pos = pos;

    return tree;
}
```

This is the actual meat of the logic. In this case, I simply offload the work to the reliable `if` statement and insert `null` for the `else` part. That’s all there is to it!

Then we build the new version of the JDK:

```
Macushla:9dev z0ltan$ pwd
/Users/z0ltan/Projects/Resources/Languages/Java/9dev

Macushla:9dev z0ltan$ make clean && make images
<output elided>

Macushla:9dev z0ltan$ cd build/macosx-x86_64-normal-server-release/jdk/bin/

Macushla:bin z0ltan$ ./javac -version
javac 9-internal
```

Here’s our little test program to test out:

```
Macushla:bin z0ltan$ cat HelloKogda.java
public class HelloKogda {
    public static void main(String[] args) {
        kogda (1 < 2) {
            System.out.println("Yes, 1 < 2");
        }

        kogda (2 > 100) {
            System.out.println("2 > 100");
        }

        kogda (1 < 10 && 1 == 1) System.out.println("Booyah!");
    }
}
```

And finally, compiling and running the test code using our custom-built JDK:

```
Macushla:bin z0ltan$ ./javac HelloKogda.java
Macushla:bin z0ltan$ ./java -cp . HelloKogda
Yes, 1 < 2
Booyah!
```

Nice! There was just one thing that I was not completely happy with – `JShell`. In theory, it should be trivial to modify `JShell` to reflect these same changes, but I had immense trouble trying to get the `jdk.jshell` project to pick up the changes from my modified `jdk.compiler` project instead of the default JDK (which, of course, does not contain my changes). Maybe when I wrap my head around the internals of the whole ecosystem, I will post an update here. For now, this was a fun experiment!

# A parser using Regular Expressions?

The oft-quoted joke is that regular expressions are such a useful tool that they are inevitably abused beyond their abilities. Case in point – Dmitry Olshansky’s DConf 2017 talk, where he mentions the tonnes of questions on StackOverflow about how to write a full-fledged HTML parser using only regular expressions (not quite possible!).

That amused me to such an extent that I decided to experiment a bit and see if I could write a small parser using only a regex engine. Modern regex engines do not, of course, conform to strict regular languages (think backreferences, etc.), and one such engine (Java’s `java.util.regex` package) is powerful enough to be a perfect candidate for such wanton abuse!

All right, so what’s the scope here? For the purpose of this experiment, I will restrict the scope of the parser to the following:

1. Allow `List` initialisation of the form:
```
List<Double> specialNos = { 3.14519, 2.71828, 1.818181 }
```
2. Allow similar syntactic sugar for `Set`s:
```
Set<String> languages = { "Java", "Rust", "Haskell", "Common Lisp", "C++" }
```
3. A Rubyesque `Map` initialisation syntax:
```
Map<Integer, String> nos = { 1 => "One"; 2 => "Two"; 3 => "Three" }
```

So here’s how a sample file might look:

```
Macushla:basic-regex-parser z0ltan$ cat samples/com/z0ltan/regexparser/test/HelloParser.javax
package com.z0ltan.regexparser.test;

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HelloParser {
    public static void main(String[] args) {
        listDemo();

        setDemo();

        mapDemo();
    }

    private static void listDemo() {
        // normal list
        List<String> greetings = Arrays.asList("Hello", "Hallo", "Hola");
        System.out.printf("greetings = %s\n", greetings);

        // custom list syntax
        List<Integer> myNums =
            { 1, 3, 5, 7, 9, 11 }

        System.out.printf("myNums = %s\n", myNums);
    }

    private static void setDemo() {
        // normal set
        Set<Integer> evensBelow10 = new HashSet<>();

        System.out.printf("evensBelow10 = %s\n", evensBelow10);

        // custom set syntax
        Set<String> mammals = { "Cat", "Dog", "Lion", "Beaver", "Raccoon" }

        System.out.printf("mammals = %s\n", mammals);
    }

    private static void mapDemo() {
        // normal map
        Map<Integer, String> numberToString = new HashMap<>();
        numberToString.put(1, "One");
        numberToString.put(2, "Two");
        numberToString.put(3, "Three");

        System.out.printf("numberToString = %s\n", numberToString);

        // custom map syntax
        Map<Integer, Foo> numToFoo =
            {
                1 => new Foo(1, "Ein");

                2 => new Foo(2, "Zwei");

                3 => new Foo(3, "Drei");

                4 => new Foo(4, "Vier");

                5 => new Foo(5, "Fuenf");
            }

        System.out.printf("numToFoo = %s\n", numToFoo);
    }

    static class Foo {
        private int id;
        private String name;

        public Foo(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return "Foo { id = " + id + ", name = " + name + " }";
        }
    }
}
```

This sample input file includes examples of all three variants. So how would the parser look? Let’s split the parser into three parts as well:

The parser for `List`:

```
/**
 * Handle custom List syntax.
 *
 * @param contents
 * @return
 */
private static String parseListSyntax(String contents) {
    Matcher m = Patterns.LIST_PATTERN.matcher(contents);

    while (m.find()) {
        if (m.groupCount() == 3) {
            String type = m.group(1);
            String var = m.group(2);
            String val = m.group(3).trim();

            String replacement = "List<" + type + "> " + var + " = new ArrayList<" + type + ">() {{ ";
            String[] vals = val.split(",");

            for (String rep : vals) {
                String v = rep.trim();

                if (!v.isEmpty()) {
                    replacement += "add(" + v + ");\n\t\t\t\t\t ";
                }
            }
            replacement += "}};";

            contents = m.replaceFirst(replacement);

            m = Patterns.LIST_PATTERN.matcher(contents);
        }
    }
    return contents;
}
```

The parser for `Set`:

```
/**
 * Handle custom Set syntax
 *
 * @param contents
 * @return
 */
private static String parseSetSyntax(String contents) {
    Matcher m = Patterns.SET_PATTERN.matcher(contents);

    while (m.find()) {
        if (m.groupCount() == 3) {
            String type = m.group(1);
            String var = m.group(2);
            String val = m.group(3).trim();

            String replacement = "Set<" + type + "> " + var + " = new HashSet<" + type + ">() {{ ";
            String[] vals = val.split(",");

            for (String rep : vals) {
                String v = rep.trim();

                if (!v.isEmpty()) {
                    replacement += "add(" + v + ");\n\t\t\t\t\t";
                }
            }

            replacement += " }};";

            contents = m.replaceFirst(replacement);

            m = Patterns.SET_PATTERN.matcher(contents);
        }
    }

    return contents;
}
```

And finally, the parser for `Map`:

```
/**
 * Handle custom Map syntax
 *
 * @param contents
 * @return
 */
private static String parseMapSyntax(String contents) {
    Matcher m = Patterns.MAP_PATTERN.matcher(contents);

    while (m.find()) {
        if (m.groupCount() == 3) {
            String type = m.group(1);
            String var = m.group(2);
            String val = m.group(3).trim();

            String replacement = "Map<" + type + "> " + var + " = new HashMap<" + type + ">() {{";
            String[] vals = val.split(";");

            for (String rep : vals) {
                String v = rep.trim();

                if (!rep.isEmpty()) {
                    String[] keyVal = v.split("=>");

                    if (keyVal == null || keyVal.length != 2 || keyVal[0] == null || keyVal[1] == null) {
                        throw new RuntimeException("invalid map syntax");
                    }
                    replacement += "put(" + keyVal[0].trim() + ", " + keyVal[1].trim() + ");\n\t\t\t\t\t";
                }
            }

            replacement += " }};";

            contents = m.replaceFirst(replacement);

            m = Patterns.MAP_PATTERN.matcher(contents);
        }
    }

    return contents;
}
```

The code is quite straightforward and similar for all three variants – we simply check to see if the relevant pattern is found in the current code contents (the whole source file is read in one go), and if so, the matched bit is rewritten, and the updated `String` is then passed to the next stage of the pipeline for further processing.

One thing to observe here is that in every parsing method, the `find` operation is performed in a loop. This is so that every occurrence can be updated correctly, and that is also the reason why the `Matcher` is recreated at the end of every iteration of the loop – `m = Patterns.LIST_PATTERN.matcher(contents)`, for instance. The old `Matcher` still points at the stale, pre-replacement contents, so if this is not done, the result will be an infinite loop, since `while (m.find())` will never fail!
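The replace-and-rematch loop can be distilled into a tiny self-contained sketch (the `<word>` pattern and `RewriteLoop` class are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RewriteLoop {
    static final Pattern P = Pattern.compile("<(\\w+)>");

    static String rewrite(String contents) {
        Matcher m = P.matcher(contents);
        while (m.find()) {
            String replacement = "[" + m.group(1) + "]";
            contents = m.replaceFirst(replacement);
            // re-create the matcher on the updated string; the old matcher
            // still scans the stale input, so find() would match forever
            m = P.matcher(contents);
        }
        return contents;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("a <b> c <d>")); // a [b] c [d]
    }
}
```

Note that this only terminates because the replacement text no longer matches the pattern; if it did, the loop would never end.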

Here’s what the main parsing method looks like:

```public static void parse(final String fileName) {
String ext = fileName.substring(fileName.lastIndexOf('.') + 1);

if (!ext.equalsIgnoreCase("javax")) {
throw new RuntimeException("input files must have a .javax extension");
}

File srcFile = new File(fileName);

if (!srcFile.exists()) {
throw new RuntimeException("source file " + fileName + " does not exist!");
}

final String basePath = srcFile.getAbsolutePath().substring(0,
srcFile.getAbsolutePath().lastIndexOf(File.separator));
final String baseFileName = srcFile.getName().substring(0, srcFile.getName().lastIndexOf('.'));
final String outputFile = basePath + File.separator + baseFileName
+ ".java";

try (BufferedReader reader = new BufferedReader(new FileReader(srcFile))) {
String contents = reader.lines().reduce("", (acc, s) -> {
acc += s;
acc += "\n";
return acc;
});

contents = insertImports(contents);
contents = parseListSyntax(contents);
contents = parseSetSyntax(contents);
contents = parseMapSyntax(contents);

try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile))) {
writer.write(contents);
}
} catch (Exception ex) {
throw new RuntimeException("Error during parsing: " + ex.getLocalizedMessage());
}
}
```

A final subtlety is in the way the `import`s are handled. For the full source code, as well as more explanation of how the whole tool works, visit the Github page.

Let’s take it for a spin:

```Macushla:basic-regex-parser z0ltan\$ ls

Macushla:basic-regex-parser z0ltan\$ mvn package 1> /dev/null

Macushla:basic-regex-parser z0ltan\$ java -jar target/basic-regex-parser-1.0-SNAPSHOT.jar samples/Default.javax samples/com/z0ltan/test/Package.javax samples/com/z0ltan/regexparser/test/HelloParser.javax
```

Let’s see how the sample file shown earlier has been rewritten:

```Macushla:basic-regex-parser z0ltan\$ cat samples/com/z0ltan/regexparser/test/HelloParser.java
package com.z0ltan.regexparser.test;

import java.util.*;

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HelloParser {
public static void main(String[] args) {
listDemo();

setDemo();

mapDemo();
}

private static void listDemo() {
// normal list
List<String> greetings = Arrays.asList("Hello", "Hallo", "Hola");
System.out.printf("greetings = %s\n", greetings);

// custom list syntax
List<Integer> myNums = new ArrayList<Integer>() {{ add(1);
add(3);
add(5);
add(7);
add(9);
add(11);
}};

System.out.printf("myNums = %s\n", myNums);

}

private static void setDemo() {
// normal set
Set<Integer> evensBelow10 = new HashSet<>();
evensBelow10.add(2);
evensBelow10.add(4);
evensBelow10.add(6);
evensBelow10.add(8);

System.out.printf("evensBelow10 = %s\n", evensBelow10);

// custom set syntax
Set<String> mammals = new HashSet<String>() {{ add("Cat");
add("Lion");
add("Beaver");
add("Raccoon");
add("Dog");
}};

System.out.printf("mammals = %s\n", mammals);
}

private static void mapDemo() {
// normal map
Map<Integer, String> numberToString = new HashMap<>();
numberToString.put(1,  "One");
numberToString.put(2, "Two");
numberToString.put(3,  "Three");

System.out.printf("numberToString = %s\n",  numberToString);

// custom map syntax
Map<Integer, Foo> numToFoo = new HashMap<Integer, Foo>() {{put(1, new Foo(1, "Ein"));
put(2, new Foo(2, "Zwei"));
put(3, new Foo(3, "Drei"));
put(4, new Foo(4, "Vier"));
put(5, new Foo(5, "Fuenf"));
}};

System.out.printf("numToFoo = %s\n", numToFoo);
}

static class Foo {
private int id;
private String name;

public Foo(int id, String name) {
this.id = id;
this.name = name;
}

@Override
public String toString() {
return "Foo { id = " + id + ", name = " + name + " }";
}
}
}
```

Excellent! Now let’s try and run all the parsed files to make sure they are all working as expected:

```Macushla:basic-regex-parser z0ltan\$ cd samples/

Macushla:samples z0ltan\$ javac Default.java
Macushla:samples z0ltan\$ java -cp . Default
users = [Cornelius, Marcus, Linnaeus, Gaius, Augustus]

Macushla:samples z0ltan\$ javac com/z0ltan/test/Package.java
Macushla:samples z0ltan\$ java -cp . com.z0ltan.test.Package
map = {One=1, Two=2, Three=3}
normalMap = {Hello=100, Again=300, World=200}

Macushla:samples z0ltan\$ javac com/z0ltan/regexparser/test/HelloParser.java
Macushla:samples z0ltan\$ java -cp . com.z0ltan.regexparser.test.HelloParser
greetings = [Hello, Hallo, Hola]
myNums = [1, 3, 5, 7, 9, 11]
evensBelow10 = [2, 4, 6, 8]
mammals = [Cat, Lion, Beaver, Raccoon, Dog]
numberToString = {1=One, 2=Two, 3=Three}
numToFoo = {1=Foo { id = 1, name = Ein }, 2=Foo { id = 2, name = Zwei }, 3=Foo { id = 3, name = Drei }, 4=Foo { id = 4, name = Vier }, 5=Foo { id = 5, name = Fuenf }}
```

Nice! Of course, this is just a toy tool that does trivial rewriting. If Java had a proper macro system (or indeed any macro system to begin with!), all this would have been trivial, without so much redundant code. Nevertheless, it does illustrate that regular expression engines are indeed quite powerful, well beyond their original intent.

EDIT: Just in case you are curious as to what the patterns look like, here they are:

```public class Patterns {
public static final Pattern LIST_PATTERN = Pattern
.compile("List<([a-zA-Z]+)>\\s+([a-zA-Z0-9]+)\\s*=\\s*\\{\\s*([a-zA-Z0-9,\" ]+)\\s*\\}");

public static final Pattern SET_PATTERN = Pattern
.compile("Set<([a-zA-Z]+)>\\s+([a-zA-Z0-9]+)\\s*=\\s*\\{\\s*([a-zA-Z0-9,\" ]+)\\s*\\}");

public static final Pattern MAP_PATTERN = Pattern
.compile("Map<([a-zA-Z, ]+)>\\s+([a-zA-Z0-9]+)\\s*=\\s*\\{\\s*([a-zA-Z0-9\"\\s=>();, ]+)\\s*\\}");

public static final Pattern PACKAGE_PATTERN = Pattern
.compile("package\\s+([a-zA-Z0-9._;]+)");

public static final Pattern IMPORT_PATTERN = Pattern
.compile("import\\s+([a-zA-Z0-9.;]+)");

public static final Pattern PUBLIC_CLASS_PATTERN = Pattern
.compile("public class\\s+([a-zA-Z0-9_]+)");

public static final Pattern NORMAL_CLASS_PATTERN = Pattern
.compile("class\\s+([a-zA-Z0-9_]+)");
}
```
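As a quick sanity check, here is a hypothetical snippet (the class name and sample line are mine) that runs the same list pattern against a line of the custom syntax and prints the captured groups:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternDemo {
    // same LIST_PATTERN as in the Patterns class above
    static final Pattern LIST_PATTERN = Pattern
        .compile("List<([a-zA-Z]+)>\\s+([a-zA-Z0-9]+)\\s*=\\s*\\{\\s*([a-zA-Z0-9,\" ]+)\\s*\\}");

    public static void main(String[] args) {
        Matcher m = LIST_PATTERN.matcher("List<Integer> myNums = {1, 3, 5}");
        if (m.find()) {
            System.out.println(m.group(1)); // type: Integer
            System.out.println(m.group(2)); // variable: myNums
            System.out.println(m.group(3)); // values: 1, 3, 5
        }
    }
}
```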

You may safely ignore the patterns after the first three – they are used to insert the `import java.util.*;` declaration at the first valid location. Again, for more details, visit the Github location!

# Creating custom Java 8 Stream collectors

The streams feature (which makes heavy use of Functional Interfaces) is arguably the strongest feature in Java 8 (and above). The introduction of lambdas in Java in the same release to target Functional Interfaces also meant that creating chained operations in a functional style has never been easier in Java.

That being said, there are plenty of examples in the official docs that show how streams can be “collected” (effectively reduced/folded, depending on your preference of terminology) into Collection types – `List`, `Set`, or even `Map`. Now, I must make it absolutely clear at this stage that I love Java streams, and barring the absence (or rather, deprecation) of the `zip` iterator, the API is almost comprehensive. However, there are no examples in the docs that show how we might collect the results of a series of intermediate operations into a custom type. Of course, the helper class `Collectors` has several helper methods such as `groupingBy`, `partitioningBy`, `filtering`, and `reducing`, but they either return a `Map`, or expect a reducible expression, which may not always be the case, as explained next.
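To make the limitation concrete, here is a small sketch using one of the stock helpers – whatever we do, we end up with a `Map`, not a domain type (the class and method names are mine):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingDemo {
    // partitioningBy hands us back a Map keyed on the predicate result –
    // there is no stock collector that lands in a custom domain type instead
    static Map<Boolean, List<Integer>> byParity(List<Integer> nums) {
        return nums.stream().collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Integer>> parts = byParity(Arrays.asList(1, 2, 3, 4, 5, 6));
        System.out.println(parts.get(true));  // [2, 4, 6]
        System.out.println(parts.get(false)); // [1, 3, 5]
    }
}
```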

Recently, I did a project in which I needed to process a stream of integers (the lack of `zip` forced me to take quite the peripatetic approach to finally make things work – perhaps more on that in a later blogpost) acting as indices into a simple wrapper around a `List` of integers, and then ultimately collect the updated values of the list into a new instance of the custom type. It was quite an interesting experience that sparked an interest in exploring how much further we could push the `collect` mechanism. (If you are interested in checking out the code for the mentioned example, you can find it here – Functional Nim.)

For some more examples of custom `Collector` implementations, you can check out my Github page.

### Use Case

For the purposes of this blog, to keep things simple, let us consider a hypothetical example. Suppose we have a `Point` class with the following structure:

```package com.z0ltan.custom.collectors.types;

public class Point {
private int x;
private int y;

public Point(final int x, final int y) {
this.x = x;
this.y = y;
}

public int x() {
return this.x;
}

public int y() {
return this.y;
}

@Override
public int hashCode() {
return (this.x + this.y) % 31;
}

@Override
public boolean equals(Object other) {
if (other == null || !(other instanceof Point)) {
return false;
}

Point p = (Point) other;

return p.x == this.x && p.y == this.y;
}

@Override
public String toString() {
return "Point { x = " + x + ", y = " + y + " }";
}
}
```

and we have a `List` of such points. Now suppose we wish to collate the information into a custom `Points` object which has the following structure:

```package com.z0ltan.custom.collectors.types;

import java.util.Set;

public class Points {
private Set<Integer> xs;
private Set<Integer> ys;

public Points(final Set<Integer> xs, final Set<Integer> ys) {
this.xs = xs;
this.ys = ys;
}

@Override
public int hashCode() {
return (this.xs.hashCode() + this.ys.hashCode()) % 31;
}

@Override
public boolean equals(Object other) {
if (other == null || !(other instanceof Points)) {
return false;
}

Points p = (Points) other;

return p.xs.equals(this.xs) && p.ys.equals(this.ys);
}

@Override
public String toString() {
return "Points { xs = " + xs + ", ys = " + ys + " }";
}
}
```

As can be seen, we wish to retain only the unique `x` and `y` coordinate values into our final object.

A simple and logical way would be to `collect` the items from this stream (`collect` being a “terminal operation” defined in the `Stream` interface) into our `Points` object using a custom collector. Before we can do that, let us first understand what is involved in implementing a custom collector.

### How to implement a custom collector

In brief, there are two forms of operations involved in Java streams – non-terminal or intermediate operations, which produce streams of their own, and terminal operations, which effectively stop the pipeline resulting in a final result of some sort.

As mentioned before, in most cases, the helper class, `Collectors`, provides enough functionality to cater to almost any requirement imaginable. However, in cases such as these, where we want to collect data into a custom type, we might be better off defining our own custom collector.

To do that, let us examine the signature of the `collect` method in the `Stream` interface. In fact, we will find that there are two versions of this method available to us:

`<R, A> R collect(Collector<? super T, A, R> collector)`

and

```<R> R collect(Supplier<R> supplier,
BiConsumer<R, ? super T> accumulator,
BiConsumer<R, R> combiner)```
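As a quick illustration of the second form, here is a minimal sketch (the `squares` helper is hypothetical) that supplies the three pieces inline:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class InlineCollect {
    // the three-argument collect: supplier, accumulator, combiner
    static List<Integer> squares(Stream<Integer> s) {
        return s.collect(ArrayList::new,            // fresh container
                         (acc, x) -> acc.add(x * x), // fold one element in
                         ArrayList::addAll);         // merge partial results
    }

    public static void main(String[] args) {
        System.out.println(squares(Stream.of(1, 2, 3))); // [1, 4, 9]
    }
}
```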

So which one do we use? Well, we can actually use either one for our purposes. The former is preferable if we have some involved logic and wish to encapsulate all of it in a nice class; functionally speaking, though, the latter is much the same (minus the finisher and characteristics, as we shall see). This can be further clarified by examining the `Collector` interface (showing only the abstract methods):

```public interface Collector<T, A, R> {
Supplier<A> supplier();
BiConsumer<A, T> accumulator();
BinaryOperator<A> combiner();
Function<A, R> finisher();
Set<Collector.Characteristics> characteristics();
}
```

As can be seen, if we do implement the `Collector` interface, we will have to implement essentially the same operations as those supplied to the second version of the `collect` method. In addition, we have a couple of extra methods which are not only interesting, but quite vital if we wish to implement the interface:

• `finisher`: This is the main method that we need to implement as per our collection logic. This is where the accumulated values of the stream are massaged together into the final return value. The type parameters give a big hint in this regard – the return type `R` is the same as that returned by the overall `collect` method.
• `characteristics`: This is where we need to be careful. The `enum` has three variants – `CONCURRENT`, `IDENTITY_FINISH`, and `UNORDERED`. The bottom line is this – declare `UNORDERED` only if the final value does not depend on the ordering of the values in the stream, and declare `CONCURRENT` only if the accumulator can safely be invoked from multiple threads at once. In the case of collecting values into a custom non-collection type, I don’t see any scenario where you would want to use `IDENTITY_FINISH` (unless you are a big fan of unsolicited `ClassCastException`s).

In short, this variant indicates that the `finisher` function is essentially an identity function, meaning that it can be skipped, and the currently accumulated value returned as the overall result (which is precisely what we wish to avoid).
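For contrast, here is a sketch of a `toList`-style collector where `IDENTITY_FINISH` *is* appropriate, since the accumulation type and the result type coincide (the `toListLike` name is mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class IdentityFinishDemo {
    // A and R are both List<T>, so the finisher is the identity function
    // and IDENTITY_FINISH can safely be declared
    static <T> Collector<T, List<T>, List<T>> toListLike() {
        return Collector.of(
            ArrayList::new,
            List::add,
            (a, b) -> { a.addAll(b); return a; },
            Collector.Characteristics.IDENTITY_FINISH);
    }

    public static void main(String[] args) {
        System.out.println(Stream.of("a", "b").collect(toListLike())); // [a, b]
    }
}
```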

One final comment to understand the `collect` method once and for all – what all those terms mean!

• `Supplier`: This creates the (empty) mutable container into which the values of the stream are accumulated.
• `Accumulator`: This is where each element of the stream is combined with the running container (which may be of a different type from the elements themselves) – “reduced” or “folded”, in functional terms.
• `Combiner`: This merges two partially-filled containers of the same type into one; it only comes into play when the stream is processed in parallel. In most cases, this type would be a collection type, and finally,
• `Finisher`: This is the meat of the whole collector. This is where the actual custom logic goes to transform the value produced by the accumulator (and combiner) into the final result of the given return type.

Now that we’ve analysed the signature of the `collect` method, we are in a position to see that we can actually create custom collectors in multiple ways:

• Using the static `of` methods in the `Collector` interface by supplying the correct supplier, accumulator, combiner, finisher, and Collector characteristics,
• By creating a class that implements the `Collector` interface itself, and thereby providing implementations of the same supplier, accumulator, combiner, finisher, and Collector characteristics,
• By simply creating any anonymous class conforming to the `Collector` interface, and providing the same inputs as in the previous two cases, or
• Using any combination of the above.
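The first option – the static `Collector.of` factory – can be sketched as follows (the `bracketed` collector is a made-up example that joins integers into a bracketed string):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CollectorOfDemo {
    // a hypothetical collector assembled from its four parts
    static Collector<Integer, List<Integer>, String> bracketed() {
        return Collector.of(
            ArrayList::new,                       // supplier: fresh container
            List::add,                            // accumulator: fold element in
            (a, b) -> { a.addAll(b); return a; }, // combiner: merge partial lists
            List::toString                        // finisher: final result
        );
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3).collect(bracketed())); // [1, 2, 3]
    }
}
```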

To keep matters simple, let us create a custom class that implements the `Collector` interface. This will not only make things easier to understand, but also allow us to maintain code cleanliness.

Now let’s proceed with the implementation of the given use case to solidify these concepts.

### Implementation and Demo

Let’s create a simple Maven project called `custom-collector`:

```Macushla:Blog z0ltan\$ mvn archetype:generate -DgroupId=com.z0ltan.custom.collectors -DartifactId=custom-collector -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------

<elided>

[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 30.967 s
[INFO] Finished at: 2017-07-11T20:57:45+05:30
[INFO] Final Memory: 18M/62M
[INFO] ------------------------------------------------------------------------

```

After customising the project to our heart’s desire, let’s fill in our custom collector class:

```package com.z0ltan.custom.collectors.collectors;

import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

import com.z0ltan.custom.collectors.types.Point;
import com.z0ltan.custom.collectors.types.Points;

public class PointToPointsCollector implements Collector<Point, List<Point>, Points> {
@Override
public Supplier<List<Point>> supplier() {
return ArrayList::new;
}

@Override
public BiConsumer<List<Point>, Point> accumulator() {
return List::add;
}

@Override
public BinaryOperator<List<Point>> combiner() {
return (acc, ps) -> {
acc.addAll(ps);
return acc;
};
}

@Override
public Function<List<Point>, Points> finisher() {
return (points) -> {
final Set<Integer> xs = new HashSet<>();
final Set<Integer> ys = new HashSet<>();

for (Point p : points) {
xs.add(p.x());
ys.add(p.y());
}

return new Points(xs, ys);
};
}

@Override
public Set<java.util.stream.Collector.Characteristics> characteristics() {
return Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.UNORDERED));
}
}
```

We use `ArrayList::new` (method references are another excellent feature in Java 8 and beyond) for our `Supplier` since we start off with a blank slate, and for the `Accumulator`, we use `List::add` since the last section made it clear that the accumulator’s only job is to keep folding items into a running value of another type (a `List` in this case).

Then we have the `Combiner` which is implemented by the little lambda expression:

```   (acc, ps) -> { acc.addAll(ps); return acc; }
```

As mentioned in the previous section, the combiner simply flattens the collections together into a single collection. In case of confusion, always look to the type signature for clarity.

And finally, we have the `Finisher`:

```return (points) -> {
final Set<Integer> xs = new HashSet<>();
final Set<Integer> ys = new HashSet<>();

for (Point p : points) {
xs.add(p.x());
ys.add(p.y());
}

return new Points(xs, ys);
};
```

At this point of the stream pipeline, the `points` variable holds the list of accumulated `Point` objects. All we do then is create an instance of the `Points` class using the data available in the `points` variable. The whole point (if you will forgive the pun) is that this method will contain logic peculiar to your specific use case, so the implementation will vary tremendously (which is more than can be said of the others – supplier, accumulator, and combiner).

And finally, here is our main class:

```package com.z0ltan.custom.collectors;

import java.util.Arrays;
import java.util.List;

import com.z0ltan.custom.collectors.collectors.PointToPointsCollector;
import com.z0ltan.custom.collectors.types.Point;
import com.z0ltan.custom.collectors.types.Points;

public class Main {
public static void main(String[] args) {
final List<Point> points = Arrays.asList(new Point(1, 2), new Point(1, 2), new Point(3, 4), new Point(4, 3),
new Point(2, 5), new Point(2, 5));

// the result of our custom collector
final Points pointsData = points.stream().collect(new PointToPointsCollector());

System.out.printf("\npoints = %s\n", points);
System.out.printf("\npoints data = %s\n", pointsData);
}
}
```

Well, let’s run it and see the output!

```Macushla:custom-collector z0ltan\$ mvn package && java -jar target/custom-collector-1.0-SNAPSHOT.jar
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building custom-collector 1.0-SNAPSHOT
<elided>
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.627 s
[INFO] Finished at: 2017-07-11T21:31:19+05:30
[INFO] Final Memory: 16M/55M
[INFO] ------------------------------------------------------------------------

points = [Point { x = 1, y = 2 }, Point { x = 1, y = 2 }, Point { x = 3, y = 4 }, Point { x = 4, y = 3 }, Point { x = 2, y = 5 }, Point { x = 2, y = 5 }]

points data = Points { xs = [1, 2, 3, 4], ys = [2, 3, 4, 5] }

```

Success!

# Implementing Nat in Rust (a la Haskell)

The power of Algebraic Data Types comes to the fore when dealing with complex problem domains. Most modern languages tend towards supporting some form of ADTs, but some complexity in expressing them remains when compared to languages that were designed with such support in mind.

Case in point – the prototypical example of an algebraic data type, the natural number type (`Nat`). Let’s compare naive implementations in two languages with varying support for ADTs – Haskell and Rust.

First, the Haskell version. Haskell, just like ML and other members of this family, is eminently suited for describing such types almost effortlessly:

```module NatDemo where

import Test.HUnit

data Nat = Zero | Succ Nat deriving (Show, Eq)

nat2int :: Nat -> Int
nat2int Zero = 0
nat2int (Succ n) = 1 + nat2int n

int2nat :: Int -> Nat
int2nat 0 = Zero
int2nat n = Succ (int2nat (n-1))

add :: Nat -> Nat -> Nat
add Zero n = n
add (Succ m) n = Succ (add m n)

-- test cases
test1 = TestCase (assertEqual "nat to int" (nat2int (Succ (Succ Zero))) 2)
test2 = TestCase (assertEqual "int to nat" (int2nat 5) (Succ (Succ (Succ (Succ (Succ Zero))))))
test3 = TestCase (assertEqual "add two nats" (add (int2nat 5) (int2nat 6)) (int2nat 11))

tests = TestList [test1, test2, test3]

run = runTestTT tests
```

Running it to confirm:

```
Macushla:Nat-in-Rust z0ltan\$ ghci
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help

Prelude> :l Nat.hs
[1 of 1] Compiling NatDemo          ( Nat.hs, interpreted )

*NatDemo> run
Cases: 3  Tried: 3  Errors: 0  Failures: 0
Counts {cases = 3, tried = 3, errors = 0, failures = 0}
```

As can be seen, it’s almost trivial to define the `Nat` type in Haskell, as well as write functions that operate on this type. It reads almost exactly like the mathematical version of the concept. Simply beautiful.

Now let’s show the Rust version first, and then perhaps a comment or two on the implementation.

```#[derive(Debug, PartialEq)]
enum Nat {
Zero,
Succ(Box<Nat>),
}

/// open it up for code readability
use Nat::{Zero, Succ};

fn nat_to_int(n: Nat) -> i32 {
match n {
Zero => 0,
Succ(n) => 1 + nat_to_int(*n),
}
}

fn int_to_nat(n: i32) -> Nat {
match n {
0 => Zero,
n => Succ(Box::new(int_to_nat(n-1))),
}
}

fn add(m: Nat, n: Nat) -> Nat {
match (m, n) {
(Zero, n) => n,
(Succ(m), n) => Succ(Box::new(add(*m, n))),
}
}

fn main() {
assert_eq!(nat_to_int(Succ(Box::new(Succ(Box::new(Succ(Box::new(Zero))))))), 3);
assert_eq!(int_to_nat(5), Succ(Box::new(Succ(Box::new(Succ(Box::new(Succ(Box::new(Succ(Box::new(Zero)))))))))));
assert_eq!(nat_to_int(add(int_to_nat(5), int_to_nat(6))), 11);
}
```

Let’s make sure that it’s all in working condition:

```Macushla:Nat-in-Rust z0ltan\$ rustc nat.rs && ./nat
Macushla:Nat-in-Rust z0ltan\$
```

Excellent! It all just works. However, a couple of points to note – first of all, because a Rust enum is laid out inline, a directly recursive variant would have infinite size, so we need to wrap the recursive part in a `Box` (amongst other choices) so that the compiler knows the size of the data (a `Box`, being a pointer, has a fixed size determinable at compile time). Of course, the test code is positively ghastly because we have to wrap all calls in `Box::new`, not to mention remembering to dereference the boxed value inside the functions.

Secondly, the code is surprisingly clean for the most part. However, writing it is not nearly as intuitive as writing the same in Haskell (or ML, for that matter). I can only imagine how much more daunting the task would be in a language without pattern matching (amongst other higher-level features), such as Java. Of course we could make it work, but the distance between the intent of the code and the code itself would be much, much greater. Syntax does matter, folks!
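To make that comparison concrete, here is a rough, hypothetical Java encoding of `Nat` (all names mine) – with no pattern matching, the intent hides behind `instanceof` checks and casts:

```java
// hypothetical Java encoding of Nat = Zero | Succ Nat
interface Nat {}

final class Zero implements Nat {}

final class Succ implements Nat {
    final Nat pred;

    Succ(Nat pred) {
        this.pred = pred;
    }
}

public class NatDemo {
    // instead of matching on constructors, we test and cast
    static int natToInt(Nat n) {
        int count = 0;
        while (n instanceof Succ) {
            n = ((Succ) n).pred;
            count++;
        }
        return count;
    }

    static Nat intToNat(int n) {
        Nat result = new Zero();
        for (int i = 0; i < n; i++) {
            result = new Succ(result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(natToInt(intToNat(5))); // 5
    }
}
```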

Well, that was a nice little experiment of an evening.

# Compile time evaluation in Rust macros… or not (contrast with Common Lisp)

Rust macros are a great help in reducing boilerplate as well as creating tools to perform advanced code manipulation at compile time (the `nom` crate comes to mind). However, I did run into its limitations (again) when I started tinkering with a small idea.

Well, the idea I had is quite simple – create a macro that reverses the words of a string, but defers checking of types to runtime (so that correct invocations would still produce output). However, this turned out to be not so simple, again because Rust macros apparently do not provide a way to perform compile-time checks, or to `eval` code during macroexpansion (as Lisp and Scheme/Racket macros do).

Here was my first shot at creating that simple macro in Rust:

```use std::any::Any;
use std::io::{self, Write};

fn is_string(s: &Any) -> bool {
s.is::<String>()
}

macro_rules! reverse_string {
(\$string: expr) => {{
if !is_string(\$string) {
writeln!(io::stderr(), "{:?} must be a String", \$string).unwrap();
std::process::exit(1);
}

let mut rev_string = String::new();

for word in \$string.split_whitespace().rev() {
rev_string.push_str(word);
rev_string.push(' ');
}
rev_string
}};
}

fn main() {
let my_string = "Hello, world. How do you do?".to_string();
println!("{}", reverse_string!(&my_string));
}
```

This works as expected, of course:

```Macushla:Type_Checking_Macro_Rust z0ltan\$ rustc reverse.rs && ./reverse
do? you do How world. Hello,
```

Now, suppose I added a call to the same macro, `reverse_string` on an integer instead of a string, then the results are not quite what I wanted:

```fn main() {
let my_string = "Hello, world. How do you do?".to_string();
println!("{}", reverse_string!(&my_string));

let my_num = 99;
println!("{}", reverse_string!(&my_num));
}
```

Running which gives:

```Macushla:Type_Checking_Macro_Rust z0ltan\$ rustc reverse.rs && ./reverse
error[E0599]: no method named `split_whitespace` found for type `&{integer}` in the current scope
--> reverse.rs:17:37
|
17 |                 for word in \$string.split_whitespace().rev() {
|                                     ^^^^^^^^^^^^^^^^
...
31 |     println!("{}", reverse_string!(&my_num));
|                    ------------------------ in this macro invocation

error: aborting due to previous error(s)
```

And what I really wanted was to see output for the string argument, and then the nice error message that I generate inside the macro. So what’s going on? Why can’t I generate the template, so to speak, of the macroexpansion and defer the actual error checking to runtime? Let’s expand the macro and take a peek:

```Macushla:Type_Checking_Macro_Rust z0ltan\$ rustc -Z unstable-options --pretty expanded reverse.rs
```

This produces a ton of output, but the relevant part of the expanded macro is here:

``` let my_num = 99;
::io::_print(::std::fmt::Arguments::new_v1(
{
static __STATIC_FMTSTR: &'static [&'static str] = &["", "\n"];
__STATIC_FMTSTR
},
&match (&{
if !is_string(&my_num) {
io::stderr()
.write_fmt(::std::fmt::Arguments::new_v1(
{
static __STATIC_FMTSTR: &'static [&'static str] =
&["", " must be a String\n"];
__STATIC_FMTSTR
},
&match (&&my_num,) {
(__arg0,) => {
[::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Debug::fmt)]
}
},
))
.unwrap();
std::process::exit(1);
}
let mut rev_string = String::new();
for word in &my_num.split_whitespace().rev() {
rev_string.push_str(word);
rev_string.push(' ');
}
rev_string
},) {
(__arg0,) => {
[
::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Display::fmt),
]
}
},
));
```

These, of course, correspond to the fully expanded form of the following two lines:

```    let my_num = 99;
println!("{}", reverse_string!(&my_num));
```

Now we begin to see why the code doesn’t work as expected. Macroexpansion happens as part of the overall compilation phase. During this time, the Rust type checker is still very much active (so we cannot inject arbitrary code that doesn’t satisfy the type checker or the borrow checker), and Rust doesn’t really have a way to “escape” this and defer the actual checking till runtime. This is as much due to the type checker as to the fact that the Rust macro system does not provide such means (as Lisp or Scheme/Racket macros do).

So, in this case, the Type Checker sees this snippet: `for word in &my_num.split_whitespace().rev()`, realises that we are trying to call `split_whitespace` on an `i32` variable, and immediately stops with a compilation error.

The other part (though not directly relevant here) is that all the defensive error checks using `if !is_string(...)` wouldn’t really work even if we were to try to check that at compile time, since Rust macros do not have, as far as I know, any way of doing compile-time conditional checking.

So, at this point I just stopped with the Rust version. Now, let’s try and implement the same macro using Common Lisp:

```(defmacro reverse-string (x)
"reverse the words of the string, error checking done at runtime"
`(if (not (stringp ,x))
(error "~a must be a string, not a ~a~%" ,x (type-of ,x))
,(let ((collect (gensym))
(lst (gensym))
(f (gensym))
(s (gensym)))
`(labels ((,collect (,lst)
(reduce (lambda (,f ,s)
(concatenate 'string ,f " " ,s)) ,lst)))
(,collect (reverse (loop for i = 0 then (1+ j)
as j = (position #\Space ,x :start i)
collect (subseq ,x i j)
while j)))))))
(defun main ()
(let ((s "Hello world")
(d 99))
(format t "~a reversed is ~a~%" s (reverse-string s))
(format t "~a reversed is ~a~%" d (reverse-string d))))

(defun view-macro-expansion (form)
"helper function to display the macro-expanded form for the
supplied form"
(macroexpand form))
```

The only point of interest is the `reverse-string` macro. It’s pretty much the same logic as in the attempted Rust macro – create a template that checks, at runtime, whether the supplied argument is a string, and if not, generate a proper error message. If indeed the argument is a string, then reverse the words of the original string – this is the bit being done inside the `loop` macro.

The interesting bit is that the Lisp implementation I use – SBCL – does do rigorous compile-time analysis, and actually gives plenty of notice that it is deleting redundant code (corresponding to the call in `main`, `(format t "~a reversed is ~a~%" d (reverse-string d))`, which the compiler realises will never actually be executed to completion). However, the expanded macro has the relevant checks, and the call itself is preserved, so the macro behaves exactly as desired:

```CL-USER> (main)
Hello world reversed is world Hello
```

and, in the Lisp debugger,

```99 must be a string, not a (INTEGER 0 4611686018427387903)
[Condition of type SIMPLE-ERROR]

Restarts:
0: [RETRY] Retry SLIME REPL evaluation request.
2: [REMOVE-FD-HANDLER] Remove #<SB-IMPL::HANDLER INPUT on descriptor 14: #<CLOSURE (LABELS SWANK/SBCL::RUN :IN SWANK/BACKEND:ADD-FD-HANDLER) {1002F80ADB}>>
3: [ABORT] Exit debugger, returning to top level.

Backtrace:
0: (MAIN)
1: (SB-INT:SIMPLE-EVAL-IN-LEXENV (MAIN) #<NULL-LEXENV>)
2: (EVAL (MAIN))
```

Excellent! And here is what the expanded form of the macro call actually looks like:

```CL-USER> (view-macro-expansion '(reverse-string 99))
(IF (NOT (STRINGP 99))
(ERROR "~a must be a string, not a ~a~%" 99 (TYPE-OF 99))
(LABELS ((#:G602 (#:G603)
(REDUCE
(LAMBDA (#:G604 #:G605)
(CONCATENATE 'STRING #:G604 " " #:G605))
#:G603)))
(#:G602
(REVERSE
(LOOP FOR I = 0 THEN (1+ J) AS J = (POSITION #\  99 :START I)
COLLECT (SUBSEQ 99 I J)
WHILE J)))))
T
```

Of course, I am being a bit unduly harsh on Rust here because Common Lisp, despite all its vendor-specific quirks, is still very much a dynamic language, so we reasonably expect it to defer most type checking to runtime. Rust, on the other hand, is a strongly-typed static language, so it can ill afford to leave much checking to runtime, especially since it has only a minimal runtime with which to carry out such checks. In any case, an interesting little experiment.