Changing the Java language using OpenJDK – a small experiment

Of late, I have been fascinated with languages, compilers, and specifically parsing and code generation. To that end, I started looking into how the Java language itself might be modified. It turns out that Java uses a handwritten lexer and parser with a very well-defined tree structure for the AST. That being said, good documentation is hard to come by.

I chanced upon this gem of a blog – Making Java esoteric – which shows a simple example of how to add a maybe keyword to Java so that it may randomly execute statements and/or blocks.

I was inspired enough to try out my first hack on OpenJDK. To that end, I decided to implement a when statement that would essentially act as its Common Lisp equivalent.

My attempt at implementing the when keyword didn’t go down so well, since it appears that the word has been used in umpteen places in the JDK itself. All right, let’s just use the Russian equivalent, kogda, which means essentially the same thing.

My basic logic was to define the new keyword, kogda that would work with block statements or simple statements, and then offload the actual work to the if statement. So here were the code changes in order:

First off, the lexer (/langtools/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/Tokens.java):

 public enum TokenKind implements Formattable, Filter<TokenKind> {
    ...
    KOGDA("kogda"),
    ...
}

And then the parser (/langtools/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java):

 List<JCStatement> blockStatement() {
    ...
    case KOGDA:
    ...
    case ASSERT:
            return List.of(parseSimpleStatement());
    ...
}

and

 public JCStatement parseSimpleStatement() {
   ...
     case KOGDA: {
                nextToken();
                JCExpression cond = parExpression();
                JCStatement body = parseStatementAsBlock();
                return F.at(pos).Kogda(cond, body);
            }
   ...
}

Nothing special here – we simply read in the next token, get the conditional part, and then get the body of the kogda statement.

And then finally, langtools/src/jdk.compiler/share/classes/com/sun/tools/javac/tree/TreeMaker.java:

public JCIf Kogda(JCExpression cond, JCStatement body) {
        JCIf tree = new JCIf(cond, body, null);
        tree.pos = pos;

        return tree;
    }

This is the actual meat of the logic. In this case, I simply offload the work to the reliable if statement and insert null for the else part. That’s all there is to it!
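In other words, kogda (cond) body compiles to exactly what if (cond) body would. The sketch below (a plain-Java class of my own, not part of the JDK changes) shows the desugared form of the test program used later in this post:

```java
// Hypothetical plain-Java equivalent of the HelloKogda test program:
// each `kogda (cond) body` is compiled as `if (cond) body` with a null else part.
public class KogdaDesugared {
    public static void main(String[] args) {
        if (1 < 2) {
            System.out.println("Yes, 1 < 2");
        }

        if (2 > 100) {
            System.out.println("2 > 100");
        }

        if (1 < 10 && 1 == 1) System.out.println("Booyah!");
    }
}
```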

Then we build the new version of the JDK:

Macushla:9dev z0ltan$ pwd
/Users/z0ltan/Projects/Resources/Languages/Java/9dev

Macushla:9dev z0ltan$ make clean && make images
<output elided>

Macushla:9dev z0ltan$ cd build/macosx-x86_64-normal-server-release/jdk/bin/

Macushla:bin z0ltan$ ./javac -version
javac 9-internal

Here’s a little test program to try it out:

Macushla:bin z0ltan$ cat HelloKogda.java
public class HelloKogda {
    public static void main(String[] args) {
        kogda (1 < 2) {
            System.out.println("Yes, 1 < 2");
        }

        kogda (2 > 100) {
            System.out.println("2 > 100");
        }

        kogda (1 < 10 && 1 == 1) System.out.println("Booyah!");
    }
}

And finally, compiling and running the test code using our custom-built JDK:

Macushla:bin z0ltan$ ./javac HelloKogda.java
Macushla:bin z0ltan$ ./java -cp . HelloKogda
Yes, 1 < 2
Booyah!

Nice! There was just one thing that I was not completely happy with – JShell. In theory, it should be trivial to modify JShell to reflect these same changes, but I had immense trouble trying to get the jdk.jshell project to pick up the changes from my modified jdk.compiler project instead of the default JDK (which, of course, does not contain my changes). Maybe when I wrap my head around the internals of the whole ecosystem, I will post an update here. For now, this was a fun experiment!


A parser using Regular Expressions?

The oft-quoted joke is that Regular Expressions are such a useful tool that they are inevitably abused beyond their abilities. Case in point – Dmitry Olshansky’s DConf 2017 talk, where he mentions the tonnes of questions on StackOverflow about how to write a full-fledged HTML parser using only regular expressions (not quite possible!).

That amused me to such an extent that I decided to experiment a bit and see if I could use a regular expression engine to write a parser. Modern regex engines do not, of course, conform to strictly regular languages (think backreferences, etc.), and one such engine (Java’s java.util.regex package) is powerful enough to be a perfect candidate for such wanton abuse!
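Backreferences alone already take a regex engine beyond regular languages. The small demo below (class name is mine) recognises a doubled word – a classic non-regular language that no strictly regular expression can express:

```java
import java.util.regex.Pattern;

public class BackrefDemo {
    public static void main(String[] args) {
        // (\w+) captures a word; \1 demands the very same word again
        Pattern doubled = Pattern.compile("\\b(\\w+) \\1\\b");

        System.out.println(doubled.matcher("hello hello").find()); // true
        System.out.println(doubled.matcher("hello world").find()); // false
    }
}
```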

All right, so what’s the scope here? For the purpose of this experiment, I will restrict the scope of the parser to the following:

  1. Allow List initialisation of the form
              List<Double> specialNos = { 3.14519, 2.71828, 1.818181 }
             
  2. Allow similar syntactic sugar for Sets:
              Set<String> languages = { "Java", "Rust", "Haskell", "Common Lisp", "C++" }
             
  3. A Rubyesque Map initialisation syntax:
              Map<Integer, String> nos = { 1 => "One"; 2 => "Two"; 3 => "Three" }
              

So here’s how a sample file might look:

Macushla:basic-regex-parser z0ltan$ cat samples/com/z0ltan/regexparser/test/HelloParser.javax
package com.z0ltan.regexparser.test;

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HelloParser {
    public static void main(String[] args) {
        listDemo();

        setDemo();

        mapDemo();
    }

    private static void listDemo() {
        // normal list
        List<String> greetings = Arrays.asList("Hello", "Hallo", "Hola");
        System.out.printf("greetings = %s\n", greetings);

        // custom list syntax
        List<Integer> myNums =
                    { 1, 3, 5, 7, 9, 11 }

        System.out.printf("myNums = %s\n", myNums);

    }

    private static void setDemo() {
        // normal set
        Set<Integer> evensBelow10 = new HashSet<>();
        evensBelow10.addAll(Arrays.asList(2, 4, 6, 8));

        System.out.printf("evensBelow10 = %s\n", evensBelow10);

        // custom set syntax
        Set<String> mammals = { "Cat", "Dog", "Lion", "Beaver", "Raccoon" }

        System.out.printf("mammals = %s\n", mammals);
    }

    private static void mapDemo() {
        // normal map
        Map<Integer, String> numberToString = new HashMap<>();
        numberToString.put(1,  "One");
        numberToString.put(2, "Two");
        numberToString.put(3,  "Three");

        System.out.printf("numberToString = %s\n",  numberToString);

        // custom map syntax
        Map<Integer, Foo> numToFoo =
                {
                    1 => new Foo(1, "Ein");

                    2 => new Foo(2, "Zwei");

                    3 => new Foo(3, "Drei");

                    4 => new Foo(4, "Vier");

                    5 => new Foo(5, "Fuenf");
                }

        System.out.printf("numToFoo = %s\n", numToFoo);
    }

    static class Foo {
        private int id;
        private String name;

        public Foo(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return "Foo { id = " + id + ", name = " + name + " }";
        }
    }
}

This sample input file includes examples for all three variants. So what would the parser look like? Let’s split it into three variants as well:

The parser for List:

/**
	 * Handle custom List syntax.
	 * 
	 * @param contents
	 * @return
	 */
	private static String parseListSyntax(String contents) {
		Matcher m = Patterns.LIST_PATTERN.matcher(contents);

		while (m.find()) {
			if (m.groupCount() == 3) {
				String type = m.group(1);
				String var = m.group(2);
				String val = m.group(3).trim();

				String replacement = "List<" + type + ">" + var + " = new ArrayList<" + type + ">() {{ ";
				String[] vals = val.split(",");

				for (String rep : vals) {
					String v = rep.trim();

					if (!v.isEmpty()) {
						replacement += "add(" + v + ");\n\t\t\t\t\t ";
					}
				}
				replacement += "}};";

				contents = m.replaceFirst(replacement);

				m = Patterns.LIST_PATTERN.matcher(contents);
			}
		}
		return contents;
	}

The parser for Set:

/**
	 * Handle custom Set syntax
	 * 
	 * @param contents
	 * @return
	 */
	private static String parseSetSyntax(String contents) {
		Matcher m = Patterns.SET_PATTERN.matcher(contents);

		while (m.find()) {
			if (m.groupCount() == 3) {
				String type = m.group(1);
				String var = m.group(2);
				String val = m.group(3).trim();

				String replacement = "Set<" + type + "> " + var + " = new HashSet<" + type + ">() {{ ";
				String[] vals = val.split(",");

				for (String rep : vals) {
					String v = rep.trim();

					if (!v.isEmpty()) {
						replacement += "add(" + v + ");\n\t\t\t\t\t";
					}
				}

				replacement += " }};";

				contents = m.replaceFirst(replacement);

				m = Patterns.SET_PATTERN.matcher(contents);
			}
		}

		return contents;
	}

And finally, the parser for Map:

/**
	 * Handle custom Map syntax
	 * 
	 * @param contents
	 * @return
	 */
	private static String parseMapSyntax(String contents) {
		Matcher m = Patterns.MAP_PATTERN.matcher(contents);

		while (m.find()) {
			if (m.groupCount() == 3) {
				String type = m.group(1);
				String var = m.group(2);
				String val = m.group(3).trim();

				String replacement = "Map<" + type + "> " + var + " = new HashMap<" + type + ">() {{";
				String[] vals = val.split(";");

				for (String rep : vals) {
					String v = rep.trim();

					if (!rep.isEmpty()) {
						String[] keyVal = v.split("=>");

						if (keyVal == null || keyVal.length != 2 || keyVal[0] == null || keyVal[1] == null) {
							throw new RuntimeException("invalid map syntax");
						}
						replacement += "put(" + keyVal[0].trim() + ", " + keyVal[1].trim() + ");\n\t\t\t\t\t";
					}
				}

				replacement += " }};";

				contents = m.replaceFirst(replacement);

				m = Patterns.MAP_PATTERN.matcher(contents);
			}
		}

		return contents;
	}

The code is quite straightforward and similar for all three variants – we simply check whether the relevant pattern occurs in the current code contents (the whole source file is read in in one go); if so, the relevant bit is rewritten, and the updated String is passed on to the next stage of the pipeline for further processing.

One thing to observe here is that in every parsing method, the find operation is performed in a loop. This is so that every occurrence can be updated correctly, and it is also the reason why the Matcher is re-created at the end of every iteration – m = Patterns.LIST_PATTERN.matcher(contents), for instance. If this is not done, the Matcher keeps operating on the stale contents, the same match is found again and again, and the while (m.find()) loop never terminates!
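The same reset idiom, stripped down to a toy rewriter (class and pattern names here are hypothetical, but the loop structure mirrors the parse methods above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RewriteLoop {
    private static final Pattern NUM = Pattern.compile("(\\d+)");

    // Replace every run of digits with "N", one occurrence at a time.
    public static String rewrite(String contents) {
        Matcher m = NUM.matcher(contents);

        while (m.find()) {
            // rewrite the first occurrence...
            contents = m.replaceFirst("N");

            // ...and re-point the Matcher at the *updated* string;
            // otherwise we keep matching against the stale contents
            m = NUM.matcher(contents);
        }
        return contents;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("a 1 b 22 c")); // a N b N c
    }
}
```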

Here’s what the main parsing method looks like:

public static void parse(final String fileName) {
		try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
			String ext = fileName.substring(fileName.lastIndexOf('.') + 1);

			if (!ext.equalsIgnoreCase("javax")) {
				throw new RuntimeException("input files must have a .javax extension");
			}

			File srcFile = new File(fileName);

			if (!srcFile.exists()) {
				throw new RuntimeException("source file " + fileName + " does not exist!");
			}

			final String basePath = srcFile.getAbsolutePath().substring(0,
					srcFile.getAbsolutePath().lastIndexOf(File.separator));
			final String baseFileName = srcFile.getName().substring(0, srcFile.getName().lastIndexOf('.'));
			final String outputFile = basePath + File.separator + baseFileName
					+ ".java";

			String contents = reader.lines().reduce("", (acc, s) -> {
				acc += s;
				acc += "\n";
				return acc;
			});

			contents = insertImports(contents);
			contents = parseListSyntax(contents);
			contents = parseSetSyntax(contents);
			contents = parseMapSyntax(contents);

			try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile))) {
				writer.write(contents);
			} catch (Exception ex) {
				throw ex;
			}

		} catch (Exception ex) {
			throw new RuntimeException("Error during parsing: " + ex.getLocalizedMessage());
		}
	}

A final subtlety is in the way the imports are handled. For the full source code as well as more explanation of how the whole tool works, visit the Github page.
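The insertImports step is not shown here (the real code is on the Github page), but a minimal sketch of the idea – splice import java.util.*; in right after the package declaration, or at the top if there is none – could look like this. The class name and logic are my own guess at the approach, using the same shape as the PACKAGE_PATTERN shown at the end of this post:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImportInserter {
    // same shape as Patterns.PACKAGE_PATTERN
    private static final Pattern PACKAGE = Pattern.compile("package\\s+([a-zA-Z0-9._;]+)");

    // Hypothetical sketch: insert the wildcard import after the package line.
    public static String insertImports(String contents) {
        Matcher m = PACKAGE.matcher(contents);

        if (m.find()) {
            return contents.substring(0, m.end())
                    + "\n\nimport java.util.*;"
                    + contents.substring(m.end());
        }
        // default-package file: put the import at the very top
        return "import java.util.*;\n\n" + contents;
    }

    public static void main(String[] args) {
        System.out.println(insertImports("package com.example;\n\npublic class Foo {}\n"));
    }
}
```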

Let’s take it for a spin:

Macushla:basic-regex-parser z0ltan$ ls
README.md pom.xml   samples   src       target

Macushla:basic-regex-parser z0ltan$ mvn package 1> /dev/null

Macushla:basic-regex-parser z0ltan$ java -jar target/basic-regex-parser-1.0-SNAPSHOT.jar samples/Default.javax samples/com/z0ltan/test/Package.javax samples/com/z0ltan/regexparser/test/HelloParser.javax

Let’s see how the sample file shown earlier has been rewritten:

Macushla:basic-regex-parser z0ltan$ cat samples/com/z0ltan/regexparser/test/HelloParser.java
package com.z0ltan.regexparser.test;

import java.util.*;

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HelloParser {
    public static void main(String[] args) {
        listDemo();

        setDemo();

        mapDemo();
    }

    private static void listDemo() {
        // normal list
        List<String> greetings = Arrays.asList("Hello", "Hallo", "Hola");
        System.out.printf("greetings = %s\n", greetings);

        // custom list syntax
        List<Integer>myNums = new ArrayList<Integer>() {{ add(1);
					 add(3);
					 add(5);
					 add(7);
					 add(9);
					 add(11);
					 }};

        System.out.printf("myNums = %s\n", myNums);

    }

    private static void setDemo() {
        // normal set
        Set<Integer> evensBelow10 = new HashSet<>();
        evensBelow10.addAll(Arrays.asList(2, 4, 6, 8));

        System.out.printf("evensBelow10 = %s\n", evensBelow10);

        // custom set syntax
        Set<String> mammals = new HashSet<String>() {{ add("Cat");
					add("Dog");
					add("Lion");
					add("Beaver");
					add("Raccoon");
					 }};

        System.out.printf("mammals = %s\n", mammals);
    }

    private static void mapDemo() {
        // normal map
        Map<Integer, String> numberToString = new HashMap<>();
        numberToString.put(1,  "One");
        numberToString.put(2, "Two");
        numberToString.put(3,  "Three");

        System.out.printf("numberToString = %s\n",  numberToString);

        // custom map syntax
        Map<Integer, Foo> numToFoo = new HashMap<Integer, Foo>() {{put(1, new Foo(1, "Ein"));
					put(2, new Foo(2, "Zwei"));
					put(3, new Foo(3, "Drei"));
					put(4, new Foo(4, "Vier"));
					put(5, new Foo(5, "Fuenf"));
					 }};

        System.out.printf("numToFoo = %s\n", numToFoo);
    }

    static class Foo {
        private int id;
        private String name;

        public Foo(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return "Foo { id = " + id + ", name = " + name + " }";
        }
    }
}

Excellent! Now let’s try and run all the parsed files to make sure they are all working as expected:

Macushla:basic-regex-parser z0ltan$ cd samples/

Macushla:samples z0ltan$ javac Default.java
Macushla:samples z0ltan$ java -cp . Default
users = [Cornelius, Marcus, Linnaeus, Gaius, Augustus]

Macushla:samples z0ltan$ javac com/z0ltan/test/Package.java
Macushla:samples z0ltan$ java -cp . com.z0ltan.test.Package
map = {One=1, Two=2, Three=3}
normalMap = {Hello=100, Again=300, World=200}

Macushla:samples z0ltan$ javac com/z0ltan/regexparser/test/HelloParser.java
Macushla:samples z0ltan$ java -cp . com.z0ltan.regexparser.test.HelloParser
greetings = [Hello, Hallo, Hola]
myNums = [1, 3, 5, 7, 9, 11]
evensBelow10 = [2, 4, 6, 8]
mammals = [Cat, Lion, Beaver, Raccoon, Dog]
numberToString = {1=One, 2=Two, 3=Three}
numToFoo = {1=Foo { id = 1, name = Ein }, 2=Foo { id = 2, name = Zwei }, 3=Foo { id = 3, name = Drei }, 4=Foo { id = 4, name = Vier }, 5=Foo { id = 5, name = Fuenf }}

Nice! Of course, this is just a toy tool that does trivial rewriting. If Java had a proper macro system (or even a macro system to begin with!), all this would have been trivial, without so much redundant code. Nevertheless, it does illustrate that regular expression engines are indeed quite powerful, well beyond their original intent.

EDIT: Just in case you are curious as to what the patterns look like, here they are:

public class Patterns {
	public static final Pattern LIST_PATTERN = Pattern
			.compile("List<([a-zA-Z]+)>\\s+([a-zA-Z0-9]+)\\s*=\\s*\\{\\s*([a-zA-Z0-9,\" ]+)\\s*\\}");

	public static final Pattern SET_PATTERN = Pattern
			.compile("Set<([a-zA-Z]+)>\\s+([a-zA-Z0-9]+)\\s*=\\s*\\{\\s*([a-zA-Z0-9,\" ]+)\\s*\\}");

	public static final Pattern MAP_PATTERN = Pattern
			.compile("Map<([a-zA-Z, ]+)>\\s+([a-zA-Z0-9]+)\\s*=\\s*\\{\\s*([a-zA-Z0-9\"\\s=>();, ]+)\\s*\\}");
	
	public static final Pattern PACKAGE_PATTERN = Pattern
			.compile("package\\s+([a-zA-Z0-9._;]+)");
	
	public static final Pattern IMPORT_PATTERN = Pattern
			.compile("import\\s+([a-zA-Z0-9.;]+)");
	
	public static final Pattern PUBLIC_CLASS_PATTERN = Pattern
			.compile("public class\\s+([a-zA-Z0-9_]+)");
	
	public static final Pattern NORMAL_CLASS_PATTERN = Pattern
			.compile("class\\s+([a-zA-Z0-9_]+)");
}

You may safely ignore the patterns after the first three – they are used to insert the import java.util.*; declaration at the first valid location. Again, for more details, visit the Github location!
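To see LIST_PATTERN’s three capture groups in action (the demo class is mine; the regex is copied verbatim from the Patterns class above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListPatternDemo {
    public static void main(String[] args) {
        // same regex as Patterns.LIST_PATTERN
        Pattern listPattern = Pattern
                .compile("List<([a-zA-Z]+)>\\s+([a-zA-Z0-9]+)\\s*=\\s*\\{\\s*([a-zA-Z0-9,\" ]+)\\s*\\}");

        Matcher m = listPattern.matcher("List<Integer> myNums = { 1, 3, 5 }");

        if (m.find()) {
            System.out.println(m.group(1));        // element type: Integer
            System.out.println(m.group(2));        // variable name: myNums
            System.out.println(m.group(3).trim()); // raw elements (trimmed, as in the parser): 1, 3, 5
        }
    }
}
```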


Creating custom Java 8 Stream collectors

The streams feature (which makes heavy use of Functional Interfaces) is arguably the strongest addition in Java 8 (and above). The introduction of lambdas in the same release to target Functional Interfaces also meant that creating chained operations in a functional style has never been easier in Java.

That being said, there are plenty of examples in the official docs that show how streams can be “collected” (effectively reduced/folded, depending on your preferred terminology) into Collection types – List, Set, or even Map. Now, I must make it absolutely clear at this stage that I love Java streams, and barring the absence (or rather, deprecation) of the zip iterator, the API is almost comprehensive. However, there are no examples in the docs showing how we might collect the results of a series of intermediate operations into a custom type. Of course, the helper class Collectors has several helper methods such as groupingBy, partitioningBy, filtering, and reducing, but they either return a Map, or expect a reducible expression, which may not always be the case, as explained next.
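For instance, partitioningBy is handy, but its result is pinned to a Map – there is no built-in way to land directly in a custom type (the demo class name is mine):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartitionDemo {
    public static void main(String[] args) {
        // partitioningBy always hands back a Map<Boolean, ...>
        Map<Boolean, List<Integer>> byParity = Stream.of(1, 2, 3, 4, 5)
                .collect(Collectors.partitioningBy(n -> n % 2 == 0));

        System.out.println(byParity.get(true));  // [2, 4]
        System.out.println(byParity.get(false)); // [1, 3, 5]
    }
}
```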

Recently, I did a project in which I needed to process a stream of integers (the lack of zip forced me to take quite the peripatetic approach to finally make things work – perhaps more on that in a later blogpost) acting as indices into a simple wrapper around a List of integers, and then ultimately collect the updated values of the list into a new instance of the custom type. It was quite an interesting experience that sparked an interest in exploring how much further we could push the collect mechanism. (If you are interested in checking out the code for the mentioned example, you can find it here – Functional Nim.)

For some more examples of custom Collector implementations, you can check out my Github page.

Use Case

For the purposes of this blog, to keep things simple, let us consider a hypothetical example. Suppose we have a Point class with the following structure:

package com.z0ltan.custom.collectors.types;

public class Point {
	private int x;
	private int y;

	public Point(final int x, final int y) {
		this.x = x;
		this.y = y;
	}

	public int x() {
		return this.x;
	}

	public int y() {
		return this.y;
	}

	@Override
	public int hashCode() {
		return (this.x + this.y) % 31;
	}

	@Override
	public boolean equals(Object other) {
		if (other == null || !(other instanceof Point)) {
			return false;
		}

		Point p = (Point) other;

		return p.x == this.x && p.y == this.y;
	}

	@Override
	public String toString() {
		return "Point { x = " + x + ", y = " + y + " }";
	}
}

and we have a List of such points. Now suppose we wish to collate the information into a custom Points object which has the following structure:

package com.z0ltan.custom.collectors.types;

import java.util.Set;

public class Points {
	private Set<Integer> xs;
	private Set<Integer> ys;

	public Points(final Set<Integer> xs, final Set<Integer> ys) {
		this.xs = xs;
		this.ys = ys;
	}

	@Override
	public int hashCode() {
		return (this.xs.hashCode() + this.ys.hashCode()) % 31;
	}

	@Override
	public boolean equals(Object other) {
		if (other == null || !(other instanceof Points)) {
			return false;
		}

		Points p = (Points) other;

		return p.xs.equals(this.xs) && p.ys.equals(this.ys);
	}

	@Override
	public String toString() {
		return "Points { xs = " + xs + ", ys = " + ys + " }";
	}
}

As can be seen, we wish to retain only the unique x and y coordinate values into our final object.

A simple and logical way would be to collect the items from this stream (collect being a “terminal operation” defined in the Stream interface) into our Points object using a custom collector. Before we can do that, let us first understand what is involved in implementing a custom collector.

How to implement a custom collector

In brief, there are two forms of operations involved in Java streams – non-terminal or intermediate operations, which produce streams of their own, and terminal operations, which effectively stop the pipeline resulting in a final result of some sort.

As mentioned before, in most cases, the helper class, Collectors, provides enough functionality to cater to almost any requirement imaginable. However, in cases such as these, where we want to collect data into a custom type, we might be better off defining our own custom collector.

To do that, let us examine the signature of the collect method in the Stream interface. In fact, we will find that there are two versions of this method available to us:

 <R, A> R collect(Collector<? super T, A, R> collector)

and

 <R> R collect(Supplier<R> supplier,
               BiConsumer<R, ? super T> accumulator,
               BiConsumer<R, R> combiner)
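To make the second overload concrete, here is a small sketch (class name is mine) that collects by hand, supplying the container, the accumulator, and the combiner inline:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class CollectOverloadDemo {
    public static void main(String[] args) {
        // supplier: new empty container; accumulator: fold one element in;
        // combiner: merge two partial containers (used by parallel streams)
        List<Integer> squares = Stream.of(1, 2, 3, 4)
                .collect(ArrayList::new,
                         (list, n) -> list.add(n * n),
                         List::addAll);

        System.out.println(squares); // [1, 4, 9, 16]
    }
}
```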

So which one do we use? Well, we can actually use either one for our purposes. In fact, the former is preferable if we have some involved logic, and we wish to encapsulate all of that in a nice class. However, functionally speaking, the latter is exactly the same. This can be further clarified by examining the Collector interface (showing only the abstract methods):

public interface Collector<T, A, R> {
    Supplier<A> supplier();
    BiConsumer<A, T> accumulator();
    BinaryOperator<A> combiner();
    Function<A, R> finisher();
    Set<Collector.Characteristics> characteristics();
}

As can be seen, if we implement the Collector interface, we have to provide essentially the same pieces as those used by the second version of the collect method. In addition, there are a couple of extra methods which are not only interesting, but quite vital if we wish to implement the interface:

  • finisher: This is the main method that we need to implement as per our collection logic. This is where the accumulated values of the stream are massaged together into the final return value. The type parameters give a big hint in this regard – the return type R is the same as that returned by the overall collect method.
  • characteristics: This is where we need to be careful. The enum has three variants – CONCURRENT, IDENTITY_FINISH, and UNORDERED. UNORDERED indicates that the collector does not depend on the encounter order of the stream’s values, while CONCURRENT indicates that the accumulator may be invoked concurrently on a single shared container – only use it if your accumulation logic is thread-safe. In the case of collecting values into a custom non-collection type, I don’t see any scenario where you would want to use IDENTITY_FINISH (unless you are a big fan of unsolicited ClassCastExceptions).

    In short, this variant indicates that the finisher function is essentially an identity function, meaning that it can be skipped, and the currently accumulated value returned as the overall result (which is precisely what we wish to avoid).

One final comment to understand the collect method once and for all – what all those terms mean!

  • Supplier: This creates a fresh, empty instance of the mutable container into which the stream’s values will be accumulated.
  • Accumulator: This is where each element of the stream is folded into the running container (which may be of a different type from the elements themselves) – “reduced”, or “folded” in Functional terms.
  • Combiner: Similar to the accumulator, but it merges two partially-filled containers of the same type (this is what parallel streams use to join their partial results). In most cases, this type would be a collection type. And finally,
  • Finisher: This is the meat of the whole collector. This is where the actual custom logic goes to transform the value produced by the accumulator/combiner into the final result of the given return type.

Now that we’ve analysed the signature of the collect method, we are in a position to realise that we can actually create custom collectors in multiple ways:

  • Using the static of methods in the Collector interface by supplying the correct supplier, accumulator, combiner, finisher, and Collector characteristics,
  • By creating a class that implements the Collector interface itself, and thereby providing implementations of the same supplier, accumulator, combiner, finisher, and Collector characteristics,
  • By simply creating an anonymous class conforming to the Collector interface, and providing the same inputs as in the previous two cases, or
  • Using any combination of the above.
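As a quick illustration of the first option, Collector.of can assemble a collector inline from those same pieces. Here is a hypothetical averaging collector (unrelated to this post’s project):

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CollectorOfDemo {
    public static void main(String[] args) {
        // container: int[2] holding {sum, count}; finisher turns it into the mean
        Collector<Integer, int[], Double> averaging = Collector.of(
                () -> new int[2],                                    // supplier
                (acc, n) -> { acc[0] += n; acc[1]++; },              // accumulator
                (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; }, // combiner
                acc -> (double) acc[0] / acc[1]);                    // finisher

        System.out.println(Stream.of(2, 4, 6).collect(averaging)); // 4.0
    }
}
```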

To keep matters simple, let us create a custom class that implements the Collector interface. This will not only make things easier to understand, but also allow us to maintain code cleanliness.

Now let’s proceed with the implementation of the given use case to solidify these concepts.

Implementation and Demo

Let’s create a simple Maven project called custom_stream_collectors:

Macushla:Blog z0ltan$ mvn archetype:generate -DgroupId=com.z0ltan.custom.collectors -DartifactId=custom-collector -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------

               <elided>

[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 30.967 s
[INFO] Finished at: 2017-07-11T20:57:45+05:30
[INFO] Final Memory: 18M/62M
[INFO] ------------------------------------------------------------------------

After customising the project to our heart’s desire, let’s fill in our custom collector class:

package com.z0ltan.custom.collectors.collectors;

import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

import com.z0ltan.custom.collectors.types.Point;
import com.z0ltan.custom.collectors.types.Points;

public class PointToPointsCollector implements Collector<Point, List<Point>, Points> {
	@Override
	public Supplier<List<Point>> supplier() {
		return ArrayList::new;
	}

	@Override
	public BiConsumer<List<Point>, Point> accumulator() {
		return List::add;
	}

	@Override
	public BinaryOperator<List<Point>> combiner() {
		return (acc, ps) -> {
			acc.addAll(ps);
			return acc;
		};
	}

	@Override
	public Function<List<Point>, Points> finisher() {
		return (points) -> {
			final Set<Integer> xs = new HashSet<>();
			final Set<Integer> ys = new HashSet<>();
			
			for (Point p : points) {
				xs.add(p.x());
				ys.add(p.y());
			}
			
			return new Points(xs, ys);
		};
	}

	@Override
	public Set<java.util.stream.Collector.Characteristics> characteristics() {
		return Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.UNORDERED));
	}
}

We use ArrayList::new (method references are another excellent feature in Java 8 and beyond) for our Supplier since we start off with a blank slate, and for the Accumulator, we use List::add, since the last section made it clear that the accumulator’s only job is to keep collecting items into a running value of another type (a List in this case).

Then we have the Combiner which is implemented by the little lambda expression:

   (acc, ps) -> { acc.addAll(ps); return acc; }

As mentioned in the previous section, the combiner simply flattens the collections together into a single collection. In case of confusion, always look to the type signature for clarity.

And finally, we have the Finisher:

return (points) -> {
			final Set<Integer> xs = new HashSet<>();
			final Set<Integer> ys = new HashSet<>();
			
			for (Point p : points) {
				xs.add(p.x());
				ys.add(p.y());
			}
			
			return new Points(xs, ys);
		};

At this point of the stream pipeline, the points variable holds the list of accumulated Point objects. All we do then is create an instance of the Points class using the data available in the points variable. The whole point (if you will forgive the pun) is that this method will have logic peculiar to your specific use case, so the implementation will vary tremendously (which is more than can be said about the others – supplier, accumulator, and combiner).

And finally, here is our main class:

package com.z0ltan.custom.collectors;

import java.util.Arrays;
import java.util.List;

import com.z0ltan.custom.collectors.collectors.PointToPointsCollector;
import com.z0ltan.custom.collectors.types.Point;
import com.z0ltan.custom.collectors.types.Points;

public class Main {
	public static void main(String[] args) {
		final List<Point> points = Arrays.asList(new Point(1, 2), new Point(1, 2), new Point(3, 4), new Point(4, 3),
				new Point(2, 5), new Point(2, 5));

		// the result of our custom collector
		final Points pointsData = points.stream().collect(new PointToPointsCollector());

		System.out.printf("\npoints = %s\n", points);
		System.out.printf("\npoints data = %s\n", pointsData);
	}
}

Well, let’s run it and see the output!

Macushla:custom-collector z0ltan$ mvn package && java -jar target/custom-collector-1.0-SNAPSHOT.jar
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building custom-collector 1.0-SNAPSHOT
        <elided>
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.627 s
[INFO] Finished at: 2017-07-11T21:31:19+05:30
[INFO] Final Memory: 16M/55M
[INFO] ------------------------------------------------------------------------

points = [Point { x = 1, y = 2 }, Point { x = 1, y = 2 }, Point { x = 3, y = 4 }, Point { x = 4, y = 3 }, Point { x = 2, y = 5 }, Point { x = 2, y = 5 }]

points data = Points { xs = [1, 2, 3, 4], ys = [2, 3, 4, 5] }

Success!


Implementing Nat in Rust (a la Haskell)

The power of algebraic data types comes to the fore when dealing with complex problem domains. Most modern languages support some form of ADTs, but expressing them is still noticeably more cumbersome than in languages that were designed with such support in mind.

Case in point – the prototypical example of an algebraic data type, the natural number type (Nat). Let's compare naive implementations in two languages with varying degrees of ADT support – Haskell and Rust.

First, the Haskell version. Haskell, just like ML and other members of this family, is eminently suited for describing such types almost effortlessly:

module NatDemo where

import Test.HUnit

data Nat = Zero | Succ Nat deriving (Show, Eq)

nat2int :: Nat -> Int
nat2int Zero = 0
nat2int (Succ n) = 1 + nat2int n

int2nat :: Int -> Nat
int2nat 0 = Zero
int2nat n = Succ (int2nat (n-1))

add :: Nat -> Nat -> Nat
add Zero n = n
add (Succ m) n = Succ (add m n)


-- test cases
test1 = TestCase (assertEqual "nat to int" (nat2int (Succ (Succ Zero))) 2)
test2 = TestCase (assertEqual "int to nat" (int2nat 5) (Succ (Succ (Succ (Succ (Succ Zero))))))
test3 = TestCase (assertEqual "add two nats" (add (int2nat 5) (int2nat 6)) (int2nat 11))

tests = TestList [test1, test2, test3]

run = runTestTT tests

Running it to confirm:


Macushla:Nat-in-Rust z0ltan$ ghci
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /Users/z0ltan/.ghci

Prelude> :l Nat.hs
[1 of 1] Compiling NatDemo          ( Nat.hs, interpreted )
Ok, modules loaded: NatDemo.

*NatDemo> run
Cases: 3  Tried: 3  Errors: 0  Failures: 0
Counts {cases = 3, tried = 3, errors = 0, failures = 0}

As can be seen, it’s almost trivial to define the Nat type in Haskell, as well as write functions that operate on this type. It reads almost exactly like the mathematical version of the concept. Simply beautiful.

Now let’s show the Rust version first, and then perhaps a comment or two on the implementation.

#[derive(Debug, PartialEq)]
enum Nat {
    Zero,
    Succ(Box<Nat>),
}

// bring the variants into scope for readability
use Nat::{Zero, Succ};

fn nat_to_int(n: Nat) -> i32 {
    match n {
        Zero => 0,
        Succ(n) => 1 + nat_to_int(*n),
    }
}


fn int_to_nat(n: i32) -> Nat {
    match n {
        0 => Zero,
        n => Succ(Box::new(int_to_nat(n-1))),
    }
}


fn add(m: Nat, n: Nat) -> Nat {
    match (m, n) {
        (Zero, n) => n,
        (Succ(m), n) => Succ(Box::new(add(*m, n))),
    }
}

fn main() {
    assert_eq!(nat_to_int(Succ(Box::new(Succ(Box::new(Succ(Box::new(Zero))))))), 3);
    assert_eq!(int_to_nat(5), Succ(Box::new(Succ(Box::new(Succ(Box::new(Succ(Box::new(Succ(Box::new(Zero)))))))))));
    assert_eq!(add(int_to_nat(5), int_to_nat(11)), int_to_nat(16));
}

Let’s make sure that it’s all in working condition:

Macushla:Nat-in-Rust z0ltan$ rustc nat.rs && ./nat
Macushla:Nat-in-Rust z0ltan$

Excellent! It all just works. However, a couple of points to notice – first of all, because every Rust type must have a size known at compile time, we need to wrap the recursive part in a Box (amongst other choices) so that the compiler knows how big the data is (a box, being a pointer, has a fixed size determinable at compile time). Of course, the test code is positively ghastly because we have to wrap all calls in Box::new, not to mention remember to dereference the boxed value inside the functions.
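To make the sizing point concrete, here is a small sketch (my addition, not from the original code) showing that the boxed Nat stays small no matter how deep the value gets; an unboxed Succ(Nat) variant would be rejected with a "recursive type has infinite size" error:

```rust
use std::mem::size_of;

#[derive(Debug)]
enum Nat {
    Zero,
    Succ(Box<Nat>), // writing `Succ(Nat)` here would not compile
}

fn main() {
    // The enum's size is fixed at compile time: at most a discriminant
    // plus a pointer (often just a pointer, thanks to niche optimization).
    println!("size of Nat: {} bytes", size_of::<Nat>());

    // The size is the same whether the value is Zero or a deep chain.
    let deep = Nat::Succ(Box::new(Nat::Succ(Box::new(Nat::Zero))));
    println!("{:?}", deep);
}
```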

Secondly, the code is surprisingly clean for the most part. However, writing it is not nearly as intuitive as writing the same in Haskell (or ML, for that matter). I can only imagine how much more daunting the task would be in a language without pattern matching (amongst other higher-level features), such as Java. Of course we could make it work, but the distance between the intent of the code and the code itself would be much, much greater. Syntax does matter, folks!

Well, that was a nice little experiment of an evening.


Compile time evaluation in Rust macros… or not (contrast with Common Lisp)

Rust macros are a great help in reducing boilerplate as well as creating tools to perform advanced code manipulation at compile time (the nom crate comes to mind). However, I did run into its limitations (again) when I started tinkering with a small idea.

Well, the idea that I had is quite simple – create a macro that reverses the words of a string, but defers type checking to runtime (so that correct inputs would still produce output). However, this turned out to be not so simple, again because Rust macros apparently provide no way to perform compile-time checks or to evaluate code during macroexpansion (as Lisp and Scheme/Racket macros do).

Here was my first shot at creating that simple macro in Rust:

use std::any::Any;
use std::io::{self, Write};

fn is_string(s: &Any) -> bool {
    s.is::<String>()
}

macro_rules! reverse_string {
    ($string: expr) => {{
                if !is_string($string) {
                    writeln!(io::stderr(), "{:?} must be a String", $string).unwrap();
                    std::process::exit(1);
                }
                                  
                let mut rev_string = String::new();

                for word in $string.split_whitespace().rev() {
                    rev_string.push_str(word);
                    rev_string.push(' ');
                }
                rev_string
    }};
}


fn main() {
    let my_string = "Hello, world. How do you do?".to_string();
    println!("{}", reverse_string!(&my_string));
}

This works as expected, of course:

Macushla:Type_Checking_Macro_Rust z0ltan$ rustc reverse.rs && ./reverse
do? you do How world. Hello,

Now, suppose I call the same macro, reverse_string!, on an integer instead of a string – then the results are not quite what I wanted:

fn main() {
    let my_string = "Hello, world. How do you do?".to_string();
    println!("{}", reverse_string!(&my_string));

    let my_num = 99;
    println!("{}", reverse_string!(&my_num));
}

Running this gives:

Macushla:Type_Checking_Macro_Rust z0ltan$ rustc reverse.rs && ./reverse
error[E0599]: no method named `split_whitespace` found for type `&{integer}` in the current scope
  --> reverse.rs:17:37
   |
17 |                 for word in $string.split_whitespace().rev() {
   |                                     ^^^^^^^^^^^^^^^^
...
31 |     println!("{}", reverse_string!(&my_num));
   |                    ------------------------ in this macro invocation

error: aborting due to previous error(s)

And what I really wanted was to see output for the string argument, and then the nice error message that I generate inside the macro. So what’s going on? Why can’t I generate the template, so to speak, of the macroexpansion and defer the actual error checking to runtime? Let’s expand the macro and take a peek:

Macushla:Type_Checking_Macro_Rust z0ltan$ rustc -Z unstable-options --pretty expanded reverse.rs

This produces a ton of output, but the relevant part of the expanded macro is here:

 let my_num = 99;
    ::io::_print(::std::fmt::Arguments::new_v1(
        {
            static __STATIC_FMTSTR: &'static [&'static str] = &["", "\n"];
            __STATIC_FMTSTR
        },
        &match (&{
            if !is_string(&my_num) {
                io::stderr()
                    .write_fmt(::std::fmt::Arguments::new_v1(
                        {
                            static __STATIC_FMTSTR: &'static [&'static str] =
                                &["", " must be a String\n"];
                            __STATIC_FMTSTR
                        },
                        &match (&&my_num,) {
                            (__arg0,) => {
                                [::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Debug::fmt)]
                            }
                        },
                    ))
                    .unwrap();
                std::process::exit(1);
            }
            let mut rev_string = String::new();
            for word in &my_num.split_whitespace().rev() {
                rev_string.push_str(word);
                rev_string.push(' ');
            }
            rev_string
        },) {
            (__arg0,) => {
                [
                    ::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Display::fmt),
                ]
            }
        },
    ));

These, of course, correspond to the fully expanded form of the following two lines:

    let my_num = 99;
    println!("{}", reverse_string!(&my_num));

Now we begin to see why the code doesn’t work as expected. Here’s how it works: macroexpansion happens as part of the overall compilation phase, during which the Rust type checker is still very much active (so we cannot inject arbitrary code that fails the type checker or the borrow checker). Rust doesn’t really have a way to “escape” or defer the actual checking till runtime. This is as much due to the type checker as to the fact that the Rust macro system provides no such means (as Lisp or Scheme/Racket macros do).

So, in this case, the Type Checker sees this snippet: for word in &my_num.split_whitespace().rev(), realises that we are trying to call split_whitespace on an i32 variable, and immediately stops with a compilation error.

The other part (though not directly relevant here) is that all the defensive error checks using if !is_string(...) wouldn’t really work even if we were to try to check that at compile time, since Rust macros do not have, as far as I know, any way of doing compile-time conditional checking.
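For contrast, the runtime type check itself is perfectly expressible in ordinary Rust – it is only inside a macro expansion that the type checker forces the issue at compile time. Here is a sketch (my addition, using a plain function over &dyn Any rather than a macro) that behaves the way I originally wanted:

```rust
use std::any::Any;

// Accept any 'static type and downcast at runtime; the type check is
// thus deferred to runtime, at the cost of giving up the macro.
fn reverse_string(value: &dyn Any) -> Result<String, String> {
    match value.downcast_ref::<String>() {
        Some(s) => Ok(s.split_whitespace().rev().collect::<Vec<_>>().join(" ")),
        None => Err("argument must be a String".to_string()),
    }
}

fn main() {
    let my_string = "Hello, world. How do you do?".to_string();
    println!("{:?}", reverse_string(&my_string)); // Ok("do? you do How world. Hello,")
    println!("{:?}", reverse_string(&99));        // Err("argument must be a String")
}
```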

So, at this point I just stopped with the Rust version. Now, let’s try and implement the same macro using Common Lisp:

(defmacro reverse-string (x)
  "reverse the words of the string, error checking done at runtime"
  `(if (not (stringp ,x))
       (error "~a must be a string, not a ~a~%" ,x (type-of ,x))
       ,(let ((collect (gensym))
              (lst (gensym))
              (f (gensym))
              (s (gensym)))
             `(labels ((,collect (,lst)
                         (reduce (lambda (,f ,s)
                                   (concatenate 'string ,f " " ,s)) ,lst)))
                (,collect (reverse (loop for i = 0 then (1+ j)
                                      as j = (position #\Space ,x :start i)
                                      collect (subseq ,x i j)
                                      while j)))))))
(defun main ()
  (let ((s "Hello world")
         (d 99))
    (format t "~a reversed is ~a~%" s (reverse-string s))
    (format t "~a reversed is ~a~%" d (reverse-string d))))


(defun view-macro-expansion (form)
  "helper function to display the macro-expanded form for the
   supplied form"
  (macroexpand form))

The only point of interest is the reverse-string macro. It’s pretty much the same logic as in the attempted Rust macro – create a template that checks, at runtime, whether the supplied argument is a string, and if not, generate a proper error message. If indeed the argument is a string, then reverse the words of the original string – this is the bit being done inside the loop macro.

The interesting bit is that the Lisp implementation that I use – SBCL – does rigorous compile-time analysis, and actually gives plenty of notice that it’s deleting unreachable code (corresponding to the call in main, (format t "~a reversed is ~a~%" d (reverse-string d)), which the compiler realises will never execute normally). However, the expanded macro retains the relevant checks, and the call itself is preserved, so the macro behaves exactly as desired:

CL-USER> (main)
Hello world reversed is world Hello

and, in the Lisp debugger,

99 must be a string, not a (INTEGER 0 4611686018427387903)
   [Condition of type SIMPLE-ERROR]

Restarts:
 0: [RETRY] Retry SLIME REPL evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [REMOVE-FD-HANDLER] Remove #<SB-IMPL::HANDLER INPUT on descriptor 14: #<CLOSURE (LABELS SWANK/SBCL::RUN :IN SWANK/BACKEND:ADD-FD-HANDLER) {1002F80ADB}>>
 3: [ABORT] Exit debugger, returning to top level.

Backtrace:
  0: (MAIN)
  1: (SB-INT:SIMPLE-EVAL-IN-LEXENV (MAIN) #<NULL-LEXENV>)
  2: (EVAL (MAIN))

Excellent! And here is how the expanded form of the macro call actually looks like:

CL-USER> (view-macro-expansion '(reverse-string 99))
(IF (NOT (STRINGP 99))
    (ERROR "~a must be a string, not a ~a~%" 99 (TYPE-OF 99))
    (LABELS ((#:G602 (#:G603)
               (REDUCE
                (LAMBDA (#:G604 #:G605)
                  (CONCATENATE 'STRING #:G604 " " #:G605))
                #:G603)))
      (#:G602
       (REVERSE
        (LOOP FOR I = 0 THEN (1+ J) AS J = (POSITION #\  99 :START I)
              COLLECT (SUBSEQ 99 I J)
              WHILE J)))))
T

Of course, I am being a bit unduly harsh on Rust here, because Common Lisp, despite all vendor-specific quirks, is still pretty much a dynamic language, so we reasonably expect it to defer most type checking to runtime. Rust, on the other hand, is a strongly-typed static language, so it can ill afford to leave much checking to runtime, especially since its runtime is minimal. In any case, an interesting little experiment.


How to build OpenJDK 9 on macOS Sierra

Continuing with my research into programming languages (and compilers in particular), I started looking through the (considerably abstruse) documentation on the OpenJDK site. My patience paid off in the end, and I found a lot of useful information on what I was looking for in particular – the javac source code and documentation. One of my earmarked projects for this year is to modify javac to allow for some extra syntax, and since this involves adding additional semantics, I was keen to download and build OpenJDK myself.

The steps are surprisingly well documented and easy to follow. To give a personal perspective on the build process (including a gotcha regarding JDK 9), here is the list of steps to download OpenJDK 9 and build the whole system from scratch (for experimentation, contributing to OpenJDK, what have you). Note that the later steps are specifically for macOS; however, most of the steps are more or less the same across platforms:

  1. First, install Mercurial if you don’t already have it. This was a bit irksome for me at first (being primarily a Git user). If you have Homebrew installed, then it is as trivial as:
    $ brew install hg

    Then, ensure that you have the following settings in your ~/.hgrc file. In case you don’t have a .hgrc file, create one like so:
    $ touch ~/.hgrc

    Add the following lines to the ~/.hgrc file:

    [extensions]
    fetch = 
    mq = 
    
  2. Now, clone the OpenJDK 9 repository. For some reason, hg tclone (as specified in the documentation) did not work for me, but the following steps worked just fine:
     $ hg clone http://hg.openjdk.java.net/jdk9/dev 9dev
     $ cd 9dev
     $ sh ./get_source.sh
     

    The last step will retrieve all the necessary code and build environment into your local repository, so this does take some time.

  3. Now, one caveat here – to build OpenJDK, you need an existing JDK installation (the documentation mentions that any standard variant will do). This is called the “boot JDK”. However, I found that the build does not go through with JDK 9 (from the error messages, it looks like Reflection due to the new modules system is the main culprit). In that case, you can use an older version (say, JDK 8) to perform the build instead.

    In the case of macOS, you can use the following command to find the available JDKs on your machine:

     Macushla:RacketMacros z0ltan$ /usr/libexec/java_home -V
     Matching Java Virtual Machines (2):
      9, x86_64:	"Java SE 9-ea"	/Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home
      1.8.0_112, x86_64:	"Java SE 8"	/Library/Java/JavaVirtualMachines/jdk1.8.0_112.jdk/Contents/Home

      /Library/Java/JavaVirtualMachines/jdk-9.jdk/Contents/Home
     

    The last line shows the current default JDK. Before proceeding with the build, you can either change the default version, or you can simply pass in a different “boot JDK” to the configuration step (preferable).

    Now, let’s generate the configuration file:

    bash configure --with-boot-jdk=/Library/Java/JavaVirtualMachines/jdk1.8.0_112.jdk/Contents/Home/ --disable-warnings-as-errors
    

    Note that I pass two flags to bash configure: --with-boot-jdk, specifying the JDK to use for building OpenJDK, and --disable-warnings-as-errors (this is required on macOS).

    If this command runs successfully, the build is almost certainly going to succeed.

  4. Finally, trigger the build (this should take a bit of time again):
    $ make images
    

    This will dump the generated OpenJDK images into the build directory.

  5. You can test that the generated image is valid:
    Macushla:9dev z0ltan$ ./build/macosx-x86_64-normal-server-release/jdk/bin/java -version
    openjdk version "9-internal"
    OpenJDK Runtime Environment (build 9-internal+0-adhoc.z0ltan.9dev)
    OpenJDK 64-Bit Server VM (build 9-internal+0-adhoc.z0ltan.9dev, mixed mode)
    

    And there you go. All done in under half an hour!

Note that there is a plethora of build options and flags (including cross-compilation capabilities), and you should definitely check out the two HTML pages in the following location of your repository – ./common/doc/building.html, and ./common/doc/testing.html.

Now the real work begins! 😀


Mutual Recursion demo in Rust and Racket (inspired by Haskell)

This is a quick post on a Rust version of the Haskell evens and odds program demonstrating mutual recursion (as shown in Hutton’s book).

First off, the Haskell code:

  evens :: [a] -> [a]
  evens [] = []
  evens (x:xs) = x : odds xs

  odds :: [a] -> [a]
  odds [] = []
  odds (_:xs) = evens xs

A sample run:

*Main> let string = "abcde"
*Main> evens string
"ace"
*Main> odds string
"bd"
*Main> string
"abcde"

So the whole idea is to have the evens function return all the characters in even positions (starting from 0), and the odds function likewise return all the characters in odd positions.

The evens function acts as the actual accumulator whilst odds is only used as a trampoline for continuing the recursion.

Now, for a rough Rust version of it (preserving the original character array):


fn main() {
    fn evens<T: Copy>(xs: &[T]) -> Vec<T> {
        if xs.is_empty() {
            Vec::new()
        } else {
            cons(&xs[0], &odds(&xs[1..]))
        }
    }

    fn odds<T: Copy>(xs: &[T]) -> Vec<T> {
        if xs.is_empty() {
            Vec::new()
        } else {
            evens(&xs[1..])
        }
    }

    fn cons<T: Copy>(x: &T, xs: &[T]) -> Vec<T> {
        let mut vec = Vec::new();

        vec.push(*x);

        for e in xs.iter() {
            vec.push(*e);
        }
        vec
    }

    let string = String::from("abcde");

    println!("{}",
             String::from_utf8(evens(&string.clone().into_bytes())).unwrap());
    println!("{}",
             String::from_utf8(odds(&string.clone().into_bytes())).unwrap());

    println!("{}", string);
}

And a quick run:

Macushla:EvensAndOdds z0ltan$ rustc evens_and_odds.rs
Macushla:EvensAndOdds z0ltan$ ./evens_and_odds
ace
bd
abcde

So, as can be clearly seen, the original string is left unmodified. Of course, this version looks quite dirty, but the nice bit is that a &Vec<T> coerces to a &[T], which enables using slicing extensively and naturally inside the functions. The vector copying in cons could, of course, be replaced with an API call, but I feel it is much clearer in its explicit form.
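As an aside, an iterator-adapter version of the two functions might look like this (my sketch; `step_by` needs Rust 1.28 or later, so it would not have compiled on the toolchain used here, and it trades away the mutual recursion that was the whole point of the exercise):

```rust
// Collect elements at even indices (0, 2, 4, ...).
fn evens<T: Clone>(xs: &[T]) -> Vec<T> {
    xs.iter().step_by(2).cloned().collect()
}

// Collect elements at odd indices (1, 3, ...).
fn odds<T: Clone>(xs: &[T]) -> Vec<T> {
    xs.iter().skip(1).step_by(2).cloned().collect()
}

fn main() {
    let string = "abcde";
    let evens_s: String = evens(string.as_bytes()).into_iter().map(|b| b as char).collect();
    let odds_s: String = odds(string.as_bytes()).into_iter().map(|b| b as char).collect();
    println!("{}", evens_s); // ace
    println!("{}", odds_s);  // bd
}
```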

The Racket version looks much nicer, being more amenable to functional constructs than Rust:

#lang racket

(define (evens xs)
  (if (null? xs)
      '()
      (cons (car xs) (odds (cdr xs)))))

(define (odds xs)
  (if (null? xs)
      '()
      (evens (cdr xs))))

(define (main)
  (let ([string "abcde"])
    (displayln (list->string (evens (string->list string))))
    (displayln (list->string (odds (string->list string))))
    (displayln string)))

And a final run for the Racket version:

evens-and-odds.rkt> (main)
ace
bd
abcde

A bit of play with Rust macros

Here are a couple of macros that I wrote up on a slow evening. The first one provides a nice literal way of creating maps, and the second one mimics Haskell’s list comprehension syntax in a rather crude manner.

My aim originally had been to:

  • have a simple if-then-else structure created in Rust, something like so:
    fn main() {
      let res = if 2 > 3 then "Yes" else "No";
    }
    

    However, this did not appear to be possible with the current macro support in Rust, since a fragment of type expr (expression) can only be followed by one of the tokens =>, ,, or ;. So much for that.

  • and secondly, to be able to replicate Haskell’s list comprehension syntax in its entirety, allowing for any possible generator. However, I then realised that Rust’s macro system does not support actual evaluation of code during expansion (as Common Lisp or Scheme/Racket macros do), even though it does work on the actual AST and not merely text. So that went out the window as well.
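As an aside, the first limitation can be partially sidestepped by choosing separators from the allowed follow set. A sketch (my addition – `=>` and `;` stand in for the then/else keywords, so it is not real if-then-else syntax):

```rust
// Fake if-then-else: `=>` and `;` are among the tokens permitted to
// follow an `expr` fragment, so this matcher is accepted.
macro_rules! if_then_else {
    ($cond:expr => $then:expr ; $else:expr) => {
        if $cond { $then } else { $else }
    };
}

fn main() {
    let res = if_then_else!(2 > 3 => "Yes" ; "No");
    println!("{}", res); // No
}
```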

So here’s the first macro demo:

use std::collections::HashMap;

macro_rules! my_map {
    ($($key: expr => $val: expr)*) => {
        {
            let mut map = HashMap::new();

            $(
                map.insert($key, $val);
            )*

          map
        }
    };
}

fn main() {
    let my_map = my_map!{
        1 => "One"
        2 => "Two"
        3 => "Three"
        4 => "Four"
        5 => "Five"
    };

    println!("{:?}", my_map);
}

Nothing fancy here, but you must admit that it does look nicer – almost Ruby-esque! Here’s the code run:

Macushla:List_Comprehension_Macro z0ltan$ ./map
{2: "Two", 1: "One", 3: "Three", 5: "Five", 4: "Four"}

And here’s the poor man’s list comprehension that is not only limited in scope, but also entirely inflexible in its syntax (I just couldn’t be bothered tinkering with it for little return):

macro_rules! compr {
    ($id1: ident | $id2: ident <- [$start: expr ; $end: expr] , $cond: expr) => {
        {
            let mut vec = Vec::new();

            for num in $start..$end + 1 {
                if $cond(num) {
                    vec.push(num);
                }
            }

            vec
        }  
    };
}

fn even(x: i32) -> bool {
    x%2 == 0
}

fn odd(x: i32) -> bool {
    x%2 != 0
}

fn main() {
    let evens = compr![x | x <- [1;10], even];
    println!("{:?}", evens);

    let odds = compr![y | y <- [1;10], odd];
    println!("{:?}", odds);
}

As you can see, the ident fragments are completely for show; I cannot use .. in the generator (again due to restrictions on what can follow an expr fragment), and the guard doesn’t even reference the identifier that’s supposed to collect the results into the final list/vector! Anyway, here’s the output of the code run:

Macushla:List_Comprehension_Macro z0ltan$ rustc list_compr.rs
Macushla:List_Comprehension_Macro z0ltan$ ./list_compr
[2, 4, 6, 8, 10]
[1, 3, 5, 7, 9]

Not too shabby, eh? In all seriousness though, while Rust’s macros are a big improvement over the debacle that is C’s macro system, the surface similarities to Racket’s powerful macro system end right there – on the surface. As it stands now, there are far too many restrictions on the macro system for it to be considered a viable way of extending the syntax of the language, or even creating new languages (as in the Racket world).

EDIT: The list comprehension macro can actually be improved by using tt for the start and end of the range – this also allows .. to be used much as in Haskell. Moreover, we can now actually simulate the exact Haskell syntax (for this very specific case, of course), and also make use of the ident fragments as well.

Here’s the new code, but I’m keeping the old code up to remind myself that I can be a doddering idiot at times!

Updated macro:

macro_rules! compr {
    ($id1:ident | $id2:ident <- [$start:tt..$end:tt] , $cond:tt $id3:ident) => {
        {
            let mut vec = Vec::new();
 
            for $id1 in $start..$end + 1 {
                if $cond($id3) {
                    vec.push($id2);
                }
            }
 
            vec
        }  
    };
}
 
fn even(x: i32) -> bool {
    x%2 == 0
}
 
fn odd(x: i32) -> bool {
    x%2 != 0
}
 
fn main() {
    let evens = compr![x | x <- [1..10], even x];
    println!("{:?}", evens);
 
    let odds = compr![y | y <- [1..10], odd y];
    println!("{:?}", odds);
}

And a run just to make sure it’s working:

Macushla:List_Comprehension_Macro z0ltan$ rustc list_compr.rs
Macushla:List_Comprehension_Macro z0ltan$ ./list_compr
[2, 4, 6, 8, 10]
[1, 3, 5, 7, 9]

That’s much better!


The Observer pattern in Rust

Here is a simple implementation of the Observer pattern in Rust. First, the code, and then a small bit of explanation of the implementation itself.

The project structure:

Macushla:rust-observer z0ltan$ tree
.
├── Cargo.lock
├── Cargo.toml
└── src
    ├── lib.rs
    └── main.rs

1 directory, 4 files

The client code:

extern crate rust_observer;

use rust_observer::*;

fn main() {
    let mut foo = EvenCounter::new();
    let (bar, baz, quux) = (Box::new(EvenObserver::new("bar".to_string())),
                            Box::new(EvenObserver::new("baz".to_string())),
                            Box::new(EvenObserver::new("quux".to_string())));

    foo.register(bar);
    foo.register(baz);
    foo.register(quux);

    foo.run();
}

Nothing special here – just creating an observable object of type EvenCounter, and then three instances of EvenObserver, which implements the Observer trait.

The idea is to have the observers get notified whenever the observable object’s counter is an even number (it runs the counter in an infinite loop).

And finally, the actual implementation:

/// the observable type
pub trait Observable<T> {
    fn register(&mut self, observer: Box<Observer<Item = T>>);
}

pub trait Observer {
    type Item;

    fn notify(&self, val: &Self::Item);
}


/// the actual structs which implement the Observable and Observer
/// traits

/// the specific Observable
pub struct EvenCounter {
    counter: u32,
    observers: Vec<Box<Observer<Item = u32>>>,
}

impl EvenCounter {
    pub fn new() -> Self {
        EvenCounter {
            counter: 0u32,
            observers: Vec::new(),
        }
    }

    pub fn run(&mut self) {
        loop {
            use std::thread;
            use std::time::Duration;

            thread::sleep(Duration::from_millis(self.get_rand_duration()));

            self.counter += 1;

            if self.counter % 2 == 0 {
                for observer in self.observers.iter() {
                    observer.notify(&self.counter);
                }
            }
        }
    }

    fn get_rand_duration(&self) -> u64 {
        if cfg!(target_os = "windows") {
            500u64
        } else {
            use std::process::Command;
            use std::str::FromStr;

            let rand_cmd = Command::new("sh")
                .arg("-c")
                .arg("echo $(( $RANDOM%1000 + 1 ))")
                .output()
                .expect("failed to get OS RNG");

            u64::from_str(&String::from_utf8(rand_cmd.stdout).unwrap().trim()).unwrap()
        }
    }
}

impl Observable<u32> for EvenCounter {
    fn register(&mut self, observer: Box<Observer<Item = u32>>) {
        self.observers.push(observer);
    }
}

/// the specific Observer type
pub struct EvenObserver {
    name: String,
}

impl EvenObserver {
    pub fn new(name: String) -> Self {
        EvenObserver { name: name }
    }

    fn name(&self) -> &str {
        &self.name
    }
}

impl Observer for EvenObserver {
    type Item = u32;

    fn notify(&self, val: &Self::Item) {
        println!("{} got {}", self.name(), val);
    }
}

As you can see, the types themselves are implemented as traits, and then the specific concrete types used for the example implement these traits.

The Observable trait is quite simple – it is generic over the type that its observers get notified about. Registration of observers is done through the required method, register. Also note the boxed trait object (Box<Observer>) being used here instead of a plain trait-object reference. Sometimes it’s much easier just to box things up than to bother about lifetimes and such.

The Observer trait is more interesting. It has a single method, notify, which the observable object uses to call back its observers. The type of value received from the observable object is made an associated type of the Observer trait. I find this to be a much cleaner approach than parameterizing the trait itself over a generic type.
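To illustrate the difference (my sketch, with hypothetical Logger/Counter types): a generic type parameter lets one struct implement the trait several times, once per type, whereas an associated type pins exactly one Item per implementation – which is what we want for an observer that watches one kind of value:

```rust
// Generic-parameter flavour: Logger implements the trait twice.
trait GenericObserver<T> {
    fn notify(&self, val: &T) -> String;
}

struct Logger;

impl GenericObserver<u32> for Logger {
    fn notify(&self, val: &u32) -> String {
        format!("u32: {}", val)
    }
}

impl GenericObserver<String> for Logger {
    fn notify(&self, val: &String) -> String {
        format!("String: {}", val)
    }
}

// Associated-type flavour: Counter commits to exactly one Item.
trait AssocObserver {
    type Item;
    fn notify(&self, val: &Self::Item) -> String;
}

struct Counter;

impl AssocObserver for Counter {
    type Item = u32;
    fn notify(&self, val: &u32) -> String {
        format!("got {}", val)
    }
}

fn main() {
    println!("{}", GenericObserver::<u32>::notify(&Logger, &42));
    println!("{}", Logger.notify(&"hi".to_string()));
    println!("{}", Counter.notify(&7));
}
```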

A simple test run:

Macushla:rust-observer z0ltan$ cargo run
   Compiling rust_observer v0.1.0 (file:///Users/z0ltan/Projects/Playground/DesignPatterns/Observer/Rust/rust-observer)
    Finished dev [unoptimized + debuginfo] target(s) in 0.61 secs
     Running `target/debug/rust_observer`
bar got 2
baz got 2
quux got 2
bar got 4
baz got 4
quux got 4
bar got 6
baz got 6
quux got 6
bar got 8
baz got 8
quux got 8
bar got 10
baz got 10
quux got 10
bar got 12
baz got 12
quux got 12
bar got 14
baz got 14
quux got 14
^C

Very satisfying indeed!


A visitor pattern implementation in Rust

I have recently been studying the ANTLR parser-generator framework, and one of its core approaches is separating the core parsing logic from implementation details. It achieves this using two major patterns – the Listener pattern and the Visitor pattern. This is easily achieved in an Object-oriented language like Java. I was curious to see how that would translate into a language like Rust, hence this post.

Rust, while not quite an object-oriented language, does support both static and dynamic dispatch. In this case, we are interested in dynamic dispatch, and “trait objects” are the main way in which Rust supports this feature. However, one problem with using trait objects is that the runtime doesn’t directly provide any way of retrieving the concrete type behind one, which means that calling struct-specific methods on the fly isn’t straightforward. The other problem (specific to the Visitor pattern) is that Rust doesn’t have method overloading. This means that the only sane approach to implementing the pattern in Rust is to match against the trait object and invoke the appropriate logic, which leads us back to the first point.

A bit of Googling leads us to a practical (if not so elegant) approach of determining the concrete type that implemented the trait at runtime by using std::any::Any. This approach entails providing a method that returns a trait object of type Any for each concrete type that implements the trait, and also explicitly checking the concrete type during pattern matching – there really is no way of extracting the concrete type directly. Instead, we have to try downcasting the trait object to each concrete type and checking if we have a match or not. While this approach works fine for our custom types (the types and count of which we know), this clearly wouldn’t work in a scenario where we are working with third-party types.
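Stripped of the visitor machinery, the Any-based type recovery described above looks like this (a minimal sketch with hypothetical Circle/Square types, my addition):

```rust
use std::any::Any;

trait Shape {
    // Each implementor exposes itself as &dyn Any for downcasting.
    fn as_any(&self) -> &dyn Any;
}

struct Circle { radius: f64 }
struct Square { side: f64 }

impl Shape for Circle {
    fn as_any(&self) -> &dyn Any { self }
}

impl Shape for Square {
    fn as_any(&self) -> &dyn Any { self }
}

fn describe(shape: &dyn Shape) -> String {
    // There is no way to ask "which concrete type is this?" directly;
    // we must try each known type in turn – hence the remark above
    // about this approach failing for third-party types.
    if let Some(c) = shape.as_any().downcast_ref::<Circle>() {
        format!("circle with radius {}", c.radius)
    } else if let Some(s) = shape.as_any().downcast_ref::<Square>() {
        format!("square with side {}", s.side)
    } else {
        "unknown shape".to_string()
    }
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Circle { radius: 1.0 }),
        Box::new(Square { side: 2.0 }),
    ];
    for s in &shapes {
        println!("{}", describe(s.as_ref()));
    }
}
```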

In any case, here is the code implementing a simple Visitor pattern where we visit each component of a House type, and perform some custom processing independent of the actual data structures.

Here is the overall layout of the code:

Macushla:rust_visitor z0ltan$ tree
.
├── Cargo.lock
├── Cargo.toml
└── src
    ├── lib.rs
    ├── main.rs
    └── visitors.rs

1 directory, 5 files

Here is the client code, main.rs:

extern crate rust_visitor;

use rust_visitor::*;

fn main() {
    let house = House::new();

    // simply print out the house elements
    house.accept(&visitors::HouseElementListVisitor::new());
   
    println!();
     
    // do something with the elements of a house
    house.accept(&visitors::HouseElementDemolishVisitor::new());
}

So we can see that we create the House element, and then pass in two different implementations of the HouseElementVisitor trait.

Here is the code defining the components of a House, lib.rs:

use std::any::Any;

pub mod visitors;

use visitors::HouseElementVisitor;

/// This represents an element of a house.
/// Each type that implements this trait
/// supports being processed by a visitor.
pub trait HouseElement {
    fn accept(&self, visitor: &HouseElementVisitor);
    fn as_any(&self) -> &Any;
}


/// Define the house entity and its elements

pub struct House {
    components: Vec<Box<HouseElement>>,
}

impl House {
    pub fn new() -> Self {
        House {
            components: vec![Box::new(Livingroom::new()), Box::new(Bedroom::new()), 
                            Box::new(Bathroom::new()), Box::new(Kitchen::new())],
        }
    }
}

impl HouseElement for House {
    fn accept(&self, visitor: &HouseElementVisitor) {
        for component in self.components.iter() {
            component.accept(visitor);
        }

        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}

struct Livingroom {}

impl Livingroom {
    fn new() -> Self {
        Livingroom{}
    }
}

impl HouseElement for Livingroom {
    fn accept(&self, visitor: &HouseElementVisitor) {
        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}


struct Bedroom {}

impl Bedroom {
    fn new() -> Self {
        Bedroom {}
    }
}

impl HouseElement for Bedroom {
    fn accept(&self, visitor: &HouseElementVisitor) {
        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}


struct Bathroom {}

impl Bathroom {
    fn new() -> Self {
        Bathroom {}
    }
}

impl HouseElement for Bathroom {
    fn accept(&self, visitor: &HouseElementVisitor) {
        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}


pub struct Kitchen {}

impl Kitchen {
    fn new() -> Self {
        Kitchen {}
    }
}

impl HouseElement for Kitchen {
    fn accept(&self, visitor: &HouseElementVisitor) {
        visitor.visit(self);
    }

    fn as_any(&self) -> &Any { self }
}

As can be seen, each implementation of the HouseElement trait defines a method, as_any, that returns self as the trait object &Any. This is what lets a visitor implementation match on the actual concrete type and carry out the processing specific to that visitor and that particular HouseElement component.
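A stripped-down sketch of why the as_any method has to live on each implementor (hypothetical `Element` trait, current `dyn Trait` syntax): there is no built-in upcast from `&dyn Element` to `&dyn Any`, but inside an impl block `self` is the concrete type, where the coercion to `&dyn Any` is allowed.

```rust
use std::any::Any;

// The `as_any` idiom: each implementor hands out `self` as `&dyn Any`
// so that callers holding only a trait object can still downcast.
trait Element {
    fn as_any(&self) -> &dyn Any;
}

struct Door;

impl Element for Door {
    // Here `self` is a concrete `&Door`, so coercing it to
    // `&dyn Any` is permitted.
    fn as_any(&self) -> &dyn Any { self }
}

fn main() {
    let e: &dyn Element = &Door;
    // Through `as_any`, the concrete type can be recovered.
    assert!(e.as_any().downcast_ref::<Door>().is_some());
    println!("e is a Door");
}
```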

Finally, here is the actual implementation of the visitors:

//! This module defines the `HouseElementVisitor` trait which defines the capabilities
//! of a type to process the components of a `HouseElement` entity, and also defines
//! a couple of visitor types that do completely different processing.

use ::*;

pub trait HouseElementVisitor {
    fn visit(&self, element: &HouseElement);
}

pub struct HouseElementListVisitor {}

impl HouseElementListVisitor {
    pub fn new() -> Self {
        HouseElementListVisitor {}
    }
}

impl HouseElementVisitor for HouseElementListVisitor {
    fn visit(&self, element: &HouseElement) {
        if element.as_any()
            .downcast_ref::<House>()
            .is_some() {
            println!("Visiting the House...");
        }

        if element.as_any()
            .downcast_ref::<Livingroom>()
            .is_some() {
            println!("Visiting the Living Room...");
        }

        if element.as_any()
            .downcast_ref::<Bedroom>()
            .is_some() {
            println!("Visiting the bedroom...");
        }

        if element.as_any()
            .downcast_ref::<Kitchen>()
            .is_some() {
                println!("Visiting the kitchen...");
        }
    }
}


pub struct HouseElementDemolishVisitor {}

impl HouseElementDemolishVisitor {
    pub fn new() -> Self {
        HouseElementDemolishVisitor {}
    }
}

impl HouseElementVisitor for HouseElementDemolishVisitor {
    fn visit(&self, element: &HouseElement) {
        if element.as_any()
            .downcast_ref::<House>()
            .is_some() {
            println!("Annihilating the House...!!!");
        }

        if element.as_any()
            .downcast_ref::<Livingroom>()
            .is_some() {
            println!("Bombing the Living Room...!!!");
        }

        if element.as_any()
            .downcast_ref::<Bedroom>()
            .is_some() {
            println!("Decimating the bedroom...!!!");
        }

        if element.as_any()
            .downcast_ref::<Kitchen>()
            .is_some() {
                println!("Destroying the kitchen...!!!");
        }
    }
}

Dirty, but at least it works. The problem, as mentioned before, is that in order to perform visitor-specific processing via dynamic dispatch, we receive the trait object &HouseElement; but in order to act on the concrete type, we have to try downcasting the reference to each concrete type in turn. This is what each element.as_any().downcast_ref call does.

Let’s try it out!

Macushla:rust_visitor z0ltan$ cargo run
   Compiling rust_visitor v0.1.0 (file:///Users/z0ltan/Projects/Blog/Visitor_in_Rust/rust_visitor)
    Finished dev [unoptimized + debuginfo] target(s) in 0.56 secs
     Running `target/debug/rust_visitor`
Visiting the Living Room...
Visiting the bedroom...
Visiting the kitchen...
Visiting the House...

Bombing the Living Room...!!!
Decimating the bedroom...!!!
Destroying the kitchen...!!!
Annihilating the House...!!!

Excellent! Clunky and inelegant, but at least it works.

UPDATE: The implementations of the HouseElementVisitor trait can be simplified into a more elegant match expression using the is method of the Any trait, as shown below:

Updated HouseElementListVisitor:

impl HouseElementVisitor for HouseElementListVisitor {
    fn visit(&self, element: &HouseElement) {
        match element.as_any() {
            house if house.is::<House>() => println!("Visiting the house..."),
            living if living.is::<Livingroom>() => println!("Visiting the Living room..."),
            bedroom if bedroom.is::<Bedroom>() => println!("Visiting the bedroom..."),
            kitchen if kitchen.is::<Kitchen>() => println!("Visiting the kitchen..."),
            _ => {}
        }
    }
}

And the HouseElementDemolishVisitor implementation:

impl HouseElementVisitor for HouseElementDemolishVisitor {
    fn visit(&self, element: &HouseElement) {
        match element.as_any() {
            house if house.is::<House>() => println!("Annihilating the house...!!!"),
            living if living.is::<Livingroom>() => println!("Bombing the Living room...!!!"),
            bedroom if bedroom.is::<Bedroom>() => println!("Decimating the bedroom...!!!"),
            kitchen if kitchen.is::<Kitchen>() => println!("Destroying the kitchen...!!!"),
            _ => {}
        }
    }
}

Much more readable in my opinion, and it also hides the somewhat ugly downcast_ref calls!

A visitor pattern implementation in Rust