Talking with a Lisp

Technical z0ltanspeak

A Simple Test Framework (STF) in Java


The idea for this blog post arose when I decided to implement a simple, basic version of the Untyped Lambda Calculus in Java. As part of the development process, I realised that the simple test code I was writing was fast ballooning in size. Instead of investing in a relatively heavy framework like the ubiquitous JUnit, I decided that it would be a nice educational experience to convert the testing code into a small test framework of sorts. I call it STF (the Simple Test Framework).

STF is a tiny piece of code that runs without any dependencies beyond the standard JDK. I had considered implementing it in a way that would work with any JDK from version 5 upwards but, given the widespread adoption of Java 8 and the fact that this is not intended to be a production-ready library, I thought it best to use features that require JDK 7 (or above). It is also implemented as a plain project, without Maven, Gradle, or any other build tool. It can be run directly by invoking the STFRunner class with the test class as the argument (refer to the demo section). I had contemplated putting this code up on GitHub or some similar site, but since it is a basic framework that fits well within a blog post, I decided to post it here in its entirety. Full-fledged projects will, in the future, be posted on an online repository to enable easy access and experimentation, with the relevant discussion appearing here in the form of blog posts.

Basic Design

The basic design goals of this small project were:

  • Provide support for defining tests using the @Test annotation.
  • Support @Before and @After annotations for setup and clean-up code.
  • Only these annotations are supported, and none of them takes elements: @Before, @After, and @Test.
  • No support for test suites (though it’s easy to extend the framework to support this).
  • Log the output using the built-in Java logging framework.
  • Execute a test class using: java -cp . com.z0ltan.stf.STFRunner <test-class>. A custom class loader loads the test class (compiling the source file first if necessary) and executes the test cases defined therein.
  • No support for line number or line of code where the error occurred i.e., no associated source code information is presented. Only the test method which failed will be logged.
  • Support only for asserts — no support for matching like in JUnit.
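
To give a flavour of the intended usage before diving into the implementation, here is a minimal sketch of a test class written against STF (the class QuickPreview is purely illustrative); the complete, working demo classes appear in the Demos section further below:

package com.z0ltan.testing;

import com.z0ltan.stf.Before;
import com.z0ltan.stf.Test;
import com.z0ltan.stf.After;
import static com.z0ltan.stf.STFAsserts.*;

public class QuickPreview {
        private int answer;

        @Before
        public void setup() {
                answer = 42;
        }

        @Test
        public void testAnswer() {
                assertEquals(answer, 42);
        }

        @After
        public void cleanup() {
                answer = 0;
        }
}

Running java -cp . com.z0ltan.stf.STFRunner com/z0ltan/testing/QuickPreview would then compile the source file (if needed), load the class, and execute its test cases.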

The basic layout of STF is amply demonstrated by the following UML diagram:

UML diagram for STF

The STFRunner class is the entry point to the STF framework. Its responsibility is to use a custom class loader, STFClassLoader, to load the test class, create an instance of the STFCore class, and pass it the test class.

The STFCore class then creates an instance of the test class, executes any code in the method marked with @Before (only the first method found with this annotation is executed), runs all the test cases in a non-deterministic order, logging which ones PASSED or FAILED, and finally performs any cleanup code present in the method marked with @After (again, only the first method found with this annotation is executed).

The most important class in STF (from a testing perspective) is STFAsserts. This class provides all of the assertion facilities for STF. There are basic asserts for equality, identity (and non-identity), boolean conditions, nulls, and non-nulls, covering not only primitive types but also object types and arrays.

Implementation

The entire implementation is copied here for reference. Explanations are provided for the core classes and eschewed for the supporting classes/interfaces.

Before

package com.z0ltan.stf;

import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Documented
@Retention(RetentionPolicy.RUNTIME)
public @interface Before {
}

After

package com.z0ltan.stf;

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Documented;

@Documented
@Retention(RetentionPolicy.RUNTIME)
public @interface After {
}

Test

package com.z0ltan.stf;

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Documented;

@Documented
@Retention(RetentionPolicy.RUNTIME)
public @interface Test {
}

STFRunner

package com.z0ltan.stf;

public class STFRunner {
        public static void main(String[] args) {
                if (args == null || args.length != 1)
                        throw new RuntimeException("Specify the test file");

                STFCore core = new STFCore(args[0]);
                core.run();
        }
}

STFCore

package com.z0ltan.stf;

import java.util.logging.Logger;

import java.lang.reflect.Method;
import java.lang.reflect.InvocationTargetException;
import java.lang.annotation.Annotation;

public class STFCore {
        private static Logger logger = Logger.getLogger(STFCore.class.getName());
        private String file;
        private STFClassLoader loader;

        public STFCore(String file) {
                this.file = file;
                this.loader = new STFClassLoader();
        }

        public void run() {
                try {
                        Class<?> clazz = loader.findClass(this.file);
                        Object testObj = clazz.getConstructor().newInstance();

                        Method[] methods = clazz.getDeclaredMethods();

                        // Setup code (if present)
                        runBefore(testObj, methods);

                        // Run the test
                        runTests(testObj, methods);

                        // Cleanup code (if present)
                        runAfter(testObj, methods);
                } catch (Throwable ex) {
                        logger.severe("Error encountered. Message = " +
                                ex.getLocalizedMessage());
                        throw new RuntimeException(ex);
                }
        }

        private void runBefore(Object o, Method[] methods) throws Throwable {
                logger.info("Running Setup code");

                for (Method m : methods) {
                        if (containsAnnotation(m, Before.class)) {
                                execute(o, m);
                                break;
                        }
                }                                                        

                logger.info("Finished running Setup code");
        }

        private void runTests(Object o, Method[] methods) throws Throwable {
                for (Method m : methods) {
                        if (containsAnnotation(m, Test.class)) {
                                try {
                                        logger.info("Running Test Case: " + m.getName());
                                        execute(o, m);
                                        logger.info("Test Case: " + m.getName() + " - [PASSED]");
                                } catch (AssertionError err) {
                                        logger.info("Test Case: " + m.getName() +
                                                "- [FAILED]. Reason: " +
                                                err.getLocalizedMessage());
                                        continue;
                                }
                        }
                }
        }

        private void runAfter(Object o, Method[] methods) throws Throwable {
                logger.info("Running Cleanup code");

                for (Method m : methods) {
                        if (containsAnnotation(m, After.class)) {
                                execute(o, m);
                                break;
                        }
                }

                logger.info("Finished running Cleanup code");
        }

        private void execute(Object o, Method m) throws Throwable {
                try {
                        m.invoke(o, (Object[]) null);
                } catch (IllegalAccessException | IllegalArgumentException | InvocationTargetException ex) {
                        if ((ex instanceof InvocationTargetException) &&
                                (ex.getCause().getClass() == java.lang.AssertionError.class)) {
                                throw ex.getCause();
                        }
                        throw new RuntimeException("Error while invoking method: " + m.getName() +
                                ". Reason = " + ex.getLocalizedMessage());
                }
        }

        private boolean containsAnnotation(Method m, Class<?> clazz) {
                Annotation[] annotations = m.getDeclaredAnnotations();
                for (Annotation a : annotations) {
                        if (a.annotationType().equals(clazz))
                                return true;
                }

                return false;
        }
}

Explanatory notes: The STFCore class relies heavily on the introspection capabilities of Java. While not as powerful as that of Common Lisp, it is still far more powerful than similar facilities in most mainstream languages such as C++ and Python. The code is pretty much straightforward – simply use Reflection to create an instance of the test class (which has been loaded by STFClassLoader), retrieve the annotations of interest at each stage (Before, Test, and After), and execute all the test cases.

STFAsserts

package com.z0ltan.stf;

import java.util.List;
import java.text.MessageFormat;

public abstract class STFAsserts {
        private static final double EPS = 1e-9;

        private static final MessageFormat equalsFormatter =
                new MessageFormat("{0} is not equal to {1}");

        private static final MessageFormat nullFormatter =
                new MessageFormat("{0} is not null");

        private static final MessageFormat notNullFormatter =
                new MessageFormat("{0} is null");

        private static final MessageFormat trueFormatter =
                new MessageFormat("{0} is false");

        private static final MessageFormat falseFormatter =
                new MessageFormat("{0} is true");

        private static final MessageFormat sameFormatter =
                new MessageFormat("{0} is not the same object as {1}");

        private static final MessageFormat notSameFormatter =
                new MessageFormat("{0} is the same as {1}");

        private STFAsserts() {}

        // Equals
        public static void assertEquals(int x, int y)  {
                if (x != y) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertEquals(long x, long y)  {
                if (x != y) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertEquals(char x, char y)  {
                if (x != y) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertEquals(float x, float y)  {
                if (!(Math.abs(x-y) <= EPS)) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertEquals(double x, double y)  {
                if (!(Math.abs(x-y) <= EPS)) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertEquals(Object x, Object y)  {
                if (!x.equals(y)) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        // Array equals
        public static void assertArrayEquals(char[] x, char[] y)  {
                assertNotNull(x);
                assertNotNull(y);

                boolean same = true;

                for (int i = 0; i < x.length; i++) {
                        if (x[i] != y[i]) {
                                same = false;
                                break;
                        }
                }

                if (!same) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertArrayEquals(byte[] x, byte[] y)  {
                assertNotNull(x);
                assertNotNull(y);

                boolean same = true;

                for (int i = 0; i < x.length; i++) {
                        if (x[i] != y[i]) {
                                same = false;
                                break;
                        }
                }

                if (!same) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertArrayEquals(short[] x, short[] y)  {
                assertNotNull(x);
                assertNotNull(y);

                boolean same = true;

                for (int i = 0; i < x.length; i++) {
                        if (x[i] != y[i]) {
                                same = false;
                                break;
                        }
                }

                if (!same) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertArrayEquals(int[] x, int[] y)  {
                assertNotNull(x);
                assertNotNull(y);

                boolean same = true;

                for (int i = 0; i < x.length; i++) {
                        if (x[i] != y[i]) {
                                same = false;
                                break;
                        }
                }

                if (!same) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertArrayEquals(long[] x, long[] y)  {
                assertNotNull(x);
                assertNotNull(y);

                boolean same = true;

                for (int i = 0; i < x.length; i++) {
                        if (x[i] != y[i]) {
                                same = false;
                                break;
                        }
                }

                if (!same) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertArrayEquals(float[] x, float[] y)  {
                assertNotNull(x);
                assertNotNull(y);

                boolean same = true;

                for (int i = 0; i < x.length; i++) {
                        if (!(Math.abs(x[i]-y[i]) < EPS)) {
                                same = false;
                                break;
                        }
                }

                if (!same) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertArrayEquals(double[] x, double[] y)  {
                assertNotNull(x);
                assertNotNull(y);

                boolean same = true;

                for (int i = 0; i < x.length; i++) {
                        if (!(Math.abs(x[i]-y[i]) < EPS)) {
                                same = false;
                                break;
                        }
                }

                if (!same) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertArrayEquals(Object[] x, Object[] y)  {
                assertNotNull(x);
                assertNotNull(y);

                boolean same = true;

                for (int i = 0; i < x.length; i++) {
                        if (!x[i].equals(y[i])) {
                                same = false;
                                break;
                        }
                }

                if (!same) {
                        final String message =
                                equalsFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        // Same and not same
        public static void assertSame(Object x, Object y)  {
                if (x != y) {
                        final String message =
                                sameFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }

        public static void assertNotSame(Object x, Object y)  {
                if (x == y) {
                        final String message =
                                notSameFormatter.format(new Object[] { x, y });
                        throw new AssertionError(message);
                }
        }                                                

        // True and False
        public static void assertTrue(boolean condition)  {
                if (!condition) {
                        final String message =
                                trueFormatter.format(new Object[] { condition });
                        throw new AssertionError(message);
                }
        }

        public static void assertFalse(boolean condition)  {
                if (condition) {
                        final String message =
                                falseFormatter.format(new Object[] { condition });
                        throw new AssertionError(message);
                }
        }

        // Nulls and Non-nulls
        public static void assertNull(Object x)  {
                if (x != null) {
                        final String message =
                                nullFormatter.format(new Object[] { x });
                        throw new AssertionError(message);
                }
        }

        public static void assertNotNull(Object x)  {
                if (x == null) {
                        final String message =
                                notNullFormatter.format(new Object[] { "null" });
                        throw new AssertionError(message);
                }
        }
}

STFClassLoader

package com.z0ltan.stf;

import javax.tools.ToolProvider;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;

import java.util.Locale;
import java.util.Arrays;
import java.util.logging.Logger;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

import java.io.File;
import java.io.FileInputStream;
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

import java.text.MessageFormat;
import java.lang.reflect.Method;

public class STFClassLoader extends ClassLoader {
        private static final Logger logger = Logger.getLogger(STFClassLoader.class.getName());
        private static final Pattern p = Pattern.compile("([a-zA-Z0-9\\/]+).?([a-zA-Z0-9]+)?");

        public STFClassLoader() {
                super(STFClassLoader.class.getClassLoader());
        }

        @Override
        public Class<?> findClass(String name) throws ClassNotFoundException {
                if (name == null)
                        throw new RuntimeException("no class name provided");

                logger.info("Loading class: " + name);

                byte[] classBytes = null;
                String className = null;
                Matcher m = p.matcher(name);

                if (m.matches()) {
                        final String baseName = m.group(1);
                        final String extension = m.group(2);
                        final String sourceFileName = baseName + ".java";
                        final String classFileName = baseName + ".class";

                        boolean classFilePresent = Files.exists(Paths.get(classFileName));
                        boolean sourceFilePresent = Files.exists(Paths.get(sourceFileName));
                        boolean forceCompile = classFilePresent? isClassFileOlderThanSourceFile(classFileName, sourceFileName) : false;

                        if (!classFilePresent && !sourceFilePresent) {
                                throw new RuntimeException("Either the source file or the class file must be available!");
                        }

                        if (extension == null || extension.equalsIgnoreCase("class") || extension.equalsIgnoreCase("java")) {
                                if (!classFilePresent || forceCompile)
                                     compileSourceFile(sourceFileName);
                        } else {
                                throw new RuntimeException("invalid class file specified: " + name);
                        }

                        classBytes = findClassBytes(classFileName);

                        char separatorChar = '/';
                        if (baseName.indexOf('\\') != -1)
                                separatorChar = '\\';

                        String definedName = baseName.replace(separatorChar, '.');
                        Class<?> clazz = defineClass(definedName, classBytes, 0, classBytes.length);

                        logger.info("Finished loading class: " + definedName);

                        return clazz;
                } else {
                        throw new ClassNotFoundException();
                }
        }

        private boolean isClassFileOlderThanSourceFile(String classFile, String sourceFile) {
                try {
                        FileTime classTime = Files.getLastModifiedTime(Paths.get(classFile));
                        FileTime sourceTime = Files.getLastModifiedTime(Paths.get(sourceFile));

                        if (classTime.compareTo(sourceTime) < 0)
                                return true;
                } catch (IOException ex) {
                        logger.warning("Error while determining source and class file modified timestamps");
                        return false;
                }

                return false;
        }

        private byte[] findClassBytes(String name) {
                try (BufferedInputStream bin = new BufferedInputStream(new FileInputStream(name));
                     ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
                        byte[] buffer = new byte[4 * 1024];
                        int bytesRead = -1;

                        while ((bytesRead = bin.read(buffer)) != -1) {
                                baos.write(buffer, 0, bytesRead);
                        }

                        return baos.toByteArray();
                } catch (IOException ex) {
                        throw new RuntimeException("Error while loading class file bytes: " +
                                ex.getLocalizedMessage());
                }
        }

        private void compileSourceFile(String fileName) {
                logger.info("Compiling source file: " + fileName);

                File[] files = new File[] { new File(fileName) };

                JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
                DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
                StandardJavaFileManager manager =
                        compiler.getStandardFileManager(collector, Locale.getDefault(), Charset.forName("UTF-8"));
                Iterable<? extends JavaFileObject> units = manager.getJavaFileObjectsFromFiles(Arrays.asList(files));

                compiler.getTask(null, manager, collector, null, null, units).call();

                boolean issuesFound = false;
                for (Diagnostic<? extends JavaFileObject> d : collector.getDiagnostics()) {
                        final String message =
                                MessageFormat.format("Error at line: {0} in source file: {1}. Reason = {2}\n",
                                d.getLineNumber(), d.getSource().toUri(),
                                d.getMessage(Locale.getDefault()));
                        logger.severe(message);
                        issuesFound = true;
                }

                if (issuesFound) {
                        throw new RuntimeException("Aborting testing.... errors found during compilation");
                }
                try {
                        manager.close();
                } catch (IOException ex) {
                        throw new RuntimeException("Compilation failed. Failed to close file manager: " +
                                        ex.getLocalizedMessage());
                }

                logger.info("Finished compiling source file: " + fileName);
        }
}

Explanatory notes: From a purely conceptual viewpoint, this is the most complex class in the whole framework. This custom class loader runs within STF, and its main job is to handle any situation that might present itself when a test file is specified: whether the class file exists, whether it is out of date, and whether the source file itself is present. If the class file is missing, or if it is older than the source file, the source file is automatically compiled and the byte code loaded into the JVM. The compilation is done using pure Java (more details in the previous blog post) and has no dependency on system processes.

Aside from the compilation capabilities of the STFClassLoader, another compelling reason to use a custom class loader for this exercise was to ensure separation of concerns. Due to the visibility rules of class loaders in Java, the loaded class is not visible to the parent class loaders (Bootstrap, Extension, or System class loader). This helps restrict code to within the STF framework. This, in fact, is the way JEE containers as well as OSGi containers work. Java 9 introduces the concept of “modules” into Java, and while it was not designed for the exact same purposes, there is a whole lot of overlap between the two. It should be interesting to see how Java 9 modules affect the Java ecosystem in the years to come. I have my own reservations about that feature, but overall I think it is a step in the right direction (even though it doesn’t completely address OSGi hell!).
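
To make the visibility point concrete, here is a minimal sketch (not part of the STF sources; the class VisibilityDemo is mine, and it assumes the demo test class from the Demos section is available on disk) showing that the test class is defined by STFClassLoader rather than by the application class loader that loaded the calling code:

import com.z0ltan.stf.STFClassLoader;

public class VisibilityDemo {
        public static void main(String[] args) throws Exception {
                STFClassLoader loader = new STFClassLoader();
                Class<?> testClass = loader.findClass("com/z0ltan/testing/PositiveTests");

                // The test class is defined by the custom loader ...
                System.out.println("Test class loader : " + testClass.getClassLoader());

                // ... whereas this class was defined by the application (system) class loader.
                System.out.println("This class loader : " + VisibilityDemo.class.getClassLoader());

                // Delegation only flows from child to parent, so the system class loader has
                // no knowledge of classes defined by STFClassLoader; such classes can only be
                // reached reflectively through the loader instance itself.
        }
}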

Demos

First off, let’s create a JAR file out of the stf project so that we can run it easily from anywhere that we choose to:

Timmy Jose@WIN-3OCJRNT7NO4 MINGW64 ~/Rabota/Blogs/Java_FP (master)
$ cat manifest.txt
Main-Class: com.z0ltan.stf.STFRunner                                 

Timmy Jose@WIN-3OCJRNT7NO4 MINGW64 ~/Rabota/Blogs/Java_FP (master)
$ jar cmf stf.jar manifest.txt com/z0ltan/stf/*.class

Now stf.jar should have been created in the same directory. Let’s create some test files and test them out!

The sample test class (positive cases) that will be used for this demo is listed below:

package com.z0ltan.testing;

import com.z0ltan.stf.Before;
import com.z0ltan.stf.Test;
import com.z0ltan.stf.After;
import static com.z0ltan.stf.STFAsserts.*;

import java.util.List;
import java.util.Arrays;

public class PositiveTests {
        private char c1, c2, c3;
        private int i1, i2, i3;
        private long l1, l2, l3;
        private double d1, d2, d3;
        private String s1, s2, s3;
        private List<Integer> li1, li2, li3;
        private int[] ia1, ia2, ia3;
        private Object o1, o2, o3;

        @Before
        public void setup() {
                c1 = 't'; c2 = 'u'; c3 = 't';
                i1 = 100; i2 = 200; i3 = 100;
                l1 = 12345L; l2 = 54321L; l3 = 12345L;
                d1 = 12.34561; d2 = 12.10289; d3 = 12.34561;

                s1 = "Hello"; s2 = "World"; s3 = "Hello";

                li1 = Arrays.asList(1,2,3,4,5);
                li2 = Arrays.asList(1,2,3,4,5,6,7);
                li3 = li1;

                ia1 = new int[] { 1, 2, 3, 4, 5};
                ia2 = new int[] { 1, 2, 3};
                ia3 = new int[] { 1, 2, 3, 4, 5};

                o1 = null;
                o2 = new Object();
                o3 = o1;
        }

        // primitive types
        @Test
        public void testCharEqualSuccess() {
                assertEquals(c1, c3);
        }

        @Test
        public void testIntEqualSuccess() {
                assertEquals(i1, i3);
        }

        @Test
        public void testLongEqualSuccess() {
                assertEquals(l1, l3);
        }

        @Test
        public void testDoubleEqualSuccess() {
                assertEquals(d1, d3);
        }

        // object types
        @Test
        public void testStringEqualSuccess() {
                assertEquals(s1, s3);
        }

        // Sequence types
        @Test
        public void testListEqualSuccess() {
                assertEquals(li1, li3);
        }

        @Test
        public void testListSameSuccess() {
                assertSame(li1, li3);
        }

        // Null and Non-Null checks
        @Test
        public void testNullSuccess() {
                assertNull(o1);
        }

        @Test
        public void testNonNullSuccess() {
                assertNotNull(o2);
        }

        // True and False checks
        @Test
        public void testTrueSucess() {
                assertTrue(1==1);
        }

        @Test
        public void testFalseSuccess() {
                assertFalse(1==2);
        }

        @After
        public void cleanup() {
                li1 = li2 = li3 = null;
                o1 = o2 = o3 = null;
                ia1 = ia2 = ia3 = null;
        }
}

This can be invoked in any of the following ways:

java -jar stf.jar com/z0ltan/testing/PositiveTests.class

or

java -jar stf.jar com/z0ltan/testing/PositiveTests.java

or even

java -jar stf.jar com/z0ltan/testing/PositiveTests

In all of these cases, the STFClassLoader will ensure that if the class file is not present (or if the class file is out of date), the source file will be compiled to get the latest byte code for this test class. The output for this sample test class is as shown below:

Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFClassLoader findClass
INFO: Loading class: com/z0ltan/testing/PositiveTests
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFClassLoader compileSourceFile
INFO: Compiling source file: com/z0ltan/testing/PositiveTests.java
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFClassLoader compileSourceFile
INFO: Finished compiling source file: com/z0ltan/testing/PositiveTests.java
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFClassLoader findClass
INFO: Finished loading class: com.z0ltan.testing.PositiveTests
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runBefore
INFO: Running Setup code
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runBefore
INFO: Finished running Setup code
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testCharEqualSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testCharEqualSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testIntEqualSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testIntEqualSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testStringEqualSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testStringEqualSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testLongEqualSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testLongEqualSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testListSameSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testListSameSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testListEqualSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testListEqualSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testDoubleEqualSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testDoubleEqualSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testFalseSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testFalseSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testNullSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testNullSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testTrueSucess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testTrueSucess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testNonNullSuccess
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testNonNullSuccess - [PASSED]
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runAfter
INFO: Running Cleanup code
Aug 25, 2016 5:47:12 PM com.z0ltan.stf.STFCore runAfter
INFO: Finished running Cleanup code

As can be seen from the logged output, since this is the first run, the test file is compiled and the byte code is then loaded using the custom class loader. On subsequent runs, if the source file is newer than the class file, the source will be recompiled automatically before being loaded. This is very useful because it allows quick modifications and re-runs without the user having to recompile everything manually after every change, which helps maintain mental flow.

And just to check and ensure that negative test cases are caught as expected, let’s simply create a test for some negative cases (pretty much the same cases as those in the positive tests, but with the conditions reversed):

package com.z0ltan.testing;

import com.z0ltan.stf.Before;
import com.z0ltan.stf.Test;
import com.z0ltan.stf.After;
import static com.z0ltan.stf.STFAsserts.*;

import java.util.List;
import java.util.Arrays;

public class NegativeTests {
        private char c1, c2, c3;
        private int i1, i2, i3;
        private long l1, l2, l3;
        private double d1, d2, d3;
        private String s1, s2, s3;
        private List<Integer> li1, li2, li3;
        private int[] ia1, ia2, ia3;
        private Object o1, o2, o3;

        @Before
        public void setup() {
                c1 = 't'; c2 = 'u'; c3 = 't';
                i1 = 100; i2 = 200; i3 = 100;
                l1 = 12345L; l2 = 54321L; l3 = 12345L;
                d1 = 12.34561; d2 = 12.10289; d3 = 12.34561;

                s1 = "Hello"; s2 = "World"; s3 = "Hello";

                li1 = Arrays.asList(1,2,3,4,5);
                li2 = Arrays.asList(1,2,3,4,5,6,7);
                li3 = li1;

                ia1 = new int[] { 1, 2, 3, 4, 5};
                ia2 = new int[] { 1, 2, 3};
                ia3 = new int[] { 1, 2, 3, 4, 5};

                o1 = null;
                o2 = new Object();
                o3 = o1;
        }

        // primitive types
        @Test
        public void testCharEqualFail() {
                assertEquals(c1, c2);
        }

        @Test
        public void testIntEqualFail() {
                assertEquals(i1, i2);
        }

        @Test
        public void testLongEqualFail() {
                assertEquals(l1, l2);
        }

        @Test
        public void testDoubleEqualFail() {
                assertEquals(d1, d2);
        }

        // object types
        @Test
        public void testStringEqualFail() {
                assertEquals(s1, s2);
        }

        // Sequence types
        @Test
        public void testListEqualFail() {
                assertEquals(li1, li2);
        }

        @Test
        public void testListSameFail() {
                assertSame(li1, li2);
        }

        // Null and Non-Null checks
        @Test
        public void testNullFail() {
                assertNull(o2);
        }

        @Test
        public void testNonNullFail() {
                assertNotNull(o1);
        }

        // True and False checks
        @Test
        public void testTrueFail() {
                assertTrue(1==100);
        }

        @Test
        public void testFalseFail() {
                assertFalse(1==1);
        }

        @After
        public void cleanup() {
                li1 = li2 = li3 = null;
                o1 = o2 = o3 = null;
                ia1 = ia2 = ia3 = null;
        }
}

Let’s give it a go and see that the output is as expected:

Aug 25, 2016 6:02:29 PM com.z0ltan.stf.STFClassLoader findClass
INFO: Loading class: com/z0ltan/testing/NegativeTests.java
Aug 25, 2016 6:02:29 PM com.z0ltan.stf.STFClassLoader compileSourceFile
INFO: Compiling source file: com/z0ltan/testing/NegativeTests.java
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFClassLoader compileSourceFile
INFO: Finished compiling source file: com/z0ltan/testing/NegativeTests.java
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFClassLoader findClass
INFO: Finished loading class: com.z0ltan.testing.NegativeTests
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runBefore
INFO: Running Setup code
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runBefore
INFO: Finished running Setup code
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testDoubleEqualFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testDoubleEqualFail- [FAILED]. Reason: 12.346 is not equal to 12.103
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testStringEqualFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testStringEqualFail- [FAILED]. Reason: Hello is not equal to World
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testLongEqualFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testLongEqualFail- [FAILED]. Reason: 12,345 is not equal to 54,321
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testListEqualFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testListEqualFail- [FAILED]. Reason: [1, 2, 3, 4, 5] is not equal to [1, 2, 3, 4, 5, 6, 7]
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testCharEqualFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testCharEqualFail- [FAILED]. Reason: t is not equal to
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testIntEqualFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testIntEqualFail- [FAILED]. Reason: 100 is not equal to 200
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testNullFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testNullFail- [FAILED]. Reason: java.lang.Object@14faf53 is not null
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testFalseFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testFalseFail- [FAILED]. Reason: true is true
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testListSameFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testListSameFail- [FAILED]. Reason: [1, 2, 3, 4, 5] is not the same object as [1, 2, 3, 4, 5, 6, 7]
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testNonNullFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testNonNullFail- [FAILED]. Reason: null is null
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Running Test Case: testTrueFail
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runTests
INFO: Test Case: testTrueFail- [FAILED]. Reason: false is false
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runAfter
INFO: Running Cleanup code
Aug 25, 2016 6:02:30 PM com.z0ltan.stf.STFCore runAfter
INFO: Finished running Cleanup code

Immaculate!

Learnings and Conclusion

The main aim of writing this framework was to see how a small, lightweight, and minimalist test framework might be written that was also sufficiently powerful and generic to be usable in a wide variety of situations. In that respect, I feel that this project has been successful.

In terms of learning, the following observations might be made:

  • A simple test framework can be readily implemented in only a few classes and a few hundred lines of code.
  • Using Reflection in Java is always a tricky affair. It feels rather stilted, especially after using dynamic languages (Common Lisp, for instance) which offer far more powerful introspection and runtime modification of objects. For instance, as can be seen in STFCore, InvocationTargetException gobbles up any exception thrown by the reflective call, whether checked or unchecked, which is why the explicit check for the cause of the exception has to be made (a small sketch of this behaviour follows this list). Very clunky indeed!
  • There is a lot of boilerplate code in the STFAsserts file for providing overloaded versions of various methods. Even if some of the common logic were to be abstracted away into some helper methods, that would not really help with the overall code bloat. In Common Lisp, I could have simply written a macro to generate the code and saved dozens of methods of code. Even C++’s metaprogramming capabilities would have been better than Java’s facilities.
  • Implementing a test framework is simple enough. Making it completely safe (think concurrency) and efficient, however, is an entirely different ball game. This version is safe enough because each test class is run in a separate JVM process, but it is far from efficient or extensible.
  • Understanding Class Loaders is a crucial part of any well-rounded developer’s repertoire. There is no better way to truly grok it than to implement one yourself!
  • Finally, JShell really helped! It seriously saved me a ton of mundane typing while testing out various snippets of code and corner cases.
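
Since this wrapping behaviour trips up many people, here is a small standalone sketch (the class and method names are purely illustrative) showing how the real error thrown by a reflectively invoked method has to be dug out of the InvocationTargetException wrapper:

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class WrappingDemo {
        public void failingCheck() {
                // Simulates a failed assertion inside a test method.
                throw new AssertionError("deliberate failure");
        }

        public static void main(String[] args) throws Exception {
                Method m = WrappingDemo.class.getMethod("failingCheck");

                try {
                        m.invoke(new WrappingDemo());
                } catch (InvocationTargetException ex) {
                        // The reflective call wraps whatever the target threw, checked or
                        // unchecked, so the real error must be retrieved via getCause().
                        System.out.println("Wrapped in : " + ex.getClass().getName());
                        System.out.println("Real cause : " + ex.getCause());
                }
        }
}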

Some improvements that could be implemented fairly trivially:

  • Support annotations with elements: @Test(expected=MyException.class), for instance (see the sketch after this list).
  • Linking with source code to provide granular information about where in the code the specific test failed.
  • Show a summary of how many test cases have passed and how many have failed as well as the total number of test cases executed.
  • Support for full-fledged test suites.
  • Linking with IDEs such as Eclipse, NetBeans, and IntelliJ.
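
For the first of these improvements, one possible direction (a sketch only, not part of STF; the design mirrors the Test.None sentinel that JUnit 4 uses) would be to add an element with a default value to the @Test annotation, so that existing tests keep working unchanged:

package com.z0ltan.stf;

import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Documented
@Retention(RetentionPolicy.RUNTIME)
public @interface Test {
        // Sentinel type meaning "no exception is expected".
        class None extends Throwable {
                private None() {}
        }

        Class<? extends Throwable> expected() default None.class;
}

STFCore.runTests would then read the element via m.getAnnotation(Test.class).expected(), mark the test as PASSED when the expected exception is thrown, and as FAILED when the method completes normally or throws something else.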

In some future post, I will discuss class loaders in Java and the implications of creating custom ones. This is a very important topic, and creating my own class loader for the STF framework has given me a fresh perspective on this whole esoteric corner of Java (more precisely, of the JVM).

For now the next few posts will be related to the aforementioned functional implementations of the Untyped Lambda Calculus in Java, Common Lisp, and in C++. The Java version of this functional library will make extensive use of the STF framework for its testing. Note that the aim will be to not only create a functional library in each of these languages, but also to implement the libraries themselves as functionally as possible. In the future, I might write versions of this library in Python and Haskell as well. Haskell especially is indispensable when studying Functional Programming from a pragmatic viewpoint.

Written by Timmy Jose

August 25, 2016 at 6:42 pm

Posted in Java, Programming


Compiling Java files dynamically using pure Java


Java 6 (and above) provides a wonderful facility for compiling Java code from Java itself, through the tools provided in the javax.tools package. This allows us to load and compile batches of files without needing to spawn a separate system process, which is not only brittle but also highly unreliable in terms of error handling and status reporting.

The idea for this post came about as I was working on creating a custom class loader for use with my STF (Simple Test Framework) testing framework (next post). The main use case that I wanted to support was to allow the user to specify a class name and have the class loader compile the source file if the class file didn’t exist, and then load the freshly compiled class. The advantage of this is that the whole process is completely transparent to the user: if the test file has not already been compiled, the class loader takes care of that. The onus of making all the dependencies of the test file available via the class path is, of course, on the user.

Dynamic compilation of Java code Pre-Java 6

Before the introduction of the javax.tools package in Java 6, if we wanted to compile Java source files on the fly, we would have to do something like the following:

import java.util.logging.Logger;
import java.text.MessageFormat;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;


public abstract class CompileJavaFileRuntime {
        private static final Logger logger =  Logger.getLogger(CompileJavaFileRuntime.class.getName());
        private static final MessageFormat formatter = new MessageFormat("javac -cp . {0}");

        private CompileJavaFileRuntime() {}

        public static void main(String[] args) {
                if (args == null || args.length == 0)
                        throw new RuntimeException("specify at least one source file");

                compileSourceFiles(args);
        }

        public static void compileSourceFiles(String[] files) {
                for (String file : files) {
                        compileSourceFile(file);
                }
        }

        public static void compileSourceFile(String file) {
                logger.info("Compiling file: " + file);

                String command = formatter.format(new String[] { file });

                Process p = null;
                try {
                       p = Runtime.getRuntime().exec(command);
                       if (p != null) {
                               p.waitFor();
                               logger.info("Diagnostic info for file: " + file);
                               try (BufferedReader reader = 
                                         new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                                        logger.info(reader.readLine());
                               } catch (IOException ex) {
                               }
                       }
                } catch (InterruptedException | IOException ex) {
                        logger.severe("Error while compiling source file: " + file + 
                           ". Reason = " + ex.getLocalizedMessage());
                        throw new RuntimeException(ex);
                } finally {
                        if (p != null && p.isAlive()) {
                                p.destroy();
                        }
                }
                logger.info("Finished compiling file: " + file);
        }
}

Aug 23, 2016 11:19:25 PM CompileJavaFileRuntime compileSourceFile
INFO: Compiling file: Bar.java
Aug 23, 2016 11:19:26 PM CompileJavaFileRuntime compileSourceFile
INFO: Diagnostic info for file: Bar.java
Aug 23, 2016 11:19:26 PM CompileJavaFileRuntime compileSourceFile
INFO: null
Aug 23, 2016 11:19:26 PM CompileJavaFileRuntime compileSourceFile
INFO: Finished compiling file: Bar.java
Aug 23, 2016 11:19:26 PM CompileJavaFileRuntime compileSourceFile
INFO: Compiling file: ClassClient.java
Aug 23, 2016 11:19:26 PM CompileJavaFileRuntime compileSourceFile
INFO: Diagnostic info for file: ClassClient.java
Aug 23, 2016 11:19:26 PM CompileJavaFileRuntime compileSourceFile
INFO: null
Aug 23, 2016 11:19:26 PM CompileJavaFileRuntime compileSourceFile
INFO: Finished compiling file: ClassClient.java
Aug 23, 2016 11:19:26 PM CompileJavaFileRuntime compileSourceFile
INFO: Compiling file: ClassPreamble.java
Aug 23, 2016 11:19:27 PM CompileJavaFileRuntime compileSourceFile
INFO: Diagnostic info for file: ClassPreamble.java
Aug 23, 2016 11:19:27 PM CompileJavaFileRuntime compileSourceFile
INFO: null
Aug 23, 2016 11:19:27 PM CompileJavaFileRuntime compileSourceFile
INFO: Finished compiling file: ClassPreamble.java

or, better still,

import java.util.logging.Logger;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;


public abstract class CompileJavaFileProcessBuilder {
        private static final Logger logger = Logger.getLogger(CompileJavaFileProcessBuilder.class.getName());

        private CompileJavaFileProcessBuilder() {}

        public static void main(String[] args) {
                if (args == null || args.length == 0)
                        throw new RuntimeException("specify at least one source file");

                compileSourceFiles(args);
        }

        public static void compileSourceFiles(String[] files) {
                for (String file : files) {
                        compileSourceFile(file);
                }
        }

        public static void compileSourceFile(String file) {
                logger.info("Compiling file: " + file);

                Process p = null;
                try {
                       p = new ProcessBuilder("javac", "-cp", ".", file).start();
                       if (p != null) {
                               p.waitFor();
                               logger.info("Diagnostic info for file: " + file);
                               try (BufferedReader reader = 
                                         new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                                        logger.info(reader.readLine());
                               } catch (IOException ex) {
                               }
                       }
                } catch (InterruptedException | IOException ex) {
                        logger.severe("Error while compiling source file: " + file + 
                           ". Reason = " + ex.getLocalizedMessage());
                        throw new RuntimeException(ex);
                } finally {
                        if (p != null && p.isAlive()) {
                                p.destroy();
                        }
                }
                logger.info("Finished compiling file: " + file);
        }
}

Aug 23, 2016 11:18:08 PM CompileJavaFileProcessBuilder compileSourceFile
INFO: Compiling file: Bar.java
Aug 23, 2016 11:18:09 PM CompileJavaFileProcessBuilder compileSourceFile
INFO: Diagnostic info for file: Bar.java
Aug 23, 2016 11:18:09 PM CompileJavaFileProcessBuilder compileSourceFile
INFO: null
Aug 23, 2016 11:18:09 PM CompileJavaFileProcessBuilder compileSourceFile
INFO: Finished compiling file: Bar.java
Aug 23, 2016 11:18:09 PM CompileJavaFileProcessBuilder compileSourceFile
INFO: Compiling file: ClassClient.java
Aug 23, 2016 11:18:10 PM CompileJavaFileProcessBuilder compileSourceFile
INFO: Diagnostic info for file: ClassClient.java
Aug 23, 2016 11:18:10 PM CompileJavaFileProcessBuilder compileSourceFile
INFO: null
Aug 23, 2016 11:18:10 PM CompileJavaFileProcessBuilder compileSourceFile
INFO: Finished compiling file: ClassClient.java

In both cases, we create native OS processes to compile each Java file in turn. Both of these versions are essentially the same, but the ProcessBuilder approach allows much more customisation than is possible with the bare-bones Runtime.getRuntime().exec() approach. For this trivial case, however, the difference is inconsequential.
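
As an illustration of that extra customisation (a sketch only; Bar.java is simply reused from the logs above, and the environment variable is purely illustrative), ProcessBuilder lets us choose the working directory, tweak the child process's environment, and merge its error stream into its output stream before starting it:

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

public abstract class CompileWithProcessBuilderOptions {
        private CompileWithProcessBuilderOptions() {}

        public static void main(String[] args) throws IOException, InterruptedException {
                ProcessBuilder pb = new ProcessBuilder("javac", "-cp", ".", "Bar.java");
                pb.directory(new File("."));                // run javac in a chosen working directory
                pb.environment().put("MY_BUILD_FLAG", "1"); // illustrative environment tweak
                pb.redirectErrorStream(true);               // merge stderr into stdout

                Process p = pb.start();
                try (BufferedReader reader =
                                new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                                System.out.println(line);
                        }
                }
                System.out.println("javac exited with code: " + p.waitFor());
        }
}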

Dynamic compilation of Java code from Java 6 onwards

Consider the following class that can compile the source files passed in as command-line arguments:

import javax.tools.ToolProvider;
import javax.tools.JavaCompiler;
import javax.tools.StandardJavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;

import java.io.File;
import java.io.IOException;

import java.util.logging.Logger;
import java.util.Arrays;
import java.util.Locale;

import java.nio.charset.Charset;
import java.text.MessageFormat;


public abstract class CompileJavaFileTools {
        private static final Logger logger = Logger.getLogger(CompileJavaFileTools.class.getName());

        private CompileJavaFileTools() {}

        public static void main(String[] args) {
                if (args == null || args.length == 0)
                        throw new RuntimeException("specify at least one source file");

                compileSourceFiles(args);
        }

        public static void compileSourceFiles(String[] files) {
                final String fileNames = ToolsHelper.stringArrayToString(files);
                logger.info("Compiling source files: " + fileNames);

                File[] fileArray = new File[files.length];
                for (int i = 0; i < fileArray.length; i++) {
                        fileArray[i] = new File(files[i]);
                }
                                                        
                JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
                DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
                StandardJavaFileManager manager =
                        compiler.getStandardFileManager(collector, Locale.getDefault(), Charset.forName("UTF-8"));
                Iterable<? extends JavaFileObject> compilationUnits =
                        manager.getJavaFileObjectsFromFiles(Arrays.asList(fileArray));

                compiler.getTask(null, manager, collector, null, null, compilationUnits).call();

                for (Diagnostic<? extends JavaFileObject> d : collector.getDiagnostics()) {
                        final String message = MessageFormat.format("Error at line: {0}, in file: {1}\n", 
                                d.getLineNumber(), d.getSource().toUri());
                        logger.severe(message);
                }

                try {
                        manager.close();
                } catch (IOException ex) {
                        logger.severe("Error while closing file manager. Message = " 
                                + ex.getLocalizedMessage());
                }                        

                logger.info("Finished compiling source files: " + fileNames);
        }

        static class ToolsHelper {
                public static String stringArrayToString(String[] strings) {
                        StringBuffer buffer = new StringBuffer();

                        if (strings == null)
                                return null;

                        for (String string : strings) {
                                buffer.append(string);
                                buffer.append(" ");
                        }

                        return "[ " + buffer.toString().trim() + " ]";
                }
        }
}
                                                        
                                
Aug 23, 2016 11:53:59 PM CompileJavaFileTools compileSourceFiles
INFO: Compiling source files: [ Bar.java ClassPreamble.java ]
Aug 23, 2016 11:54:00 PM CompileJavaFileTools compileSourceFiles
INFO: Finished compiling source files: [ Bar.java ClassPreamble.java ]

Explanation:

The sequence of steps to compile source files using the javax.tools package is as follows:

  • Create an instance of the Java Compiler:
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    
  • Create an instance of a DiagnosticCollector (which implements DiagnosticListener). This class is used to log any compilation issues:
    DiagnosticCollector<JavaFileObject> collector = new DiagnosticCollector<>();
    
  • Get a handle to the standard file manager, which manages the source files:
    StandardJavaFileManager manager =
            compiler.getStandardFileManager(collector, Locale.getDefault(), Charset.forName("UTF-8"));
    
  • Create an internal representation of the compilation units (one per source file):
    Iterable<? extends JavaFileObject> compilationUnits =
            manager.getJavaFileObjectsFromFiles(Arrays.asList(fileArray));
    
  • Perform the actual compilation. The parameters passed to the getTask method are – an output stream writer, the file manager, an instance of DiagnosticListener, compiler options, class names (for annotation processing), and the compilation units:
    compiler.getTask(null, manager, collector, null, null, compilationUnits).call();
    
  • Finally, don’t forget to close the file manager handle to prevent any resource leaks:
    try {
        manager.close();
    } catch (IOException ex) {
        logger.severe("Error while closing file manager. Message = "
                + ex.getLocalizedMessage());
    }
    

As simple as it gets! This is not only more reliable, but also more flexible than spawning separate processes to compile Java source files dynamically. Also note that error reporting is much better in the form of diagnostic listeners, which is far superior to parsing error messages from a process’s output stream.

Closing thoughts

The classes and interfaces provided in the javax.tools package give the developer powerful tools for generating byte code dynamically. This can be used not only, as in the example above, to build simple and efficient compiling class loaders, but also to build more complex tools that depend on dynamic byte code generation, such as compilers. All in all, the main advantage of this approach is that there is no dependency on any external tools: the JDK provides all the functionality one needs to write platform-independent, dependable byte code generation code.
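As a rough sketch of the “compiling class loader” idea mentioned above (this is my own illustration, not the STF code, and the class and method names are made up for the example), one might compile a source file into a temporary directory and then load the resulting class like this:

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public final class CompilingClassLoaderDemo {
    private CompilingClassLoaderDemo() {}

    public static Class<?> compileAndLoad(String sourceFile, String className) throws IOException {
        // Compile the source file, writing the .class file(s) into a temporary directory.
        Path outputDir = Files.createTempDirectory("compiled-classes");
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int status = compiler.run(null, null, null, "-d", outputDir.toString(), sourceFile);
        if (status != 0) {
            throw new IllegalStateException("compilation failed for " + sourceFile);
        }

        // Load the freshly compiled class from the temporary directory. The loader is
        // deliberately left open so that the returned class remains usable.
        URLClassLoader loader = new URLClassLoader(new URL[] { outputDir.toUri().toURL() });
        try {
            return loader.loadClass(className);
        } catch (ClassNotFoundException ex) {
            throw new IllegalStateException("compiled class not found: " + className, ex);
        }
    }

    public static void main(String[] args) throws IOException {
        Class<?> clazz = compileAndLoad(args[0], args[1]);
        System.out.println("Loaded: " + clazz.getName());
    }
}

Invoked as, say, java CompilingClassLoaderDemo Foo.java Foo, this compiles Foo.java on the fly and reports the name of the loaded class.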

Written by Timmy Jose

August 24, 2016 at 12:05 am

Functional Programming – a Quick and Dirty Introduction

leave a comment »

What is Functional Programming?

Functional Programming is, despite its newfound popularity in recent times, a rather old programming paradigm. What exactly constitutes Functional Programming is a surprisingly difficult concept to explain today, partly because almost every language tends to do things in its own way and call it “functional”, and developers themselves use the term for a wide variety of situations. In a sense, it has been reduced to a mere buzzword, and there is no real consensus as to what exactly Functional Programming is. In this blog, I will try to give my own perspective on the whole situation and make some sense of this rather messy state of affairs.

Functional Programming is, historically speaking, considered to be a paradigm of programming in which programming is done using only functions. What do we mean by function? We, of course, mean mathematical functions. A function in mathematics is a mapping of each value taken from a set of values (called the Domain of the function) to a unique value in another set (which may be the same set, and which is called the Range of the function). There are various types of functions – bijective, injective, surjective, identity, constant, empty, etc. However, we need not bother ourselves with the distinction. The only important bit to remember is that all functions (whatever sub-type they may be) have the following two essential properties:

  • Each element in the Domain is mapped to some element in the Range
  • Each element in the Domain is mapped to exactly one (unique) element in the Range

For instance, suppose we have f(x) = sin(x); f(x) here defines a function. Let us plot this function to understand it better:

[Plot of f(x) = sin(x)]

Here we have plotted the graph of f(x) using Wolfram Alpha. The function is clearly periodic (in fact, it has a period of 2π radians), and we can see from the graph that the Domain is the set of all real numbers, and the Range is the set of real numbers in the bounded interval [-1, 1]. It is easy to see that both the constraints incumbent upon a function are fulfilled – each real number in the Domain is used (continuous curve), and each real number in the Domain maps to a unique value in the Range (single curve). Note that two elements in the Domain may have the same value in the Range; there is no restriction on that. So just why are we flogging this horse to death? Because it is vitally important to understand unambiguously what a function is and what it is not. A lot of the time, people mistake a relation for a function. A relation is a superset of a function: a relation may map the same element in the Domain to multiple elements in the Range. For instance, “is a son of” is a relation and not a function, because a person is the son of both his mother and his father. What we’re interested in, in Functional Programming, is the concept of a function.

Functional Programming is therefore a paradigm (note the emphasis – the reason will shortly become clear) of programming in which functions form the basis of all processing. Historically, it has its earliest roots in the Lambda Calculus. Alonzo Church developed this beautifully elegant framework early in the 20th Century, and in its most extreme form, even data was created using only single-parameter functions (Church Numerals). However, even if we assume that data exists in a different world compared to functions, the basic implication of using the idea of a mathematical function in programming is that for every input to a function, we get a unique value, and that this is repeatable ad infinitum. This is also known as “referential transparency”, which means that if the arguments to a call are known, the call itself can be replaced by its (constant) result without changing the meaning of the program! This behaviour also means that we can run such functions in parallel, and testing becomes trivial.

As mentioned in the last paragraph, a language doesn’t necessarily have to ensure that it supports only functions (in the mathematical sense). The only constraints are writing functions that generate their output solely on the basis of their parameters (no globals involved), that always generate the same output for the same argument(s), and that have no side-effects within the body of the function. This means that any language can be used in a functional style. However, as we will see in the next section, there are some concepts that are considered the core of Functional Programming, and different languages come with different levels of built-in support for these concepts.
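To make that distinction concrete, here is a tiny illustrative contrast between a pure and an impure function, written in Java since that is the language used for most examples on this blog (the class and method names are mine, purely for the sake of the example):

public class PurityDemo {
    private static int counter = 0;

    // Pure: the result depends only on the arguments, and nothing outside
    // the function is read or modified.
    static int add(int a, int b) {
        return a + b;
    }

    // Impure: reads and mutates shared state, so calling it twice with the
    // same argument yields different results.
    static int addAndCount(int a) {
        counter++;
        return a + counter;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));       // always 5
        System.out.println(addAndCount(2));  // 3 on the first call
        System.out.println(addAndCount(2));  // 4 on the second call
    }
}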

Functional Programming concepts

Functional Programming today is considered to consist of the following core ideas:

    • First-Class functions and Higher-Order functions: These are related concepts which (from a programming perspective) mean that in a language that supports first-class functions (or higher-order functions), a function is treated on par with any other language entity. This means that functions can be bound to variables, defined within other functions, passed to other functions, and returned from other functions. Most of the strength of Functional Programming derives from these features, especially from two crucial ideas – Currying and Partial Application.

      Currying and Partial Application are intimately connected with each other. To understand them better, let us take the help of a simple example. Consider that we wish to write a function that takes three parameters and produces a result. Here’s how we might write it in Haskell:

      mult :: (Num a) => (a, a, a) -> a
      mult (x, y, z) = x * y * z
      
      *Main> mult (1, 2, 3)
      6
      

      As expected, it evaluates the product correctly. Nothing special here – this is exactly how we would write this function in any run-of-the-mill imperative language.

      Now let’s try something else. Let’s try to write this as a function that will accept one parameter at a time:

      mult' :: (Num a) => a -> (a -> (a -> a))
      mult' x y z = x * y * z
      
      *Main> let f = mult' 1
      *Main> :t f
      f :: Num a => a -> a -> a
      
      *Main> let g = f 2
      *Main> :t g
      g :: Num a => a -> a
      
      *Main> let h = g 3
      *Main> :t h
      h :: Num a => a
      
      *Main> h
      6
      

      So what changed compared to the previous function? Observe that the type of the function has changed. Previously, it required a triplet as a single parameter. Here’s how one would read the type signature: mult’ is a function that takes a numeric parameter and returns a function that takes a numeric parameter and returns a function that takes a numeric parameter and which finally returns a numeric value as the return value of the function.

      The main advantage of this is that this allows us to make our multiplication function out of functions of a single parameter each. To notice that better, look at the type signatures of ‘f’, ‘g’, and ‘h’ at each stage. They correspond exactly with the verbal description of mult’. Functions like mult’, which implement multi-parameter functions in terms of functions that take a single parameter each are called “curried” functions. ‘f’, ‘g’, and ‘h’ demonstrate “partial application” of the overall mult’ function whereby we can build up a series of functions from a curried function by supplying an argument at a time. Yes, even ‘h’, which is a value can be considered to be a function in Functional Programming land! First-class functions, remember? Also note how both these concepts point back to the Lambda Calculus as discussed in the previous section. In fact, in order to make this equivalence blatantly clear, we could implement mult’ explicitly in terms of lambdas of a single parameter each (note that \x is the Haskell syntax for a lambda expression that takes the parameter ‘x’):

      mult'' :: (Num a) => a -> a -> a -> a
      mult'' = \x -> \y -> \z -> x * y * z
      
      *Main> let f = mult'' 1
      *Main> :t f
      f :: Num a => a -> a -> a
      
      *Main> let g = f 2
      *Main> :t g
      g :: Num a => a -> a
      
      *Main> let h = g 3
      *Main> :t h
      h :: Num a => a
      
      *Main> h
      6
      

      Exactly the same result as expected. In fact, all functions in Functional Programming can be considered to be built up of single-parameter functions. Nifty, isn’t it? (A small Java sketch of the same currying idea appears right after this list of concepts.)

    • Pure Functions:
      A pure function is a function whose return value is determined entirely by its arguments, and which returns the same value for the same set of arguments each time the function is invoked. As mentioned earlier, this also means that the function must not have any side-effects (such as modifying a global variable, or even accessing the value of a mutable global variable). This also automatically provides the favourable property called “Referential Transparency” explained before. Since I/O is the major source of side-effects, how can we do anything useful with such constraints? Well, different languages deal with this conundrum in different ways – Haskell tries to remain purely functional by using the concept of a “Monad”. Specifically, it uses the IO Monad to wrap IO operations within an abstraction that takes the current state of the world, performs the modifications, and then returns the modified state as the new state of the world. Mathematically speaking, this preserves purity in Haskell even during IO, but of course side-effects have happened in the meantime, so the point can appear moot. Other languages such as Clojure opt for a more pragmatic approach and accept some side-effects as a fact of life. Clojure provides very efficient immutable data structures that ensure mutation is almost never needed, and it has strong STM (Software Transactional Memory) support to wrap operations in transactions that keep the state of the entire system consistent. Yet other languages, such as Common Lisp, put the onus of handling side-effects properly on the user.
    • Recursion: Recursion is the process whereby a function calls itself, either directly or indirectly (mutual recursion). This is, apart from Higher-Order Functions, the most important concept in practical Functional Programming. In fact, most of the earlier Lisps, especially Scheme, depended tremendously on recursion as a substitute for looping (which is another major source of side-effects). This is why most of the languages in the Lisp family (especially Common Lisp and Scheme) have highly-optimised implementations of operations on the list data structure (a list is how S-expressions in Lisp are represented).

      Strangely enough, in Common Lisp, the standard does not mandate Tail-Call Optimisation (TCO). However, most implementations come bundled with TCO. TCO is simply a way of converting recursive calls in the “tail-call position” to a simple branch (goto). This ensures that the stack does not blow up when executing deeply recursive calls. To get an idea of this, consider the following example:

      The factorial of a number is defined as:

      n! = \begin{cases} 1 & \text{if } n = 0 \\ n \cdot (n-1)! & \text{if } n \geq 1 \end{cases}

      This may be directly translated into code in Common Lisp as:

      CL-USER> (defun factorial (n)
                 (if (zerop n)
                     1
                     (* n (factorial (1- n)))))
      FACTORIAL
      
      CL-USER> (factorial 20)
      2432902008176640000
      

      However, the above code will eventually blow the stack for sufficiently large values of ’n’. To ensure that we don’t end up with stack overflow errors, let’s convert this program into a version of the same algorithm that is amenable to TCO:

      CL-USER> (defun factorial (n)
                 (labels ((f (n acc)
                            (if (zerop n)
                                acc
                                (f (1- n) (* acc n)))))
                   (f n 1)))
      FACTORIAL
      
      CL-USER> (factorial 100)
      93326215443944152681699238856266
      70049071596826438162146859296389
      52175999932299156089414639761565
      18286253697920827223758251185210
      916864000000000000000000000000
      

      Why is this version amenable to TCO while the previous one is not? Remember that a function’s arguments are always evaluated before the function call is made. In the previous case, the * operator is the culprit – it has to multiply the result of the recursive call with the value of n, so the recursive call is not the last thing the function does. The current state of each call must therefore be saved on the stack, and only when the recursion bottoms out is the stack unwound, the multiplications performed, and the result returned.

      In the second example, the recursive call to ‘f’ is the very last thing the function does – the running product is carried along in the ‘acc’ parameter instead of being combined after the call returns. A call in this position is a tail call, and it is such calls that the compiler can optimise into a simple branch (TCO). Note, though, that a sufficiently clever compiler may manage to transform the first example as well; in many cases, however, the compiler can’t perform this magic conversion, and the programmer has to ensure that the recursion really is in tail position.

    • Strict vs Non-Strict (Lazy):
      Strict evaluation evaluates an expression and returns the complete value immediately, whereas non-strict (better known as lazy) evaluation only generates as many results as are currently requested. Strict evaluation has the advantage that performance is deterministic under such a strategy. The main disadvantage is that certain operations are not possible. For instance, suppose we want to process a series of numbers and do not know upfront how many of them we’ll be using. Under strict evaluation this is not possible: we have to know the size of the data set before processing it, and if we try to materialise a very large data set, it might cause the whole program to crash.

      Non-strict evaluation solves this very situation by processing only as many elements as currently required (this is how generators, used extensively in Python, work). This even enables us to define infinite streams of data! Non-strict evaluation also has the added benefit that not all branches of a computation need be evaluated, which is not really possible in an eager language. The main problem with non-strict evaluation is that performance is non-deterministic, which can lead to issues when trying to profile a large software application. This is also the reason that traditionally non-strict languages such as Haskell are providing more and more support for strict evaluation while keeping non-strict evaluation as the default mode. Clojure approaches it from the other direction: the language is strict by default, but provides lazy sequences where laziness is needed.

    • Strong Type Systems: A type system is a way of assigning a property called a “type” to entities in the programming language so that they may be classified into sets whose collective properties can then be defined, which in turn allows invalid programs to be rejected. Even though it may appear that a type system is exclusively associated with statically-typed languages, that is not so. Even dynamic languages may be strongly (Common Lisp) or weakly (JavaScript) typed, and the runtime environment is responsible for enforcing correct behaviour. The main difference is that statically-typed languages provide a mechanism to specify (or infer) types for program entities and to have them checked at compile time, whereas dynamically-typed languages check types at run time.

      My personal opinion is that this is not really a core concept of Functional Programming. However, most experts today claim that this is a vital pillar of the whole paradigm. In any case, one cannot truly work with modern functional languages unless one has a good understanding of the different types of type systems in place today. Most of the statically typed functional languages have type systems based on the Hindley-Milner system. Haskell is one such example. Scala, however, despite being strongly typed, cannot fully support Hindley-Milner because of its OOP nature.
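As promised in the discussion of currying above, here is a minimal Java sketch of the same idea (the class name is mine, purely for illustration), expressing the three-argument multiplication as nested single-argument functions using java.util.function.Function:

import java.util.function.Function;

public class CurryingDemo {
    public static void main(String[] args) {
        // The curried equivalent of mult': one argument at a time.
        Function<Integer, Function<Integer, Function<Integer, Integer>>> mult =
                x -> y -> z -> x * y * z;

        // Partial application, mirroring f, g, and h from the Haskell session.
        Function<Integer, Function<Integer, Integer>> f = mult.apply(1);
        Function<Integer, Integer> g = f.apply(2);
        int h = g.apply(3);

        System.out.println(h); // 6
    }
}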

Conclusion and Next Steps

This was a brief but comprehensive introduction to what constitutes Functional Programming today. In the next few posts, I will examine these concepts in a few modern languages (whether considered functional by the Hoi Polloi or otherwise): Common Lisp, Java, and C++. In the near future, I will also examine the same for Python, and most importantly, Haskell. Haskell is arguably the most prominent purely functional language in use today. I will (in the not-too-distant future) write a series of posts discussing and researching Functional Programming in-depth, and my vehicle of choice for that endeavour will be Haskell!

Written by Timmy Jose

August 15, 2016 at 9:30 am

Streams in Java – a Hands-on Approach

leave a comment »

In this blog post, I will discuss the Streams feature in Java 8 (and above) from a pragmatic viewpoint. I will describe it and its uses as I have experienced myself.

What is this Stream business after all?

Some people new to Java often get confused by the term “stream”. Java (before version 8) already used “streams” to mean essentially the same thing that C and C++ use the term for – a conceptual stream of data, whether bytes or characters. This is why there is a plethora of interfaces and classes in the java.io package that have “stream” in their names. However, this is not what the new Streams API is about.

In Java 8, a lot of features were introduced that enhanced support for Functional Programming in Java – lambdas, method references, and streams. A Stream in this new context is simply a way of streaming data through a set of operations that process the data at every stage. It does not really matter how the data came to be in the first place – it might have been read in from a file, a socket, a String, a database, etc. What matters is that the data can be passed through different functional operations and then collected into another data structure or printed out to some output (console, file, database, another stream). The important part is that the data at every stage is preserved – there is no modification of the original data structure (as will be demonstrated through examples later on). This is the reason why it supports functional programming concepts and makes for very terse and (for those who are familiar with it) readable code. Of course, there are pros and cons to this new feature.

Streams therefore operate at a much higher conceptual level than merely processing data through methods.

The Hitchhiker’s Guide to the Streams API in Java

The Stream API is specified in the java.util.stream package. There are a few very important interfaces defined here:

  • Stream
  • IntStream
  • LongStream
  • DoubleStream
  • Collector

Of these, we don’t really make use of Stream and Collector all that much directly. Instead, we use them through the library classes and methods that build on these interfaces. Streams are available from all the major linear collection interfaces – List, Set, Queue, and Deque. Note that this support is provided by virtue of the fact that the Collection interface has a “default method”, default Stream<E> stream(), and also its parallel counterpart, default Stream<E> parallelStream(). The reason (as mentioned in an earlier post) that the Collection interface itself doesn’t extend the Stream interface is to ensure that legacy code is not broken. By their very nature, streams are not provided directly for non-linear data structures such as Map; only linear data structures have a well-defined way of streaming their data for processing by various operations.

The available classes in this package are:

  • Collectors
  • StreamSupport

Of these two, we will only ever be concerned with Collectors. StreamSupport provides low-level APIs that library writers can utilise to create their own versions of Streams.

The Collectors class is an extremely important class that provides methods that return a Collector object. Further processing is then done by the methods of the Stream interface, such as the “collect” method. This will become much clearer when we get down to examples.

Examples of Streams in action

So let’s get down and dirty with it! First off, let’s start up a JShell session and get some of the basic imports that we need for the demo out of the way:

jshell> import java.util.function.*

jshell> import java.util.stream.*

jshell> /imports
|    import java.io.*
|    import java.math.*
|    import java.net.*
|    import java.util.concurrent.*
|    import java.util.prefs.*
|    import java.util.regex.*
|    import java.util.function.*
|    import java.util.stream.*
|    import java.util.*

As we can see, many of the most commonly used packages are already imported by default in JShell. However, the java.util.function and java.util.stream packages need to be explicitly imported.

Now that we’ve got that out of the way, let’s start off with some simple examples.

  • Let’s create a list of names, keep only those names which are longer than 3 characters, convert them to upper case, collect them into a set (removing duplicates), and then print them all out (in no particular order, since the Set produced by Collectors.toSet() makes no ordering guarantees):
    jshell> List<String> names = Arrays.asList("Pat", "Mike", "Sally", "Robert", "Sam", "Matt", "Timmy", "Gennady", "Petr", "Slava", "Zach", "Paula", "Meg", "Matt", "Mike");
    names ==> [Pat, Mike, Sally, Robert, Sam, Matt, Timmy, Gennady, Petr, Slava, Zach, Pau ...
    
    jshell> names.stream().
       ...> filter((s) -> s.length() > 3).
       ...> map(String::toUpperCase).
       ...> collect(Collectors.toSet()).
       ...> forEach(System.out::println)
    GENNADY
    MIKE
    ZACH
    TIMMY
    PETR
    SLAVA
    ROBERT
    MATT
    SALLY
    PAULA
    

    Note that String::toUpperCase is what is known as a “method reference”. Like lambda expressions, method references can be used wherever a functional interface instance is expected. In this case, String::toUpperCase, as used in the context of the “map” operation, is equivalent to the following lambda expression:

     map((s) -> s.toUpperCase())
    

    It is just more convenient and terse to use method references whenever possible in lieu of lambda expressions, especially for well known methods such as in this case.

    And just to emphasise the point that the actual data structure itself has not been modified:

    jshell> names.forEach(System.out::println)
    Pat
    Mike
    Sally
    Robert
    Sam
    Matt
    Timmy
    Gennady
    Petr
    Slava
    Zach
    Paula
    Meg
    Matt
    Mike
    

    The same code before Java 8 might look something like so:

    jshell> interface FilterPredicate<T> {
       ...>          boolean test(T t);
       ...>      }
    |  created interface FilterPredicate
    
    jshell>      
    
    jshell>      interface MapFunction<T, R> {
       ...>          R apply(T t);
       ...>      }
    |  created interface MapFunction
    
    jshell>
    
    jshell> List<String> filterNames(List<String> names, FilterPredicate<String> predicate) {
       ...>          List<String> filteredNames = new ArrayList<>();
       ...>          
       ...>          for (String name : names) {
       ...>              if (predicate.test(name))
       ...>                 filteredNames.add(name);
       ...>          }
       ...>          return filteredNames;
       ...>      }
    |  created method filterNames(List<String>,FilterPredicate<String>)
    
    
    
    jshell>
    
    jshell>      List<String> mapNames(List<String> names, MapFunction<String, String> mapper) {
       ...>          List<String> mappedNames = new ArrayList<>();
       ...>          
       ...>          for (String name : names) {
       ...>              mappedNames.add(mapper.apply(name));
       ...>          }
       ...>          
       ...>          return mappedNames;
       ...>      }
    |  created method mapNames(List<String>,MapFunction<String, String>)
    
    jshell>      
    
    jshell>      Set<String> collectNames(List<String> names) {
       ...>          Set<String> uniqueNames = new HashSet<>();
       ...>          
       ...>          for (String name : names) {
       ...>              uniqueNames.add(name);
       ...>          }
       ...>          
       ...>          return uniqueNames;
       ...>      }
    |  created method collectNames(List<String>)
    
    jshell> Set<String> processedNames = 
       ...>             collectNames(mapNames(filterNames(names, 
       ...>                 new FilterPredicate<String>() {
       ...>                     @Override
       ...>                     public boolean test(String s) {
       ...>                         return s.length() > 3;
       ...>                     }}), new MapFunction<String, String>() {
       ...>                         @Override
       ...>                         public String apply(String name) {
       ...>                             return name.toUpperCase();
       ...>                         }
       ...>                     }));
    processedNames ==> [GENNADY, MIKE, ZACH, TIMMY, PETR, SLAVA, ROBERT, MATT, SALLY, PAULA]
    
    jshell>                     
    
    jshell>      for (String name : processedNames) {
       ...>         System.out.println(name);
       ...>      } 
    GENNADY
    MIKE
    ZACH
    TIMMY
    PETR
    SLAVA
    ROBERT
    MATT
    SALLY
    PAULA
    

    Wow! So much code (and a lot of it boilerplate, especially the repetitive for-each looping) to achieve what essentially took a single chained expression using Java Streams! Also, from a readability perspective, I would argue that the Streams-based version is much more readable and understandable. A lot of the boilerplate in the non-Streams version disrupts one’s flow when reading that code – instead of focusing on the task at hand, one is distracted by all the language-specific cruft. Also note that the pre-Java-8 version of the code was written in a somewhat different way compared to what a lot of Java developers would do. If we had written the code in a purely imperative manner, it would be at least twice as long, with a lot more cognitive overhead to boot.

  • For the second example, let us generate an infinite stream of natural numbers, take the first 100 numbers, keep only the even numbers, double each of them, and finally generate their sum.
    jshell> IntStream.iterate(1, (n) -> n+1).
       ...> limit(100).
       ...> filter((d) -> d%2 == 0).
       ...> map((i) -> i*2).
       ...> sum()
    $29 ==> 5100
    

    How much code would that take without using Streams? Heh. Some explanation of the code: we use the “iterate” method of the IntStream interface to generate an infinite stream of natural numbers by using 1 as the seed value and the second parameter (the lambda expression) as a generator function. Then we use “limit” to take the first 100 values, keep only the even numbers with “filter”, map each remaining value to double its value, and finally call the terminal operation “sum” to produce the sum of the processed stream of numbers.

    At this juncture, it would probably be appropriate to mention that there are two kinds of operations when it comes to streams – non-terminal (intermediate) operations (which take values and produce processed values), and terminal operations (which consume values and produce a final result). “limit”, “filter”, and “map” are examples of the former, whereas “sum” is, as noted, a terminal operation. Intermediate operations are lazy: nothing actually runs until a terminal operation is invoked (a small sketch illustrating this appears right after this list of examples). Knowing this distinction can save a lot of headache when code seemingly doesn’t work as expected.

  • For the final example, let us group together a bunch of students by grade:
    jshell> enum Grade {
       ...>         A, B, C, D, E, F;
       ...>     }
    |  created enum Grade
    
    jshell>
    
    jshell>     class Student {
       ...>         private int id;
       ...>         private String name;
       ...>         private Grade grade;
       ...>         
       ...>         public Student(int id, String name, Grade grade) {
       ...>             this.id = id;
       ...>             this.name = name;
       ...>             this.grade = grade;
       ...>         }
       ...>         
       ...>         public int getId() { return this.id; }
       ...>         public String getName() { return this.name; }
       ...>         public Grade getGrade() { return this.grade; }
       ...>         
       ...>         @Override
       ...>         public String toString() {
       ...>             return "{ " + this.id + ", " + this.name + ", " + this.grade + " }"; 
       ...>         }
       ...>     }
    |  created class Student
    
    jshell>     
    
    jshell>     List<Student> students = 
       ...>     Arrays.asList(new Student(1, "Rich", Grade.F),
       ...>                   new Student(2, "Peter", Grade.A),
       ...>                   new Student(3, "Sally", Grade.B),
       ...>                   new Student(4, "Slava", Grade.B),
       ...>                   new Student(5, "Megan", Grade.C),
       ...>                   new Student(6, "Edward", Grade.D),
       ...>                   new Student(7, "Amanda", Grade.A),
       ...>                   new Student(8, "Petr", Grade.B),
       ...>                   new Student(9, "Susan", Grade.F),
       ...>                   new Student(10, "Arnold", Grade.E));
    students ==> [{ 1, Rich, F }, { 2, Peter, A }, { 3, Sally, B }, { 4, Slava, B }, { 5, Meg …
    
    jshell>
    
    jshell> Map<Grade, List<Student>> studentData = 
       ...>             students.
       ...>             parallelStream().
       ...>             collect(Collectors.
       ...>                     groupingBy(Student::getGrade))
    studentData ==> {A=[{ 2, Peter, A }, { 7, Amanda, A }], E=[{ 10, Arnold, E }], B=[{ 3, Sally ...
    
    jshell>             
    
    jshell> for (Map.Entry<Grade, List<Student>> entry : studentData.entrySet()) {
       ...>          System.out.println("Grade: " + 
       ...>                             entry.getKey() + 
       ...>                             ", Students: " + 
       ...>                             entry.getValue());
       ...>     }
    Grade: A, Students: [{ 2, Peter, A }, { 7, Amanda, A }]
    Grade: E, Students: [{ 10, Arnold, E }]
    Grade: B, Students: [{ 3, Sally, B }, { 4, Slava, B }, { 8, Petr, B }]
    Grade: F, Students: [{ 1, Rich, F }, { 9, Susan, F }]
    Grade: C, Students: [{ 5, Megan, C }]
    Grade: D, Students: [{ 6, Edward, D }]
    

    As can be seen, the Collectors class comes bundled with extremely useful methods to perform almost any conceivable processing on data. In this case, we are simply grouping the students based on their grades.

    Also note the use of “parallelStream”. Java supports parallel streams for operations that can be parallelised (i.e. they don’t have any real dependencies between one another). In this case, since we are processing a bunch of student data and aggregating them into groups based on grade, this is precisely such a situation where we can achieve performance boosts using parallel streams, especially when the data sets grow in size.
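As promised in the note on non-terminal versus terminal operations above, here is a small illustrative sketch (the class name is made up for this example) showing that the intermediate operations do not actually run until a terminal operation is invoked:

import java.util.stream.Stream;

public class LazyStreamDemo {
    public static void main(String[] args) {
        // Build a pipeline of intermediate operations; nothing executes yet.
        Stream<String> pipeline = Stream.of("Pat", "Mike", "Sally")
                .filter(s -> {
                    System.out.println("filtering " + s);
                    return s.length() > 3;
                })
                .map(String::toUpperCase);

        System.out.println("Nothing has been filtered or mapped yet...");

        // Only now, when the terminal operation runs, do filter and map execute.
        pipeline.forEach(System.out::println);
    }
}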

Some things to watch out for

  • Order of operations:
    Since streams provide a very high-level way to process data, a lot of the inner details get hidden from the developer. However, understanding some of these low-level details is crucial when working with streams. The most important part is knowing how the flow of processing takes place in streams.

    Suppose we want to find the sum of all odd numbers (doubled) between 1 and 10^6. We could do it like this:

    jshell> OptionalLong sum = 
       ...>             LongStream.iterate(1, (n) -> n+1).
       ...>                       limit(1_000_000).
       ...>                       map((b) -> b*2).
       ...>                       filter((i) -> i%2 != 0).
       ...>                       reduce((one, two) -> one + two);
    sum ==> OptionalLong.empty
    
    jshell> sum.getAsLong()
    |  java.util.NoSuchElementException thrown: No value present
    |        at OptionalLong.getAsLong (OptionalLong.java:119)
    |        at (#28:1)
    
    

    Why does this not work? It’s because we map each number to double its value before filtering, thereby ensuring that every number reaching the filter is even, so the filter for odd numbers rejects them all and the stream ends up empty. Let’s swap map and filter around and see that it then works:

    jshell> OptionalLong sum = 
       ...>             LongStream.iterate(1, (n) -> n+1).
       ...>                       limit(1_000_000).
       ...>                       filter((i) -> i%2 != 0).
       ...>                       map((b) -> b*2).
       ...>                       reduce((one, two) -> one + two);
    sum ==> OptionalLong[500000000000]
    
    jshell> sum.getAsLong()
    $30 ==> 500000000000
    

    Et voila! This shows how important it is to get the ordering of non-terminal operations correct since they may result in logical errors which cannot be caught by the compiler.

    Now, let’s flip the same example around: suppose we want to calculate the sum of the even numbers (each incremented by 2) from 1 to 10^9. Note that in this case, it doesn’t matter whether we map first and then filter, or filter first and then map – the result is the same (adding 2 to an odd number always produces another odd number, and likewise for an even number). What about performance? Let’s see:

    Let’s map and then filter:

    jshell>     void sumOfEvenNumbersSlow() {
       ...>         long start = System.currentTimeMillis();
       ...>         
       ...>         OptionalLong sum = 
       ...>             LongStream.iterate(1, (n) -> n+1).
       ...>                       limit(1_000_000_000).
       ...>                       map((b) -> b+2).
       ...>                       filter((i) -> i%2 == 0).
       ...>                       reduce((one, two) -> one + two);
       ...>         
       ...>         long end = System.currentTimeMillis();
       ...>         System.out.format("Sum: %d, time taken = %.3fs\n", sum.getAsLong(), (double)(end-start)/1000);
       ...>     }
    |  created method sumOfEvenNumbersSlow()
    
    jshell>
    
    jshell>     sumOfEvenNumbersSlow()
    Sum: 250000001500000000, time taken = 24.378s
    

    Now let’s filter first and then map:

    jshell>     void sumOfEvenNumbersFast() {
       ...>         long start = System.currentTimeMillis();
       ...>         
       ...>         OptionalLong sum = 
       ...>             LongStream.iterate(1, (n) -> n+1).
       ...>                       limit(1_000_000_000).
       ...>                       filter((i) -> i%2 == 0).
       ...>                       map((b) -> b+2).
       ...>                       reduce((one, two) -> one + two);
       ...>         
       ...>         long end = System.currentTimeMillis();
       ...>         System.out.format("Sum: %d, time taken = %.3fs\n", sum.getAsLong(), (double)(end-start)/1000);
       ...>     }
    |  created method sumOfEvenNumbersFast()
    
    jshell>
    
    jshell> sumOfEvenNumbersFast()
    Sum: 250000001500000000, time taken = 22.012s.
    

    Notice the performance difference (roughly 10% here, even with these trivial per-element operations)? Furthermore, this gap will widen as the work done in the mapping step grows and as the data set size increases. This illustrates the importance of properly ordering the operations when dealing with streams. Always filter before mapping whenever the order doesn’t matter – don’t waste precious cycles computing values that are going to be dropped anyway.

    Finally, another important point to note is this – for non-terminal operations such as filter and map, the way it works is that a value is generated and passed down, and this process is repeated until the entire data stream has been exhausted. How then does a terminal operation like reduce or sum handle that? They have internal mechanisms to keep collating intermediate results and then produce the entire final result in one go and return that value to the caller.

  • Debugging streams:
    In most cases, each individual operation in a long chain of stream operations is small enough that we can easily weed out bugs – logical or otherwise. When the body of each operation starts growing, though, debugging becomes much more difficult. The Java compiler helps us with static type issues, but it cannot always be relied upon to pinpoint the exact issue in large bodies of code. The sane thing to do is to keep each operation limited to a single conceptual abstraction, ideally expressible in a line or two of code at most. One handy trick, shown in the sketch right after this list, is to drop peek() calls into the pipeline to observe the values flowing through it.

  • Using parallel streams:
    As could be seen in the last example, we can use parallel streams in those cases where the operations are essentially independent of one another. The thing is that the onus for determining and ensuring this is on the developer. Java will not (and cannot) check whether the operations are independent of one another or not. This means that unless well thought through, this can lead to a lot of head-scratching and confusion.

    Another potential problem with parallel streams is that debugging (which is already hard enough with parallel code) becomes even tougher with parallel streams when things don’t go as planned. This goes right back to where the whole sequence of operations is well thought through. Nothing can substitute good planning.

    Finally, the performance boost might not be very noticeable in many cases, especially for small data sets. This becomes even truer on single-core machines (which are, to be fair, becoming rarer by the day). In any case, one should carefully weigh the overheads associated with parallel streams against the purported performance benefits from using them.
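And here is the peek()-based debugging trick mentioned above, again as a small illustrative sketch (the class name is mine): peek() is an intermediate operation that lets us observe the values flowing through each stage of the pipeline without altering them:

import java.util.stream.IntStream;

public class PeekDebugDemo {
    public static void main(String[] args) {
        int sum = IntStream.rangeClosed(1, 5)
                .peek(n -> System.out.println("before filter: " + n))
                .filter(n -> n % 2 == 0)
                .peek(n -> System.out.println("after filter:  " + n))
                .map(n -> n * 2)
                .peek(n -> System.out.println("after map:     " + n))
                .sum();

        System.out.println("sum = " + sum); // 12
    }
}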

Conclusion

Streams are, without doubt, my favourite feature in Java 8. As the world tends to move more and more towards Functional Programming as a core paradigm (and one which is orthogonal to Object Orientation, I must add), it is increasingly important for developers to know at least the fundamentals of Functional Programming – using pure functions wherever possible, avoiding side-effects, especially with regard to the modification of data structures, using higher-order functions such as filter, map, and reduce, and, of course, moving towards more declarative rather than imperative code.

Functional Programming is a very old concept that is being rediscovered as the world moves onto a massively multi-core environment where we just cannot afford the headaches associated with mutable state and side-effects. Of course, side-effects are essential for getting anything practical done (imagine a world without any IO!), and some languages handle that problem quite elegantly (Haskell) while others focus on providing efficient immutable data structures (Clojure), but one guideline that is bound to be useful whatever your language’s support for Functional Programming might be (or not!) is to clearly separate the functional and non-functional parts of the code and provide a clear and simple form of interaction between them. We will discuss more about Functional Programming and its core concepts in the next few posts.

Written by Timmy Jose

August 14, 2016 at 5:17 pm

A highly opinionated review of Java Lambdas

leave a comment »

What really is a lambda expression?

A lambda expression is, for all intents and purposes, an anonymous function. That really is all there is to it. In languages that support first-class functions, this is just another feature of the language – functions are on par with other types in the language. However, in languages that don’t consider functions first-class, it becomes a bit of an esoteric concept.

The origin of the concept is in the Lambda Calculus first propounded by the great Alonzo Church. According to that scheme, functions are basically entities which take some (or no) parameters, and have a body of code that can use those parameters. There is essentially no side-effect in such functions. That means that the function is deterministic – given the same set of parameters, it will always produce the same output. This is, in fact, the very foundation of Functional Programming. In modern times, Functional Programming is often conflated with strongly and statically typed languages. This is clearly wrong. The original Lambda Calculus had really no notion of types! (There is a variant of it though, the typed Lambda Calculus). Most of the languages that support lambda expressions today, however, freely allow plenty of side-effects within lambda expressions. The main takeaway here though is that lambda expressions are conceptually what named functions are made out of.

Lambdas in Java 8 and how they came to be

The biggest features in Java 8 were lambda support and the Streams API. In many ways, lambdas are important mainly with respect to their heavy use in the Streams API (as seen in the previous blog post on JShell). The key concept to understand when learning lambdas in Java is that lambdas/functions are not first-class objects in Java. Their entire existence is strictly bound to and controlled by interfaces, specifically SAM (Single Abstract Method) interfaces – interfaces which contain only a single abstract method. In my opinion, this severe crippling of lambdas in Java has created more problems than it has solved. New programmers who pick up Java and run with it are liable to be very confused when they move on to languages which support lambdas in a more natural and proper manner. In any case, let’s work with what we’ve got.

So why a Functional Interface? Prior to Java 8, if we wanted to simulate cases where we wanted to pass some functionality into another function, we had to make do with anonymous classes. For instance, to create a new thread, we could have done the following:

jshell> (new Thread (new Runnable () {
   ...>    @Override
   ...>    public void run () {
   ...>     System.out.println("Hello from thread!");
   ...>    }
   ...>  })).start()

jshell> Hello from thread!

We observe that the sole purpose of the anonymous class is to perform some action, but the cognitive dissonance comes into play when we see that the Thread class constructor expects an instance (basically a data object) of type Runnable. This is exactly the same pattern that was followed by C++ until C++11. In fact, this is what is known (rather pompously, I must add) as a functor.

Here is what the Runnable interface looks like:

public interface Runnable {
   void run();
}

This pattern of the use of a (dummy) interface containing a single method which basically does all the work that an anonymous or named function should have done in the first place, was found to be in such widespread use amongst Java developers that the committee which worked on developing lambda support in Java decided to make it kosher and provide additional support from the Java Runtime. As a result, from Java 8 onwards, wherever a SAM is present, a lambda expression can be used as its target or in its stead. They have been made essentially the same.
For example, the previous example can now be written more succinctly as:

jshell> (new Thread(() -> System.out.println("Hello from thread...again!"))).start()
Hello from thread...again!

An optional annotation, @FunctionalInterface has also been introduced for bookkeeping purposes. In fact, in order to help out developers, a bunch of Functional Interfaces now come bundled with the JDK in the java.util.function package. I would highly recommend exploring them and testing them out to get a feel for them.
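For instance, here is a tiny illustrative snippet (the class name is made up for this example) exercising a few of the stock interfaces from java.util.function:

import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class BuiltInFunctionalInterfacesDemo {
    public static void main(String[] args) {
        Predicate<String> isLong = s -> s.length() > 3;      // SAM: boolean test(T t)
        Function<String, Integer> length = String::length;   // SAM: R apply(T t)
        Supplier<String> greeting = () -> "Hello";            // SAM: T get()

        System.out.println(isLong.test("Hi"));      // false
        System.out.println(length.apply("Hello"));  // 5
        System.out.println(greeting.get());         // Hello

        // Functions also compose: take the length, then double it.
        System.out.println(length.andThen(n -> n * 2).apply("abc")); // 6
    }
}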

Custom Functional Interfaces

We can define our own functional interfaces in Java 8 (and above). The only restriction, as mentioned before, is that the interface have a single abstract method.

For instance, the standard package (java.util.function) comes with functional interfaces that support single parameter (Function) and double parameter (BiFunction) functions. Let us define a triple parameter function just for this example.

jshell> @FunctionalInterface interface TriFunction<T, U, V, R> {
   ...>     R apply(T t, U u, V v);
   ...> }
|  created interface TriFunction

jshell> int x = 100
x ==> 100

jshell> String y = "Hello"
y ==> "Hello"

jshell> double z = Math.PI
z ==> 3.141592653589793

jshell> TriFunction<Integer, String, Double, String> f = 
   ...>     (i, s, d) -> i + s + d;
f ==> $Lambda$6/1318822808@6d7b4f4c

jshell> System.out.println(f.apply(x, y, z))
100Hello3.141592653589793

Features and Limitations of Java Lambdas

So how exactly does a SAM map onto a lambda expression? To understand this better, first we need to get the syntax and semantics of lambda expressions out of the way:

Java’s lambda syntax was clearly influenced by Scala. A basic lambda expression has the following form:

(<param*>) -> [{] <body-form+> [}]

where,

’param’ is a comma-separated list of zero or more parameters with optional types (note that in some cases where Java’s type inference mechanism is unable to infer the type, you will need to specify the type(s) explicitly). The braces are optional in the case of a single-expression body, but are required when the body spans more than one statement. Each body-form is a normal Java statement; in the case of multiple statements, each is terminated by a semi-colon, and an explicit return statement is also required (if the return type is not void).
So a lambda expression that takes a String and returns a String might take on several forms in actual code:

(s) -> s.toUpperCase()

The type annotation is not required in this case, and a return statement is not allowed. This would be the recommended usage of a typical lambda expression – don’t declare the types and don’t use any return statement. Of course, this only works for a single-statement (or, more correctly, a single-expression) body.

In case we want to use braces, we need to have the whole expression take the following form:

(String s) -> { return s.toUpperCase(); }

So we need to specify the type of the parameter(s) as well as include an explicit return statement. In all cases where the body contains multiple statements, this would be the recommended format for a lambda expression.
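To make the two forms concrete, here is a small illustrative snippet (the class name is mine) showing the same lambda written both ways, with java.util.function.Function as the target type:

import java.util.function.Function;

public class LambdaFormsDemo {
    public static void main(String[] args) {
        // Terse form: no parameter type, no braces, no return statement.
        Function<String, String> terse = s -> s.toUpperCase();

        // Block form: explicit parameter type, braces, and a return statement.
        Function<String, String> explicit = (String s) -> { return s.toUpperCase(); };

        System.out.println(terse.apply("hello"));    // HELLO
        System.out.println(explicit.apply("world")); // WORLD
    }
}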

Now getting back to how a SAM is mapped onto a lambda expression, whenever the Java Runtime encounters a lambda expression, it can do either of two things depending on the context in which the SAM is used:

  • In case the lambda expression is used along with a Stream API function (such as map, filter, reduce, etc.), the Java Runtime already has enough context about the form of the function that is expected – the parameter types and the return type. For instance, if we are trying to double all the even natural numbers up to 10, we might do:
    jshell> IntStream
             .rangeClosed(1, 10)
             .filter((n) -> n%2 == 0)
             .map((d) -> d*2).forEach(System.out::println)
    4
    8
    12
    16
    20
    

    In this case, the Java Runtime knows that the filter method expects a predicate – for an IntStream, an IntPredicate, whose single abstract method is boolean test(int value). So what the Runtime does is check that the provided lambda expression matches this signature and, if it does, proceed to invoke the “test” method implicitly. Similarly for the map function as well.

  • The second case arises in the case where we make use of Functional Interfaces explicitly and then use them as the “target” of a lambda expression. For instance, suppose we want to write a function that takes a String and an Integer and returns their concatenated form as a String, we might have something like:
    jshell> BiFunction<String, Integer, String> f = 
               (s, i) -> s + String.valueOf(i)
    f ==> $Lambda$17/103887628@42f93a98
    
    jshell> f.apply("Hello", 99)
    $21 ==> "Hello99"
    

    In this case as well, the compiler will ensure that the lambda expression matches the type of the declared function variable. Pretty straightforward.

So far so good, but there is a big problem in the second case above: even once the function object has been created, the name of the SAM must be known before we can use it. This is because Java does not support invoking an object as if it were a function (unlike C++, with its operator overloading). This is why, in the current framework, we must know the exact method name of each functional interface that we use. The “apply” method used above is the name of the SAM in the BiFunction functional interface. The problem is compounded because each functional interface (even in the standard package) defines its own name. Of course, this is not an insurmountable problem, but the same problem did not exist even in pre-C++11 C++. For instance, the previous example could have been written in C++ (using a functor) like so:

// pre C++-11
#include <iostream>
#include <sstream>

template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
    return f(s, t);
}

class string_int_string
{
    public:
        std::string operator()(std::string s, int i)
        {
            std::ostringstream oss;
            oss << s << i;
            return oss.str();
        }
};

int main()
{
    std::cout << concatenate("Hello", 99, string_int_string()) << std::endl;
   
    return 0;
}

A bit brittle, but it works. The generic function “concatenate” is important to note here since it can take basically any functor (or, from C++11 onwards, any lambda expression) and invoke the function object with the supplied arguments. The same approach is used in the C++ STL generic functions. Now if we look at how the code might look with C++11, we get:

// C++-11 and above
#include <iostream>
#include <sstream>

template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
    return f(s, t);
}

int main()
{
   std::cout << concatenate("Hello", 99,
                [](std::string s, int i) {
                    std::ostringstream oss;
                    oss << s << i;
                    return oss.str();
                }) << std::endl;
    
    return 0;
}

As can be seen, the approach is much cleaner. The difference between the functor version and the lambda-based one is that in the latter, we’ve essentially got rid of the class representing the functor object and moved its logic into the lambda expression’s body. So the lambda expression essentially behaves as an object that can bind the parameters, just as a regular functor would.

As can be seen, even in C++11, we can write generic functions, and all we need to do is invoke the passed-in object like a function. No messy SAMs there! I personally feel that C++’s lambda support is far superior to Java’s, especially since C++ supports closures. More on that in the next section.

Another disadvantage of Java’s lambda support is that the following is impossible in Java:

#include <iostream>

int main()
{
    int x = 100, y = 100;
    
    std::cout << ([x, y]() { return x + y; })() << std::endl;
    
    return 0;
}

The code above simply uses a lambda expression to capture variables defined in the outer lexical scope (more on that in the next section), but the interesting bit is that the lambda expression can be invoked like a proper function object even without assigning it to a variable.

If we tried the same in Java, we’d get an error:

jshell> int x = 1100
x ==> 1100

jshell> int y = 200
y ==> 200

jshell> (() -> x + y)
|  Error:
|  incompatible types: java.lang.Object is not a functional interface
|  (() -> x + y)
|   ^---------^
|  Error:
|  incompatible types: <none> cannot be converted to java.lang.Object
|  (() -> x + y)
|  ^-----------^

As can be seen from the error message, the Java Runtime complains that “Object” is not a functional interface. Even if we assumed that the runtime would be able to discern the functional interface type from its signature and produce a result, we still get an error:

jshell> ((int a, int b) -> { return a + b; })).apply(x, y)
|  Error:
|  ';' expected
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|                                       ^
|  Error:
|  incompatible types: java.lang.Object is not a functional interface
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|   ^---------------------------------^
|  Error:
|  incompatible types: <none> cannot be converted to java.lang.Object
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|  ^-----------------------------------^
|  Error:
|  cannot find symbol
|    symbol:   method apply(int,int)
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|                                         ^---^
|  Error:
|  missing return statement
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|  ^------------------------------------------------^

So no go there. A point down for Java lambdas! More seriously, I find this to be an extremely irritating reminder that Java’s lambdas are not really lambdas. They are more like syntactic sugar for the good old anonymous classes. In fact, there are more serious implications precisely for this reason.

Closures

This is again one of those concepts that are notoriously badly explained. A lot of newbies to programming are often scared to death and put off from learning more about Functional Programming due to unnecessary FUD on the part of many “experts” in the field. So, let’s try and explain this as clearly as possible:

In Set Theory, a set is defined to be “closed” under an operation if applying the operation to members of the set produces a result that belongs to the same set. For instance, if the set under consideration is the set of Natural Numbers (N) and the operation is + (addition), we can say Natural numbers are closed under addition. Why? The reason is quite simple and follows straight from the definition – adding any two natural numbers (or indeed any number of numbers, but we’re considering the strict binary operation here) always produces a Natural number. On the other hand, N is not closed under – (subtraction). This is because subtracting some Natural number from another Natural number might produce 0 or some negative number, which is clearly not a member of N. So much for mathematics.

In Psychology, “closure” refers to the strict need of an individual to find a definitive answer to a problem.

In business, “closure” refers to the process by which a business closes down.

You see what I’m getting at? The term “closure” is highly overloaded, and even within mathematics, the term has different meanings in different branches. So my point is this – simply forget about the name and focus on the concept.

In Computer Science, a closure is intricately tied to the concept of scoping, specifically lexical scoping. This is why closures are often referred to as “lexical closures”. In order to understand closures properly, we must clearly understand what lexical scoping entails.

Lexical scoping is intimately tied to the rules defining the visibility (and lifetimes) of variables. Under dynamic scoping, a variable reference is resolved against the most recent binding established by the running program, so a variable is effectively visible throughout the dynamic extent of the calls it spans. Pure lexical scoping, on the other hand, ensures that the visibility of a variable is limited to the lexical block in which it is defined (say a function or a local block) and to blocks nested inside it. Lexically scoped variables are not visible to outer blocks, and a variable defined in an inner block will “shadow” one of the same name defined in an outer scope. If no variable with the same name is defined in the inner block, references to that name refer to the variable in the outer scope. This behaviour forms the basis of what is known as “variable capture”.

A variable is said to be captured by a lambda function if the lambda function refers to an outer-scope variable at the time of its creation. The lambda function is said to “close over” those variables, and this is why the feature is called a “closure”. So what does variable capture actually imply in the grand scheme of things? It implies this: when a lambda function captures a variable from its outer scope, the lifetime of that variable is effectively extended. Under normal circumstances, local variables die when their enclosing function exits. Here, however, since the lambda function has captured the variable, the variable survives even after the function in which it was defined has returned!

In this respect, Java behaves absolutely horribly. Java has extremely weird scoping rules: in some ways it does use lexical scoping, but in several important respects it does not:

jshell> void demoScopingInJava() {
   ...>          int x = 100;
   ...>          
   ...>          System.out.format("Function scope x = %d\n", x);
   ...>         {
   ...>             System.out.format("Function scope x (before shadowing) = %d\n", x);
   ...>             /// int x = 999 is not allowed!
   ...>             x = 999;
   ...>             System.out.format("Function scope x (after shadowing) = %d\n", x);
   ...>         }
   ...>          System.out.format("Function scope (again) x = %d\n", x);
   ...>      }
|  created method demoScopingInJava()

jshell> demoScopingInJava()
Function scope x = 100
Function scope x (before shadowing) = 100
Function scope x (after shadowing) = 999
Function scope (again) x = 999

Java does not allow shadowing here: we cannot declare a new local variable named ‘x’ inside the inner block, so every reference to ‘x’ is to the function-scope variable. We are able to mutate ‘x’ from 100 to 999, but only because the inner block lies entirely within the enclosing method, so the variable’s lifetime is still bounded by the method’s execution. Mutation is not allowed, however, in situations where the variable could be referenced even after the method in which it was declared has exited.

For instance, if we want to implement a function that prints line numbers in an increasing order every time it is called, we might try to do something like this in Java:

jshell> Function<Void, Void> lineNumberGenerator() {
   ...>      int lineNumber = 0;
   ...>      return (n) -> 
                 { lineNumber++; 
                   System.out.format("Line number: %d\n", lineNumber); 
                   return null; };
   ...> }
|  Error:
|  local variables referenced from a lambda expression must be final or effectively final
|       return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
|                       ^--------^
|  Error:
|  local variables referenced from a lambda expression must be final or effectively final
|       return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
|                                                                            ^--------^

We can see that modifying a variable defined in the outer scope is not allowed in the case where the capturing code escapes the local scope. As the error messages make clear, the variable lineNumber must be final (or effectively final) for the code to even compile, and of course it would then fail anyway unless we removed the mutating statement inside the lambda function.
This is the reason why we cannot implement true closures in Java: its bizarre, downward-percolating, effectively read-only visibility of local variables.
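
The usual (if clunky) workaround is to capture a mutable holder object instead of the local variable itself: the reference stays effectively final while its contents change. A minimal sketch, assuming AtomicInteger as the holder (a one-element array works just as well):

import java.util.concurrent.atomic.AtomicInteger;

public class LineNumbers {
    // Returns a Runnable that "closes over" the counter via an effectively
    // final reference to a mutable holder object.
    static Runnable lineNumberGenerator() {
        AtomicInteger lineNumber = new AtomicInteger(0);
        return () -> System.out.format("Line number: %d%n", lineNumber.incrementAndGet());
    }

    public static void main(String[] args) {
        Runnable printLineNumbers = lineNumberGenerator();
        for (int i = 0; i < 5; i++) {
            printLineNumbers.run(); // Line number: 1 .. 5
        }
    }
}

This simulates the behaviour, but it is the holder object doing the work, not the language.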

And, oh, just in case you thought this applies only to lambda blocks, it’s always been the case:

jshell> void scopingRulesTest () {
   ...>    int x = 100;
   ...>    
   ...>    (new Thread(new Runnable () {
   ...>        @Override
   ...>        public void run() {
   ...>           x++;
   ...>          System.out.println(x);
   ...>        }
   ...>      })).start();
   ...> }
|  Error:
|  local variables referenced from an inner class must be final or effectively final
|            x++;
|            ^

The same example in C++ works as expected (including modification of the outer scope’s variable):

#include <iostream>

using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    
    int x = 100;
    cout << "Function scope x = " << x << endl;
    {
        x = 101;
        cout << "Block scope x = " << x << endl;
        int x = 999;
        cout << "Block scope x = " << x << endl;
    }
    
    cout << "Function scope x = " << x << endl;
    
    return 0;
}

sh-4.3$ main                                                                                                                                                              
Function scope x = 100                                                                                                                                                    
Block scope x = 101                                                                                                                                                       
Block scope x = 999                                                                                                                                                       
Function scope x = 101   

And to complete the closure demo, here is the line number example in C++:

// C++-11 (and above)
#include <iostream>
#include <functional>

using namespace std;

function<void()> line_number_generator()
{
    int line_num = 0;
    
    return [line_num]() mutable
            {
                line_num++;
                cout << "Line number: " << line_num << endl; 
            };
}

int main()
{
    ios_base::sync_with_stdio(false);
    
    function<void()> print_line_numbers = line_number_generator();
    
    for (int i = 0; i < 5; i++) {
        print_line_numbers();    
    }
}

sh-4.3$ g++ -std=c++11 -o main *.cpp                                                                                                                                      
sh-4.3$ main                                                                                                                                                              
Line number: 1                                                                                                                                                            
Line number: 2                                                                                                                                                            
Line number: 3                                                                                                                                                            
Line number: 4                                                                                                                                                            
Line number: 5  

Note that variables captured by value are read-only by default in C++; the “mutable” keyword can be used to change that behaviour (capturing by reference with [&] is another option). In every respect that matters, C++11 supports closures while Java does not!

The Common Lisp version behaves much the same as the C++ one. In Common Lisp, however, any reference to an outer-scope variable captures that variable automatically (unless a local variable of the same name shadows it). This is seen in the Common Lisp version of the same scoping example:

CL-USER> (defun foo ()
           (let ((x 100))
             (format t "Function scope x = ~d~%" x)
             (progn
               (setf x 101)
               (let ((x 999))
                 (format t "Inner block x = ~d~%" x)))
             (format t "Function scope (again) x = ~d~%" x)))
STYLE-WARNING: redefining COMMON-LISP-USER::FOO in DEFUN
FOO
CL-USER> (foo)
Function scope x = 100
Inner block x = 999
Function scope (again) x = 101
NIL

This effectively ensures that a nested function which refers to outer-scope variables, and which is then returned from the enclosing function, is always a closure, as the following example (the same line number generator as the C++ one) shows:

CL-USER> (defun line-number-generator ()
                      (let ((line-number 0))
                        #'(lambda ()
                             (incf line-number)
                            (format t "Line number: ~d~%" line-number))))
LINE-NUMBER-GENERATOR

CL-USER> (defvar print-line-numbers (line-number-generator))
PRINT-LINE-NUMBERS

CL-USER> print-line-numbers
#<CLOSURE (LAMBDA () :IN LINE-NUMBER-GENERATOR) {100439133B}>

CL-USER> (dotimes (i 5)
           (funcall print-line-numbers))
Line number: 1
Line number: 2
Line number: 3
Line number: 4
Line number: 5
NIL

Conclusion

Well, that about wraps up this rather long blog post! As can be seen from this post (as well as the JShell post and more posts to come, especially on Java Streams), lambda support in Java is an extremely welcome and necessary feature. However, in many ways it’s a very crippled version of the lambda support found in other languages, especially with regard to closures, which Java simply does not support. Thankfully, most code that uses lambda expressions will be code that uses the Streams API, and as such most (if not all) of the warts of Java’s lambdas will be effectively hidden inside map, filter, reduce, or some other mechanism in the Streams API.

Note: All the Java code posted in this blogpost was executed on JShell. For use in a regular Java environment, ensure to add semi-colons wherever appropriate.

Written by Timmy Jose

August 13, 2016 at 1:45 am

My favourite feature in Java 9 – JShell (and other ruminations)

leave a comment »

It used to be the case that Java releases were few and far between. Back in the olden days, releases used to happen so infrequently that many users were forced to write libraries to compensate for the lack of features in Java. I still recall how Generics itself was a big feature that was included only in Java 5 after Java 1.4 had been around for a long time. The problem with that was that a lot of code had already been written which made use of “raw” (in Java Generics parlance) types, and the Java folks could not afford to break all that code! Thus started this incongruous (if understandable) tradition in Java – introduce features but never break any existing code. The issue with this approach is that while it saves hundreds of thousands of man-hours of effort, it does introduce inherent limitations in the feature being introduced. This is the reason why Java Generics are such a mess, and now why Lambdas in Java are less than ideal (in fact I’d argue they’re pretty much non-features, and we’ll see why in future posts). Notice also that Lambdas in Java were introduced as late as Java 8, and arguably primarily in order to support Streams. Thankfully, major Java releases have been occurring with greater frequency in recent years, and surely that is a good thing. Of course, I don’t want a situation where releases happen with such frequency that codebases get broken on a regular basis – that’d be even worse.

Java 9 has been in the offing for some time now, and I believe that its planning had already begun even before Java 8 was released. Java 8 introduced some much needed features – basic Lambda support, Streams (my favourite feature in Java 8!), and default methods in interfaces. The interesting bit about default methods is that the feature was introduced largely to support the new Streams feature in Java 8. How do you figure? Let’s take a simple example – the List interface in the java.util package. Till Java 8, this interface did not have a stream() method. If stream() had been added to List as an ordinary abstract method in JDK 8, every existing class implementing List (which is to say practically every codebase) would have been irreparably broken. If, on the other hand, the method was not added at all, then streams could not be supported on List! To solve this conundrum, JDK 8 made stream() a default method: the interface itself supplies an implementation (to be pedantic, it is actually defined in the base Collection interface with the signature default Stream<E> stream()), so legacy implementations compile unchanged and simply inherit it, while new code that makes use of the feature hums along nicely as well. Hackish, but it works.
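
To make the mechanism concrete, here is a minimal sketch (the names are made up purely for illustration) of why a default method keeps legacy implementers compiling: the interface supplies a body, so classes written before the method existed simply inherit it.

interface Greeter {
    String name();

    // Added later; existing implementers inherit this body instead of breaking.
    default String greet() {
        return "Hello, " + name() + "!";
    }
}

// A "legacy" class written before greet() existed - it still compiles and
// picks up the default implementation automatically.
class SimpleGreeter implements Greeter {
    @Override
    public String name() { return "world"; }
}

public class DefaultMethodDemo {
    public static void main(String[] args) {
        System.out.println(new SimpleGreeter().greet()); // Hello, world!
    }
}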

Anyway, to get on with it, Java 9 introduces some very interesting new features as well. The full list can be seen here: JDK 9 features. However, a couple of major features stand out in this list – JShell (and the JShell API), and support for Modules. The latter feature is technically a separate project that will be bundled with the main Java release, and is far more complicated than I would have liked. I reserve further comments on that till such time as Java 9 itself is released. However, I absolutely love JShell and feel it’s arguably the best productivity-boosting feature ever released in Java. Of course, I’m talking about the JShell tool itself, which comes bundled with Java 9 (download EA (Early Access) releases of Java 9 here – JDK 9 Downloads).

Anyone who has ever worked with languages whose ecosystems include at least a REPL (Read-Eval-Print-Loop for the Luddites) will agree that once you get used to that mode of working, it feels severely constraining to go back to the traditional Code-Compile-Run cycle. The best environment in this regard is provided by Lisps – Common Lisp in particular offers a very interactive, image-based ecosystem that is in a league of its own. Scheme and Racket also provide interactive development environments (DrRacket, for instance), but they’re still inferior (in my opinion) to Common Lisp environments (SBCL + Emacs + SLIME is what I use myself). Python also provides a decent REPL, and is arguably the best of the mainstream languages in that regard. Traditionally, dynamic languages have had it easy when it comes to REPLs by their very nature, while statically-typed languages have not been able to provide something comparable. Haskell is a very good example of a language that defies this rule: it is a strongly, statically typed language and yet it has a very nice REPL. Plus, of course, the wonderful Hindley-Milner type inference system ensures that you hardly ever have to type in the types yourself (pun intended). Scala is also a strongly-typed static language that has a decent REPL (on par with Haskell’s). More traditional static languages like C++ and Java have never had a bundled REPL, apart from some projects that have attempted to provide one in the form of external tools or libraries – and that never works as well as something that ships with the language, of course. With the introduction of JShell in Java 9, I believe Java has at least one feature that’s clearly superior to C++. In terms of the other “functional” features such as lambdas and closures, not at all. More on this comparison in later posts. Now, let’s jump into JShell and play around a bit!

When you install the JDK 9 EA bits, the “jshell” executable comes bundled in the “bin/“ folder by default. JShell also comes with a new set of JShell APIs that can be used to essentially create our own custom shells, but I’m more interested in the interactive JShell tool for now. To run jshell, simply invoke it from a shell, and you should see something like the following:

Timmys-MacBook-Pro:Blogs z0ltan$ jshell
|  Welcome to JShell -- Version 9-ea
|  For an introduction type: /help intro

jshell>

Now, let’s see. As far as I have evaluated it, some points to be noted while developing with this REPL are:

  • No semi-colons required (well, almost none).
  • No mandatory class wrappers to execute code – plain Java code works fine.
  • Ability to use the generated variable names ($N, where N is an integer) to refer to objects (a la Scala).
  • Ability to import custom packages (if the JARs are in the classpath).
  • Ability to save code snippets to file.
  • Ability to persist state between sessions (very important for interactive development).
  • Ability to open external files and load them.
  • Ability to list all session-declared variables, methods, classes, and history.
  • And most importantly, code completion using the tab key (eat that, Python!).

Of course, that is not an exhaustive list. There are a lot more features available, and you can see all the options using the “/help” command.

Now let’s get down to some hacking! (For those unfamiliar with Lambdas and Streams in Java, I’ll post a series of introductory blogs on those topics in the near future. For now, bear with me, as the main point of this blog is to demonstrate JShell’s support for interactive development.)

0). First let us import the classes that we are interested in:

jshell> import java.util.*

jshell> import java.util.stream.*

jshell> import java.util.function.*

1). Let us now create a list of names, convert them to uppercase, and the print them out:

jshell> List<String> names =
            Arrays.asList("Peter", "Timmy", "Gennady", "Petr", "Slava")
names ==> [Peter, Timmy, Gennady, Petr, Slava]

jshell> names.stream()
             .map((name) -> name.toUpperCase())
             .forEach(System.out::println)
PETER
TIMMY
GENNADY
PETR
SLAVA

jshell> names.forEach(System.out::println)
Peter
Timmy
Gennady
Petr
Slava

Note that the original list is not modified at all, since stream pipelines are functional (in most respects): intermediate operations produce new streams rather than altering the source, and terminal operations produce new values or collections. Also note how the “names” variable is printed out in human-readable form.

2). With the same list of names, let’s sort them out in non-decreasing order (lexicographically speaking):

jshell> Collections.sort(names, (f, s) -> f.compareTo(s))

jshell> names
names ==> [Gennady, Peter, Petr, Slava, Timmy]

Note that Collections.sort() is a mutating operation.
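
If mutation is undesirable, a non-mutating alternative (a sketch, assuming the same names list and the imports from step 0) is to sort within the stream pipeline and collect into a fresh list, leaving the source untouched:

// sorted() works on the stream, not the list; collect() builds a new list,
// and the original "names" list is left unchanged.
List<String> sortedNames = names.stream()
                                .sorted()
                                .collect(Collectors.toList());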

3). Now that this list has been sorted, let us do a series of operations: filter out the names that start with G, convert the rest to uppercase, concatenate them into a single string, and return that value:

jshell> names.stream()
             .filter((s) -> !(s.startsWith("G") || s.startsWith("g")))
             .map(String::toUpperCase)
             .collect(Collectors.joining())
$9 ==> "PETERPETRSLAVATIMMY"

Observe how map(String::toUpperCase) is essentially the same as map((r) -> r.toUpperCase()), and also how the return value, which is a string, is automatically assigned to a generated variable, $9, which can now be used in further processing. For instance, if we wished to find the length of this concatenated string, we could do something like this (an obviously contrived example):

jshell> Function<String, Void> func =
                 (s) -> { System.out.println(s.length()); return null; }
func ==> $Lambda$16/1427810650@35d176f7

jshell> func.apply($9)
19
$11 ==> null

A couple of interesting observations here: first off, note how we use semi-colons inside the body of the lambda expression – this is required in JShell when the body contains multiple statements or an explicit return statement. Moreover, if a class or method is being defined explicitly, semi-colons are also required. If you want to play it safe, you might as well use semi-colons everywhere.

Secondly, we assign the lambda expression to a Function variable. Function<T, R> is a “functional interface”, which basically means it is an interface with a single abstract (non-default) method (such interfaces are also called SAM types). Any interface that conforms to this convention is a functional interface. However, the problem is that we can’t just invoke the function object: we have to know that this particular interface’s method is named “apply”, and use that name explicitly. More on this in the planned series of posts on Lambdas and Streams in Java 8.
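
For completeness, here is a minimal sketch of declaring our own functional interface (all names here are invented for illustration). The @FunctionalInterface annotation is optional, but it makes the compiler enforce the single-abstract-method rule, and it also makes the “you must know the method’s name” annoyance plain to see:

@FunctionalInterface
interface IntAdder {
    int add(int a, int b);              // the single abstract method (SAM)

    default IntAdder thenDouble() {     // default methods don't count against the SAM rule
        return (a, b) -> 2 * add(a, b);
    }
}

public class SamDemo {
    public static void main(String[] args) {
        IntAdder adder = (a, b) -> a + b;  // the lambda targets the SAM
        System.out.println(adder.add(1, 2));              // 3
        System.out.println(adder.thenDouble().add(1, 2)); // 6
    }
}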

4). That’s about it for strings. Now let’s see some examples with other types. Let’s generate an infinite stream of positive integers, limit it to the first 100, keep only the even numbers, and compute summary statistics (count, sum, min, max, and average) over them:

jshell> IntStream.iterate(1, (n) -> n+1)
                  .limit(100).filter((d) -> d%2 == 0)
                  .summaryStatistics()
$12 ==> IntSummaryStatistics{count=50, sum=2550, min=2, average=51.000000, max=100}

jshell> System.out.format("Max: %d, Sum: %d\n", $12.getMax(), $12.getSum())
Max: 100, Sum: 2550
$13 ==> java.io.PrintStream@612679d6

The “iterate” method takes an initial starting value and a lambda expression that acts as a generator function for the series.
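
As a further illustration (a sketch, assuming the imports from step 0), changing the generator lambda changes the series; doubling at each step yields powers of two:

// Each element is produced by applying the lambda to the previous one.
IntStream.iterate(1, (n) -> n * 2)
         .limit(10)
         .forEach(System.out::println);   // 1, 2, 4, ..., 512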

5). Finally, let’s observe some of the aforementioned JShell features in action:

  • Viewing all the imports:
    jshell> /imports
    |    import java.io.*
    |    import java.math.*
    |    import java.net.*
    |    import java.util.concurrent.*
    |    import java.util.prefs.*
    |    import java.util.regex.*
    |    import java.util.*
    |    import java.util.stream.*
    |    import java.util.function.*
    
  • Open and load a source file:
    jshell> /open /Users/z0ltan/HelloWorld.java
    |  Warning:
    |  Modifier 'public'  not permitted in top-level declarations, ignored
    |  public class HelloWorld {
    |  ^----^
    
    jshell> HelloWorld main = new HelloWorld()
    main ==> HelloWorld@757942a1
    
    jshell> main.main(null)
    Hello, world!
    
  • Save the current session:
    jshell> /save -all /Users/z0ltan/session1
    
    jshell>
    

    This session file can then be opened using the “open” command when launching a fresh jshell session.

  • Finally, to exit the current session:
    jshell> /exit
    |  Goodbye
    

Well, that’s all for a very basic introduction to JShell in Java 9! I didn’t go into much detail about why such interactivity is useful, but I can list out a few reasons why (without going deeper, which I defer to future posts):

  • Small snippets of code can be tested without the whole Code->Save->Compile->Run->Debug cycle that one would normally have to go through.
  • Top-down and bottom-up programming can be done in a REPL. For instance:
    jshell> void printFactorial(int n) {
       ...> System.out.format("factorial(%d) = %d\n", n, factorial(n));
       ...> }
    |  created method printFactorial(int), however, it cannot be invoked until method factorial(int) is declared
    
    jshell> long factorial(int num) {
       ...>   long f = 1L;
       ...>   for (int i = 1; i <= num; i++) 
       ...>      f *= i;
       ...>   return f;
       ...> }
    |  created method factorial(int)
    
    jshell> printFactorial(10)
    factorial(10) = 3628800
    

    In this case, we could define functions (or methods, if you want to be pedantic about it) that reference functions that haven’t been defined as yet. This is an example of top-down programming.

  • There is much less thought impedance when developing in a REPL. Of course, Java as a whole is not as suited to interactive development as, say, Common Lisp is, but it is nevertheless invaluable. In other words, it doesn’t hamper flow as much as the traditional compile cycle, and this definitely helps boost creativity and productivity.

Next up, I will discuss lambda support in Java (8 and above) and how it compares to similar support in other languages – C++, Python, Common Lisp, and Haskell.

Written by Timmy Jose

August 12, 2016 at 6:43 pm

Posted in Java, Programming, Uncategorized


Google Code Jam 2016 results – no surprises there!

leave a comment »

Gennady Korotkevich is, without doubt, the best competitive programmer in the world (and arguably the best of all time). This whizkid just wrapped up another Google Code Jam title with consummate ease! Check this main site out for news (and to try out the problems, if you dare!): Google CodeJam 2016.

The great part is that you can also watch the live stream recording of the whole event here: Live Stream Recording.  This is the third straight title for this amazing kid.

Written by Timmy Jose

August 11, 2016 at 10:28 am
