A Line number closure in Rust, and an interesting battle with the Rust compiler

It had been a while since I’d played around with Rust, so I decided to implement one of my two favourite programs when learning new languages viz., a line number closure (with mutation, of course) as shown by Hoyte in his seminal book, “Let Over Lambda”. The other program is, of course, implementing a functioning Rational number class (as in the previous blog post, for instance).

Well, with Rust, it’s a very…interesting experience to say the least. The Rust Borrow Checker is a very strange beast indeed. So, let’s have a look at how we can arrive at a working version of the program.

Background

Rust does have powerful support for closures (what respectable language does not today?). However, the interesting part is that closures in Rust are simply syntactic sugar for traits. To be more precise, there are three available traits that a closure can belong to:

  • the Fn trait – this is the least permissive of all, and it allows a read-only closure (in effect). This is ideal when we want to simply have a self-contained immutable closure. For instance,
    fn square<F>(n: i32, callback: F)
        where
        F: Fn(i64) ->() {
        callback(n as i64 * n as i64);
    }
    
    fn main() {
        square(100, |n| {  println!("Received {}", n); });
    }
    

    This duly prints out:

    Received: 10000
    

    This is quite straightforward. As we can see, The type of the callback, which is the closure, is Fn(i64) -> (), which means that it takes a 64-bit integer and does some side-effect (printing to screen in this case). No mutation anywhere, no capturing of environments, and so the Fn trait works superbly here.

  • the FnMut trait is more interesting. This is needed when we need to mutate the environment captured by the closure. As a small example:
    fn process_mutating_closure<F>(mut closure: F)
        where
        F: FnMut(i32) -> () {
        closure(100);
    }
    
    fn main() {
        let mut local = 100;
    
        process_mutating_closure(|x| { 
                  local += 100; 
                  println!("Value = {}", x + local); 
        });
    
    

    which produces:

    Value = 300
    

    All right, this is a bit more complicated than the previous version. So, let’s break it down. What we intend to do is this – the closure that’s created in the main function captures a mutable variable, local, in the enclosing lexical scope. The closure then modifies this captured variable, and uses it for some dummy processing before printing the final value out.

    Since a captured variable is being modified, the type of the closure will be FnMut. This is exactly what’s reflected in the type signature when the closure is passed as an argument to process_mutating_closure.

  • , and finally, the FnOnce trait is the most restrictive of all. It basically transfers ownership of the capture to the closure itself. If you are not familiar with Rust’s wonderful (sarcasm) Ownership and Borrowing world, consider yourself lucky.
    fn move_closure_demo<F: FnOnce(i32) -> ()>(closure: F) {
        closure(100)
    }
    
    fn main() {
        let mut local = 100;
    
        move_closure_demo(move |x| {
            local += 100;
            println!("Result = {}", x + local);
        });
    
        println!("{}", local);
    }
    
    Result = 300
    

    Eh? This code looks almost identical to the previous one (except for the change in the function signature of the closure from FnMut to FnOnce and the move in front of the closure). Well, if you notice, you will also see that the function var, closure in the function move_closure_demo also does not require mut unlike in the previous case.

    What’s happening here is that because we have “moved” the closure, the closure is now the owner of the captured variable, local. Wait a minute, so how come we can still use local in main after it’s been moved? It turns out that because the binding of local is to a Copy-able type (i32 in this case), a copy of the binding’s value is actually moved. However, this won’t work for types that do not implement the Copy trait. Confused? Well, you should be. I mean, wtf?

All right, now let’s get on with the meat of the actual matter.

First attempt

Let’s put our knowledge of how closures are actually syntactic sugar for the aforementioned trifecta of traits to use. Here’s the first shot:

fn gen_line_number() -> (FnMut() -> ()) {
    let mut line_no = 0;

    || {
        line_no += 1;

        println!("Current line number: {}", line_no);
    }
}

fn main() {
    let foo = gen_line_number();

    for _ in 0..5 {
        foo();
    }
    
    println!("");

    let bar = gen_line_number();

    for _ in 0..5 {
        bar();
    }
}

Since we’re modifying the capture var, line_no, we need the closure to be of the trait type, FnMut. Looks okay, let’s take it for a spin:

bash-3.2$ rustc hoyte-blog.rs 
error[E0277]: the trait bound `std::ops::FnMut() + 'static: std::marker::Sized` is not satisfied
 --> hoyte-blog.rs:1:26
  |
1 | fn gen_line_number() -> (FnMut() -> ()) {
  |                          ^^^^^^^^^^^^^ the trait `std::marker::Sized` is not implemented for `std::ops::FnMut() + 'static`
  |
  = note: `std::ops::FnMut() + 'static` does not have a constant size known at compile-time
  = note: the return type of a function must have a statically known size

error[E0308]: mismatched types
 --> hoyte-blog.rs:4:5
  |
4 |     || {
  |     ^ expected trait std::ops::FnMut, found closure
  |
  = note: expected type `std::ops::FnMut() + 'static`
  = note:    found type `[closure@hoyte-blog.rs:4:5: 8:6 line_no:_]`

error[E0277]: the trait bound `std::ops::FnMut(): std::marker::Sized` is not satisfied
  --> hoyte-blog.rs:12:9
   |
12 |     let foo = gen_line_number();
   |         ^^^ the trait `std::marker::Sized` is not implemented for `std::ops::FnMut()`
   |
   = note: `std::ops::FnMut()` does not have a constant size known at compile-time
   = note: all local variables must have a statically known size

error[E0277]: the trait bound `std::ops::FnMut(): std::marker::Sized` is not satisfied
  --> hoyte-blog.rs:20:9
   |
20 |     let bar = gen_line_number();
   |         ^^^ the trait `std::marker::Sized` is not implemented for `std::ops::FnMut()`
   |
   = note: `std::ops::FnMut()` does not have a constant size known at compile-time
   = note: all local variables must have a statically known size

error: aborting due to 4 previous errors

Whoa! Okay, let’s read the error messages in order. Maybe that will help us fix the issue better. Right, so the first one says something about std::marker::Sized not being implemented… blah blah blah. What all this verbosity actually means is that we are returning a closure from the function, and it needs to know the size of the closure at compile time itself.

Fair enough. However, why does it work when we’re passing a closure into a function the same way (instead of returning the closure from it)? Who the hell really knows? Well, the basic problem here (as I figure it) is that we need to wrap the whole damned thing inside a heap pointer (aka the Box type) since we’re returning from the function, and the stack frame for that function is going to get de-allocated, we need to return a heap pointer to the object so that it can be used even after the current function frame’s destroyed.

This is the reason why code like the following works because the callback function, closure cannot outlive the function where the closure was generated (main):

fn foo<F: Fn() -> ()>(closure: F) {
    closure();
}

fn main() {
    foo(|| { println!("Hello, world!");});
}
Hello, world!

Okay, let’s proceed.

Second Attempt

Having gained that great insight from the first version, here’s how the second version looks like:

fn gen_line_number() -> Box<FnOnce() -> ()> {
    let mut line_no = 0;

    Box::new(move || {
        line_no += 1;

        println!("Current line number: {}", line_no);
    })
}

fn main() {
    let foo = gen_line_number();

    for _ in 0..5 {
        foo();
    }
    
    println!("");

    let bar = gen_line_number();

    for _ in 0..5 {
        bar();
    }
}

and…

bash-3.2$ rustc hoyte-blog.rs 
error[E0161]: cannot move a value of type std::ops::FnOnce(): the size of std::ops::FnOnce() cannot be statically determined
  --> hoyte-blog.rs:15:9
   |
15 |         foo();
   |         ^^^

error[E0161]: cannot move a value of type std::ops::FnOnce(): the size of std::ops::FnOnce() cannot be statically determined
  --> hoyte-blog.rs:23:9
   |
23 |         bar();
   |         ^^^

error[E0382]: use of moved value: `*foo`
  --> hoyte-blog.rs:15:9
   |
15 |         foo();
   |         ^^^ value moved here in previous iteration of loop
   |
   = note: move occurs because `*foo` has type `std::ops::FnOnce()`, which does not implement the `Copy` trait

error[E0382]: use of moved value: `*bar`
  --> hoyte-blog.rs:23:9
   |
23 |         bar();
   |         ^^^ value moved here in previous iteration of loop
   |
   = note: move occurs because `*bar` has type `std::ops::FnOnce()`, which does not implement the `Copy` trait

error: aborting due to 4 previous errors

Sigh. Just when things were looking up… theoretically. All right, before we lose our sanity, let’s read all the bloody error messages and try to make some sense of them. Okay, ignoring the first two about statically size something, we see that the other errors are complaining about foo and bar being already moved in the previous iteration of the loop. At least that makes some sort of sense – since we have the closure passed to main and assigned to foo and bar, and, as the error message says, FnOnce does not implement the Copy trait, we cannot change the owner of the closure in subsequent iterations of the for loop.

Let’s rectify that by using the FnMut trait instead, which is actually what we want since we are not using any ownership changes, and we’re mutating the captured local variable, it sounds like the perfect candidate for our use case.

In case you’re wondering about modifying the main function instead to use references inside the for loops, I assure you that it’s not an oversight on my part – down that road lies perdition! (Reflect upon the fact that I have a working copy, and I’m actually working backwards through some sane progression amongst the many that I had to endure along the way!)

Third attempt

Here’s the version with the closure type changed to FnMut:

fn gen_line_number() -> Box<FnMut() -> ()> {
    let mut line_no = 0;

    Box::new(move || {
        line_no += 1;

        println!("Current line number: {}", line_no);
    })
}

fn main() {
    let foo = gen_line_number();

    for _ in 0..5 {
        foo();
    }
    
    println!("");

    let bar = gen_line_number();

    for _ in 0..5 {
        bar();
    }
}

And yes, you guessed it, it simply doesn’t work, but you’ll love the error messages:

bash-3.2$ rustc hoyte-blog.rs 
error: cannot borrow immutable `Box` content `*foo` as mutable
  --> hoyte-blog.rs:15:9
   |
15 |         foo();
   |         ^^^

error: cannot borrow immutable `Box` content `*bar` as mutable
  --> hoyte-blog.rs:23:9
   |
23 |         bar();
   |         ^^^

error: aborting due to 2 previous errors

Hmmm… hmmm? It mentions that when I call foo and bar, it’s trying to make something mutable that’s actually immutable. Eh? Okay, let’s see. The local captured variable was “moved” to live beyond the function’s stack frame, so that’s fine. I’m using FnMut so that I can actually mutate the captured variable, that’s fine. I’m wrapping the whole damned thing inside a Box so that the whole closure can persist beyond the function call, and finally, all I’m trying to do is to increment the mutable captured line_no var inside my for loops. That also sounds fine. So when did the contents of the box become immutable?

All right, let’s see the final version first.

Final version

fn gen_line_number() -> Box<FnMut() -> ()> {
    let mut line_no = 0;

    Box::new(move || {
        line_no += 1;

        println!("Current line number: {}", line_no);
    })
}

fn main() {
    let mut foo = gen_line_number();

    for _ in 0..5 {
        foo();
    }
    
    println!("");

    let mut bar = gen_line_number();

    for _ in 0..5 {
        bar();
    }
}

and running it,

bash-3.2$ rustc hoyte-blog.rs 
bash-3.2$ ./hoyte-blog 
Current line number: 1
Current line number: 2
Current line number: 3
Current line number: 4
Current line number: 5

Current line number: 1
Current line number: 2
Current line number: 3
Current line number: 4
Current line number: 5

Wow! Glory at last! Heh. Well, if you have the eyes of a hawk, you’ll observe that the only change was adding the mut qualifier to the closure handles in main. So what that extremely useful, intuitive, and helpful error message was telling us is that we have a boxed closure containing a captured mutable local var which we’re updating in every iteration of the respective for loops, but unfortunately we have assigned the closures to immutable handles, and so the whole thing just won’t work.

Whew. And seriously, wtf?

Rust appears to be great fun when things go well, but when things go badly (and they will, of course), it can be quite a PITA. Granted, I am not an expert or even an intermediate Rust programmer, but perhaps people’s complaints about the learning curve being seriously fucked up aren’t really exaggerated. If the learning curve had been at least logical, it would be a different matter. As things stand though, the more I venture into the language, the more “patched-together” it appears to be. Also, all the various exceptions to the rules in addition to the myriad rules about exceptional scenarios aren’t really helping, folks. I doubt anything can really be done about it now, though. Heh.

A Line number closure in Rust, and an interesting battle with the Rust compiler

Hoyte’s Line Number closure in a few choice languages

Douglas Hoyte, in his excellent (if a bit fanatical) book, “Let over Lambda” gives a simple and pithy example to demonstrate the concept of closures – little anonymous functions that can capture variables in the environment in which the closure was created.

The example is very simple – create a mini-logging facility by capturing a variable representing the current line number (initially set to 0), incrementing it for every invocation of the closure, and printing it out.

An implementation in Common Lisp might look so –

(defun get-line-logger ()
  (let ((line-num 0))
    #'(lambda (id)
        (incf line-num)
        (format t "[~a] Line number: ~d~%" id line-num))))


(defun logger-demo ()
  (let ((logger-1 (get-line-logger))
        (logger-2 (get-line-logger)))
    (flet ((f (id logger)
             (dotimes (i 5)
               (funcall logger id))
             (terpri)))
      (f "logger-1" logger-1)
      (f "logger-2" logger-2))))

Sample run:

CL-USER> (logger-demo)
[logger-1] Line number: 1
[logger-1] Line number: 2
[logger-1] Line number: 3
[logger-1] Line number: 4
[logger-1] Line number: 5

[logger-2] Line number: 1
[logger-2] Line number: 2
[logger-2] Line number: 3
[logger-2] Line number: 4
[logger-2] Line number: 5

NIL

As expected, we can not only capture local variables in the lexical environment during the time of the closure’s creation, but also modify them independently of any other instances of the closure. This is what makes is a proper closure. Also note that the capture is automatically done (whether we actually use the variables or not is irrelevant).

The Racket version is, unsurprisingly, almost identical not only in syntax, but also semantics:

#lang racket

(define (get-line-logger)
  (let ([line-no 0])
    (lambda (id)
      (set! line-no (+ 1 line-no))
      (fprintf (current-output-port)"[~a] Line number: ~a~%" id line-no))))

(define (logger-demo)
  (let ([logger-1 (get-line-logger)]
        [logger-2 (get-line-logger)])
    (letrec ([f (lambda (id logger)
                  (do
                      ((i 0 (+ i 1)))
                      ((= i 5))
                    (logger id))
                  (newline))])
      (f "logger1" logger-1)
      (f "logger2" logger-2))))

And the behaviour is exactly the same:

hoyte-closure.rkt> (logger-demo)
[logger1] Line number: 1
[logger1] Line number: 2
[logger1] Line number: 3
[logger1] Line number: 4
[logger1] Line number: 5

[logger2] Line number: 1
[logger2] Line number: 2
[logger2] Line number: 3
[logger2] Line number: 4
[logger2] Line number: 5

Racket is, in some ways, more elegant than even Common Lisp. I especially love the part where lambdas don’t need any funcallS or applyS to make them run (unlike in Common Lisp). Still, pretty much a branch off the same family tree.

Moving on, let’s try the same in Java, shall we?

import java.util.function.Function;

public class HoyteClosure {
    private static Function<String, Void> getLineLogger() {
        int lineNum = 0;

        return (String id) -> { lineNum++; System.out.printf("[%s] Line number: %d\n",
                                                                 id, lineNum); return null; };
    }

    private static void invokeLogger(String id, Function<String, Void> logger) {
        for (int i = 0; i < 5; i++) {
            logger.apply(id);
        }
        System.out.println();
    }

    public static void main(String[] args) {
        Function<String, Void> logger1 = getLineLogger();
        Function<String, Void> logger2 = getLineLogger();

        invokeLogger("logger1", logger1);
        invokeLogger("logger2", logger2);
    }
}

Okay, looks good. However, if we try to run it, we run into issues immediately:

Timmys-MacBook-Pro:Java8 z0ltan$ javac HoyteClosure.java 
Timmys-MacBook-Pro:Java8 z0ltan$ javac HoyteClosure.java 
HoyteClosure.java:7: error: local variables referenced from a lambda expression must be final or effectively final
        return (String id) -> { lineNum++; System.out.printf("[%s] Line number: %d\n",
                                ^
HoyteClosure.java:8: error: local variables referenced from a lambda expression must be final or effectively final
                                                                 id, lineNum); return null; };
                                                                     ^
2 errors

The problem is that Java really does not have real closures. The lambda support in Java 8 is just syntactic sugar for the good old anonymous class which can read local variables in the environment, but cannot modify them. So what can we do?

To make Java happy, we can create a new object for every logger instance i.e., use instance variables in lieu of local variables so that the modification of local variables is not an issue any more:

import java.util.function.Function;

public class HoyteClosureModified {
    static class Closure {
        private int lineNum;
        
        private  Function<String, Void> getLineLogger() {
            return (String id) -> { lineNum++; System.out.printf("[%s] Line number: %d\n",
                                                                 id, lineNum); return null; };
        }
    }

    private static void invokeLogger(String id, Function<String, Void> logger) {
        for (int i = 0; i < 5; i++) {
            logger.apply(id);
        }
        System.out.println();
    }

    public static void main(String[] args) {
        Function<String, Void> logger1 = new Closure().getLineLogger();
        Function<String, Void> logger2 = new Closure().getLineLogger();

        invokeLogger("logger1", logger1);
        invokeLogger("logger2", logger2);
    }
}

Taking it for a test spin, we get:

Timmys-MacBook-Pro:Java8 z0ltan$ javac HoyteClosureModified.java
Timmys-MacBook-Pro:Java8 z0ltan$ java -cp . HoyteClosureModified
[logger1] Line number: 1
[logger1] Line number: 2
[logger1] Line number: 3
[logger1] Line number: 4
[logger1] Line number: 5

[logger2] Line number: 1
[logger2] Line number: 2
[logger2] Line number: 3
[logger2] Line number: 4
[logger2] Line number: 5

It’s not the same though, is it? The whole point of using a closure was so that we wouldn’t have to do this explicit management of state ourselves. As such, Java doesn’t really have full-blown closures, just a poor man’
s version of it. Better luck next time, Java.

Finally, the same using Ruby. As I have said before, Ruby feels remarkably like a Lisp despite the syntactic differences.

module HoyteClosure 
    class Demo
        def self.get_line_logger
            line_no = 0

            lambda do |id|
                line_no += 1
                puts "[" + id + "] Line number: " + line_no.to_s
            end
        end

        def self.logger_demo(id, logger)
            5.times {
                logger.call(id)
            }
            puts ""
        end

        def self.main
            logger1 = get_line_logger
            logger2 = get_line_logger

            logger_demo("logger1", logger1)
            logger_demo("logger2", logger2)
        end
    end
end

And the final test run:

irb(main):009:0> load "./hoyte_closure.rb"
load "./hoyte_closure.rb"
=> true
irb(main):010:0> HoyteClosure::Demo.main
HoyteClosure::Demo.main
[logger1] Line number: 1
[logger1] Line number: 2
[logger1] Line number: 3
[logger1] Line number: 4
[logger1] Line number: 5

[logger2] Line number: 1
[logger2] Line number: 2
[logger2] Line number: 3
[logger2] Line number: 4
[logger2] Line number: 5

=> nil

Short, non-idiomatic, and sweet!

Hoyte’s Line Number closure in a few choice languages

A highly opinionated review of Java Lambdas

What really is a lambda expression?

A lambda expression is, for all means and purposes, an anonymous function. That really is all there is to it. In languages that support first-class functions, this is yet another feature of the language – functions are on par with other types in the language. However, in language that don’t consider functions first class, it becomes a bit of an esoteric concept.

The origin of the concept is in the Lambda Calculus first propounded by the great Alonzo Church. According to that scheme, functions are basically entities which take some (or no) parameters, and have a body of code that can use those parameters. There is essentially no side-effect in such functions. That means that the function is deterministic – given the same set of parameters, it will always produce the same output. This is, in fact, the very foundation of Functional Programming. In modern times, Functional Programming is often conflated with strongly and statically typed languages. This is clearly wrong. The original Lambda Calculus had really no notion of types! (There is a variant of it though, the typed Lambda Calculus). Most of the languages that support lambda expressions today, however, freely allow plenty of side-effects within lambda expressions. The main takeaway here though is that lambda expressions are conceptually what named functions are made out of.

Lambdas in Java 8 and how they came to be

The biggest features in Java 8 were lambda support and the Stream API. In many ways, lambdas are important only with respect to their heavy use in the stream APIs (as seen in the previous blog on Shell). The key concept to understand when learning lambdas in Java is that lambdas/functions are not first-class objects in Java. Their entire existence is strictly bound to and controlled by interfaces, specifically the concept of SAMs (Single Abstract Method) interfaces – interface which contain only a single abstract method. In my opinion, this severe crippling of lambdas in Java has created more problems than it has solved. Now new programmers who pick up Java and run with it are liable to be very confused when to move on to languages which do support lambdas in a more natural and proper manner. In any case, let’s work with what we’ve got.

So why a Functional Interface? Prior to Java 8, if we wanted to simulate cases where we wanted to pass some functionality into another function, we had to make do with anonymous classes. For instance, to create a new thread, we could have done the following:

jshell> (new Thread (new Runnable () {
   ...>    @Override
   ...>    public void run () {
   ...>     System.out.println("Hello from thread!");
   ...>    }
   ...>  })).start()

jshell> Hello from thread!

We observe that the sole purpose of the anonymous class is to perform some actions, but the cognitive dissonance comes into play when we see that the Thread class constructor experts an instance (basically a data object) of type Runnable. This is exactly the same pattern that was followed by C++ until C++11. In fact, this is what is known (rather pompously, I must add) as a functor.

Here is what the Runnable interface looks like:

public interface Runnable {
   void run();
}

This pattern of the use of a (dummy) interface containing a single method which basically does all the work that an anonymous or named function should have done in the first place, was found to be in such widespread use amongst Java developers that the committee which worked on developing lambda support in Java decided to make it kosher and provide additional support from the Java Runtime. As a result, from Java 8 onwards, wherever a SAM is present, a lambda expression can be used as its target or in its stead. They have been made essentially the same.
For example, the previous example can now be written more succinctly as:

jshell> (new Thread(() -> System.out.println("Hello from thread...again!"))).start()
Hello from thread...again!

An optional annotation, @FunctionalInterface has also been introduced for bookkeeping purposes. In fact, in order to help out developers, a bunch of Functional Interfaces now come bundled with the JDK in the java.util.function package. I would highly recommend exploring them and testing them out to get a feel for them.

Custom Functional Interfaces

We can define our own functional interface in Java 8 (and above). The only restriction for the interface to be a functional interface is, as mentioned before, is that the interface have a single abstract method.

For instance, the standard package (java.util.function) comes with functional interfaces that support single parameter (Function) and double parameter (BiFunction) functions. Let us define a triple parameter function just for this example.

jshell> @FunctionalInterface interface TriFunction<T, U, V, R> {
   ...>     R apply(T t, U u, V v);
   ...> }
|  created interface TriFunction

jshell> int x = 100;
x ==> 100

jshell> int x = 100
x ==> 100

jshell> String y = "Hello"
y ==> "Hello"

jshell> double z = Math.PI
z ==> 3.141592653589793

jshell> TriFunction<Integer, String, Double, String> f = 
          (i, s, d) -> i + s + d;
f ==> $Lambda$6/1318822808@6d7b4f4c

jshell> System.out.println(f.apply(x, y, z))
100Hello3.141592653589793

Features and Limitations of Java Lambdas

So how exactly does a SAM map onto a lambda expression? To understand this better, first we need to get the syntax and semantics of lambda expressions out of the way:

Java’s lambda syntax was clearly influenced by Scala. A basic lambda expression has the following form:

(<param*>) -> [{] <body-form+> [}]

where,

’param’ is a comma-separated list of zero or more parameters with optional types (note that in some cases where Java’s type inference mechanism is unable to infer the type, you will need to specify the type(s) explicitly), the braces are optional in the case of a single line body, but are required when the body spans more than one line. Finally, each body-form is a series of normal Java statements. In the case of multiple statements, each body form is separated by a semi-colon, and a return statement is also required in this case (if the return type is not void).
So a lambda expression that takes a String and returns a String might take on several forms in actual code:

(s) -> s.toUpperCase()

The type signature is not required in this case, and the return statement is not allowed in this case, This would be the recommend usage of a typical lambda expression – don’t declare the types and don’t use any return statement. Of course, this only works for a single-statement (or, more correctly, a single-expression) body.

In case we want to use braces, we need to have the whole expression take the following form:

(String s) -> { return s.toUpperCase() }

So we need to specify the type of the parameter(s) as well as include an explicit return statement. In all cases where the body contains multiple statements, this would be the recommended format for a lambda expression.

Now getting back to how a SAM is mapped onto a lambda expression, whenever the Java Runtime encounters a lambda expression, it can do either of two things depending on the context in which the SAM is used:

  • In case the lambda expression is used along with a Stream API function (such as map, filter, reduce, etc.), the Java Runtime already has enough context about the form of the function that is expected – the parameter types and the return type. For instance, if we are trying to double all the even natural numbers upto 10, we might do:
    jshell> IntStream
             .rangeClosed(1, 10)
             .filter((n) -> n%2 == 0)
             .map((d) -> d*2).forEach(System.out::println)
    4
    8
    12
    16
    20
    

    In this case, the Java Runtime knows that the filter method takes a parameter of the form: Predicate. The Predicate functional interface has a single method – boolean test(Test t). So what the Runtime does is to check that the provided lambda expression matches this signature, and if verified, proceeds to invoke the “test” method implicitly. Similarly for the map function as well.

  • The second case arises in the case where we make use of Functional Interfaces explicitly and then use them as the “target” of a lambda expression. For instance, suppose we want to write a function that takes a String and an Integer and returns their concatenated form as a String, we might have something like:
    jshell> BiFunction<String, Integer, String> f = 
               (s, i) -> s + String.valueOf(i)
    f ==> $Lambda$17/103887628@42f93a98
    
    jshell> f.apply("Hello", 99)
    $21 ==> "Hello99"
    

    In this case as well, the compiler will ensure the the lambda expression matches the type of the declared function variable. Pretty straightforward.

So far so good, but there is a huge problem in the second case above. The problem is that even once the function object has been created, the name of the SAM must be known before we can use it. This is because Java does not have operator overloading (unlike C++). This is why in the current framework, we must know the exact name of each functional interface that we use. The “apply” method used above is the name of the SAM in the BiFunction functional interface. The problem is compounded because each functional interface (even in the standard package) defines its own names. Of course, this is not an insurmountable problem, but the same problem did not exist even in pre-C++-11. For instance, the previous example could have been done so in C++ (using a functor):

// pre C++-11
#include <iostream>
#include <sstream>

template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
    return f(s, t);
}

class string_int_string
{
    public:
        std::string operator()(std::string s, int i)
        {
            std::ostringstream oss;
            oss << s << i;
            return oss.str();
        }
};

int main()
{
    std::cout << concatenate("Hello", 99, string_int_string()) << std::endl;
   
    return 0;
}

A bit brittle, but it works. The generic function, “concatenate” is important to note here since it can basically take any functor (or lambda expression from C++-11 onwards), and invokes the function object with the supplied arguments. The same approach is used in the C++ STL generic functions. Now if we look at how the code might look like with C++-11, we get:

// C++-11 and above
#include <iostream>
#include <sstream>

template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
    return f(s, t);
}

int main()
{
   std::cout << concatenate("Hello", 99, 
		[](std::string s, int i) {
        			std::ostringstream oss;
        			oss << s << i;
        			return oss.str();
    		}) << std::endl;
    
    return 0;
}

As can be seen, the approach is much cleaner. The difference between the functor-version and the lambda-based one is that in this case, we’ve essentially got rid of the class representing the functor object and inserted its logic inside the lambda expression’s body. So it essentially appears that the lambda expression’ is basically an object that can bind the parameters just as in the case of a regular functor.

As can be seen, even in C++-11, we can write generic functions and all we need to do it invoke it like a function. No messy SAMs there! I personally feel that C++’s lambda support is far superior to that of Java, especially since C++ supports closures. More on that in the next section.

Another disadvantage of Java’s lambda support is that the following is impossible in Java:

#include <iostream>

int main()
{
    int x = 100, y = 100;
    
    std::cout << ([x, y]() { return x + y; })() << std::endl;
    
    return 0;
}

The code above simply uses a lambda expression to capture variables defined in the outer lexical scope (more on that in the next section), but the interesting bit is that the lambda expression can be invoked like a proper function object even without assigning it to a variable.

If we tried the same in Java, we’d get an error:

jshell> int x = 1100
x ==> 1100

jshell> int y = 200
y ==> 200

jshell> (() -> x + y)
|  Error:
|  incompatible types: java.lang.Object is not a functional interface
|  (() -> x + y)
|   ^---------^
|  Error:
|  incompatible types: <none> cannot be converted to java.lang.Object
|  (() -> x + y)
|  ^-----------^

As can be seen from the error message, the Java Runtime complains that “Object” is not a functional interface. Even if we assumed that the runtime would be able to discern the functional interface type from its signature and produce a result, we still get an error:

jshell> ((int a, int b) -> { return a + b; })).apply(x, y)
|  Error:
|  ';' expected
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|                                       ^
|  Error:
|  incompatible types: java.lang.Object is not a functional interface
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|   ^---------------------------------^
|  Error:
|  incompatible types: <none> cannot be converted to java.lang.Object
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|  ^-----------------------------------^
|  Error:
|  cannot find symbol
|    symbol:   method apply(int,int)
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|                                         ^---^
|  Error:
|  missing return statement
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|  ^------------------------------------------------^

So no go there. A point down for Java lambdas! More seriously, I find this to be an extremely irritating reminder that Java’s lambdas are not really lambdas. They are more like syntactic sugar for the good old anonymous classes. In fact, there are more serious implications precisely for this reason.

Closures

This is again one of those concepts that are notoriously badly explained. A lot of newbies to programming are often scared to death and put-off from learning more about Functional Programming due to unnecessary FUD on the part of many “experts” in the field. So, let’s try and explain this as clearly as possible:

In Set Theory, a set is defined to be “closed” under an operation if applying the operation to members of the set produces a result that belongs to the same set. For instance, if the set under consideration is the set of Natural Numbers (N) and the operation is + (addition), we can say Natural numbers are closed under addition. Why? The reason is quite simple and follows straight from the definition – adding any two natural numbers (or indeed any number of numbers, but we’re considering the strict binary operation here) always produces a Natural number. On the other hand, N is not closed under – (subtraction). This is because subtracting some Natural number from another Natural number might produce 0 or some negative number, which is clearly not a member of N. So much for mathematics.

In Psychology, “closure” refers to the strict need of an individual to find a definitive answer to a problem.

In business, “closure” refers to the process by which a business closes down.

You see what I’m getting at? The term “closure” is highly overloaded, and even within mathematics, the term has different meanings in different branches. So my point is this – simply forget about the name and focus on the concept.

In Computer Science, a closure is intricately tied to the concept of scoping, specifically lexical scoping. This is why closures are often referred to as “lexical closures”. In order to understand closures properly, we must clearly understand what lexical scoping entails.

Lexical scoping is intimately tied with the rules defining the lifetimes (and visibility) of variables. Dynamic scoping, in general, refers to a situation where a variable has effectively global visibility and lifetime. Pure lexical scoping, on the other hand, ensures that the visibility of variables is limited to the current lexical block (say a function or a local block), or to nested blocks. However, lexically scoped variables are not visible to outer blocks, and variables defined in inner blocks will effectively “shadow” those defined in the outer scope. If no new variables with the same name are defined in the inner block, references to variables will always refer to those in the outer scope. This behaviour forms the basis of what is known as “variable capture”.

A variable is said to be captured by a lambda function if the lambda function refers to an outer-scope variable during its time of creation. The lambda function is said to “close over” those variables, and this is the reason why this feature is called a “closure”. So what does this variable capture actually implicate in the grand scheme of things? What it implicates is this – when a lambda function captures a variable in its outer scope, the lifetime of the variable is effectively changed. Under normal circumstances, local variables die when the function is exited. In this case, however, since the lambda function has captured the variable, the variable will not die even when the function in which it was defined dies!

In this respect, Java behaves absolutely horribly. Java has extremely weird scoping rules. In some ways, it does use lexical scoping. In most respects, not:

jshell> void demoScopingInJava() {
   ...>          int x = 100;
   ...>          
   ...>          System.out.format("Function scope x = %d\n", x);
   ...>         {
   ...>             System.out.format("Function scope x (before shadowing) = %d\n", x);
   ...>             /// int x = 999 is not allowed!
   ...>             x = 999;
   ...>             System.out.format("Function scope x (after shadowing) = %d\n", x);
   ...>         }
   ...>          System.out.format("Function scope (again) x = %d\n", x);
   ...>      }
|  created method demoScopingInJava()

jshell> demoScopingInJava()
Function scope x = 100
Function scope x (before shadowing) = 100
Function scope x (after shadowing) = 999
Function scope (again) x = 999

Java does not allow any shadowing because we cannot define any new variables inside the block. Instead, all references to the variable ‘x’ are actually to the function scope variable. In this case, we are able to mutate ‘x’ from 100 to 999, but this is because the inner block is within the outer function block and the Java Runtime can therefore ensure that this variable is freed before the function exits. However, this is not allowed when are in a situation where the variable could be referenced even after the local function where it was declared exits.

For instance, if we want to implement a function that prints line numbers in an increasing order every time it is called, we might try to do something like this in Java:

jshell> Function<Void, Void> lineNumberGenerator() {
   ...>      int lineNumber = 0;
   ...>      return (n) -> 
                 { lineNumber++; 
                   System.out.format("Line number: %d\n", lineNumber); 
                   return null; };
   ...> }
|  Error:
|  local variables referenced from a lambda expression must be final or effectively final
|       return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
|                       ^--------^
|  Error:
|  local variables referenced from a lambda expression must be final or effectively final
|       return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
|                                                                            ^--------^

We can see, though, that modifying a variable defined in the outer scope is not allowed in the case here the code escapes the local scope. As can be clearly seen in the error messages, the variable lineNum must be declared “final” for the code to even compile (and of course, then it would fail again unless we removed the mutating statement inside the lambda function).
This is the reason why we cannot implement closures in Java – Java’s bizarre downward-percolating forced visibility of variables.

And, oh, just in case you thought this applies only to lambda blocks, it’s always been the case:

jshell> void scopingRulesTest () {
   ...>    int x = 100;
   ...>    
   ...>    (new Thread(new Runnable () {
   ...>        @Override
   ...>        public void run() {
   ...>           x++;
   ...>          System.out.println(x);
   ...>        }
   ...>      })).start();
   ...> }
|  Error:
|  local variables referenced from an inner class must be final or effectively final
|            x++;
|            ^

The same example in C++ works as expected (including modification of the outer scope’s variable):

#include <iostream>

using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    
    int x = 100;
    cout << "Function scope x = " << x << endl;
    {
        x = 101;
        cout << "Block scope x = " << x << endl;
        int x = 999;
        cout << "Block scope x = " << x << endl;
    }
    
    cout << "Function scope x = " << x << endl;
    
    return 0;
}

sh-4.3$ main                                                                                                                                                              
Function scope x = 100                                                                                                                                                    
Block scope x = 101                                                                                                                                                       
Block scope x = 999                                                                                                                                                       
Function scope x = 101   

And to complete the line number closure demo (line number example):

// C++-11 (and above)
#include <iostream>
#include <functional>

using namespace std;

function<void()> line_number_generator()
{
    int line_num = 0;
    
    return [line_num]() mutable
            {
                line_num++;
                cout << "Line number: " << line_num << endl; 
            };
}

int main()
{
    ios_base::sync_with_stdio(false);
    
    function<void()> print_line_numbers = line_number_generator();
    
    for (int i = 0; i < 5; i++) {
        print_line_numbers();    
    }
}

sh-4.3$ g++ -std=c++11 -o main *.cpp                                                                                                                                      
sh-4.3$ main                                                                                                                                                              
Line number: 1                                                                                                                                                            
Line number: 2                                                                                                                                                            
Line number: 3                                                                                                                                                            
Line number: 4                                                                                                                                                            
Line number: 5  

Note that default variable capture is read-only in C++. However, the “mutable” keyword can be used to change that behaviour. In all respects, C++11 supports closures while Java cannot!

The Common Lisp version is pretty much identical in behaviour to the C++ one. In the case of Common Lisp, however, we have the extra implication that any references to outer-scope variable always capture the mentioned variable (unless a local variable with the same name is already defined). This is seen in the Common Lisp version of the same example:

CL-USER> (defun foo ()
           (let ((x 100))
             (format t "Function scope x = ~d~%" x)
             (progn
               (setf x 101)
               (let ((x 999))
                 (format t "Inner block x = ~d~%" x)))
             (format t "Function scope (again) x = ~d~%" x)))
STYLE-WARNING: redefining COMMON-LISP-USER::FOO in DEFUN
FOO
CL-USER> (foo)
Function scope x = 100
Inner block x = 999
Function scope (again) x = 101
NIL

This effectively ensures that a nested function that refer to the outer scope var(s), and which is then returned from the function is always a closure as can be seen from the following example (same example as the C++ one):

CL-USER> (defun line-number-generator ()
                      (let ((line-number 0))
                        #'(lambda ()
                             (incf line-number)
                            (format t "Line number: ~d~%" line-number))))
LINE-NUMBER-GENERATOR

CL-USER> (defvar print-line-numbers (line-number-generator))
PRINT-LINE-NUMBERS

CL-USER> print-line-numbers
#<CLOSURE (LAMBDA () :IN LINE-NUMBER-GENERATOR) {100439133B}>

CL-USER> (dotimes (i 5)
           (funcall print-line-numbers))
Line number: 1
Line number: 2
Line number: 3
Line number: 4
Line number: 5
NIL

Conclusion

Well, that about wraps up this rather long blog post! As can be seen from this post (as well as the JShell post and more posts to come, especially on Java Streams), lambda support in Java is an extremely welcome and necessary feature. However, in many ways, it’s a very crippled version of lambda support found in other languages, especially with regards to how closures are not supported in Java.Thankfully, most code that uses lambda expressions will be code that uses the Streams API, and as such most (if not all) the wartiness of Java’s lambdas will be effectively secreted within map, filter, reduce or some other mechanism in the Streams API.

Note: All the Java code posted in this blogpost was executed on JShell. For use in a regular Java environment, ensure to add semi-colons wherever appropriate.

A highly opinionated review of Java Lambdas