What really is a lambda expression?
A lambda expression is, for all intents and purposes, an anonymous function. That really is all there is to it. In languages that support first-class functions, this is just another feature of the language: functions are on par with other types in the language. However, in languages that don’t consider functions first class, it becomes a bit of an esoteric concept.
The origin of the concept is in the Lambda Calculus first propounded by the great Alonzo Church. According to that scheme, functions are basically entities which take some (or no) parameters, and have a body of code that can use those parameters. There is essentially no side-effect in such functions. That means that the function is deterministic – given the same set of parameters, it will always produce the same output. This is, in fact, the very foundation of Functional Programming. In modern times, Functional Programming is often conflated with strongly and statically typed languages. This is clearly wrong. The original Lambda Calculus had really no notion of types! (There is a variant of it though, the typed Lambda Calculus). Most of the languages that support lambda expressions today, however, freely allow plenty of side-effects within lambda expressions. The main takeaway here though is that lambda expressions are conceptually what named functions are made out of.
Lambdas in Java 8 and how they came to be
The biggest features in Java 8 were lambda support and the Stream API. In many ways, lambdas are important only with respect to their heavy use in the stream APIs (as seen in the previous blog post on JShell). The key concept to understand when learning lambdas in Java is that lambdas/functions are not first-class objects in Java. Their entire existence is strictly bound to and controlled by interfaces, specifically the concept of SAM (Single Abstract Method) interfaces – interfaces which contain only a single abstract method. In my opinion, this severe crippling of lambdas in Java has created more problems than it has solved. New programmers who pick up Java and run with it are liable to be very confused when they move on to languages which support lambdas in a more natural and proper manner. In any case, let’s work with what we’ve got.
So why a Functional Interface? Prior to Java 8, if we wanted to simulate cases where we wanted to pass some functionality into another function, we had to make do with anonymous classes. For instance, to create a new thread, we could have done the following:
jshell> (new Thread (new Runnable () {
...> @Override
...> public void run () {
...> System.out.println("Hello from thread!");
...> }
...> })).start()
jshell> Hello from thread!
We observe that the sole purpose of the anonymous class is to perform some actions, but the cognitive dissonance comes into play when we see that the Thread class constructor expects an instance (basically a data object) of type Runnable. This is exactly the same pattern that was followed by C++ until C++11. In fact, this is what is known (rather pompously, I must add) as a functor.
Here is what the Runnable interface looks like:
public interface Runnable {
void run();
}
This pattern of the use of a (dummy) interface containing a single method which basically does all the work that an anonymous or named function should have done in the first place, was found to be in such widespread use amongst Java developers that the committee which worked on developing lambda support in Java decided to make it kosher and provide additional support from the Java Runtime. As a result, from Java 8 onwards, wherever a SAM is present, a lambda expression can be used as its target or in its stead. They have been made essentially the same.
For example, the previous example can now be written more succinctly as:
jshell> (new Thread(() -> System.out.println("Hello from thread...again!"))).start()
Hello from thread...again!
An optional annotation, @FunctionalInterface has also been introduced for bookkeeping purposes. In fact, in order to help out developers, a bunch of Functional Interfaces now come bundled with the JDK in the java.util.function package. I would highly recommend exploring them and testing them out to get a feel for them.
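To get a feel for the bundled interfaces, here is a small sketch that uses four of them. The class name BuiltInInterfaces and the sample values are my own, chosen purely for illustration. Note that each interface gives its single abstract method a different name.

```java
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical demo class; only the java.util.function types come from the JDK.
final class BuiltInInterfaces {
    // Function<T, R>: one argument in, one result out; the method is apply.
    static final Function<String, Integer> LENGTH = s -> s.length();

    // BiFunction<T, U, R>: two arguments in, one result out; also apply.
    static final BiFunction<Integer, Integer, Integer> ADD = (a, b) -> a + b;

    // Predicate<T>: one argument in, a boolean out; the method is test.
    static final Predicate<String> IS_EMPTY = s -> s.isEmpty();

    // Supplier<T>: no arguments, one result; the method is get.
    static final Supplier<String> GREETING = () -> "Hello";

    public static void main(String[] args) {
        System.out.println(LENGTH.apply("lambda")); // 6
        System.out.println(ADD.apply(2, 3));        // 5
        System.out.println(IS_EMPTY.test(""));      // true
        System.out.println(GREETING.get());         // Hello
    }
}
```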
Custom Functional Interfaces
We can define our own functional interface in Java 8 (and above). The only restriction for the interface to be a functional interface, as mentioned before, is that it must have exactly one abstract method.
For instance, the standard package (java.util.function) comes with functional interfaces that support single parameter (Function) and double parameter (BiFunction) functions. Let us define a triple parameter function just for this example.
jshell> @FunctionalInterface interface TriFunction<T, U, V, R> {
...> R apply(T t, U u, V v);
...> }
| created interface TriFunction
jshell> int x = 100;
x ==> 100
jshell> String y = "Hello"
y ==> "Hello"
jshell> double z = Math.PI
z ==> 3.141592653589793
jshell> TriFunction<Integer, String, Double, String> f =
(i, s, d) -> i + s + d;
f ==> $Lambda$6/1318822808@6d7b4f4c
jshell> System.out.println(f.apply(x, y, z))
100Hello3.141592653589793
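The “single abstract method” rule only counts abstract methods, so a functional interface may still carry default methods. As a hedged sketch, here is the same TriFunction with an andThen helper modelled on the one found in BiFunction. The helper and the TriFunctionDemo class are my own additions and not part of the JDK.

```java
import java.util.function.Function;

@FunctionalInterface
interface TriFunction<T, U, V, R> {
    // The single abstract method: this is what a lambda implements.
    R apply(T t, U u, V v);

    // A default method does not count towards the one-abstract-method limit.
    default <W> TriFunction<T, U, V, W> andThen(Function<? super R, ? extends W> after) {
        return (t, u, v) -> after.apply(apply(t, u, v));
    }
}

// Hypothetical demo class, for illustration only.
final class TriFunctionDemo {
    static String describe(int i, String s, double d) {
        TriFunction<Integer, String, Double, String> concat = (a, b, c) -> a + b + c;
        // Chain a second step onto the result of the first.
        return concat.andThen(String::toUpperCase).apply(i, s, d);
    }

    public static void main(String[] args) {
        System.out.println(describe(100, "Hello", 2.5)); // 100HELLO2.5
    }
}
```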
Features and Limitations of Java Lambdas
So how exactly does a SAM map onto a lambda expression? To understand this better, first we need to get the syntax and semantics of lambda expressions out of the way:
Java’s lambda syntax was clearly influenced by Scala. A basic lambda expression has the following form:
(<param*>) -> [{] <body-form+> [}]
where,
’param’ is a comma-separated list of zero or more parameters with optional types (in the cases where Java’s type inference mechanism is unable to infer the types, you will need to specify them explicitly). The braces are optional when the body is a single expression, but are required when the body consists of one or more statements. Finally, each body-form is a normal Java statement. Inside braces, every statement is terminated by a semi-colon, and a return statement is required if the return type is not void.
So a lambda expression that takes a String and returns a String might take on several forms in actual code:
(s) -> s.toUpperCase()
The type signature is not required in this case, and a return statement is not allowed. This would be the recommended usage of a typical lambda expression: don’t declare the types and don’t use any return statement. Of course, this only works for a single-statement (or, more correctly, a single-expression) body.
In case we want to use braces, we need to have the whole expression take the following form:
(String s) -> { return s.toUpperCase(); }
Once braces are used, the explicit return statement (with its terminating semi-colon) becomes mandatory. The parameter type is still optional, although I have spelled it out here for readability. In all cases where the body contains multiple statements, this would be the recommended format for a lambda expression.
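As a quick check of the syntax rules, the following sketch lists several spellings of the same String-to-String lambda that all compile against Function<String, String>. The class name LambdaForms and the field names are hypothetical.

```java
import java.util.function.Function;

// Hypothetical demo class: four spellings of a String -> String lambda.
final class LambdaForms {
    // Expression body, inferred parameter type, parentheses omitted.
    static final Function<String, String> A = s -> s.toUpperCase();

    // Expression body, inferred parameter type, parentheses kept.
    static final Function<String, String> B = (s) -> s.toUpperCase();

    // Expression body with an explicit parameter type.
    static final Function<String, String> C = (String s) -> s.toUpperCase();

    // Block body: braces, statements ending in semi-colons, explicit return.
    static final Function<String, String> D = (s) -> {
        String trimmed = s.trim();
        return trimmed.toUpperCase();
    };

    public static void main(String[] args) {
        System.out.println(A.apply("hello"));    // HELLO
        System.out.println(B.apply("hello"));    // HELLO
        System.out.println(C.apply("hello"));    // HELLO
        System.out.println(D.apply("  hello ")); // HELLO
    }
}
```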
Now getting back to how a SAM is mapped onto a lambda expression, whenever the Java compiler encounters a lambda expression, it can do either of two things depending on the context in which the SAM is used:
- In case the lambda expression is used along with a Stream API function (such as map, filter, reduce, etc.), the compiler already has enough context about the form of the function that is expected, namely the parameter types and the return type. For instance, if we are trying to double all the even natural numbers up to 10, we might do:
jshell> IntStream
.rangeClosed(1, 10)
.filter((n) -> n%2 == 0)
.map((d) -> d*2).forEach(System.out::println)
4
8
12
16
20
In this case, the compiler knows that the filter method of IntStream takes a parameter of type IntPredicate. The IntPredicate functional interface has a single abstract method – boolean test(int value). So the compiler checks that the provided lambda expression matches this signature and, if it does, the stream implementation invokes the “test” method on our behalf. The map function works similarly, with IntUnaryOperator and its applyAsInt method.
- The second case arises in the case where we make use of Functional Interfaces explicitly and then use them as the “target” of a lambda expression. For instance, suppose we want to write a function that takes a String and an Integer and returns their concatenated form as a String, we might have something like:
jshell> BiFunction<String, Integer, String> f =
(s, i) -> s + String.valueOf(i)
f ==> $Lambda$17/103887628@42f93a98
jshell> f.apply("Hello", 99)
$21 ==> "Hello99"
In this case as well, the compiler will ensure that the lambda expression matches the type of the declared function variable. Pretty straightforward.
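Both cases can be put side by side in one small sketch. In the first method I have pulled the lambdas out into named IntPredicate and IntUnaryOperator variables, only to make visible the types that filter and map expect. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, for illustration only.
final class TargetTyping {
    // Case 1: the stream methods dictate the functional interface types.
    static List<Integer> doubledEvens(int upTo) {
        IntPredicate isEven = n -> n % 2 == 0; // boolean test(int)
        IntUnaryOperator twice = d -> d * 2;   // int applyAsInt(int)
        return IntStream.rangeClosed(1, upTo)
                .filter(isEven)
                .map(twice)
                .boxed()
                .collect(Collectors.toList());
    }

    // Case 2: the declared type of the variable is the target of the lambda.
    static String join(String s, int i) {
        BiFunction<String, Integer, String> f = (str, n) -> str + n;
        return f.apply(s, i);
    }

    public static void main(String[] args) {
        System.out.println(doubledEvens(10));  // [4, 8, 12, 16, 20]
        System.out.println(join("Hello", 99)); // Hello99
    }
}
```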
So far so good, but there is a huge problem in the second case above. Even once the function object has been created, the name of the SAM’s single method must be known before we can call it. This is because Java does not have operator overloading (unlike C++), so a function object cannot simply be invoked with function-call syntax. In the current framework, therefore, we must know the exact method name of each functional interface that we use. The “apply” method used above is the name of the single abstract method in the BiFunction functional interface. The problem is compounded because the functional interfaces (even in the standard package) do not share one method name. Of course, this is not an insurmountable problem, but the same problem did not exist even in pre-C++-11. For instance, the previous example could have been written like so in C++ (using a functor):
// pre C++-11
#include <iostream>
#include <sstream>
#include <string>
template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
return f(s, t);
}
class string_int_string
{
public:
std::string operator()(std::string s, int i)
{
std::ostringstream oss;
oss << s << i;
return oss.str();
}
};
int main()
{
std::cout << concatenate("Hello", 99, string_int_string()) << std::endl;
return 0;
}
A bit brittle, but it works. The generic function “concatenate” is important to note here, since it can take basically any functor (or lambda expression from C++-11 onwards) and invoke the function object with the supplied arguments. The same approach is used in the C++ STL generic functions. Now if we look at what the code might look like with C++-11, we get:
// C++-11 and above
#include <iostream>
#include <sstream>
#include <string>
template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
return f(s, t);
}
int main()
{
std::cout << concatenate("Hello", 99,
[](std::string s, int i) {
std::ostringstream oss;
oss << s << i;
return oss.str();
}) << std::endl;
return 0;
}
As can be seen, the approach is much cleaner. The difference between the functor version and the lambda-based one is that in this case, we’ve essentially got rid of the class representing the functor object and inserted its logic inside the lambda expression’s body. So it essentially appears that the lambda expression is basically an object that can bind the parameters just as in the case of a regular functor.
Even in C++-11, then, we can write generic functions, and all we need to do is invoke the callable like a function. No messy SAMs there! I personally feel that C++’s lambda support is far superior to that of Java, especially since C++ supports closures. More on that in the next section.
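For comparison, here is my rough Java counterpart of the generic concatenate function (the class name Concatenate is hypothetical). The callable has to be typed as one specific functional interface, and it is invoked through that interface’s method name rather than with plain function-call syntax.

```java
import java.util.function.BiFunction;

// Hypothetical Java counterpart of the C++ concatenate template above.
final class Concatenate {
    // The parameter must name a concrete functional interface (BiFunction here),
    // and the body must call its method by name (apply).
    static <T> String concatenate(String s, T t, BiFunction<String, T, String> f) {
        return f.apply(s, t);
    }

    public static void main(String[] args) {
        System.out.println(concatenate("Hello", 99, (s, i) -> s + i)); // Hello99
    }
}
```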
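Another disadvantage of Java’s lambda support is that the following has no direct equivalent in Java, because a Java lambda expression has no type of its own until it is given a target functional interface: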
#include <iostream>
int main()
{
int x = 100, y = 100;
std::cout << ([x, y]() { return x + y; })() << std::endl;
return 0;
}
The code above simply uses a lambda expression to capture variables defined in the outer lexical scope (more on that in the next section), but the interesting bit is that the lambda expression can be invoked like a proper function object even without assigning it to a variable.
If we tried the same in Java, we’d get an error:
jshell> int x = 1100
x ==> 1100
jshell> int y = 200
y ==> 200
jshell> (() -> x + y)
| Error:
| incompatible types: java.lang.Object is not a functional interface
| (() -> x + y)
| ^---------^
| Error:
| incompatible types: <none> cannot be converted to java.lang.Object
| (() -> x + y)
| ^-----------^
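As can be seen from the error message, the compiler complains that “Object” is not a functional interface: with no target type in sight, it has nothing to convert the lambda to. Even if we assumed that the functional interface type could be discerned from the lambda’s signature, we still get an error (the first error below comes from a stray closing parenthesis in my input, but the remaining ones are the same complaint as before):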
jshell> ((int a, int b) -> { return a + b; })).apply(x, y)
| Error:
| ';' expected
| ((int a, int b) -> { return a + b; })).apply(x, y)
| ^
| Error:
| incompatible types: java.lang.Object is not a functional interface
| ((int a, int b) -> { return a + b; })).apply(x, y)
| ^---------------------------------^
| Error:
| incompatible types: <none> cannot be converted to java.lang.Object
| ((int a, int b) -> { return a + b; })).apply(x, y)
| ^-----------------------------------^
| Error:
| cannot find symbol
| symbol: method apply(int,int)
| ((int a, int b) -> { return a + b; })).apply(x, y)
| ^---^
| Error:
| missing return statement
| ((int a, int b) -> { return a + b; })).apply(x, y)
| ^------------------------------------------------^
So no go there. A point down for Java lambdas! More seriously, I find this to be an extremely irritating reminder that Java’s lambdas are not really lambdas. They are more like syntactic sugar for the good old anonymous classes. In fact, there are more serious implications precisely for this reason.
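For completeness, the closest Java gets, as far as I am aware, is to cast the lambda to a functional interface on the spot, which gives the compiler the target type it was asking for. The sketch below uses IntSupplier and IntBinaryOperator from java.util.function, and the class and method names are hypothetical. It works, though it hardly reads like calling a function.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntSupplier;

// Hypothetical demo class, for illustration only.
final class ImmediateInvocation {
    static int sumViaSupplier(int x, int y) {
        // The cast supplies the target type that a bare lambda lacks.
        return ((IntSupplier) () -> x + y).getAsInt();
    }

    static int sumViaOperator(int x, int y) {
        // Same idea with parameters; the method name is still needed.
        return ((IntBinaryOperator) (a, b) -> a + b).applyAsInt(x, y);
    }

    public static void main(String[] args) {
        System.out.println(sumViaSupplier(1100, 200)); // 1300
        System.out.println(sumViaOperator(1100, 200)); // 1300
    }
}
```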
Closures
This is again one of those concepts that are notoriously badly explained. A lot of newcomers to programming are scared to death and put off from learning more about Functional Programming due to unnecessary FUD on the part of many “experts” in the field. So, let’s try and explain this as clearly as possible:
In Set Theory, a set is defined to be “closed” under an operation if applying the operation to members of the set produces a result that belongs to the same set. For instance, if the set under consideration is the set of Natural Numbers (N) and the operation is + (addition), we can say Natural numbers are closed under addition. Why? The reason is quite simple and follows straight from the definition – adding any two natural numbers (or indeed any number of numbers, but we’re considering the strict binary operation here) always produces a Natural number. On the other hand, N is not closed under – (subtraction). This is because subtracting some Natural number from another Natural number might produce 0 or some negative number, which is clearly not a member of N. So much for mathematics.
In Psychology, “closure” refers to the strict need of an individual to find a definitive answer to a problem.
In business, “closure” refers to the process by which a business closes down.
You see what I’m getting at? The term “closure” is highly overloaded, and even within mathematics, the term has different meanings in different branches. So my point is this – simply forget about the name and focus on the concept.
In Computer Science, a closure is intricately tied to the concept of scoping, specifically lexical scoping. This is why closures are often referred to as “lexical closures”. In order to understand closures properly, we must clearly understand what lexical scoping entails.
Lexical scoping is intimately tied with the rules defining the lifetimes (and visibility) of variables. Dynamic scoping, in general, resolves a name at run time by searching the chain of currently active callers, so a variable is effectively visible to any code that runs while its binding is live. Pure lexical scoping, on the other hand, ensures that the visibility of variables is limited to the current lexical block (say a function or a local block), or to nested blocks. However, lexically scoped variables are not visible to outer blocks, and variables defined in inner blocks will effectively “shadow” those defined in the outer scope. If no new variables with the same name are defined in the inner block, references to variables will always refer to those in the outer scope. This behaviour forms the basis of what is known as “variable capture”.
A variable is said to be captured by a lambda function if the lambda function refers to an outer-scope variable at the time of its creation. The lambda function is said to “close over” those variables, and this is the reason why this feature is called a “closure”. So what does this variable capture actually imply in the grand scheme of things? What it implies is this: when a lambda function captures a variable in its outer scope, the lifetime of the variable is effectively changed. Under normal circumstances, local variables die when the function is exited. In this case, however, since the lambda function has captured the variable, the variable will not die even when the function in which it was defined dies!
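To make the lifetime point concrete, here is a minimal sketch of the read-only form of capture, which Java does permit as long as the local is never reassigned. The names CaptureDemo and greeter are hypothetical. The captured value is still reachable after the method that defined it has returned.

```java
import java.util.function.Supplier;

// Hypothetical demo class, for illustration only.
final class CaptureDemo {
    static Supplier<String> greeter(String name) {
        // greeting is a local variable; it is never reassigned, so it is
        // effectively final and may be captured by the lambda.
        String greeting = "Hello, " + name + "!";
        return () -> greeting;
    }

    public static void main(String[] args) {
        Supplier<String> g = greeter("world");
        // greeter has already returned, yet the captured value is still available.
        System.out.println(g.get()); // Hello, world!
    }
}
```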
In this respect, Java behaves absolutely horribly. Java has extremely weird scoping rules. In some ways, it does use lexical scoping. In most respects, not:
jshell> void demoScopingInJava() {
...> int x = 100;
...>
...> System.out.format("Function scope x = %d\n", x);
...> {
...> System.out.format("Function scope x (before shadowing) = %d\n", x);
...> /// int x = 999 is not allowed!
...> x = 999;
...> System.out.format("Function scope x (after shadowing) = %d\n", x);
...> }
...> System.out.format("Function scope (again) x = %d\n", x);
...> }
| created method demoScopingInJava()
jshell> demoScopingInJava()
Function scope x = 100
Function scope x (before shadowing) = 100
Function scope x (after shadowing) = 999
Function scope (again) x = 999
Java does not allow any shadowing here because we cannot define a new variable with the same name inside the block. Instead, all references to the variable ‘x’ are actually to the function scope variable. In this case, we are able to mutate ‘x’ from 100 to 999, but this is because the inner block is within the outer function block and the Java Runtime can therefore ensure that this variable is freed before the function exits. However, this is not allowed when we are in a situation where the variable could be referenced even after the local function where it was declared exits.
For instance, if we want to implement a function that prints line numbers in an increasing order every time it is called, we might try to do something like this in Java:
jshell> Function<Void, Void> lineNumberGenerator() {
...> int lineNumber = 0;
...> return (n) ->
{ lineNumber++;
System.out.format("Line number: %d\n", lineNumber);
return null; };
...> }
| Error:
| local variables referenced from a lambda expression must be final or effectively final
| return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
| ^--------^
| Error:
| local variables referenced from a lambda expression must be final or effectively final
| return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
| ^--------^
We can see, though, that modifying a variable defined in the outer scope is not allowed in the case where the code escapes the local scope. As can be clearly seen in the error messages, the variable lineNumber must be “final” or effectively final for the code to even compile, which means we would have to remove the mutating statement inside the lambda function.
This is the reason why we cannot implement closures in Java – Java’s bizarre downward-percolating forced visibility of variables.
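The workaround I have seen most often is to capture a mutable holder object instead of the primitive itself. The sketch below uses AtomicInteger as the holder and returns the text instead of printing it, so that it is easy to check. The class name LineNumbers is hypothetical, and this is one possible approach rather than an officially blessed one.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical demo class, for illustration only.
final class LineNumbers {
    static Supplier<String> lineNumberGenerator() {
        // The reference 'counter' is never reassigned, so it is effectively
        // final; the object it points to is free to change.
        AtomicInteger counter = new AtomicInteger();
        return () -> "Line number: " + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        Supplier<String> next = lineNumberGenerator();
        for (int i = 0; i < 3; i++) {
            System.out.println(next.get());
        }
    }
}
```

This sidesteps the rule rather than removing it: the captured local still cannot be reassigned, and the restriction on the variable itself remains.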
And, oh, just in case you thought this applies only to lambda blocks, it’s always been the case:
jshell> void scopingRulesTest () {
...> int x = 100;
...>
...> (new Thread(new Runnable () {
...> @Override
...> public void run() {
...> x++;
...> System.out.println(x);
...> }
...> })).start();
...> }
| Error:
| local variables referenced from an inner class must be final or effectively final
| x++;
| ^
The same example in C++ works as expected (including modification of the outer scope’s variable):
#include <iostream>
using namespace std;
int main()
{
ios_base::sync_with_stdio(false);
int x = 100;
cout << "Function scope x = " << x << endl;
{
x = 101;
cout << "Block scope x = " << x << endl;
int x = 999;
cout << "Block scope x = " << x << endl;
}
cout << "Function scope x = " << x << endl;
return 0;
}
sh-4.3$ main
Function scope x = 100
Block scope x = 101
Block scope x = 999
Function scope x = 101
And to complete the picture, here is the line number closure example:
// C++-11 (and above)
#include <iostream>
#include <functional>
using namespace std;
function<void()> line_number_generator()
{
int line_num = 0;
return [line_num]() mutable
{
line_num++;
cout << "Line number: " << line_num << endl;
};
}
int main()
{
ios_base::sync_with_stdio(false);
function<void()> print_line_numbers = line_number_generator();
for (int i = 0; i < 5; i++) {
print_line_numbers();
}
}
sh-4.3$ g++ -std=c++11 -o main *.cpp
sh-4.3$ main
Line number: 1
Line number: 2
Line number: 3
Line number: 4
Line number: 5
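Note that variables captured by value are read-only inside a C++ lambda by default. The “mutable” keyword changes that behaviour, letting the lambda modify its own copy of line_num (the original local is long gone by the time the lambda runs). In all respects, C++11 supports closures while Java cannot!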
The Common Lisp version is pretty much identical in behaviour to the C++ one. In the case of Common Lisp, however, we have the extra implication that any reference to an outer-scope variable always captures that variable (unless a local variable with the same name is already defined). This is seen in the Common Lisp version of the same example:
CL-USER> (defun foo ()
(let ((x 100))
(format t "Function scope x = ~d~%" x)
(progn
(setf x 101)
(let ((x 999))
(format t "Inner block x = ~d~%" x)))
(format t "Function scope (again) x = ~d~%" x)))
STYLE-WARNING: redefining COMMON-LISP-USER::FOO in DEFUN
FOO
CL-USER> (foo)
Function scope x = 100
Inner block x = 999
Function scope (again) x = 101
NIL
This effectively ensures that a nested function which refers to outer-scope variables, and which is then returned from the enclosing function, is always a closure, as can be seen from the following example (the same example as the C++ one):
CL-USER> (defun line-number-generator ()
(let ((line-number 0))
#'(lambda ()
(incf line-number)
(format t "Line number: ~d~%" line-number))))
LINE-NUMBER-GENERATOR
CL-USER> (defvar print-line-numbers (line-number-generator))
PRINT-LINE-NUMBERS
CL-USER> print-line-numbers
#<CLOSURE (LAMBDA () :IN LINE-NUMBER-GENERATOR) {100439133B}>
CL-USER> (dotimes (i 5)
(funcall print-line-numbers))
Line number: 1
Line number: 2
Line number: 3
Line number: 4
Line number: 5
NIL
Conclusion
Well, that about wraps up this rather long blog post! As can be seen from this post (as well as the JShell post and more posts to come, especially on Java Streams), lambda support in Java is an extremely welcome and necessary feature. However, in many ways, it’s a very crippled version of the lambda support found in other languages, especially with regard to how closures are not supported in Java. Thankfully, most code that uses lambda expressions will be code that uses the Streams API, and as such most (if not all) of the wartiness of Java’s lambdas will be effectively hidden away within map, filter, reduce or some other mechanism in the Streams API.
Note: All the Java code posted in this blog post was executed on JShell. For use in a regular Java environment, make sure to add semi-colons wherever appropriate.