Calling native code from Rust – a very simple example!

I have been playing around with Rust for a bit, and so far so good! I think that people make a bit too much of the whole complexity of the Ownership, Borrowing, and Lifetimes concepts. However, I did find the documentation (in the Rust Book, as also in Rust By Example) on Lifetimes to be not only very sparse, but more confusing than helpful. That part could certainly do with some work.

Niko Matsakis has started a small site to help explain some core concepts which I found to be interesting, useful, and fun! You can find them here:Into Rust. It is still a work in progress, and hopefully they will add many more videos – especially on Lifetimes, Unsafe, and I/O (which is a a bit of a bugbear for me right now).

In any case, for this short tutorial, I will attempt to demonstrate how your custom C++ library may be invoked from Rust. Well, let’s get on with it!

Setting up the native library

Just to make things interesting, let us have a native library that takes in a vector of numbers, sorts them in non-increasing order, and returns them to the caller.

Let’s define the C++ functionality in a file called sorting.cpp, with the declaration in the header file sorting.h:

#ifndef __SORTING_H__
#define __SORTING_H__ "sorting.h"

#include <iostream>
#include <functional>
#include <algorithm>

#ifdef __cplusplus
extern "C" {
#endif
  void interop_sort(int[], size_t);
#ifdef __cplusplus
}
#endif
#endif
#include "sorting.h"

void interop_sort(int numbers[], size_t size)
{
  int* start = &numbers[0];
  int* end = &numbers[0] + size;

  std::sort(start, end, [](int x, int y) {  return x > y; });
}

Let’s wrap this up in a nice static library called libmysorting.a (note that I am building this on a macOS system. Please consult your local OS’ documentation for details specific to your platform):

Timmys-MacBook-Pro:Rust_C_Interop z0ltan$ g++ -Wall -std=c++14 -c sorting.cpp
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated

Timmys-MacBook-Pro:Rust_C_Interop z0ltan$ ar rc libmysorting.a sorting.o
Timmys-MacBook-Pro:Rust_C_Interop z0ltan$ ls
libmysorting.a	sorting.cpp	sorting.h	sorting.o

And just to make sure that no name-mangling has occurred during the process:

Timmys-MacBook-Pro:Rust_C_Interop z0ltan$ nm -gU libmysorting.a 

libmysorting.a(sorting.o):
0000000000000000 T _interop_sort

Okay, looks like everything is in order. Now let’s move on to the Rust bit.

Calling the native library from Rust

To use the library from Rust, let’s create a new Cargo project called sorting_demo:

Timmys-MacBook-Pro:Rust_C_Interop z0ltan$ cargo new --bin sorting_demo
     Created binary (application) `sorting_demo` project

Timmys-MacBook-Pro:Rust_C_Interop z0ltan$ cd sorting_demo

Timmys-MacBook-Pro:sorting_demo z0ltan$ ls
Cargo.toml	src

Now we need to have a way to link the native library into our Rust project. For simplicity’s sake, let’s use the simplest approach:

Timmys-MacBook-Pro:sorting_demo z0ltan$ cp ../libmysorting.a .

Timmys-MacBook-Pro:sorting_demo z0ltan$ ls
Cargo.lock	Cargo.toml	libmysorting.a	src		target

All we did here is to copy libmysorting.a into the root of the Rust project, and set the LD_LIBRARY_PATH environment variable to ensure that the linker can find the library.

Now that that’s set up, let’s write up the logic to link the static library from Rust code. Put the following code inside src/main.rs

#[link(name = "mysorting")]
extern {
    fn interop_sort(arr: &[i32;10], n: u32);
}

pub fn sort_from_cpp(arr: &[i32;10], n: u32) {
    unsafe {
        interop_sort(arr, n);
    }
}

fn main() {
    let my_arr = [100, 200, -99, 34, 56, -656, 99, 94, 84, 89];

    println!("Before sorting...");
    println!("{:?}\n", my_arr);

    sort_from_cpp(&my_arr, 10);

    println!("After sorting...");
    println!("{:?}", my_arr);

}

All right. Now all that remains is to see if this whole setup actually works!

Test spin

Timmys-MacBook-Pro:sorting_demo z0ltan$ cargo run
   Compiling sorting_demo v0.1.0 (file:///Users/z0ltan/Rabota/Blogs/Rust_C_Interop/sorting_demo)
    Finished debug [unoptimized + debuginfo] target(s) in 0.29 secs
     Running `target/debug/sorting_demo`
Before sorting...
[100, 200, -99, 34, 56, -656, 99, 94, 84, 89]

After sorting...
[200, 100, 99, 94, 89, 84, 56, 34, -99, -656]

Nice! So it all works as expected. The array has been correctly sorted in non-increasing order.

Explanation: The code is pretty much self-explanatory. First up we link the static library using the #[link(name = "mysorting")] directive (similar to annotations in other languages). This allows Rust to seek out a library named libmysorting.a or libmysorting.dylib.

Then we simply define the equivalent C++ function, interop_sort along with matching Rust types for the signature (note how we can use an array slice, &[i32;10] to match up with the C array, int arr[10]. Likewise, the size of the array is an unsigned integer, so we declare that to be of type u32. The whole native function itself is wrapped up inside an extern block. There are various options for the extern, and it defaults to extern "C". This block keyword is used to signify foreign code in Rust.

Finally, it’s good practice to wrap up the unsafe function inside a safe Rust function which is what we did with the sort_from_cpp function.

And there you go! A bare minimum working example of calling C++ (or indeed C) code from Rust. A special note here: as we can see, we defined a C interface for the C++ code to avoid name-mangling. If this were not done so directly in the C++ code, we could still have defined a C wrapper (as we have done in other interop examples) and have a callback function from C to the C++ code. That would work as well.

Advertisements
Calling native code from Rust – a very simple example!

Interop mini-series – Calling C and C++ code from Common Lisp using CFFI (Part 1)

Starting with this post, the new few posts will dig into interoperability between various languages. The next couple of posts will cover C and C++ code from Common Lisp, and how to write callback functions in Common Lisp that plug into code residing in a shared library. This will make use of the cffi library.

Then the following two posts will cover the analogue in Java-world. To this end, we will make use of the JNA project to indicate interop between C/C++ and Java.

Finally, this series will (hopefully) conclude with a mini-project of sorts – a completely embedded JVM instance inside a Common Lisp image! A number of demos will illustrate different uses of embedding Java within Common Lisp. This is a bit of an undertaking though, and will definitely take some time to implement least of all due to the fact that I want to extract the maximum amount of learning from this activity!

Contents

  1. Introducing the cffi library
  2. Demos
    1. Interop with C
    2. Interop with C++
  3. Summary of useful functions
  4. References

Setup used for this tutorial series

In order to keep things sane, I will be sticking to a single platform (unless otherwise noted) with the following configuration for this whole mini-series:

  • Mac OS X El Capitan system
  • 8 cores, 16GB RAM, 1600MHz processor
  • SBCL as my Common Lisp implementation
  • JDK 9
  • Apple LLVM 6.1.0 (with Clang as the frontend) as my C and C++ compiler

Note that even though the compiler used is LLVM, the behaviour is more or less the same as that of standard gcc/g++. The same flags also work for compilation, and the only difference vis-a-vis this tutorial will be how the shared (dynamic) library is created.

Introduction to the CFFI library

Let’s install the cffi library using QuickLisp (if you haven’t done so already) first:

LISP-TO-C-USER> (ql:quickload :cffi)
To load "cffi":
  Load 4 ASDF systems:
    alexandria babel trivial-features uiop
  Install 1 Quicklisp release:
    cffi
; Fetching #<URL "http://beta.quicklisp.org/archive/cffi/2016-03-18/cffi_0.17.1.tgz">
; 234.48KB
==================================================
240,107 bytes in 0.33 seconds (712.70KB/sec)
; Loading "cffi"
[package cffi-sys]................................
[package cffi]....................................
..................................................
[package cffi-features]
(:CFFI)

Now, let’s talk a bit about this library and the features that it provides. Links to the download site and manual are provided in the “References” section.

The cffi library is a cross-platform (across Common Lisp implementations that is) library that supports interop with C (and with C++, but we’ll talk more about that later). What this means is that you can load a native library (dylib, so file, DLL, etc.) and use the functions defined therein within your Lisp code.

The interop is two-ways – the general case is that you want to invoke C functions from Lisp code, or you may want to invoke a function in the library that expects a callback, and you can define this callback as pure Common Lisp code! Nifty, isn’t it?

The library is very well-designed and personally I find the Lispy nature of the APIs (and generated functions) an extra bonus.

The best way to learn the library is to see it in action, so let’s get on with it!

Note: Platform support for different features varies according to the quirks of the specific Common Lisp implementation. Refer to the cffi documentation for specifics).

Demos

Top

These demos are aimed to be simple and small, and yet somewhat useful in terms of real-world applicability.

I personally feel that purely contrived demos are best avoided since they hardly teach anything well, most of all for the reason that they are extremely boring!

In the first demo, we will see how a C library may be loaded and run from Common Lisp. This will be the most common use case.

In the second demo, we will do the same, but for a C++ library with a C wrapper around the C++ functionality.

Since both demos are defined in the same package, let’s define the package first:

(require 'cffi)

(defpackage :lisp-to-c-user
  (:use :cl :cffi))

(in-package :lisp-to-c-user)

Interop with C

Top

For the C example, I decided to use use the native library to get some useful system information – architecture type, model name, memory, number of cpus, and the number of logical cpus (cores).

Note that this example works only in Mac OS X. For Linux, the sysctlbyname function can be replaced by sysctl with appropriate changes. For Windows, you will have to check which kernel call provides the same functionality.

We will use the sysctlbyname function to extract these values.

Let’s define the header file first (in system_info.h):

#ifndef __SYSTEM_INFO_H__
#define __SYSTEM_INFO_H__ "system_info.h"

#include <sys/types.h>
#include <sys/sysctl.h>

#ifdef __cplusplus
extern "C" {
#endif

char* get_machine();
char* get_model();
int64_t get_memory();
int32_t get_ncpu();
int32_t get_nlogicalcpu();

#ifdef __cplusplus
}
#endif
#endif

And the corresponding C implementation (in system_info.c:

#include <stdio.h>
#include "system_info.h"

#define MAXSIZE 210

char* get_machine()
{
    static char machine[MAXSIZE];
    size_t len = sizeof(machine);

    sysctlbyname("hw.machine", &machine, &len, NULL, 0);

    return machine;
}

char* get_model()
{
    static char model[MAXSIZE];
    size_t len = sizeof(model);

    sysctlbyname("hw.model", &model, &len, NULL, 0);

    return model;
}

int64_t get_memory()
{
    int64_t mem;
    size_t len = sizeof(mem);

    sysctlbyname("hw.memsize", &mem, &len, NULL, 0);

    return mem;
}

int32_t get_ncpu()
{
    int32_t cpu;
    size_t len = sizeof(cpu);

    sysctlbyname("hw.ncpu", &cpu, &len, NULL, 0);

    return cpu;
}


int32_t get_nlogicalcpu()
{
    int32_t logical_cpu;
    size_t len = sizeof(logical_cpu);

    sysctlbyname("hw.logicalcpu", &logical_cpu, &len, NULL, 0);

    return logical_cpu;
}

int main(int argc, char* argv[])
{
    printf("%s, %s, %lld, %d, %d\n", 
            get_machine(),
            get_model(),
            get_memory(),
            get_ncpu(),
            get_nlogicalcpu());

    return 0;
}

Let’s compile it into a shared library (in this case, Clang + LLVM on Mac OS X. For other compilers such as gcc proper, check the relevant documentation):

Timmys-MacBook-Pro:c_demo_system_info z0ltan$ clang -dynamiclib -o libsysteminfo.lib system_info.c

Timmys-MacBook-Pro:c_demo_system_info z0ltan$ ls
libsysteminfo.lib	system_info.c		system_info.h

Excellent! Finally, let’s write the Common Lisp client code to use this library:

;;; C-demo

(define-foreign-library libsysteminfo
  (:darwin "libsysteminfo.dylib")
  (:unix "libsysteminfo.so")
  (t (:default "libsysteminfo.dylib")))

(load-foreign-library 'libsysteminfo)

(defcfun "get_machine" :string)
(defcfun "get_model" :string)
(defcfun "get_memory" :long-long)
(defcfun "get_ncpu" :int)
(defcfun "get_nlogicalcpu" :int)

(defun print-system-info ()
  (let ((arch (get-machine))
        (model (get-model))
        (mem (/ (get-memory) (* 1024 1024 1024)))
        (ncpu (get-ncpu))
        (nlogicalcpu (get-nlogicalcpu)))
    (format t "System Information~%")
    (format t "Arch: ~a, Model: ~a, Mem = ~dGB, CPUs = ~d, Logical CPUs = ~d~%"
            arch model mem ncpu nlogicalcpu)))

(print-system-info)

(close-foreign-library 'libsysteminfo)

And the output:

LISP-TO-C-USER> (print-system-info)
System Information
Arch: x86_64, Model: MacBookPro11,2, Mem = 16GB, CPUs = 8, Logical CPUs = 8
NIL

Explanation: We define the native library by using the cffi:define-foreign-library macro. This macro also allows us to define the specific name of the shared library depending on the OS.

Then we can load the specified library using the cffi:load-foreign-library macro. Take care to observed that the name of the library is quoted. This can save you a lot of anguish later on.

The next part is interesting – we use the cffi:decfun macro to define the C functions present in the library as Lispy function. For instance, the C function “get_machine” which is defined in the libsysteminfo.dylib library, is proxied into the current Lisp image as “get-machine”. There are ways to perform such name mangling automatically, but letting the cffi library take care of this is my recommendation.

The general syntax of the defcfun macro is:

	(cffi:defcun <C-function-name> &optional<return-type> 
		<arg with types>*) 

So the first defcfun indicates that get_machine is a C function that returns a character array (represented by cffi’s local type, :string), and that it doesn’t take any parameter(s), The cffi library defines a huge set of types that map to C’s primitive, pointer, and structure types extremely well.

Now that we have create proxies for the C functions, we can invoke them as seen in the print-system-info function by passing in the appropriate return type and parameters.

Finally, we unload the native library using another macro, cffi:close-foreign-library, which also takes a quoted library representation.

Interop with C++

Top

This is the more interesting demo for more than one reason! In this example, let’s try and sort an array of integers using the native library.

Again, let’s write the interface out first (in number_sorting.h):

#ifndef __NUMBER_SORTING_H__ 
#define __NUMBER_SORTING_H__ "number_sorting.h"

void callback_function(int[], int);

extern "C" {
    void sort_numbers(int[], int);
}
#endif

Looks good, but what’s the deal with the callback_function? We’ll get to that in just a moment. For now, let’s flesh out the functionality for this interface (in number_sorting.cpp:

#include <iostream>
#include <vector>
#include <algorithm>
#include <functional>

void sort_vector(std::vector<int>&, int[], int);

void callback_function(int array[], int size)
{
    std::vector<int> vec(size);

    sort_vector(vec, array, size);

    int i = 0;
    for (std::vector<int>::const_iterator it = vec.begin(); it != vec.end(); it++)
        array[i++] = *it;
}
    

template <typename T>
void display_elements(const std::vector<T>& vec)
{
    for (std::vector<int>::const_iterator it = vec.begin(); it != vec.end(); it++)
        std::cout << *it << " ";
    std::cout << std::endl;
}


void sort_vector(std::vector<int>& v, int numbers[], int count)
{
    for (int i = 0; i < count; i++)
        v[i] = numbers[i];

    display_elements(v);

    std::sort(v.begin(), v.end(), [](int x, int y) { return x < y; });
}

int main()
{
    std::ios_base::sync_with_stdio(false);

    int sample[] = { 1, 2, 0, -1, 3, 199, 200, 110, -234, 12345 };

    callback_function(sample, sizeof(sample)/sizeof(sample[0]));

    for (int i = 0; i < (int)sizeof(sample)/sizeof(sample[0]); i++)
        std::cout << sample[i] << " ";
    std::cout << std::endl;

    return 0;
}

Hmmm, this seems a bit too convoluted for this simple example? Why all the indirection? The reason will become crystal clear once we define the corresponding C file (in number_sorting.c) as well:

#include "number_sorting.h"

void sort_numbers(int numbers[], int n)
{
    callback_function(numbers, n);
}

Explanation: The reasons why we need both number_sorting.c and number_sorting.cpp, both of which implement the same interface, number_sorting.h
are two-fold:

  1. Since we are using some C++-only features such std::vector , std::sort, and C++11 lambdas, we need to invoke them in a separate function
  2. And the more important reason – C++’s pernicious name-mangling

Now, if we had simply written the entire sorting functionality using integer arrays and sorted using with C-like constructs (say, qsort, or a manually written sorting function), we wouldn’t need all this indirection, and we could have simply written the header as:

#ifndef __NUMBER_SORTING_H__ 
#define __NUMBER_SORTING_H__ "number_sorting.h"

extern "C" {
    void sort_numbers(int[], int);
}
#endif

and provided the implementation in number_sorting.cpp alone. That would have worked out fine. However, because we use all those C++ templated constructs as well as functional constructs, if we had used this same header file, we would have got a name-mangling issue, and the function would not be visible to the Common Lisp client!

To get around this, we write a C wrapper (number_sorting.c) which simply invokes the C++ function callback_function defined in number_sorting.cpp. Now you may think that we could have simply embedded callback_function inside the definition of sort_numbers in the C++ file alone, but that would not work either. Check out the reference “How to mix C and C++” in the “References section” for more details.

All right, let’s compile the code and generate the shared library:

Timmys-MacBook-Pro:c++_demo_sorting z0ltan$ clang++ -std=c++11 -stdlib=libc++ -dynamiclib -o libnumbersorting.dylib number_sorting.c number_sorting.cpp
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated

Timmys-MacBook-Pro:c++_demo_sorting z0ltan$ ls
libnumbersorting.dylib	number_sorting.c	number_sorting.cpp	number_sorting.h

Timmys-MacBook-Pro:c++_demo_sorting z0ltan$ nm -gU libnumbersorting.dylib 
0000000000000a10 T __Z11sort_vectorRNSt3__16vectorIiNS_9allocatorIiEEEEPii
0000000000000730 T __Z17callback_functionPii
0000000000000c00 T _main
0000000000000700 T _sort_numbers

We can also see that the function sort_numbers has not been subjected to name-mangling. Now that we’ve resolved that, let’s flesh out the Common Lisp client, and run the demo!

;;; C++-demo

(define-foreign-library libnumbersorting
  (:darwin "libnumbersorting.dylib")
  (:unix "libnumbersorting.so")
  (t (:default "libnumbersorting.dylib")))

(use-foreign-library libnumbersorting)

(defun sort-some-numbers (&optional (n 10))
  (with-foreign-object (numbers :int n)
    (dotimes (i n)
      (setf (mem-aref numbers :int i) (random 100)))
    (let ((before (loop for i below n
                     collect (mem-aref numbers :int i))))
      (format t "Before: ~{~d ~}~%" before))
    (foreign-funcall "sort_numbers" :pointer numbers :int n :void)
    (let ((after (loop for j below n
                    collect (mem-aref numbers :int j))))
      (format t "After: ~{~d ~}~%" after))))

(sort-some-numbers)

(close-foreign-library 'libnumbersorting)

And the output:

LISP-TO-C-USER> (sort-some-numbers 15)
Before: 52 11 18 62 39 89 2 48 48 66 73 89 73 26 97 
After: 2 11 18 26 39 48 48 52 62 66 73 73 89 89 97 
NIL

Cool!

Explanation: This demo differs only slightly from the C demo in terms of Common Lisp code. We define the native library in the same manner, but we use another macro,
use-foreign-library instead this time. This is my preferred way of loading a native library since I always forget the quoting with load-foreign-library!

Jokes aside, we can see another way of executing a function defined in a native library: cffi:foreign-funcall.

This macro has the following syntax:

	(cffi:foreign-funcall <C-function-name> &optional<args with types>* 
		<return-type>)

I tend to prefer foreign-funcall for functions with only side-effects (as in this case), and use defcfun when I need to use the function in the Common Lisp part more than once. YMMV.

The most interesting bit, of course, in the with-foreign-object macro. I won’t bother to show its general syntax, but suffice to say that this macro is used to allocate, set, and use foreign memory (i.e., from the native library) with encapsulation within its body.

In this case, we simply generate a C integer array (not the usage of the type specifier, :int), set the values of the elements of this array using cffi:mem-aref, and read the values of the array using the same accessor function.

Note that value of the var numbersis a pointer type, and also that this is available only within the body of the macro. In the next post, we will see how we can work with custom C-style structs.

Useful basic functions

Top

Here is a summarised list of the functions used in the demos in this blog post.

  • cffi:define-foreign-library
  • cffi:load-foreign-library
  • cffi:close-foreign-library
  • cffi:use-foreign-library
  • cffi:defcfun
  • cffi:foreign-funcall
  • cffi:with-foreign-object
  • cffi:mem-aref

References

Top

Some references that you might find useful on this subject matter:

Interop mini-series – Calling C and C++ code from Common Lisp using CFFI (Part 1)

A highly opinionated review of Java Lambdas

What really is a lambda expression?

A lambda expression is, for all means and purposes, an anonymous function. That really is all there is to it. In languages that support first-class functions, this is yet another feature of the language – functions are on par with other types in the language. However, in language that don’t consider functions first class, it becomes a bit of an esoteric concept.

The origin of the concept is in the Lambda Calculus first propounded by the great Alonzo Church. According to that scheme, functions are basically entities which take some (or no) parameters, and have a body of code that can use those parameters. There is essentially no side-effect in such functions. That means that the function is deterministic – given the same set of parameters, it will always produce the same output. This is, in fact, the very foundation of Functional Programming. In modern times, Functional Programming is often conflated with strongly and statically typed languages. This is clearly wrong. The original Lambda Calculus had really no notion of types! (There is a variant of it though, the typed Lambda Calculus). Most of the languages that support lambda expressions today, however, freely allow plenty of side-effects within lambda expressions. The main takeaway here though is that lambda expressions are conceptually what named functions are made out of.

Lambdas in Java 8 and how they came to be

The biggest features in Java 8 were lambda support and the Stream API. In many ways, lambdas are important only with respect to their heavy use in the stream APIs (as seen in the previous blog on Shell). The key concept to understand when learning lambdas in Java is that lambdas/functions are not first-class objects in Java. Their entire existence is strictly bound to and controlled by interfaces, specifically the concept of SAMs (Single Abstract Method) interfaces – interface which contain only a single abstract method. In my opinion, this severe crippling of lambdas in Java has created more problems than it has solved. Now new programmers who pick up Java and run with it are liable to be very confused when to move on to languages which do support lambdas in a more natural and proper manner. In any case, let’s work with what we’ve got.

So why a Functional Interface? Prior to Java 8, if we wanted to simulate cases where we wanted to pass some functionality into another function, we had to make do with anonymous classes. For instance, to create a new thread, we could have done the following:

jshell> (new Thread (new Runnable () {
   ...>    @Override
   ...>    public void run () {
   ...>     System.out.println("Hello from thread!");
   ...>    }
   ...>  })).start()

jshell> Hello from thread!

We observe that the sole purpose of the anonymous class is to perform some actions, but the cognitive dissonance comes into play when we see that the Thread class constructor experts an instance (basically a data object) of type Runnable. This is exactly the same pattern that was followed by C++ until C++11. In fact, this is what is known (rather pompously, I must add) as a functor.

Here is what the Runnable interface looks like:

public interface Runnable {
   void run();
}

This pattern of the use of a (dummy) interface containing a single method which basically does all the work that an anonymous or named function should have done in the first place, was found to be in such widespread use amongst Java developers that the committee which worked on developing lambda support in Java decided to make it kosher and provide additional support from the Java Runtime. As a result, from Java 8 onwards, wherever a SAM is present, a lambda expression can be used as its target or in its stead. They have been made essentially the same.
For example, the previous example can now be written more succinctly as:

jshell> (new Thread(() -> System.out.println("Hello from thread...again!"))).start()
Hello from thread...again!

An optional annotation, @FunctionalInterface has also been introduced for bookkeeping purposes. In fact, in order to help out developers, a bunch of Functional Interfaces now come bundled with the JDK in the java.util.function package. I would highly recommend exploring them and testing them out to get a feel for them.

Custom Functional Interfaces

We can define our own functional interface in Java 8 (and above). The only restriction for the interface to be a functional interface is, as mentioned before, is that the interface have a single abstract method.

For instance, the standard package (java.util.function) comes with functional interfaces that support single parameter (Function) and double parameter (BiFunction) functions. Let us define a triple parameter function just for this example.

jshell> @FunctionalInterface interface TriFunction<T, U, V, R> {
   ...>     R apply(T t, U u, V v);
   ...> }
|  created interface TriFunction

jshell> int x = 100;
x ==> 100

jshell> int x = 100
x ==> 100

jshell> String y = "Hello"
y ==> "Hello"

jshell> double z = Math.PI
z ==> 3.141592653589793

jshell> TriFunction<Integer, String, Double, String> f = 
          (i, s, d) -> i + s + d;
f ==> $Lambda$6/1318822808@6d7b4f4c

jshell> System.out.println(f.apply(x, y, z))
100Hello3.141592653589793

Features and Limitations of Java Lambdas

So how exactly does a SAM map onto a lambda expression? To understand this better, first we need to get the syntax and semantics of lambda expressions out of the way:

Java’s lambda syntax was clearly influenced by Scala. A basic lambda expression has the following form:

(<param*>) -> [{] <body-form+> [}]

where,

’param’ is a comma-separated list of zero or more parameters with optional types (note that in some cases where Java’s type inference mechanism is unable to infer the type, you will need to specify the type(s) explicitly), the braces are optional in the case of a single line body, but are required when the body spans more than one line. Finally, each body-form is a series of normal Java statements. In the case of multiple statements, each body form is separated by a semi-colon, and a return statement is also required in this case (if the return type is not void).
So a lambda expression that takes a String and returns a String might take on several forms in actual code:

(s) -> s.toUpperCase()

The type signature is not required in this case, and the return statement is not allowed in this case, This would be the recommend usage of a typical lambda expression – don’t declare the types and don’t use any return statement. Of course, this only works for a single-statement (or, more correctly, a single-expression) body.

In case we want to use braces, we need to have the whole expression take the following form:

(String s) -> { return s.toUpperCase() }

So we need to specify the type of the parameter(s) as well as include an explicit return statement. In all cases where the body contains multiple statements, this would be the recommended format for a lambda expression.

Now getting back to how a SAM is mapped onto a lambda expression, whenever the Java Runtime encounters a lambda expression, it can do either of two things depending on the context in which the SAM is used:

  • In case the lambda expression is used along with a Stream API function (such as map, filter, reduce, etc.), the Java Runtime already has enough context about the form of the function that is expected – the parameter types and the return type. For instance, if we are trying to double all the even natural numbers upto 10, we might do:
    jshell> IntStream
             .rangeClosed(1, 10)
             .filter((n) -> n%2 == 0)
             .map((d) -> d*2).forEach(System.out::println)
    4
    8
    12
    16
    20
    

    In this case, the Java Runtime knows that the filter method takes a parameter of the form: Predicate. The Predicate functional interface has a single method – boolean test(Test t). So what the Runtime does is to check that the provided lambda expression matches this signature, and if verified, proceeds to invoke the “test” method implicitly. Similarly for the map function as well.

  • The second case arises in the case where we make use of Functional Interfaces explicitly and then use them as the “target” of a lambda expression. For instance, suppose we want to write a function that takes a String and an Integer and returns their concatenated form as a String, we might have something like:
    jshell> BiFunction<String, Integer, String> f = 
               (s, i) -> s + String.valueOf(i)
    f ==> $Lambda$17/103887628@42f93a98
    
    jshell> f.apply("Hello", 99)
    $21 ==> "Hello99"
    

    In this case as well, the compiler will ensure the the lambda expression matches the type of the declared function variable. Pretty straightforward.

So far so good, but there is a huge problem in the second case above. The problem is that even once the function object has been created, the name of the SAM must be known before we can use it. This is because Java does not have operator overloading (unlike C++). This is why in the current framework, we must know the exact name of each functional interface that we use. The “apply” method used above is the name of the SAM in the BiFunction functional interface. The problem is compounded because each functional interface (even in the standard package) defines its own names. Of course, this is not an insurmountable problem, but the same problem did not exist even in pre-C++-11. For instance, the previous example could have been done so in C++ (using a functor):

// pre C++-11
#include <iostream>
#include <sstream>

template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
    return f(s, t);
}

class string_int_string
{
    public:
        std::string operator()(std::string s, int i)
        {
            std::ostringstream oss;
            oss << s << i;
            return oss.str();
        }
};

int main()
{
    std::cout << concatenate("Hello", 99, string_int_string()) << std::endl;
   
    return 0;
}

A bit brittle, but it works. The generic function, “concatenate” is important to note here since it can basically take any functor (or lambda expression from C++-11 onwards), and invokes the function object with the supplied arguments. The same approach is used in the C++ STL generic functions. Now if we look at how the code might look like with C++-11, we get:

// C++-11 and above
#include <iostream>
#include <sstream>

template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
    return f(s, t);
}

int main()
{
   std::cout << concatenate("Hello", 99, 
		[](std::string s, int i) {
        			std::ostringstream oss;
        			oss << s << i;
        			return oss.str();
    		}) << std::endl;
    
    return 0;
}

As can be seen, the approach is much cleaner. The difference between the functor-version and the lambda-based one is that in this case, we’ve essentially got rid of the class representing the functor object and inserted its logic inside the lambda expression’s body. So it essentially appears that the lambda expression’ is basically an object that can bind the parameters just as in the case of a regular functor.

As can be seen, even in C++-11, we can write generic functions and all we need to do it invoke it like a function. No messy SAMs there! I personally feel that C++’s lambda support is far superior to that of Java, especially since C++ supports closures. More on that in the next section.

Another disadvantage of Java’s lambda support is that the following is impossible in Java:

#include <iostream>

int main()
{
    int x = 100, y = 100;
    
    std::cout << ([x, y]() { return x + y; })() << std::endl;
    
    return 0;
}

The code above simply uses a lambda expression to capture variables defined in the outer lexical scope (more on that in the next section), but the interesting bit is that the lambda expression can be invoked like a proper function object even without assigning it to a variable.

If we tried the same in Java, we’d get an error:

jshell> int x = 1100
x ==> 1100

jshell> int y = 200
y ==> 200

jshell> (() -> x + y)
|  Error:
|  incompatible types: java.lang.Object is not a functional interface
|  (() -> x + y)
|   ^---------^
|  Error:
|  incompatible types: <none> cannot be converted to java.lang.Object
|  (() -> x + y)
|  ^-----------^

As can be seen from the error message, the Java Runtime complains that “Object” is not a functional interface. Even if we assumed that the runtime would be able to discern the functional interface type from its signature and produce a result, we still get an error:

jshell> ((int a, int b) -> { return a + b; })).apply(x, y)
|  Error:
|  ';' expected
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|                                       ^
|  Error:
|  incompatible types: java.lang.Object is not a functional interface
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|   ^---------------------------------^
|  Error:
|  incompatible types: <none> cannot be converted to java.lang.Object
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|  ^-----------------------------------^
|  Error:
|  cannot find symbol
|    symbol:   method apply(int,int)
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|                                         ^---^
|  Error:
|  missing return statement
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|  ^------------------------------------------------^

So no go there. A point down for Java lambdas! More seriously, I find this to be an extremely irritating reminder that Java’s lambdas are not really lambdas. They are more like syntactic sugar for the good old anonymous classes. In fact, there are more serious implications precisely for this reason.

Closures

This is again one of those concepts that are notoriously badly explained. A lot of newbies to programming are often scared to death and put-off from learning more about Functional Programming due to unnecessary FUD on the part of many “experts” in the field. So, let’s try and explain this as clearly as possible:

In Set Theory, a set is defined to be “closed” under an operation if applying the operation to members of the set produces a result that belongs to the same set. For instance, if the set under consideration is the set of Natural Numbers (N) and the operation is + (addition), we can say Natural numbers are closed under addition. Why? The reason is quite simple and follows straight from the definition – adding any two natural numbers (or indeed any number of numbers, but we’re considering the strict binary operation here) always produces a Natural number. On the other hand, N is not closed under – (subtraction). This is because subtracting some Natural number from another Natural number might produce 0 or some negative number, which is clearly not a member of N. So much for mathematics.

In Psychology, “closure” refers to the strict need of an individual to find a definitive answer to a problem.

In business, “closure” refers to the process by which a business closes down.

You see what I’m getting at? The term “closure” is highly overloaded, and even within mathematics, the term has different meanings in different branches. So my point is this – simply forget about the name and focus on the concept.

In Computer Science, a closure is intricately tied to the concept of scoping, specifically lexical scoping. This is why closures are often referred to as “lexical closures”. In order to understand closures properly, we must clearly understand what lexical scoping entails.

Lexical scoping is intimately tied with the rules defining the lifetimes (and visibility) of variables. Dynamic scoping, in general, refers to a situation where a variable has effectively global visibility and lifetime. Pure lexical scoping, on the other hand, ensures that the visibility of variables is limited to the current lexical block (say a function or a local block), or to nested blocks. However, lexically scoped variables are not visible to outer blocks, and variables defined in inner blocks will effectively “shadow” those defined in the outer scope. If no new variables with the same name are defined in the inner block, references to variables will always refer to those in the outer scope. This behaviour forms the basis of what is known as “variable capture”.

A variable is said to be captured by a lambda function if the lambda function refers to an outer-scope variable during its time of creation. The lambda function is said to “close over” those variables, and this is the reason why this feature is called a “closure”. So what does this variable capture actually implicate in the grand scheme of things? What it implicates is this – when a lambda function captures a variable in its outer scope, the lifetime of the variable is effectively changed. Under normal circumstances, local variables die when the function is exited. In this case, however, since the lambda function has captured the variable, the variable will not die even when the function in which it was defined dies!

In this respect, Java behaves absolutely horribly. Java has extremely weird scoping rules. In some ways, it does use lexical scoping. In most respects, not:

jshell> void demoScopingInJava() {
   ...>          int x = 100;
   ...>          
   ...>          System.out.format("Function scope x = %d\n", x);
   ...>         {
   ...>             System.out.format("Function scope x (before shadowing) = %d\n", x);
   ...>             /// int x = 999 is not allowed!
   ...>             x = 999;
   ...>             System.out.format("Function scope x (after shadowing) = %d\n", x);
   ...>         }
   ...>          System.out.format("Function scope (again) x = %d\n", x);
   ...>      }
|  created method demoScopingInJava()

jshell> demoScopingInJava()
Function scope x = 100
Function scope x (before shadowing) = 100
Function scope x (after shadowing) = 999
Function scope (again) x = 999

Java does not allow any shadowing because we cannot define any new variables inside the block. Instead, all references to the variable ‘x’ are actually to the function scope variable. In this case, we are able to mutate ‘x’ from 100 to 999, but this is because the inner block is within the outer function block and the Java Runtime can therefore ensure that this variable is freed before the function exits. However, this is not allowed when are in a situation where the variable could be referenced even after the local function where it was declared exits.

For instance, if we want to implement a function that prints line numbers in an increasing order every time it is called, we might try to do something like this in Java:

jshell> Function<Void, Void> lineNumberGenerator() {
   ...>      int lineNumber = 0;
   ...>      return (n) -> 
                 { lineNumber++; 
                   System.out.format("Line number: %d\n", lineNumber); 
                   return null; };
   ...> }
|  Error:
|  local variables referenced from a lambda expression must be final or effectively final
|       return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
|                       ^--------^
|  Error:
|  local variables referenced from a lambda expression must be final or effectively final
|       return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
|                                                                            ^--------^

We can see, though, that modifying a variable defined in the outer scope is not allowed in the case here the code escapes the local scope. As can be clearly seen in the error messages, the variable lineNum must be declared “final” for the code to even compile (and of course, then it would fail again unless we removed the mutating statement inside the lambda function).
This is the reason why we cannot implement closures in Java – Java’s bizarre downward-percolating forced visibility of variables.

And, oh, just in case you thought this applies only to lambda blocks, it’s always been the case:

jshell> void scopingRulesTest () {
   ...>    int x = 100;
   ...>    
   ...>    (new Thread(new Runnable () {
   ...>        @Override
   ...>        public void run() {
   ...>           x++;
   ...>          System.out.println(x);
   ...>        }
   ...>      })).start();
   ...> }
|  Error:
|  local variables referenced from an inner class must be final or effectively final
|            x++;
|            ^

The same example in C++ works as expected (including modification of the outer scope’s variable):

#include <iostream>

using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    
    int x = 100;
    cout << "Function scope x = " << x << endl;
    {
        x = 101;
        cout << "Block scope x = " << x << endl;
        int x = 999;
        cout << "Block scope x = " << x << endl;
    }
    
    cout << "Function scope x = " << x << endl;
    
    return 0;
}

sh-4.3$ main                                                                                                                                                              
Function scope x = 100                                                                                                                                                    
Block scope x = 101                                                                                                                                                       
Block scope x = 999                                                                                                                                                       
Function scope x = 101   

And to complete the line number closure demo (line number example):

// C++-11 (and above)
#include <iostream>
#include <functional>

using namespace std;

function<void()> line_number_generator()
{
    int line_num = 0;
    
    return [line_num]() mutable
            {
                line_num++;
                cout << "Line number: " << line_num << endl; 
            };
}

int main()
{
    ios_base::sync_with_stdio(false);
    
    function<void()> print_line_numbers = line_number_generator();
    
    for (int i = 0; i < 5; i++) {
        print_line_numbers();    
    }
}

sh-4.3$ g++ -std=c++11 -o main *.cpp                                                                                                                                      
sh-4.3$ main                                                                                                                                                              
Line number: 1                                                                                                                                                            
Line number: 2                                                                                                                                                            
Line number: 3                                                                                                                                                            
Line number: 4                                                                                                                                                            
Line number: 5  

Note that default variable capture is read-only in C++. However, the “mutable” keyword can be used to change that behaviour. In all respects, C++11 supports closures while Java cannot!

The Common Lisp version is pretty much identical in behaviour to the C++ one. In the case of Common Lisp, however, we have the extra implication that any references to outer-scope variable always capture the mentioned variable (unless a local variable with the same name is already defined). This is seen in the Common Lisp version of the same example:

CL-USER> (defun foo ()
           (let ((x 100))
             (format t "Function scope x = ~d~%" x)
             (progn
               (setf x 101)
               (let ((x 999))
                 (format t "Inner block x = ~d~%" x)))
             (format t "Function scope (again) x = ~d~%" x)))
STYLE-WARNING: redefining COMMON-LISP-USER::FOO in DEFUN
FOO
CL-USER> (foo)
Function scope x = 100
Inner block x = 999
Function scope (again) x = 101
NIL

This effectively ensures that a nested function that refer to the outer scope var(s), and which is then returned from the function is always a closure as can be seen from the following example (same example as the C++ one):

CL-USER> (defun line-number-generator ()
                      (let ((line-number 0))
                        #'(lambda ()
                             (incf line-number)
                            (format t "Line number: ~d~%" line-number))))
LINE-NUMBER-GENERATOR

CL-USER> (defvar print-line-numbers (line-number-generator))
PRINT-LINE-NUMBERS

CL-USER> print-line-numbers
#<CLOSURE (LAMBDA () :IN LINE-NUMBER-GENERATOR) {100439133B}>

CL-USER> (dotimes (i 5)
           (funcall print-line-numbers))
Line number: 1
Line number: 2
Line number: 3
Line number: 4
Line number: 5
NIL

Conclusion

Well, that about wraps up this rather long blog post! As can be seen from this post (as well as the JShell post and more posts to come, especially on Java Streams), lambda support in Java is an extremely welcome and necessary feature. However, in many ways, it’s a very crippled version of lambda support found in other languages, especially with regards to how closures are not supported in Java.Thankfully, most code that uses lambda expressions will be code that uses the Streams API, and as such most (if not all) the wartiness of Java’s lambdas will be effectively secreted within map, filter, reduce or some other mechanism in the Streams API.

Note: All the Java code posted in this blogpost was executed on JShell. For use in a regular Java environment, ensure to add semi-colons wherever appropriate.

A highly opinionated review of Java Lambdas

Nostalgia…..and a long awaited comeback

Got bored, wanted to test my memory (and latent C skills):

#include 

typedef struct node {
        char* data;
        struct node* next;
}* NODE;

int main()
{

        NODE root = NULL;
        NODE tail = NULL;

        root = (NODE) malloc(sizeof(struct node));
        tail = (NODE) malloc(sizeof(struct node));

        root->data = "hello";
        tail->data = "world";

        root->next= tail;
        tail->next = NULL;

        // display the contents of the linked list
        NODE cur = root;
        for(;cur != NULL; cur= cur->next)
        {
                printf("%s ", cur->data);
        }

        free(root);
        free(tail);

        printf("\n");

        return 0;
}

And it worked. First time around. Nice! 😉

Nostalgia…..and a long awaited comeback