A PigLatin translator in Common Lisp (and contrasting it with the Racket version)

As my curiosity had been piqued by the PigLatin exercise in Racket, I decided to basically translate that code into Common Lisp. The result was interesting to say the least.

Since the explanation of the program logic has already been done in the Racket version, I will skip that part, and only highlight the relevant differences.

First off, the code:

(defpackage #:piglatin
  (:use :cl :cl-user))


(in-package #:piglatin)


(defun translate (sentence)
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :preserve)
    (mapcar #'make-symbol
            (mapcar #'english-to-piglatin
                    (mapcar #'string-to-list
                            (mapcar #'symbol-name sentence))))))


(defun english-to-piglatin (word)
  (if (starts-vowel-p word)
      (word-to-vowel-rule word)
      (word-to-consonant-rule word)))


(defun starts-vowel-p (word)
  (member (char-downcase (car word)) '(#\a #\e #\i #\o #\u #\y)))


(defun word-to-vowel-rule (word)
  (coerce (append word '(#\w #\a #\y)) 'string))


(defun word-to-consonant-rule (word)
  (let ((was-capital-p (upper-case-p (car word))))
    (labels ((f (word)
               (if (starts-vowel-p word)
                   (cond (was-capital-p (string-capitalize
                                          (coerce (append word '(#\a #\y)) 'string)))
                         (t (coerce (append word '(#\a #\y)) 'string)))
                   (f (append (cdr word) (list (char-downcase (car word))))))))
      (f word))))


(defun string-to-list (word)
  (coerce word 'list))

And a similar test run:

CL-USER> (load "/Users/z0ltan/Rabota/ProgrammingLanguages/CommonLisp/pig-latin.lisp")
T
CL-USER> (piglatin::translate '(|Hello| |world| |we| |meet| |again|))
(#:|Ellohay| #:|orldway| #:|eway| #:|eetmay| #:|againway|)
CL-USER> (piglatin::translate '(|Cucullus| |non| |facit| |monachum|))
(#:|Uculluscay| #:|onnay| #:|acitfay| #:|onachummay|)

This looks much dirtier than the Racket version’s output, but there is good reason for that.
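Part of the visual noise is the #: prefix, which the printer uses for symbols that belong to no package. Here is a minimal sketch of that behaviour, assuming default printer settings and a fresh session. The variable names are hypothetical, and intern is shown only for contrast (the translator itself uses make-symbol):

```lisp
;; make-symbol creates a fresh symbol that is not registered in any package,
;; so the printer marks it with the #: prefix.
(defparameter *uninterned* (make-symbol "Ellohay"))

;; intern, by contrast, registers the symbol in the current package.
(defparameter *interned* (intern "Ellohay"))

(symbol-package *uninterned*)     ; => NIL, the symbol has no home package
(prin1-to-string *uninterned*)    ; => "#:|Ellohay|"
(symbol-package *interned*)       ; the current package
```

The pipes in the printed form are there because the name contains lower-case characters, which the default reader would otherwise upcase.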

Some notes:

The first thing you’ll observe is the strange input format for the Common Lisp version. What’s the deal with all the pipes in the input? Well, my understanding is this – the default manner in which the Common Lisp reader reads in symbols is to convert the symbol internally to upper-case. This can easily be verified:

CL-USER> (readtable-case *readtable*)
:UPCASE
CL-USER> (symbol-name 'hello)
"HELLO"
CL-USER> (symbol-name 'Hello)
"HELLO"
CL-USER> (symbol-name 'hElLo)
"HELLO"

The :UPCASE in the first output shows that the default behaviour is to convert the symbol name to all upper-case. The other possible values are :preserve, :downcase, and :invert. So how do we get around this problem? The approach taken here is to escape each input symbol within pipes, which makes the reader keep the name exactly as typed, and to set the reader’s case mode to :preserve. Note that the pipes are what actually matter for the input: the reader has already read the argument symbols by the time translate runs, so the binding inside the function cannot change them. This is what the translate function looks like:

(defun translate (sentence)
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :preserve)
    (mapcar #'make-symbol
            (mapcar #'english-to-piglatin
                    (mapcar #'string-to-list
                            (mapcar #'symbol-name sentence))))))

Remember that dynamic variables have special meaning in Common Lisp. Here, we simply create a copy of the standard read-table (read up on this if you are unaware of the concept) and bind it to *readtable*. We have to be really careful when working with special (dynamic) variables. What we are doing here is shadowing the actual read-table with our modified copy for the dynamic extent of the let expression; once the let exits, the previous read-table is back in effect. This avoids poisoning the read-table globally.
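To see the dynamic binding actually change what the reader does, the reading has to happen inside the let. The following is a small sketch along those lines; read-preserving-case is a hypothetical helper that is not part of the translator, and the example assumes it is run with the standard read-table in effect:

```lisp
;; Hypothetical helper: read one object from TEXT while a :preserve
;; copy of the standard readtable is in effect.
(defun read-preserving-case (text)
  (let ((*readtable* (copy-readtable nil)))   ; fresh copy of the standard readtable
    (setf (readtable-case *readtable*) :preserve)
    (values (read-from-string text))))

(symbol-name (read-preserving-case "Hello"))  ; => "Hello"
(symbol-name (read-from-string "Hello"))      ; => "HELLO", default :upcase mode
(readtable-case *readtable*)                  ; => :UPCASE, the global table is untouched
```

The second call upcases the name again because the binding made inside the helper ends when the let exits.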

So we simply set the read-table mode to :preserve in the line: (setf (readtable-case *readtable*) :preserve), and then we can proceed almost exactly as in the Racket version. We have a top-down approach in which we convert the symbol list into a string list, convert that list into a list of lists of chars (Common Lisp also shares Racket’s behaviour in the sense that a string is not automatically a list of chars), map our english-to-piglatin function over that list, and finally convert the whole list of processed strings into a list of symbols as the output.

Another point of interest is that unlike Racket, Common Lisp doesn’t have built-in convenience functions such as string->list or list->string. Instead, we have a useful (if a bit quirky) and powerful function called coerce. So in the following code, we are simply coercing a string into a list of characters:

(defun string-to-list (word)
  (coerce word 'list))
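For completeness, here is a quick sketch of coerce going in both directions. The name list-to-string is hypothetical; the post’s code calls coerce inline for that direction:

```lisp
;; Hypothetical counterpart to string-to-list from the post.
(defun list-to-string (chars)
  (coerce chars 'string))

(coerce "pig" 'list)                       ; => (#\p #\i #\g)
(list-to-string '(#\i #\g #\p #\a #\y))    ; => "igpay"
```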

The rest of the code is pretty much in the same vein, with an interesting twist. As mentioned earlier, the case of a symbol is preserved only if we escape it within pipes (such as |Hello|), and so the following input loses the original capitalisation entirely, as can be seen in the output:

CL-USER> (piglatin::translate '(hello world))
(#:|Ellohay| #:|Orldway|)

CL-USER> (piglatin::translate '(Hello WORLd))
(#:|Ellohay| #:|Orldway|)

As you can observe, when we use symbols without escaping them with pipes, the reader automatically converts them to upper-case anyway, so the output remains the same irrespective of the case of the input symbol.

This is quite in line with the overall quirkiness of Common Lisp. In that sense, I feel that Racket (and Scheme, by extension) is a much more functional and consistent language, and easier to work with. However, Common Lisp holds, and always will hold, a soft spot in my heart, especially as it offers me countless more opportunities to explore and learn this magnificent beast!

Hoyte’s Line Number closure in a few choice languages

Douglas Hoyte, in his excellent (if a bit fanatical) book, “Let over Lambda” gives a simple and pithy example to demonstrate the concept of closures – little anonymous functions that can capture variables in the environment in which the closure was created.

The example is very simple – create a mini-logging facility by capturing a variable representing the current line number (initially set to 0), incrementing it for every invocation of the closure, and printing it out.

An implementation in Common Lisp might look like this –

(defun get-line-logger ()
  (let ((line-num 0))
    #'(lambda (id)
        (incf line-num)
        (format t "[~a] Line number: ~d~%" id line-num))))


(defun logger-demo ()
  (let ((logger-1 (get-line-logger))
        (logger-2 (get-line-logger)))
    (flet ((f (id logger)
             (dotimes (i 5)
               (funcall logger id))
             (terpri)))
      (f "logger-1" logger-1)
      (f "logger-2" logger-2))))

Sample run:

CL-USER> (logger-demo)
[logger-1] Line number: 1
[logger-1] Line number: 2
[logger-1] Line number: 3
[logger-1] Line number: 4
[logger-1] Line number: 5

[logger-2] Line number: 1
[logger-2] Line number: 2
[logger-2] Line number: 3
[logger-2] Line number: 4
[logger-2] Line number: 5

NIL

As expected, we can not only capture local variables in the lexical environment at the time of the closure’s creation, but also modify them independently of any other instances of the closure. This is what makes it a proper closure. Also note that the capture is done automatically (whether we actually use the variables or not is irrelevant).
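The flip side of that independence is that closures created inside the same let share the captured variable. Here is a small sketch of that, using a hypothetical make-line-counter that is not from Hoyte’s book:

```lisp
;; Returns two closures over the same LINE-NUM binding:
;; one increments it, the other resets it.
(defun make-line-counter ()
  (let ((line-num 0))
    (values (lambda () (incf line-num))
            (lambda () (setf line-num 0)))))

(multiple-value-bind (next reset) (make-line-counter)
  (funcall next)      ; line-num is now 1
  (funcall next)      ; line-num is now 2
  (funcall reset)     ; the second closure resets the shared variable
  (funcall next))     ; => 1
```

A second call to make-line-counter would create a fresh binding, so its closures would not see these updates.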

The Racket version is, unsurprisingly, almost identical not only in syntax, but also semantics:

#lang racket

(define (get-line-logger)
  (let ([line-no 0])
    (lambda (id)
      (set! line-no (+ 1 line-no))
      (fprintf (current-output-port) "[~a] Line number: ~a~%" id line-no))))

(define (logger-demo)
  (let ([logger-1 (get-line-logger)]
        [logger-2 (get-line-logger)])
    (letrec ([f (lambda (id logger)
                  (do
                      ((i 0 (+ i 1)))
                      ((= i 5))
                    (logger id))
                  (newline))])
      (f "logger1" logger-1)
      (f "logger2" logger-2))))

And the behaviour is exactly the same:

hoyte-closure.rkt> (logger-demo)
[logger1] Line number: 1
[logger1] Line number: 2
[logger1] Line number: 3
[logger1] Line number: 4
[logger1] Line number: 5

[logger2] Line number: 1
[logger2] Line number: 2
[logger2] Line number: 3
[logger2] Line number: 4
[logger2] Line number: 5

Racket is, in some ways, more elegant than even Common Lisp. I especially love the part where lambdas don’t need any funcallS or applyS to make them run (unlike in Common Lisp). Still, pretty much a branch off the same family tree.
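The need for funcall comes from Common Lisp keeping functions and variables in separate namespaces. A minimal sketch of what that looks like in practice (the logger variable here is just for illustration):

```lisp
(let ((logger (lambda (id) (format nil "[~a] logged" id))))
  ;; (logger "x") would look for a function named LOGGER and fail,
  ;; because the closure is stored in the variable namespace.
  (list (funcall logger "x")        ; arguments passed individually
        (apply logger '("y"))))     ; arguments passed as a list
;; => ("[x] logged" "[y] logged")
```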

Moving on, let’s try the same in Java, shall we?

import java.util.function.Function;

public class HoyteClosure {
    private static Function<String, Void> getLineLogger() {
        int lineNum = 0;

        return (String id) -> { lineNum++; System.out.printf("[%s] Line number: %d\n",
                                                                 id, lineNum); return null; };
    }

    private static void invokeLogger(String id, Function<String, Void> logger) {
        for (int i = 0; i < 5; i++) {
            logger.apply(id);
        }
        System.out.println();
    }

    public static void main(String[] args) {
        Function<String, Void> logger1 = getLineLogger();
        Function<String, Void> logger2 = getLineLogger();

        invokeLogger("logger1", logger1);
        invokeLogger("logger2", logger2);
    }
}

Okay, looks good. However, if we try to run it, we run into issues immediately:

Timmys-MacBook-Pro:Java8 z0ltan$ javac HoyteClosure.java 
HoyteClosure.java:7: error: local variables referenced from a lambda expression must be final or effectively final
        return (String id) -> { lineNum++; System.out.printf("[%s] Line number: %d\n",
                                ^
HoyteClosure.java:8: error: local variables referenced from a lambda expression must be final or effectively final
                                                                 id, lineNum); return null; };
                                                                     ^
2 errors

The problem is that Java does not really have full closures. A Java 8 lambda, much like the good old anonymous class, can read local variables in the enclosing environment only if they are final or effectively final, which means that it cannot modify them. So what can we do?

To make Java happy, we can create a new object for every logger instance, i.e., use instance variables in lieu of local variables, so that modifying the captured state is no longer an issue:

import java.util.function.Function;

public class HoyteClosureModified {
    static class Closure {
        private int lineNum;
        
        private Function<String, Void> getLineLogger() {
            return (String id) -> { lineNum++; System.out.printf("[%s] Line number: %d\n",
                                                                 id, lineNum); return null; };
        }
    }

    private static void invokeLogger(String id, Function<String, Void> logger) {
        for (int i = 0; i < 5; i++) {
            logger.apply(id);
        }
        System.out.println();
    }

    public static void main(String[] args) {
        Function<String, Void> logger1 = new Closure().getLineLogger();
        Function<String, Void> logger2 = new Closure().getLineLogger();

        invokeLogger("logger1", logger1);
        invokeLogger("logger2", logger2);
    }
}

Taking it for a test spin, we get:

Timmys-MacBook-Pro:Java8 z0ltan$ javac HoyteClosureModified.java
Timmys-MacBook-Pro:Java8 z0ltan$ java -cp . HoyteClosureModified
[logger1] Line number: 1
[logger1] Line number: 2
[logger1] Line number: 3
[logger1] Line number: 4
[logger1] Line number: 5

[logger2] Line number: 1
[logger2] Line number: 2
[logger2] Line number: 3
[logger2] Line number: 4
[logger2] Line number: 5

It’s not the same though, is it? The whole point of using a closure was so that we wouldn’t have to do this explicit management of state ourselves. As such, Java doesn’t really have full-blown closures, just a poor man’s version of them. Better luck next time, Java.

Finally, the same using Ruby. As I have said before, Ruby feels remarkably like a Lisp despite the syntactic differences.

module HoyteClosure 
    class Demo
        def self.get_line_logger
            line_no = 0

            lambda do |id|
                line_no += 1
                puts "[" + id + "] Line number: " + line_no.to_s
            end
        end

        def self.logger_demo(id, logger)
            5.times {
                logger.call(id)
            }
            puts ""
        end

        def self.main
            logger1 = get_line_logger
            logger2 = get_line_logger

            logger_demo("logger1", logger1)
            logger_demo("logger2", logger2)
        end
    end
end

And the final test run:

irb(main):009:0> load "./hoyte_closure.rb"
load "./hoyte_closure.rb"
=> true
irb(main):010:0> HoyteClosure::Demo.main
HoyteClosure::Demo.main
[logger1] Line number: 1
[logger1] Line number: 2
[logger1] Line number: 3
[logger1] Line number: 4
[logger1] Line number: 5

[logger2] Line number: 1
[logger2] Line number: 2
[logger2] Line number: 3
[logger2] Line number: 4
[logger2] Line number: 5

=> nil

Short, non-idiomatic, and sweet!

Solutions to Chapter 2 exercises, ANSI Common Lisp (2nd Edition)

Problem 1

;; (+ (- 5 1) (+ 3 7)) => (+ 4 (+ 3 7)) => (+ 4 10) => 14

;; (list 1 (+ 2 3)) => (list 1 5) => (1 5)

;; (if (listp 1) (+ 1 2) (+ 3 4)) => (if nil (+ 1 2) (+ 3 4)) => (+ 3 4) => 7

;; (list (and (listp 3) t) (+ 1 2)) => (list (and nil t) (+ 1 2)) => (list nil (+ 1 2))
;; => (list nil 3) => (nil 3)

Problem 2

;; Expression 1

(cons 'a (cons 'b (cons 'c nil)))

;; Expression 2

(cons 'a '(b c))

;; Expression 3

(cons 'a (cons 'b '(c)))

Problem 3

(defun my-fourth (lst)
  (car (cdr (cdr (cdr lst)))))

(defun my-fourth% (lst)
  (cadddr lst))

Problem 4

(defun greater-of-two (x y)
  (if (> x y)
      x
      y))

Problem 5

;; enigma checks if a non-empty list contains at least one ‘nil’ element.

;; mystery returns the zero-based index of the first parameter in the second parameter (which is a
;; list) otherwise nil.

Problem 6

;; car
;; or
;; apply (funcall would return (1 nil))
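These answers are easy to check at the REPL. A quick sketch with the blanks from the exercise filled in:

```lisp
(car (car (cdr '(a (b c) d))))   ; => B
(or 13 (/ 1 0))                  ; => 13, the division is never evaluated
(apply #'list 1 nil)             ; => (1)
(funcall #'list 1 nil)           ; => (1 NIL)
```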

Problem 7

(defun any-element-is-list (lst)
  (if (null lst)
      nil
      (or (listp (car lst)) (any-element-is-list (cdr lst)))))

(defun any-element-is-list% (lst)
  (let ((res nil))
    (dolist (obj lst)
      (setf res (or res (listp obj))))
    res))

Problem 8

(defun print-dots-recursive (n)
  (if (= n 0)
      (format t "~%")
      (progn
        (format t ".")
        (print-dots-recursive (- n 1)))))

(defun print-dots-iterative (n)
  (do
   ((i 0 (+ i 1)))
   ((= i n) (format t "~%"))
    (format t ".")))


(defun count-a-recursive (lst)
  (if (null lst)
      0
      (if (eql 'a (car lst))
          (+ 1 (count-a-recursive (cdr lst)))
          (count-a-recursive (cdr lst)))))

(defun count-a-iterative (lst)
  (let ((count 0))
    (dolist (obj lst)
      (if (eql 'a obj)
          (setf count (+ 1 count))))
    count))

Problem 9

;; The problem with the iterative summit is that the “remove” function returns a new list with
;; the specified element removed, but does not touch the original list (no side-effects).

(defun summit (lst)
  (setf lst (remove nil lst))
  (apply #'+ lst))
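A small sketch that makes the point about remove concrete. The function remove-demo is hypothetical and exists only to show that the original list is left alone:

```lisp
(defun remove-demo ()
  (let ((lst (list 1 nil 2 nil 3)))
    (remove nil lst)                  ; the result is computed and then discarded
    (list lst (remove nil lst))))     ; original list, then the filtered copy

(remove-demo)   ; => ((1 NIL 2 NIL 3) (1 2 3))
```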

;; The problem with the recursive version of summit is that there is no stopping condition! This
;; means that summit keeps looping forever even after the list has become empty.

(defun summit (lst)
  (if (null lst)
      0
      (let ((x (car lst)))
        (if (null x)
            (summit (cdr lst))
            (+ x (summit (cdr lst)))))))

;; cleaner version

(defun summit (lst)
  (if (null lst)
      0
      (if (null (car lst))
          (summit (cdr lst))
          (+ (car lst) (summit (cdr lst))))))

Interop mini-series – Calling C and C++ Callbacks from Common Lisp (Part 2c)

This post picks up on the first part of this interop mini-series (Calling C and C++ from Common Lisp). I recommend checking out that post first in order to make sense of this one.

Contents

  1. Intent
  2. Demo
  3. Useful functions
  4. Conclusion

Intent

The scope of this post is to cover interop with C and C++ code from Common Lisp using callbacks. In case you are not sure about what callbacks are, please check the first part of this post out – Callbacks special.

We will continue to use the cffi library for our demo here as well.

Demo

For this demo, let’s pick a very simple example.

We have a person type which has the following slots/fields – name, gender, and age. From our Common Lisp code, we want to create an instance of person, and then use a function in a native library, prefix_name, to prepend either “Mr.” or “Ms.” to the person’s name, depending on the value of the gender slot (0 for female, anything else for male).

First we define the interface for the native library (in callback_demo.h):

#ifndef __CALLBACK_DEMO_H__
#define __CALLBACK_DEMO_H__ "callback_demo.h"

typedef struct person {
    char* name;
    int gender;
    int age;
} person;

#ifdef __cplusplus
extern "C" {
#endif
    void prefix_name(person*, void (*)(person*));
#ifdef __cplusplus
}
#endif
#endif

We then write the code containing the prefix_name function that will invoke our callback function (in callback_demo.c):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "callback_demo.h"

#define MAXSIZE 50

char* concatenate_names(const char* prefix, char* name)
{
    // prefix + one space + name + terminating NUL
    size_t len = strlen(prefix) + strlen(name) + 2;

    char* full_name = (char*)malloc(len * sizeof(char));

    if (full_name != NULL) {
        char* cp = full_name;

        while (*prefix != '\0')
            *cp++ = *prefix++;

        *cp++ = ' ';

        while (*name != '\0')
            *cp++ = *name++;

        *cp = '\0';

        return full_name;
    }
    // allocation failed: fall back to the unmodified name
    return name;
}
   

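void prefix_name(person* p, void (*cb)(person*))
{
    const char* MISTER = "Mr.";
    const char* MISS = "Ms.";
    char* res = NULL;

    // 0 - female, anything else male
    res = p->gender == 0 ? concatenate_names(MISS, p->name) :
                           concatenate_names(MISTER, p->name);

    // res == p->name means the allocation failed and nothing was prefixed
    if (res != p->name) {
        // assumes the buffer behind p->name has room for the prefixed name
        strcpy(p->name, res);
        free(res);
    }

    (*cb)(p);
}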

void sample_callback(person* p)
{
    printf("%s, %s, %d\n", p->name, p->gender == 0 ? "Female" : "Male", p->age);
}

int main()
{
    person rich;

    rich.name = (char*)malloc(MAXSIZE * sizeof(char));
    strcpy(rich.name, "Rich");
    rich.gender = 1;
    rich.age = 49;

    prefix_name(&rich, &sample_callback);
    
    return 0;
}

Explanation: The code is relatively straightforward. As can be seen from the header file, prefix_name is the entry point to the library (and the one which gets invoked from the Common Lisp code).

The prefix_name function takes an instance of the person structure as well as a callback function. Note the signature of the callback function:

void (*)(person*).

This callback function expects to be passed the modified person instance, i.e., the one that was the first parameter of the prefix_name function.

The logic is very simple – check the gender field, and then depending on whether it is 0 or some other value, update the name field of the person instance by prepending “Ms.” or “Mr.” respectively.

Finally, the callback function cb is invoked, passing control back to the client code.

All right, now we compile the code into a library, libcallbackdemo.dylib:

Timmys-MacBook-Pro:Demo z0ltan$ clang -dynamiclib -o libcallbackdemo.dylib callback_demo.c

Timmys-MacBook-Pro:Demo z0ltan$ ls
callback_demo.c		callback_demo.h		libcallbackdemo.dylib

Excellent!

Now we focus on the Common Lisp bit. This part is relatively straightforward. Let’s see the code in action first, and then a bit of explanation.

First the code that calls the native library function, prefix_name (in c-to-lisp.lisp):

;;;; Demonstrating how Common Lisp can invoke functions in C or C++ code, which then themselves invoke a callback function written in Common Lisp.
;;;; This helps in those cases when Common Lisp needs to make use of some 
;;;; functionality present in a native library which is written using callbacks.

(require 'cffi)

(defpackage :c-to-lisp-user
  (:use :cl :cffi))

(in-package :c-to-lisp-user)


;;; Callback demo - first define the foreign library
;;; containing the function which takes a callback function.

(define-foreign-library libcallbackdemo
  (:darwin "libcallbackdemo.dylib")
  (:unix "libcallbackdemo.so")
  (t (:default "libcallbackdemo.dylib")))

(use-foreign-library libcallbackdemo)

;;; define Common Lisp equivalent of the C structure
(defcstruct person
  (name :string)
  (gender :int)
  (age :int))


;;; define the implementation of the callback
(defcallback print-prefixed-person :void
    ((ptr (:pointer (:struct person))))
  (with-foreign-slots ((name gender age) ptr (:struct person))
    (format t "Name: ~a, Gender: ~a, Age: ~d~%"
            name
            (if (zerop gender) "Female" "male")
            age)))


;;; invoke the callback in the C library with a new instance of
;;; a person object
(defun test-callback ()
  (with-foreign-object (rich '(:struct person))
    (setf (foreign-slot-value rich '(:struct person) 'name) "Rich"
          (foreign-slot-value rich '(:struct person) 'gender) 1
          (foreign-slot-value rich '(:struct person) 'age) 49)
    (foreign-funcall "prefix_name"
                     :pointer rich
                     :pointer (callback print-prefixed-person)
                     :void))
  (with-foreign-object (vigdis '(:struct person))
    (setf (foreign-slot-value vigdis '(:struct person) 'name) "Vigdis"
          (foreign-slot-value vigdis '(:struct person) 'gender) 0
          (foreign-slot-value vigdis '(:struct person) 'age) 28)
    (foreign-funcall "prefix_name"
                     :pointer vigdis
                     :pointer (callback print-prefixed-person)
                     :void)))

;;; unload the foreign library
(close-foreign-library 'libcallbackdemo)

And the output:

C-TO-LISP-USER> (test-callback)
Name: Mr. Rich, Gender: male, Age: 49
Name: Ms. Vigdis, Gender: Female, Age: 28
; No value

Explanation: This code is also quite simple. We begin by defining the native library, and then loading it.

Next, we define the callback function using the cffi:defcallback macro. The defined callback function, print-prefixed-person, takes a pointer to a person instance (which is passed to it by the prefix_name function inside libcallbackdemo.dylib), and so we need to define the person structure first.

For that, we use another macro, cffi:defcstruct. As you can see, it is simply an exact representation of the structure defined in callback_demo.h, albeit in a Lispy manner.

cffi:with-foreign-slots is a very important macro that destructures its pointer argument into the supplied slots. Note that the slot names must be the same as those provided in the person structure defined in the Common Lisp code. Also note the use of cffi:foreign-slot-value in test-callback instead of cffi:mem-aref as in the previous post. The rule of thumb is this – use cffi:foreign-slot-value when accessing the slots of a structure, and use cffi:mem-aref when accessing the elements of an array of simple types.
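Here is a small sketch contrasting the slot accessors with cffi:mem-aref. It assumes that cffi can be loaded with require, as in the demo code; the point structure and both function names are hypothetical and exist only for this illustration:

```lisp
(require 'cffi)

;; Hypothetical structure, used only for this illustration.
(cffi:defcstruct point
  (x :int)
  (y :int))

;; Structure slots: write them with foreign-slot-value,
;; read them back through with-foreign-slots.
(defun slot-access-demo ()
  (cffi:with-foreign-object (p '(:struct point))
    (setf (cffi:foreign-slot-value p '(:struct point) 'x) 3
          (cffi:foreign-slot-value p '(:struct point) 'y) 4)
    (cffi:with-foreign-slots ((x y) p (:struct point))
      (list x y))))

;; Array of ints: elements are accessed with mem-aref.
(defun array-access-demo ()
  (cffi:with-foreign-object (arr :int 3)
    (dotimes (i 3)
      (setf (cffi:mem-aref arr :int i) (* i i)))
    (loop for i below 3
          collect (cffi:mem-aref arr :int i))))

(slot-access-demo)    ; => (3 4)
(array-access-demo)   ; => (0 1 4)
```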

Finally, we actually invoke the prefix_name function from test-callback. We create two instances of the person structure, and then we pass the callback function in the foreign-funcall invocation using the macro cffi:callback.

cffi:callback simply returns a pointer which is what the prefix_name function in libcallbackdemo.dylib requires. The cycle is complete!
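The same mechanism works with any C function that takes a function pointer. As a sketch that needs no custom library, here is qsort from the C standard library driven by a Lisp comparison callback. The names compare-ints and sort-ints-with-qsort are hypothetical, and the sketch assumes a platform where size_t is unsigned long (64-bit Mac OS X or Linux, for instance):

```lisp
(require 'cffi)

;; Comparison callback handed to qsort: receives pointers to two ints.
(cffi:defcallback compare-ints :int ((a :pointer) (b :pointer))
  (let ((x (cffi:mem-ref a :int))
        (y (cffi:mem-ref b :int)))
    (cond ((< x y) -1)
          ((> x y) 1)
          (t 0))))

(defun sort-ints-with-qsort (numbers)
  "Copy NUMBERS into a foreign int array, sort it with qsort, return a new list."
  (let ((n (length numbers)))
    (cffi:with-foreign-object (array :int n)
      (loop for i from 0
            for x in numbers
            do (setf (cffi:mem-aref array :int i) x))
      ;; Assumption: size_t is unsigned long on this platform.
      (cffi:foreign-funcall "qsort"
                            :pointer array
                            :unsigned-long n
                            :unsigned-long (cffi:foreign-type-size :int)
                            :pointer (cffi:callback compare-ints)
                            :void)
      (loop for i below n
            collect (cffi:mem-aref array :int i)))))

(sort-ints-with-qsort '(7 2 10 4 3))   ; => (2 3 4 7 10)
```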

As we can see from the output, the names are prepended with the correct prefix.

Basic useful functions

Here is the summarised list of the additional functions that were used in the demo:

  • cffi:defcstruct
  • cffi:defcallback
  • cffi:with-foreign-slots
  • cffi:foreign-slot-value
  • cffi:callback

Conclusion

The cffi library is a very powerful and well-designed library for dealing with native libraries. It is also quite vast, and I would most definitely recommend browsing through the official manual for further examples, and also for usage patterns for your specific needs.

Next up, I will demonstrate interop between C (and C++) and Java using the JNA library, which is far superior to the alternative of using pure JNI. That will also be in two parts.

Interop mini-series – Calling C and C++ code from Common Lisp using CFFI (Part 1)

Starting with this post, the next few posts will dig into interoperability between various languages. The next couple of posts will cover calling C and C++ code from Common Lisp, and how to write callback functions in Common Lisp that plug into code residing in a shared library. This will make use of the cffi library.

Then the following two posts will cover the analogue in Java-world. To this end, we will make use of the JNA project to demonstrate interop between C/C++ and Java.

Finally, this series will (hopefully) conclude with a mini-project of sorts – a completely embedded JVM instance inside a Common Lisp image! A number of demos will illustrate different uses of embedding Java within Common Lisp. This is a bit of an undertaking though, and will definitely take some time to implement, not least because I want to extract the maximum amount of learning from this activity!

Contents

  1. Introducing the cffi library
  2. Demos
    1. Interop with C
    2. Interop with C++
  3. Summary of useful functions
  4. References

Setup used for this tutorial series

In order to keep things sane, I will be sticking to a single platform (unless otherwise noted) with the following configuration for this whole mini-series:

  • Mac OS X El Capitan system
  • 8 cores, 16GB RAM, 1600MHz processor
  • SBCL as my Common Lisp implementation
  • JDK 9
  • Apple LLVM 6.1.0 (with Clang as the frontend) as my C and C++ compiler

Note that even though the compiler used is LLVM, the behaviour is more or less the same as that of standard gcc/g++. The same flags also work for compilation, and the only difference vis-a-vis this tutorial will be how the shared (dynamic) library is created.

Introduction to the CFFI library

Let’s install the cffi library using QuickLisp (if you haven’t done so already) first:

LISP-TO-C-USER> (ql:quickload :cffi)
To load "cffi":
  Load 4 ASDF systems:
    alexandria babel trivial-features uiop
  Install 1 Quicklisp release:
    cffi
; Fetching #
; 234.48KB
==================================================
240,107 bytes in 0.33 seconds (712.70KB/sec)
; Loading "cffi"
[package cffi-sys]................................
[package cffi]....................................
..................................................
[package cffi-features]
(:CFFI)

Now, let’s talk a bit about this library and the features that it provides. Links to the download site and manual are provided in the “References” section.

The cffi library is a cross-platform (across Common Lisp implementations that is) library that supports interop with C (and with C++, but we’ll talk more about that later). What this means is that you can load a native library (dylib, so file, DLL, etc.) and use the functions defined therein within your Lisp code.

The interop works both ways – the general case is that you want to invoke C functions from Lisp code, but you may also want to invoke a function in the library that expects a callback, and you can define this callback as pure Common Lisp code! Nifty, isn’t it?

The library is very well-designed and personally I find the Lispy nature of the APIs (and generated functions) an extra bonus.

The best way to learn the library is to see it in action, so let’s get on with it!

Note: Platform support for different features varies according to the quirks of the specific Common Lisp implementation. Refer to the cffi documentation for specifics.

Demos

These demos are aimed to be simple and small, and yet somewhat useful in terms of real-world applicability.

I personally feel that purely contrived demos are best avoided since they hardly teach anything well, most of all for the reason that they are extremely boring!

In the first demo, we will see how a C library may be loaded and run from Common Lisp. This will be the most common use case.

In the second demo, we will do the same, but for a C++ library with a C wrapper around the C++ functionality.

Since both demos are defined in the same package, let’s define the package first:

(require 'cffi)

(defpackage :lisp-to-c-user
  (:use :cl :cffi))

(in-package :lisp-to-c-user)

Interop with C

For the C example, I decided to use the native library to get some useful system information – architecture type, model name, memory, number of cpus, and the number of logical cpus (cores).

Note that this example works only in Mac OS X. For Linux, the sysctlbyname function can be replaced by sysctl with appropriate changes. For Windows, you will have to check which kernel call provides the same functionality.

We will use the sysctlbyname function to extract these values.

Let’s define the header file first (in system_info.h):

#ifndef __SYSTEM_INFO_H__
#define __SYSTEM_INFO_H__ "system_info.h"

#include <sys/types.h>
#include <sys/sysctl.h>

#ifdef __cplusplus
extern "C" {
#endif

char* get_machine();
char* get_model();
int64_t get_memory();
int32_t get_ncpu();
int32_t get_nlogicalcpu();

#ifdef __cplusplus
}
#endif
#endif

And the corresponding C implementation (in system_info.c):

#include <stdio.h>
#include "system_info.h"

#define MAXSIZE 210

char* get_machine()
{
    static char machine[MAXSIZE];
    size_t len = sizeof(machine);

    sysctlbyname("hw.machine", &machine, &len, NULL, 0);

    return machine;
}

char* get_model()
{
    static char model[MAXSIZE];
    size_t len = sizeof(model);

    sysctlbyname("hw.model", &model, &len, NULL, 0);

    return model;
}

int64_t get_memory()
{
    int64_t mem;
    size_t len = sizeof(mem);

    sysctlbyname("hw.memsize", &mem, &len, NULL, 0);

    return mem;
}

int32_t get_ncpu()
{
    int32_t cpu;
    size_t len = sizeof(cpu);

    sysctlbyname("hw.ncpu", &cpu, &len, NULL, 0);

    return cpu;
}


int32_t get_nlogicalcpu()
{
    int32_t logical_cpu;
    size_t len = sizeof(logical_cpu);

    sysctlbyname("hw.logicalcpu", &logical_cpu, &len, NULL, 0);

    return logical_cpu;
}

int main(int argc, char* argv[])
{
    printf("%s, %s, %lld, %d, %d\n", 
            get_machine(),
            get_model(),
            get_memory(),
            get_ncpu(),
            get_nlogicalcpu());

    return 0;
}

Let’s compile it into a shared library (in this case, Clang + LLVM on Mac OS X. For other compilers such as gcc proper, check the relevant documentation):

Timmys-MacBook-Pro:c_demo_system_info z0ltan$ clang -dynamiclib -o libsysteminfo.dylib system_info.c

Timmys-MacBook-Pro:c_demo_system_info z0ltan$ ls
libsysteminfo.dylib	system_info.c		system_info.h

Excellent! Finally, let’s write the Common Lisp client code to use this library:

;;; C-demo

(define-foreign-library libsysteminfo
  (:darwin "libsysteminfo.dylib")
  (:unix "libsysteminfo.so")
  (t (:default "libsysteminfo.dylib")))

(load-foreign-library 'libsysteminfo)

(defcfun "get_machine" :string)
(defcfun "get_model" :string)
(defcfun "get_memory" :long-long)
(defcfun "get_ncpu" :int)
(defcfun "get_nlogicalcpu" :int)

(defun print-system-info ()
  (let ((arch (get-machine))
        (model (get-model))
        (mem (/ (get-memory) (* 1024 1024 1024)))
        (ncpu (get-ncpu))
        (nlogicalcpu (get-nlogicalcpu)))
    (format t "System Information~%")
    (format t "Arch: ~a, Model: ~a, Mem = ~dGB, CPUs = ~d, Logical CPUs = ~d~%"
            arch model mem ncpu nlogicalcpu)))

(print-system-info)

(close-foreign-library 'libsysteminfo)

And the output:

LISP-TO-C-USER> (print-system-info)
System Information
Arch: x86_64, Model: MacBookPro11,2, Mem = 16GB, CPUs = 8, Logical CPUs = 8
NIL

Explanation: We define the native library by using the cffi:define-foreign-library macro. This macro also allows us to define the specific name of the shared library depending on the OS.

Then we can load the specified library using the cffi:load-foreign-library function. Take care to observe that the name of the library is quoted. This can save you a lot of anguish later on.

The next part is interesting – we use the cffi:defcfun macro to define the C functions present in the library as Lispy functions. For instance, the C function “get_machine”, which is defined in the libsysteminfo.dylib library, is proxied into the current Lisp image as “get-machine”. There are ways to specify the Lisp-side name explicitly, but letting the cffi library take care of the translation is my recommendation.

The general syntax of the defcfun macro is:

	(cffi:defcfun <c-function-name> <return-type> &optional <docstring>
		(<parameter-name> <parameter-type>)*)

So the first defcfun indicates that get_machine is a C function that returns a character array (represented by cffi’s local type, :string), and that it doesn’t take any parameter(s). The cffi library defines a huge set of types that map to C’s primitive, pointer, and structure types extremely well.

Now that we have created proxies for the C functions, we can invoke them like ordinary Lisp functions, as seen in the print-system-info function, passing in whatever arguments they expect (none, in this case).

Finally, we unload the native library using another function, cffi:close-foreign-library, which also takes a quoted library representation.

Interop with C++

This is the more interesting demo for more than one reason! In this example, let’s try and sort an array of integers using the native library.

Again, let’s write the interface out first (in number_sorting.h):

#ifndef __NUMBER_SORTING_H__ 
#define __NUMBER_SORTING_H__ "number_sorting.h"

void callback_function(int[], int);

extern "C" {
    void sort_numbers(int[], int);
}
#endif

Looks good, but what’s the deal with the callback_function? We’ll get to that in just a moment. For now, let’s flesh out the functionality for this interface (in number_sorting.cpp):

#include <iostream>
#include <vector>
#include <algorithm>

void sort_vector(std::vector<int>&, int[], int);

void callback_function(int array[], int size)
{
    std::vector<int> vec(size);

    sort_vector(vec, array, size);

    int i = 0;
    for (std::vector<int>::const_iterator it = vec.begin(); it != vec.end(); it++)
        array[i++] = *it;
}


template <typename T>
void display_elements(const std::vector<T>& vec)
{
    for (typename std::vector<T>::const_iterator it = vec.begin(); it != vec.end(); it++)
        std::cout << *it << " ";
    std::cout << std::endl;
}


void sort_vector(std::vector<int>& v, int numbers[], int count)
{
    for (int i = 0; i < count; i++)
        v[i] = numbers[i];

    display_elements(v);

    std::sort(v.begin(), v.end(), [](int x, int y) { return x < y; });
}

int main()
{
    std::ios_base::sync_with_stdio(false);

    int sample[] = { 1, 2, 0, -1, 3, 199, 200, 110, -234, 12345 };

    callback_function(sample, sizeof(sample)/sizeof(sample[0]));

    for (int i = 0; i < (int)sizeof(sample)/sizeof(sample[0]); i++)
        std::cout << sample[i] << " ";
    std::cout << std::endl;

    return 0;
}

Hmmm, this seems a bit too convoluted for this simple example? Why all the indirection? The reason will become crystal clear once we define the corresponding C file (in number_sorting.c) as well:

#include "number_sorting.h"

void sort_numbers(int numbers[], int n)
{
    callback_function(numbers, n);
}

Explanation: The reasons why we need both number_sorting.c and number_sorting.cpp, both of which implement the same interface, number_sorting.h, are two-fold:

  1. Since we are using some C++-only features such as std::vector, std::sort, and C++11 lambdas, we need to invoke them in a separate function
  2. And the more important reason – C++’s pernicious name-mangling

Now, if we had simply written the entire sorting functionality using integer arrays and sorted using C-like constructs (say, qsort, or a manually written sorting function), we wouldn’t need all this indirection, and we could have simply written the header as:

#ifndef __NUMBER_SORTING_H__ 
#define __NUMBER_SORTING_H__ "number_sorting.h"

extern "C" {
    void sort_numbers(int[], int);
}
#endif

and provided the implementation in number_sorting.cpp alone. That would have worked out fine. However, because we use all those C++ templated constructs as well as functional constructs, if we had used this same header file, we would have got a name-mangling issue, and the function would not be visible to the Common Lisp client!

To get around this, we write a C wrapper (number_sorting.c) which simply invokes the C++ function callback_function defined in number_sorting.cpp. Now you may think that we could have simply embedded callback_function inside the definition of sort_numbers in the C++ file alone, but that would not work either. Check out the reference “How to mix C and C++” in the “References” section for more details.

All right, let’s compile the code and generate the shared library:

Timmys-MacBook-Pro:c++_demo_sorting z0ltan$ clang++ -std=c++11 -stdlib=libc++ -dynamiclib -o libnumbersorting.dylib number_sorting.c number_sorting.cpp
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated

Timmys-MacBook-Pro:c++_demo_sorting z0ltan$ ls
libnumbersorting.dylib	number_sorting.c	number_sorting.cpp	number_sorting.h

Timmys-MacBook-Pro:c++_demo_sorting z0ltan$ nm -gU libnumbersorting.dylib 
0000000000000a10 T __Z11sort_vectorRNSt3__16vectorIiNS_9allocatorIiEEEEPii
0000000000000730 T __Z17callback_functionPii
0000000000000c00 T _main
0000000000000700 T _sort_numbers

We can also see that the function sort_numbers has not been subjected to name-mangling. Now that we’ve resolved that, let’s flesh out the Common Lisp client, and run the demo!

;;; C++-demo

(define-foreign-library libnumbersorting
  (:darwin "libnumbersorting.dylib")
  (:unix "libnumbersorting.so")
  (t (:default "libnumbersorting")))

(use-foreign-library libnumbersorting)

(defun sort-some-numbers (&optional (n 10))
  (with-foreign-object (numbers :int n)
    (dotimes (i n)
      (setf (mem-aref numbers :int i) (random 100)))
    (let ((before (loop for i below n
                     collect (mem-aref numbers :int i))))
      (format t "Before: ~{~d ~}~%" before))
    (foreign-funcall "sort_numbers" :pointer numbers :int n :void)
    (let ((after (loop for j below n
                    collect (mem-aref numbers :int j))))
      (format t "After: ~{~d ~}~%" after))))

(sort-some-numbers)

(close-foreign-library 'libnumbersorting)

And the output:

LISP-TO-C-USER> (sort-some-numbers 15)
Before: 52 11 18 62 39 89 2 48 48 66 73 89 73 26 97 
After: 2 11 18 26 39 48 48 52 62 66 73 73 89 89 97 
NIL

Cool!

Explanation: This demo differs only slightly from the C demo in terms of Common Lisp code. We define the native library in the same manner, but this time we use another macro, use-foreign-library, instead. This is my preferred way of loading a native library since I always forget the quoting with load-foreign-library!

Jokes aside, we can see another way of executing a function defined in a native library: cffi:foreign-funcall.

This macro has the following syntax:

	(cffi:foreign-funcall <c-function-name> &optional (<parameter-type> <parameter-value>)*
		<return-type>)

I tend to prefer foreign-funcall for functions with only side-effects (as in this case), and use defcfun when I need to use the function in the Common Lisp part more than once. YMMV.

The most interesting bit, of course, is the with-foreign-object macro. I won’t bother to show its general syntax, but suffice to say that this macro is used to allocate, set, and use foreign memory (i.e., from the native library) with encapsulation within its body.

In this case, we simply generate a C integer array (note the usage of the type specifier, :int), set the values of the elements of this array using cffi:mem-aref, and read the values of the array using the same accessor function.

Note that the value of the variable numbers is a pointer type, and also that it is available only within the body of the macro. In the next post, we will see how we can work with custom C-style structs.

Useful basic functions

Here is a summarised list of the functions used in the demos in this blog post.

  • cffi:define-foreign-library
  • cffi:load-foreign-library
  • cffi:close-foreign-library
  • cffi:use-foreign-library
  • cffi:defcfun
  • cffi:foreign-funcall
  • cffi:with-foreign-object
  • cffi:mem-aref

References

Some references that you might find useful on this subject matter:

A highly opinionated review of Java Lambdas

What really is a lambda expression?

A lambda expression is, for all intents and purposes, an anonymous function. That really is all there is to it. In languages that support first-class functions, this is yet another feature of the language – functions are on par with other types in the language. However, in languages that don’t consider functions first class, it becomes a bit of an esoteric concept.

The origin of the concept is in the Lambda Calculus first propounded by the great Alonzo Church. According to that scheme, functions are basically entities which take some (or no) parameters, and have a body of code that can use those parameters. There is essentially no side-effect in such functions. That means that the function is deterministic – given the same set of parameters, it will always produce the same output. This is, in fact, the very foundation of Functional Programming. In modern times, Functional Programming is often conflated with strongly and statically typed languages. This is clearly wrong. The original Lambda Calculus had really no notion of types! (There is a variant of it though, the typed Lambda Calculus). Most of the languages that support lambda expressions today, however, freely allow plenty of side-effects within lambda expressions. The main takeaway here though is that lambda expressions are conceptually what named functions are made out of.

Lambdas in Java 8 and how they came to be

The biggest features in Java 8 were lambda support and the Stream API. In many ways, lambdas are important only with respect to their heavy use in the stream APIs (as seen in the previous blog on Shell). The key concept to understand when learning lambdas in Java is that lambdas/functions are not first-class objects in Java. Their entire existence is strictly bound to and controlled by interfaces, specifically the concept of SAM (Single Abstract Method) interfaces – interfaces which contain only a single abstract method. In my opinion, this severe crippling of lambdas in Java has created more problems than it has solved. New programmers who pick up Java and run with it are liable to be very confused when they move on to languages which do support lambdas in a more natural and proper manner. In any case, let’s work with what we’ve got.

So why a Functional Interface? Prior to Java 8, if we wanted to simulate cases where we wanted to pass some functionality into another function, we had to make do with anonymous classes. For instance, to create a new thread, we could have done the following:

jshell> (new Thread (new Runnable () {
   ...>    @Override
   ...>    public void run () {
   ...>     System.out.println("Hello from thread!");
   ...>    }
   ...>  })).start()

jshell> Hello from thread!

We observe that the sole purpose of the anonymous class is to perform some actions, but the cognitive dissonance comes into play when we see that the Thread class constructor expects an instance (basically a data object) of type Runnable. This is exactly the same pattern that was followed by C++ until C++11. In fact, this is what is known (rather pompously, I must add) as a functor.

Here is what the Runnable interface looks like:

public interface Runnable {
   void run();
}

This pattern of the use of a (dummy) interface containing a single method which basically does all the work that an anonymous or named function should have done in the first place, was found to be in such widespread use amongst Java developers that the committee which worked on developing lambda support in Java decided to make it kosher and provide additional support from the Java Runtime. As a result, from Java 8 onwards, wherever a SAM is present, a lambda expression can be used as its target or in its stead. They have been made essentially the same.

For example, the previous example can now be written more succinctly as:

jshell> (new Thread(() -> System.out.println("Hello from thread...again!"))).start()
Hello from thread...again!

An optional annotation, @FunctionalInterface has also been introduced for bookkeeping purposes. In fact, in order to help out developers, a bunch of Functional Interfaces now come bundled with the JDK in the java.util.function package. I would highly recommend exploring them and testing them out to get a feel for them.
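To get a quick feel for that package, here is a small sketch touching four of the bundled interfaces. The class name FunctionalInterfaceTour and the constant names are made up for this illustration; the interfaces and their method names (apply, test, get) are the real ones from java.util.function.

```java
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical demo class: the class and field names are illustrative only.
public class FunctionalInterfaceTour {
    // Function<T, R>: one argument in, one result out; invoked with apply().
    public static final Function<String, Integer> LENGTH = s -> s.length();

    // BiFunction<T, U, R>: two arguments in, one result out; also invoked with apply().
    public static final BiFunction<Integer, Integer, Integer> ADD = (a, b) -> a + b;

    // Predicate<T>: one argument in, boolean out; invoked with test().
    public static final Predicate<Integer> IS_EVEN = n -> n % 2 == 0;

    // Supplier<T>: no arguments, one result; invoked with get().
    public static final Supplier<String> GREETING = () -> "Hello";

    public static void main(String[] args) {
        System.out.println(LENGTH.apply("lambda"));  // prints 6
        System.out.println(ADD.apply(2, 3));         // prints 5
        System.out.println(IS_EVEN.test(10));        // prints true
        System.out.println(GREETING.get());          // prints Hello
    }
}
```

Notice that each interface is invoked through its own method name, so the caller has to know which interface it is holding.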

Custom Functional Interfaces

We can define our own functional interface in Java 8 (and above). The only restriction for the interface to be a functional interface is, as mentioned before, that the interface have a single abstract method.

For instance, the standard package (java.util.function) comes with functional interfaces that support single parameter (Function) and double parameter (BiFunction) functions. Let us define a triple parameter function just for this example.

jshell> @FunctionalInterface interface TriFunction<T, U, V, R> {
   ...>     R apply(T t, U u, V v);
   ...> }
|  created interface TriFunction

jshell> int x = 100;
x ==> 100

jshell> String y = "Hello"
y ==> "Hello"

jshell> double z = Math.PI
z ==> 3.141592653589793

jshell> TriFunction<Integer, String, Double, String> f = 
          (i, s, d) -> i + s + d;
f ==> $Lambda$6/1318822808@6d7b4f4c

jshell> System.out.println(f.apply(x, y, z))
100Hello3.141592653589793

Features and Limitations of Java Lambdas

So how exactly does a SAM map onto a lambda expression? To understand this better, first we need to get the syntax and semantics of lambda expressions out of the way:

Java’s lambda syntax was clearly influenced by Scala. A basic lambda expression has the following form:

(<param*>) -> [{] <body-form+> [}]

where,

‘param’ is a comma-separated list of zero or more parameters with optional types (note that in some cases where Java’s type inference mechanism is unable to infer the type, you will need to specify the type(s) explicitly). The braces are optional when the body is a single expression, but are required when the body consists of one or more statements. Finally, each body-form is a normal Java statement. In a braced body, each statement is terminated by a semi-colon, and a return statement is also required (if the return type is not void).

So a lambda expression that takes a String and returns a String might take on several forms in actual code:

(s) -> s.toUpperCase()

The type signature is not required in this case, and the return statement is not allowed in this case. This would be the recommended usage of a typical lambda expression – don’t declare the types and don’t use any return statement. Of course, this only works for a single-statement (or, more correctly, a single-expression) body.

In case we want to use braces, we need to have the whole expression take the following form:

(String s) -> { return s.toUpperCase(); }

Here we have specified the type of the parameter explicitly (which remains optional even with braces) and included an explicit return statement, which a braced body needs whenever the lambda returns a value. In all cases where the body contains multiple statements, this would be the recommended format for a lambda expression.
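To see these forms side by side, here is a minimal sketch that assigns each one to a Function<String, String>. The class name LambdaForms and the constant names are made up for this illustration; the only library type used is java.util.function.Function.

```java
import java.util.function.Function;

// Hypothetical demo class: the class and field names are illustrative only.
public class LambdaForms {
    // Expression body, inferred parameter type, no return keyword.
    public static final Function<String, String> TERSE = s -> s.toUpperCase();

    // Same thing with the parameter in parentheses.
    public static final Function<String, String> PARENS = (s) -> s.toUpperCase();

    // Explicit parameter type, still an expression body.
    public static final Function<String, String> TYPED = (String s) -> s.toUpperCase();

    // Block body: braces, semicolon-terminated statements, explicit return.
    public static final Function<String, String> BLOCK = (String s) -> {
        String trimmed = s.trim();
        return trimmed.toUpperCase();
    };

    public static void main(String[] args) {
        System.out.println(TERSE.apply("hello"));     // prints HELLO
        System.out.println(PARENS.apply("hello"));    // prints HELLO
        System.out.println(TYPED.apply("hello"));     // prints HELLO
        System.out.println(BLOCK.apply("  hello "));  // prints HELLO
    }
}
```

All four are interchangeable from the caller’s point of view; only the block form leaves room for more than one statement.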

Now getting back to how a SAM is mapped onto a lambda expression, whenever the Java Runtime encounters a lambda expression, it can do either of two things depending on the context in which the SAM is used:

  • In case the lambda expression is used along with a Stream API function (such as map, filter, reduce, etc.), the Java Runtime already has enough context about the form of the function that is expected – the parameter types and the return type. For instance, if we are trying to double all the even natural numbers up to 10, we might do:
    jshell> IntStream
             .rangeClosed(1, 10)
             .filter((n) -> n%2 == 0)
             .map((d) -> d*2).forEach(System.out::println)
    4
    8
    12
    16
    20
    

    In this case, the Java Runtime knows that the filter method of IntStream takes a parameter of the form IntPredicate, the primitive int counterpart of Predicate<T>. The Predicate<T> functional interface has a single abstract method – boolean test(T t) – and IntPredicate correspondingly has boolean test(int value). So what the Runtime does is to check that the provided lambda expression matches this signature, and if verified, proceeds to invoke the “test” method implicitly. Similarly for the map function as well.

  • The second case arises in the case where we make use of Functional Interfaces explicitly and then use them as the “target” of a lambda expression. For instance, suppose we want to write a function that takes a String and an Integer and returns their concatenated form as a String, we might have something like:
    jshell> BiFunction<String, Integer, String> f = 
               (s, i) -> s + String.valueOf(i)
    f ==> $Lambda$17/103887628@42f93a98
    
    jshell> f.apply("Hello", 99)
    $21 ==> "Hello99"
    

    In this case as well, the compiler will ensure that the lambda expression matches the type of the declared function variable. Pretty straightforward.
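A small sketch of that second case: the very same lambda text can be assigned to two unrelated functional interfaces, and the declared type of the variable decides what it becomes. The class TargetTyping and the interface Joiner are hypothetical, made up just for this illustration.

```java
import java.util.function.BiFunction;

// Hypothetical demo class: TargetTyping and Joiner are illustrative names only.
public class TargetTyping {
    // A home-grown SAM interface with the same shape as BiFunction<String, Integer, String>.
    @FunctionalInterface
    public interface Joiner {
        String join(String s, Integer i);
    }

    // The lambda text is identical; the target type differs.
    public static final BiFunction<String, Integer, String> AS_BIFUNCTION = (s, i) -> s + i;
    public static final Joiner AS_JOINER = (s, i) -> s + i;

    public static void main(String[] args) {
        // Each object must be invoked through its own interface's method name.
        System.out.println(AS_BIFUNCTION.apply("Hello", 99));  // prints Hello99
        System.out.println(AS_JOINER.join("Hello", 99));       // prints Hello99
    }
}
```

The two variables are not interchangeable even though the lambdas are textually the same: a Joiner cannot be passed where a BiFunction is expected, and each is called through a different method name.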

So far so good, but there is a huge problem in the second case above. The problem is that even once the function object has been created, the name of the SAM’s single method must be known before we can use it. This is because Java does not have operator overloading (unlike C++). This is why in the current framework, we must know the exact method name of each functional interface that we use. The “apply” method used above is the name of the single abstract method in the BiFunction functional interface. The problem is compounded because each functional interface (even in the standard package) defines its own names. Of course, this is not an insurmountable problem, but the same problem did not exist even in pre-C++-11. For instance, the previous example could have been written like this in C++ (using a functor):

// pre C++-11
#include <iostream>
#include <sstream>

template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
    return f(s, t);
}

class string_int_string
{
    public:
        std::string operator()(std::string s, int i)
        {
            std::ostringstream oss;
            oss << s << i;
            return oss.str();
        }
};

int main()
{
    std::cout << concatenate("Hello", 99, string_int_string()) << std::endl;
   
    return 0;
}

A bit brittle, but it works. The generic function, “concatenate”, is important to note here since it can basically take any functor (or lambda expression from C++-11 onwards), and invokes the function object with the supplied arguments. The same approach is used in the C++ STL generic functions. Now if we look at what the code might look like with C++-11, we get:

// C++-11 and above
#include <iostream>
#include <sstream>

template< typename T, typename Func>
std::string concatenate(std::string s, T t, Func f)
{
    return f(s, t);
}

int main()
{
    std::cout << concatenate("Hello", 99,
        [](std::string s, int i) {
            std::ostringstream oss;
            oss << s << i;
            return oss.str();
        }) << std::endl;
    
    return 0;
}

As can be seen, the approach is much cleaner. The difference between the functor-version and the lambda-based one is that in this case, we’ve essentially got rid of the class representing the functor object and inserted its logic inside the lambda expression’s body. So it essentially appears that the lambda expression is basically an object that can bind the parameters just as in the case of a regular functor.

As can be seen, even in C++-11, we can write generic functions and all we need to do is invoke the function object like a function. No messy SAMs there! I personally feel that C++’s lambda support is far superior to that of Java, especially since C++ supports closures. More on that in the next section.

Another disadvantage of Java’s lambda support is that the following is impossible in Java:

#include <iostream>

int main()
{
    int x = 100, y = 100;
    
    std::cout << ([x, y]() { return x + y; })() << std::endl;
    
    return 0;
}

The code above simply uses a lambda expression to capture variables defined in the outer lexical scope (more on that in the next section), but the interesting bit is that the lambda expression can be invoked like a proper function object even without assigning it to a variable.

If we tried the same in Java, we’d get an error:

jshell> int x = 1100
x ==> 1100

jshell> int y = 200
y ==> 200

jshell> (() -> x + y)
|  Error:
|  incompatible types: java.lang.Object is not a functional interface
|  (() -> x + y)
|   ^---------^
|  Error:
|  incompatible types: <none> cannot be converted to java.lang.Object
|  (() -> x + y)
|  ^-----------^

As can be seen from the error message, the Java Runtime complains that “Object” is not a functional interface. Even if we assumed that the runtime would be able to discern the functional interface type from its signature and produce a result, we still get an error:

jshell> ((int a, int b) -> { return a + b; })).apply(x, y)
|  Error:
|  ';' expected
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|                                       ^
|  Error:
|  incompatible types: java.lang.Object is not a functional interface
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|   ^---------------------------------^
|  Error:
|  incompatible types: <none> cannot be converted to java.lang.Object
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|  ^-----------------------------------^
|  Error:
|  cannot find symbol
|    symbol:   method apply(int,int)
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|                                         ^---^
|  Error:
|  missing return statement
|  ((int a, int b) -> { return a + b; })).apply(x, y)
|  ^------------------------------------------------^

So no go there. A point down for Java lambdas! More seriously, I find this to be an extremely irritating reminder that Java’s lambdas are not really lambdas. They are more like syntactic sugar for the good old anonymous classes. In fact, there are more serious implications precisely for this reason.

Closures

This is again one of those concepts that are notoriously badly explained. A lot of newbies to programming are often scared to death and put-off from learning more about Functional Programming due to unnecessary FUD on the part of many “experts” in the field. So, let’s try and explain this as clearly as possible:

In Set Theory, a set is defined to be “closed” under an operation if applying the operation to members of the set produces a result that belongs to the same set. For instance, if the set under consideration is the set of Natural Numbers (N) and the operation is + (addition), we can say Natural numbers are closed under addition. Why? The reason is quite simple and follows straight from the definition – adding any two natural numbers (or indeed any number of numbers, but we’re considering the strict binary operation here) always produces a Natural number. On the other hand, N is not closed under – (subtraction). This is because subtracting some Natural number from another Natural number might produce 0 or some negative number, which is clearly not a member of N. So much for mathematics.

In Psychology, “closure” refers to the strict need of an individual to find a definitive answer to a problem.

In business, “closure” refers to the process by which a business closes down.

You see what I’m getting at? The term “closure” is highly overloaded, and even within mathematics, the term has different meanings in different branches. So my point is this – simply forget about the name and focus on the concept.

In Computer Science, a closure is intricately tied to the concept of scoping, specifically lexical scoping. This is why closures are often referred to as “lexical closures”. In order to understand closures properly, we must clearly understand what lexical scoping entails.

Lexical scoping is intimately tied with the rules defining the lifetimes (and visibility) of variables. Dynamic scoping, in general, refers to a situation where a variable has effectively global visibility and lifetime. Pure lexical scoping, on the other hand, ensures that the visibility of variables is limited to the current lexical block (say a function or a local block), or to nested blocks. However, lexically scoped variables are not visible to outer blocks, and variables defined in inner blocks will effectively “shadow” those defined in the outer scope. If no new variables with the same name are defined in the inner block, references to variables will always refer to those in the outer scope. This behaviour forms the basis of what is known as “variable capture”.

A variable is said to be captured by a lambda function if the lambda function refers to an outer-scope variable during its time of creation. The lambda function is said to “close over” those variables, and this is the reason why this feature is called a “closure”. So what does this variable capture actually imply in the grand scheme of things? What it implies is this – when a lambda function captures a variable in its outer scope, the lifetime of the variable is effectively changed. Under normal circumstances, local variables die when the function is exited. In this case, however, since the lambda function has captured the variable, the variable will not die even when the function in which it was defined dies!
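A minimal sketch of the capture itself, restricted to the read-only case: the lambda refers to a local of the enclosing method and is still usable after that method has returned. The class name CaptureDemo and the method makeGreeter are hypothetical, made up for this illustration.

```java
import java.util.function.Supplier;

// Hypothetical demo class: CaptureDemo and makeGreeter are illustrative names only.
public class CaptureDemo {
    // 'name' is local to makeGreeter and is never reassigned, so Java treats
    // it as effectively final and allows the lambda to capture it.
    public static Supplier<String> makeGreeter(String name) {
        return () -> "Hello, " + name;
    }

    public static void main(String[] args) {
        Supplier<String> greeter = makeGreeter("world");
        // makeGreeter has already returned, yet the captured value is still available.
        System.out.println(greeter.get());  // prints Hello, world
    }
}
```

This only reads the captured variable. Mutating it from inside the lambda is another matter altogether.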

In this respect, Java behaves absolutely horribly. Java has extremely weird scoping rules. In some ways, it does use lexical scoping. In most respects, not:

jshell> void demoScopingInJava() {
   ...>          int x = 100;
   ...>          
   ...>          System.out.format("Function scope x = %d\n", x);
   ...>         {
   ...>             System.out.format("Function scope x (before shadowing) = %d\n", x);
   ...>             /// int x = 999 is not allowed!
   ...>             x = 999;
   ...>             System.out.format("Function scope x (after shadowing) = %d\n", x);
   ...>         }
   ...>          System.out.format("Function scope (again) x = %d\n", x);
   ...>      }
|  created method demoScopingInJava()

jshell> demoScopingInJava()
Function scope x = 100
Function scope x (before shadowing) = 100
Function scope x (after shadowing) = 999
Function scope (again) x = 999

Java does not allow any shadowing because we cannot define a new variable with the same name inside the block. Instead, all references to the variable ‘x’ are actually to the function scope variable. In this case, we are able to mutate ‘x’ from 100 to 999, but this is because the inner block is within the outer function block and the Java Runtime can therefore ensure that this variable is freed before the function exits. However, this is not allowed when we are in a situation where the variable could be referenced even after the local function where it was declared exits.

For instance, if we want to implement a function that prints line numbers in an increasing order every time it is called, we might try to do something like this in Java:

jshell> Function<Void, Void> lineNumberGenerator() {
   ...>      int lineNumber = 0;
   ...>      return (n) -> 
                 { lineNumber++; 
                   System.out.format("Line number: %d\n", lineNumber); 
                   return null; };
   ...> }
|  Error:
|  local variables referenced from a lambda expression must be final or effectively final
|       return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
|                       ^--------^
|  Error:
|  local variables referenced from a lambda expression must be final or effectively final
|       return (n) -> { lineNumber++; System.out.format("Line number: %d\n", lineNumber); return null; };
|                                                                            ^--------^

We can see, though, that modifying a variable defined in the outer scope is not allowed in the case where the code escapes the local scope. As can be clearly seen in the error messages, the variable lineNumber must be “final” or effectively final for the code to even compile (and of course, declaring it final would make it fail again unless we removed the mutating statement inside the lambda function). This is the reason why we cannot implement closures in Java – Java’s bizarre downward-percolating forced visibility of variables.

And, oh, just in case you thought this applies only to lambda blocks, it’s always been the case:

jshell> void scopingRulesTest () {
   ...>    int x = 100;
   ...>    
   ...>    (new Thread(new Runnable () {
   ...>        @Override
   ...>        public void run() {
   ...>           x++;
   ...>          System.out.println(x);
   ...>        }
   ...>      })).start();
   ...> }
|  Error:
|  local variables referenced from an inner class must be final or effectively final
|            x++;
|            ^

The same example in C++ works as expected (including modification of the outer scope’s variable):

#include <iostream>

using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    
    int x = 100;
    cout << "Function scope x = " << x << endl;
    {
        x = 101;
        cout << "Block scope x = " << x << endl;
        int x = 999;
        cout << "Block scope x = " << x << endl;
    }
    
    cout << "Function scope x = " << x << endl;
    
    return 0;
}

sh-4.3$ main                                                                                                                                                              
Function scope x = 100                                                                                                                                                    
Block scope x = 101                                                                                                                                                       
Block scope x = 999                                                                                                                                                       
Function scope x = 101   

And to complete the line number closure demo (line number example):

// C++-11 (and above)
#include <iostream>
#include <functional>

using namespace std;

function<void()> line_number_generator()
{
    int line_num = 0;
    
    return [line_num]() mutable
            {
                line_num++;
                cout << "Line number: " << line_num << endl; 
            };
}

int main()
{
    ios_base::sync_with_stdio(false);
    
    function<void()> print_line_numbers = line_number_generator();
    
    for (int i = 0; i < 5; i++) {
        print_line_numbers();    
    }
}

sh-4.3$ g++ -std=c++11 -o main *.cpp                                                                                                                                      
sh-4.3$ main                                                                                                                                                              
Line number: 1                                                                                                                                                            
Line number: 2                                                                                                                                                            
Line number: 3                                                                                                                                                            
Line number: 4                                                                                                                                                            
Line number: 5  

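Note that variables captured by value are read-only inside a C++ lambda by default. However, the “mutable” keyword can be used to change that behaviour, in which case the lambda modifies its own copy of the captured variable. In this respect, C++11 supports closures while Java does not!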

The Common Lisp version is pretty much identical in behaviour to the C++ one. In the case of Common Lisp, however, we have the extra implication that any reference to an outer-scope variable always captures that variable (unless a local variable with the same name is already defined). This is seen in the Common Lisp version of the same example:

CL-USER> (defun foo ()
           (let ((x 100))
             (format t "Function scope x = ~d~%" x)
             (progn
               (setf x 101)
               (let ((x 999))
                 (format t "Inner block x = ~d~%" x)))
             (format t "Function scope (again) x = ~d~%" x)))
STYLE-WARNING: redefining COMMON-LISP-USER::FOO in DEFUN
FOO
CL-USER> (foo)
Function scope x = 100
Inner block x = 999
Function scope (again) x = 101
NIL

This effectively ensures that a nested function which refers to outer-scope variables, and which is then returned from the enclosing function, is always a closure, as can be seen from the following example (the same example as the C++ one):

CL-USER> (defun line-number-generator ()
           (let ((line-number 0))
             #'(lambda ()
                 (incf line-number)
                 (format t "Line number: ~d~%" line-number))))
LINE-NUMBER-GENERATOR

CL-USER> (defvar print-line-numbers (line-number-generator))
PRINT-LINE-NUMBERS

CL-USER> print-line-numbers
#<CLOSURE (LAMBDA () :IN LINE-NUMBER-GENERATOR) {100439133B}>

CL-USER> (dotimes (i 5)
           (funcall print-line-numbers))
Line number: 1
Line number: 2
Line number: 3
Line number: 4
Line number: 5
NIL
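To make the capture behaviour a little more concrete, here is a minimal sketch. The names make-counter, *counter-a* and *counter-b* are made up for this illustration and are not part of the example above. Each call to the generator creates a fresh binding, so the two closures count independently of each other:

```lisp
;; Hypothetical helper: returns a closure over a fresh binding of N.
(defun make-counter ()
  (let ((n 0))
    (lambda ()
      (incf n))))

;; Two separate calls, hence two separate bindings of N.
(defparameter *counter-a* (make-counter))
(defparameter *counter-b* (make-counter))

(funcall *counter-a*)   ; => 1
(funcall *counter-a*)   ; => 2
(funcall *counter-b*)   ; => 1, because B closes over its own N
```

The state lives in the captured binding rather than in any global variable, which is why the first counter has no effect on the second.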

Conclusion

Well, that about wraps up this rather long blog post! As can be seen from this post (as well as the JShell post and more posts to come, especially on Java Streams), lambda support in Java is an extremely welcome and necessary feature. However, in many ways, it’s a very crippled version of the lambda support found in other languages, especially with regard to how closures are not supported in Java. Thankfully, most code that uses lambda expressions will be code that uses the Streams API, and as such most (if not all) of the wartiness of Java’s lambdas will be effectively secreted within map, filter, reduce or some other mechanism in the Streams API.

Note: All the Java code posted in this blog post was executed on JShell. For use in a regular Java environment, be sure to add semicolons wherever appropriate.

A simple ‘letn’ macro

The ‘let’ special form is one of the most frequently used constructs in Common Lisp. This allows true lexical binding of variables (and is quite often used like local variables in imperative languages). The general form of this binding is as follows:

(let ((binding)*)
   (body-form)*)

For instance, we might have a function which reads in two numbers and finds their sum:

(defun add-two-numbers ()
  (let ((x (read))
        (y (read)))
    (+ x y)))

The example above is trivial enough and is quite readable. However, let bindings can become increasingly complex and unreadable in more general cases. Having had a taste of the mind-bendingly powerful macro system (plain macros, not Reader Macros, which are an entirely different concept), I decided to patch together a simple macro called ‘letn’ that would allow me to write let-bindings in a much simpler form (with fewer parentheses). Of course, this is merely an exercise in testing out macros rather than a serious extension to the language (even though I have found it useful beyond my own expectations). To wit, instead of a form such as:

(let ((a 1)
      (b 2)
      (c 3)
      d
      (e 100)
      (f "hello"))
  (let ((g "world")
        (h "again"))
    (format t "~d~%" (+ a b c e))
    (format t "~a~%" (concatenate 'string f g h))
    (format t "The value of ~a is, of course, ~a~%" 'd d)
    'done))

We might write:

(letn a 1 b 2 c 3 d e 100 f "hello"
  (letn g "world" h "again"
    (format t "~d~%" (+ a b c e))
    (format t "~a~%" (concatenate 'string f g h))
    (format t "The value of ~a is, of course, ~a~%" 'd d)
    'done))

Notice how the second form is much more readable and less dependent on form structure and indentation to convey its meaning. As the level of nesting increases, this becomes even more apparent, and thanks to the power of Common Lisp’s macro system, this nesting can be handled to arbitrary levels without any modifications or special handling on our part.

Now let’s write this macro!

To begin with, let us write out the logical form of the macro itself and then fill out the missing functions/macros as we go along.

Here’s what the macro looks like at this stage:

(defmacro letn (&body body)
  (multiple-value-bind (arg-forms body-forms) (parse-body body)
    `(let (,@(loop for arg in arg-forms
                   collect `(,(car arg) ,(cdr arg))))
       ,@body-forms)))

Explanation: The assumption is that all variable bindings occur at the beginning, either as pairs of variables and values (such as a 1) or as a variable followed by no value (such as d in the example above, in which case it is bound to nil by default). ‘multiple-value-bind’ allows us to bind multiple return values to variables of our own. In this case, I want to assign the list of all variable bindings to the variable ‘arg-forms’ and then the remaining forms (which may be arbitrarily nested binding and body forms) to the ‘body-forms’ variable. These lists are returned by the ‘parse-body’ function which is filled out next:

(defun parse-body (body)
  (let ((args '())
        (forms '()))
    (labels ((f (lst)
               (if (listp (car lst))
                   (setf forms lst)
                   (progn
                     (push (car lst) args)
                     (f (cdr lst))))))
      (f body))
    (values (param-pairs (reverse args)) forms)))

Explanation: The parse-body function simply takes the list of values passed in (body), and then recurses over it until it reaches the first list form (or the end of the list). While recursing, any non-list values are assumed to be part of the variable bindings and are pushed on to the ‘args’ variable, and upon encountering the first list form, the remaining forms (including the current one) are assigned to the ‘forms’ variable. Since ‘push’ adds values at the front, we then need to reverse ‘args’, invoke ‘param-pairs’ on it (which will create dotted pairs of variable bindings), and return this processed list as well as the unprocessed ‘forms’ variable back to the macro (which then uses ‘multiple-value-bind’ to extract the components as explained before). Finally, the param-pairs function looks as shown below:

(defun param-pairs (args)
  (labels ((f (lst rev)
             (if (null lst)
                 (reverse rev)
                 (if (symbolp (cadr lst))
                     (f (cdr lst) (cons (cons (car lst) nil) rev))
                     (f (cddr lst) (cons (cons (car lst) (cadr lst)) rev))))))
    (f args '())))

Explanation: This is a relatively simple function that takes the input list (which might be of the form, say, (a 1 b 2 c 3 d e 100)), and then constructs dotted pairs of the form ((a . 1) (b . 2) (c . 3) (d . nil) (e . 100)). A variable is treated as having no value when the element following it is itself a symbol (or when there is nothing after it). Again, since ‘cons’ adds elements to the front, when the entire list has been processed, the accumulated list needs to be reversed and returned.
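As a small illustration of how such dotted pairs turn back into ordinary let bindings (this mirrors the loop inside the ‘letn’ macro, but pairs-to-bindings and *pairs* are hypothetical names used only for this sketch):

```lisp
;; Dotted pairs in the shape that param-pairs is described as returning.
(defparameter *pairs* '((a . 1) (b . 2) (d . nil)))

;; CAR gives the variable, CDR gives its value (NIL when none was supplied).
(defun pairs-to-bindings (pairs)
  (loop for pair in pairs
        collect (list (car pair) (cdr pair))))

(pairs-to-bindings *pairs*)   ; => ((A 1) (B 2) (D NIL))
```

Each resulting two-element list is exactly the (variable value) shape that ‘let’ expects.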

And finally, we are done! To see how our example runs, we can simply execute it:

CL-USER> (letn a 1 b 2 c 3 d e 100 f "hello"
	      (letn g "world" h "again"
		    (format t "~d~%" (+ a b c e))
		    (format t "~a~%" (concatenate 'string f g h))
		    (format t "The value of ~a is, of course, ~a~%" 'd d)
		    'done))
106
helloworldagain
The value of D is, of course, NIL
DONE

As can be seen, the new macro itself can be nested. It might seem a bit weird since we are mixing macros (which are expanded at macro-expansion time as part of the compilation phase) and functions (which are normally called at runtime), but this is completely normal: the helper functions are simply called by the macro while it is being expanded, and we take care to ensure that evaluated and non-evaluated forms mix together correctly. The functions generate values which are then quasi-quoted and unquoted appropriately. And, of course, the nesting is handled by the wonderful splicing-comma (,@body-forms in the ‘letn’ macro). To see what this example actually expands out to, we can use macroexpand (or macroexpand-1):

CL-USER> (setf *print-pretty* t)
T
CL-USER> (macroexpand-1
	  '(letn a 1 b 2 c 3 e 100 f "hello"
	    (letn g "world" h "again"
		  (format t "~d~%" (+ a b c e))
		  (format t "~a~%" (concatenate 'string f g h))
		  (format t "The value of ~a is, of course, ~a~%" 'd d)
		  'done)))
(LET ((A 1) (B 2) (C 3) (E 100) (F "hello"))
  (LETN G
        "world"
        H
        "again"
        (FORMAT T "~d~%" (+ A B C E))
        (FORMAT T "~a~%" (CONCATENATE 'STRING F G H))
        (FORMAT T "The value of ~a is, of course, ~a~%" 'D D)
        'DONE))
T

Of course, the inner ‘letn’ will be expanded according to the same rules as well.

Future improvements: This is a trivial macro because there are a lot of assumptions about the use of the new ‘letn’ construct. The biggest problem is that it cannot handle invocations of the form (amongst others):

(letn a (read) b (read)
      (+ a b))

The reason is that this macro assumes that the first occurrence of a list entity marks the beginning of the forms that constitute the macro body, and as such, this simple example would expand out to:

CL-USER> (macroexpand-1
	  '(letn a (read) b (read)
	    (+ a b)))
(LET ((A NIL)) (READ) B (READ) (+ A B))
T

This is clearly incorrect and results in ‘b’ becoming an unbound variable as well. The problem now is how to distinguish between variable bindings and genuine body forms. I am still working on improving my understanding of how macros work and whether there are mechanisms which can help in this situation, or if I need to handle them myself. I will, of course, post my improvements as I go along. Stay tuned!
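One conceivable way around this, shown purely as a hedged sketch and not as the improvement promised above, is to give up on guessing and require an explicit marker between the bindings and the body. The names letn-in and split-bindings and the :in marker are all hypothetical; this variant also assumes that every variable is followed by a value.

```lisp
;; Hypothetical helper. It is wrapped in EVAL-WHEN so that it is also
;; available at macro-expansion time when the file is compiled.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun split-bindings (forms)
    "Split FORMS at the first :in marker. Returns two values: the list of
(variable value) bindings before the marker, and the body forms after it."
    (let ((marker (position :in forms)))
      (unless marker
        (error "letn-in: missing :in marker in ~s" forms))
      (let ((flat (subseq forms 0 marker)))
        (unless (evenp (length flat))
          (error "letn-in: every variable needs a value in ~s" flat))
        (values (loop for (var val) on flat by #'cddr
                      collect (list var val))
                (subseq forms (1+ marker)))))))

;; Hypothetical variant of letn: bindings, then :in, then the body.
(defmacro letn-in (&body forms)
  (multiple-value-bind (bindings body) (split-bindings forms)
    `(let ,bindings ,@body)))

(macroexpand-1 '(letn-in a (read) b (read) :in (+ a b)))
;; => (LET ((A (READ)) (B (READ))) (+ A B))
```

The trade-off is one extra token per form in exchange for unambiguous parsing; a variable without a value would have to be written out as, say, d nil.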

Path to Common Lisp

I have been intrigued by Common Lisp for as long as I can remember. The sheer simplicity of the basic concepts of this language (which is essentially Lambda Calculus in disguise) is what drew me to it, and for a long time, it remained just that: an enigma that attracted me every so often and yet left me exasperated with the paucity of good materials and of an active community. However, of late, I have been giving it a real serious go. The immediate impetus has to be this excellent collection of interviews by Vsevolod Dyomkin (available here – Lisp Hackers). It is a veritable treasure trove of information about the latest generation of Lisp hackers. The common theme to be found, though, is that Common Lisp itself is not used as much as it is worked upon.

My current undertaking of Lisp is a rather determined one and I daresay that I have made good progress. Another advantage of this is that Scheme and Clojure (which I had tried before but didn’t like that much) are but a small leap from here. My main interest is however in mastering Lisp concepts well enough so that I can use it for my own personal projects to begin with, and then see where that leads me (I really do envy Zach Beane in this regard – the man hacks on Lisp full time at Clozure Associates!).

In this introductory post (after a long, long hiatus from blogging), I would like to begin by describing my own path to Common Lisp (and I believe this should help act as a rough guide for any beginners embarking upon the Lisp journey!):

  • Start off with Land of Lisp – arguably the most user-friendly (and yet powerful) introduction to Common Lisp today. It is extremely well-structured and fun to work with. In my opinion, most of the “games” in later chapters can be safely skipped to begin with. They can be taken up after a solid understanding of Lisp fundamentals is in place.
  • Next up is Practical Common Lisp. Never mind what the reviews say, this is definitely not a great first book for absolute beginners in Common Lisp. The book makes a lot more sense after finishing Land of Lisp. In my experience, some of the chapters are well-written, quite a few are slipshod and follow a bizarre series of crammed and cryptic notes (especially the chapter on CLOS), but the practical chapters are absolutely necessary to get the hands-on experience that makes one a well-rounded programmer. Overall, a great book.
  • Now one should be in a position to tackle the venerable (if opinionated) Paul Graham’s anthology – ANSI Common Lisp and On Lisp (in that order). This is where I am as well on my own journey in Common Lisp.
  • One must-read book is Let Over Lambda by Doug Hoyte. This is a good follow-up book to Paul Graham’s classics (and even on its own, a great way to expand your perspective – the first few chapters made me really “get” Closures and true Lexical Scoping at long last).
  • Tons of hands-on practice (this should be orthogonal to this list in any case; practical sessions are what really teach one… well, anything really). A few resources to get started with are exercism.io and 99 Problems in Lisp. The latter is really useful for practising Functional Programming in Common Lisp.
  • Just a quick note on development environments. I personally use Emacs + SLIME + SBCL. SBCL is a very efficient implementation of Common Lisp. CLISP works fine too, but it’s rather slow (understandably so, since it’s not compiled to machine code like SBCL is). Some other flavours are Clozure CL, LispWorks, and Allegro CL. The latter two are commercial distros, but they do offer personal editions which work just fine for most purposes. The fun bit is that SLIME can connect to any of these flavours in any case, so your development environment can remain consistent irrespective of which Lisp flavour you choose to work with.

    I suppose that should about do it for an introductory post on Common Lisp! Happy Hacking!