Basic Concurrency and Parallelism in Common Lisp – Part 4b (Parallelism using lparallel – Error Handling)

In this final part, we will discuss the very important topic of error handling, how lparallel handles it, and cap off the series with a small benchmarking example that will tie in all the concepts covered thus far.

The demo will help check actual core usage on the machine when using the lparallel library.

Initial Setup

There is no additional setup required for this tutorial from the last tutorial.

In case you missed it, please check out the previous post – Parallelism fundamentals using lparallel.

5-minute Error Handling refresher

Before we jump headlong into the demos, here is a quick refresher on Conditions and Restarts in Common Lisp (error handling). If you are already comfortable with this topic, please skip ahead to the next section.

In case you are a novice interested in getting a more comprehensive treatment of Conditions and Restarts in Common Lisp, I recommend two things – firstly, check out my detailed post on the fundamentals of Conditions and Restarts in Common Lisp, and secondly, check out the links in the References section at the end of this post.

For our refresher, let’s take a simple example. We have a custom square root function. To keep things simple, let us have a single check to ensure that the argument is zero or positive. We will forego all other validation.

First we define the relevant error condition:

(defpackage :positive-sqrt-user
  (:use :cl))

(in-package :positive-sqrt-user)

;;; define the error condition
(define-condition negative-error (error)
  ((message :initarg :message :reader error-message)))

Now let’s define the square root function itself. It is a simple implementation of the Newton-Raphson algorithm for finding the square root of a positive number (or zero). We take the first approximation/guess as 1.0d0:

(defconstant +eps+ 1e-9)

(defun square-root (n)
  "Find the square root using the Newton-Raphson method."
  (if (< n 0)
      (error 'negative-error :message "number must be zero or positive"))
  (let ((f 1.0d0))
    (loop
       when (< (abs (- (* f f) n)) +eps+)
       do (return f)
       do (setf f (/ (+ f (/ n f)) 2.0d0)))))

Nothing special there. The function simply loops until the candidate square root is within acceptable limits from the actual square root of the argument. For the sake of completion, the key step in the algorithm is the following:

(setf f (/ (+ f (/ n f)) 2.0d0))

This follows the Newton-Raphson formula for the next approximation f_{k+1} of the square root of n, given the current approximation f_k:

f_{k+1} = \frac{1}{2}\left(f_{k} + \frac{n}{f_{k}}\right)
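
To see how quickly the iteration converges, here is a minimal sketch that collects the first few approximations. The helper name newton-sqrt-steps and its steps parameter are made up for this illustration and are not part of the code above.

```lisp
;; Hypothetical helper (not from the original code): collect the first
;; STEPS Newton-Raphson approximations of (sqrt n), starting from 1.0d0.
(defun newton-sqrt-steps (n &optional (steps 5))
  (loop for f = 1.0d0 then (/ (+ f (/ n f)) 2.0d0)
        repeat steps
        collect f))

;; (newton-sqrt-steps 2) starts with 1.0d0 and 1.5d0; every later
;; element is a closer approximation of (sqrt 2).
```

Each step roughly doubles the number of correct digits, which is why the loop in square-root terminates after only a handful of iterations for ordinary inputs.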

In terms of error handling, we can handle the error in three different canonical ways (amongst others).

First, we can catch and process the error directly (similar to the try-catch-finally construct in some other languages):

;;; handle the error directly
(defun test-sqrt-handler-case ()
  (let ((n (progn
             (princ "Enter a number: ")
             (read))))
    (unwind-protect (handler-case (square-root n)
                      (negative-error (o) (format t "Caught ~a~%" (error-message o)) nil))
      (format t "Nothing to clean up!"))))

Testing it out:

POSITIVE-SQRT-USER> (test-sqrt-handler-case)
Enter a number: 200
Nothing to clean up!
14.142135623730955d0

POSITIVE-SQRT-USER> (test-sqrt-handler-case)
Enter a number: -200
Caught number must be zero or positive
Nothing to clean up!
NIL

Or, we could handle it automatically using a restart. Suppose we want to automatically return 1.0d0 as the result whenever square-root receives an invalid argument. We could do something like this:

;;; automatic restart
(defun test-sqrt-handler-bind ()
  (let ((n (progn
             (princ "Enter a number: ")
             (read))))
    (handler-bind
        ((negative-error #'(lambda (c)
                             (format t "Caught: ~a~%" (error-message c))
                             (invoke-restart 'return-one))))
      (restart-case (square-root n)
        (return-one () 1.0d0)))))

Test run:

POSITIVE-SQRT-USER> (test-sqrt-handler-bind)
Enter a number: 200

14.142135623730955d0
POSITIVE-SQRT-USER> (test-sqrt-handler-bind)
Enter a number: -200
Caught: number must be zero or positive
1.0d0

Of course, the real usefulness of this scheme is realised when we have more restart cases available than these trivial ones.
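
As a rough sketch of that idea, the following offers two restarts and lets the caller pick one by name. All the names here (negative-input, checked-sqrt, sqrt-with-policy and the two restarts) are hypothetical, and the built-in sqrt stands in for our own square-root to keep the example short.

```lisp
;; Hypothetical condition and functions, for illustration only.
(define-condition negative-input (error)
  ((value :initarg :value :reader negative-input-value)))

(defun checked-sqrt (n)
  "Low-level code: signals on negative input and offers two ways to recover."
  (restart-case
      (if (< n 0)
          (error 'negative-input :value n)
          (sqrt n))
    (return-zero () 0)
    (use-absolute-value () (sqrt (abs n)))))

(defun sqrt-with-policy (n policy)
  "High-level code: POLICY is the name of the restart to invoke on error."
  (handler-bind ((negative-input
                   (lambda (c)
                     (declare (ignore c))
                     (invoke-restart policy))))
    (checked-sqrt n)))

;; (sqrt-with-policy -16 'return-zero)        returns 0
;; (sqrt-with-policy -16 'use-absolute-value) returns 4.0
```

The low-level function never decides how to recover; it only advertises the options, and each caller chooses the one that suits it.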

And finally, we could handle it interactively, which allows us to enter a new value for the argument to square-root. (This interactive mode of development/operation is unique to the Lisp world).

(defun read-new-value ()
  (format *query-io* "Enter a new value: ")
  (force-output *query-io*)
  (multiple-value-list (read *query-io*)))

;;; Interactive restart
(defun test-sqrt-interactive ()
  (let ((n (progn
             (princ "Enter a number: ")
             (read))))
    (restart-case (square-root n)
      (return-nil () nil)
      (enter-new-value (num)
        :report "Try entering a positive number."
        :interactive (lambda () (read-new-value))
        (square-root num)))))

Test drive!

POSITIVE-SQRT-USER> (test-sqrt-interactive)
Enter a number: 200

14.142135623730955d0

POSITIVE-SQRT-USER> (test-sqrt-interactive)
Enter a number: -200

Condition POSITIVE-SQRT-USER::NEGATIVE-ERROR was signalled.
   [Condition of type NEGATIVE-ERROR]

Restarts:
 0: [RETURN-NIL] RETURN-NIL
 1: [ENTER-NEW-VALUE] Try entering a positive number.
 2: [RETRY] Retry SLIME REPL evaluation request.
 3: [*ABORT] Return to SLIME's top level.
 4: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {1004E98003}>)

Enter a new value: 200

14.142135623730955d0

Error Handling in lparallel

lparallel provides the lparallel:task-handler-bind construct. For all intents and purposes, this is the counterpart of Common Lisp’s handler-bind for code that runs inside parallel tasks: the handlers it establishes apply to errors signalled within tasks launched using the lparallel library.

The problem

Why is this important? Well, take the following example for instance:

(define-condition foo (error) ())

;;; error handling with handler-bind
(defun test-errors-normal ()
  (handler-bind
      ((foo #'(lambda (c)
                (declare (ignore c))
                (invoke-restart 'print-error-message))))
    (pmap 'vector #'(lambda (x)
              (declare (ignore x))
              (restart-case (error 'foo)
                (print-error-message () "error!")))
          '(1 2 3 4 5))))

We declare a handler-bind in the current thread, and we invoke the restart print-error-message when we encounter an error of type foo.

Then we have a single pmap task inside the handler-bind. Notice that we define the restart-case inside the lambda function passed to pmap.

Now, inside the lambda function, we explicitly signal foo. Our expectation then is that the result of the operation is a vector of size 5, with each element being “error!”, right? Wrong! Here’s what we get instead:

Condition CONDS-RESTARTS-USER::FOO was signalled.
   [Condition of type CONDS-RESTARTS-USER::FOO]

Restarts:
 0: [PRINT-ERROR-MESSAGE] CONDS-RESTARTS-USER::PRINT-ERROR-MESSAGE
 1: [TRANSFER-ERROR] Transfer this error to a dependent thread, if one exists.
 2: [KILL-ERRORS] Kill errors in workers (remove debugger instances).
 3: [ABORT] abort thread (#<THREAD "lparallel" RUNNING {1002A01EF3}>)

So what happened? The transfer-error restart presents a clue. The code didn’t work because the error was signalled in a different context (inside a task running on a worker thread), whereas we are trying to handle it in the current thread. To fix this, we can modify the code so that handler-bind is placed inside the lambda function itself, in the same thread context:

;;; error handling with handler-bind modified
(defun test-errors-normal-modified ()
  (pmap 'vector #'(lambda (x)
                    (declare (ignore x))
                    (handler-bind
                        ((foo #'(lambda (c)
                                  (declare (ignore c))
                                  (invoke-restart 'print-error-message))))
                      (restart-case (error 'foo)
                        (print-error-message () "error!"))))
        '(1 2 3 4 5)))

Take it for a spin:

CONDS-RESTARTS-USER> (test-errors-normal-modified)

#("error!" "error!" "error!" "error!" "error!")

And now we see the correct output! However, this approach does not scale. Imagine having 100 tasks, each with its own handler-bind! This is one of the compelling reasons we should use what the library provides us – lparallel:task-handler-bind as we shall see next.

The solution

The lparallel:task-handler-bind version of the code looks like this:

;;; error handling with task-handler-bind
(defun test-errors-lparallel ()
  (task-handler-bind
      ((foo #'(lambda (c)
                (declare (ignore c))
                (invoke-restart 'print-error-message))))
    (pmap 'vector #'(lambda (x)
              (declare (ignore x))
              (restart-case (error 'foo)
                (print-error-message () "error!")))
          '(1 2 3 4 5))))

And the output is exactly what we expect:

CONDS-RESTARTS-USER> (test-errors-lparallel)

#("error!" "error!" "error!" "error!" "error!")

All we did was to replace handler-bind with lparallel:task-handler-bind in the original code!

Note: You can still override the default behaviour. Using (lparallel:task-handler-bind ((error #'lparallel:invoke-transfer-error)) …) automatically transfers the error to a thread capable of providing a proper restart for the error condition (if available), whereas (lparallel:task-handler-bind ((error #'invoke-debugger)) …) always triggers the debugger (good for interactive mode).
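
A minimal sketch of the transfer case might look like the following. It assumes lparallel is loaded and a kernel has already been created, for instance with (setf lparallel:*kernel* (lparallel:make-kernel 4)). The condition bar and the function test-errors-transferred are hypothetical names used only here.

```lisp
;; Assumption: lparallel is loaded and lparallel:*kernel* is set.
;; BAR and TEST-ERRORS-TRANSFERRED are hypothetical names.
(define-condition bar (error) ())

(defun test-errors-transferred ()
  ;; Every error signalled inside a task is handed over to the thread
  ;; that is waiting for the result ...
  (lparallel:task-handler-bind ((error #'lparallel:invoke-transfer-error))
    ;; ... where an ordinary handler-case in that thread can deal with it.
    (handler-case
        (lparallel:pmap 'vector
                        (lambda (x)
                          (if (= x 3) (error 'bar) x))
                        '(1 2 3 4 5))
      (bar () :caught-in-calling-thread))))
```

With this arrangement, the worker threads do no error handling of their own, and the calling thread sees the failure as if it had been signalled locally.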

Let’s move on now to the demo to complete this whole series!

Demos

The best way of observing performance differences between parallel and non-parallel operations is through a real example (albeit a simple one).

Prime number generation

The code:

;;;; A  benchmarking demo using prime number generation.

(defpackage :benchmarking-demo
  (:use :cl :lparallel))

(in-package :benchmarking-demo)

;;; error conditions
(define-condition prime-number-error (error) ())

(defun primep (x)
  (cond ((<= x 0)
         (error 'prime-number-error))
        ((= x 1)
         nil)
        ((= x 2)
         t)
        (t (loop for i from 2 to (floor (sqrt x))
              when (zerop (mod x i))
              do (return nil)
              finally (return t)))))

;;; prime number generation
(defun gen-prime-numbers (start end)
  (premove-if-not #'(lambda (x)
                      (restart-case (if (primep x) t nil)
                        (just-continue () nil)))
                  (loop for i from start to end
                     collect i)))

(defun prime-client ()
  (task-handler-bind
      ((prime-number-error #'(lambda (c)
                               (declare (ignore c))
                               (invoke-restart 'just-continue))))
    ;; step through the range in chunks of 1e6 numbers each
    (loop for i from 0 below 1000000000000 by 1000000
       do (gen-prime-numbers (1+ i) (+ i 1000000)))))

This is a direct implementation of the basic prime number generation algorithm – test from 2 up to sqrt(number) for divisibility. I’m basically creating 1e6 chunks of 1e6 numbers each for the prime number test.
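
To make the chunking concrete, here is a small sketch that lists the bounds of consecutive, non-overlapping chunks for a much smaller range. The helper chunk-bounds is hypothetical and only illustrates the intended start and end values.

```lisp
;; Hypothetical helper: list the (start . end) bounds of consecutive
;; chunks of CHUNK-SIZE numbers covering 1..TOTAL.
(defun chunk-bounds (total chunk-size)
  (loop for i from 0 below total by chunk-size
        collect (cons (1+ i) (min total (+ i chunk-size)))))

;; (chunk-bounds 30 10) → ((1 . 10) (11 . 20) (21 . 30))
```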

premove-if-not is the parallel counterpart of remove-if-not: it keeps only the prime numbers from the list that is built from the start and end arguments to gen-prime-numbers.
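
For a quick sanity check on a small range, the same filtering can be sketched sequentially with plain remove-if-not, with no lparallel involved. The names simple-primep and primes-between are hypothetical stand-ins for primep and gen-prime-numbers.

```lisp
;; Sequential sketch using only standard Common Lisp.
;; SIMPLE-PRIMEP is a hypothetical stand-in for PRIMEP above.
(defun simple-primep (x)
  (and (> x 1)
       (loop for i from 2 to (isqrt x)
             never (zerop (mod x i)))))

(defun primes-between (start end)
  "Keep only the primes in the inclusive range START..END."
  (remove-if-not #'simple-primep
                 (loop for i from start to end collect i)))

;; (primes-between 1 30) → (2 3 5 7 11 13 17 19 23 29)
```

Swapping remove-if-not for premove-if-not should give the same list, with the predicate calls spread across the worker threads.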

[Figure: core usage during prime number generation using lparallel]

The code took a long long time to run, and I could hear the poor machine hissing in protest (I just killed the process after 15 minutes), but on the bright side, all the cores were overloaded full time. Note that I don’t collect the generated numbers into a list because that would definitely have crashed SLIME in any case if I had let it run on.

I had contemplated doing another demo with matrix multiplication, but from an educational perspective, this single demo seems to have done the job, so I’ll skip matrix multiplication for now.

References

Some additional useful references (definitely check out the video in the second link. Patrick Stein’s tutorial using a simple range class example is most excellent):

That concludes this series on Concurrency and Parallelism using Common Lisp! Next up, we will discuss another extremely important topic – interop between languages. That will also be a mini-series of sorts, and I might throw in a random but useful post in between (depending on what interests me at that point!).

Till then, happy hacking!

Conditions and Restarts in Common Lisp

One of the most powerful features of the Common Lisp ecosystem is the Conditions system. The first time I heard of this being touted as one of the strengths of Lisp, I was completely flummoxed because I thought they were referring to the ‘cond’ construct! That was a very embarrassing moment for me indeed. Thankfully, I now have a much better understanding of Lisp than I had just a few weeks ago.

If you have ever worked with any Common Lisp distribution using emacs and SLIME, then you have undoubtedly been exposed to the Conditions system. The moment you screw up with anything, you land in what looks like the debugger – there are multiple numbered options available to you along with a comprehensive stack trace. The first few times I did land there, I was completely lost. I had no idea that this was part of the process of becoming a well-rounded Lisp developer. It is a pretty intimidating sight to say the least. Having had much more experience (and having screwed up pretty badly multiple times), the Conditions system now comforts me rather than anything else! It is a beautifully designed system much superior to any other error/condition handling system that I have ever come across.

Conditions and Restarts are a devilishly difficult concept to wrap your head around and use productively (at least for me). The twist is that the concepts are by themselves simple, but truly understanding how the entire system works is not quite that straightforward. I had to experiment with a lot of throwaway code to finally start seeing how the entire system works. In this post, I will attempt to share my (admittedly rudimentary) knowledge of the whole Conditions and Restarts mechanism.

What is a “condition” after all? Well, just like in languages that support exception handling (Java, C++, Python, etc.), a condition represents, for the most part, an “exceptional” situation. However, even more so than in those languages, a condition in Common Lisp can represent a general situation where some branching in program logic needs to take place, not necessarily due to some error condition. Due to the highly interactive nature of Lisp development (the Lisp image in conjunction with the REPL), this makes perfect sense in a language like Lisp rather than, say, a language like Java or even Python, which has a very primitive REPL. In most cases, however, we may not need (or even allow) the interactivity that this system offers us. Thankfully, the same system works just as well even in non-interactive mode.

Three conceptual levels, two general modes of use

As I figure it, there are three general levels of abstraction at which the Conditions & Restarts system works, and two general ways in which we can make use of this system. I will give examples of both here.

The first conceptual level (at the lowest level) would be defining and throwing the error/condition. Defining a condition is done using the DEFINE-CONDITION macro. If we are handling error conditions specifically (as we’ll assume for the rest of this post), we should extend from the “error” class. For instance, we can define a condition of type “foo” as follows:

(define-condition foo (error)
    ((message :initarg :message :reader error-message)))

Note that defining error conditions is orthogonal to the whole system of signalling, restarting and handling error conditions. “foo”, for instance, can be used by as many functions as we wish. This is pretty much still along the lines of exception classes in OOP languages.

Conditions can be “thrown” (or rather signalled) from code using, amongst others, “error”. We can use MAKE-CONDITION to create an instance of a condition class by supplying it with the initargs, or we can simply let “error” create the instance on the fly from the condition name and initargs, for instance:

(defun foo-thrower ()
    (let ((num (random 2)))
        (when (zerop num)
            (error 'foo :message "foo was thrown!"))))
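
For comparison, here is a minimal sketch of the MAKE-CONDITION route, where the condition object is built first and then passed to “error”. The names demo-failure, signal-prebuilt-condition and catch-prebuilt-condition are hypothetical.

```lisp
;; Hypothetical names, for illustration only.
(define-condition demo-failure (error)
  ((message :initarg :message :reader demo-failure-message)))

(defun signal-prebuilt-condition ()
  ;; Build the condition object explicitly, then signal it.
  (let ((c (make-condition 'demo-failure
                           :message "built with make-condition")))
    (error c)))

(defun catch-prebuilt-condition ()
  (handler-case (signal-prebuilt-condition)
    (demo-failure (c) (demo-failure-message c))))

;; (catch-prebuilt-condition) → "built with make-condition"
```

Building the object separately is handy when the same condition needs to be inspected, stored or logged before it is signalled.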

The second conceptual level is that of restarting the process (this may seem counter-intuitive at first, but bear with me as all will become clear in good time). Now the beauty of the Common Lisp Condition system starts becoming clearer – in brief, the generalised mode of execution is as follows: some code signals an error condition, higher level code defines different modes of restart options for the process, and the actual restart mode is chosen by error handling strategies defined at a higher level of abstraction. The advantage of this approach is that the function call stack unwinds only as much as is needed to execute the restart. If we had error handling code directly instead of restart code, the stack would already have unwound to lose most (if not all) context of the actual low-level code that needed to be run. As such, the higher level code would have to either duplicate the lower level code and redo the process, or simply continue after logging the error condition (as might happen in a language like Java).

Just a small note before an actual example: restarting cases basically offer ways to redo whatever process we were trying to execute in the first place, albeit with different parameters or modes of execution. Now, a restart case is created using the RESTART-CASE macro. For instance,

(defun foo-restarter ()
    (restart-case (foo-thrower)
        (just-continue () nil)
        (retry () (foo-restarter))))

In this trivial case, we provide two restart cases – just-continue, which simply continues after returning a value of nil, and retry, which calls the same function again and hopes for a different result (which is justified in this case due to the random variable we use in foo-thrower to conditionally emit the error condition).

A very important note to mention at this point is that at the same conceptual level here, we could eschew restarting altogether and handle the error conditions directly. This is done using the HANDLER-CASE macro. This would look like this:

(defun foo-handler ()
    (handler-case (foo-thrower)
        (foo (foo-obj) 
            (format t "error signalled: ~a~%" (error-message foo-obj)))))

handler-case has the same basic structure as restart-case. The difference is that the error handling stops right there in the case of handler-case. This is similar to the try-catch construct in OOP languages. However, maintaining restarting code at this conceptual level is more flexible and scalable.
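
The difference in unwinding can be observed directly. In this sketch, a special variable is rebound inside the low-level function; a handler-case clause runs after the stack has unwound and sees the outer value, whereas a handler-bind handler runs before any unwinding and still sees the inner one. All names here are hypothetical.

```lisp
;; Hypothetical names, for illustration only.
(defvar *depth* :top-level)

(define-condition demo-error (error) ())

(defun low-level ()
  (let ((*depth* :inside-low-level))
    (error 'demo-error)))

(defun seen-by-handler-case ()
  ;; The clause body runs after the stack has unwound to this point.
  (handler-case (low-level)
    (demo-error () *depth*)))

(defun seen-by-handler-bind ()
  (let (seen)
    (handler-case
        (handler-bind ((demo-error
                         (lambda (c)
                           (declare (ignore c))
                           ;; Runs while LOW-LEVEL is still on the stack.
                           (setf seen *depth*))))
          (low-level))
      ;; The handler above declined, so the error ends up here.
      (demo-error () seen))))

;; (seen-by-handler-case) → :TOP-LEVEL
;; (seen-by-handler-bind) → :INSIDE-LOW-LEVEL
```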

The third and final conceptual level is that of handling the error conditions, or defining the restarting strategies. This is done at the highest levels of code in the call stack, and of course, we can have different strategies in different functions that call the same low-level functions, or we could even have multiple strategies defined in the same function. The main operators of interest here are the HANDLER-BIND macro, which does the actual binding of error conditions to handling strategies, and the INVOKE-RESTART function, which actually invokes the specific restart case defined in lower level code. A point to note here is that we usually refer to a restart case simply by its name, a symbol, and that name is all we need in order to invoke the logic it contains. Following the same ‘foo’ based example, we could define an error handler as follows (say, in case we wish to simply retry the function invocation):

(defun foo-client ()
    (handler-bind
        ((foo (lambda (c)
                  (format t "error: ~s~%" (error-message c))
                  (invoke-restart 'retry))))
        (foo-restarter)))

Note that the code wrapped by the handler-bind is the function that contains the restart cases (foo-restarter), not the function that actually signals foo (foo-thrower).
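
A handler can also check whether a restart is actually available before invoking it, using the standard FIND-RESTART function, which returns NIL when no restart of that name is active. The sketch below uses hypothetical names (flaky, give-up, careful-client) and assumes no enclosing code already provides a give-up restart.

```lisp
;; Hypothetical names, for illustration only.
(define-condition flaky (error) ())

(defun risky-with-restart ()
  (restart-case (error 'flaky)
    (give-up () :gave-up)))

(defun risky-without-restart ()
  (error 'flaky))

(defun careful-client (thunk)
  (handler-case
      (handler-bind ((flaky
                       (lambda (c)
                         ;; Only invoke the restart if it exists;
                         ;; otherwise decline by returning normally.
                         (let ((r (find-restart 'give-up c)))
                           (when r (invoke-restart r))))))
        (funcall thunk))
    (flaky () :no-restart-available)))

;; (careful-client #'risky-with-restart)    → :GAVE-UP
;; (careful-client #'risky-without-restart) → :NO-RESTART-AVAILABLE
```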

Now let’s demonstrate all these concepts together in a single, coherent example.

(define-condition first-not-number (error)
    ((message :initarg :message :reader error-message)))

(define-condition second-not-number (error)
    ((message :initarg :message :reader error-message)))


(defun get-new-value (param)
    (format *query-io* "Enter a new value for ~s: " param)
    (force-output *query-io*)
    (list (read)))

(defun add (x y)
    (restart-case (cond ((and (realp x) (realp y))
                         (+ x y))
                        ((not (realp x))
                         (error 'first-not-number 
                                 :message "param 1 not a number"))
                        ((not (realp y))
                         (error 'second-not-number 
                                 :message "param 2 not a number")))
        (return-zero () 0)
        (return-random-value () (random 100))
        (restart-with-new-first (x)
            :report "Supply a new value for first param"
            :interactive (lambda () (get-new-value 'x))
            (add x y))
        (restart-with-new-second (y)
            :report "Supply a new value for second param"
            :interactive (lambda () (get-new-value 'y))
            (add x y))
        (just-continue () nil)))


(defun add-client ()
    (let ((x nil)
          (y nil))
        (princ "Enter the first number: ")
        (setf x (read))
        (princ "Enter the second number: ")
        (setf y (read))
        (handler-bind
            ((first-not-number (lambda (c)
                                   (format t "error: ~s~%" (error-message c))
                                   (invoke-restart 'restart-with-new-first)))
             (second-not-number (lambda (c)
                                    (format t "error: ~s~%" (error-message c))
                                    (invoke-restart 'restart-with-new-second))))
            (add x y))))

Explanation: In our trivial example, we define a function ‘add’ that tries to add together two numbers. We need to validate the parameters to ensure that they are numbers before we can add them. Of course, we could simply check them in code and fast fail or insert asserts to ensure that they are numbers. However, the point of this exercise is to demonstrate how error handling works so forgive me this transgression!

First off, we define two error conditions that represent an invalid first and second parameter respectively. Then we define the actual add function that contains the restart cases. The :report keyword argument is used to provide custom messages in the Conditions UI when the error is actually encountered. Similarly, the :interactive keyword argument is used to enter a new value in the same screen. Of course, if we were writing a non-interactive application, we could simply forego :interactive and inject our own value instead.

Note, in particular, the restart cases restart-with-new-first and restart-with-new-second. The value that is returned by the get-new-value function is passed in as the parameter defined in parentheses and is not handled directly by us. Moreover, get-new-value returns a list (even if it is a single value) – the reason is that the Conditions system actually uses the “apply” function to retrieve the new value(s). The apply function requires at least the last argument to be a list, and so we need to return a list of the new value that we read in.

Finally we define the actual error handling strategies in our add-client function. In this case, we only handle two of the restart cases that could arise. However, other clients may choose to invoke any of the other defined restart cases.
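
For the non-interactive case, the value can be passed straight to INVOKE-RESTART as an extra argument, which then becomes the restart’s parameter. The following is a minimal sketch with hypothetical names (not-a-number, double-it, supply-new-value) rather than a rework of add itself.

```lisp
;; Hypothetical names, for illustration only.
(define-condition not-a-number (error)
  ((value :initarg :value :reader bad-value)))

(defun double-it (x)
  (restart-case (if (realp x)
                    (* 2 x)
                    (error 'not-a-number :value x))
    (supply-new-value (new-x)
      (double-it new-x))
    (return-zero () 0)))

(defun double-it-non-interactive (x default)
  "Replace a bad X with DEFAULT without asking the user."
  (handler-bind ((not-a-number
                   (lambda (c)
                     (declare (ignore c))
                     ;; Extra arguments are passed on to the restart.
                     (invoke-restart 'supply-new-value default))))
    (double-it x)))

;; (double-it-non-interactive "oops" 21) → 42
;; (double-it-non-interactive 5 21)      → 10
```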

All in all, the Common Lisp Conditions and Restarts system is a magnificent piece of work. Kudos to Kent Pitman for his brilliant work!

 

 
