Basic Concurrency and Parallelism in Common Lisp – Part 4b (Parallelism using lparallel – Error Handling)


In this final part, we will discuss the very important topic of error handling, how lparallel handles it, and cap off the series with a small benchmarking example that will tie in all the concepts covered thus far.

The demo will help check actual core usage on the machine when using the lparallel library.

Initial Setup

No additional setup is required beyond what the previous tutorial covered.

In case you missed it, please check out the previous post – Parallelism fundamentals using lparallel.

5-minute Error Handling refresher

Before we jump headlong into the demos, here is a quick refresher on Conditions and Restarts in Common Lisp (error handling). In case you are comfortably familiar with this topic, please skip ahead to the next section.

In case you are a novice interested in getting a more comprehensive treatment of Conditions and Restarts in Common Lisp, I recommend two things – firstly, check out my detailed post on the fundamentals of Conditions and Restarts in Common Lisp, and secondly, check out the links in the References section at the end of this post.

For our refresher, let’s take a simple example. We have a custom square root function. To keep things simple, we will perform a single check to ensure that the argument is zero or positive, and forgo all other validation.

First we define the relevant error condition:

(defpackage :positive-sqrt-user
  (:use :cl))

(in-package :positive-sqrt-user)

;;; define the error condition
(define-condition negative-error (error)
  ((message :initarg :message :reader error-message)))

Now let’s define the square root function itself. It is a simple implementation of the Newton-Raphson algorithm for finding the square root of a positive number (or zero). We take the first approximation/guess as 1.0d0:

(defconstant +eps+ 1e-9)

(defun square-root (n)
  "Find the square root using the Newton-Raphson method."
  (if (< n 0)
      (error 'negative-error :message "number must be zero or positive"))
  (let ((f 1.0d0))
    (loop
       when (< (abs (- (* f f) n)) +eps+)
       do (return f)
       do (setf f (/ (+ f (/ n f)) 2.0d0)))))

Nothing special there. The function simply loops until the square of the candidate root is within an acceptable tolerance (+eps+) of the argument. For the sake of completeness, the key step in the algorithm is the following:

(setf f (/ (+ f (/ n f)) 2.0d0))

This follows the formula for computing the next approximation f_{k+1} from the current guess f_{k}, where n is the number whose square root we want:

f_{k+1} = \frac{1}{2}\left(f_{k} + \frac{n}{f_{k}}\right)
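To see how quickly the update rule above converges, here is a minimal sketch that collects the successive guesses. The helper name newton-guesses and its parameters are hypothetical and not part of the original code; it simply applies the same update step a fixed number of times.

```lisp
;; Hypothetical helper (not from the original post): collect the
;; successive Newton-Raphson guesses for the square root of N.
(defun newton-guesses (n steps &optional (guess 1.0d0))
  "Return a list of STEPS successive approximations to (sqrt N)."
  (loop repeat steps
        do (setf guess (/ (+ guess (/ n guess)) 2.0d0))
        collect guess))

;; Print the first five guesses for n = 200, starting from 1.0d0.
(dolist (g (newton-guesses 200 5))
  (format t "~,6f~%" g))
```

The first guess overshoots badly (it is the average of 1 and 200), but each later step roughly doubles the number of correct digits, which is why the loop in square-root terminates after only a handful of iterations for small inputs.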

In terms of error handling, we can handle the error in three different canonical ways (amongst others).

First, we can catch and process the error directly (similar to the try-catch-finally construct in some other languages):

;;; handle the error directly
(defun test-sqrt-handler-case ()
  (let ((n (progn
             (princ "Enter a number: ")
             (read))))
    (unwind-protect (handler-case (square-root n)
                      (negative-error (o) (format t "Caught ~a~%" (error-message o)) nil))
      (format t "Nothing to clean up!"))))

Testing it out:

POSITIVE-SQRT-USER> (test-sqrt-handler-case)
Enter a number: 200
Nothing to clean up!
14.142135623730955d0

POSITIVE-SQRT-USER> (test-sqrt-handler-case)
Enter a number: -200
Caught number must be zero or positive
Nothing to clean up!
NIL

Or, we could handle it automatically using a restart. Suppose we want to automatically return 1.0d0 as the result if we encounter an invalid argument to square-root. We could do something like this:

;;; automatic restart
(defun test-sqrt-handler-bind ()
  (let ((n (progn
             (princ "Enter a number: ")
             (read))))
    (handler-bind
        ((negative-error #'(lambda (c)
                             (format t "Caught: ~a~%" (error-message c))
                             (invoke-restart 'return-one))))
      (restart-case (square-root n)
        (return-one () 1.0d0)))))

Test run:

POSITIVE-SQRT-USER> (test-sqrt-handler-bind)
Enter a number: 200

14.142135623730955d0
POSITIVE-SQRT-USER> (test-sqrt-handler-bind)
Enter a number: -200
Caught: number must be zero or positive
1.0d0

Of course, the real usefulness of this scheme is realised when we have more restart cases available than these trivial ones.
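As a rough illustration of that point, the sketch below offers three restarts from one function and lets the caller pick a recovery policy. All names here (negative-input, checked-sqrt, sqrt-with-policy, return-zero, use-absolute-value) are made up for this example, and it uses the built-in sqrt rather than our square-root so that it stands alone.

```lisp
;; Illustrative names only; none of these appear in the original code.
(define-condition negative-input (error)
  ((value :initarg :value :reader negative-input-value)))

(defun checked-sqrt (n)
  "Return (sqrt N), offering several recovery strategies for negative N."
  (restart-case
      (if (minusp n)
          (error 'negative-input :value n)
          (sqrt n))
    (return-zero () 0)
    (use-absolute-value () (sqrt (abs n)))
    (use-value (replacement) (checked-sqrt replacement))))

(defun sqrt-with-policy (n policy)
  "Call CHECKED-SQRT; POLICY names the restart to invoke on NEGATIVE-INPUT."
  (handler-bind
      ((negative-input
         (lambda (c)
           (if (eq policy 'use-value)
               ;; USE-VALUE takes an argument: retry with the negated input.
               (invoke-restart 'use-value (- (negative-input-value c)))
               (invoke-restart policy)))))
    (checked-sqrt n)))

;; The low-level function stays the same; only the policy changes.
(sqrt-with-policy -16 'return-zero)
(sqrt-with-policy -16 'use-absolute-value)
(sqrt-with-policy -16 'use-value)
```

The design point is the separation of concerns: checked-sqrt knows how it can recover, while the caller decides which recovery to use.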

And finally, we could handle it interactively, which allows us to enter a new value for the argument to square-root. (This interactive mode of development and operation is a hallmark of the Lisp world.)

(defun read-new-value ()
  (format *query-io* "Enter a new value: ")
  (force-output *query-io*)
  ;; read from the same stream we prompted on; the result must be a
  ;; list of arguments for the restart
  (list (read *query-io*)))

;;; Interactive restart
(defun test-sqrt-interactive ()
  (let ((n (progn
             (princ "Enter a number: ")
             (read))))
    (restart-case (square-root n)
      (return-nil () nil)
      (enter-new-value (num)
        :report "Try entering a positive number."
        :interactive (lambda () (read-new-value))
        (square-root num)))))

Test drive!

POSITIVE-SQRT-USER> (test-sqrt-interactive)
Enter a number: 200

14.142135623730955d0

POSITIVE-SQRT-USER> (test-sqrt-interactive)
Enter a number: -200

Condition POSITIVE-SQRT-USER::NEGATIVE-ERROR was signalled.
   [Condition of type NEGATIVE-ERROR]

Restarts:
 0: [RETURN-NIL] RETURN-NIL
 1: [ENTER-NEW-VALUE] Try entering a positive number.
 2: [RETRY] Retry SLIME REPL evaluation request.
 3: [*ABORT] Return to SLIME's top level.
 4: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {1004E98003}>)

Enter a new value: 200

14.142135623730955d0

Error Handling in lparallel

lparallel provides the lparallel:task-handler-bind construct. It is, for all intents and purposes, the equivalent of Common Lisp’s handler-bind, except that the handlers it establishes also apply inside parallel tasks launched through the lparallel library.

The problem

Why is this important? Consider the following example:

(define-condition foo (error) ())

;;; error handling with handler-bind
(defun test-errors-normal ()
  (handler-bind
      ((foo #'(lambda (c)
                (declare (ignore c))
                (invoke-restart 'print-error-message))))
    (pmap 'vector #'(lambda (x)
                      (declare (ignore x))
                      (restart-case (error 'foo)
                        (print-error-message () "error!")))
          '(1 2 3 4 5))))

We declare a handler-bind in the current thread, and we invoke the restart print-error-message when we encounter an error of type foo.

Then we have a single pmap call inside the handler-bind. Notice that we define the restart-case inside the lambda function passed to pmap.

Now, inside the lambda function, we explicitly signal foo. Our expectation then is that the result of the operation is a vector of size 5, with each element being “error!”, right? Wrong! Here’s what we get instead:

Condition CONDS-RESTARTS-USER::FOO was signalled.
   [Condition of type CONDS-RESTARTS-USER::FOO]

Restarts:
 0: [PRINT-ERROR-MESSAGE] CONDS-RESTARTS-USER::PRINT-ERROR-MESSAGE
 1: [TRANSFER-ERROR] Transfer this error to a dependent thread, if one exists.
 2: [KILL-ERRORS] Kill errors in workers (remove debugger instances).
 3: [ABORT] abort thread (#<THREAD "lparallel" RUNNING {1002A01EF3}>)

So what happened? The transfer-error restart presents a clue. The code didn’t work because the error was signalled in a different thread (a worker thread running the task), whereas the handler was established in the current thread, and handler bindings are not visible across threads. To fix this, we can modify the code so that handler-bind is placed inside the lambda function itself, in the same thread context:

;;; error handling with handler-bind modified
(defun test-errors-normal-modified ()
  (pmap 'vector #'(lambda (x)
                    (declare (ignore x))
                    (handler-bind
                        ((foo #'(lambda (c)
                                  (declare (ignore c))
                                  (invoke-restart 'print-error-message))))
                      (restart-case (error 'foo)
                        (print-error-message () "error!"))))
        '(1 2 3 4 5)))

Take it for a spin:

CONDS-RESTARTS-USER> (test-errors-normal-modified)

#("error!" "error!" "error!" "error!" "error!")

And now we see the correct output! However, this approach does not scale. Imagine having 100 tasks, each with its own handler-bind! This is one of the compelling reasons we should use what the library provides us – lparallel:task-handler-bind as we shall see next.
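The thread-locality of handler bindings behind the problem above can be observed without lparallel at all. The following is a minimal sketch, assuming Quicklisp is available and using bordeaux-threads (which lparallel itself depends on). The names demo-condition, signal-with-fallback, handler-ran and compare-threads are invented for this illustration. It uses signal rather than error so that an unhandled condition simply returns instead of landing in the debugger.

```lisp
;; Assumption: Quicklisp is installed; load this file with LOAD so that
;; the quickload runs before the bt: symbols below are read.
(ql:quickload :bordeaux-threads)

(define-condition demo-condition (condition) ())

(defun signal-with-fallback ()
  "Signal DEMO-CONDITION. If no handler transfers control, SIGNAL returns."
  (restart-case (progn (signal 'demo-condition)
                       :no-handler-ran)
    (handler-ran () :handler-ran)))

(defun compare-threads ()
  "Signal once in the current thread and once in a freshly created thread."
  (handler-bind ((demo-condition
                   (lambda (c)
                     (declare (ignore c))
                     (invoke-restart 'handler-ran))))
    (list (signal-with-fallback)
          ;; The new thread does not see the handler established above.
          (bt:join-thread (bt:make-thread #'signal-with-fallback)))))
```

On an implementation such as SBCL, the first element of the returned list should be :handler-ran and the second :no-handler-ran, since the handler established by handler-bind exists only in the thread that established it.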

The solution

The lparallel:task-handler-bind version of the code looks like this:

;;; error handling with task-handler-bind
(defun test-errors-lparallel ()
  (task-handler-bind
      ((foo #'(lambda (c)
                (declare (ignore c))
                (invoke-restart 'print-error-message))))
    (pmap 'vector #'(lambda (x)
                      (declare (ignore x))
                      (restart-case (error 'foo)
                        (print-error-message () "error!")))
          '(1 2 3 4 5))))

And the output is exactly what we expect:

CONDS-RESTARTS-USER> (test-errors-lparallel)

#("error!" "error!" "error!" "error!" "error!")

All we did was to replace handler-bind with lparallel:task-handler-bind in the original code!

Note: You can still override the behaviour per task. Using (lparallel:task-handler-bind ((error #'lparallel:invoke-transfer-error)) …) automatically transfers the error to a thread capable of providing a proper restart for the error condition (if available), while (lparallel:task-handler-bind ((error #'invoke-debugger)) …) always triggers the debugger (good for interactive mode).
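To make the transfer option concrete, here is a small sketch in which the error raised inside the tasks is transferred to the calling thread and caught there with an ordinary handler-case. The names task-failure and run-with-transfer are hypothetical, the worker count of 2 is arbitrary, and Quicklisp is assumed to be available. A temporary kernel is bound locally so the example does not disturb any kernel you already have.

```lisp
;; Assumption: Quicklisp is installed; load this file with LOAD.
(ql:quickload :lparallel)

(define-condition task-failure (error) ())

(defun run-with-transfer ()
  "Signal TASK-FAILURE inside tasks and handle it in the calling thread."
  (let ((lparallel:*kernel* (lparallel:make-kernel 2))) ; temporary kernel
    (unwind-protect
         (lparallel:task-handler-bind
             ((error #'lparallel:invoke-transfer-error))
           (handler-case
               (lparallel:pmap 'vector
                               (lambda (x)
                                 (declare (ignore x))
                                 (error 'task-failure))
                               '(1 2 3))
             ;; The transferred error is re-signalled here, in the caller.
             (task-failure () :caught-in-calling-thread)))
      (lparallel:end-kernel))))
```

Calling (run-with-transfer) should return :caught-in-calling-thread rather than opening a debugger in a worker thread.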

Let’s move on now to the demo to complete this whole series!

Demos

The best way of observing performance differences between parallel and non-parallel operations is through a real example (albeit a simple one).

Prime number generation

The code:

;;;; A benchmarking demo using prime number generation.

(defpackage :benchmarking-demo
  (:use :cl :lparallel))

(in-package :benchmarking-demo)

;;; error conditions
(define-condition prime-number-error (error) ())

(defun primep (x)
  (cond ((<= x 0)
         (error 'prime-number-error))
        ((= x 1)
         nil)
        ((= x 2)
         t)
        (t (loop for i from 2 to (floor (sqrt x))
              when (zerop (mod x i))
              do (return nil)
              finally (return t)))))

;;; prime number generation
(defun gen-prime-numbers (start end)
  (premove-if-not #'(lambda (x)
                      (restart-case (if (primep x) t nil)
                        (just-continue () nil)))
                  (loop for i from start to end
                     collect i)))

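(defun prime-client ()
  (task-handler-bind
      ((prime-number-error #'(lambda (c)
                               (declare (ignore c))
                               (invoke-restart 'just-continue))))
    ;; Walk the range in chunks of 1e6 numbers. LOOP ... BY is used because
    ;; modifying the DOTIMES variable inside its body is not portable.
    (loop for start from 1 to 1000000000000 by 1000000
          do (gen-prime-numbers start (+ start 999999)))))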

This is a direct implementation of the basic prime number generation algorithm: test each candidate for divisibility by every integer from 2 up to sqrt(number). I’m basically creating 1e6 chunks of 1e6 numbers each for the prime number test.

premove-if-not is the parallel counterpart of remove-if-not: it keeps only the prime numbers from the list that is created from the start and end arguments to gen-prime-numbers.
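As a quick sanity check of that behaviour, the sketch below runs remove-if-not and lparallel:premove-if-not over the same small list. The function name evens-both-ways is hypothetical, the two-worker kernel is an arbitrary choice, and Quicklisp is assumed to be available.

```lisp
;; Assumption: Quicklisp is installed; load this file with LOAD.
(ql:quickload :lparallel)

(defun evens-both-ways (numbers)
  "Return the even NUMBERS twice: filtered sequentially and in parallel."
  (let ((lparallel:*kernel* (lparallel:make-kernel 2))) ; temporary kernel
    (unwind-protect
         (values (remove-if-not #'evenp numbers)
                 (lparallel:premove-if-not #'evenp numbers))
      (lparallel:end-kernel))))

;; Both values should be the list (2 4 6).
(evens-both-ways '(1 2 3 4 5 6))
```

For a list this small the parallel version is almost certainly slower because of task overhead; it only pays off when the predicate is expensive, as primep is for large numbers.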

[Figure: core usage during prime number generation using lparallel]

The code took a very long time to run, and I could hear the poor machine hissing in protest (I killed the process after 15 minutes), but on the bright side, all the cores were fully loaded the whole time. Note that I don’t collect the generated numbers into a list because that would almost certainly have crashed SLIME had I let it run on.
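For a run that actually finishes, a bounded variant can be timed sequentially and in parallel. This is only a sketch under stated assumptions: the names small-primep, count-primes and compare-timings are invented here, it counts primes instead of collecting them, and it uses lparallel:pcount-if (the parallel counterpart of count-if) rather than the premove-if-not call from the demo. Quicklisp is assumed to be available.

```lisp
;; Assumption: Quicklisp is installed; load this file with LOAD.
(ql:quickload :lparallel)

(defun small-primep (x)
  "Trial division up to the integer square root of X."
  (and (> x 1)
       (loop for i from 2 to (isqrt x)
             never (zerop (mod x i)))))

(defun count-primes (limit &key parallel)
  "Count the primes in 1..LIMIT, optionally using lparallel."
  (let ((numbers (loop for i from 1 to limit collect i)))
    (if parallel
        (lparallel:pcount-if #'small-primep numbers)
        (count-if #'small-primep numbers))))

(defun compare-timings (limit workers)
  "Time both versions with a temporary kernel of WORKERS threads."
  (let ((lparallel:*kernel* (lparallel:make-kernel workers)))
    (unwind-protect
         (progn
           (time (count-primes limit))
           (time (count-primes limit :parallel t)))
      (lparallel:end-kernel))))

;; Example: (compare-timings 1000000 4)
```

The timings printed by time will vary by machine and implementation, so no figures are claimed here; the point is simply to have a version small enough to compare real time against the number of workers.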

I had contemplated doing another demo with matrix multiplication, but from an educational perspective, this single demo seems to have done the job, so I’ll skip matrix multiplication for now.

References

Some additional useful references (definitely check out the video in the second link. Patrick Stein’s tutorial using a simple range class example is most excellent):

That concludes this series on Concurrency and Parallelism using Common Lisp! Next up, we will discuss another extremely important topic – interop between languages. That will also be a mini-series of sorts, and I might throw in a random but useful post in between (depending on what interests me at that point!).

Till then, happy hacking!
