Compile time evaluation in Rust macros… or not (contrast with Common Lisp)

Rust macros are a great help in reducing boilerplate, as well as in creating tools that perform advanced code manipulation at compile time (the nom crate comes to mind). However, I ran into their limitations (again) when I started tinkering with a small idea.

The idea is quite simple: create a macro that reverses the words of a string, but defers type checking to runtime (so that correct entries still produce output). However, this turned out to be not so simple, because Rust macros apparently provide no way to perform compile-time checks or to evaluate code during macroexpansion (as Lisp and Scheme/Racket macros do).

Here was my first shot at creating that simple macro in Rust:

use std::any::Any;
use std::io::{self, Write};

fn is_string(s: &dyn Any) -> bool {
    s.is::<String>()
}

macro_rules! reverse_string {
    ($string: expr) => {{
                if !is_string($string) {
                    writeln!(io::stderr(), "{:?} must be a String", $string).unwrap();
                    std::process::exit(1);
                }
                                  
                let mut rev_string = String::new();

                for word in $string.split_whitespace().rev() {
                    rev_string.push_str(word);
                    rev_string.push(' ');
                }
                rev_string
    }};
}


fn main() {
    let my_string = "Hello, world. How do you do?".to_string();
    println!("{}", reverse_string!(&my_string));
}

This works as expected, of course:

Macushla:Type_Checking_Macro_Rust z0ltan$ rustc reverse.rs && ./reverse
do? you do How world. Hello,

Now, suppose I add a call to the same macro, reverse_string, on an integer instead of a string. The results are not quite what I wanted:

fn main() {
    let my_string = "Hello, world. How do you do?".to_string();
    println!("{}", reverse_string!(&my_string));

    let my_num = 99;
    println!("{}", reverse_string!(&my_num));
}

Compiling this gives:

Macushla:Type_Checking_Macro_Rust z0ltan$ rustc reverse.rs && ./reverse
error[E0599]: no method named `split_whitespace` found for type `&{integer}` in the current scope
  --> reverse.rs:17:37
   |
17 |                 for word in $string.split_whitespace().rev() {
   |                                     ^^^^^^^^^^^^^^^^
...
31 |     println!("{}", reverse_string!(&my_num));
   |                    ------------------------ in this macro invocation

error: aborting due to previous error(s)

And what I really wanted was to see output for the string argument, and then the nice error message that I generate inside the macro. So what’s going on? Why can’t I generate the template, so to speak, of the macroexpansion and defer the actual error checking to runtime? Let’s expand the macro and take a peek:

Macushla:Type_Checking_Macro_Rust z0ltan$ rustc -Z unstable-options --pretty expanded reverse.rs

This produces a ton of output, but the relevant part of the expanded macro is here:

    let my_num = 99;
    ::io::_print(::std::fmt::Arguments::new_v1(
        {
            static __STATIC_FMTSTR: &'static [&'static str] = &["", "\n"];
            __STATIC_FMTSTR
        },
        &match (&{
            if !is_string(&my_num) {
                io::stderr()
                    .write_fmt(::std::fmt::Arguments::new_v1(
                        {
                            static __STATIC_FMTSTR: &'static [&'static str] =
                                &["", " must be a String\n"];
                            __STATIC_FMTSTR
                        },
                        &match (&&my_num,) {
                            (__arg0,) => {
                                [::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Debug::fmt)]
                            }
                        },
                    ))
                    .unwrap();
                std::process::exit(1);
            }
            let mut rev_string = String::new();
            for word in &my_num.split_whitespace().rev() {
                rev_string.push_str(word);
                rev_string.push(' ');
            }
            rev_string
        },) {
            (__arg0,) => {
                [
                    ::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Display::fmt),
                ]
            }
        },
    ));

These, of course, correspond to the fully expanded form of the following two lines:

    let my_num = 99;
    println!("{}", reverse_string!(&my_num));

Now we begin to see why the code doesn’t work as expected. Macroexpansion is a purely syntactic step that runs before type checking. The compiler then type-checks (and borrow-checks) the fully expanded code, including branches that could never be reached at runtime. So a macro cannot inject arbitrary code that doesn’t satisfy the Type Checker or the Borrow Checker. Rust also doesn’t really have a way to “escape” a piece of code from these checks and defer them till runtime. This is as much due to the Type Checker as to the fact that the Rust macro system does not provide such means (as Lisp or Scheme/Racket macros do).

So, in this case, the Type Checker sees this snippet: for word in &my_num.split_whitespace().rev(), realises that we are trying to call split_whitespace on a reference to an integer (the &{integer} in the error message), and immediately stops with a compilation error.
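
For contrast, here is a minimal sketch of one way to get the behaviour I was after (output for the string, then a friendly error for the integer), assuming we are willing to route the argument through &dyn Any and downcast it explicitly. The helper reverse_words is hypothetical and not part of the original attempt; it returns a Result rather than exiting the process. It type-checks because split_whitespace is only ever called on the &String produced by a successful downcast, never on the macro argument itself.

```rust
use std::any::Any;

// Hypothetical helper (not from the original attempt): returns the reversed
// words on success, or an error message if the value is not a String.
fn reverse_words(value: &dyn Any) -> Result<String, String> {
    match value.downcast_ref::<String>() {
        // Only this arm ever sees a &String, so the string-only code type-checks.
        Some(s) => Ok(s.split_whitespace().rev().collect::<Vec<_>>().join(" ")),
        None => Err("argument must be a String".to_string()),
    }
}

// The macro only forwards its argument; all type-dependent work happens
// behind the &dyn Any boundary, at runtime.
macro_rules! reverse_string {
    ($value:expr) => {
        reverse_words($value)
    };
}

fn main() {
    let my_string = "Hello, world. How do you do?".to_string();
    match reverse_string!(&my_string) {
        Ok(reversed) => println!("{}", reversed),
        Err(message) => eprintln!("{}", message),
    }

    let my_num = 99;
    match reverse_string!(&my_num) {
        Ok(reversed) => println!("{}", reversed),
        Err(message) => eprintln!("{}", message),
    }
}
```

Note that this sketch only accepts a String: a string literal passed as &"hi" would take the error path as well, just like the original is_string check.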

The other part (though not directly relevant here) is that the defensive check using if !is_string(...) could not be moved to compile time either, since Rust macros do not have, as far as I know, any way of branching on the type of an argument during expansion.
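
To illustrate the point about macros and types, here is a small sketch (the kind! macro is a hypothetical name, made up for this example). The arms of a macro_rules! macro are selected by the syntactic shape of the tokens they receive, so two identifiers bound to values of different types are indistinguishable to the macro.

```rust
// Hypothetical macro for illustration: each arm matches on token shape only,
// and never learns the type of the thing it was given.
macro_rules! kind {
    ($x:literal) => { "literal" };
    ($x:ident) => { "identifier" };
    ($x:expr) => { "expression" };
}

#[allow(unused_variables)]
fn main() {
    let my_num = 99;
    let my_string = String::from("hi");

    // Both literals look the same to the macro, integer or string.
    println!("{}", kind!(99));
    println!("{}", kind!("hi"));

    // Both variables are just identifiers, whatever their types are.
    println!("{}", kind!(my_num));
    println!("{}", kind!(my_string));

    // Anything more complex falls through to the general expression arm.
    println!("{}", kind!(&my_num));
}
```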

So, at this point I just stopped with the Rust version. Now, let’s try and implement the same macro using Common Lisp:

(defmacro reverse-string (x)
  "reverse the words of the string, error checking done at runtime"
  `(if (not (stringp ,x))
       (error "~a must be a string, not a ~a~%" ,x (type-of ,x))
       ,(let ((collect (gensym))
              (lst (gensym))
              (f (gensym))
              (s (gensym)))
             `(labels ((,collect (,lst)
                         (reduce (lambda (,f ,s)
                                   (concatenate 'string ,f " " ,s)) ,lst)))
                (,collect (reverse (loop for i = 0 then (1+ j)
                                      as j = (position #\Space ,x :start i)
                                      collect (subseq ,x i j)
                                      while j)))))))

(defun main ()
  (let ((s "Hello world")
        (d 99))
    (format t "~a reversed is ~a~%" s (reverse-string s))
    (format t "~a reversed is ~a~%" d (reverse-string d))))


(defun view-macro-expansion (form)
  "helper function to display the macro-expanded form for the
   supplied form"
  (macroexpand form))

The only point of interest is the reverse-string macro. It’s pretty much the same logic as in the attempted Rust macro – create a template that checks, at runtime, whether the supplied argument is a string, and if not, generate a proper error message. If indeed the argument is a string, then reverse the words of the original string – this is the bit being done inside the loop macro.
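
For readers who don’t speak loop, here is a rough Rust rendering of the same word-splitting logic, as a sketch only. The function name reverse_words_by_space is hypothetical; like the Lisp version, it splits on single space characters, so consecutive spaces produce empty words.

```rust
// Hypothetical Rust rendering of the Lisp loop: repeatedly find the next
// space starting at `start`, collect the slice before it, then reverse the
// collected words and join them with single spaces.
fn reverse_words_by_space(s: &str) -> String {
    let mut words = Vec::new();
    let mut start = 0;
    loop {
        match s[start..].find(' ') {
            // A space was found: collect the word before it and move past it.
            Some(offset) => {
                words.push(&s[start..start + offset]);
                start += offset + 1;
            }
            // No more spaces: collect the remainder and stop.
            None => {
                words.push(&s[start..]);
                break;
            }
        }
    }
    words.reverse();
    words.join(" ")
}

fn main() {
    println!("{}", reverse_words_by_space("Hello world"));
}
```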

The interesting bit is that the Lisp implementation that I use, SBCL, does do rigorous compile-time analysis, and actually gives plenty of notice that it’s deleting redundant code (corresponding to the call in main, (format t "~a reversed is ~a~%" d (reverse-string d)), which the compiler realises will never actually be executed). However, the expanded macro has the relevant checks, and the relevant call itself is preserved, so the macro behaves exactly as desired:

CL-USER> (main)
Hello world reversed is world Hello

and, in the Lisp debugger,

99 must be a string, not a (INTEGER 0 4611686018427387903)
   [Condition of type SIMPLE-ERROR]

Restarts:
 0: [RETRY] Retry SLIME REPL evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [REMOVE-FD-HANDLER] Remove #<SB-IMPL::HANDLER INPUT on descriptor 14: #<CLOSURE (LABELS SWANK/SBCL::RUN :IN SWANK/BACKEND:ADD-FD-HANDLER) {1002F80ADB}>>
 3: [ABORT] Exit debugger, returning to top level.

Backtrace:
  0: (MAIN)
  1: (SB-INT:SIMPLE-EVAL-IN-LEXENV (MAIN) #<NULL-LEXENV>)
  2: (EVAL (MAIN))

Excellent! And here is what the expanded form of the macro call actually looks like:

CL-USER> (view-macro-expansion '(reverse-string 99))
(IF (NOT (STRINGP 99))
    (ERROR "~a must be a string, not a ~a~%" 99 (TYPE-OF 99))
    (LABELS ((#:G602 (#:G603)
               (REDUCE
                (LAMBDA (#:G604 #:G605)
                  (CONCATENATE 'STRING #:G604 " " #:G605))
                #:G603)))
      (#:G602
       (REVERSE
        (LOOP FOR I = 0 THEN (1+ J) AS J = (POSITION #\  99 :START I)
              COLLECT (SUBSEQ 99 I J)
              WHILE J)))))
T

Of course, I am being unduly harsh on Rust here, because Common Lisp, despite all its vendor-specific quirks, is still pretty much a dynamic language, so we can reasonably expect it to defer most type checking to runtime. Rust is a strongly, statically typed language, so it can ill afford to leave a lot of checking to runtime, especially since it is hardly expected to have a runtime to carry out those checks (even though Rust does have a runtime, I suspect it’s quite lightweight). In any case, an interesting little experiment.
