# Functional Programming – a Quick and Dirty Introduction

### What is Functional Programming?

Functional Programming is, despite its newfound popularity in recent times, a rather old programming paradigm. What exactly constitutes Functional Programming is a surprisingly difficult concept to explain today. Part of the reason why that is so is because of the fact that almost every language tends to do things in its own way and call it “functional”, and even developers tend to use the term for a wide variety of situations. In a sense, it has been reduced to a mere buzzword. Ultimately, there is no real consensus as to what exactly Functional Programming is. In this blog, I will try and give my own perspective on the whole situation and try and make sense of this highly messy situation we find ourselves in today.

Functional Programming is, historically speaking, considered to be a paradigm of programming in which programming is done using only functions. What do we mean by function? We, of course, mean mathematical functions. A function in mathematics is a mapping of each value taken from a set of values (called the Domain of the function) to a unique value in another set (which may be the same set, and which is called the Range of the function). There are various types of functions – bijective, injective, surjective, identity, constant, empty, etc. However, we need not bother ourselves with the distinction. The only important bit to remember is that all functions (whatever sub-type they may be) have the following two essential properties:

• Each element in the Range set is used for “mapping”
• Each such element in the Range had a unique value in the Range

For instance, suppose we have $f(x) = sin(x)$, $f(x)$ here defines a function. Let us plot for this function to better understand this function:

Here we have plotted the graph for the $f(x)$ using Wolfram Alpha. The function is clearly periodic (in fact, it has a period of 2π radians), and we can clearly see from the graph that the Domain is the set of all real numbers, and the Range is the set of real numbers in the bounded range [-1, 1]. It is easy to see that both the constraints incumbent upon a function are fulfilled – each real number in the Domain is used (continuous curve), and each real number in the Domain has a unique value in the Domain (single curve). Note that two elements in the Domain may have the same value in the Range. There is no restriction on that. So just why are we flogging this horse to death? Well, it’s because it is vitally important to unambiguously understand what a function is and what it is not. A lot of times, people mistake a relation for a function. A relation is a superset of a function. A relation may map the same element in the Domain to multiple elements in the Range. For instance, the relation “is a son of” is a Relation and not a Function. This is because a person is the son of both his mother as well as his father. What we’re interested in in Functional Programming is the concept of a function.

Functional Programming is therefore a paradigm (note the emphasis – the reason will shortly become clear) of programming in which functions form the basis of all processing. Historically, it has its earliest roots in the Lambda Calculus. Alonzo Church developed this beautifully elegant framework early in the 20th Century, and in its most extreme form, even data was created using only single-parameter functions (Church Numerals). However, even if we assume that data exists in a different world compared to functions, the basic implication of using the idea of a mathematical function in programming is that for every input to a function, we get a unique value, and that this is repeatable ad infinitum. This is also known as “referential transparency”, which means that in compiled languages, if we can determine the parameter to the function, we can replace function calls with that parameter to a constant! This behaviour also means that we can run such functions in parallel, and testing becomes trivial.

As mentioned in the last paragraph, a language doesn’t necessarily have to ensure that it supports only functions (in a mathematical sense). The only constraints are writing functions that generate their output only on the basis of their parameters (no globals involved), always generating the same output for the same argument(s), and ensuring that no side-effects take place within the body of the function. This means that any language can be written in a functional style. However, as we will see in the next section, there are some concepts that are considered the core concepts of Functional Programming, and different languages come with different levels of built-in support for these concepts.

### Functional Programming concepts

Functional Programming today is considered to consist of the following core ideas:

• First-Class functions and Higher-Order functions:These are related concepts which (from a programming perspective) mean that in a language that supports first-class functions (or higher-order functions), a function is treated on par with any other language entity. This means that functions can be bound to variables, defined within other functions, passed to other functions, and returned from other functions. Most of the strength of Functional Programming derives from these features, especially from two crucial ideas – Currying, and Partial Application.

Currying and Partial Application are intimately connected with each other. To understand them better, let us take the help of a simple example. Consider that we wish to write a function that takes three parameters and produces a result. Here’s how we might write it in Haskell:

mult :: (Num a) => (a, a, a) -> a
mult (x, y, z) = x * y * z

*Main> mult (1, 2, 3)
6


As expected, it evaluates the product correctly. Nothing special here – this is exactly how we would write this function in any run-of-the-mill imperative language.

Now let’s try something else. Let’s try to write this as a function that will accept one parameter at a time:

mult' :: (Num a) => a -> (a -> (a -> a))
mult' x y z = x * y * z

*Main> let f = mult' 1
*Main> :t f
f :: Num a => a -> a -> a

*Main> let g = f 2
*Main> :t g
g :: Num a => a -> a

*Main> let h = g 3
*Main> :t h
h :: Num a => a

*Main> h
6


So what changed compared to the previous function? Observe that the type of the function has changed. Previously, it required a triplet as a single parameter. Here’s how one would read the type signature: mult’ is a function that takes a numeric parameter and returns a function that takes a numeric parameter and returns a function that takes a numeric parameter and which finally returns a numeric value as the return value of the function.

The main advantage of this is that this allows us to make our multiplication function out of functions of a single parameter each. To notice that better, look at the type signatures of ‘f’, ‘g’, and ‘h’ at each stage. They correspond exactly with the verbal description of mult’. Functions like mult’, which implement multi-parameter functions in terms of functions that take a single parameter each are called “curried” functions. ‘f’, ‘g’, and ‘h’ demonstrate “partial application” of the overall mult’ function whereby we can build up a series of functions from a curried function by supplying an argument at a time. Yes, even ‘h’, which is a value can be considered to be a function in Functional Programming land! First-class functions, remember? Also note how both these concepts point back to the Lambda Calculus as discussed in the previous section. In fact, in order to make this equivalence blatantly clear, we could implement mult’ explicitly in terms of lambdas of a single parameter each (note that \x is the Haskell syntax for a lambda expression that takes the parameter ‘x’):

mult'' :: (Num a) => a -> a -> a -> a
mult'' = \x -> \y -> \z -> x + y + z

Main> let f = mult' 1
*Main> :t f
f :: Num a => a -> a -> a

*Main> let g = f 2
*Main> :t g
g :: Num a => a -> a

*Main> let h = g 3
*Main> :t h
h :: Num a => a

*Main> h
6


Exactly the same result as expected. In fact, all functions in Functional Programming can be considered to be built up of single-parameter functions. Nifty, isn’t it?

• Pure Functions:
A pure function is a function whose return value is determined entirely by its arguments, and which returns the same value for the same set of arguments each time the function is invoked. As mentioned earlier, this also means that the function must not have any side-effects (such as modifying a global variable, or even accessing the value of a mutable global variable). This also automatically provides a favourable property called “Referential Transparency” as explained before.Since I/O is the major source of side-effects, how can we do anything useful with such constaints? Well, different languages deal with this conundrum in different ways – Haskell tries to remain purely functional by using the concept of a “Monad”. Specifically, it uses the IO Monad to wrap the IO operations within an abstraction that takes the current state of the world, performs the modifications, and then returns this modified state as the new state of the world. Mathematically speaking, this ensures purity in Haskell even during IO, but of course, side-effects have happened in the meantime and the whole point appears moot. Other languages such as Clojure opt for a more pragmatic approach and accept some side-effects as a fact of life. Clojure provides some very efficient immutable data-structures that ensure that mutation is almost never needed. It also has a very strong STM (Shared Transactional Memory) to wrap IO operations within transactions that ensure that the state of the entire system remains consistent. Yet other languages such as Common Lisp put the onus of ensuring proper handling of side-effects on the user.
• Recursion:Recursion is the process whereby a function calls itself (or another function). This is, apart from Higher-Order Functions, the most important concept in practical Functional Programming. In fact, most of the earlier Lisps, especially Scheme, depended tremendously on recursion as a substitute for looping (which is another major source of side-effects). This is why most of the languages in the Lisp family (especially Common Lisp and Scheme) have highly-optimised implementations of operations on the list data-structure (a list is how S-expressions in Lisp are represented).

Strangely enough, in Common Lisp, the standard does not mandate Tail-Call Optimisation (TCO). However, most implementations come bundled with TCO. TCO is simply a way of converting recursive calls in the “tail-call position” to a simple branch (goto). This ensures that the stack does not blow up when executing deeply recursive calls. To get an idea of this, consider the following example:

The factorial of a number is defined as:

$n! = \begin{cases} 1 & if n = 0\\ n(n-1) & if n \geq 1 \end{cases}$

This may be directly translated into code in Common Lisp as:

CL-USER> (defun factorial (n)
(if (zerop n)
1
(* n (factorial (1- n)))))
FACTORIAL

CL-USER> (factorial 20)
2432902008176640000


However, the above code will eventually blow up the stack even for small values of ’n’. To ensure that we don’t end up with stack overflow errors, let’s convert this program into a TCO version of the same algorithm:

CL-USER> (defun factorial (n)
(labels ((f (n acc)
(if (zerop n)
acc
(f (1- n) (* acc n)))))
(f n 1)))
FACTORIAL

CL-USER> (factorial 100)
93326215443944152681699238856266
70049071596826438162146859296389
52175999932299156089414639761565
18286253697920827223758251185210
916864000000000000000000000000


How is this function TCO but not the previous one? Remember that a function’s arguments are always evaluated before the function call in made. In the previous case, the * operator is the culprit – it tries to multiply the recursive call and the value of n. This ensures that the current state of the function is saved on the stack, and finally the stack is unwound, the multiplications performed, and the result returned.

In the second example, we remove this binding and ensure that the recursive call is independent of the value of the ‘acc’ variable, which contains the running product which is the factorial of the supplied argument. This is called TCO. Note though that many (if not all) Common Lisp implementations may convert the first example into a TCO version anyway. However, in many cases, the compiler can’t perform this magic conversion, and the programmer has to ensure that the recursion is indeed TCO.

• Strict vs Non-Strict (Lazy):
Strict evaluation evaluates an expression and returns the complete value immediately whereas non-strict (better known as lazy evaluation) evaluation only generates as many results as are currently requested.Strict evaluation has the advantage that performance is deterministic with such an evaluation strategy. The main disadvantage is that certain operations are not possible with such a strategy. For instance, suppose we wanted to process a series of numbers and we do not know upfront how many of these numbers we’ll be using. Under strict evaluation this is not possible. We have to know the size of the data set before processing it and if we try to store a large data set, it might cause the whole program to crash.

Non-strict evaluation solves this very situation by processing only as many elements as currently required (this is how generators, used extensively in Python, are built). This enables us to even define infinite streams of data! Non-strict evaluation also has the added benefit that not all branches of a computation need be done. In an eager language, this is not really possible. The main problem with non-strict evaluation is that performance is non-deterministic. This can lead to issues when trying to profile a large software application. This is also the reason that previously purely non-strict languages such as Haskell are providing more and more support for strict evaluation while keeping non-strict evaluation as the default mode. Clojure does this as well.

• Strong Type Systems:A type system is a way of assigning a property called a “type” to entities in the programming language such that they may be classified into sets whose collective properties can then be defined, and which then allows the compiler to check for invalid programs during compilation itself. Even though it may appear that a type system is exclusively associated with statically-typed languages, that is not so. Even dynamic languages may be strongly (Common Lisp) or weakly (JavaScript) typed, and the environment is responsible for defining correct behaviour. The only difference between those two categories of typed languages is that statically-typed languages provide us with a mechanism to explicitly specify types for program entities ourselves.

My personal opinion is that this is not really a core concept of Functional Programming. However, most experts today claim that this is a vital pillar of the whole paradigm. In any case, one cannot truly work with modern functional languages unless one has a good understanding of the different types of type systems in place today. Most of the statically typed functional languages have type systems based on the Hindley-Milner system. Haskell is one such example. Scala, however, despite being strongly typed, cannot fully support Hindley-Milner because of its OOP nature.

### Conclusion and Next Steps

This was a brief but comprehensive introduction to what constitutes Functional Programming today. In the next few posts, I will examine these concepts in a few modern languages (whether considered functional by the Hoi Polloi or otherwise): Common Lisp, Java, and C++. In the near future, I will also examine the same for Python, and most importanly, Haskell. Haskell is arguably the most prominent purely functional language in use today. I will (in the not-too-distant future) write a series of posts discussing and researching Functional Programming in-depth, and my vehicle of choice for that endeavour will be Haskell!