Teaching Monads Slightly Differently

Chipping away at Lady Monadgreen’s curse

15 min read · Mar 16, 2018

“Once you understand monads, you immediately become incapable of explaining them to anyone else”
— Lady Monadgreen’s curse, Gilad Bracha

It’s my monad article! Everyone does one at some point! In this article, I propose a slightly different strategy for teaching monads, and attempt an explanation in that style. Will I break Lady Monadgreen’s curse? Spoiler Alert: probably not.

The two camps of Functional Terminology

In the functional programming community, there is a divide on the correct terminology to use when teaching functional concepts such as Functor and Monad to beginners. There are two main camps:

Side 1: Terms like Functor and Monad are unnecessarily dense and mathematical. Having to handle the twofold burden of both introducing a term and identifying the semantic meaning behind it is too much for a reasonable beginner to bear. We should be using terminology that guides the beginner to understanding, such as Mappable and Chainable.

Side 2: Terms like Functor and Monad are exactly as dense as they need to be. They’re inherently confusing concepts, enough so that even well established programmers get it wrong a lot of the time. Furthermore, why shouldn’t we be using the technical term? There’s no push in Electrical Engineering to call transistors “gateways”, nor should there be. As engineers, we learn the terms of the trade, despite the difficulty. We should be using terminology that is technically correct and does not guide the beginner into a false sense of understanding.

Both strategies are unhelpful to beginners

Both these strategies often introduce a term, then define it (a Functor is f where f …). This is extremely useful for being mathematically precise and correct: it’s the kind of language you’d invoke in a mathematical paper to be as unambiguous and free of error as possible. But I’d argue it’s the wrong way to teach beginners about these concepts. Our goal here isn’t to create a mathematically rigorous definition; it’s to guide a beginner toward building intuition.

Does introducing the term Monad and defining it build intuition? Of course not. This is the problem faced by many, many beginners, including myself. A quick Google search for “what is a monad” shows what anyone struggling to learn statically typed functional languages already knows: monads are confusing.

Furthermore, you have to parse through a myriad of different definitions to come to a unified understanding of monads. “Monad is a term from category theory. Does that mean it’ll be easier to be more correct by learning the category theoretic definition of a monad? If so, what’s the correspondence?” There is inherently a choice to be made when learning monads on your own: what level will be the easiest to understand? Expecting beginners to effectively navigate this is unrealistic.

Does lying a little to the beginner by saying Mappable and Chainable work? In the context of functional programming, definitely not. People still believe that pointfree refers to literal ‘.’ characters in their code. There’s an Eternal September with functional programming languages, guaranteeing that there will always be people confused by misleading terminology.

Is Chainable any better than Monad at guiding a beginner toward correctness? Of course not. You have to explicitly know “oh, chain refers to a flatmap which has properties x, y, and z.” You have to know that a chainable even wraps a type, that chainable is inherently mappable, and a whole host of other constraints. Sure, it tells you where you might use a monad, but it doesn’t guide the intuition of what a monad actually is, and that’s a bad thing. Using bad terminology leads to pitfalls on pitfalls. Chainable doesn’t work here.

Neither of these options sounds good. I instead propose a third option.

Deduction to Induction

The strategy wherein we first introduce a term like Monad, then figure out the consequences, is inherently deductive in nature. It presupposes the usefulness of the term, then tries to convince you of its utility.

We should instead be teaching monads inductively: We should encapsulate the breadth of monads by showing motivating examples and connecting them together. We should not teach them by defining a weirdly mechanical and arbitrary ruleset, then showing how a bunch of different things might happen to follow that arbitrary ruleset. The connections between monads should be obvious to start. The exact ruleset doesn’t need to be.

We do this all the time when teaching mathematics: We use “what was the mathematical history for this concept?” to essentially proxy for “what’s the motivation behind this?”, to relatively good effect. The people who did best at math in school were those who were comfortable with the motivation, and could if they needed to, re-derive formulae or techniques. In the same vein, I want to provide a path to “What led to this weird structure being named and isolated as a useful concept?”

“So how would you teach monads, Evin?”

Monads, taught inductively

This strategy is largely taken from Jafar Husain’s Intro to Reactive Programming with RxJS. In this series, he NEVER uses the word monad, but manages to teach the essence of working with monads extremely effectively. His method revolves around forming intuitive connections between datatypes, and showing how solving two related problems is roughly the same. So I’m going to go through several toy problems using JavaScript, each driving at monads from a slightly different angle.

Round 1: Arrays

Let’s say JavaScript has an Array.flatten method, which takes an array of arrays and flattens it into one array. So, for example:

[[1, 2, 3], [4, 5]].flatten() // evaluates to [1, 2, 3, 4, 5]

Let’s say also that we’ve got a JSON blob which represents users and their tweets:

const users = [
  {
    name: 'Chad Brogrammer',
    tweets: ['lol generics', 'hn is my bestie'],
  },
  {
    name: 'Freddie Hubbard',
    tweets: ['i make trumpet sounds', 'haha good memes there']
  }
  // ...
];

And we want to reduce this down to a log of their tweets, in the following format:

[
  '<Chad Brogrammer> lol generics',
  '<Chad Brogrammer> hn is my bestie',
  '<Freddie Hubbard> i make trumpet sounds',
  ...
]

With Array.flatten in our pocket, we can do a slick little map inside of a map to a flatten to get to our target:

const log = users.map(
  user => user.tweets.map(
    tweet => `<${user.name}> ${tweet}`
  )
).flatten();

First we map over the users, then we map over the tweets. Then we join the username with the tweet, then we flatten the array together. Easy solution to an easy problem! Our goal is achieved!
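As it happens, JavaScript has since gained a real method that does this job: Array.prototype.flat() (standardized in ES2019), which flattens one level of nesting by default. A runnable sketch of the same solution, assuming a runtime that supports flat() and using a trimmed copy of the users data from above:

```javascript
// Same shape of data as above, trimmed to two users.
const users = [
  { name: 'Chad Brogrammer', tweets: ['lol generics', 'hn is my bestie'] },
  { name: 'Freddie Hubbard', tweets: ['i make trumpet sounds', 'haha good memes there'] },
];

// A map inside of a map produces an array of arrays (one per user)...
const nested = users.map(
  user => user.tweets.map(tweet => `<${user.name}> ${tweet}`)
);

// ...and flat() collapses exactly one level of nesting.
const log = nested.flat();

console.log(log);
```

The intermediate nested value is the "twice wrapped" array; flat() plays the role that flatten plays in the text.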

Round 2: Optionals

To the next toy problem!

First, let’s introduce the concept of an Optional. An Optional is a data type which either does or does not contain a value. In JS class notation, we can write the bare bones of an optional datatype as follows:

class Optional {
  constructor (hasValue, value) {
    this.hasValue = hasValue;
    this.value = value;
  }
}

This seems just a little bit silly in JavaScript when you’re first introduced to it, but being able to differentiate between a value that shouldn’t be undefined and one that can be undefined is pretty useful. And because we’ve got a separate class for it, we can introduce two additional methods:

map(): If you’ve got an Optional that contains 1, you should be able to do optional.map(a => a+1) to get an optional that contains two. If the optional doesn’t contain a value, it just ignores the operation.

join(): If you’ve got an Optional that contains another Optional, you should be able to collapse those into one Optional. Is Optional(Nothing) really different than Nothing itself?

With this in mind, let’s set up the problem: I’ve got a path on a website, and I want to find the filename extension off that path. So, for /homepage/index.html, the thing I’m looking for is html. But:

1: a path doesn’t necessarily have a file, e.g. /homepage/

2: a file doesn’t necessarily have an extension, e.g. /homepage/index

Now I’ve got the functions getFile() and getExtension(), which each may or may not produce a value. In fact, they return an Optional. With a little slickness, we should be able to get an optional that contains only the extension, if the extension is there!

const optionalExtension = getFile(path).map(getExtension).join()

Nice! getFile returns an optional, and mapping over getExtension gets you an optional of an optional, so now you just reduce those into one optional that (possibly) contains the file extension. Another easy problem, easy solution.
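To make this concrete, here is one way the pieces could fit together. The text only names getFile and getExtension, so the bodies below are hypothetical, as are the some and none helper constructors; treat this as a sketch rather than a reference implementation.

```javascript
class Optional {
  constructor(hasValue, value) {
    this.hasValue = hasValue;
    this.value = value;
  }

  // Hypothetical convenience constructors (not part of the class in the text).
  static some(value) { return new Optional(true, value); }
  static none() { return new Optional(false, undefined); }

  // Apply fn to the contained value, if there is one; otherwise do nothing.
  map(fn) {
    return this.hasValue ? Optional.some(fn(this.value)) : this;
  }

  // Collapse an Optional of an Optional into a single Optional.
  // Assumes the contained value, if present, is itself an Optional.
  join() {
    return this.hasValue ? this.value : this;
  }
}

// Hypothetical helper: the last path segment, if it is non-empty.
const getFile = path => {
  const file = path.split('/').pop();
  return file ? Optional.some(file) : Optional.none();
};

// Hypothetical helper: whatever follows the last dot, if anything does.
const getExtension = file => {
  const dot = file.lastIndexOf('.');
  return dot > 0 && dot < file.length - 1
    ? Optional.some(file.slice(dot + 1))
    : Optional.none();
};

const extensionOf = path => getFile(path).map(getExtension).join();

console.log(extensionOf('/homepage/index.html')); // contains 'html'
console.log(extensionOf('/homepage/index'));      // empty: no extension
console.log(extensionOf('/homepage/'));           // empty: no file
```

Note how both failure cases end up as the same empty Optional: a missing file short-circuits in map, and a missing extension is what join collapses.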

Round 3: Observables

Next toy problem, Cookie Clicker edition:

Let’s imagine another datatype, namely an Observable. This one is more difficult to construct than the other two. If you’re already familiar with observables, this will go easily; otherwise, this brief explanation should be enough to follow along:

An Observable represents a stream of events. Events can be anything: numbers, strings, mouseclick events, websocket messages, tweets by Donald Trump (or Freddie Hubbard apparently). I’ll represent them like this:

// pretend this represents an observable where time goes -> that way
[.........E1....E2....E3...]

You can map over these. So if I have an observable that fires off values 1, 2, 3, I can make a new one that fires 2, 3, 4, by doing the following:

[.........1.......2....3].map(value => value + 1)
// gives:
[.........2.......3....4]

It turns out this is an extremely useful way of representing events, for reasons I won’t get into here… There are tons of tutorials on observables that go into detail on how and why they work so well.

The truly mind-bending part is when you consider that the events that observables fire can THEMSELVES be observables:

[......O1.....O2.....O3..] // contains the following observables:
       [...a.....b.......] // O1
              [c....d....] // O2
                     [..e] // O3

So, to reiterate, we’ve got an observable. Observables consist of a bunch of events, which themselves are just values. An observable can fire other observables, to get this two-dimensional structure. It’s incredibly neat.

You might already see where this is going: We can also flatten an observable of observables into one single observable. There are a number of ways to do this, but let’s just consider merging all these observables together by taking all events: Any time one of the inner observables fires an event, we take it.

[......O1.....O2.....O3..].mergeAll() // from above
// gets us:
[..........a...c.b..d...e]

Okay, I know this is a lot, but let’s try a problem here: Let’s say you’re building Cookie Clicker v2, and you want to keep a total of all cookie clicks. To do this, each client that visits your site starts a websocket connection with your server, modeled like an observable:

const clientConnections = [.....Client1...Client2.....Client3..]

And we’ve got getCookieClicks, which given a client, gives us an observable containing cookie click events (notified via websocket):

getCookieClicks(client)
// gives:
[.....click.........click.....click.......]

Can we make an observable that contains all clicks from all clients? You probably can already guess the answer:

const allClicks = clientConnections.map(getCookieClicks).mergeAll();

Neato. We can then increment a counter each time that Observable fires to get a total of the number of cookie clicks.
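For readers who want something executable, here is a deliberately tiny Observable with just map and mergeAll. It is a hypothetical sketch and not RxJS: events fire synchronously, and errors, completion, and unsubscription are all left out. The client objects and their clicks field are made up to stand in for websocket connections.

```javascript
// A minimal Observable: it wraps a function that, given a listener,
// pushes events to that listener.
class Observable {
  constructor(subscribe) {
    this.subscribe = subscribe;
  }

  // A new observable that fires fn(event) whenever this one fires event.
  map(fn) {
    return new Observable(listener =>
      this.subscribe(event => listener(fn(event)))
    );
  }

  // Flatten an observable of observables: subscribe to every inner
  // observable as it arrives and forward all of its events.
  mergeAll() {
    return new Observable(listener =>
      this.subscribe(inner => inner.subscribe(listener))
    );
  }
}

// Hypothetical stand-in for the stream of connecting clients.
const clientConnections = new Observable(listener => {
  listener({ id: 'Client1', clicks: 2 });
  listener({ id: 'Client2', clicks: 3 });
});

// Hypothetical stand-in: each client emits as many clicks as it carries.
const getCookieClicks = client =>
  new Observable(listener => {
    for (let i = 0; i < client.clicks; i++) listener('click');
  });

const allClicks = clientConnections.map(getCookieClicks).mergeAll();

// Increment a counter each time the merged observable fires.
let total = 0;
allClicks.subscribe(() => { total += 1; });
console.log(total); // 5
```

The shape is the same as in the previous two rounds: map produces a twice-wrapped value (an observable of observables), and mergeAll collapses it back to one layer.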

Pulling the 3 examples together

Those look pretty related, right? In all of these examples, we got this twice wrapped value, then flattened them to be a once wrapped value. So let’s pull out the essential pieces of this common pattern:

1: All of these data structures are essentially wrappers for values.
2: Given a wrapped value, you can transform the value (via map).
3: Given a wrapper of a wrapper, you can collapse it into one wrapper.

Okay, so if something looks roughly like that, let’s call it a Monad. Let’s see if we can more solidly define what we mean by each one of these! Before we do, however, there’s one final property that we’re going to need to make explicit, later, namely that we have a constructor. Or in other words:

4: That you can always take a value and wrap it in one of these data structures

Formalizing Map

Let’s try to pin down exactly what properties we want on these rules.

Well, we kind of expect map to do the same thing as just applying a function to a value in general. If I did optional.map(a=>a+1) and that did ANYTHING other than adding 1 to the contained value(s), that would be super confusing.

One way we can formalize this is by preserving how functions compose together. Mapping over a=>a+1 twice should just give us the same result as mapping over their composition, a=>a+2. Or, more precisely:

// these should be the same
Monad(value).map(f).map(g);
Monad(value).map(x => g(f(x)));

There’s also another rule we might expect a map to follow: mapping over the identity function should not affect the value. If we consider the instances of monad we were looking at before, this is a common-sense constraint.

// these should be the same
Monad(value);
Monad(value).map(id);
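Both rules can be spot-checked on arrays, the first monad from the examples. The functions f and g below are arbitrary picks for illustration; a passing check on one input is evidence of the rule, not a proof of it.

```javascript
const f = x => x + 1;
const g = x => x * 2;
const id = x => x;
const wrapped = [1, 2, 3];

// Composition: mapping f and then g equals mapping their composition.
const twoMaps = wrapped.map(f).map(g);
const oneMap = wrapped.map(x => g(f(x)));

// Identity: mapping the identity function changes nothing.
const viaIdentity = wrapped.map(id);

console.log(twoMaps, oneMap, viaIdentity);
```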

Formalizing Flatten

There’s another rule that might be difficult to come up with, without having a general understanding of what makes monads useful: If you’ve got a many-times wrapped value, flattening should be associative. That is, if you’ve got a 3 layer deep monad, and you want to flatten to a single layer deep monad, the order in which you flatten the monad shouldn’t matter.

// these should be the same
Monad(Monad(Monad(value))).map(inner => inner.flatten()).flatten(); // inner layers first
Monad(Monad(Monad(value))).flatten().flatten(); // outer layers first
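With arrays this is easy to try out. The sketch below uses the built-in flat() (ES2019) in place of flatten, on a three-layer array chosen purely for illustration:

```javascript
const threeDeep = [[[1, 2], [3]], [[4], [5, 6]]];

// Flatten the inner two layers first (by mapping flat over the outer
// layer), then flatten what is left.
const innerFirst = threeDeep.map(inner => inner.flat()).flat();

// Flatten the outer two layers first, then flatten what is left.
const outerFirst = threeDeep.flat().flat();

console.log(innerFirst, outerFirst);
```

The two intermediate values differ ([[1, 2, 3], [4, 5, 6]] versus [[1, 2], [3], [4], [5, 6]]), but the final single-layer result is the same either way.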

Formalizing Constructor

What we want is roughly for the constructor to be the opposite of flatten, for when flatten is defined. Since we can’t necessarily flatten a single-dimensional monad, we’ll have to work within the context of the monad itself. There are two ways to get to a 2 dimensional monad from a single-dimensional monad: One is to map over the constructor, the other is to call the constructor on the monad itself.

Monad(value).map(construct);
construct(Monad(value));

What this means is that if we map over the constructor, then flatten, we should get the exact thing back in both these cases, and we do:

// these 3 should be the same
construct(Monad(value)).flatten();
Monad(value).map(construct).flatten();
Monad(value);
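The same check on arrays, where the constructor is simply "put the value in a one-element array". The name construct follows the text; flat() (ES2019) stands in for flatten:

```javascript
const construct = x => [x];
const wrapped = [1, 2, 3];

// Wrap the whole thing from the outside, then flatten.
const wrapOutside = construct(wrapped).flat();    // [[1, 2, 3]] -> [1, 2, 3]

// Wrap each contained value from the inside, then flatten.
const wrapInside = wrapped.map(construct).flat(); // [[1], [2], [3]] -> [1, 2, 3]

console.log(wrapOutside, wrapInside);
```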

All these together are the MONAD LAWS, the rules all monads have to satisfy! Was learning them like this easier than starting with monads themselves? Maybe! You tell me!

Defining Monad

We’re well equipped enough to finally define the term monad, correctly! Monad is a generic datatype for which 3 methods are defined:

Map: Map is a function on the monad that takes a function as an argument, satisfying the above monad laws

Constructor: Constructor is a function that takes a value and returns a monad that wraps that type, satisfying the above monad laws.

Flatten: Flatten is a function on the monad, defined for when the wrapped type is another instance of the monad, that flattens the monad of a monad into a single-layer monad, satisfying the above monad laws.

Matching categorical monads and monadic datatypes

It turns out that these specific rules map extremely nicely to the category theoretic definitions of monads. That is, authors of programming languages can start using definitions and techniques used in category theory as a way to help guide them to a clean and consistent codebase.

For an exercise, let’s take the formal definition of a monad in category theory as found in Wikipedia. Reminder, this is the extremely dense, extremely technical, jargony version that grad students like rattling off:

Now let’s anger some math nerds by scribbling all over it:

My goal here isn’t to show you the exact correspondence between category theory and functional programming — that would require a working knowledge of category theory, and I’m not qualified to teach that. Instead, I’m showing that:

1: Using map, constructor, and flatten as your basis for monads is the easiest way to connect the programming definition to the category theoretic definition of monads.

2: When you choose this basis for understanding, the high level pieces as presented here map directly between the two.
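For readers who want the definition in text form, the standard statement (paraphrased here, not copied from the original figure) is that a monad on a category C is an endofunctor T together with two natural transformations, η and μ, satisfying two coherence conditions:

```latex
T : C \to C, \qquad \eta : 1_C \Rightarrow T, \qquad \mu : T^2 \Rightarrow T

\mu \circ T\mu = \mu \circ \mu T, \qquad \mu \circ T\eta = \mu \circ \eta T = 1_T
```

Read against this article’s vocabulary, and glossing over a lot: T together with its action on functions is the wrapper plus map, η is the constructor, and μ is flatten. The first equation is the "order of flattening doesn’t matter" rule, and the second is the "wrap with the constructor, then flatten, and you get the same thing back" rule.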

To flatmap, chain, and bind

Spoiler alert: these all refer to the exact same thing! I’ll use the term flatmap from this point onwards, because I think it keys nicely with the things we’ve already defined.

Lots of people describe monads based around flatmap. So where does flatmap come into play? Well, you can compose together a map and a flatten to define flatmap. Or more succinctly:

// These are exactly the same
monad.map(someFunction).flatten();
monad.flatmap(someFunction);

If we go back to our previous problems and rewrite our examples with flatmap, then our solutions get pretty concise!

// Problem 1
users.flatmap(
  user => user.tweets.map(tweet => `<${user.name}> ${tweet}`)
);

// Problem 2
getFile(path).flatmap(getExtension);

// Problem 3
clientConnections.flatmap(getCookieClicks);

It’s worth reiterating that defining monads with flatmap in place of flatten is entirely equivalent — flatmap’s just a map composed with a flatten, and flatten’s just a flatmap over the identity function.
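Arrays make both directions of that equivalence easy to verify, since JavaScript now ships Array.prototype.flatMap alongside flat (both ES2019). The duplicate function is just an arbitrary example of a function that returns a wrapped value:

```javascript
const identity = x => x;
const duplicate = x => [x, x];

// flatMap is a map followed by a one-level flatten...
const viaMapThenFlat = [1, 2, 3].map(duplicate).flat();
const viaFlatMap = [1, 2, 3].flatMap(duplicate);

// ...and flatten is a flatMap over the identity function.
const viaFlat = [[1, 2], [3]].flat();
const viaFlatMapIdentity = [[1, 2], [3]].flatMap(identity);

console.log(viaMapThenFlat, viaFlatMap);  // both [1, 1, 2, 2, 3, 3]
console.log(viaFlat, viaFlatMapIdentity); // both [1, 2, 3]
```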

This seems a little silly with our previous examples. Wouldn’t we want the flatten step to be explicit? Well, if we turn our gaze towards modeling computations and asynchronous actions with monads, we start seeing just why it’s useful to define monads this way…

Chaining asynchronous operations with the Future monad

Hopping back into deduction here: Let’s consider a new datatype: the Future.

What I’m going to show is that Future, a very generic datatype, can describe composition of arbitrary computations with only a constructor, flatmap (here called chain), and map function.

As a note, this is going to be very similar to promises, but not quite. If Promises had a monadic interface, I’d be using them, but unfortunately, they don’t. Instead, I’ll borrow the interface from Fluture.

Consider a future, with a constructor:

const future = Future(
  (reject, resolve) => somethingThatEventuallyCalls(resolve)
);

// calling fork calls the passed in computation,
// and eventually outputs the result:
future.fork(console.log);

The function passed into the constructor is called a computation. When fork is called on a future, its computation runs and eventually (or never) either rejects or resolves, which results in the function passed to fork firing.

We’ve got a constructor over values:

Future.of(5).fork(console.log) // outputs 5

We can map over futures:

Future
  .of(5)
  .map(a => a + 1)
  .fork(console.log); // outputs 6

We can also flatmap (here called chain) over futures. Let’s imagine that a well-named function fireNetworkRequest returns a future:

const future1 = Future
  .of('/some/path.json')
  .chain(path => fireNetworkRequest(path)); // remember: flatmap

future1.fork(console.log);

We’ve now got a future, which for all intents and purposes, wraps the result of the network request. It’s a future, just like one we’d construct with the of constructor, with the only difference that it might take a while to actually fire the fork callback.

But remember: this just resolves to another future, this time containing the result of the network request! That means we can flatmap it again:

const future2 = Future
  .of('/some/path.json')
  .chain(path => fireNetworkRequest(path))
  .chain(response => fireSecondNetworkRequest(response));

future2.fork(console.log);

We have a future that will eventually resolve to another future, which will eventually resolve to another value. But importantly, the second future can depend on the first one. Similarly, because we’re chaining, we get a single layer future back.
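To show that nothing magical is going on, here is a stripped-down Future with only of, map, chain, and fork. This is a hypothetical sketch, not Fluture’s implementation: it keeps the one-callback fork used in the text (with an optional second callback for rejections, which is my own simplification), and the two network functions are made-up stand-ins that resolve immediately.

```javascript
// Nothing runs until fork is called; a Future only describes a computation.
class Future {
  constructor(computation) {
    this.computation = computation; // shape: (reject, resolve) => void
  }

  // The constructor over values: a Future that resolves right away.
  static of(value) {
    return new Future((reject, resolve) => resolve(value));
  }

  // Run the computation, handing it the callbacks.
  fork(onResolve, onReject = () => {}) {
    this.computation(onReject, onResolve);
  }

  // Transform the eventual value.
  map(fn) {
    return new Future((reject, resolve) =>
      this.fork(value => resolve(fn(value)), reject)
    );
  }

  // flatmap: fn returns a Future, which is forked in turn, so the result
  // is a single-layer Future rather than a Future of a Future.
  chain(fn) {
    return new Future((reject, resolve) =>
      this.fork(value => fn(value).fork(resolve, reject), reject)
    );
  }
}

// Hypothetical stand-ins for network calls.
const fireNetworkRequest = path => Future.of({ next: path + '?page=2' });
const fireSecondNetworkRequest = response => Future.of('fetched ' + response.next);

const results = [];
Future
  .of('/some/path.json')
  .chain(fireNetworkRequest)
  .chain(fireSecondNetworkRequest)
  .fork(value => results.push(value));

console.log(results); // ['fetched /some/path.json?page=2']
```

The second request is built from the first response, which is exactly the dependency between steps that chain is there to express.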

This is pretty neat: in theory, each step in the chain can be pretty much anything you want, can’t it? It could be a network request as we’ve done here, it could be a prompt to the user, it could be the result of a random process, it could be something that may never even execute! In essence, we’re starting to notice that monads using flatmap are nicely suited to describe chains of computations.

Why monads represent the “computational context” metaphor

That this strategy should work in general isn’t obvious. We had one example above where monads encapsulated the composition of network requests, but the fact that our map / flatten interface is strong enough to capture programs in general is a pretty neat result.

In other words, it’s not that monads are computations, but rather that computations can be represented by monads.

Eugenio Moggi’s 1991 paper Notions of Computation and Monads solidifies the connection between categorical monads and programs, going into crazy categorical math that I still struggle to understand. His primary argument boils down to “we have these 5 or so distinct notions of what a program is or can do, beyond computing computable numbers. Monads can describe all of them. Choosing monads as a way to formalize computation is therefore a pretty good choice.”

Caveats

First caveat: There’s nothing in our definition of monads that says you should be able to pull values out of monads. Sure, for lots of monads, you can pull values out of monads, but this isn’t true in general. What’s the value of an empty optional? A Future that never returns?

Second caveat: You can only flatten two monads of the same type. So if I’ve got an array of optionals, the monad rules don’t guarantee that a function exists that flattens the two. Once again, for lots of monads, there are instances where you can define a join, but we have to break outside of the monad interface to flatten them together.

Lastly, I want to re-stress that the use of JavaScript to describe monads is a little odd: these patterns come from statically typed languages, and often JavaScript serves more as a metaphor than as an accurate representation. However, I wanted to provide an alternate path towards understanding here, rather than going through Haskell syntax.

Finishing thoughts

Do I think this is the best tutorial on monads? Heck no, for 4 reasons:

1: Tutorials for monads have been done time and time again, and I’m not particularly good at writing tutorials. The person who teaches monads, day in, day out, is going to be far better at teaching them than me.

2: The ‘best way to learn monads’ is different for everybody, because everybody has a different background. I have no doubt that the way I learned was particularly nonstandard, and I bet many others can say the same.

3: I kind of think reading a smattering of different tutorials, so that you can understand the different ways to approach monads, is an incredibly effective method. It sort of takes the inductive approach I took here and broadens it even further.

4: In this specific tutorial, I brush over exactly what “wrapper” means in a way that may end up being confusing to people.

Do I think this is at least an effective way to teach monads? Yeah, I do! I really believe in the efficacy of examples over definitions, and especially, metaphors. I’d love to see more of a focus on this style of learning in the tech community, and selfishly, I want to practice it myself.

As always, if I got anything wrong, please let me know. Cheers!