blog.darrenstruthers.net: Faux Monoids in Python

Monoids are ubiquitous in computer programming, but in spite of this, the word "monoid" is not often heard outside of the functional programming domain. So what are they?

Here is the definition Wikipedia provides:

[A] monoid is an algebraic structure with a single associative binary operation and an identity element.

Let's break that down, starting with "associative binary operation". To start, a "binary operation" is simply an operator, such as the familiar +, −, ×, or ÷, which works on two operands (hence the term binary). "Associative" refers to the associative property in mathematics which applies to operations where the order in which operations are performed does not affect the final result. For example,

1 + (2 + 3) = (1 + 2) + 3

4 × (5 × 6) = (4 × 5) × 6

Therefore, addition and multiplication are associative operations. Subtraction and division, on the other hand, are not.

1 − (2 − 3) ≠ (1 − 2) − 3

4 ÷ (5 ÷ 6) ≠ (4 ÷ 5) ÷ 6

In computer programming, another very common associative operation is string concatenation:

"foo" + ("bar' + "baz") = ("foo" + "bar") + "baz"

Now for the second half of the definition of monoids. An "identity element" refers to a value that can be applied as an operand that will have no effect on the other operand.

n + 0 = n
n × 1 = n
s + "" = s

Therefore, string values form a monoid under concatenation, where the identity is the empty string. Numbers are tricker, though. As seen above, they form a monoid over addition, with 0 serving as the identity, and another monoid over multiplication, with 1 as the identity.

Why Should Programmers Care?

So far, this just looks like a bunch of math. What does it have to do with programming? It's all about code reuse. We'll focus on a simple example just to get the point across, but the concept can apply to more complex, real-world code.

Consider the following Python functions:

def reduce_strs(strs):
    result = ''
    for s in strs:
        result += s
    return result

def reduce_ints(ints):
    result = 0
    for i in ints:
        result += i
    return result

def reduce_lists(lists):
    result = []
    for L in lists:
        result += L
    return result

Aside from the trivial difference in some variable names, these functions are identical except for their first line. It might be nice in some situations to be able to write something more general:

def reduce_values(values):
    result = ???
    for v in values:
        result += v
    return result

The problem is in that first line of the function. Which initial value should we choose for result? Python has a strong type system, meaning that if we attempt to just choose one of the three initial values from above, our function will work for that corresponding type, but will throw a TypeError at us when we attempt to use the function with other types. What we really need is a generic monoid identity value, and while Python may not come with one out of the box, the language gives us the tools we need to create it ourselves.

The Nil Type

I developed the Nil type when I encountered the "monoid problem" in real code. I wanted to create a flexible identity value with the following goals in mind:

It should function as an identity for the + operation, no matter the type of the other operand. In other words, it is a polymorphic value.
It should be possible to compare it (using ==) to structures such as list, tuple, set, etc. to test for emptiness.
It should be a virtual (albeit "empty") instance of appropriate abstract base classes, such as Iterable, Sequence, and Collection (new in Python 3.6).
It should be a singleton value, in the vein of None, True, and False. No instantiation should be required for the programmer.

I chose the name Nil (as opposed to something more descriptive such as MonoidIdentity) primarily due to its terseness. I didn't want it to be too distracting in code. In its current state, Nil achieves each of the above goals, implemented like so:

1. Identity

We can make Nil "throw itself away" when it's added to any other value, by implementing the __add__ and __radd__ special methods.

class Nil:
    def __add__(self, other):
        return other

    def __radd__(self, other):
        return other

This guarantees that x + Nil() == x and Nil() + x == x .

2. Equality

This is the most subjective part of the implementation, in my opinion. The logic I settled on boils down to:

Nil can be compared to itself (Nil == Nil), always returning True
Nil can be compared to any virtual instance of the Sized abstract base class. These instances should be equal to Nil if their length is zero.
Nil cannot be compared to any other objects. Return NotImplemented to signal to the runtime that the comparison is not supported (in effect, returning False).

from collections.abc import Sized

def __eq__(self, other):
    if type(self) == type(other):
        return True
    elif isinstance(other, Sized):
        return len(other) == 0
    else:
        return NotImplemented

This gives us results such as Nil() == [], Nil() == tuple(), Nil() == set(), etc. This part of the implementation could change over time as new abstract base classes get added to the collections.abc module.

3. Abstract Base Class Instances

By implementing the following special methods, we can make Nil an instance of Iterable, Sequence, and Collection.

def __contains__(self, _):
    return False

def __iter__(self):
    return self

def __next__(self):
    raise StopIteration

Since these abstract base classes implement __subclasshook__, we do not need to register our class manually with them, and we also don't have to worry about doing any Python version detection to avoid referencing classes that don't exist in our environment if we're not running the latest Python. Nice.

4. Singleton

We want the programmer to think of Nil more as a value than as a class instance. So we will first add __repr__ and __str__ methods.

def __repr__(self):
    return 'Nil'

def __str__(self):
    return 'Nil'

We also want the programmer to be able to use Nil as a value without instantiating the class, so after our class definition we rebind the Nil name to a single instance of the class:

Nil = Nil()

Putting It All Together

Our finished code looks like this:

from collections.abc import Sized

class Nil:
    def __add__(self, other):
        return other

    def __contains__(self, _):
        return False

    def __eq__(self, other):
        if type(other) == type(self):
            return True
        elif isinstance(other, Sized):
            return len(other) == 0
        else:
            return NotImplemented

    def __iter__(self):
        return self

    def __len__(self):
        return 0

    def __next__(self):
        raise StopIteration

    def __radd__(self, other):
        return other

    def __repr__(self):
        return 'Nil'

    def __str__(self):
        return 'Nil'

Nil = Nil()

Now we can write the polymorphic function we set out to create:

def reduce_values(values):
    result = Nil
    for v in values:
        result += v
    return result

There you have it: three (if not more) functions for the price of one! The function should work on any type that supports the + operator.

What About Associativity?

When we first defined monoids, we said they must have an identity element and an associative binary operation. When it comes to the latter, there is simply no way to enforce associativity at the language level. When I am creating my own data types and want them to exhibit properties like associativity, my strategy is to build unit tests as needed. After all, more unit tests never hurt.

def test_mytype_associativity():
    a, b, c = MyType(1), MyType(2), MyType(3)

    assert a + (b + c) == (a + b) + c

Conclusion

Is Nil "Pythonic"? Honestly, I'm not sure. Is it cool? I think so. Polymorphic values such as Nil seem rare in Python, but I don't get the impression that they are discouraged. At the end of the day, one of the things that continues to impress me about Python is how flexible the language is. In particular, I think my Nil class highlights several cool features in Python, so if you've stuck with this article until the end, hopefully you learned something, even if you don't see an immediate use for Nil in your future.

blog.darrenstruthers.net

Sunday, March 05, 2017

Faux Monoids in Python