December 30, 2024

Overloading the lambda abstraction in Haskell

Blog

Dec 10, 2022

About two years ago, I started working on a little embedded
domain-specific language (EDSL) called achille, using Haskell. Because this EDSL
has its own notion of morphisms Recipe m a b from
a to b, I was looking for a way to let users
write such morphisms using regular Haskell functions and the syntax for
lambda abstraction, \x -> ....

As recently as 5 days ago, I was still convinced that there was no
way to do such a thing without requiring a lot of engineering effort,
either through GHC compiler plugins or with Template Haskell — two
things I’d rather stay far away from.

Indeed,

Haskell’s
Arrow abstraction and the related proc
notation are supposed to enable just that, but they don’t work as
intended. It’s a difficult thing to implement
properly and if
proposals to fix it have been discussed at large, nothing made it
upstream.

But even if the notation worked properly (and was convenient to use),
it is quite common to work with morphisms k a b that cannot
possibly embed all Haskell functions a -> b,
preventing us to use this notation as we would like, simply because we
cannot define a suitable instance of Arrow k.

There are solutions to this, like the great overloaded
package, that fixes the proc notation and crucially makes
it available to many categories that are not Arrows. This
may very well be exactly what you — the reader — need to resolve this
problem.

However, I want to avoid making my library rely on compiler plugins
at all cost, and think the proc notation, while way better
than raw Arrow combinators, is still too verbose and
restrictive. I want to write lambdas, and I want to apply
my morphisms just like any other Haskell function. So no
luck.

There is also this wonderful paper from Conal Elliott: “Compiling to
Categories”.
In it, he recalls that any function in the simply typed lambda calculus
corresponds to an arrow in any cartesian closed category (CCC). What he
demonstrated with the concat GHC
plugin is that this well-known result can be used to lift any
monomorphic Haskell function into a Haskell arrow of any user-defined
Haskell CCC.

This is very cool, but also an experimental research project. It’s
very unstable and unreliable. It’s not packaged on Hackage so fairly
difficult to install as a library, and looks hard to maintain precisely
because it’s a compiler plugin.

And again, not every category is cartesian-closed, especially if
exponentials have to be plain Haskell functions — which concat
enforces.

However, 5 days ago I stumbled upon some paper by
sheer luck. And well, I’m happy to report that “overloading” the
lambda abstraction is completely doable. Not only
that: it is actually very easy and doesn’t require any advanced
Haskell feature, nor any kind of metaprogramming. No. library.
needed.

I cannot emphasize enough how powerful the approach appears to
be:

You can define your very own proc notation for
any category you desire.
That is, as soon as you can provide an instance for
Category k, you can already overload the lambda
abstraction to write your morphisms. Even if your category isn’t an
instance of Arrow because of this arr
function. You do not have to be able to embed pure Haskell functions
in your category.
The resulting syntax is way more intuitive than the
proc notation, simply because you can manipulate morphims
k a b of your category as plain old Haskell
functions, and therefore compose them and apply them to variables
just as any other Haskell function, using the same syntax.
Implementing the primitives for overloading the lambda
abstraction is a matter of about 10 lines of Haskell.

It’s in fact so simple that I suspect it must already be documented
somewhere, but for the life of me I couldn’t find anything. So
here you go. I think this will be very useful for EDSL designers out
here.

A toy DSL for flow diagrams

So let’s say our EDSL is supposed to encode flow diagrams, with boxes
and wires. Boxes have distinguished inputs and outputs, and wires flow
from outputs of boxes to inputs of other boxes.

We can encode any flow diagram using the following
operations:

{-# LANGUAGE GADTs #-}

data Flow a b where
  -- just a wire
  Id :: Flow a a
  -- putting two boxes one after the other
  Seq :: Flow a b -> Flow b c -> Flow a c
  -- putting two boxes one next to the other
  Par :: Flow a b -> Flow c d -> Flow (a, c) (b, d)
  -- box that duplicates its input
  Dup :: Flow a (a, a)
  -- box that gets rid of its input
  Void :: Flow a ()
  -- box that projects on first input
  Fst :: Flow (a, b) a
  -- box that projects on second input
  Snd :: Flow (a, b) b
  -- finally, we embed any pure function into a box
  Embed :: (a -> b) -> Flow a b

There are probably enough constructors here to claim it is a
cartesian category, but I didn’t double check and it doesn’t really
matter here.

I think we can agree that although this may very well be the right
abstraction to internally transform and reason about diagrams, what an
awful, awful way to write them. Even
if we provided a bunch of weird infix operators for some of the
constructors, only few Haskellers would be able to make sense of this
gibberish.

What I want is a way to write these diagrams down using
lambda abstractions and variable bindings, that get translated to this
internal representation.

We can give an instance for both Category Flow
and Arrow Flow:

import Control.Category
import Control.Arrow

instance Category Flow where id = Id ; g . f = Seq f g

instance Arrow Flow where
  arr     = Embed
  first f = Par f id
  second  = Par id
  (***)   = Par
  f &&& g = Seq Dup (Par f g)

{-# LANGUAGE Arrows #-}

t :: Flow (a, b) (b, a)
t = proc (x, y) -> returnA -< (y, x)

Seq (Embed(_)) (Seq (Embed(_)) (Embed(_)))

We just wrote swap. But Haskell doesn’t even
try to use the operations we’ve just defined in
Arrow Flow, and just lifts pure Haskell functions using
arr. The terms you get are not
inspectable, they are black boxes. Not. good. Granted,
morphisms Fst and Snd are not part of the
interface of Arrow, so this example is a bit unfair. But if
you’ve used the proc notation at all, you know this isn’t
an isolated incident, and arr eventually always shows
up.

What I’m gonna show you is how to enable the following,
straight-to-the-point syntax:

t :: Flow (a, b) (b, a)
t = flow \(Tup x y) -> Tup y x

t = Seq Dup (Par (Seq Id Snd) (Seq Id Fst))

Evaluating Linear Functions to
Symmetric Monoidal Categories (SMCs).

In this paper, they explain that if CCCs are models of the simply
typed lambda calculus, it is “well-known” that SMCs are models of the
linear simply-typed lambda calculus. And thus, they show how
they are able to evaluate linear functions (as in, Linear
Haskell linear functions) into arrows in any target SMC of your
choice.

They even released a library for that: linear-smc.
I implore you to go and take a look at both the paper and the library,
it’s very very smart. Sadly, it seems to have gone mostly
unnoticed.

This paper was the tipping point. Because my target category
is cartesian (ergo, I can duplicate values), I suspected that I could
remove almost all the complex machinery they had to employ to go from a
free cartesian category over a SMC to the underlying SMC. I was hopeful
that I could ditch Linear Haskell, too.

And, relieved as I am, I can tell you that yes: not only can all of
this be simplified (if you don’t care about SMCs or Linear Haskell), but
everything can be implemented in a handful lines of Haskell code.

Interface

So, the first thing we’re gonna look at is the interface
exposed to the users of your EDSL. These are the magic primitives that
we will have to implement.

First there is this (abstract) type Port r a. It is
meant to represent the output of a box (in our flow
diagrams) carrying information of type a.

Because the definition of Port r a is not
exported, there is crucially no way for the user to retrieve a
value of type a from it. Therefore, to use a “port
variable” in any meaningful way, they can only use the
operations on ports that you — the library designer —
export.

And now, the two other necessary primitives:

encode :: Flow a b -> Port r a -> Port r b
decode :: (forall r. Port r a -> Port r b) -> Flow a b

encode transforms a morphism from your category —
here a diagram Flow a b — into a Haskell functions on
ports. By exporting this function, you enable users to apply any
morphism in your category to the “port variables” they have at
their disposal. And precisely only those operations.

decode does the reverse translation. It
takes any Haskell function on ports, and converts it back into
a morphism in your category. Well, not any function. Only
functions that are closed with respect to port variables.
That’s why there is this type variable r in
Port r a. Because all the operations on ports exported by
this interface share the same r between all inputs and
outputs, there is no way to use a port variable defined outside
of a function over ports, and still have the function be quantified over
this r.

Control.Monad.ST.
This is really really neat.

encode and decode can be defined for any
Category k. But if k is also
cartesian, we can provide the following additional primitives:

pair :: Port r a -> Port r b -> Port r (a, b)
unit :: Port r ()

flow = decode

fst :: Port r (a, b) -> Port r a
fst = encode Fst

snd :: Port r (a, b) -> Port r b
snd = encode Snd

split :: Port r (a, b) -> (Port r a, Port r b)
split p = (fst p, snd p)

pattern Tup x y <- (split -> (x, y))
  where Tup x y = pair x y

void :: Port r a -> Port r ()
void = encode Void

box :: (a -> b) -> Port r a -> Port r b
box f = encode (Embed f)

(>>) :: Port r a -> Port r b -> Port r b
x >> y = snd (pair x y)

box1 :: Port r Text -> Port r (Int, Int)
box2 :: Port r Text -> Port r Bool
box3 :: Port r Int -> Port r ()
box4 :: Port r Int -> Port r ()

diag :: Flow (Text, Text) Bool
diag = flow \(Tup x y) -> Flow.do
  let Tup a b = box1 x
  box3 a
  box4 b
  box2 y

Because (>>) is defined in such a way,
box1, box3 and box4 will
appear in the resulting diagram even though their output gets
discarded. Were we to define x >> y = y, they would
simply disappear altogether. This technique gives a lot of control over
how the translation should be specified.

Finally and most importantly, box is the
only way to introduce pure Haskell functions
a -> b into a black box in our diagram. For this reason,
we can be sure that Embed will never show up if
box isn’t used explicitely by the user. If your category
has no embedding of Haskell functions, or does not export it,
it’s impossible to insert them, ever.

Implementation

And now, the big reveal… Ports r a are just morphisms
from r to a in your category!

This is the implementation of the primitives from
earlier.

newtype Port r a = P { unPort:: Flow r a }

encode :: Flow a b -> Port r a -> Port r b
encode f (P x) = P (f . x)

decode :: (forall r. Port r a -> Port r b) -> Flow a b
decode f = unPort (f (P id))

pair :: Port r a -> Port r b -> Port r (a, b)
pair (P x) (P y) = P (x &&& y)

unit :: Port r ()
unit = P Void

And indeed, if you consider Port r a to represent paths
from some input r to output a in a diagram,
then by squinting your eyes a bit the meaning (in your
category) of this interface makes sense.

For encode, surely if you know how to go from
a to b, and from r to
a, then you know how to go from r to
b: that’s just composition, hence
(.).
For decode, you have as assumption that from any
point r in a diagram, you know how to transform a path from
r to a into a path from r to
b. In particular, because this holds for any
r, this holds for a. And you know a path from
a to a, it’s the identity! This gets you a
path from a to b.

It would appear encode and decode being
inverse of one another is yet another instance of the Yoneda lemma, but
I’m not a categorician so I will not attempt to explain this any more
than that, apologies.

Still, the sheer simplicity of this technique is mesmerizing to me.
It kinda looks like doing CPS, but it isn’t CPS. Quoting the
paper:

the encoding from k a b to
P k r a ⊸ P k r b can be thought of as a transformation to
continuation-passing-style (cps), albeit reversed — perhaps a
“prefix-passing-style” transformation.

Apparently it is the dual of CPS or something.

And you can double-check that in order to implement
encode and decode you just need
id and (.). Now to allow more stuff than just
composing morphisms as functions, more primitives can be added, like
pair using (&&&), and
unit using Void :: Flow a () — which may or
may not be available, dependending on the kind of categories you’re
working with.

Some things to be wary of

Ok, so now we’ve seen how to successfully “overload” the lambda
abstraction. But there is one quirk that I think you should be aware
of.

Preventing (too
much) duplication in the diagram

If you look at the definition of split p, you can see
that the input p gets duplicated in the two output ports.
If both of them are used in a given diagram, p will appear
at least twice in this diagram. Now in the abstract setting of
category theory, duplication in your diagram doesn’t matter because of
the equivalences and laws that equate diagrams.

achille. You could imagine changing the
Embed constructor of Flow to the
following:

Embed :: (a -> IO b) -> Flow a b

copy :: Port r a %1 -> (Port r a, Port r a)

Indeed, the type system will prevent you to use a variable more than
once without explicitly inserting copy. However, everything
has to be made linear and this has huge implications on the
user-friendliness of the overall syntax. I’ve also found error messages
from Linear Haskell to be pretty uninformative — and let
bindings are not linear, what??.

Instead, I came up with a solution I’m fairly happy with. There is a
trick: we introduce a new primitive.

(>>=) :: Port r a -> (Port r a -> Port r b) -> Port r b

Now of course, you could just implement it as
x >>= f = f x, but that’s precisely what we want to
avoid. No, instead, I force the evaluation of x,
and then evaluate f applied to a box that simply
returns the pre-computed value, side-effect free. So a box gets
duplicated, sure, but this box does nothing. What I am now
noticing while writing this down, is that introducing this
Bind operator makes the diagram non-inspectable again,
oops. I can still run it of course, but this is a problem for future
me.

Anyway, the benefit of this (>>=) operator is that
it’s very convenient with QualifiedDo:

diag :: Flow FilePath Bool
diag = flow \src -> Flow.do
  Tup a b <- box1 src
  box3 a
  box4 (box2 b)

You just have to stop using let and you’re good to go.
What’s remarkable in the syntax is that these
effectful boxes are used as plain old
functions. Therefore there is no need to
clutter the syntax with <$>, >>=,
pure, -<, returnA and a bind
for every single arrow like in the proc notation: you just
compose and apply them like regular functions — even when they are
effectful.

Compile vs. Evaluate

Now the paper is called “Evaluating […]” and says there is a runtime
cost to pay for the translation. Considering how simple this
implementation is, and how the target is fully first-order (well,
without (>>=)), I wouldn’t be surprised if GHC is
actually able to fully unfold and translate to category morphisms at
compile-time. But I’m not an expert Haskell developer by any means and
have no clue how one would go about checking this, so if anyone does, please tell me!

Expressive power

I’m very curious to see which kind of categories you can “compile”
Haskell functions to using this technique. I didn’t expect converting to
cartesian category’s morphisms to be so straightforward, and I’m looking
forward to see whether there is any fundamental limitation preventing us
to do the same for CCCs.

Recursive morphisms
and infinite structures

Someone
asked on Reddit whether recursive morphisms can be written out using
this syntax, wihtout making decode loop forever.
I’ve thought about it for a bit, and I think precisely because this is
not CPS and we construct morphisms k a b rather
than Haskell functions, decode should not be a problem. It
should produce recursive morphisms where recursive occurences
are guarded by constructors. Because of laziness, looping is not a
concern.

But I haven’t tried this out yet. Since in achille recursive recipes can be
quite useful, I want to support them, and so I will definitely
investigate further.

Thank you for reading! I hope this was or will be useful to some of
you. I am also very grateful for Bernardy and Spiwack’s fascinating
paper and library, it quite literally made my week.

Feel free to go on the
r/haskell thread related to this post.

Till next time, perhaps when I manage to release the next version of
achille, using this very trick that I
had been desperately looking for in the past 2 years.

2024-12-28 13:18:34

Abstraction Haskell Lambda Overloading video dawnloader free video dawnloader free online VideoDDD

Overloading the lambda abstraction in Haskell

A toy DSL for flow diagrams

The solution

Interface

Implementation

Some things to be wary of

Preventing (too
much) duplication in the diagram

Compile vs. Evaluate

Expressive power

Recursive morphisms
and infinite structures

Leave a Reply Cancel reply

Overloading the lambda abstraction in Haskell

A toy DSL for flow diagrams

Interface

Expressive power

Recursive morphisms and infinite structures

Leave a Reply Cancel reply

Recursive morphisms
and infinite structures