1Subtyping in Haskell with canonical injections

1.1Background

In Haskell, Just is the only sound way to make a Maybe t from t. We call such unambiguous construction a canonical injection. The compiler should insert canonical injections to recover from trivial type errors. Such recovery has these interesting benefits:

  • Now we can fake non-overlapping untagged unions (Haskell has tagged unions). This reduces the bureaucracy when composing data types following the free-monad/interpreter pattern; the compiler inserts the required Pure constructors.
  • Now we can fake equirecursive types (Haskell has isorecursive types). We no longer have to type "In" and "out" when using "Fix"; the compiler inserts the required bureaucracy.
  • Now we can fake heterogeneous lists and heterogeneous collections.
  • Now we can have covariant return type and contravariant return type. This means now we can, without breaking callers, generalize an argument or specialize a return value, as in Java.

1.2What is here, and how it answers your questions

This is what I am going to do in this article, in no particular order:

  • [NOTATION] defines some notations.
  • [DEFINE] defines canonical injections.
  • [DATA] shows the rules for inferring canonical injections for inductive data types.
  • [LAMBDA] shows the rules for inferring canonical injections for lambdas.
  • [HICKEY] shows that Hickey's two wishes are special cases of canonical injections.
  • [MONAD] shows the relationship between canonical injections and Monad instances.
  • [IMPL] sketches a possible implementation, in principle.

This is how I answer your questions:

  • Your first question is answered by [DEFINE], [DATA], and [LAMBDA].
  • Your second question is answered by [HICKEY] and [LAMBDA].
  • Extra things that may be of interest to you are [MONAD] and [IMPL].

1.3[NOTATION]

Notation conventions.

By "injection", I mean an injective function.

"The canonical injection from A to B" means "the only sound injection from A to B". By "sound", I mean "not involving bottom (undefined)". I assume total functional programming, and I say "the only possible injection" to mean "the only sound injection".

1.4[DEFINE] Examples of canonical injections

"Canonical" means "there is exactly one". "Injection" means injective function. A canonical injection from A to B is the only injection from A to B.

Examples of canonical injections:

  • In the case of t -> Maybe t, Just is the canonical injection.
  • In the case of a -> Either a b where a != b, Left is the canonical injection.
  • In the case of a -> Either a a, each of Left and Right is an injection, so there is no canonical injection here.

Thus some constructors are canonical injections.

1.5[DATA] Inferring canonical injections for inductive data types

In general, the compiler infers canonical injections from a data definition using this rule: If parameter p occurs exactly once in the right-hand side of the equal sign in "data A p = …", then the only constructor with p is the canonical injection from p to A p.

For example, observe that, in the following definition, in the right-hand side of the equal sign:

  • Int occurs twice,
  • String occurs once, and
  • p occurs once.

Therefore, from the above definition, the compiler infers that:

  • A1 : String -> A p is the canonical injection from String to A p.
  • A3 : p -> A p is the canonical injection from p to A p.

1.6[MONAD] Relationship between canonical injections and Monad instances

Let INJ be the canonical injection from p to F p. Let F be an instance of Monad. Then the compiler infers "return" and one case of "bind" as follows:

Indeed I think these have to be laws: If INJ is the canonical injection from t to F t, and F is an instance of Monad, then:

  • return has to be equal to INJ,
  • INJ x >>= k must be equal to k x.

I think the above laws relate this concept of "canonical injections" with your concept of monads as "conservative extensions of spaces".

There is also a law for the other way around: the return function must be an injection, although the user is responsible for ensuring that. I think the compiler should assume that return is a canonical injection, and exploit such injection with this rule: If an x : t is found where an f t is expected, and f is an instance of Monad, then replace x : t with return x : f t.

This rule also works in a nested situation. This should typecheck:

0 :: (Monad m, Monad n) => m (Maybe (n Int))

1.7[HICKEY] Hickey's two wishes

Yes, my rule should work in both of Hickey's cases, because there is a canonical injection in each of those cases. Note that Hickey's cases correspond to wanting the compiler to automatically apply these canonical injections:

Note that the type of inj is "today -> yesterday", not "yesterday -> today". (Do you see why?)

Our rewriting the function changes its type from "yesterday" to "today", but the unchanged callers still expect that the function has the type "yesterday".

1.8[LAMBDA] Inferring canonical injections for lambdas

Here I generalize Hickey's cases to all lambdas.

Notation convention: I write the dependently-typed expression "inj A B" to mean the canonical injection from A to B, if such canonical injection exists. The type of the expression "inj A B" is A -> B.

These two rules define canonical injections for all lambdas: For all types A, B, and C:

That should also work with currying and higher-order lambdas.

1.9[IMPL] A possible implementation, in principle

Suppose that a compiler encounters a type error.

Let s : S be the supplied (actual) expression and type (what the user actually types).

Let R be the required (expected) type.

Then, if there is the canonical injection inj : S -> R from S to R, the compiler should behave as if the user had typed "inj s" from the beginning.

In principle, it is possible to write a Haskell interpreter in Prolog, and add our own inference rules, such as inferring canonical injections and inserting canonical injections. I think [DATA] and [LAMBDA] sufficiently define canonical injections for all Haskell 98 types.

1.10History

This originated as a letter to Abdullah on <2018-12-18>.

<2018-12-18>

  • First publish date.
  • First revision.

Alternative titles:

  • Faking untagged unions and equirecursive types in Haskell
    • This sounds promising.
    • But this is misleading. This feature requires modifying the compiler, so the code is not "in Haskell".
  • Inferring and inserting canonical injections in Haskell
    • This was the original working title.
  • Recovering from trivial type errors in Haskell
    • Unclear about what is being sold.
  • A monad is a way of conservatively extending all spaces

The concept I'm proposing already exists with the same name ("canonical injection"). https://en.wikipedia.org/wiki/Inclusion_map

<2018-12-14>

Questions leading to this article:

  • Does "algebraic subtyping" mean adding the following rule to the compiler: "for all x, t: everywhere an x : t is found where a Maybe t is expected, replace x : t with Just x : Maybe t"?
  • Does "algebraic subtyping" mean that the compiler "recovers" from certain (injective) type errors?

1.11Half-baked but related ideas, should be in the article

1.11.1A subtyping relation arises from canonical injections

Remember:

  • "A is a subtype of B" means an A can be supplied wherever a B is expected.
  • The existence of a canonical injection from A to B suggests that A is a "subset of" ("is contained in") B.

We can define this subtyping relation: A is a subtype B iff there is a canonical injection from A to B.

Subtyping by canonical injection obeys Liskov substitution principle.

1.11.2Subtyping in Haskell?

TODO summarize:

1.11.3A weak anthropocentric argument for automating canonical injections?

We should conflate x and Just x. Just is there only to make the compiler happy. For us humans, x and Just x have the same semantics.

1.11.4Newtypes should be replaced with proofs.

2Interpreting Haskell without types: a strong temptation

Suppose that we are writing a Haskell interpreter in Prolog, but we are too lazy to check the types, so we simply ignore all type annotations. Let's call the language HWOT (Haskell WithOut Types), pronounced like "what". What do we have then? A normal-order beta-reducer/term-rewriter? Something like the Pure language1?

2.1The temptation is strong

The temptation to ditch types is strong.

Untagged unions are trivial in HWOT. We can write this in HWOT but not Haskell:

-- is_list :: a -> Bool
is_list [] = True
is_list (_:_) = True
is_list _ = False

-- size :: List a + Tree b -> Integer
size [] = 0
size (_:x) = 1 + size x
size Leaf = 0
size (Node _ lef rig) = size lef + size rig

-- my_list :: [Integer + String]
my_list = [1, 2, "three"]

Open functions (functions defined in several modules) are also trivial in HWOT.

Overloading and multimethods are trivial if constructors are namespaced.

fun ModA.Con = 0
fun ModB.Con = 1

We can make a function argument optional backward-compatibly. We can monkey-patch our programs, although this may lead to maintenance hell.

-- We have a function f : [a] -> Int.W

f [] = 0
f _ = 1

-- Now we want to make the argument optional.

-- We simply write this somewhere else.
-- This doesn't even have to be in the same file that defines the original f.

f Nothing = 0
f (Just x) = f x

We can even write rewrite rules like GHC rewrite rules for stream fusion. The head of a pattern does not have to be a "constructor".

map f (map g x) = map (f . g) x

Data-type-a-la-carte become much shorter. No need for pattern synonyms. Parsing, location, trees-that-grow, everything becomes much shorter.

eval (Add x y) = eval x + eval y

-- define in other file

eval (Neg x) = - x

-- data Loc a = MkLoc FilePath Int Int a
-- parse :: String -> Loc Exp

-- Simply add this anywhere.
eval (MkLoc _ _ _ e) = eval e

Unsolved problems: type classes.

Type classes are replaced with implicit first arguments?

Higher-order patterns may be too much to handle. They may break confluence2.

\forall f x . f (Just x) = f x

1 + "a" is not an error; it is a value that reduces to itself.

1 / 0 is not an error; it is a value that reduces to itself.

If f is a "partial" function and f x is "undefined", then f x is not an error; it is a value that reduces to itself.

I think that type systems should be separate from languages (that is, type systems should be Curry-style, not Church-style; that is, Erlang-Dialyzer-style, not Haskell-style). Users should be able to write their own type systems. But even an interpreter has an implicit type system.

2.2Compiling HWOT to Haskell?

  1. Write "crazy" programs in HWOT.
  2. Translate it to Haskell.
  3. ???
  4. PROFIT

3What I learn from teaching Haskell to a beginner

3.1Three levels

Level 0 is value-level.

Level 1 is type-level.

Level 2 is kind-level.

3.2Substitution, equality, rules, pattern-matching

3.3GADT syntax is more intuitive than Haskell 98 data syntax

GADT syntax makes it obvious that Just is a function and that Just 1 is a value.

data Maybe a where
    Nothing :: Maybe a
    Just :: a -> Maybe a

With Haskell 98 data syntax, I have to explain to the beginner, why the equal sign does not mean equality, that is, substitutability.

Syntax should represent semantics and be consistent with semantics.

3.4Functor, Applicative, Monad, extension of spaces

Extension of spaces (a la Abdullah):

fmap  ::   (a -> b) -> (f a -> f b)
(<*>) :: f (a -> b) -> (f a -> f b)
(=<<) :: (a -> f b) -> (f a -> f b)

And then explain (>>=) as swapped (=<<).

3.5Monad is for do-notation

No need to turn off a beginner with category theory.

3.6Haskell's type system leaves some things to be desired

Why can't I have a type that is inhabited by {Nothing, 0, 1, 2, …}? Why do I have to write the Just in {Nothing, Just 0, Just 1, Just 2, …}? Are Abdullah and I the only people who think this way?

4A mess; do not see

4.1Open ADTs (algebraic data types)

4.2Can we extend Haskell to "auto-fmap"?

  • Possibilities:

    • Add rewrite rules so that the compiler "recovers" from some type "errors".
    • Extend the syntax and semantics of function application.
  • Related

    • 1989, article, Wadler, "Theorems for free!"
    • The Haskell Djinn can, given a type T, infer/construct a term having type T.
  • Recovering from some type errors

    • Idea

      • Extend Haskell with "implicit injections".
      • The compiler should try in-scope injections automatically when there is a typing error, before quitting with a type error.

        • Isn't this similar to Scala implicits and implicit conversion?

          • I forgot who, but I think somebody on the Internet said that Scala implicits are a way for the compiler to recover from type errors.
      • Can we do this on GHC?

    • Define the concept of "expected type".
    • Let e be an expression.
    • Let f : a -> b.
    • Let m be an instance of Monad.
    • If e has type a, but the compiler expects e to have type m a, then the compiler shall rewrite e to return e.
    • If e has type m a, then the compiler rewrites f e to map f e.

  • If x is a Monad, then these are two different things: x : a and return x, but they are related, in the sense that they are equivalent, in the sense that one is trivially computable/derivable from the other.
  • Can Strathclyde Haskell Enhancement (SHE) do this?

  • https://en.wikipedia.org/wiki/Bidirectional_transformation

4.3Auto-lifting (and therefore sequencing) of function application involving Monad instances

  • The standard rule is:

    • If x : a and f : a -> b, then f x : b.
  • Suppose that m has a Monad instance.

    • If x : m a and f : a -> b, then should the compiler silently translate f x to x >> return . f=?

      • Isn't it the only desirable way of putting together f and x?

        • Monad class requires that x >> return . f= be equivalent to fmap f x.

          • So there is really only one way to do it, isn't it?
        • Examples of non-desirable ways: unsafeCoerce, undefined.

    • Should the compiler also appropriately translate f x for all these combinations?

      • Possibilities for the type of x:

        • a
        • m a
      • Possibilities for the type of f:

        • a -> b
        • a -> m b
        • m (a -> b)
        • m a -> m b
        • m a -> b
  • At first glance it seems convenient, but what are the consequences?

    • Some I can think of

      • Confusing error message

        • Suppose:

          • The programmer makes a typing mistake.
          • The compiler infers the wrong type.
          • The compiler performs translation based on the wrongly inferred type.
          • The compiler produces a confusing error message.

4.4Equirecursive types?

Haskell has isorecursive types. Can we make it use equirecursive types?

  • Can we make it automatically insert roll-unroll/fold-unfold/In-out?
  • How do we compose monads seamlessly?

    • Isorecursive types?
    • True sum types (untagged unions)?
  • "System F-omega with Equirecursive Types for Datatype-Generic Programming"?

4.5<2018-12-15> Bootstrap GHC?

https://twitter.com/ErikDominikus/status/1073726987338842112

How about writing a Haskell interpreter with Prolog? I guess Haskell type checker takes ~100 lines of Prolog, and parser takes ~200 lines. It may be doable in a month. I have ~20 lines of Prolog type-checking Haskell AST but without type classes.

Add ~500 more lines of Prolog metaprogram for translating lists to arrays, ~2000 more lines for translating Prolog to optimized x86_64 native code (if not reinventing LLVM), 1 more month, and … we may beat GHC at its own game? :)


  1. https://en.wikipedia.org/wiki/Pure_(programming_language)

  2. https://en.wikipedia.org/wiki/Confluence_(abstract_rewriting)