1 Requirements for the language?

1.1 Interoperation

  • Must be able to work with C.
    • Must be able to preprocess/parse/read/translate/compile C code.
  • <2020-02-21> Try to target an existing intermediate representation instead of proliferating new ones:
    • LLVM IR, most probable choice
    • C11
    • Racket fully-expanded programs
    • Guile?
    • CIL/MSIL
    • JVM
    • GHC Core, System F
    • x86-64
    • Parrot
    • GraalVM

1.2 Numbers

  • Numbers should be as in mathematics, unless the programmers declare that they want a variable to have a machine-optimized type.
    • Not all real numbers can be represented in a finite computer.
    • Should (sqrt 2) evaluate to a rational number or a symbolic expression?
      • Should expressions be evaluated symbolically?
      • Do we want a computer algebra system, a programming language, or a modeling-and-transformation language?
        • Should we target programmers, scientists, businesspeople, all of them, some of them, or none of them?

1.3 Modeling

  • The programming language must be convenient for specifying models, model-to-model transformations, and model-to-code transformations?
  • The modeling language should not require mutation.
  • Model transformations should be stateless. Running the same model transformation with the same input should produce the same output.

1.4 Names

  • Multiple programmers must be able to use the same short name without clashing.
  • Overloading must be supported. E.g. a "get" function should work with lists, arrays, and byte strings. The user should be able to extend overloads?
  • The importer must be able to rename all importees.
  • A file must not name its own package/namespace.
  • There is no global namespace. Piggyback on the operating system's file system namespace.
  • Consequence: moving code into another file requires changing all of its users.

1.5 Structured documentation

  • Document a file.
  • The language must have a form for documenting forms. Documentation should be in a structured form, not in an unstructured string, let alone comments.
  • The documenting forms should capture the intentions.
  • The Abstract Syntax Tree must include comments.
  • "comment" special form, not evaluated
  • "doc" special form, not evaluated
(comment "FOOBAR")
(comment foo bar qux)

(define (f x)
  ;; doc in definition
  (doc :summary "Print something.")
  (println x))

;; doc outside definition
(doc f :summary "Print something.")

1.6 Strings

  • The source code is a byte string, not a character string.
  • The source code character encoding is 8-bit ASCII in the following sense:
    • Inside a byte string literal, the bit 7 of a byte may be one.
    • Outside a byte string literal, the bit 7 of every byte must be zero.
    • A byte string literal may contain literal (unescaped) line-feed characters.
  • Programmers should not use non-ASCII characters for identifiers in source code. See https://en.wikipedia.org/wiki/IDN_homograph_attack.
  • If programmers insist on using non-ASCII characters for identifiers, they are responsible for encoding and normalization.

1.7 Parsing, read, and write

  • The abstract syntax tree must not depend on indentation.
  • For analyzability, the parser must not execute arbitrary code. The grammar must not be arbitrarily customizable. The parser must not require the evaluator. The parser should contain as little state as possible.
  • All errors, including run time errors, must blame a precise location (to the column number), unless the programmers explicitly specify that they prefer performance to traceability.
  • Line numbers begin from zero.
  • Column numbers begin from zero.
  • "read" must be an inverse of "write".
  • "read" must accept every output of "write". Unreadable things must be unwritable as well.
  • "read" must not execute arbitrary code.
  • "write" is intended for machine consumption by "read".
  • "print" is intended for human consumption.

1.8 Compiler

  • The compiler must not take more than one minute of wall-clock time or 2 GB of RAM to compile itself on my modest amd64 machine.
  • The bootstrapping (stage-0) interpreter should be written in a reasonably portable language implementation such as C.
  • The bootstrapping (stage-0) interpreter must be optimized for simplicity, portability, conciseness, minimality, and understandability. It can be naive (mark-and-sweep GC, linked lists for everything, etc.).
  • The stage-1 interpreter must be written in its own input language.
  • The compiler must be self-hosting: it must be written in its own input language.
  • The bootstrapping (stage-0) interpreter runs the stage-1 interpreter, which runs the compiler.

1.9 Polymorphism

  • For run-time speed, it must be possible to resolve some polymorphic parts when the program is compiled so that those polymorphic parts are monomorphic at run time.

1.10 Security

  • There should be a way to limit the time and memory used by every program, including "read".
  • There should be a way to limit everything, including file-system access?

2 Collecting the semantic design puzzle pieces

How should we define the semantics? What properties do we want? Composability/compositionality?

How to design a language: ask critical questions and find corner cases. The answers constrain the language and justify the design.

Here we collect pieces of the semantic design puzzle.

There are two natural starting points: machine code or mathematics. Where should we begin?

Should we bet our future on the engineers, the scientists, the philosophers, or the mathematicians?

In this section, we motivate and define this:

meaning : (Expr,State) -> (Expr,State)

2.1 Motivation: From a humble calculator

Computers can be thought of as calculators with state. By "state", we mean "memory" or "storage". So, we begin by modeling a stateless calculator, and then we gradually add features.

Originally, computers were invented to help people do arithmetic, which is a part of mathematics. Thus we should start from mathematics.

We begin by writing a function that evaluates calculator expressions, and then we generalize that. Let us name the function "meaning : Exp -> Val". That function may be defined like this:

Val = Int

x:Val   => x:Exp
x,y:Exp => x+y : Exp
x,y:Exp => x*y : Exp

x:Val   => meaning x = x
x,y:Exp => meaning (x+y) = meaning x + meaning y
x,y:Exp => meaning (x*y) = meaning x * meaning y
...

For example, meaning(2+3) = 5.
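
A minimal sketch of this evaluator in Scheme, under the assumption that an Exp is represented either as an integer or as a list (op left right):

;; meaning : Exp -> Val for the calculator language above.
(define (meaning e)
  (cond ((integer? e) e)   ; x:Val => meaning x = x
        ((eq? (car e) '+)
         (+ (meaning (cadr e)) (meaning (caddr e))))
        ((eq? (car e) '*)
         (* (meaning (cadr e)) (meaning (caddr e))))
        (else (error "meaning: unknown expression" e))))

;; (meaning '(+ 2 3)) => 5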

We add bindings/variables/substitutions. This generalizes "meaning" to "meaning : (Bindings,Exp) -> Val".

meaning bindings (Let1 name expr body) =
  let val = meaning bindings expr
      bindings_1 = bindings LEFT-UNION {name => val}
  in  meaning bindings_1 body
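
A sketch of this stage in Scheme, assuming bindings are an association list and Let1 is written (let1 name expr body):

;; meaning : (Bindings,Exp) -> Val, extending the calculator with let1.
(define (meaning bindings e)
  (cond ((integer? e) e)
        ((symbol? e) (cdr (assq e bindings)))   ; variable lookup
        ((eq? (car e) 'let1)                    ; (let1 name expr body)
         (let ((val (meaning bindings (caddr e))))
           ;; consing on the left is the LEFT-UNION: new bindings shadow old ones
           (meaning (cons (cons (cadr e) val) bindings)
                    (cadr (cddr e)))))
        ((eq? (car e) '+)
         (+ (meaning bindings (cadr e)) (meaning bindings (caddr e))))
        ((eq? (car e) '*)
         (* (meaning bindings (cadr e)) (meaning bindings (caddr e))))
        (else (error "meaning: unknown expression" e))))

;; (meaning '() '(let1 x 10 (* 2 x))) => 20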

Because Val is a subset of Exp, we generalize "Val" to "Exp".

Then we add state. This generalizes "Bindings" to "State".

We can rearrange "meaning : State -> Exp -> Val" to "meaning : (State,Exp) -> (State,Exp)". Then we can rearrange it further to "meaning : State' -> State'" with "current-expression : State' -> Exp".

2.2 An example of building a programming language

Another point of view is that programming languages are invented to make it easier to generate machine code.

Suppose that we start with an arithmetic expression evaluator.

(evaluate (* (+ 2 3) (+ 2 3)))

2.3 An interpreter is a state endofunction

When you interpret a paragraph, you interpret each sentence sequentially. When you interpret a sentence, you change your internal state.

Let X be 10. Compute 2*X. Forget it. Let X be 20. Compute 3*X.

Interpreting a program fragment changes the interpreter state and produces a residual program, which is then again interpreted to change the interpreter state, and so on.

;; An implementation sketch in Scheme?
(define (interpret s)
  (match (get-expression s)
    ((sequence x y ...)
     (define s2 (interpret (set-expression s x)))         ; interpret the first expression
     (interpret (set-expression s2 (sequence y ...))))))  ; thread the new state onward

An interpreter state has a global mapping, a current expression, and perhaps many other things not directly exposed to the program.

The meaning of an expression E is a State endofunction.

I think I'm going to be comfortable with that formalism. Let's try building a Lisp interpreter from the semantic function (Expr -> State -> (Expr,State)).

Expr is the first parameter so that we can write the curried expression meaning(E).

State = (Expr,Other_State)

meaning : Expr -> Other_State -> (Expr,Other_State)

meaning Current_State s = (s, s)
meaning (Calculate e) s = (calculate e, s)
meaning (Interpret e) s = meaning e s
meaning (Quote e) s = (e, s)
meaning (Define name expr) s = (Unit, ???)
meaning (Let name expr body) s = (Unit, ???)
meaning (Car e) s = car (meaning e s)
meaning (Cdr e) s = cdr (meaning e s)
meaning (Namespace e ...) s = (ns???, s')
meaning (Before x y) s =
  let (_, s') = meaning x s
  in  meaning y s'

...

To find the primitives, ask instead: "What is the syntax of fully expanded expressions?"

What are the primitives in McCarthy's original LISP?

From McCarthy 1959 [8]:

  • "the universal S-function 'apply' which plays the theoretical role of a universal Turing machine and the practical role of an interpreter"
  • Primitive functions: atom, eq, car, cdr, cons

Racket's syntax of its fully expanded programs

What are GHC's primitives?

Scheme? Kernel language? Qi/Shen?

John N. Shutt "Kernel is a conservative, Scheme-like dialect of Lisp in which everything is a first-class object." What does he mean by "everything"?

Problems with the Kernel language

I think we want these forms, and there is no way to implement them other than as primitives:

(current-state) ;; This is where we differ from other Lisps?
(quote Expr) ;; We also differ by making (quote 1) not equal to 1
(interpret Expr) ;; How does this differ from the eval in other Lisps?
(car Expr) ;; pair-first
(cdr Expr) ;; pair-second
(cons Expr Expr) ;; make-pair
(bind Expr) ;; compute lexical binding
(let1 Name Expr Body) or (lambda1 Param Body)
(before Expr Expr) ;; for ordering/sequencing evaluation
(eq? Expr Expr)
(equal? Expr Expr)
(read-syntax)
(read-concrete)
(calculate Expr) ;; ???

The "lambda" construct does two things: it both delimits a scope and binds occurrences. Can we separate them into two separate constructs?

(λx. [(λx. x) 1]) 2

;; (let1 Name Expr Body) delimits scope
(let1 'x 1 (let1 'x 2 x))

The let1 form is used by bind for lexical scoping.

The let1 form should use maximal sharing?

Note: (define (quote one) (quote 1)), not (define one 1).

meaning (Define name expr) s0 =
  let (iname,s1) = meaning name s0
      (iexpr,s2) = meaning expr s1
      s3 = s2 LEFT-UNION { iname => iexpr }
  in  (iexpr,s3)

Another question: should (define Name Expr) normalize to unit or to the value of Expr?

We agree that (define Name Expr) modifies the interpreter state. What we don't agree on is how Expr should be evaluated: lazily, eagerly, or not evaluated at all.

Consider the difference:

Let X be 1+2.

Let X be the result of 1+2.

Let "X" be "1+2".

Let X be John.

Let X refer to John.

Define "chair" as a seat that has back rest.

Define "chair" as a "seat" that has "back rest".

Find a chair, and sit on it.

(Not: Find a "chair".)

Quotes are also used for hedging, connotation, innuendo, codewords, humor, etc.

When we encounter "let X be <a complex expression>" while reading mathematics, we do not evaluate the complex expression.

(let x be (+ 1 2))
(let (quote x) be (quote (+ 1 2)))
(define chair as (extend seat with back-rest))

Should "interpret" be called "normalize" instead? Should we provide the programmers the means to "reduce" an expression one step instead of "normalizing" the expression?

The progn form can be derived by either left-folding or right-folding before.

(progn x y z) = (before x (before y z))
(progn x y z) = (before (before x y) z)
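
A sketch of the right-folding derivation as a Scheme macro, assuming a binary before form exists:

;; progn derived from the binary `before` by right-folding.
(define-syntax progn
  (syntax-rules ()
    ((_ x) x)
    ((_ x y ...) (before x (progn y ...)))))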

Which of these forms should be primitive, and what should they mean?

(define Name Expr) ;; global binding if occurs on top-level
(vector Expr ...)
(delimit-scope Expr ...) ???
(namespace Expr ...) ???
(apply Func Arg ...)
(list Expr ...) ;; can be stated in terms of cons

First-class interpreter state: There is an expression whose interpretation is the interpreter state.

An expression e is a value iff meaning(e,s) = (e,s) for every state s.

A state s is terminal iff interpret(s) = s.

(define x y)
=
(hash-table-set!
  (current-interpreter-state)
  (quote x)
  (interpret y))

When should an expression be rewritten?

Perhaps we all agree that (calculate (+ 1 2)) should evaluate to 3.

2.4 How should we delimit scopes?

TODO:

  • Constructs for delimiting scope/context
  • First-class scope

2.5 Quotation

Quoting is not only a feature of written language but also of spoken language, although it is more pronounced in written language. In spoken language, quoting is indicated by intonation, gestures, or additional clarifying words. For example: if John says to Mary "write the name of your dog", and Mary's dog's name is "Doggy", then Mary usually interprets the utterance as "Write 'Doggy'" and not "Write 'the name of your dog'", unless Mary is joking or uncooperative. However, if John says "write as I dictate <pause> the name of your dog", then he may mean it literally. In spoken language, pragmatics plays a larger role than syntax. In written language, we elaborate syntax to compensate for missing intonation and gestures.

Example: Suppose Alice says to Bob, "Management hates it." If Bob writes "Alice said to me that management isn't too fond of it", he is not lying, although he could use another word like "hinted" or "suggested" instead of "said". If Bob writes "Alice said to me, 'Management isn't too fond of it'", he is lying.

Critical thoughts:

  • I think (equal? (quote 1) 1) should evaluate to #f, because "1" is not equivalent to 1.
  • I think (eq? (quote 1) 1) should evaluate to #f, because "1" is not identical to 1.
  • Indeed, I think (equal? (quote x) x) should evaluate to #f for all x, except in evaluation errors.

That is, quote should just quote, and not do anything else, let alone evaluate its argument.

The interpretation of (quote x) is x?

Quotation is not totally opaque. For example: It makes sense to say "The first letter in 'XYZ' is X", and thus the quoted "XYZ" is not an opaque entity that can only either be passed around or be unquoted.

The question is: What should (quote (1 2)) be?

It is confusing to talk about quotations. For example: (quasiquote (x (unquote y) z)) is (quote (x y' z)) where y' is the meaning of y.

Should equal? evaluate its arguments before comparing them? Yes, because "equal" does not mean "identical". 1+2 and 3 are equal in decimal arithmetic, but they are not identical. Two different names may refer to the same referent and thus be equal, but those different names are never identical.

Q: Calculate 1+2.
A: 3.
(calculate (+ 1 2))

Q: Calculate "1+2".
A: "1+2".
(calculate (quote (+ 1 2)))

Q: Calculate X+X where X=10.
A: 20.
(calculate (let ((x 10)) (+ x x)))
different from: (let ((x 10)) (calculate (+ x x)))
(calculate (let ((f (lambda (x) (+ x 1)))) (f (f x))))

2.6 Let, binding, substitution, maximal sharing, graphs, factoring

Here we motivate why let should be lazy and should introduce maximal sharing.

We begin with the problem in which a meaningful expression L(x,1/0) has a meaningless subexpression 1/0.

Suppose we have defined L(x,y) = x. The question: What should be the meaning of L(x,1/0)? We know that 1/0 is syntactically valid but meaningless. Solution: We add a bottom value (⊥) to the Universe. The bottom value represents errors. A bottom means an error, but the details of the error are ignored by the "meaning" function.

L x y = x
L x (1/0) = ?
let y = 1/0 in L(x,y)

I think we expect these to have the same meaning:

f x
let x' = x in f x'

That is, "f(x)" should have the same meaning as "let x' = x in f(x')". Therefore, "let" must do substitution, not evaluation. But naïve substitution is inefficient. But maximal sharing has the same semantics as substitution and is efficient.

However, we don't always want "let" to introduce maximal sharing. Sometimes we want "let" to evaluate the binding before evaluating the body. We have two choices: do strictness analysis like GHC, or let the programmers choose which one they want.

Thus, let should internalize the expression into a graph with maximal sharing. Each variable is evaluated at most once. At-most-once semantics. "before" can be used to order evaluation. Note that ordering evaluation does not mean forcing evaluation.
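
A sketch of the at-most-once semantics using Scheme's memoizing promises (delay/force); in a real implementation, references in the body would be rewritten to force automatically:

;; let with at-most-once evaluation: each binding is evaluated at most
;; once, and only if the body demands it with `force`.
(define-syntax let-shared
  (syntax-rules ()
    ((_ ((name expr) ...) body ...)
     (let ((name (delay expr)) ...)
       body ...))))

;; (let-shared ((x (begin (display "evaluated\n") (* 3 3))))
;;   (+ (force x) (force x)))   ; prints "evaluated" once, => 18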

meaning(x/0) = ⊥
if L(x,y) = x then
  meaning(L(x,y)) = meaning(x)
  meaning(L(x,⊥)) = meaning(x)
  meaning(L) = (U², U, {((x,y), x) : x,y ∈ U})?
  meaning(L(x)) = ??? Currying?
??? (normalize-in-normal-order (L x y)) ???

Church–Rosser theorem? In total functional programming, do eager and lazy evaluation produce the same result?

We don't want to repeat ourselves, so we add these features to our language: factoring, references, and substitutions.

(evaluate
  (let [(x 2)
        (y 3)
        (z (+ x y))]
    (* z z)))

2.7 Printing, externalization, representation

Not only do we want the computer to compute, but we also want it to show the result:

(print (evaluate (+ 1 2)))

2.8 What?

We want to "solve" a differential equation.

We create some things: a representation for differential equations, and a compiler (a translator) that, given a differential equation and its initial conditions, generates what? A procedure? A generator? A list of numbers?

;; 0 = Df(x) + 2 * f(x)
(approximate
  (differential-equation
    (= 0 (+ ([d f] x) (* 2 [f x])))
    functions (f)
    variables (x))
  method [euler
    initial-conditions [(x 0)]
    step-size 1.0e-6])

But the form is rather inflexible: What if the users want to implement their own methods? What if the users want to approximate other things, such as a system of equations?

We also want to plot the result…

We also want to implement iterative approximation algorithms, etc.

Then we want to parse.

Note the reference to my-char in the parse:interpret block.

(define-function (f port)
  (define-local-variables my-char my-string my-choice)
  (parse:interpret program (sequence
      (char) ;; read any char
      (set! my-char char) ;; read any char and store it to my-char
      (set! my-string (string of length 8))
      (char y)
      (char #\z)
      (char 0)
      (char code 32)
      (set! my-choice (choose (char x) (char y)))
      eof)
    with-input-from port))

Can we generate a pretty printer and a parser from a common description?

Informally, a printer is an inverse parser. For example: In a parser, the program (set! x char) reads a char from the stream and mutates x to refer to the char. In a printer, the same program dereferences a char from x and writes the char to the stream.

Note that in (define x 1), we do not set x to contain 1, but we set x to refer to 1.

Then we also want computers to store data, etc.

Computers are machines that help us do mathematics.

Mathematics is not limited to numbers. Mathematics is about unambiguous abstract thinking.

Computers manipulate bits; humans give meaning to computation (what a computer does).

I am impressed by how van Roy & Haridi 2004 [9] come up with alternative semantics.

We should not only make computers do something, but we should also make computers understand something, so that they can help us make them do something. When I first saw the delayed concurrent variable assignment semantics, I was amazed.

Finally, after all that hard work, we want to share our work. We want to improve our lives together.

2.9 <2019-12-12> For hygiene, resolve references before expanding macros

For hygiene, references must be resolved (bound) before macros are expanded.

Example:

A = "There is Andrew."
B = "There is Bob. He is talking."
A B = "There is Andrew. There is Bob. He is talking."

Naïve syntactic concatenation of A and B causes ambiguity.

But not if we resolve the references first.

RA = "There is Andrew."
RB = "There is Bob. He(Bob) is talking."
RA RB = "There is Andrew. There is Bob. He(Bob) is talking."

Ambiguity: The expression (f x) is ambiguous. If f refers to a procedure, the expression means "evaluate x to ex and then compute f(ex)". If f refers to a macro, the expression means "expand (f y) where y is a reference to x".

Term-rewriting rules / fexprs unify procedures and macros? The problem is that we want to let the programmers decide how and when a fragment is expanded?

But it is possible to write an AST transformer that produces an invalid AST. For example, one can pull a lexically scoped variable out of its scope.

We let the programmers decide. If they want hygiene, they can use AST transformers. If they don't want hygiene, they can use CST transformers.

2.10 Semantics

What should a symbol mean? It usually means a hash-table lookup, where the symbol is the key and the environment is the hash table. But is there a better semantics?

In human languages, the meaning of a symbol is usually determined by agreement/consensus between the users of the symbol. For example, I can define "foobar" to mean "table" in a document, and the readers will be able to understand the document if they play along.

The meaning of a symbol may be defined in terms of the meaning of other symbols. For example, "chair" may be defined as a "seat with back rest".

In human languages, the irreducible meanings are the direct experiences (such as the concepts represented by "red", "sweet", "happy"). For other examples of irreducible meanings, see Semantic primitives and Natural semantic metalanguage.

In Assembly, the irreducible meanings are the meaning of the execution of an instruction; such meanings can be formalized as state transformers. For example, the meaning of executing inc rax is to mutate the machine state such that rax now contains the previous value of rax incremented by one, modulo \( 2^{64} \).
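
Written as a state transformer (a sketch, with s ranging over machine states):

meaning(inc rax) = \( s \mapsto s[\texttt{rax} \mapsto (s(\texttt{rax}) + 1) \bmod 2^{64}] \)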

Perhaps we want something like Refal but in Lisp syntax?

How do we build meaning in mathematics? We may start from logic, axioms, natural numbers.

Jakobson's functions of language

In Lisps, the irreducible meanings are the meaning of the values, including the side-effects.

How do we distinguish between "Print 2 + 3" and "Print the result of calculating 2 + 3"? We use the use-mention distinction. See also B. C. Smith's PhD thesis.

A quoted word means itself.

Therefore, to design a programming language is to decide how to build meanings from a finite set of irreducible meanings. That is, how to build values.

However, meanings are inseparable from pragmatics. For example, the expected answer to "Can you pass me the salt?" is not the utterance "Yes", but the action of passing the salt.

interpret : Internal-Form -> Meaning
interpret : Abstract-Syntax -> Semantics

We must distinguish between an internal form and its external representation.

The read function transforms an external representation into an internal form?

(calculate (+ 1 2)) -> 3
(calculate (+ 1 2) into x) ???

A procedure can be thought of as a term rewriting rule (a reduction rule).

Should a define be interpreted as a hash-set! or as a rule definition?

Which syntax should we use to define a rule?

(rewrite x 1)

(rewrite (x) (f x) (+ x x))

(define-rewrite (forall (x) [(f x) (+ x x)]))

(rewrite (f :lit x :var) :to (+ x x))

(with-variables (x)
  (with-literals (f +)
    (with-undefined-symbols-as-literals
      (with-numeric-symbols-as-numbers
        (defrule (f 0) 1)
        (defrule (f x) (* x (f (- x 1))))
      ))))

(define-function (f x) (+ x x))

Should numeric symbols (symbols that look like numbers: symbols that consist of only digits) be treated as numbers? I think yes, because we have the vertical-bar syntax like |123| for writing arbitrarily named symbols, including symbols that merely look like numbers.

The meaning of a rule \( A \to B \) is to replace every matching occurrence of \(A\) with \(B\) in the current expression.

A function can be thought of as a rewriting rule; the function name matches literally; the function arguments match everything (are wildcards).

A symbol may be treated as a literal or a variable.

For example, in (define-function (f x) ...), the symbol f is a literal, and x is a variable.

In a function header, the pattern (head arg1 ... argN) matches every list that:

  1. has length N+1, and
  2. begins with something that has the same binding as head.

What should a list such as (x) mean?

What should a list such as (x y) mean?

2.10.1 Term rewriting semantics?

For efficiency, we require that the head of a rule begins with a literal, so that we can index the rules for fast matching/retrieval.

The programmers are responsible for ensuring confluence by avoiding ambiguous/overlapping rules.
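
A minimal sketch of such an indexed rewriter in Scheme. The names are hypothetical; a rule is stored as (head arg-patterns template), the head matches literally, and every other symbol in a pattern is a variable:

;; Rules are retrieved by their literal head symbol.
(define rules '())   ; list of (head arg-patterns template)

(define (add-rule! head args template)
  (set! rules (append rules (list (list head args template)))))

;; Match argument patterns against expressions; symbols are wildcards.
(define (match-args pats exprs bindings)
  (cond ((and (null? pats) (null? exprs)) bindings)
        ((or (null? pats) (null? exprs)) #f)
        ((symbol? (car pats))
         (match-args (cdr pats) (cdr exprs)
                     (cons (cons (car pats) (car exprs)) bindings)))
        ((equal? (car pats) (car exprs))
         (match-args (cdr pats) (cdr exprs) bindings))
        (else #f)))

;; Instantiate a template with the bindings collected by match-args.
(define (substitute template bindings)
  (cond ((symbol? template)
         (let ((hit (assq template bindings)))
           (if hit (cdr hit) template)))
        ((pair? template)
         (cons (substitute (car template) bindings)
               (substitute (cdr template) bindings)))
        (else template)))

(define (rewrite expr)   ; one top-level rewriting step
  (if (pair? expr)
      (let loop ((rs rules))
        (cond ((null? rs) expr)
              ((and (eq? (car (car rs)) (car expr))
                    (match-args (cadr (car rs)) (cdr expr) '()))
               => (lambda (bindings)
                    (substitute (caddr (car rs)) bindings)))
              (else (loop (cdr rs)))))
      expr))

;; (add-rule! 'f '(0) 1)
;; (add-rule! 'f '(x) '(* x (f (- x 1))))
;; (rewrite '(f 5)) => (* 5 (f (- 5 1)))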

2.10.2 Graph reduction semantics?

Should the semantics be formulated in terms of expression graph reductions/transformations?

An S-expression can be thought of as representing a tree (or, more precisely, a graph).

A value can be thought of as an irreducible one-vertex graph.

2.11 Security considerations

See file:secure.html.

The situation:

  • The programmer is the one who creates the program.
  • The user is the one who runs the program.
  • The programmer and the user may be two different people.

The problem: How does the user limit the maximum damage doable by the programmer?

If we want security, it cannot be an afterthought?

Performance considerations -> cost/performance model

Security considerations -> security/damage model

But the model is not the reality; we risk modeling the wrong thing.

Thus, in making claims about security, we prefer false negatives (the system is actually secure, but the model says it is insecure) to false positives (the system is actually insecure, but the model says it is secure).

In the end, a human has to verify whether the model's simplifying assumptions actually hold for the case at hand.

It is impractical for users to inspect the source code of every program they run. It is more practical for them to periodically back up their data into airgapped storage and verify that those backups work.

However, what about data "theft": unwanted leakage of data?

You are buying a book for your child. How can you be sure that the book does not contain any material not suitable for children? We can hypothesize a language called Familyspeak with these properties:

  • Congress defines a set of allowed words in Familyspeak. Every other word is forbidden.
  • The police enforces that law: it goes to the store and verifies that every book that claims to be written in Familyspeak indeed complies with the law.

However, even though Familyspeak prevents words inappropriate for children, it does not prevent ideas inappropriate for children. For example, Familyspeak syntax may allow the words "eat", "your", and "parents", but the idea "eat your parents" is inappropriate for children. Appropriate words can be arranged to convey inappropriate ideas.

We assume that user U's running programmer P's program proves that user U trusts programmer P. (What if U runs P's program accidentally or unknowingly?)

2.12 Arranging the puzzle pieces?

Now that we have the pieces of the puzzle (quotation, maximal sharing), how do we arrange them into a coherent picture?

3 Run-time system

Belikov 2015 [2]

3.1 On choosing the implementation language

We choose C++ as the implementation language because we don't know any better. We considered Rust and Go but could not make up our minds. We reject C because we want namespaces; we refuse to manually prefix every procedure name.

3.2 Memory management

We use a garbage collector because we believe that garbage collection greatly simplifies the language semantics. Also, we don't know how to implement a Lisp without garbage collection, as Bone Lisp, Pre-Scheme, Carp, newLISP, Linear Lisp, and ThinLisp do.

We use a copying garbage collector because we are convinced by Appel 1987 [1] (footnote 1) that "[naïve copying] garbage collection can be faster than stack allocation".

The drawbacks of our simple choices are:

  • We have to overprovision physical memory if we want our programs to run at a reasonable speed.
  • We lose real-time guarantees; the program may pause for an unpredictable duration at inopportune times.

We may wish to do these later:

  • Improve the garbage collector to be generational and concurrent. Currently we stop the world while we collect garbage because we don't know how to do it concurrently.
  • Implement alternative garbage collectors and let the programmer choose.
  • Write a compiler for, say, PL-1, a language with manual memory management, and probably also static typing, on top of PL-0. Thus the real-time part of the program can be written in PL-1 while seamlessly interoperating with PL-0.

How do we trace the references?

What is a reasonably simple implementation?

class Object {
    // How do we maintain iterator state without new/malloc?
    // Can we just assume that the iterator state is always an intptr_t?
    // That holds for pair, list, vector, and hash-table, but it does not
    // hold for all types (CST)?

    What_Should_This_Be begin_tracing_references();

    // Or should we just abort when there is not enough C++ stack space?

    void trace_references(Stack&);

    // Or should we invert the control?
    // In the action, copy the object if it has not been visited.

    void for_each_reachable_object_do(Consumer<Object_Id> action);
};

class Pair : public Object {
    Pair_Tracing_Iterator_State tracing_iterator_state;
};

If we assume that garbage collection is single-threaded, we can put the iterator state in each instance of Object.

Perhaps it is obvious that, for simplicity, the garbage collection process itself should never allocate any heap memory.

I can't think of a way to do garbage collection (with depth-first search) without stack memory, so the program should just abort if it runs out of stack.

We want precise garbage collection. The price to pay is an extra level of indirection: Objects can only be indirectly accessed by passing an ObjectId to a World method, and cannot be directly accessed by raw C pointers.

3.3 Converting C types

void, uintN_t, intN_t, intptr_t, for N in {8, 16, 32, 64}.

3.4 Foreign interface, mostly C

We do not expect users to use this directly. The ideal thing for the user is to make PL-0 understand C header files. That is, PL-0 should come with a C parser and preprocessor that translate signatures to PL-0 bridges. Compare: SWIG. (But why stop there; why not go all the way and write a C interpreter/compiler in PL-0?)

We should use libffi for portability.

Compare: Racket Foreign Interface.

Suppose there is a C procedure whose declaration is

Ret proc(Arg-1, ..., Arg-n)

and we want to call it from PL-0.

With power comes responsibility: The foreign interface enables users to crash the program.

We must represent the type and construct the reference.

A Type is any of these:

char
int
(unsigned int)
int32_t
uint32_t
(procedure Type (Type-1 ... Type-n))
(struct (Field-1 ... Field-n))
    where each Field-i is a list [name Type]
(union (Type-1 ... Type-n))

Reference constructors:

(ref Type Address)

Actions:

(read Ref) -> Val
(write Ref Val) -> Unit
(call Ref Arg ...) -> Val

We can obtain symbol addresses with dlsym.
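
A hypothetical usage sketch that combines the pieces above; dlsym-address is an assumed primitive wrapping dlsym, and the target is C's int abs(int):

;; Construct a reference to the C function `int abs(int)` and call it.
;; `dlsym-address` is hypothetical; it returns the address of a symbol.
(define abs-ref
  (ref (procedure int (int))
       (dlsym-address "abs")))

(call abs-ref -42)   ; would evaluate to 42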

3.5 Values

What should the set of values (the irreducible meanings) in a programming language be?

Perhaps we all agree that the set of values must include at least some integers.

A value (an object) is any of these:

  • a representation of a mathematical object:
    • an integer (of arbitrary precision)
    • a pair (a cons cell)
    • a unit (like C void)
    • a boolean (false or true)
    • a byte string
  • a generic data structure:
    • a list
    • a vector (a heterogeneous array)
    • a hash table
  • a structure used by the interpreter:
    • a namespace
    • an environment (a context)
    • a rule, function, macro, AST transformer
    • a type
  • a structure used by the parser:
    • a location
    • a concrete syntax tree (CST)
    • an abstract syntax tree (AST)
  • a structure used by the C interface:
    • a C type representation
    • a C reference (a type and an address)

There are so many values; are we sure that all of them should be primitives?

Difference from common Lisps:

  • In PL-0, lists and pairs are different things.
  • PL-0 does not have nil.

3.6 Do we need generic functions? The case of "append"

I want to write just append instead of list-append, vector-append, bytestring-append, etc. In other words, I want append to be polymorphic.

What are my choices?

I can define append with cond.

But what if users also want to customize append?

They can define their own append using cond in their own namespaces and fall-back to the standard append.

Or I can define append to be a generic function.

But generic functions become extremely tricky with subtyping. Julia solves this with a complete lattice of types. But do we have to deal with the unholy interaction between generics/polymorphism/multiple-dispatch and subtyping?

A combination of namespaces and cond is simpler than generic functions, and achieves closed ad-hoc polymorphism, but is it better?

3.7 Representation of values

read-cst is similar to Racket's read-syntax, but read-cst reads comments, and the result of read-cst can be turned back into source code (textual representation).

read is implemented by calling read-cst and recursively discarding location information and comment nodes.

Unlike in other Lisps, in PL-0, the external representation of a pair is #pair(head tail), not (head . tail).

4 Syntax and parsing

We use a recursive descent parser because we don't know any better.

4.1 Reversibility, information-preservation

I insist that the parser be reversible, because I want traceability and debuggability.

Each stage must be reversible: it must either be a bijection or preserve enough information from the previous stage.

The first stage is character + location (defined later).

The next stage is tokenization.

A token has a type and a list of characters.

The next stage is concrete syntax tree (CST).

The concrete syntax tree is required for formatting and refactoring, because those activities should preserve comments.

In Lisp syntax, a token coincides with an AST node.

The next stage is abstract syntax tree.

An AST node has a "main" CST node.

An AST node has a "preceding-whites" (a list of whitespace CST nodes that precede that AST node) so that the AST node can be turned back into CST node (and so on until we reach the original substring that constitutes the CST node).

The parser is a recursive descent parser because I don't know how to parse.

4.2 Locations

A location is a tuple of path, line (0-based), column (0-based), byte-offset. This is like Racket srcloc.

current-location parameter

read from current location

raise-parse-error at current location

4.3 Macro, reflection, reification, quoting

The language should be a model of itself.

The language should be able to describe itself.

Does that cause a paradox?

4.4 Annotations: user-defined metadata attached to concrete syntax tree nodes

(Is this a good idea?)

We add these expression syntax rules:

  • If M is an expression and E is an expression, then E : M (read: data E annotated with metadata M) is an annotated expression.
    • Alternative syntax: E : M can also be written meta M E.

This generalizes type systems. With type systems, you annotate an expression with a type expression. With general annotations, you annotate an expression with another expression (some of which are type expressions).

We assume that the outermost metadata update wins:

  • meta M (meta N E) = meta M E

We add metadata extraction function symbol meta-of.

We add these beta-reduction rules:

  • reduce (meta M E) = reduce E
  • reduce (meta-of (meta M E)) = reduce M
  • reduce (meta-of E) = #<empty-record> (for expressions without metadata)
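
A minimal sketch of these rules over a tagged S-expression representation (the tag and names are assumptions; '() stands in for #<empty-record>):

;; An annotated expression is represented as (meta M E).
(define (meta? e) (and (pair? e) (eq? (car e) 'meta)))

;; The outermost metadata update wins, so meta-of returns the outermost M.
(define (meta-of e)
  (if (meta? e) (cadr e) '()))

;; reduce (meta M E) = reduce E: reduction ignores the metadata.
(define (strip-meta e)
  (if (meta? e) (strip-meta (caddr e)) e))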

This is like Java/C# annotation but more principled?

Annotations are not types.

This is an example of a type annotation that our annotation above can't handle: \ (x : T) -> y, because x is not an expression.

5 <2019-11-27> Thought

It is easy to process a byte list into a token list.

The question is: How should we interpret that token list? How should we ascribe meaning to that token list? How should we map tokens to values?

The lowest layer is more like a library for manipulating tokens than a language.

A stream of bytes is translated into a stream of tokens. A token is either white or black. A token has location. A token list has location.

I want to use the same name "append" for appending lists and appending strings; I don't want "list-append" and "string-append". We can implement this with types or namespaces. I'm fine with explicitly-prefixed namespaces like this:

(define (example)
  (import list)
  (import string)
  (list:append '(1) '(2))
  (string:append "a" "b"))

Peter Van Roy's "Programming Paradigms for Dummies: What Every Programmer Should Know" https://www.info.ucl.ac.be/~pvr/VanRoyChapter.pdf

6 Guide for embedding PL-0 in C++ programs

6.1 PL-0 C++ conventions

The C++ namespace is stc_pl_0.

6.2 Creating a virtual machine

Each instance of the Machine class is a virtual machine with operand stack, dictionary stack, return stack, and heap. The size of each memory area is fixed when the Machine is instantiated.

Machine machine;

6.3 Executing programs

A program is a sequence of tokens. For example, "1" is a program that pushes the word 1 to the stack. The following is a program that consists of six tokens (1, space, 2, space, add, newline):

1 2 add

void            Machine::push_source (Token_Iterator&)
Token_Iterator& Machine::pop_source ()

A token iterator can be created from an in-memory token list or an on-disk source file. A file-based token iterator maintains a location (path, line, column, byte offset).

A token is a byte string with location information (to keep track of its provenance).

Typically, Machine::step is called in a loop. An iteration in the execution loop goes like this, if we ignore errors:

  • read token
  • determine the executable of that token
  • execute that executable (a primitive, a value, a token, or a token list)

The step method executes at most one token. If the meaning of the token is a token list, then step creates a call frame and arranges the next step call to execute the first token of the subroutine.

The machine reads the current program from a token iterator.

6.4 Creating primitives

A primitive is a foreign procedure that may mutate the machine state.

using Prim = void (Machine&);

A primitive must not throw any C++ exceptions.

6.5 Quoting

The program quote W B pushes B to the operand stack where W is expected to be a white token.

6.6 Macros

A macro is a procedure that transforms a prefix of the remaining program token stream.

A macro transforms a concrete syntax tree.

Important: Whitespaces are tokens too.

Macro : Cst -> Cst

6.7 What?

% A B C muladd -> A*B+C

quote muladd { mul add } def

define (muladd x y z)
  x y mul z add
end

Curly braces delimit a token list?

Macros are ordinary functions.

quote reads the token right after the token currently being interpreted but does not execute it.

1 2 quote add -> 1 2 add
1 2 add -> 3

Type information can be attached to values (Scheme), variables (C++), or functions (Assembly). If we want function polymorphism (Scheme's display), then we must attach type information to either values or variables.

Why choose? Why not attach type information everywhere (to values, variables, and functions)?

If we want read to produce a value (not a type-value pair), then values must carry type information.

In mathematics, it is natural to overload functions (such as +). Otherwise we would have +N, +Q, +R, etc., which is ugly. Do we care about what something is, or about what we can do with it?

PostScript enables the programmer to choose between early binding and late binding.

7 <2019-11-28> The problem is not binding; the problem is closures

If we don't have closures, then it does not matter whether we use static (lexical) or dynamic binding; the result will be the same.

The problem is not static vs dynamic binding. The problem is: Should we have closures or not?

Why do we bother having closures if programmers can do explicit closure conversion? For example:

f x = \ y -> x + y
-- gets closure-converted to
f x = (\ x y -> x + y) x

8 Bottom-up design?

8.1 Example

  • Example of bottom-up language design and how each level reduces cognitive load:
    • Begin with machine code.
    • Provide mnemonics for instructions.
    • Provide the illusion of infinite custom-named registers and orthogonal operands.
    • Provide macros and subroutines as extensible instructions.
    • Provide named locations.
    • Provide the illusion of infinite memory.
    • Abstract away processor registers.
    • Abstract away pointers.
    • Expression.
    • Infix expression syntax.
    • First-class functions.
    • The program itself is a procedural program that tells the interpreter what code to generate.
    • End up with something like Randall Hyde's High Level Assembly?

8.2 Starting with assembly

PL-0 is slightly more abstract than typed assembly languages (TALs).

We may begin from x86 assembly.

First we abstract away locations, registers, memory, so that we can write something like this:

mov dword ptr [var_1], [var_2]

Macro Assembler (MASM)? TASM, NASM, what?

There does not exist a computer with infinite memory. Why do we pretend, with garbage collection, that the computer has infinite memory? Because it simplifies most problems?

What is the problem with these: High-Level Assembly, typed assembly languages such as TALx86 [3] (footnote 2), LLVM IR, MSIL, JVM bytecodes?

We can add a type system to assembly language to enforce constraints like these:

  • "Add-integer" takes two integers.
  • "Add-pointer" takes a pointer of alignment N and an integer that is an integral multiple of N.
  • It is illegal to add two pointers.

For example, a type may be:

  • Integer N where N is 1, 2, 4, or 8
  • Pointer A where A is the alignment (1, 2, 4, or 8)

One difficulty is that the same register may sometimes contain an integer and sometimes contain a pointer.

We can "solve" that with Static Single Assignment (SSA) Form and automatic register allocation.

But perhaps the bigger issue is to abstract away the difference between processors; why should we care if it is an Intel processor, a Motorola processor, a Symbolics Lisp machine, or something else?

Even though the machine does not know about subroutines, we organize our programs into subroutines; we find it more convenient to work with subroutines than to work with instructions. We feel that the instructions are too finely-grained, unnecessarily detailed.

9 How should programming languages be implemented?

9.1 Which should we write: compilers or interpreters?

The original question was "Which should we write: compilers or interpreters?", but it seems that the real question is "How should we implement programming languages?"

I want the answer because I am trying to implement a programming language and I do not want to waste my effort.

Should we make compilers or interpreters?

  • Fast code can only be generated by compilers, but the compiler itself may be written in an interpreted language.
  • Writing an interpreter is easier than writing a compiler, because writing a compiler requires creating representations of two languages (the source language and the target language) in the host language, whereas writing an interpreter requires creating a representation of one language (the source language).

What is their relationship? Does one subsume the other? Can we get/derive one from the other? I think this has been answered by Futamura 1999 [5]:

This paper reports the relationship between formal description of semantics (i.e., interpreter) of a programming language and an actual compiler. The paper also describes a method to automatically generate an actual compiler from a formal description which is, in some sense, the partial evaluation of a computation process. […]

To interpret is to give meaning to a form.

By "form", we mean symbols or representations.

To compile is to translate a form into another form with the same meaning.

For example, I interpret the English program "Buy food" and the Indonesian program "Beli makanan" as the same meaning: an order to buy food. On the other hand, I can compile (or translate) "Beli makanan" to "Buy food" for people who understand English but not Indonesian. My understanding of "food" is "something I can eat", but my understanding of "to eat" is a primitive that is built into me by Nature, my hardware designer. Similarly, my machine only understands machine code: the primitives that are built into it by its hardware designer.

In principle, we only need to write one compiler C from language H to machine code; then we can write many interpreters in language H, such as an S-on-H interpreter I, and get an S-compiler by partially evaluating I with respect to each S-program P and C-compiling the result of the partial evaluation. See also: partial evaluation and Futamura projections. See also the book Jones, Gomard, & Sestoft 1993 [7].
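
In the usual notation, where mix is the partial evaluator, the three Futamura projections are:

mix(I, P)     = a compiled version of the S-program P
mix(mix, I)   = an S-compiler
mix(mix, mix) = a compiler generator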

The question boils down to: What is meaning? What do we mean by meaning?

Meaning is determined by convention, including context; meaning is determined by pragmatics.

Let us use mathematics to clarify what we mean by "compilers" and "interpreters".

There are three languages involved: Host H, Source S, and Target T.

A program can be thought of as a representation of a mathematical function.

An L-program is a program written in language L.

Note that (H,M,S,T) stands for (Host,Meaning,Source,Target).

An (H,M,S,T)-compiler C is an H-program that translates each S-program P to a T-program C(P) with the constraint M(P) = M(C(P)). The translation must preserve meaning, but does not have to be invertible. Almost always, we do not care about reconstructing P from C(P), except when we are reverse-engineering.

An (H,M,S)-interpreter I is an H-program that takes each S-program P and gives M(P). The result of an interpreter's interpretation of a program is then interpreted again by humans into meaning.

Both the example compiler and the example interpreter are written in the same host language H.

Good news from Glück 2009 [6]?

Practical specializers that can perform all three Futamura projections and that can automatically convert programs into non-trivial generating extensions and compiler generators have been built for realistic programming languages such as Scheme, Prolog, and C […]

A compiler establishes an equivalence relation between its source language and its target language. (If we think of a language as a set of programs.)

Example of a tower of languages, upwards:

  • Semantics of L1 is defined in terms of the semantics of L0.
  • Semantics of Ln is defined in terms of the semantics of Ln-1.

But if we go downwards, it is a tower of mathematical models (of a physical system):

  • Semantics of L0 is defined in terms of logic circuit model.
  • Semantics of the logic circuit model is defined in terms of the LEM (lumped element model).
  • Semantics of LEM is defined in terms of classical electromagnetism model.
  • etc.

Let L0, L1, …, Ln be languages.

"interpreter written in language L0 for language L1"

Compiler/Translator = Program in L1 -> Program in L0
Interpreter in L0 = Program in L1 -> Effect in L0

Programming languages (model-driven languages) have hit a limit; higher abstraction levels are impossible. AI is the highest level we will go without telepathy.

Imagine that you have to write the first assembler for the first processor. All you have are switchboards, instruction manuals, and machine code. You want to minimize your switchboarding, so you want to write the shortest program.

"hand-compile"

9.2 How should we make programming tools such as compilers, interpreters, and editors?

9.3 What meta-programming tools exist?

9.3.1 Rascal MPL

https://www.rascal-mpl.org/

9.3.2 Eclipse Xtext

9.3.3 JetBrains MPS

MPS is "Meta Programming System".

A concept can have properties. Each property has a type. The property type system is limited to int, string, and regex-constrained string.

A member in a model is an instance of a concept, similar to how an object is an instance of a class in Java.

MPS is a tree editor, not a text editor.

A concept is an AST (abstract syntax tree) node type.

As of 2017-08-12, MPS 2017.2 does not support Java 7 try-with-resources statements.

The MPS IntelliJ IDEA plugin allows you to use a language from IDEA, but not to define your own language. You need the MPS IDE for that.

External links:

9.4 Meta-programming and language-oriented programming?

The Racket manifesto (footnote 3): programming-language programming language

miniKanren, scheme logic programming http://minikanren.org/ https://github.com/clojure/core.logic/wiki/A-Core.logic-Primer

2000 article "Domain Specific Meta Languages" https://www-users.cs.umn.edu/~evw/pubs/vanwyk00sac/vanwyk00sac.pdf

1996 book "Advanced programming language design" 2008 article "Position paper: Practical foundations for programming languages" 2012 book "Practical Foundations for Programming Languages" Version 1.32 of 05.15.2012 http://profs.sci.univr.it/~merro/files/harper.pdf

University of Arizona, Spring 2006, CS 520 Principles of Programming Languages - Lecture 04: Types and Polymorphism https://www2.cs.arizona.edu/classes/cs520/spring06/04types.pdf from "Lecture 4: higher polymorphism" https://blog.inf.ed.ac.uk/apl16/archives/178/comment-page-1

Programming Language Foundations in Agda https://plfa.github.io/

2018 article "Logic Programming as a Service" https://arxiv.org/abs/1806.02577

Liber amicorum for Doaitse Swierstra https://www.reddit.com/r/haskell/comments/1hmc9t/pdf_liber_a_for_doaitse_swierstra_read_free/

1994 article "Efficient Self-Interpretation in Lambda Calculus" http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.4382&rep=rep1&type=pdf

Lambda the Ultimate: Meta-programming http://lambda-the-ultimate.org/taxonomy/term/15

2009 article "Directly Reflective Meta-Programming" http://homepage.divms.uiowa.edu/~astump/papers/archon.pdf

Footnote F from [4]:

Language workbenches (such as Spoofax) deal with conventional syntax for DSLs but do not support the incremental modification of existing languages. A 2015 report suggests, however, these tool chains are also converging toward the idea of language creation as language modification. We conjecture that, given sufficient time, development of Racket and language workbenches will converge on similar designs.

9.5 Implementing programming languages

9.6 Should we use Prolog?

We should prototype our language in Prolog.

We should implement our language in Prolog.

Erlang started out as a DSL in Prolog. See 1992 article "Use of Prolog for developing a new programming language".

<2018-10-20> Change of opinion: we should write the language in Prolog instead of Haskell.

  • "Ott is a tool for writing definitions of programming languages and calculi. It takes as input a definition of a language syntax and semantics, in a concise and readable ASCII notation that is close to what one would write in informal mathematics." https://www.cl.cam.ac.uk/~pes20/ott/

<2018-12-11> My current answer: Prolog.

My previous answers:

  • Haskell
  • Racket
  • Scheme
  • Java
  • C
  • C++

Other people?

9.7 Write abstract interpreters, not compilers?

<2018-12-30>

The same code fragment can be interpreted in several ways.

The most common interpreter executes the program with the intended semantics. Example: a Python interpreter interprets the Python program "print 'foo'" as printing the string.

Write an abstract interpreter that emits code as it interprets. For example: an interpreter that interprets the Python program "print 'foo'" as "emit a Ruby statement that prints 'foo' to the screen when executed".

9.8 Begin with an interpreter, not a compiler

  • Don't make a compiler? Make an interpreter instead, and stage it? Turn an interpreter into a compiler for free?
  • "To stage an interpreter" is to add staging annotations to the code of the interpreter.
  • Staging is similar to quoting in Lisp/Scheme.
  • 2004 article "A Gentle Introduction to Multi-stage Programming" pdf
  • multi-stage programming for Scala https://scala-lms.github.io/
  • 2006 article "A Verified Staged Interpreter is a Verified Compiler" pdf

9.9 Making compilers?

Every compiler does name resolution and maintains a symbol table. Is there a compiler that doesn't? Forth?

9.10 Piggybacking a host language

9.11 How should lambda-calculus be implemented?

9.11.1 What is an operational semantics of lambda calculus?

9.11.2 How?

Normal-order reduction enables us to write fixed points. Should we let the programmer choose the evaluation strategy? Currying simplifies reasoning but complicates implementation (because applications may then nest deeply to the left). What is optimal reduction? (footnotes 4, 5)

Lambda-calculus is unsound (footnote 6). What does that imply about programming languages containing lambda calculus?

Let \( A[B := C] \) mean \(A\) but with each free occurrence of \(B\) replaced with \(C\). Let \( eval(A,B) \) mean that \(A\) normalizes to \(B\).

Applicative-order evaluation is the easiest to implement.
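
A minimal sketch in Scheme: applicative-order evaluation of lambda terms by substitution, using the A[B := C] notation above (alpha-renaming is omitted, so name capture is possible):

;; Terms: symbols, (lambda (x) body), and applications (operator operand).
(define (subst term var val)   ; term[var := val]
  (cond ((symbol? term) (if (eq? term var) val term))
        ((eq? (car term) 'lambda)
         (if (eq? (car (cadr term)) var)
             term   ; var is shadowed; stop substituting
             (list 'lambda (cadr term) (subst (caddr term) var val))))
        (else (list (subst (car term) var val)
                    (subst (cadr term) var val)))))

(define (eval-term term)
  (cond ((symbol? term) term)
        ((eq? (car term) 'lambda) term)
        (else
         (let ((f (eval-term (car term)))    ; evaluate the operator
               (a (eval-term (cadr term))))  ; evaluate the operand first
           (if (and (pair? f) (eq? (car f) 'lambda))
               (eval-term (subst (caddr f) (car (cadr f)) a))
               (list f a))))))

;; (eval-term '((lambda (x) (x x)) (lambda (y) y))) => (lambda (y) y)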

Where do these things fit in the big picture of lambda-calculus implementations? G-machine, STG, GRIN (footnote 7).

Reading queue:

10 All programming is maintenance?

A point of view: All programming can be thought of as modifying an existing program. The act of creating a new program can be thought of as modifying the empty program.

11 Bibliography

[1] Appel, A.W. 1987. Garbage collection can be faster than stack allocation. Information Processing Letters. 25, 4 (1987), 275–279.

[2] Belikov, E. 2015. Language run-time systems: An overview. 2015 Imperial College Computing Student Workshop (ICCSW 2015).

[3] Crary, K. et al. 1999. TALx86: A realistic typed assembly language. 1999 ACM SIGPLAN Workshop on Compiler Support for System Software, Atlanta, GA, USA (1999), 25–35.

[4] Felleisen, M. et al. 2018. A programmable programming language. Commun. ACM. 61, 3 (Feb. 2018), 62–71. DOI:https://doi.org/10.1145/3127323. url: <https://cs.brown.edu/~sk/Publications/Papers/Published/fffkbmt-programmable-prog-lang/paper.pdf>.

[5] Futamura, Y. 1999. Partial evaluation of computation process–an approach to a compiler-compiler. Higher-Order and Symbolic Computation. 12, 4 (1999), 381–391.

[6] Glück, R. 2009. Is there a fourth Futamura projection? Proceedings of the 2009 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (2009), 51–60.

[7] Jones, N.D., Gomard, C.K., and Sestoft, P. 1993. Partial evaluation and automatic program generation. Prentice Hall. url: <http://www.itu.dk/~sestoft/pebook/jonesgomardsestoft-a4.pdf>.

[8] McCarthy, J. 1959. Recursive functions of symbolic expressions and their computation by machine, Part I. Commun. ACM. 3, (1959), 184–195.

[9] Van Roy, P. and Haridi, S. 2004. Concepts, techniques, and models of computer programming. MIT Press.


  1. via Basile Starynkevitch

  2. <2019-11-04> https://www.cis.upenn.edu/~stevez/papers/MCGG99.pdf

  3. http://felleisen.org/matthias/manifesto/index.html

  4. https://stackoverflow.com/questions/31223539/is-it-possible-to-evaluate-lambda-calculus-terms-efficiently

  5. https://en.wikipedia.org/wiki/Lambda_calculus#Optimal_reduction

  6. https://en.wikipedia.org/wiki/Fixed-point_combinator

  7. https://github.com/grin-tech/grin