Object language embedding

Standard ML is often used to implement another language L, for example, the syntax of the HOL logic in hol90 or the syntax of CCS for the Concurrency Workbench. Typically, one defines the abstract syntax of L by a datatype declaration. Then useful functions over the datatype can be defined (such as finding the free variables of a formula when L is a logic). Soon afterwards, one concludes that concrete syntax is easier for humans to read than abstract syntax, and so writes a parser and prettyprinter for L.

In the situation just outlined, ML is called the metalanguage, and L is called the object language, or OL. (Edinburgh/INRIA/Cambridge ML, the precursor to Standard ML, was originally a programming metalanguage for a particular object language, the LCF logic.) The purpose of a quotation/antiquotation mechanism is to allow one to embed expressions in the object language's concrete syntax inside of ML programs, and to mix the object language expressions with ML expressions.

Quotation and Antiquotation

The quote/antiquote mechanism is enabled by setting
Compiler.Control.quotation : bool ref
to true. Then the backquote character ` ceases to be legal in symbolic identifiers, and takes on a special meaning.

A quotation is a special form of literal expression that represents the concrete syntax of an OL phrase. The backquote character ` is used to delimit quotations.

For a running example, suppose our OL is a simple propositional logic with propositions represented as values of abstract type prop. We might wish to write propositional expressions such as A/\B/\C.

The most common approximation to quotation is strings. This is not pleasant at times, especially when dealing with backslashes and newlines. Still, strings are bearable. Strings are not adequate, however, for the following idea.

The ML-OL relationship invites a notion of antiquotation: the temporary abandonment of parsing so that an ML value can be spliced into the middle of a quotation. Operations like this have cropped up under various names in various places: antiquote is due to Milner; Quine had a version called quasi-quotation in his 1940 book; Carnap used a notation much like it. It also closely resembles the Lisp backquote facility.

Using backquote, we write

- val f = `A /\ B \/ C`;
val f = [QUOTE "A /\\ B /\\ C"] : 'a SMLofNJ.frag list
The compiler interprets this as a list of "fragments", using the frag data type from the SMLofNJ structure:
structure SMLofNJ = struct
  . . .
  datatype 'a frag = QUOTE of string | ANTIQUOTE of 'a

More commonly, we invoke an OL parser to parse, enforce precedence, etc. By naming the parser something concise, such as %, we can use the syntax

- val % = my_proposition_parser;
val % : prop frag -> prop
- val p = %`A /\ B \/ C`;
val p = -: prop

An antiquote is written as a caret ^ followed by either an SML identifier or a parenthesized SML expression. Antiquotation can be used to conveniently express contexts, which are often used as a descriptive tool for syntax. A context could be defined as a function taking a prop and directly placing it at a location in a quotation.

- fun foo a = %`^a ==> A`;
val foo : prop -> prop
In this case, foo p would denote the same proposition as
    %`(A /\ B \/ C) ==> A`

Antiquotations can have nested quotations (which may contain antiquotes of their own, etc.):

- let val K x y = x
      val I x = x
  %`A /\ ^(K (%`B`) (I (%`C`)))
      \/ C`
gives the same prop as that denoted by p. We note in passing that the power of the OL parser is completely up to its author: for example, in the framework offered here, one could write an OL ``parser'' for Scheme that parses program plus arguments, evaluates the program on the arguments, and finally prints the returned value.

Implementation of OL parsers

A concrete syntax quotation is mapped by the SML compiler into a frag list. Intuitively, a frag is a contiguous part of a quotation: `A /\ B` maps to [ QUOTE "A /\\ B" ] while `^x /\ ^y` maps to
(Note that the names frag, QUOTE, and ANTIQUOTE need to be qualified by the SMLofNJ structure-id, in practice.)

In this approach, the value of a quotation has type ol frag list where ol is the type of object language expressions; the type of the OL parser is ol frag list -> ol.

The OL parser (in our example, %) must handle these lists and insert the antiquoted ML values in the right places.


Often one wants to parse stratified languages, such as first order logic, or typed lambda calculus, which requires a trick. Also, there is a bit of trickery when one wants to deal with ML-Yacc and ML-Lex, especially when functorizing the parser.


| SML/NJ Home Page |
| SML/NJ Documentation Home Page |

Send your comments to sml-nj@research.bell-labs.com
Copyright © 1998, Lucent Technologies; Bell Laboratories.