ANTLR Adder Tutorial

Extent Tracking, Tokens with Values, and Error Reporting

version 1.3

by Dan Bornstein, danfuzz@milk.com

3-Feb-2001


Introduction

This is a tutorial covering three different things that one often wants to do with an ANTLR parser:

  1. Maintain full extent information (start position and end position as well as file of origin) with tokens. In this case, it is used to make error reports more informative. As shipped, ANTLR only automatically keeps track of the start positions of tokens and doesn't know what file a token originally came from.
  2. Associate an arbitrary value with each token. In this case, the values hold the parsed values of literals and the intermediate results of computation.
  3. Label ASTs with tokens, rather than just with text and a token type. In this case, this serves a couple of purposes: the token associated with an AST holds the result of computing the expression it denotes, and the tokens associated with an AST and all of its children are used to report the full extent of an error.
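
As a rough illustration of the first two points, here is a minimal sketch of a token that carries its full extent, its file of origin, and an arbitrary value. The class name ExtentToken, its fields, and the span method are all hypothetical and are not taken from the tutorial's source. In a real ANTLR 2.7 grammar such a class would more likely extend antlr.CommonToken; it is kept standalone here so that it compiles without ANTLR on the classpath.

```java
/**
 * Hypothetical token that records where it starts and ends, which file it
 * came from, and an arbitrary value. Standalone for illustration only.
 */
class ExtentToken {
    final String fileName;
    final int startLine, startColumn;
    final int endLine, endColumn;
    Object value; // parsed literal or intermediate result; null if none

    ExtentToken(String fileName, int startLine, int startColumn,
                int endLine, int endColumn, Object value) {
        this.fileName = fileName;
        this.startLine = startLine;
        this.startColumn = startColumn;
        this.endLine = endLine;
        this.endColumn = endColumn;
        this.value = value;
    }

    /**
     * Returns a new, value-less token whose extent covers both this token
     * and the other one. Assumes both tokens come from the same file.
     */
    ExtentToken span(ExtentToken other) {
        boolean thisStartsFirst = startLine < other.startLine
            || (startLine == other.startLine && startColumn <= other.startColumn);
        boolean thisEndsLast = endLine > other.endLine
            || (endLine == other.endLine && endColumn >= other.endColumn);
        ExtentToken first = thisStartsFirst ? this : other;
        ExtentToken last = thisEndsLast ? this : other;
        return new ExtentToken(fileName, first.startLine, first.startColumn,
                               last.endLine, last.endColumn, null);
    }
}
```

Merging extents this way is one possible approach to the third point: the extent of an AST node can be computed by spanning the tokens of the node and all of its children, and the result can then be used when reporting an error.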

The code presented here was written to use ANTLR 2.7.1, so things may not work if you are using an older version, or may work better if you are using a newer version.

The Example

This tutorial is mostly embodied by the comments in the source code itself, but it will help to know what this program actually accomplishes.

The program, as its name implies, knows how to add. In particular, it can add integers by performing a mathematical sum operation, and it can add strings by performing concatenation. The same operator, "+", suffices for both operations.

Additionally, it can convert numbers into strings, denoted with the "$" prefix operator, and it can convert strings into numbers, denoted with the "#" prefix operator. And, finally, one may group operations with parentheses.

Number literals are simply a sequence of digits, optionally preceded by a minus sign. String literals are a sequence of arbitrary 7-bit ASCII characters, enclosed in double quotes, with several standard C-like escapes (e.g., "\n") supported.

A program in this language consists of a series of expressions, each followed by a semicolon. The result of executing a program is that the value of each expression is written to standard output, one per line. String values are printed with double quotes around them, numeric values are printed as-is, and any expression that contained an error is printed out as "(error)".
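
The semantics described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code shipped with the tutorial: the class name AdderSemantics and its method names are hypothetical, null is used to model the result of an erroneous expression, and treating non-numeric text passed to "#" as an error is a guess, since the text above does not say what happens in that case. Escapes inside strings are not re-encoded when printing.

```java
/**
 * Hypothetical model of the Adder language's values: Integer for numbers,
 * String for strings, and null for the result of an erroneous expression.
 */
class AdderSemantics {
    /** "+": sums two numbers or concatenates two strings; else an error. */
    static Object add(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            int sum = ((Integer) left).intValue() + ((Integer) right).intValue();
            return Integer.valueOf(sum);
        }
        if (left instanceof String && right instanceof String) {
            return (String) left + (String) right;
        }
        return null; // type mismatch
    }

    /** "$": converts a number into a string. */
    static Object stringify(Object operand) {
        return (operand instanceof Integer) ? operand.toString() : null;
    }

    /** "#": converts a string into a number. */
    static Object numberify(Object operand) {
        if (!(operand instanceof String)) {
            return null;
        }
        try {
            return Integer.valueOf((String) operand);
        } catch (NumberFormatException e) {
            return null; // assumption: non-numeric text counts as an error
        }
    }

    /** Formats a value the way the program prints results, one per line. */
    static String format(Object value) {
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        if (value instanceof Integer) {
            return value.toString();
        }
        return "(error)";
    }

    public static void main(String[] args) {
        // "e" + $(-1 + 2) + "ght";
        Object inner = stringify(add(Integer.valueOf(-1), Integer.valueOf(2)));
        System.out.println(format(add(add("e", inner), "ght")));
        // 12 + "twelve";
        System.out.println(format(add(Integer.valueOf(12), "twelve")));
    }
}
```

Using a single Object-typed value keeps the sketch short; it also mirrors the idea of storing an arbitrary value on each token.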

If any expression printed out as "(error)", then before any expression values are printed, there will be one or more error lines. Each one is of the following form, which is identical to the form of a Caml error (which the Emacs error parser already understands):

File "file-name", line[s] start[-end], character[s] start[-end]

Below the error message is the snippet of source that contains the error, "underlined" with dashes on the following line. For example:

File "test.txt", line 14, columns 8-17: Type mismatch for add operator.
  1 + ("foo" + 4);
       ---------
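
A single-line report like the one above could be produced by something along these lines. This is a sketch, not the tutorial's actual reporting code: the class and method names are hypothetical, and the convention that columns are 1-based with an exclusive end column is inferred from the sample (columns 8-17 underlined with nine dashes starting under the eighth character). Errors that span several lines are not handled here.

```java
/** Hypothetical formatter for single-line error reports. */
class ErrorReport {
    /**
     * Builds the header line, the offending source line, and a line of
     * dashes under the columns from startColumn up to (not including)
     * endColumn. Columns are assumed to be 1-based.
     */
    static String format(String fileName, int line, int startColumn,
                         int endColumn, String message, String sourceLine) {
        StringBuilder sb = new StringBuilder();
        sb.append("File \"").append(fileName).append("\", line ").append(line);
        sb.append(", columns ").append(startColumn).append('-').append(endColumn);
        sb.append(": ").append(message).append('\n');
        sb.append(sourceLine).append('\n');
        // Pad up to the start column, then underline the extent.
        for (int i = 1; i < startColumn; i++) {
            sb.append(' ');
        }
        for (int i = startColumn; i < endColumn; i++) {
            sb.append('-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("test.txt", 14, 8, 17,
            "Type mismatch for add operator.", "  1 + (\"foo\" + 4);"));
    }
}
```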

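The error messages that appear in the sample output below are:

  "Can only numberify strings."
  "Can only stringify numbers."
  "Type mismatch for add operator."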

Here is an example input file:

1;
  (2);
    #"3";
      $4;
        2 + 3;
      "s" + "ix";
    #"4" + 3;
  "e" + $(-1 + 2) + "ght";
#("4" + "2") + -33;
  #10;
    $"eleven";
      12 + "twelve";
    "thirteen" + 13;
  1 + ("foo" + 4);
(1 +
2 +
3 +
4 +
5)
+
("foo" +
 "bar" +
 "baz");

Here is the output it generates (assuming the input file is named test.txt):

File "test.txt", line 10, columns 3-6: Can only numberify strings.
  #10;
  ---

File "test.txt", line 11, columns 5-14: Can only stringify numbers.
    $"eleven";
    ---------

File "test.txt", line 12, columns 7-20: Type mismatch for add operator.
      12 + "twelve";
      -------------

File "test.txt", line 13, columns 5-20: Type mismatch for add operator.
    "thirteen" + 13;
    ---------------

File "test.txt", line 14, columns 8-17: Type mismatch for add operator.
  1 + ("foo" + 4);
       ---------

File "test.txt", lines 15-23, columns 2-7: Type mismatch for add operator.
(1 +
 ---
...
 "baz");
------

1
2
3
"4"
5
"six"
7
"e1ght"
9
(error)
(error)
(error)
(error)
(error)
(error)

Building It

In order to build this program:

  1. Make sure your Java CLASSPATH includes both an ANTLR installation and the directory where the example source code lives.
  2. Run ANTLR on the file "add.g".
  3. Compile all of the Java source files, including the ones just generated by ANTLR.
  4. Run the class Adder (in the anonymous, top-level package), giving it a source file as its command-line argument.

For example, on my machine, the ANTLR distribution is located in /usr/local/lib/antlr, and I run the bash shell, so I can do this:

CLASSPATH=/usr/local/lib/antlr:.
export CLASSPATH
java antlr.Tool add.g
javac -d . *.java
java Adder test.txt

A Makefile is provided with this tutorial, which may work for you after a modicum of tweaking.

Brief Tour of the Source

Here is a list of all of the source files contained in this example, and a brief description of each. See the comments in the files themselves for more details.

Tarball

A tarball of this document and all of the source may be found in the MILK Kodebase downloads directory.

Copyright

I hereby place this tutorial, including all source code, in the public domain. However, it is my fervent hope that if you find this of use, you will see fit to give me a modicum of credit.

Thanks.

Dan Bornstein
danfuzz@milk.com