danfuzz@milk.com
This is a tutorial covering three different things that one often wants to do with an ANTLR parser:
The code presented here was written to use ANTLR 2.7.1, so things may not work if you are using an older version, or may work better if you are using a newer version.
This tutorial is mostly embodied by the comments in the source code itself, but it will help to know what this program actually accomplishes.
The program, as its name implies, knows how to add. In particular, it
can add integers by performing a mathematical sum operation, and it can add
strings by performing concatenation. The same operator, "+",
suffices for both operations.
Additionally, it can convert numbers into strings, denoted with the
"$" prefix operator, and it can convert strings into numbers,
denoted with the "#" prefix operator. And, finally, one may
group operations with parentheses.
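To make these semantics concrete, here is a minimal Java sketch. It is not taken from the Adder sources: expressions are modeled with a hypothetical Expr interface and invented factory methods (num, str, add, stringify, numberify), and values are plain Integer and String objects. It assumes well-typed input and leaves error handling out; the two expressions in main come from the example input file further below.

```java
// Sketch of the Adder language's semantics; Expr and its factories are hypothetical.
public class AdderSemantics {
    /** An expression evaluates to either an Integer or a String. */
    interface Expr {
        Object eval();
    }

    /** Number literal. */
    static Expr num(int n) {
        return () -> n;
    }

    /** String literal (already unescaped). */
    static Expr str(String s) {
        return () -> s;
    }

    /** "+": integer sum for two numbers, concatenation for two strings. */
    static Expr add(Expr left, Expr right) {
        return () -> {
            Object l = left.eval();
            Object r = right.eval();
            if (l instanceof Integer && r instanceof Integer) {
                return (Integer) l + (Integer) r;
            }
            // Assumes well-typed input: otherwise both operands are strings.
            return (String) l + (String) r;
        };
    }

    /** "$" prefix operator: number to string. */
    static Expr stringify(Expr e) {
        return () -> Integer.toString((Integer) e.eval());
    }

    /** "#" prefix operator: string to number. */
    static Expr numberify(Expr e) {
        return () -> Integer.parseInt((String) e.eval());
    }

    public static void main(String[] args) {
        // "e" + $(-1 + 2) + "ght";
        Expr eight = add(add(str("e"), stringify(add(num(-1), num(2)))), str("ght"));
        // #("4" + "2") + -33;
        Expr nine = add(numberify(add(str("4"), str("2"))), num(-33));
        System.out.println(eight.eval()); // prints e1ght
        System.out.println(nine.eval());  // prints 9
    }
}
```

Because sum and concatenation are both associative, the grouping chosen for the chained "+" in main does not affect either result.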
Number literals are simply a sequence of digits, optionally preceded by
a minus sign. String literals are a sequence of arbitrary 7-bit ASCII
characters, enclosed in double quotes, with several standard C-like
escapes (e.g., "\n") supported.
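Only the newline escape is named above, so the following Java sketch assumes a plausible C-like set (backslash followed by n, t, r, a backslash, or a double quote). The StringLiteral class and its methods are hypothetical helpers, not part of the Adder sources; in an ANTLR-based program this work would typically be handled by lexer rules in the grammar.

```java
// Hypothetical helpers for the literal syntax; the escape set is an assumption.
public class StringLiteral {
    /** Decodes the text between the double quotes of a string literal. */
    static String unescape(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c > 0x7F) {
                throw new IllegalArgumentException("not 7-bit ASCII: " + c);
            }
            if (c != '\\') {
                out.append(c);
                continue;
            }
            i++; // skip the backslash and look at the escaped character
            if (i == body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = body.charAt(i);
            switch (e) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case '\\': out.append('\\'); break;
                case '"': out.append('"'); break;
                default: throw new IllegalArgumentException("unknown escape: \\" + e);
            }
        }
        return out.toString();
    }

    /** A number literal: an optional minus sign followed by one or more digits. */
    static boolean isNumberLiteral(String text) {
        return text.matches("-?[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(unescape("two\\nlines")); // prints "two" and "lines" on separate lines
        System.out.println(isNumberLiteral("-33"));  // prints true
        System.out.println(isNumberLiteral("3-3"));  // prints false
    }
}
```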
A program in this language consists of a series of expressions, each
followed by a semicolon. The result of executing a program is that the
value of each expression is written to standard output, one per line.
String values are printed with double quotes around them, numeric values
are printed as-is, and any expression that contained an error is printed
out as "(error)".
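These printing rules amount to a three-way case split. In the Java sketch below, ERROR is an invented sentinel object standing in for an expression that contained an error, and strings are assumed to be printed with their contents as-is, since the tutorial does not say whether escapes are re-encoded on output.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the output rules; ERROR is a hypothetical marker for a failed expression.
public class ValuePrinter {
    static final Object ERROR = new Object();

    static String format(Object value) {
        if (value == ERROR) {
            return "(error)";
        }
        if (value instanceof String) {
            return "\"" + value + "\""; // strings get surrounding double quotes
        }
        return String.valueOf(value); // numbers are printed as-is
    }

    public static void main(String[] args) {
        // One result per expression, printed one per line in program order.
        List<Object> results = Arrays.asList(3, "six", ERROR);
        for (Object v : results) {
            System.out.println(format(v));
        }
    }
}
```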
If any expression printed out as "(error)", then before any
expression values are printed, there will be one or more error lines. Each
one is of the following form, which is identical to the form of a Caml
error (which the Emacs error parser already understands):
File "file-name", line[s] start[-end], character[s] start[-end]
Below the error message is the snippet of source that contains the error, "underlined" with dashes on the following line. For example:
File "test.txt", line 14, columns 8-17: Type mismatch for add operator.
1 + ("foo" + 4);
---------
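The header line and the underline can be assembled mechanically. The Java sketch below is not the tutorial's ErrorFormatter; ErrorHeader and its methods are hypothetical. It follows the sample program output further below, which writes "columns" rather than "characters". Based on those samples, it also assumes that the end column is exclusive (the three-character "#10" is reported as columns 3-6) and that columns are 1-based.

```java
// Hypothetical re-creation of the error header and underline; not the tutorial's ErrorFormatter.
// Assumptions: columns are 1-based and the end column is exclusive.
public class ErrorHeader {
    /** Builds e.g. File "test.txt", line 14, columns 8-17: Type mismatch for add operator. */
    static String header(String file, int startLine, int endLine,
                         int startCol, int endCol, String message) {
        String lines = (startLine == endLine)
            ? "line " + startLine
            : "lines " + startLine + "-" + endLine;
        return "File \"" + file + "\", " + lines
            + ", columns " + startCol + "-" + endCol + ": " + message;
    }

    /** Dashes under columns [startCol, endCol) of a single source line. */
    static String underline(int startCol, int endCol) {
        StringBuilder sb = new StringBuilder();
        for (int c = 1; c < endCol; c++) {
            sb.append(c < startCol ? ' ' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(header("test.txt", 15, 23, 2, 7, "Type mismatch for add operator."));
        System.out.println(underline(3, 6)); // two spaces, then three dashes
    }
}
```

Multi-line errors, which show the first and last lines separated by "...", are not handled here.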
The error messages are as follows:
"Can only numberify strings.": the "#" operator was applied to an expression whose value is already a number.
"Can only stringify numbers.": the "$" operator was applied to an expression whose value is already a string.
"Type mismatch for add operator.": the "+" operator was applied to a string and a number, as opposed to a pair of strings or a pair of numbers.
Here is an example input file:
1;
(2);
#"3";
$4;
2 + 3;
"s" + "ix";
#"4" + 3;
"e" + $(-1 + 2) + "ght";
#("4" + "2") + -33;
#10;
$"eleven";
12 + "twelve";
"thirteen" + 13;
1 + ("foo" + 4);
(1 +
2 +
3 +
4 +
5)
+
("foo" +
"bar" +
"baz");
Here is the output it generates (assuming the input file is named
test.txt):
File "test.txt", line 10, columns 3-6: Can only numberify strings.
#10;
---
File "test.txt", line 11, columns 5-14: Can only stringify numbers.
$"eleven";
---------
File "test.txt", line 12, columns 7-20: Type mismatch for add operator.
12 + "twelve";
-------------
File "test.txt", line 13, columns 5-20: Type mismatch for add operator.
"thirteen" + 13;
---------------
File "test.txt", line 14, columns 8-17: Type mismatch for add operator.
1 + ("foo" + 4);
---------
File "test.txt", lines 15-23, columns 2-7: Type mismatch for add operator.
(1 +
---
...
"baz");
------
1
2
3
"4"
5
"six"
7
"e1ght"
9
(error)
(error)
(error)
(error)
(error)
(error)
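Line 14 produces only one error message, for the inner "foo" + 4, even though the enclosing addition also evaluates to (error). One way to get this behavior is sketched below in Java under assumed names: TypeChecks and its ERROR sentinel are hypothetical, not part of the Adder sources. An operator reports a message only when it sees operands of the wrong type, and it passes an already-erroneous operand through silently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical type checks using the tutorial's three messages.
// An operand that is already ERROR yields ERROR without a new message.
public class TypeChecks {
    static final Object ERROR = new Object();

    static Object add(Object l, Object r, List<String> errors) {
        if (l == ERROR || r == ERROR) return ERROR;
        if (l instanceof Integer && r instanceof Integer) return (Integer) l + (Integer) r;
        if (l instanceof String && r instanceof String) return (String) l + (String) r;
        errors.add("Type mismatch for add operator.");
        return ERROR;
    }

    static Object stringify(Object v, List<String> errors) {
        if (v == ERROR) return ERROR;
        if (v instanceof Integer) return v.toString();
        errors.add("Can only stringify numbers.");
        return ERROR;
    }

    // This sketch does not handle non-numeric strings such as "eleven".
    static Object numberify(Object v, List<String> errors) {
        if (v == ERROR) return ERROR;
        if (v instanceof String) return Integer.valueOf((String) v);
        errors.add("Can only numberify strings.");
        return ERROR;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        Object inner = add("foo", 4, errors); // "foo" + 4
        Object outer = add(1, inner, errors); // 1 + (...)
        System.out.println(outer == ERROR);   // prints true
        System.out.println(errors);           // prints [Type mismatch for add operator.]
    }
}
```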
In order to build this program:
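1. Make sure that your CLASSPATH includes both an ANTLR
installation and the directory where the example source code lives.
2. Run ANTLR on "add.g".
3. Compile the resulting Java sources along with the rest of the example's Java files.
4. Run Adder (in the anonymous, top-level package), giving it a source file as its command-line argument.
For example, on my machine, the ANTLR distribution is located in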
/usr/local/lib/antlr, and I run the bash shell,
so I can do this:
CLASSPATH=/usr/local/lib/antlr:.
export CLASSPATH
java antlr.Tool add.g
javac -d . *.java
java Adder test.txt
This tutorial comes with a Makefile, which may
work for you after a modicum of tweaking.
Here is a list of all of the source files contained in this example, and a brief description of each. See the comments in the files themselves for more details.
add.g: The ANTLR grammar file.
Adder.java: This just has the main() method for the program. I suggest that you start your tour of the code by reading this file.
ExtentLexerSharedInputState.java: This is a subclass of the ANTLR-provided class LexerSharedInputState, which is needed in order to expose accessors for the current line and column and to relay information about the name of the file being parsed.
ErrorFormatter.java: This class just contains a static method that does the full formatting of error messages, including grabbing the text in error out of source files.
ExtentToken.java: This is a subclass of the ANTLR-provided class CommonToken, which has been augmented to maintain full extent information (start and end position, as well as file of origin).
Makefile: A simple Makefile, which works for me but probably won't work for you unless you at least edit the ANTLR_HOME definition.
test.txt: A simple test program to exercise the system with.
TokenAST.java: This is a subclass of the ANTLR-provided class BaseAST, which, instead of storing text and type directly (as ANTLR's CommonAST does), simply points at a Token instance and refers to it as needed.
ValueExtentToken.java: This is a subclass of ExtentToken (above), which adds a field to contain an arbitrary value.
A tarball of this document and all of the source may be found in the MILK Kodebase downloads directory.
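The extent information that ExtentToken carries is what lets an error cover a whole subexpression rather than a single token. The following Java sketch shows one plausible way to combine two extents into the smallest one covering both. It is a plain class with invented names, not a subclass of ANTLR's CommonToken, and it does not reflect the actual fields of ExtentToken.

```java
// Hypothetical, ANTLR-free model of a source extent; names are invented for illustration.
public class Extent {
    final String file;
    final int startLine, startCol, endLine, endCol; // end column treated as exclusive

    Extent(String file, int startLine, int startCol, int endLine, int endCol) {
        this.file = file;
        this.startLine = startLine;
        this.startCol = startCol;
        this.endLine = endLine;
        this.endCol = endCol;
    }

    /** Smallest extent covering both this and other; both are assumed to be in the same file. */
    Extent union(Extent other) {
        boolean thisStartsFirst = startLine < other.startLine
            || (startLine == other.startLine && startCol <= other.startCol);
        boolean thisEndsLast = endLine > other.endLine
            || (endLine == other.endLine && endCol >= other.endCol);
        Extent first = thisStartsFirst ? this : other;
        Extent last = thisEndsLast ? this : other;
        return new Extent(file, first.startLine, first.startCol, last.endLine, last.endCol);
    }

    @Override
    public String toString() {
        return file + ":" + startLine + "." + startCol + "-" + endLine + "." + endCol;
    }

    public static void main(String[] args) {
        // Plausible extents for "foo" and 4 in the inner addition on line 14 of test.txt.
        Extent foo = new Extent("test.txt", 14, 8, 14, 13);
        Extent four = new Extent("test.txt", 14, 16, 14, 17);
        System.out.println(foo.union(four)); // prints test.txt:14.8-14.17
    }
}
```

With those assumed token positions, the union spans columns 8-17, the same range that the sample output reports for line 14.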
I hereby place this tutorial, including all source code, in the public domain. However, it is my fervent hope that if you find this of use, you will see fit to give me a modicum of credit.
Thanks.
Dan Bornstein
danfuzz@milk.com