This is a tutorial covering three different things that one often wants to do with an ANTLR parser:
The code presented here was written to use ANTLR 2.7.1, so things may not work if you are using an older version, or may work better if you are using a newer version.
This tutorial is mostly embodied by the comments in the source code itself, but it will help to know what this program actually accomplishes.
The program, as its name implies, knows how to add. In particular, it
can add integers by performing a mathematical sum operation, and it can add
strings by performing concatenation. The same operator, "+",
suffices for both operations.
Additionally, it can convert numbers into strings, denoted with the
"$" prefix operator, and it can convert strings into numbers,
denoted with the
"#" prefix operator. And, finally, one may
group operations with parentheses.
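To make the operator semantics concrete, here is a minimal sketch in plain Java, independent of ANTLR. It is not the code from this tutorial's source files: the class name AdderSemantics and its methods are hypothetical, values are modeled as java.lang.Integer and java.lang.String, and the behavior of "#" on text that is not a valid number (here, a NumberFormatException) is my assumption.

```java
// Hypothetical sketch of the language's operators; not the tutorial's actual code.
// Numbers are modeled as Integer, strings as String.
final class AdderSemantics {
    private AdderSemantics() {}

    /** "+": arithmetic sum for two numbers, concatenation for two strings. */
    static Object add(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            return (Integer) left + (Integer) right;
        }
        if (left instanceof String && right instanceof String) {
            return (String) left + (String) right;
        }
        throw new IllegalArgumentException("operands must be two numbers or two strings");
    }

    /** "$": converts a number into its decimal string form. */
    static String stringify(Object value) {
        if (!(value instanceof Integer)) {
            throw new IllegalArgumentException("operand must be a number");
        }
        return value.toString();
    }

    /** "#": converts a string into a number. */
    static Integer numberify(Object value) {
        if (!(value instanceof String)) {
            throw new IllegalArgumentException("operand must be a string");
        }
        // Assumption: text that is not a valid integer throws NumberFormatException.
        return Integer.valueOf((String) value);
    }

    public static void main(String[] args) {
        // Equivalent of the expression: "e" + $(-1 + 2) + "ght"
        Object value = add(add("e", stringify(add(-1, 2))), "ght");
        System.out.println(value); // prints e1ght
    }
}
```

Parentheses need no method of their own in this sketch; grouping is expressed simply by how the calls are nested.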
Number literals are simply a sequence of digits, optionally preceded by
a minus sign. String literals are a sequence of arbitrary 7-bit ASCII
characters, enclosed in double quotes, with several standard C-like
escape sequences.
A program in this language consists of a series of expressions, each
followed by a semicolon. The result of executing a program is that the
value of each expression is written to standard output, one per line.
String values are printed with double quotes around them, numeric values
are printed as-is, and any expression that contained an error is printed
as "(error)".
If any expression printed out as "(error)", then before any
expression values are printed, there will be one or more error lines. Each
one is of the following form, which is identical to the form of a Caml
error (which the Emacs error parser already understands):
File "file-name", line[s] start[-end], character[s] start[-end]
Below the error message is the snippet of source that contains the error, "underlined" with dashes on the following line. For example:
File "test.txt", line 14, columns 8-17: Type mismatch for add operator.
  1 + ("foo" + 4);
       ---------
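The following is a rough sketch of how a single-line report like the one above could be assembled. It is not the contents of ErrorFormatter.java: the class and method names are hypothetical, and I am assuming that columns are counted from 1 and that the end column is exclusive, which is at least consistent with the example (columns 8-17, nine dashes).

```java
// Hypothetical sketch of single-line error formatting; not the tutorial's ErrorFormatter.
final class ErrorReportSketch {
    private ErrorReportSketch() {}

    /**
     * Builds the header line, the offending source line, and a dash underline.
     * Assumptions: startCol is 1-based and endCol is exclusive, so the
     * underline is (endCol - startCol) dashes long.
     */
    static String format(String file, int line, int startCol, int endCol,
                         String message, String sourceLine) {
        StringBuilder sb = new StringBuilder();
        sb.append("File \"").append(file).append("\", line ").append(line)
          .append(", columns ").append(startCol).append('-').append(endCol)
          .append(": ").append(message).append('\n');
        sb.append(sourceLine).append('\n');
        // Pad with spaces up to the start column, then underline the extent.
        for (int i = 1; i < startCol; i++) {
            sb.append(' ');
        }
        for (int i = startCol; i < endCol; i++) {
            sb.append('-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("test.txt", 14, 8, 17,
                "Type mismatch for add operator.", "  1 + (\"foo\" + 4);"));
    }
}
```

This sketch pads with plain spaces, so a source line containing tabs would come out misaligned. Multi-line extents are not handled here at all.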
The error messages are as follows:
"Can only numberify strings."
Reported when applying the "#" operator to an expression whose value is already a number.
"Can only stringify numbers."
Reported when applying the "$" operator to an expression whose value is already a string.
"Type mismatch for add operator."
Reported when applying the "+" operator to a string and a number, as opposed to a pair of strings or a pair of numbers.
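The three checks above can be sketched as a small type checker. This is only an illustration, not the logic in add.g: the class TypeCheckSketch, its Type enum, and its methods are hypothetical. I am also assuming that an operand which has already failed a check does not trigger a second message, so a nested error is reported only once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three type checks; not the tutorial's actual code.
final class TypeCheckSketch {
    /** Type of an expression; ERROR marks a subexpression that already failed. */
    enum Type { NUMBER, STRING, ERROR }

    private final List<String> errors = new ArrayList<>();

    /** Messages collected so far, in the order the checks failed. */
    List<String> errors() {
        return errors;
    }

    /** "#" is only valid on a string operand and yields a number. */
    Type checkNumberify(Type operand) {
        if (operand == Type.ERROR) {
            return Type.ERROR; // assumption: do not report a failed operand twice
        }
        if (operand != Type.STRING) {
            errors.add("Can only numberify strings.");
            return Type.ERROR;
        }
        return Type.NUMBER;
    }

    /** "$" is only valid on a number operand and yields a string. */
    Type checkStringify(Type operand) {
        if (operand == Type.ERROR) {
            return Type.ERROR;
        }
        if (operand != Type.NUMBER) {
            errors.add("Can only stringify numbers.");
            return Type.ERROR;
        }
        return Type.STRING;
    }

    /** "+" needs two numbers or two strings and yields that same type. */
    Type checkAdd(Type left, Type right) {
        if (left == Type.ERROR || right == Type.ERROR) {
            return Type.ERROR;
        }
        if (left != right) {
            errors.add("Type mismatch for add operator.");
            return Type.ERROR;
        }
        return left;
    }
}
```

For an expression such as 1 + ("foo" + 4), the inner check fails and records one message, and the outer check then sees an ERROR operand and stays quiet.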
Here is an example input file:
1;
(2);
#"3";
$4;
2 + 3;
"s" + "ix";
#"4" + 3;
"e" + $(-1 + 2) + "ght";
#("4" + "2") + -33;
#10;
$"eleven";
12 + "twelve";
"thirteen" + 13;
1 + ("foo" + 4);
(1 + 2 + 3 + 4 + 5) + ("foo" + "bar" + "baz");
Here is the output it generates (assuming its name is test.txt):
File "test.txt", line 10, columns 3-6: Can only numberify strings.
  #10;
  ---
File "test.txt", line 11, columns 5-14: Can only stringify numbers.
    $"eleven";
    ---------
File "test.txt", line 12, columns 7-20: Type mismatch for add operator.
      12 + "twelve";
      -------------
File "test.txt", line 13, columns 5-20: Type mismatch for add operator.
    "thirteen" + 13;
    ---------------
File "test.txt", line 14, columns 8-17: Type mismatch for add operator.
  1 + ("foo" + 4);
       ---------
File "test.txt", lines 15-23, columns 2-7: Type mismatch for add operator.
(1 +
 ---
...
"baz");
------
1
2
3
"4"
5
"six"
7
"e1ght"
9
(error)
(error)
(error)
(error)
(error)
(error)
In order to build this program:
Make sure that CLASSPATH includes both an ANTLR installation and the directory where the example source code lives.
Run the class Adder (in the anonymous, top-level package), giving it a source file as its command-line argument.
For example, on my machine, the ANTLR distribution is located in
/usr/local/lib/antlr, and I run the
so I can do this:
CLASSPATH=/usr/local/lib/antlr:.
export CLASSPATH
java antlr.Tool add.g
javac -d . *.java
java Adder test.txt
A Makefile is provided with this tutorial, which may
work for you after a modicum of tweaking.
Here is a list of all of the source files contained in this example, and a brief description of each. See the comments in the files themselves for more details.
add.g: The ANTLR grammar file.
Adder.java: This just has the main() method for the program. I suggest that you start your tour of the code by reading this file.
ExtentLexerSharedInputState.java: This is a subclass of the ANTLR-provided class LexerSharedInputState, which is needed in order to expose accessors for the current line and column and relay information about the name of the file being parsed.
ErrorFormatter.java: This class just contains a static method that does the full formatting of error messages, including grabbing the text in error out of source files.
ExtentToken.java: This is a subclass of the ANTLR-provided class CommonToken, which has been augmented to maintain full extent information (start and end position, as well as file of origin).
Makefile: A simple Makefile, which works for me but probably won't for you, at least unless you edit the
test.txt: A simple test program to test the system with.
TokenAST.java: This is a subclass of the ANTLR-provided class BaseAST, which, instead of storing text and type directly (as ANTLR's CommonAST does), simply points at a Token instance and refers to it as needed.
ValueExtentToken.java: This is a subclass of ExtentToken (above), which adds a field to contain an arbitrary value.
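To give a feel for what "full extent information" means before you open the files, here is a small standalone sketch. It does not use ANTLR and is not the code in ExtentToken.java: the Extent class, its fields, and the span() method are hypothetical names of my own, and I am assuming both extents being combined come from the same file.

```java
// Hypothetical sketch of extent bookkeeping; the names are assumptions,
// not the ones used in ExtentToken.java.
final class Extent {
    final String fileName;
    final int startLine, startCol, endLine, endCol;

    Extent(String fileName, int startLine, int startCol, int endLine, int endCol) {
        this.fileName = fileName;
        this.startLine = startLine;
        this.startCol = startCol;
        this.endLine = endLine;
        this.endCol = endCol;
    }

    /** Smallest extent covering both this one and other (same file assumed). */
    Extent span(Extent other) {
        boolean thisStartsFirst = startLine < other.startLine
                || (startLine == other.startLine && startCol <= other.startCol);
        boolean thisEndsLast = endLine > other.endLine
                || (endLine == other.endLine && endCol >= other.endCol);
        return new Extent(fileName,
                thisStartsFirst ? startLine : other.startLine,
                thisStartsFirst ? startCol : other.startCol,
                thisEndsLast ? endLine : other.endLine,
                thisEndsLast ? endCol : other.endCol);
    }

    /** Renders the location in the style of the error header lines. */
    @Override
    public String toString() {
        String lines = (startLine == endLine)
                ? "line " + startLine
                : "lines " + startLine + "-" + endLine;
        return "File \"" + fileName + "\", " + lines
                + ", columns " + startCol + "-" + endCol;
    }
}
```

Combining the extent of the first and last token of an expression in this way gives a location for the whole expression, which is the kind of information an error report needs.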
A tarball of this document and all of the source may be found in the MILK Kodebase downloads directory.
I hereby place this tutorial, including all source code, in the public domain. However, it is my fervent hope that if you find this of use, you will see fit to give me a modicum of credit.