HABA parser

The parser that HABA uses to parse grammars can also work with the grammars HABA outputs. More precisely, HABA outputs JavaScript programs that this same parser can use.

How to use

The parser is contained in a single file, parser.js, which can be downloaded and installed on your computer or server. To call it from HTML, write the following line in the <head> block to load the file.

<script src="./parser.js"></script>

You can also load your own grammar program (e.g. multi.js) in the same way.

<script src="./multi.js"></script>

The grammar program must define a Grammar object and a Converter object.

You can then parse strings by passing them to a Parser object, either inside a <script> tag or in an external JavaScript file. For example, a program that performs lexical and syntactic analysis of an input string looks like this:

<script>
const parser = new Parser(Grammar, Converter);
const result = parser.tokenize(input);
const outcome = parser.parse(result.tokens);
</script>
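Both methods report failure through their return values (tokens or tree is null), so it can help to check each stage before moving on. The following is a minimal sketch of that check. The helper name analyze and the shape of the object it returns are assumptions made for illustration; they are not part of parser.js.

```javascript
// Hypothetical helper (not part of parser.js): runs both stages and
// reports which one failed. `parser` is expected to be a Parser instance.
function analyze(parser, input) {
    const result = parser.tokenize(input);
    if (result.tokens === null) {
        // Lexical analysis failed, so there is nothing to parse.
        return { stage: "tokenize", tree: null, detail: result };
    }
    const outcome = parser.parse(result.tokens);
    if (outcome.tree === null) {
        // Syntactic analysis failed.
        return { stage: "parse", tree: null, detail: outcome };
    }
    return { stage: "done", tree: outcome.tree, detail: outcome };
}
```

A caller would then branch on the stage property and read tree only when stage is "done".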

Objects for parsing

When parser.js is loaded, the following objects are defined in the global namespace. The Parser and Tree objects are meant for external use; all others are used only internally.

Parser: This object performs lexical and syntactic analysis.
StateStack: This object is a stack that holds the state and parse tree simultaneously.
Token: This object holds the token after lexical analysis.
Tree: This object holds the parse tree after syntactic analysis.

Parser

The constructor creates a new Parser object.

new Parser(grammar, converter)

grammar: A grammar object. It requires the flag, terminals, dummies, rules, and table properties. Normally, the Grammar object output by HABA is used as is.
converter: A syntax converter. It requires a method with the same name as each non-terminal symbol that appears in the grammar. Normally, the Converter object output by HABA is modified and then used.
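The two requirements above can be checked before constructing a Parser. This is only a sketch: both function names are hypothetical, and because this document does not say how the grammar object stores its non-terminal symbols, the list of names is passed in by the caller.

```javascript
// Hypothetical pre-flight checks (not part of parser.js).
const REQUIRED_GRAMMAR_PROPERTIES = ["flag", "terminals", "dummies", "rules", "table"];

// Returns the names of the required grammar properties that are missing.
function missingGrammarProperties(grammar) {
    return REQUIRED_GRAMMAR_PROPERTIES.filter((name) => !(name in grammar));
}

// Returns the non-terminal names that have no matching converter method.
// `nonTerminals` is supplied by the caller (an assumption of this sketch).
function missingConverterMethods(converter, nonTerminals) {
    return nonTerminals.filter((name) => typeof converter[name] !== "function");
}
```

An empty array from both functions means the objects have at least the shape the constructor expects; it says nothing about whether their contents are valid.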

Parser.prototype.parse()

This is an instance method that performs syntactic analysis.

parse(tokens)

tokens: An array of tokens. Normally, the tokens property of the return value of tokenize() is passed as is.

Returns an object with the following properties.

invalid: This property exists only when parsing fails, and contains the token string after the point of failure.
tree: If parsing succeeds, this holds the resulting parse tree; otherwise it is null.
valid: This property exists only when parsing fails, and contains the token string from the beginning up to the point of failure.

Parser.prototype.tokenize()

This is an instance method that performs lexical analysis.

tokenize(text)

text: Input string to be analyzed.

Returns an object with the following properties.

invalid: This property exists only when tokenization fails, and contains the input string after the point of failure.
tokens: If tokenization succeeds, this holds the resulting list of tokens; otherwise it is null.
valid: This property exists only when tokenization fails, and contains the token string from the beginning up to the point of failure.
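Since tokenize() and parse() both expose valid and invalid only on failure, one small helper can report either kind of error. The sketch below is an assumption-laden illustration: describeFailure is a hypothetical name, and the exact types of valid and invalid are not stated here, so the values are simply converted to strings.

```javascript
// Hypothetical helper (not part of parser.js): builds a message from the
// `valid` and `invalid` properties, which exist only when analysis fails.
function describeFailure(result) {
    if (!("invalid" in result)) {
        return null; // no failure information present
    }
    // Template literals convert the values to strings, whatever their type.
    return `accepted: ${result.valid} | rejected from: ${result.invalid}`;
}
```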

Tree

A parse tree. You do not create it yourself; it is obtained as the result of calling the parse() method.

Tree.prototype.children

An array of the direct children of this element. All child elements are Tree objects.

Tree.prototype.label

The terminal or non-terminal symbol, as defined by the grammar.

Tree.prototype.text

If the label property is a terminal symbol, this holds the matching input string; otherwise it is an empty string.
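The three properties above are enough to walk a tree recursively. As an illustration, the following sketch renders a tree as indented lines; dumpTree is a hypothetical helper, not something parser.js provides, and it relies only on children, label, and text.

```javascript
// Hypothetical helper: renders a tree as an array of indented lines,
// one line per node, using only children, label, and text.
function dumpTree(tree, depth = 0) {
    const indent = "  ".repeat(depth);
    // Only terminal symbols carry text, so show it only when non-empty.
    const suffix = tree.text === "" ? "" : ` "${tree.text}"`;
    const lines = [`${indent}${tree.label}${suffix}`];
    for (const child of tree.children) {
        lines.push(...dumpTree(child, depth + 1));
    }
    return lines;
}
```

Calling console.log(dumpTree(outcome.tree).join("\n")) after a successful parse would print one line per node, which can be handy when writing Converter methods.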

Other properties

Depending on the implementation of the Converter object, new properties may be added to the Tree object and existing properties may be modified. For the Num method contained in the Converter object of the addition grammar shown in the explanation,

// Num ::= "[0-9]+" ;
"Num": function(tree) {
},

the argument tree is as follows:

tree = {
    children: [
        {   // children[0]
            children: [],
            label: "'[0-9]+'",
            text: "1"
        }
    ],
    label: "Num",
    text: ""
}

Since the parent tree represents the non-terminal symbol Num, its label property is "Num" and its text property is an empty string. Its children property corresponds to the definition part (the right-hand side) of the Num rule, which in this case contains only the terminal symbol "[0-9]+". The value "1" of the child's text property is the string that was actually entered.

The addition process requires only the text "1", which is converted to the number 1 and added as the result property of the parent tree.

// Num ::= "[0-9]+" ;
"Num": function(tree) {
    tree.result = parseInt(tree.children[0].text, 10);
},

Next, for the Multi method,

// Multi ::= Num ('+' Num)* ;
"Multi": function(tree) {
},

the argument tree is as follows:

tree = {
    children: [
        {   // children[0]
            children: [/* ... */],
            label: "Num",
            result: 1,
            text: ""
        },
        {   // children[1]
            children: [],
            label: "'+'",
            text: "+"
        },
        {   // children[2]
            children: [/* ... */],
            label: "Num",
            result: 2,
            text: ""
        }
    ],
    label: "Multi",
    text: ""
}

Because the Num method is called on each child element first, each Num tree already has a result property.

Here we calculate 1 + 2. Addition can continue, as in 1 + 2 + 3, in which case the number of elements in children increases. Thus, the result values of the 0th, 2nd, 4th, ... child elements are added together to obtain the overall result.

// Multi ::= Num ('+' Num)* ;
"Multi": function(tree) {
    tree.result = tree.children[0].result;
    for (let i = 2; i < tree.children.length; i += 2) {
        tree.result += tree.children[i].result;
    }
},

The result is now stored in outcome.tree.result in the return value of the parse() method.
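To see the two methods work together without loading parser.js, the conversion can be simulated on a hand-built tree. This is a sketch under stated assumptions: the convert function and the leaf and num helpers are hypothetical stand-ins, and the children-first call order is modeled on the description above rather than taken from the parser's source.

```javascript
// The two Converter methods from the text, collected in one object.
const additionConverter = {
    "Num": function(tree) {
        tree.result = parseInt(tree.children[0].text, 10);
    },
    "Multi": function(tree) {
        tree.result = tree.children[0].result;
        for (let i = 2; i < tree.children.length; i += 2) {
            tree.result += tree.children[i].result;
        }
    },
};

// Hypothetical stand-ins for the trees that parse() would produce.
const leaf = (label, text) => ({ children: [], label: label, text: text });
const num = (digits) => ({ children: [leaf("'[0-9]+'", digits)], label: "Num", text: "" });

// Assumed call order: children first, then the node itself (post-order).
// Nodes without a matching method, such as terminals, are skipped.
function convert(tree, converter) {
    tree.children.forEach((child) => convert(child, converter));
    if (typeof converter[tree.label] === "function") {
        converter[tree.label](tree);
    }
}

// Hand-built tree for the input 1 + 2 + 3.
const tree = {
    children: [num("1"), leaf("'+'", "+"), num("2"), leaf("'+'", "+"), num("3")],
    label: "Multi",
    text: "",
};
convert(tree, additionConverter);
console.log(tree.result); // 6
```

With the real parser, the same value would be read from outcome.tree.result instead of from a hand-built tree.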