HABA notation
HABA handles context-free grammars. A grammar is generally expressed in terms of terminal symbols, non-terminal symbols, production rules, and a start symbol.
Production rules
A production rule has the following format:
name ::= definition ;
The rule name is a single non-terminal symbol. The rule definition consists of zero or more terminal and non-terminal symbols.
When there are multiple production rules, the rule name of the first rule is used as the start symbol. For example, if there is the following sequence of rules,
Plus ::= Num '+' Num ;
Num ::= "[0-9]+" ;
the start symbol is Plus. Note that rule order matters: if a different rule comes first, its name becomes the start symbol and the grammar will not be the one you intended.
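As a rough illustration of the first-rule convention, the following Python sketch picks the start symbol out of a grammar text. The helper names `rule_names` and `start_symbol` are hypothetical and are not part of HABA. The scan is deliberately naive: it assumes the text has no comments and no `::=` inside quoted terminals.

```python
import re

# Hypothetical helper: a rule head is a name followed by "::=".
# Assumption: no comments and no "::=" inside quoted terminals.
RULE_HEAD = re.compile(r"([A-Za-z_][A-Za-z_0-9]*)\s*::=")

def rule_names(grammar):
    """Return the rule names in the order the rules are written."""
    return RULE_HEAD.findall(grammar)

def start_symbol(grammar):
    """Return the name of the first rule, which acts as the start symbol."""
    names = rule_names(grammar)
    if not names:
        raise ValueError("grammar has no production rules")
    return names[0]

grammar = """
Plus ::= Num '+' Num ;
Num ::= "[0-9]+" ;
"""
print(start_symbol(grammar))  # Plus
```

Swapping the two rules in `grammar` would make the sketch report Num instead, which is the situation the note above warns about.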
Because a rule definition can contain zero symbols, you can write an epsilon (empty) production explicitly:
Epsilon ::= ;
Production rules are free-format: whitespace (spaces, tabs, and line breaks) is ignored. For example, the rule
Plus ::= Num '+' Num ;
can also be written as
Plus::=Num'+'Num;
or
Plus
    ::= Num '+' Num
    ;
All produce the same production rule.
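To see why the different spellings are equivalent, here is a minimal tokenizer sketch in Python. The token patterns are modeled on the symbols described on this page, but the function `tokenize` and its token categories are assumptions made for this example, not HABA's actual implementation. Comments are not handled.

```python
import re

# Token categories assumed for this sketch. Whitespace is matched like any
# other token and then dropped, which is what makes the notation free-format.
TOKEN_SPEC = [
    ("space", r"\s+"),
    ("define", r"::="),
    ("name", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("fixed", r"'(?:''|[^'])+'"),
    ("regex", r'"(?:""|[^"])+"'),
    ("punct", r"[;|()?*+]"),
]
TOKEN = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return the tokens of a grammar text with whitespace removed."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if match.lastgroup != "space":
            tokens.append(match.group())
        pos = match.end()
    return tokens

spaced = tokenize("Plus ::= Num '+' Num ;")
packed = tokenize("Plus::=Num'+'Num;")
broken = tokenize("Plus\n    ::= Num '+' Num\n    ;")
print(spaced == packed == broken)  # True
```

All three spellings reduce to the same six tokens, so a parser working on the token list cannot tell them apart.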
Terminal symbols
There are two ways to write a terminal symbol.
type | how to write | example |
---|---|---|
fixed value | enclose in single quotes | '+' |
regular expression | enclose in double quotes | "[0-9]+" |
The regular expression type accepts any JavaScript regular expression. To match a special character literally, escape it with a preceding \. To match \ itself, write \\. The one exception is the double quotation mark, which is written as "" rather than \".
To write a single quotation mark in a fixed value, double it (''). Fixed values have no other escape sequences. In particular, a double quotation mark in a fixed value is written as a single ".
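The quoting rules above can be sketched as two small Python functions that turn a terminal as written in the grammar into the text or pattern it stands for. The names `decode_fixed` and `decode_regex` are hypothetical, the validation is intentionally loose, and Python's `re` module is used only as a stand-in for a JavaScript regular expression engine.

```python
import re

def decode_fixed(literal):
    # 'text' -> text; a doubled '' inside stands for one single quote.
    if len(literal) < 3 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not a fixed-value terminal: {literal}")
    return literal[1:-1].replace("''", "'")

def decode_regex(literal):
    # "pattern" -> pattern; a doubled "" inside stands for one double quote.
    # Backslash escapes are left untouched for the regex engine to interpret.
    if len(literal) < 3 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError(f"not a regular-expression terminal: {literal}")
    return literal[1:-1].replace('""', '"')

print(decode_fixed("'+'"))           # +
print(decode_fixed("''''"))          # '
print(decode_fixed("'\"'"))          # "
print(decode_regex('"[0-9]+"'))      # [0-9]+
print(decode_regex('"""[^""]*"""'))  # "[^"]*"

# "\+\\" in the grammar is the pattern \+\\ : a literal plus, then a literal backslash.
print(bool(re.fullmatch(decode_regex(r'"\+\\"'), "+\\")))  # True
```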
Non-terminal symbols
A non-terminal symbol is a string that begins with a letter (a-z, A-Z) or an underscore, followed by zero or more letters, underscores, and digits. Letters are case-sensitive.
Non-terminal symbols begin with an uppercase letter on this page, but this is not required. Name, name, NAME, etc. are all valid non-terminal symbols.
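A quick way to check whether a name is well formed is to test it against the pattern that the Name rule uses in the HABA definition of HABA at the end of this page. The function `is_nonterminal` below is a hypothetical helper written for this example.

```python
import re

# Pattern taken from the Name rule in the HABA definition of HABA.
NAME = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")

def is_nonterminal(text):
    """Return True if text is a well-formed non-terminal symbol."""
    return NAME.fullmatch(text) is not None

for candidate in ["Name", "name", "NAME", "_tmp1", "1st", "my-rule"]:
    print(candidate, is_nonterminal(candidate))
```

The first four candidates are accepted. The last two are rejected because one starts with a digit and the other contains a hyphen.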
Concatenation
If two or more symbols appear in sequence, write them consecutively.
Concatenation ::= Term1 Term2 ;
For example, the production rule that represents a character string in which a number, a plus sign, and another number appear in order is:
Plus ::= Num '+' Num ;
At least one whitespace character is required between adjacent non-terminal symbols and between adjacent terminal symbols of the same type.
Alternatives
If any one of two or more symbols may appear, connect them with a vertical bar.
Alternatives ::= Term1 | Term2 ;
For example, the production rule that expresses one of the four arithmetic operators is as follows.
Operator ::= '+' | '-' | '*' | '/' ;
Alternatives can also be rewritten as multiple production rules with the same rule name:
Alternatives ::= Term1 ;
Alternatives ::= Term2 ;
Note that "can be rewritten" here means that the accepted language is the same; the parse results are not exactly the same.
Repetition
Use the following repetition symbols when a symbol is optional or can appear any number of times.
symbol | meaning |
---|---|
? | 0 times or 1 time |
* | 0 or more times |
+ | 1 or more times |
For example, the following rule says that a positive number may optionally be preceded by a plus sign:
Positive ::= '+'? Absolute ;
And the following rule says that a sentence consists of one or more words:
Sentence ::= Word+ ;
Each repeating symbol can be rewritten as:
// Expr ::= Term? ;
Expr ::= Term ;
Expr ::= ;

// Expr ::= Term* ;
Expr ::= Expr Term ;
Expr ::= ;

// Expr ::= Term+ ;
Expr ::= Expr Term ;
Expr ::= Term ;
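One way to convince yourself that the rewritten rules accept the same input as the repetition symbols is to count how many Terms each set of rules can derive. The Python sketch below does this with a simple fixed-point iteration. The function `term_counts` and the list-based rule encoding are assumptions made for this example and have nothing to do with how HABA works internally.

```python
def term_counts(rules, limit):
    """Return how many Terms an Expr can derive, capped at `limit`.

    `rules` lists the right-hand sides of the Expr rules; each side is a
    list of the symbols "Expr" and "Term".
    """
    counts = set()
    changed = True
    while changed:
        changed = False
        for rhs in rules:
            totals = {0}  # Term counts derivable from the symbols seen so far
            for symbol in rhs:
                options = {1} if symbol == "Term" else counts
                totals = {a + b for a in totals for b in options if a + b <= limit}
            if not totals <= counts:
                counts |= totals
                changed = True
    return counts

optional = [["Term"], []]            # Expr ::= Term ;       Expr ::= ;
star = [["Expr", "Term"], []]        # Expr ::= Expr Term ;  Expr ::= ;
plus = [["Expr", "Term"], ["Term"]]  # Expr ::= Expr Term ;  Expr ::= Term ;

print(sorted(term_counts(optional, 4)))  # [0, 1]
print(sorted(term_counts(star, 4)))      # [0, 1, 2, 3, 4]
print(sorted(term_counts(plus, 4)))      # [1, 2, 3, 4]
```

The results match the table: ? allows 0 or 1 occurrence, * allows 0 or more, and + allows 1 or more (shown here up to the chosen limit of 4).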
Grouping
To change the precedence within a rule definition, enclose symbols in parentheses. Without grouping, the order of precedence is as follows (1 binds most tightly).
priority | element | example |
---|---|---|
1 | Repetition | Term* |
2 | Concatenation | Term1 Term2 |
3 | Alternatives | Term1 \| Term2 |
For example,
Expr ::= A | B C ;
represents "A or BC", and
Expr ::= (A | B) C ;
represents "A or B" followed by C, that is, "AC or BC". As another example,
Expr ::= D E+ ;
represents "DE, DEE, DEEE, ...", and
Expr ::= (D E)+ ;
represents "DE, DEDE, DEDEDE, ...".
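The precedence table can be made concrete with a small recursive-descent parser in which each level of the table has its own function. This is only a sketch under simplifying assumptions: it accepts names, parentheses, and the operators | ? * +, but no quoted terminals, and the tuple-based tree format and the function `parse_definition` are invented for this example.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|[|()?*+])")

def tokenize(text):
    """Split a rule definition into names and operator characters."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_definition(text):
    """Parse a rule definition into nested tuples that show the grouping."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_expr():  # priority 3: alternatives bind most loosely
        items = [parse_list()]
        while peek() == "|":
            advance()
            items.append(parse_list())
        return items[0] if len(items) == 1 else ("alt", *items)

    def parse_list():  # priority 2: concatenation
        items = [parse_term()]
        while peek() not in (None, "|", ")"):
            items.append(parse_term())
        return items[0] if len(items) == 1 else ("seq", *items)

    def parse_term():  # priority 1: repetition binds most tightly
        node = parse_fact()
        if peek() in ("?", "*", "+"):
            node = (advance(), node)
        return node

    def parse_fact():  # a name or a parenthesized group
        token = peek()
        if token is None or token in ("|", ")", "?", "*", "+"):
            raise SyntaxError(f"unexpected {token!r}")
        advance()
        if token == "(":
            node = parse_expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        return token

    tree = parse_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree

print(parse_definition("A | B C"))    # ('alt', 'A', ('seq', 'B', 'C'))
print(parse_definition("(A | B) C"))  # ('seq', ('alt', 'A', 'B'), 'C')
print(parse_definition("D E+"))       # ('seq', 'D', ('+', 'E'))
print(parse_definition("(D E)+"))     # ('+', ('seq', 'D', 'E'))
```

The nesting of the functions mirrors the table: `parse_expr` calls `parse_list`, which calls `parse_term`, so repetition is resolved first and alternatives last, and parentheses restart the process at the loosest level.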
Instead of using parentheses, you can also move the grouped symbols into a separate production rule.
// Expr ::= (A | B) C ;
Expr ::= Term C ;
Term ::= A | B ;

// Expr ::= (D E)+ ;
Expr ::= Term+ ;
Term ::= D E ;
Comments and dummy symbols
You can add a comment in the middle of the production rule.
// 1 line comment
/* Multi-
   line comment */
A line comment runs from // to the end of the line; a block comment runs from /* to */. A line can contain only one line comment, and a line comment cannot span multiple lines. A block comment, on the other hand, can span multiple lines, and several block comments can appear within a single line. Comments cannot be nested.
If a production rule is defined but its rule name is neither the start symbol nor used in the definition of any other rule, the rule is dropped before syntactic analysis. Lexical analysis still takes place, however, so the terminal symbols that appear in such a rule are recognized and then discarded as dummy symbols. For example,
Space ::= "\s+" ;
If the rule name Space does not appear in any other rule, runs of one or more whitespace characters are ignored. In other words, adding this rule makes the grammar free-format. Comments can be defined in the same way.
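The effect of a dummy symbol can be sketched with a toy lexer for the Plus grammar used earlier on this page. The function `lex`, the `TERMINALS` and `DUMMIES` tables, and the "first matching pattern wins" strategy are all assumptions made for this example; how HABA itself selects tokens is not described here.

```python
import re

# Terminals that the parser sees. '+' is a fixed value, so it is escaped.
TERMINALS = [("Num", re.compile(r"[0-9]+")), ("'+'", re.compile(re.escape("+")))]
# Terminal of a rule that no other rule refers to: Space ::= "\s+" ;
DUMMIES = [("Space", re.compile(r"\s+"))]

def lex(text, dummies=DUMMIES):
    """Split text into (terminal, lexeme) pairs, discarding dummy matches."""
    dummy_names = {name for name, _ in dummies}
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in TERMINALS + dummies:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            raise SyntaxError(f"no terminal matches at offset {pos}")
        if name not in dummy_names:
            tokens.append((name, match.group()))
        pos = match.end()
    return tokens

print(lex("12 + 34"))  # [('Num', '12'), ("'+'", '+'), ('Num', '34')]
print(lex("12 + 34") == lex("12+34"))  # True
```

Calling `lex("12 + 34", dummies=[])` raises a SyntaxError at the first space, which corresponds to a grammar that has no Space rule.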
HABA definition of HABA
For reference, here is the HABA grammar written in HABA notation itself.
Gram ::= Rule+ ;
Rule ::= Name '::=' Expr? ';' ;
Name ::= "[a-zA-Z_][a-zA-Z_0-9]*" ;
Expr ::= List ('|' List)* ;
List ::= Term+ ;
Term ::= Fact Rept? ;
Fact ::= Fixd | Flex | Name | Quot ;
Fixd ::= "'(''|[^'])+'" ;
Flex ::= """(""""|[^""])+""" ;
Quot ::= '(' Expr ')' ;
Rept ::= '?' | '*' | '+' ;
Spac ::= "\s+" ;
Line ::= "//[^\n]*(\n|$)" ;
Bloc ::= "/\*((?!\*/)[\s\S])*\*/" ;
Spac is whitespace, Line is a line comment, and Bloc is a block comment; all three are ignored before syntactic analysis. [\s\S] in the Bloc definition represents any single character, including a line break.
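The three ignored patterns can be tried out directly. The sketch below copies them into Python's `re` module, which behaves the same as JavaScript for these particular patterns as far as this example goes. The helper `skip_ignored` is hypothetical and only shows how a lexer might step over whitespace and comments before reading the next token.

```python
import re

# Patterns copied from the Spac, Line, and Bloc rules.
SPAC = re.compile(r"\s+")
LINE = re.compile(r"//[^\n]*(\n|$)")
BLOC = re.compile(r"/\*((?!\*/)[\s\S])*\*/")
IGNORED = re.compile("|".join(p.pattern for p in (SPAC, LINE, BLOC)))

def skip_ignored(text, pos=0):
    """Return the first offset at or after pos that is not whitespace or a comment."""
    while True:
        match = IGNORED.match(text, pos)
        if match is None or match.end() == pos:
            return pos
        pos = match.end()

source = "  // start rule\n/* Multi-\n   line comment */ Plus ::= Num '+' Num ;"
print(source[skip_ignored(source):])  # Plus ::= Num '+' Num ;

# [\s\S] lets a block comment span line breaks.
print(BLOC.fullmatch("/* Multi-\nline comment */") is not None)  # True

# Comments do not nest: the match ends at the first */.
print(BLOC.match("/* a /* b */ c */").group())  # /* a /* b */

# A line comment may end at the end of the input instead of a line break.
print(LINE.fullmatch("// last line") is not None)  # True
```

The negative lookahead `(?!\*/)` is what stops the block comment at the first `*/`, which is why nested comments are not supported.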