Both the regular as well as the context free grammar of the language is combined into one source. Keyword tokens need not be defined separately within the lexical grammar, but are instead extracted from the context free part of the definition. All grammatical information is contained in one file with the extension ".sty".

To give a small example how this looks like, see the calculator language below.

; [calc.sty] Grammar "Calculator" Language calc Regular Grammar ign Ign = ' \n\r' ; "white" characters tok Tok = '()+-*/' ; one character tokens tok Int = ('0'..'9')+ ; Integer tok Wrd = "end" Context Free Grammar start Cmd :exp: Exp :end: "end" let Exp :ign0: Exp1 :add : Exp "+" Exp1 :sub : Exp "-" Exp1 let Exp1 :ign0: Exp2 :mlt : Exp1 "*" Exp2 :div : Exp1 "/" Exp2 let Exp2 :neg : "-" Exp2 :ign0: "(" Exp ")" :int : Int

Notes on the source above.

- Overall, the source consists of three parts. The first, naming the language, the second, providing the regular sets and the third, defining the context free grammar.
- In the regular grammar, the single quoted strings denote sets of characters, while the double quoted strings denote strings. It specifies the terminals of the language.
- The context free grammar consists of a list of definitions of
non-terminals each followed by their productions. The productions are named
by the word between the two colons. Double quoted strings within the
production rules denote terminals (
*keywords*), while names are used for both terminals and non-terminals. - Although the language defined by the
*start*productions is either an expression or the word "end", this example language was designed for a typical calculator tool, so one could enter an arbitrary list of expressions, and terminate the session by "end". This is not mentioned within the grammar. Instead, the Styx parser can be instructed to separate prefixes from a source stream, so one can read in one expression after the other. Of course, a start production normally will describe a whole file.

Applying Styx, we derive the follow depth grammar (transformed to abstract
types) from *calc.sty*

; [calc.abs] Types of 'calc' Terms LANGUAGE calc TOKENS Int TYPES calc = Start_Cmd(Cmd) Cmd = end; exp(Exp) Exp = mlt(Exp, Exp); int(Int); neg(Exp); sub(Exp, Exp); div(Exp, Exp); add(Exp, Exp)

Some notes apply to this:

- First of all, note that a transition from grammatical to algebraic notions have been done. While we talk in the .sty file about regular and context free productions, we have the signature of typed term algebras in the .abs file.
- Beside "Int", all regular grammar productions have been removed automatically. This is both possible and necessary, since they only contribute to the surface grammar.
- The surface grammar, which knows about three different non-terminals for expressions, necessary to express the binding strength of the operations, has been mapped onto one type (Exp). This congruence was hinted by the use of the ":ign0:" productions.

Even without writing down a single line of C code, one can already test the language. With the following test string given in a file, we can test both the scanner and the parser separately, yielding the following results.

1+2*(3-4)/5

calc-example:000001:001 Int : 1 calc-example:000001:002 Tok : + calc-example:000001:003 Int : 2 calc-example:000001:004 Tok : * calc-example:000001:005 Tok : ( calc-example:000001:006 Int : 3 calc-example:000001:007 Tok : - calc-example:000001:008 Int : 4 calc-example:000001:009 Tok : ) calc-example:000001:010 Tok : / calc-example:000001:011 Int : 5

Derivation Tree from Source : calc-example [calc.Start_Cmd (1,1) [Cmd.exp (1,1) [Exp.add (1,1) [Exp.ign0 (1,1) [Exp1.ign0 (1,1) [Exp2.int (1,1) [Int (1,1) "1"]]]] [Keyword (1,2) "+"] [Exp1.div (1,3) [Exp1.mlt (1,3) [Exp1.ign0 (1,3) [Exp2.int (1,3) [Int (1,3) "2"]]] [Keyword (1,4) "*"] [Exp2.ign0 (1,5) [Keyword (1,5) "("] [Exp.sub (1,6) [Exp.ign0 (1,6) [Exp1.ign0 (1,6) [Exp2.int (1,6) [Int (1,6) "3"]]]] [Keyword (1,7) "-"] [Exp1.ign0 (1,8) [Exp2.int (1,8) [Int (1,8) "4"]]]] [Keyword (1,9) ")"]]] [Keyword (1,10) "/"] [Exp2.int (1,11) [Int (1,11) "5"]]]]]]

As one can see from the parser test result, full source information is maintained. Not only that the keywords are preserved, but also the starting positions of both the productions and the terminals are kept in the derivation tree for later reference to the source, may be for diagnostics, may be for other purposes.

Note that the source tree shown by the parser test is the internal representation, which is bound to the concrete surface grammar as specified in the calc.sty file. One may have access to this representation, of course, but usually, the compiler writer will give preference to the abstract (depth) grammar as given by the (generated) calc.abs above.

Styx provides a proper C language interface for this abstract grammar, by means of a mapping convention. As soon one knows the mapping by heart, the C interface (header) file is of few use. One will typically prefer working with the .abs file for reference purposes. This becomes clear when having a look at the C interface file below, which is much longer then the (content-identical) .abs file:

/* ------------------------------------------------------------------------ */ /* */ /* [calc_int.h] Language Interface */ /* */ /* ------------------------------------------------------------------------ */ /* File generated by 'ctoh'. Don't change manually. */ #ifndef calc_int_INCL #define calc_int_INCL #include "ptm.h" #include "gls.h" #ifdef __cplusplus extern "C" { #endif /* --------------------- symbol objects - init & quit --------------------- */ void calc_initSymbols(); /* */ void calc_quitSymbols(); /* */ /* -------------------------- Types & Constants --------------------------- */ AbstractType( calc ); AbstractType( calcCmd ); AbstractType( calcExp ); /* --------------------------- Access to Tokens --------------------------- */ c_bool Tcalc_Int(GLS_Tok x); /* */ /* --------------------------- Access to Terms ---------------------------- */ c_bool calc_calc(PT_Term x, calc* x1); /* */ c_bool calc_Cmd(PT_Term x, calcCmd* x1); /* */ c_bool calc_Exp(PT_Term x, calcExp* x1); /* */ /* --------------------------------- calc --------------------------------- */ c_bool calc_Start_Cmd(calc x, calcCmd* x1) #define calc_Start_0 calc_Start_Cmd ; /* --------------------------------- Cmd ---------------------------------- */ c_bool calcCmd_end(calcCmd x); /* */ c_bool calcCmd_exp(calcCmd x, calcExp* x1); /* */ /* --------------------------------- Exp ---------------------------------- */ c_bool calcExp_mlt(calcExp x, calcExp* x1, calcExp* x2); /* */ c_bool calcExp_int(calcExp x, GLS_Tok* x1); /* */ c_bool calcExp_neg(calcExp x, calcExp* x1); /* */ c_bool calcExp_sub(calcExp x, calcExp* x1, calcExp* x2); /* */ c_bool calcExp_div(calcExp x, calcExp* x1, calcExp* x2); /* */ c_bool calcExp_add(calcExp x, calcExp* x1, calcExp* x2); /* */ #ifdef __cplusplus } #endif #endif

The interface will not be explained in full length in this walk-through, instead only two relevant sections will be highlighted:

- The "AbstractType"s introduce the types of parse, mainly Cmd and
Exp.
`AbstractType`

simply expands to`void*`

, and is introduced to hide the implementation. - The most interesting part is the section in the end with the "Exp" header above it. This section gives access to the expressions in the derivation tree.

To each variant within the discriminated union of the Exp terms, one destructor (in notions of algebra, not of C++) is provided. The naming convention of them is "LanguageType_Variant". Their first argument is the term to rip apart. The remaining arguments are variables for the parts of the decomposition. The result of the functions is a boolean value that becomes true if the destructor applies to the considered term, i.e. if we have used it on the right variant.

With this preparation we can look at the source of the evaluator. This is pretty straight forward. Speaking in notions of abstract algebra, the following C function is the canonical evaluation homomorphism on Exps. It maps the type of Exps to C integers by assigning a corresponding C function of the C integer algebra to every function of the term algebra of Exps. Additionally, since an integer literal is also provided in the language, the integer denotation is mapped onto it's meaning.

int evalExp(calcExp ex) { calcExp x1; calcExp x2; GLS_Tok x3; if (calcExp_mlt(ex, &x1, &x2)) return evalExp(x1) * evalExp(x2); if (calcExp_div(ex, &x1, &x2)) return evalExp(x1) / evalExp(x2); if (calcExp_add(ex, &x1, &x2)) return evalExp(x1) + evalExp(x2); if (calcExp_sub(ex, &x1, &x2)) return evalExp(x1) - evalExp(x2); if (calcExp_neg(ex, &x2)) return - evalExp(x2); if (calcExp_int(ex, &x3)) return atoi(GLS_Tok_string(x3)); BUG; }

Together with a few lines to initiate and apply the Styx parser, the above function forms the desired calculator.

Please note that the full advantage of Styx over yacc is not expressed by this walk-trough. One can do this trivial example likely efficient with yacc's semantic actions. Note especially that the above evaluator is in contrary to yacc not hooked into the parser, but instead applied on the result of the parsing process. When the term was evaluated by evalExp the parser has already been gone, having done it's purpose by leaving a term derived from the source.

The advantage of Styx's design immediately becomes apparent as soon as one
adds functions to the example grammar and defines an interpreter on top of it.
Using Styx one would then find everything prepared as needed, while lex/yacc
would require to do all the things that Styx offers implicitly.
*Perhaps we will extend the example appropriately in the next version of the
document. Meanwhile we leave it as an exercise to the reader.
(Hint: Use the symbols and finite maps described below to move along easily.)*

Next Previous Contents