-rw-r--r-- | README | 27 |
1 files changed, 14 insertions, 13 deletions
@@ -35,13 +35,12 @@ converted to HTML (via SXML) or any other format for rendering.
 
 * Implementation details
 
-  Very simple monadic parser combinators (purposely lacking support
-  for recursive grammars currently) are used to tokenize the
-  characters within a string or port and return a list consisting of
-  two types of values: strings and two element tagged lists. A tagged
-  list consists of a symbol designating the type of the text (symbol,
-  keyword, string literal, etc.) and a string of the text fragment
-  itself.
+  Very simple monadic parser combinators (supporting only regular
+  languages) are used to tokenize the characters within a string or
+  port and return a list consisting of two types of values: strings
+  and two element tagged lists. A tagged list consists of a symbol
+  designating the type of the text (symbol, keyword, string literal,
+  etc.) and a string of the text fragment itself.
 
   #+BEGIN_SRC scheme
     ((open "(")
@@ -65,12 +64,14 @@ converted to HTML (via SXML) or any other format for rendering.
      (close ")"))
   #+END_SRC
 
-  This means that the parsers are *not* intended to produce the
-  abstract syntax-tree for any given language. They are simply to
-  attempt to tokenize and tag fragments of the source. A "catch all"
-  rule in each language's parser is used to deal with text that
-  doesn't match any recognized syntax and simply produces an untagged
-  string.
+  The term "parse" is used loosely here as the general act of reading
+  text and building a machine readable data structure out of it based
+  on a set of rules. These parsers perform lexical analysis; they are
+  not intended to produce the abstract syntax-tree for any given
+  language. The parsers, or lexers, attempt to tokenize and tag
+  fragments of the source. A "catch all" rule in each language's
+  highlighter is used to deal with text that doesn't match any
+  recognized syntax and simply produces an untagged string.
 
 * Requirements
 
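
The output shape described in the README text above (tagged two-element lists mixed with untagged strings from a "catch all" rule) can be illustrated with a small sketch. This is not the project's code: it uses plain closures rather than monadic combinators, and every name in it (lex-literal, lex-char-class, lex-any, tokenize, lex-demo) is hypothetical and chosen only for illustration. It assumes Guile or another Scheme that provides substring and string=?.

```scheme
;; Hypothetical sketch, not the library's actual implementation.
;; A "lexer" takes a string and a start index and returns either #f
;; (no match) or a pair (token . next-index).

(define (lex-literal tag text)
  ;; Match the exact string TEXT and tag the result with TAG.
  (lambda (str start)
    (let ((end (+ start (string-length text))))
      (and (<= end (string-length str))
           (string=? (substring str start end) text)
           (cons (list tag text) end)))))

(define (lex-char-class tag pred)
  ;; Match one or more characters satisfying PRED and tag the result.
  (lambda (str start)
    (let loop ((i start))
      (if (and (< i (string-length str))
               (pred (string-ref str i)))
          (loop (+ i 1))
          (and (> i start)
               (cons (list tag (substring str start i)) i))))))

(define (lex-any . lexers)
  ;; Try each lexer in order and return the first successful result.
  (lambda (str start)
    (let loop ((ls lexers))
      (and (pair? ls)
           (or ((car ls) str start)
               (loop (cdr ls)))))))

(define (tokenize lexer str)
  ;; Apply LEXER repeatedly.  Characters that no rule matches are
  ;; accumulated into plain, untagged strings (the "catch all" case).
  (let loop ((i 0) (plain '()) (tokens '()))
    (define (flush)
      ;; Push any pending untagged characters onto TOKENS as one string.
      (if (null? plain)
          tokens
          (cons (list->string (reverse plain)) tokens)))
    (cond
     ((>= i (string-length str))
      (reverse (flush)))
     ((lexer str i)
      => (lambda (result)
           (loop (cdr result) '() (cons (car result) (flush)))))
     (else
      (loop (+ i 1) (cons (string-ref str i) plain) tokens)))))

;; A toy rule set: parentheses and alphabetic symbols only.
(define lex-demo
  (lex-any (lex-literal 'open "(")
           (lex-literal 'close ")")
           (lex-char-class 'symbol char-alphabetic?)))

(tokenize lex-demo "(define x 42)")
;; => ((open "(") (symbol "define") " " (symbol "x") " 42" (close ")"))
```

Because each rule only scans forward and never calls itself, the sketch stays within regular languages, which matches the limitation the revised README text states. Whitespace and the number fall through to the untagged case here simply because the toy rule set has no rule for them.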