-rw-r--r-- README 27
1 files changed, 14 insertions, 13 deletions
diff --git a/README b/README
index dfed354..28df55d 100644
--- a/README
+++ b/README
@@ -35,13 +35,12 @@ converted to HTML (via SXML) or any other format for rendering.
* Implementation details
- Very simple monadic parser combinators (purposely lacking support
- for recursive grammars currently) are used to tokenize the
- characters within a string or port and return a list consisting of
- two types of values: strings and two element tagged lists. A tagged
- list consists of a symbol designating the type of the text (symbol,
- keyword, string literal, etc.) and a string of the text fragment
- itself.
+ Very simple monadic parser combinators (supporting only regular
+ languages) are used to tokenize the characters within a string or
+ port and return a list consisting of two types of values: strings
+ and two-element tagged lists. A tagged list consists of a symbol
+ designating the type of the text (symbol, keyword, string literal,
+ etc.) and a string of the text fragment itself.
#+BEGIN_SRC scheme
((open "(")
@@ -65,12 +64,14 @@ converted to HTML (via SXML) or any other format for rendering.
(close ")"))
#+END_SRC
- This means that the parsers are *not* intended to produce the
- abstract syntax-tree for any given language. They are simply to
- attempt to tokenize and tag fragments of the source. A "catch all"
- rule in each language's parser is used to deal with text that
- doesn't match any recognized syntax and simply produces an untagged
- string.
+ The term "parse" is used loosely here as the general act of reading
+ text and building a machine-readable data structure out of it based
+ on a set of rules. These parsers perform lexical analysis; they are
+ not intended to produce the abstract syntax tree for any given
+ language. The parsers, or lexers, attempt to tokenize and tag
+ fragments of the source. A "catch-all" rule in each language's
+ highlighter is used to deal with text that doesn't match any
+ recognized syntax and simply produces an untagged string.
* Requirements
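The tokenizer design described in the README text above (monadic combinators limited to regular languages, tagged lists for recognized text, and a catch-all that yields untagged strings) can be illustrated with a small, self-contained Scheme sketch. It is not the library's implementation: the procedure names (~p-return~, ~p-bind~, ~p-any~, ~p-chars~, ~p-char~, ~p-any-char~, ~p-tag~, ~lex-token~, ~tokenize~) and the token types (~open~, ~close~, ~symbol~, ~string~) are hypothetical, chosen only to mirror the output format shown in the README. Each parser takes a string and an index and returns a pair of a value and the next index, or ~#f~ on failure. No rule refers back to the grammar itself, so the rules stay within regular languages.

#+BEGIN_SRC scheme
;; Hypothetical sketch, not the library's API.
;; A parser is a procedure (STR I) returning (VALUE . NEXT-INDEX) on
;; success or #f on failure.

(define (p-return value)              ; unit: succeed, consume nothing
  (lambda (str i) (cons value i)))

(define (p-bind parser proc)          ; bind: feed PARSER's value to PROC
  (lambda (str i)
    (let ((result (parser str i)))
      (and result ((proc (car result)) str (cdr result))))))

(define (p-any . parsers)             ; ordered choice: first success wins
  (lambda (str i)
    (let loop ((ps parsers))
      (and (pair? ps)
           (or ((car ps) str i) (loop (cdr ps)))))))

(define (p-chars pred min-count)      ; greedy run of chars matching PRED
  (lambda (str i)
    (let loop ((j i))
      (if (and (< j (string-length str)) (pred (string-ref str j)))
          (loop (+ j 1))
          (and (>= (- j i) min-count)
               (cons (substring str i j) j))))))

(define (p-char c)                    ; exactly the character C
  (lambda (str i)
    (and (< i (string-length str))
         (char=? (string-ref str i) c)
         (cons (string c) (+ i 1)))))

(define (p-any-char str i)            ; any single character
  (and (< i (string-length str))
       (cons (string (string-ref str i)) (+ i 1))))

(define (p-tag type parser)           ; wrap matched text as (TYPE TEXT)
  (p-bind parser (lambda (text) (p-return (list type text)))))

(define (symbol-char? c)
  (or (char-alphabetic? c)
      (char-numeric? c)
      (memv c '(#\- #\! #\? #\* #\+ #\/ #\< #\> #\=))))

;; A string literal "..." (no escape handling), built by sequencing
;; three parsers with p-bind.
(define lex-string-literal
  (p-tag 'string
         (p-bind (p-char #\")
                 (lambda (opening)
                   (p-bind (p-chars (lambda (c) (not (char=? c #\"))) 0)
                           (lambda (body)
                             (p-bind (p-char #\")
                                     (lambda (closing)
                                       (p-return
                                        (string-append opening body closing))))))))))

(define lex-token
  (p-any (p-tag 'open (p-char #\())
         (p-tag 'close (p-char #\)))
         lex-string-literal
         (p-tag 'symbol (p-chars symbol-char? 1))
         p-any-char))                 ; catch-all: one untagged character

;; Apply LEX-TOKEN until the input is consumed.  The catch-all means
;; every step succeeds and consumes at least one character.
(define (tokenize str)
  (let loop ((i 0) (acc '()))
    (if (>= i (string-length str))
        (reverse acc)
        (let* ((result (lex-token str i))
               (token (car result)))
          (loop (cdr result)
                ;; merge runs of untagged text into a single string
                (if (and (string? token) (pair? acc) (string? (car acc)))
                    (cons (string-append (car acc) token) (cdr acc))
                    (cons token acc)))))))
#+END_SRC

For example, ~(tokenize "(display \"hi\")")~ returns ~((open "(") (symbol "display") " " (string "\"hi\"") (close ")"))~; the space falls through to the catch-all. Merging adjacent untagged characters is a design choice of this sketch so that runs of unrecognized text come out as one string. Because choice is ordered, the string rule is tried before the catch-all, and an unterminated quote simply falls back to untagged text.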