-*- mode: org -*-
Guile-syntax-highlight is a general-purpose syntax highlighting
library for GNU Guile. It can parse code written in various
programming languages into a simple s-expression that can be easily
converted to HTML (via SXML) or any other format for rendering.
* Supported Languages
- Scheme
- XML
* Example
#+BEGIN_SRC scheme
(use-modules (syntax-highlight)
             (syntax-highlight scheme)
             (sxml simple))

(define code
  "(define (square x) \"Return the square of X.\" (* x x))")

;; Get raw highlights list.
(define highlighted-code
  (highlight scheme-highlighter code))

;; Convert to SXML.
(define highlighted-sxml
  (highlights->sxml highlighted-code))
;; Write HTML to stdout; sxml->xml writes to the current output port.
(sxml->xml highlighted-sxml)
(newline)
#+END_SRC
* Implementation details
Very simple monadic parser combinators (supporting only regular
languages) are used to tokenize the characters within a string or
port and return a list consisting of two types of values: strings
and two-element tagged lists. A tagged list consists of a symbol
designating the type of the text (symbol, keyword, string literal,
etc.) and a string of the text fragment itself. The example code
above yields a list of this shape:
#+BEGIN_SRC scheme
((open "(")
 (special "define")
 " "
 (open "(")
 (symbol "square")
 " "
 (symbol "x")
 (close ")")
 " "
 (string "\"Return the square of X.\"")
 " "
 (open "(")
 (symbol "*")
 " "
 (symbol "x")
 " "
 (symbol "x")
 (close ")")
 (close ")"))
#+END_SRC
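Because the result is plain list data, a renderer for another output
format can be written with ordinary list operations. The following
is a minimal sketch, not part of the library: =highlights->string=
and =highlights->spans= are hypothetical helper names, and the
=class= attribute naming is an assumption rather than what
=highlights->sxml= actually emits.
#+BEGIN_SRC scheme
(use-modules (ice-9 match))

;; A shortened highlights list in the shape shown above.
(define highlights
  '((open "(") (special "define") " " (symbol "x") " "
    (string "\"hi\"") (close ")")))

;; Hypothetical helper: drop the tags to recover the source text.
(define (highlights->string highlights)
  (string-concatenate
   (map (match-lambda
          ((? string? text) text)   ; untagged fragment
          ((_ text) text))          ; (tag "text") list
        highlights)))

;; Hypothetical helper: wrap each tagged fragment in an SXML span
;; whose class is the tag name (an assumed naming scheme).
(define (highlights->spans highlights)
  (map (match-lambda
         ((? string? text) text)
         ((tag text)
          `(span (@ (class ,(symbol->string tag))) ,text)))
       highlights))

(highlights->string highlights)
;; => "(define x \"hi\")"
#+END_SRC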
The term "parse" is used loosely here to mean the general act of reading
text and building a machine-readable data structure out of it based
on a set of rules. These parsers perform lexical analysis; they are
not intended to produce the abstract syntax tree for any given
language. The parsers, or lexers, attempt to tokenize and tag
fragments of the source. A "catch all" rule in each language's
highlighter is used to deal with text that doesn't match any
recognized syntax and simply produces an untagged string.
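The catch-all idea can be sketched as an ordered list of rules whose
last entry always succeeds. This is an illustration only: it assumes
a hypothetical rule protocol (a procedure from a character list to a
=(token . rest)= pair, or =#f= on no match), and the names
=paren-rule=, =catch-all-rule= and =tokenize= are invented for the
example. The library's own highlighters are built from its parser
combinators instead.
#+BEGIN_SRC scheme
;; Hypothetical rule: tag parentheses, fail on anything else.
(define (paren-rule chars)
  (cond ((null? chars) #f)
        ((char=? (car chars) #\() (cons '(open "(") (cdr chars)))
        ((char=? (car chars) #\)) (cons '(close ")") (cdr chars)))
        (else #f)))

;; Hypothetical catch-all: succeeds on any non-empty input and
;; yields the next character as an untagged string.
(define (catch-all-rule chars)
  (and (pair? chars)
       (cons (string (car chars)) (cdr chars))))

;; Try each rule in order at the current position. Because the
;; catch-all comes last, some rule always matches.
(define (tokenize rules chars)
  (let loop ((chars chars) (tokens '()))
    (if (null? chars)
        (reverse tokens)
        (let try ((remaining rules))
          (let ((result ((car remaining) chars)))
            (if result
                (loop (cdr result) (cons (car result) tokens))
                (try (cdr remaining))))))))

(tokenize (list paren-rule catch-all-rule) (string->list "(a b)"))
;; => ((open "(") "a" " " "b" (close ")"))
#+END_SRC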
Most syntax highlighters use lots of regular expressions to do their
magic, but guile-syntax-highlight uses a purely functional, monadic
parser combinator interface instead. This makes it easy for
developers to build complex parsers by creating compositions of many
simpler ones. Additionally, rather than working with raw strings or
Guile's file ports, the input code is represented as a lazy stream
of characters using the SRFI-41 streams library. By using streams,
parsers do not have to worry about things like reverting the
character index from which a file is being read upon a failure, or
any other state management. Parsers simply return the thing they
parsed, if any, and the remaining stream to be consumed by another
parser.
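To make the stream-passing style concrete, here is a minimal sketch
using Guile's =(srfi srfi-41)= module. The procedure names
(=parse-char-if=, =parse-many=, =parse-digits=), the =number= tag,
and the convention of returning =#f= on failure are assumptions made
for illustration, not the library's actual API.
#+BEGIN_SRC scheme
(use-modules (srfi srfi-41))

;; Assumed convention: a parser takes a character stream and returns
;; two values, the parsed result (#f on failure) and the remaining
;; stream.

;; Match one character that satisfies PRED.
(define (parse-char-if pred)
  (lambda (input)
    (if (and (stream-pair? input) (pred (stream-car input)))
        (values (stream-car input) (stream-cdr input))
        (values #f input))))

;; Apply PARSER zero or more times, collecting the results in a list.
(define (parse-many parser)
  (lambda (input)
    (let loop ((input input) (results '()))
      (call-with-values (lambda () (parser input))
        (lambda (result rest)
          (if result
              (loop rest (cons result results))
              (values (reverse results) input)))))))

;; Compose the two: one or more digits become a tagged list.
(define parse-digits
  (let ((many-digits (parse-many (parse-char-if char-numeric?))))
    (lambda (input)
      (call-with-values (lambda () (many-digits input))
        (lambda (chars rest)
          (if (null? chars)
              (values #f input)
              (values (list 'number (list->string chars)) rest)))))))

(call-with-values
    (lambda ()
      (parse-digits (list->stream (string->list "42 apples"))))
  (lambda (token rest)
    (list token (list->string (stream->list rest)))))
;; => ((number "42") " apples")
#+END_SRC
On failure, a parser in this sketch just hands back the stream it
was given, so there is no read position to restore before trying
the next parser.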
* Requirements
- GNU Guile >= 2.0.9
* Building
Guile-syntax-highlight uses the familiar GNU build system and
requires GNU Make to build.
** From tarball
After extracting the tarball, run:
#+BEGIN_SRC sh
./configure
make
make install
#+END_SRC
** From Git
In addition to GNU Make, building from Git requires GNU Automake
and Autoconf.
#+BEGIN_SRC sh
git clone git@dthompson.us:guile-syntax-highlight.git
cd guile-syntax-highlight
./bootstrap
./configure
make
make install
#+END_SRC
* License
LGPLv3 or later. See =COPYING= for the full license text.