Implementing Scheme in Python

code python lisp

John Jacobsen
Friday, September 13, 2019

Home Other Posts

Earlier post A Daily Journal in Org Mode writing code emacs

Later post Common Lisp How-Tos code lisp

Background

Lately I’ve been a bit of a Lisp kick again, reading On Lisp and Practical Common Lisp, working problems and re-casting my Clojure-based blog software into Common Lisp to see how that goes. The urge to do these things caught me somewhat by surprise: in the past few years I haven’t done much programming outside of my day job because I’ve been focused more on painting and drawing. It seems, though, that these things are cyclical and right now I’m in the grips of Lisp mania again.

This cycle goes back to the mid-1980s when I was a college student studying physics and learning how to program. It’s hard to say how I got the Lisp bug initially. I remember reading Douglas Hofstadter’s explanation of recursive Lisp functions; hanging out near the AI workstations in the basement of the Madison Academic Computing Center where I worked (the sans-serif Lisp program printouts looked way better than anybody else’s output); and reading about AI and Lisp on Usenet. At any rate, I managed to pick up a copy of PC-Lisp, a port of Franz Lisp to IBM PCs which apparently is still a thing after all these years. Somewhat astonishingly, the interpreter ran on my underpowered IBM PCjr, loading from floppy disk in half a minute or so. I remember how even fairly small programs would pause for several seconds while the garbage collector did its thing. This interpreter, along with a Lisp textbook I picked up somewhere, was enough to get me started learning about atoms, lists, quote, car and cdr, symbolic computing, and simple recursive algorithms.

I also wrote a small s-expression parser in Turbo Pascal on the same PCjr as a way of graphing arbitrary math functions of two variables. I had no compiler class but my roommate taught me enough about finite state machines, lexers, parsers, etc. to be dangerous… enough knowledge to get my grapher up and running. (I wish I still had some of these old programs, but they vanished when I “lent” the computer and all my diskettes to a friend; they are probably landfill somewhere.)

This all subsided for several years while I got sucked into physics, then painting, then capoeira, then physics, then painting, then software engineering in C, Perl, and Python. In the early aughts I got interested in Paul Graham’s writing, which led to a few skirmishes with Common Lisp every couple of years, and ultimately to me getting totally hooked on Clojure starting in 2011.

I intend to write more about what makes Lisps in general compelling for me1 and why I’ve returned to it over and over throughout my career. In the mean time….

Scheming in Python

Happily, my Lisp mania has crested coincidentally with Chicago Python teacher David Beazley announcing another class on The Structure and Interpretation of Computer Programs, a classic MIT computing text based on a dialect of Lisp called Scheme, and I managed to get on the list before it filled up. Since it’s been awhile since I’ve written much Python, I thought I’d warm up for the class by trying to write a small Scheme implementation in Python. My hope was to refresh my memory of the basics of Scheme, improve my slightly-dusty Python, and have fun.

Prior Art

Researching this post (after most of the code was done), I noticed that there are a ton of small Lisps written in Python (including two I worked on the better part of a decade ago). One that stuck out in my memory was Peter Norvig’s hundred-odd-line Lispy, which is particularly striking in its elegance (in style, it reminds me of the Lisp programs in PAIP).

The Stages

My plan for writing the interpreter was to follow the actual REPL stages: Read, Eval, and Print. Read would involve parsing input expressions and turning them into data structures to be evaluated; Eval would be a (presumably recursive) step whereby, for function calls, arguments would be Eval’ed first, with the results handed to the appropriate code implementing the actual function, implemented either in Python (for a few primitive functions) or, eventually, through user-defined functions. An exception to the argument evaluation rule would have to be made for a few special forms, which would have to be handled on a case-by-case basis.

Parsing

My first approach for the parser was to use PLY, which I’ve used before. Having been spoiled by the excellent Instaparse library for Clojure, upon returning to PLY I found the library’s interface relatively clumsy and abstruse.

Then I thought I’d try a regex to parse everything, since Scheme’s base syntax and grammar are so simple. The following regex actually worked well for awhile:

pat = """^\s*\((?P<list>.*)\)$   # Something beginning/ending with parens, or
         |
         (?P<bool>(\#t|\#f))\s*  # A boolean true/false value, or
         |                       # An atom starting w/ a letter or arithmetic op:
         (?P<atom>[a-zA-Z\+\-\*\/]+[a-zA-Z0-9\+\-\*\/]*)\s*
         |
         (?P<num>[0-9]+)\s*"""   # Or, a positive natural number.

(I had not gotten to negative or floating point numbers at this stage of the project.)

Expressions of the form (+ 1 2 (+ 1 1 1)), for example, parsed correctly. However, the regex didn’t handle lists that nested in other than the last position, for example, (+ (* 2 3) 4). It might be possible to do the whole problem with regular expressions, but at this point I decided to look for a nice modern parser library that let me write down the grammer in EBNF notation and simply returned a parse tree when given a valid expression. I found Lark, which is roughly as featureful as Instaparse and worked well. Here’s the Lark grammar as it currently stands:

start  : _exprs
_exprs : _e* _e
_e     : ATOM
       | _num
       | BOOL
       | list
TRUE   : "#t"
FALSE  : "#f"
BOOL   : TRUE | FALSE
list   : "(" _exprs? ")"
INT    : /[-+]?[0-9]+/
ATOM   : /[a-zA-Z]+[a-zA-Z0-9\-\?]*/
       | /[\*\/\=\>\<]/
       | /[\-\+](?![0-9])/
FLOAT  : /[-+]?[0-9]+\.[0-9]*/
_num   : INT | FLOAT
%import common.WS
%ignore WS

Note that while libraries like Lark and Instaparse are a nice way to turn out a powerful parser quickly, Norvig’s lis.py just splits the inputs on open/close parens and uses them to increase/decrease nesting level as they appear. The approach to parsing numbers is also elegant: try to parse it as an integer; if that fails, as a float; otherwise, it’s a symbol (what my grammar calls ATOM). I suspect that using a “real” parser makes it easier to add more syntax to your language if needed, though.

Data Model

Once Lark parses an expression, I convert to my own internal representation before evaluation (function convert_ast). I did not use objects to represent lists, atoms, etc., but just simple tuples and lists:

Data Type Example
Atom ('atom', 'foo')
Boolean ('bool', True)
Int ('int', 123)
Float ('float', 3.14)
List ('list', [('atom', 'define'), ('atom', 'x'), ('int', 3)])

It might be more idiomatic Python to use objects, but I’m more used to thinking in terms of nested structures of base data types (lists, tuples, dictionaries), which have the advantage of being easily printable while debugging.

Here’s an example expression, parsed with Lark and then converted into my internal representation:

Scheme raw source

(define (abs x)
  (if (< x 0)
      (- x)
      x))

Lark output

Tree(start,
     [Tree(list,
           [Token(ATOM, 'define'),
            Tree(list,
                 [Token(ATOM, 'abs'),
                  Token(ATOM, 'x')]),
            Tree(list, [Token(ATOM, 'if'),
                        Tree(list,
                             [Token(ATOM, '<'),
                              Token(ATOM, 'x'),
                              Token(INT, '0')]),
                        Tree(list,
                             [Token(ATOM, '-'),
                              Token(ATOM, 'x')]),
                        Token(ATOM, 'x')])])])

Internal representation used in the interpreter

[('list', [('atom', 'define'),
           ('list', [('atom', 'abs'),
                     ('atom', 'x')]),
           ('list', [('atom', 'if'),
                     ('list', [('atom', '<'),
                               ('atom', 'x'),
                               ('int', 0)]),
                     ('list', [('atom', '-'),
                               ('atom', 'x')]),
                     ('atom', 'x')])])]

Evaluation

Evaluation turned out, not surprisingly, to be where most of the work is; specifically, the individual cases for special forms. These were:

  • quote
  • cond / if (I implemented these separately but, had I implemented macros, I might implement one in terms of the other)
  • define
  • lambda
  • or
  • and

Aside from these, there was code for normal function application rules, with a slight variance between internally-defined functions and functioned defined by the user with define.

I did not, as Paul Graham did, attempt to find the minimal subset of language elements needed to bootstrap a complete Lisp, though that is certainly an interesting exercise. My goal was to get my tests green in a straightforward way, as described below.

Printing

The Print stage needed to convert the internal representation of the evaluated result to a string, both for a user running a REPL, and for my unit tests. This turned out to be simple enough that I can put the entire function here:

def printable_value(ast):
    k, v = ast
    if k == 'int' or k == 'float':
        return str(v)
    if k == 'bool':
        return {True: "#t",
                False: "#f"}.get(v)
    if k == 'intproc':
        return "Internal procedure '%s'" % v
    if k == 'atom':
        return v
    if k == 'list':
        return '(' + ' '.join([printable_value(x)
                               for x in v]) + ')'
    if k == 'nop':
        return ''
    if k == 'fn':
        (fn_name, _, _) = v
        if fn_name == 'lambda':
            return "Anonymous-function"
        return "Function-'%s'" % str(fn_name)
    raise Exception('Unprintable ast "%s"' % str(ast))

The primitive data types (int, float, bool and atom) are obvious enough; intproc and fn are for internally-defined and user-defined functions, respectively; finally, nop is for when you don’t want any value shown to the user (i.e., after one enters a define expression, there is no output printed before the next prompt):

scheme> (define a 3)
scheme>

Testing Approach

I used a fairly strict test-driven development workflow for this project: write just enough new test code to fail; write just enough production code to get the test to pass; then refactor. This workflow generally serves me well for harder problems where I’m unlikely to see the best way to do things from the outset and I expect to make many improvements: I know the tests will have my back when I make changes. This turned out to be the right approach, since I had to pivot several times.

I didn’t just use tests drive the code forward, but let SICP drive the tests forward as well. When I needed the next thing to work on, all I had to do was turn forward a few pages in the book, and the page count gave me a measure of my progress.

A few of the tests, with the corresponding page numbers:

# Adapted from SICP p. 8:
t("(define pi 3.14159)")
t("(define radius 10)")
t("(* pi (* radius radius))", "314.159")
...
# p. 19:
t("(define x 7)")
t("(and (> x 5) (< x 10))", "#t")
...
# p. 37
t("""(define (fib n)
       (cond ((= n 0) 0)
             ((= n 1) 1)
             (else (+ (fib (- n 1))
                      (fib (- n 2))))))""")
t("(fib 5)", "5")

Here t (a very short function name since it’s used so many times) is a testing function which evaluates the code supplied in the first argument, using the current “environment” (scope). If a second argument (a string) is supplied, it also asserts that the code in the first argument evaluates to the second.

The Code Comes To Life

One of my favorite things about this project was when I started writing nontrivial tests which already passed, because enough of the language had been implemented already. This happened more often as the project progressed, and it made me feel like I’d implemented a “real” language, an oddly compelling feeling.

The code, in its entirety, can be found here.

Next Steps

Though I have already gone far past my goals for the project, there are a few directions which I could take this.

Implement more of Scheme

For example, I don’t have strings yet in my implementation!

Extend it Further

Implement Python interop and shell features. This could be genuinely fun and useful, but there’s already a fairly full-featured Lisp skin for Python called Hy.

Go Low Level

Do it again, but in a faster, lower-level language. Python is relatively slow compared to Clojure for long-running programs, and slower than Common Lisp or C for program startup. I’ve been wanting to dust off my C chops, and this would be an excuse to do so, though C is a bit primitive compared to, say, Rust. (Go is not to my taste so I wouldn’t pick it for a fun project like this one.) So, either an excuse to learn Rust or to improve my C chops. Which leads me to the final question: would I rather implement Lisps, or program in them? Up until now it has been the latter, but after the past few weeks I can definitely taste the appeal of the former.

Thoughts Upon Revisiting Python

This was the biggest Python project I’ve done in five years, so it was interesting to check my experience of the language after some time away in Clojure-land. I was a big Python fan for many years but my enthusiasm for the language has definitely cooled. This is partly due to the performance comparison I just mentioned, but the two other things I really missed were REPL integration and structural editing.

In Lisps such as Clojure and Common Lisp I can send expressions to the REPL directly from Emacs and see the results instantly. It’s difficult to describe how helpful this short feedback cycle is until you’ve experienced it. Writing the interpreter, I got sort of close by using conttest plus nosetests: my unit tests ran every time I saved a source file, and if I wanted to inspect some object I could add a print or two temporarily to the code I was working on. But this is definitely more clumsy than a Lisp REPL directly integrated with my editor.

Structural editing (via Paredit) is another thing that is hard to explain the utility of but which is hard to live without once you’re used to it. The basic idea is that you are no longer editing strings of text, but the actual tree of nested lists comprising your code - killing, splitting, joining, absorbing (“slurping”) into and expelling (“barfing”) from those lists. Working with the code at the tree level is a powerful way to edit your code, and watching this video might help may give a hint of it. Once you’re used to the affordances of Paredit or its cousins, it feels clunky and primitive to go back to editing plain old text.

Other value propositions of Clojure, such as macros, immutable data structures and concurrency, weren’t really an issue for this project, though I would certainly miss them for bigger projects.

Conclusion

Implementing your own Lisp is surprisingly do-able (and fun!). Python is certainly a fine tool for the job (though if you want to skip straight to programming in a modern Lisp for your day job, please contact me).

Postscript

While digging through my digital and mental archives for Lisp things I remembered something I hadn’t seen in a long time: the use of closing square brackets to terminate a deeply-nested collection of s-expressions (i.e. ] instead of )))))))). Did I imagine it? While not necessarily a good idea, it’s an interesting one. Then, while reading through Gabriel and Steele’s Evolution of Lisp this morning, I came across the following snippet:

“One of the minor, but interesting, syntactic extensions that Interlisp made was the introduction of the superparenthesis, or superbracket. If a right square bracket ] is encountered during a read operation, it balances all outstanding open left parentheses, or back to the last outstanding left square bracket [.”

DEFINEQ((FACTORIAL
  (LAMBDA (N)
   (COND [(ZEROP N) 1]
         (T (TIMES N (FACTORIAL (SUB1 N]

Bingo! I wasn’t imagining it. This suggests those workstations in the basement of MACC at UW-Madison were probably running Interlisp.

Footnotes:

1
So compelling that I left a rather fruitful science project partly in order to do Lisp (Clojure) programming full time.

Earlier post A Daily Journal in Org Mode writing code emacs

Later post Common Lisp How-Tos code lisp

Blog Posts (170)

Select from below, view all posts, or choose only posts for:art clojure code emacs lisp misc orgmode physics python ruby sketchup southpole writing


Home


From Elegance to Speed code lisp clojure physics

Common Lisp How-Tos code lisp

A Daily Journal in Org Mode writing code emacs

Show at Northwestern University Prosthetics-Orthotics Center art

Color Notations art

Painting and Time art

Learning Muscular Anatomy code clojure art emacs orgmode

Reflections on a Year of Daily Memory Drawings art

Repainting art

Daily Memory Drawings art

Questions to Ask art

Macro-writing Macros code clojure

Time Limits art

Lazy Physics code clojure physics

Fun with Instaparse code clojure

Nucleotide Repetition Lengths code clojure

Updating the Genome Decoder code clojure

Getting Our Hands Dirty (with the Human Genome) code clojure

Validating the Genome Decoder code clojure

A Two Bit Decoder code clojure

Exploratory Genomics with Clojure code clojure

Rosalind Problems in Clojure code clojure

Introduction to Context Managers in Python code python

Processes vs. Threads for Integration Testing code python

Resources for Learning Clojure code clojure

Continuous Testing in Python, Clojure and Blub code clojure python

Programming Languages code clojure python

Milvans and Container Malls southpole

Oxygen southpole

Ghost southpole

Turkey, Stuffing, Eclipse southpole

Wind Storm and Moon Dust southpole

Shower Instructions southpole

Fresh Air and Bananas southpole

Traveller and the Human Chain southpole

Reveille southpole

Drifts southpole

Bon Voyage southpole

A Nicer Guy? southpole

The Quiet Earth southpole

Ten southpole

The Wheel art

Plein Air art

ISO50 southpole art

SketchUp and MakeHuman sketchup art

In Defense of Hobbies misc code art

Closure southpole

Takeoff southpole

Mummification southpole

Eleventh Hour southpole

Diamond southpole

Baby, It's Cold Outside southpole

Fruition southpole

Friendly Radiation southpole

A Place That Wants You Dead southpole

Marathon southpole

Deep Fried Macaroni and Cheese Balls southpole

Retrograde southpole

Three southpole

Transitions southpole

The Future southpole

Sunday southpole

Radio Waves southpole

Settling In southpole

Revolution Number Nine southpole

Red Eye to McMurdo southpole

That's the Way southpole

Faults in Ice and Rock southpole

Bardo southpole

Chasing the Sun southpole

Downhole southpole

Coming Out of Hibernation southpole

Managing the Most Remote Data Center in the World code southpole

Ruby Plugins for Sketchup art sketchup ruby code

The Cruel Stranger misc

Photoshop on a Dime art

Man on Wire misc

Videos southpole

Posing Rigs art

Metric art

Cuba southpole

Wickets southpole

Safe southpole

Broken Glasses southpole

End of the Second Act southpole

Pigs and Fish southpole

Last Arrivals southpole

Lily White southpole

In a Dry and Waterless Place southpole

Immortality southpole

Routine southpole

Tourists southpole

Passing Notes southpole

Translation southpole

RNZAF southpole

The Usual Delays southpole

CHC southpole

Wyeth on Another Planet art

Detox southpole

Packing southpole

Nails southpole

Gearing Up southpole

Gouache, and a new system for conquering the world art

Fall 2008 HPAC Studies art

YABP (Yet Another Blog Platform) southpole

A Bath southpole

Green Marathon southpole

Sprung southpole

Outta Here southpole

Lame Duck DAQer southpole

Eclipse Town southpole

One More Week southpole

IceCube Laboratory Video Tour; Midrats Finale southpole

SPIFF, Party, Shift Change southpole

Good things come in threes, or 18s southpole

Sleepless in the Station southpole

Post Deploy southpole

IceCube and The Beatles southpole

Midrats southpole

Video: Flight to South Pole southpole

Almost There southpole

The Pure Land southpole

There are no mice in the Hotel California Bunkroom southpole

Short Timer southpole

Sleepy in MacTown southpole

Sir Ed southpole

Pynchon, Redux southpole

Superposition of Luggage States southpole

Shortcut to Toast southpole

Flights: Round 1 southpole

Packing for the Pole southpole

Goals for Trip southpole

Balaklavas southpole

Tree and Man (Test Post) southpole

Schedule southpole

How to mail stuff to John at the South Pole southpole

Summer and Winter southpole

Homeward Bound southpole

Redeployment southpole

Short-timer southpole

The Cleanest Air in the World southpole

One more day (?) southpole

One more week (?) southpole

Closing Softly southpole

More Photos southpole

Super Bowl Wednesday southpole

Night Owls southpole

First Week southpole

More Ice Pix southpole

Settling In southpole

NPX southpole

Pole Bound southpole

Bad Dirt southpole

The Last Laugh southpole

First Delay southpole

Nope southpole

Batteries and Sheep southpole

All for McNaught southpole

t=0 southpole

The Big (Really really big...) Picture southpole

Giacometti southpole

Descent southpole

Video Tour southpole

How to subscribe to blog updates southpole

What The Blog is For southpole

Auckland southpole

Halfway Around the World; Dragging the Soul Behind southpole

Launched southpole

Getting Ready (t minus 2 days) southpole

Home

Subscribe: RSS feed ... all topics ... or Clojure only / Lisp only