Sunday, November 21, 2010

Getting down to a new compiler

Thanks with a great job by one of my students (Cody), we have a complete DEFSTRUCT working! Now we can get out of dealing with those pesky lists and starting building structures that are easier to handle. I'll claim a bit of fixing - more like band-aiding - of the old compiler. If I'm careful, it can do most of what I need.

Unlike when we built the first compiler (most in Java - do you know just how many lines of code it took just to get to the old one?), it was intended to get us working. And it did, but with a lot of things not working. For 6 years we've being patching it. Any of you that have been around a bit understand the result. So, now we have enough Lisp to build a real compiler.

This time we're starting with the environment as the first component. Building a lot of DEFSTRUCTs is always a good way to understand your system (everyone remember your old data structures class?). Here's a look at the current data structures. It will likely have some changes as we go, but it gets us off in a good footing. You'll also a parameter-printer macro. It generates the printers for the structs.

I should have a good set of functions that use these structs before the end of the year. After that comes the structural analysis part...

Happy Thanksgiving!

BTW, Blogger will insert gratuitous para breaks between lines. Or it may just be Safari.

-------------

;;; Here is where all of the environment starts. 
;;; There are a set of substructs that
;;; extend the general environment. The basic environment are made up of
;;; LAMBDA, LET, FLET, LABELS and any parameters from 
;;; LAMBDA, LET, FLET, and LABELS
;;; Note: this environment uses the lambda calculus definition of functions: 
;;; one or one parameter
(defstruct (environment (:print-object print-environment))
  ; range is LAMBDA, LET, FLET, LABELS, :PARAMETER, :BLOCK, :TAGBODY, :SYMBOL
  (kind :environment :read-only t) 
  (parent nil :read-only t :type environment)
  (children  nil :type list))
(parameter-printer environment (kind) (children))


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;; SYMBOL-BINDING ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; This struct is holds common facets of named environment
(defstruct (named-binding (:include environment 
                            (kind :named-binding)) 
                            (:print-object print-named-binding))
  (name nil :type symbol :read-only t))
(parameter-printer named-binding (kind name) (children))


;;; This struct handles the binding of the value part of the symbol. It works
;;; with parameters (LAMBDA), and LET forms.
(defstruct (symbol-binding (:include named-binding 
                                (kind :symbol-binding)) 
                                (:print-object print-symbol-binding))
  (scope :local) ;; :local, :closure, :special, :reference (accessing a closure)
  (type t)
  (init-form nil)
  (allocation nil))
(parameter-printer symbol-binding (kind name scope type init-form allocation) (children))


;;; This struct handles the binding of the function part of the symbol. It works
;;; with FLET and LABELS forms.
(defstruct (function-binding (:include named-binding ) 
                             (:print-object print-function-binding))
  (ftype :function :read-only t :type function-type)) ; or :macro
(parameter-printer function-binding (kind name ftype) (children))


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;; PARAMETER STRUCTURES ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; This section deals with making structures for the various parameter types


;;; This is the core of every parameter type in CL
(defstruct (parameter-env (:include symbol-binding (kind :parameter)) 
                          (:print-object print-parameter-env))
  (usage nil :read-only t) ;; making a parameter must include the usage
  (ignore nil :type boolean)
  (ignorable nil :type boolean))
(parameter-printer parameter-env 
     (kind name scope type init-form allocation usage ignore ignorable) 
     (children))


;;; This is an intermediate structure that extends the init form to 
;;; handle supplied-p
(defstruct (supplied-p-parameter (:include parameter-env (kind :supplied-p))
                                 (:print-object print-supplied-p-parameter))
  (value nil :read-only t :type boolean))
(parameter-printer supplied-p-parameter 
   (kind name scope type init-form allocation usage ignore ignorable value) 
   (children))


;;; REQUIRED - uses only the basic PARAMETER struct
(defstruct (required-parameter (:include parameter-env (kind :required)) 
                               (:print-object print-required-parameter)))
(parameter-printer required-parameter 
   (kind name scope type init-form allocation usage ignore ignorable) (children))


;;; OPTIONAL - adds the init-form and supplied-p slots
(defstruct (optional-parameter (:include supplied-p-parameter (kind :optional))))
(parameter-printer optional-parameter 
  (kind name scope type init-form allocation usage ignore ignorable) (children))


;;; REST or fake rest parameter made for handling &keys
(defstruct (rest-parameter (:include parameter-env (kind :rest))))
(parameter-printer rest-parameter 
  (kind name scope type init-form allocation usage ignore ignorable) (children))


;;; KEY - adds the init-form, supplied-p and keyword-name
(defstruct (key-parameter (:include supplied-p-parameter (kind :key)))
  (keyword-name nil :read-only t :type symbol))
(parameter-printer key-parameter 
  (kind name scope type init-form allocation usage ignore ignorable keyword-name)
  (children))


;;; ALLOW-OTHER-KEYS - this is a cypher found in the lambda list
(defstruct (allow-other-keys-parameter) )


;;; AUX - uses the init-form parameter
(defstruct (aux-parameter (:include parameter-env (kind :aux)) ))
(parameter-printer aux-parameter 
  (kind name scope type init-form allocation usage ignore ignorable) (children))


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;; STRUCTURES THAT MAKE UP THE ENVIRONMENT ;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


(defstruct (let-env (:include environment (kind :let)))
  ;; the code that makes this has to set the environment type to :LET
  (symbol-bindings nil :type list)) ; this is a list of symbol-binding
(parameter-printer let-env (kind name) (symbol-bindings children))


(defstruct (lambda-env (:include named-binding (kind :lambda))
                       (:print-object print-lambda-env))
  ;; the code that makes this has to set the environment type to :LAMBDA
  (parameter-bindings nil :type list))
(parameter-printer lambda-env (kind name) (parameter-bindings children))


(defstruct (macro-env (:include lambda-env (kind :macro))))
  ;; the code that makes this has to set the environment type to :MACRO


(defstruct (labels-env (:include lambda-env (kind :labels)))
  ;; the code that makes this has to set the environment type to :LABELS
  (fn-bindings nil :type list))
(parameter-printer lambda-env (kind name) (fn-bindings children))


(defstruct (flet-env (:include lambda-env (kind :flet)))
  ;; the code that makes this has to set the environment type to :FLET
  (fn-bindings nil :type list))
(parameter-printer lambda-env (kind name) (fn-bindings children))


(defstruct (block-env (:include named-binding (kind :block)) 
                      (:print-object print-block-env)) )
(parameter-printer block-env (kind name) (children))


(defstruct (tagbody-env (:include environment (kind :tagbody)) 
                        (:print-object print-tagbody-env))
  (tags nil :type list))
(parameter-printer tagbody-env (kind) (tags children))


(defstruct (tagbody-across-lambda (:include tagbody-env 
                                     (kind :tagbody-across-lambda))
                                  (:print-object print-tagbody-across-lambda))
  (boundary nil :type environment :read-only nil))
(parameter-printer tagbody-across-lambda (kind name boundary) (tags children))


#|
(defstruct (macrolet-env (:include lambda-env))
  ;; the code that makes this has to set the environment type to :LAMBDA
  TBD
)
|#


(defstruct (expression-info (:include environment (kind :expression))
                            (:print-object print-lambda-env)) 
  (type nil :type t))  ;; the declared type by the THE
(parameter-printer expression-info (kind type) (children))




-------------
;;; Here's a first stab about a macro for building the environment

(defmacro with-new-environment (kind (&rest args) &body body)
  (let ((the-function
           (case kind
             (:LAMBDA       'make-lambda-env) ; this has a chain of parameters
             (:LET          'make-let-env)    ; this has a list of variables at 
                                              ; the same level
             (:FLET         'make-flet-env)
             (:LABELS       'make-labels-env)
             (:REQUIRED     'make-required-env)
             (:OPTIONAL     'make-optional-env)
             (:REST         'make-rest-env)
             (:KEY          'make-key-env)
             (:AUX          'make-aux-env)
             (:BLOCK        'make-block-env)
             (:TAGBODY      'make-tagbody-env))))
             (:MACRO        'make-macro-env)
             (:MACROLET     'make-macrolet-env)
             (:SYMBOL-MACRO 'make-symbol-macro-env)
             (:LOCALLY      'make-locally-env)
             (:THE          'make-the-env)


  `(let ((*current-environment* 
            (,the-function :kind ,kind :parent *current-environment* ,@args)))
     (add-child-env-to-parent 
         *current-environment* (environment-parent *current-environment*))
     ,@body)))

-------------

Thursday, November 4, 2010

Taking stock

I went through the project to find the features and other code to get to Version 1. You can check with our Wiki at the  List of Items To Complete. We have a ways aways to get to go... Need more students... Any who wants to help out, let me know.

LABELS work!

Whew! This is making our old (Java) compiler more inscrutable, but it works. I can make FLET simply by making a minor tweak in the LABELS code. A bonus for doing all that work was that several other bugs disappeared. With the new DEFSTRUCT working well, I can get on with making the new compiler. Being able to make a Lisp compiler in Lisp is a small (or bigger?) milestone.

Sunday, October 24, 2010

LABELS do flow...

On the flight(s) from the ILC meeting (more on that later...), I made a big start on re-writing the code for LABELS (FLET is similar). The original code was just flat wrong (sigh). It would work as long that there were no lexical external bindings. I've redesigned that part of the environment system to properly connect LABELS-defined functions to the rest of the environment. This was a bit tricky due to the rather inelegant "design" of the current compiler. A more elegant environment is in the works, but... I have to put another patch over this gaping hole.

A "smart" environment would have parallel functions for creating lexical bindings (and mechanisms) for values and functions. But the current environment is incapable of working that way. So, here's the kludge (well, it only becomes a kludge if it works, otherwise it's a crock). Example:

(let ((x 42))
  ...
  (labels ((foo (a b) (foo (/ (+ a b) x) nil))
           (bar (x) (* (foo x 13))))
     (foo 1 2))
  ...)

I can't just turn it into a named function by changing the symbol-function slot of foo. Even if I had a mechanism to replace the original function foo, that would be a dynamic binding of foo's function slot. The CL spec requires that the binding is lexical. Making a "real" binding would be very difficult at this point.  But I can use a trick from lambda calculus: renaming.  I change the  form into:


(let ((x 42))
  ...
  (labels ((labels_foo_2883486 (a b) 
             (labels_foo_2883486 (/ (+ a b) x) nil))
           (labels_bar_47927743 (x) (labels_foo_2883486 x 13)))
     (labels_foo_2883486 1 2))
  ...)


The compiler runs through all of the uses of foo and substitutes them with the symbol labels_foo_2883486 d. This is risky because I might miss one. But it's a risk I'll take for now. Note that both foo and bar were renamed, as were all other their uses.

Having renamed all the uses of foo and bar, the compiler is free to treat the renamed foo and bar as top-level functions. Note in both cases, the variable x is lexically visible and correctly used.

Here's what the renamed version of foo and bar if they were built in the scope of an FLET.


(let ((x 42))
  ...
  (flet ((flet_foo_2883486 (a b) (foo (/ (+ a b) x) nil))
         (flets_bar_47927743 (x) (foo x 13)))
     (flet_foo_2883486 1 2))
  ..


Note that the scoping in FLET differs from LABELS.

This kludge works at least in these examples. Nested FLET and LABELS would be problematic.

Saturday, October 16, 2010

Other distractions...

On the good side, one of my students has finished the work on DEFSTRUCT. That's a very good milestone, since it's one of the hybrid features (built in a mix of Java and Lisp) and supports the features in both Java and Lisp. Excellent work there.

I've been fixing several problems with the current compiler. All of our test for lambda lists work correctly. On the other hand, the (function (function ...) ...) problems seem to boil down to an erratic binding problem in FLET and LABELS.  My goal was to have all of this fixed by the time of the conference. But alas, mid-term exams and grading have pushed Lisp into the back seat for now. Well, I have a couple of long flights...

Thursday, October 14, 2010

Oh for the old days...

My work on the (funcall (funcall... )..) was derailed by a problem in &key and &optional lambda lists. Not for all of them. Only the first argument of an &key or &optional (with no required) would drop out. After a couple of days of intense (and futile) debugging, one of my students found the problem while looking into another problem. The LENGTH function had a bug. Quick fix and we're back on track. Some bugs are best worked from another vantage.

Saturday, October 9, 2010

Oh 2 Big Bugs

Found 2 big (well, one bigger than the other...) bugs that will keep us from switching to the new compiler.

1. This one is a bit stupid.  RETURN-FROM doesn't work correctly with LABELS and FLET. For other reasons, the current compiler renames functions defined in LABELS and FLET. But I didn't let RETURN-FROM in on the change. Not hard to fix; just a bit of digging into this old code (holding nose... it's in the Java part...).

2. The other one is a bit harder. This involves nested FUNCALLs. Forms like:
  (funcall some (funcall another arg...) ..more args)
  (funcall (funcall a-fn ..args..) ..other-args...)
The symptoms point to the transformations of args to work in the correct function application, e.g. dealing with &optional or &key parameters. Getting this right will help the design/implementation of the new compiler.

More as I bore into the problems...

Thursday, October 7, 2010

Now we have a DEFSTRUCT

Cody, with his usual speed and critical eye for details, has finished the work on DEFSTRUCT. The major addition is the :include option. While we're stomping bugs that crept in (not many), this seems to be solid. As usual in CLforJava, this is a hybrid of Lisp and Java coding. For example, the instances themselves are instances of Java classes, where each slot is a Java field. There are Java methods to access and alter instance values via Java (as well, of course, via Lisp). General information about a struct class, for example the Lisp names of struct slots,  is stored in Java Annotations.

Structs with included components are consist across redefinitions. For example, a struct BAR that includes FOO such that the type is BAR and FOO. If FOO is redefined, a BAR instance will still be a FOO instance. New instances of BAR will use the redefined FOO struct. Both the new and old BAR instances are FOOs. All access functions will still work for both instances. It is up the user to take care of the use of accessors under redefinition. A FOO-A accessor that worked on the old FOO may not work with an instance of a redefined BAR that uses the changed FOO.

Tuesday, October 5, 2010

One Last Patch...

With regards to Compiler Birdie...

Ok, I swore that I'd leave the current compiler be (hoping it be trampled by the new Lisp-based compiler). But in looking (I shouldn't have) in how I handle bindings for special variables in LET, I found a way to stuff in lexical bindings for special parameters. It's not the right way, but it's good enough to let me compile the LOOP macro.

Here's I'm going to do. Let's say that I have a form (lambda (x) (declare (special x)) ..do stuff..). A function application of this lambda would look like:

(let ((#:g42 (some form))) 
  (%function-application the-lambda-reference #:g42)))

At this point, the compile ignores the 'special' declaration - because I didn't bother in the early days. The current compiler would turn this into a Java method call to the function instance with one argument. I propose to add one more transform in the current compiler. When it notices the 'special' declaration, it will the form into this:

(lambda (x) (let ((x x)) (declare (special x)) ..do stuff..)).

The current compiler does know how to deal with a LET binding for special variables. Easy. What's the catch? Not so easy. The elements in the parameter are not being evaluation at the wrong time. If the lambda list looked like:

(lambda (&optional (x 42) (y (- (incf x 42)))) 
  (declare (special x y)) 
  (+ x y))

The function application form would be:

(let ((#:g42 42) (#:g43 (incf x 42))) 
  (%function-application the-lambda-reference #:g42 #:g43)))

The new transform would make the lambda:


(lambda (&optional (x 42) (y (- (incf x 42)))) 
  (let ((x x) (y y)) 
    (declare (special x y)) 
    (+ x y)))


In this instance, x would not be bound until the LET form in the body. So, a quick hack. But not a long-term fix.

Sunday, October 3, 2010

What's left?

As of now, we have a working Lisp system, although it's missing some big pieces: CLOS, Conditions, and a full Type system (it doesn't handle subtypes). The new compiler will fix problems with we have with compiling Loop and the Pretty Printer. It will also cut down the amount of JVM code by at least a factor of 2 and make it run much faster than that. The new compiler is built around Structure Analysis algorithms. This enables of the good things a compiler should have: tail-calls, better closure handling, dead code elimination, loop optimization, etc.

CLOS is the key to access to Java libraries. The package system is extended to handle Java packages. Later the FIND-CLASS function will locate Java classes directly as will FIND-SYMBOL and Reader will locate Java fields and methods.

Other pieces are to bring CLforJava up to the current version of Unicode (it is at Unicode 4.0 now), better documentation, and a help system based on Java Help.

And lots and lots and lots of tests!

What's going now

This semester I have 3 students working on some intricate components.

  • A full implementation of DEFSTRUCT. This is a hybrid component of Lisp code and some interesting JVM code. The JVM code also uses Annotations in a rather novel way to link included structs.
  • A TRACE function using the Java Proxy system.
  • A useful GUI. This brings the issue of threading since the GUI is built in Swing. As we like to say: non-trivial.
For my part, I fixed a rather intricate set of bugs in our current compiler. Took from July to make it work, but it's amazing that worked that well. This was our original bootstrap compiler, never intended for heavy work. But we've patched, and patched, and patched. Now we're a the point where we can make a new, modern compiler in Lisp (work in process).

Welcome to CLforJava

Finally, we have a blog set up. It's only taken us 8 years, but some things take some amount of aging.

For those of you who don't know, CLforJava is a project to create a new implementation of Common Lisp from the ground up and using CS undergraduates at the College of Charleston. CLforJava differs from other CL implementations that run on the Java Virtual Machine in that it is intertwined with the Java language. It's major goals are that a CL user can access any Java library via CLOS with no Foreign Function Interface needed and a Java programmer can access CL as a Java library. Since it is being built in a CS department, it has the side-effect of teaching students how to work in a development environment that mimics that of modern software engineering processes.

CLforJava is an Open Source project and under the MIT license. For more information, go to http://clforjava.org/. There you can find more information about the project in the Wiki. There is also a discussion forum you are welcome to join in as well as read several reviewed papers.