Sunday, April 27, 2008

Getting structure from behaviour

Suppose you wrote the following (contrived) code snippet and tried to evaluate it.

(let ((point (make-instance 'point :xval 1 :yval 2)))
  (print (xval point))
  (print (yval point)))

Without any definition for the class point or function values associated with xval or yval, it will fail. But can you learn something useful from trying to run it? (Maybe even get it to print 1 2?)

Looking at the code, it's reasonable (but not necessarily correct!) to think that is says something like, "I want to make an instance of a class called point. Assume it has two slots called xval and yval because I tend to name the slots in accordance with the initialization arguments. And since I tend to associate accessor functions with symbols bearing the same name as the slots, the calls to xval and yval can be considered accessors."

There are an awful lot of assumptions being made here based on conventions, but in my experience, it's not really that far from the intent. So, what if the system could take advantage of these kinds of conventions to help build structural definitions based on the use of the code? That is something I find really interesting and is what I've been working on for the last little while (in addition to that whole teaching thing).

The goal I set out for myself was to build class definitions dynamically based on the use of the code. Essentially, I want to watch the execution of the code to make some reasonable guesses as to what the classes look like without having to define them ahead of time. It's a bit like programming by example, except I'm not interested in generalizing behaviour; instead, I want to derive structure from behaviour.

This presents a bit of a chicken-and-egg problem: How can you run code that relies on structure without having that structure? One answer is to make the structure as you go along. This means harvesting information from errors while taking advantage of convention and adjusting the definition of correctness.

The last point may be cringe-inducing, but is pragmatically necessary. Getting such code to be correct means defining what "correct" even means in this context. Usually, correct is taken to mean "produces the desired output". Here, the desired output is not what the program would produce given a suitable structure, but a suitable structure itself that may lead to the desired output.

Thus, we trust that what is being said as what is intended. Nothing is to be verified because we have nothing to verify against. All that can be done is to present a structure that in turn is verified by some external agent, such as the developer.

This is really a search through a very large space given some information to go on. The output produced can be put into the program text to help refine the program, but the idea of deriving structure in this manner should not be seen as a kind of operational semantics in which production code can be run on. There are just too many unknowns.

A reasonable question at this point is, "Why bother?" First and foremost, I think it's an interesting challenge and I'm curious to see how far it can be taken. More practically, if you could get reasonable structural definitions from somewhat complicated behavioural descriptions, it might be interesting to incorporate it with testing in some fashion, such as writing tests to produce programs (or significant parts of programs) that pass those tests.

This stems from my annoyance with having to write and amend structural definitions all the time during program development. (This is basically why I stopped using OCaml and its ilk in my day-to-day work.) When writing programs I tend to specify the behaviour first, then fill in the structure because the act of figuring out what I want the program to do helps me figure out what the data should look like.

Over the next few days, I'll be posting more about this, including how I've gotten it to work in a somewhat limited (although mildly interesting) context. It's just too much to put in one post. :)

8 comments:

Anonymous said...

DWIM redux!

Anonymous said...

Looks cool! I can imagine Emacs looking at my current top-level form, and prepending to it the definitions that it guessed.

malkia said...

Aren't all prototype-based OOP languages work in such way (kind of) - for example Lua or Javascript?

Pretty cool though

Anonymous said...

Looks promising, thanks.

Anonymous said...

Makes sense to me. It's similar to treating a variable as an integer because you just assigned an integer to it.

D Herring said...

So are you developing generic "duck-typed" classes that accumulate slots as they are accessed, and then query them at the end of a run for what was used? Or do you have other tricks up your sleeve...

Curious to learn more.

jtra said...

Hi. I have used something like that in my diploma thesis back in 2003/4. It was done in Common Lisp and part of it was recursive (Packrat-like) parser. The parsing grammar and semantic actions were specified in DSL which looked like this:


(matrixaccess (repeat1+ symlbrack expression symrbrack => $2))
(matrix symlbrace (optional (delimited (expression) (symcomma))) symrbrace
<new> matrix :value (when $2 (first $2)))
(number (optional symminus) (choice (doubleconst) (intconst))
<new> number :value (if $1 (- $2) $2))


The <new> meaning was: create CLOS class with name like RULECLASS/MATRIX or RULECLASS/NUMBER with slots like VALUE. And when parsing was taking place, the semantic meaning of <new> was to make instance of such class and fill slots with result of expressions. Dollar sign and number was reference to results of parse subtrees.

Anonymous said...

Interesting, typically when I code, I work in exactly the opposite way that you describe. I tend to specify structure first, skeleton classes and such, followed by some basic behavior. For many problems, the specification of types and their relationships helps me grasp a problem domain better. Maybe it has something to do with being left handed :P

Nevertheless, this is interesting stuff. Good luck.