Investigating slot access
My curiosity has a way of making me check some seemingly innocuous things. I've been thinking about implementations of maps and associative arrays. Besides hash tables and association lists, structures and classes can act as maps. This got me wondering about the speed of slot access, where a "slot" is defined as a mapping of one element to another. In a hash table or association list, this is the association of an element with a key. In structures and instances (where "instance" means a direct instance of a class defined with defclass), it's the usual definition of labelling a particular element that will be accessed frequently.
I wrote a test to see the difference between these different map implementations across multiple Lisp implementations. The code I used to test access times is given below. The access test consists of setting the slot to a random number, then reading the slot and detecting some trivial condition. I hypothesized that slot access in structure instances would be slightly faster than other instances due to the lack of generic function overhead. Indirect access to class instances using slot-value would probably be similar to hash tables, which would be considerably slower than direct access.
(defstruct struct-example f0) (defclass class-example () ((f0 :accessor class-example-f0))) (defmacro run-access-test (creator-form accessor-form times) (let* ((example (gensym "EXAMPLE-")) (aform (substitute example '* accessor-form))) `(let ((,example ,creator-form)) (format t "~%Testing ~S with ~S~%" (quote ,creator-form) (quote ,accessor-form)) (time (loop repeat ,times do (setf ,aform (random 100)) counting (= 0 (mod ,aform 13))))))) (defun run-tests (times) (run-access-test (make-struct-example) (struct-example-f0 *) times) (run-access-test (make-instance 'class-example) (class-example-f0 *) times) (let ((slot 'f0)) (run-access-test (make-instance 'class-example) (slot-value * slot) times)) (run-access-test (make-hash-table :test 'eq) (gethash 'f0 *) times))
I compiled the code with default compiler settings and executed (run-tests (expt 10 7)), recording the times, not including any system time or garbage collection. The results are given in the graph and table below. To my surprise, things didn't work in quite the fashion I was expecting. The difference between direct and indirect access was narrower than I thought and there was one case where my hypothesis did not hold up.
| Access type | SBCL | Allegro | LispWorks | CLISP |
|---|---|---|---|---|
| Hash tables | 3.131 | 2.560 | 2.884 | 7.880 |
| Indirect (using slot-value) | 3.079 | 2.019 | 2.984 | 7.541 |
| Direct (using :accessor) | 0.837 | 1.530 | 2.240 | 12.423 |
| Structure | 0.624 | 1.200 | 1.586 | 6.927 |
The first thing to notice is that CLISP has awful direct access times on my setup. I ran the test several times just to be sure the result wasn't an outlier — it wasn't. The difference between direct access in instances versus structures is similar across implementations, save for CLISP; each differ by a factor of around 1.3 or 1.4 (CLISP is about 1.8). However, the difference in indirect access versus direct access varies from a factor of 3.7 for SBCL to about 1.3 for both Allegro and LispWorks. Again, CLISP is the freak in that indirect access is 1.6 times faster than direct access.
Part of the reason I'm looking into this is that I'm looking at automatic generation of representations for maps. This simple test gives me an idea as to what I should be factoring into my decisions.

5 comments:
As with many/most/all of these things its quite easy to inadvertently benchmark more than you intend to.
I'd recommend replacing the call to
(random 100)
with
(load-time-value (random 100))
it changes the numbers substantially (especially for poor clisp which got battered in this benchmark)
Quite so. When benchmarking it is
often a good idea to first populate
test-vectors with your data.
Also: while
(SLOT-VALUE X 'SLOT-NAME)
is likely to be a lot faster then
(LET ((N 'SLOT-NAME))
(SLOT-VALUE X N))
since the first is easyish to optimize with a compiler-macro,
the latter _can_ be optimized just
the same -- and if there is an inline cache in play,
(SLOT-VALUE X Y)
can seem much faster then it really is if Y remains constant: a better test would use multiple slot names.
Slot access optimizations can also
be sensitive to your class hierarchy (including slot positions), and possible non-accessor methods on the accessor generics.
This was very interesting. I'd love to see you follow up on it as you have time, particularly taking into account Sean Ross' suggestion and dialing up the optimization settings.
Also, benchmarking for different uses case would be interesting especially since your test structure is really a degenerate case. What happens when you have many slots? And when you have many instances?
Don't forget about "freeze-type" - if you thusly finalise structure, the static-checking aspect of the CMUCL or SBCL compiler can make them faster again in some situations.
Post a Comment