Developing a Unit Test Framework (Part 2)

In Part 1 [1], we discussed the main features that make a good unit testing framework. This article documents the step-by-step design process and the key decisions made during the development of the unit testing framework.

I will only cover the overarching design goals. If you would like to see a more detailed example of a basic unit testing framework, I recommend working through the step-by-step practical in Peter Seibel’s book, “Practical Common Lisp” [2], which is available free of charge. Thanks Peter!

Round 1 – Friendly Syntax

The first step was to establish a friendly, easy-to-use syntax for writing tests. I believe that the names in any API should summarize what they do before you even look at the documentation.

I took a cue from the general syntax that was used in the other unit testing frameworks and settled on the following syntax:

(defsuite name (Suite1 Suite2 ... SuiteN))
  1. Create a test suite called NAME.
  2. Define the test suite as a sub-suite of the test suites Suite1 Suite2 ... SuiteN.
(deftest name (Suite1 Suite2 ... SuiteN) . body)
  1. Create a test case called NAME.
  2. Create a reference to the test case in the test suites Suite1 Suite2 ... SuiteN.
(run-suite name)
  1. Execute a test suite called NAME.
(run-test name)
  1. Execute a test case called NAME.

Round 2 - Hierarchy Implementation

After establishing the syntax, I needed to decide on how to implement a multiple inheritance unit test framework. To make the rest of the discussion easier, we will use the unit test example shown below.

The NumberSuite contains a set of test suites that are used to test integer and floating point operations.

The BooleanSuite contains tests that evaluate boolean operations such as < (less than). The actual values used in the boolean operations depend on the calling test suite. For example, if the BooleanSuite tests the operation (< x y), it is the responsibility of the calling suite to set values for x and y. That will be the job of fixtures [3], which we will discuss shortly.

My initial thought was to take advantage of the Common Lisp Object System (CLOS) [4], and define test suites as standard classes that inherit from their parents. The idea was to write a generic function called EXECUTE-SUITE which accepts one argument, the test suite object.

The next step would be to specialize methods of the generic function on the different classes (i.e. test suites). However, this does not work: an object of the IntegerSuite class can invoke the method applicable to its parent class, but there is no easy way for the parent class to invoke methods applicable to its subclasses short of first creating instances of those classes.
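The limitation can be seen in a small sketch. The class names, the *TRACE* variable and the method bodies below are assumptions made for illustration and are not taken from the framework; only the generic function name EXECUTE-SUITE comes from the text above.

```lisp
;; Hypothetical classes mirroring part of the example hierarchy.
(defclass number-suite () ())
(defclass integer-suite (number-suite) ())
(defclass float-suite (number-suite) ())

(defvar *trace* '()
  "Records which methods ran, most recent first.")

(defgeneric execute-suite (suite)
  (:documentation "Run the tests belonging to SUITE."))

(defmethod execute-suite ((suite number-suite))
  (push 'number-suite *trace*))

(defmethod execute-suite ((suite integer-suite))
  (push 'integer-suite *trace*)
  ;; A child can always reach its parent's method.
  (call-next-method))

(defmethod execute-suite ((suite float-suite))
  (push 'float-suite *trace*)
  (call-next-method))

;; Child to parent works: both methods run.
(execute-suite (make-instance 'integer-suite))
;; *trace* is now (NUMBER-SUITE INTEGER-SUITE)

;; Parent to child does not: only the NUMBER-SUITE method is applicable,
;; so the integer and float tests never run.
(setf *trace* '())
(execute-suite (make-instance 'number-suite))
;; *trace* is now (NUMBER-SUITE)
```

Running a parent suite should run everything beneath it, which is the opposite direction from the one method dispatch travels in.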

Therefore in the initial design, I settled on a TestSuite class with slots for test suites and test cases. The process of defining a test suite was then as follows:

(defsuite name (Suite1 Suite2 ... SuiteN))

  1. Create a TestSuite object.
  2. Store it in a global hash table under the key NAME.
  3. Set its child_suites slot to the names of its child suites, i.e. Suite1 Suite2 ... SuiteN.
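A minimal sketch of these three steps, taking step 3 at face value so that the listed suites are recorded as children. The names *TEST-SUITES*, TEST-SUITE, SUITE-NAME, CHILD-SUITES and TEST-CASES are assumptions (the slots are spelled Lisp-style rather than child_suites and test_cases), and the macro is an illustration of the initial design, not the framework's actual code.

```lisp
(defvar *test-suites* (make-hash-table :test #'eq)
  "Global table mapping a suite name to its TEST-SUITE object.")

(defclass test-suite ()
  ((name         :initarg :name :reader suite-name)
   (child-suites :initarg :child-suites :initform '() :accessor child-suites)
   (test-cases   :initform '() :accessor test-cases)))

(defmacro defsuite (name (&rest suites))
  "Create a TEST-SUITE, store it under NAME and record SUITES as its children."
  `(setf (gethash ',name *test-suites*)
         (make-instance 'test-suite :name ',name :child-suites ',suites)))

;; Example use with the suites from this article.
(defsuite IntegerSuite ())
(defsuite FloatSuite ())
(defsuite NumberSuite (IntegerSuite FloatSuite))
```

Storing names rather than objects in the slot means a suite can be redefined without touching the suites that refer to it; the lookup happens through the hash table at run time.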

Test cases were defined as specialized methods of the generic function EXECUTE-TEST. The generic function takes a single argument, a symbol, and dispatches to the applicable method based on the name of the test.

(deftest name (suite1 suite2 ... suiteN) . body)

  1. Define the method EXECUTE-TEST specialized on the symbol NAME.
  2. Retrieve the test suite objects called Suite1 … SuiteN from the global hash table.
  3. Add the reference of the test case NAME to the test_cases slot of each test suite object.
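
Dispatching on a symbol can be done in CLOS with an EQL specializer. The sketch below shows one way the three steps might look; *TEST-SUITES*, the TEST-SUITE class, the TEST-CASES accessor and the MathSuite example are assumptions for illustration, not the framework's actual code.

```lisp
(defvar *test-suites* (make-hash-table :test #'eq)
  "Global table mapping a suite name to its TEST-SUITE object.")

(defclass test-suite ()
  ((test-cases :initform '() :accessor test-cases)))

(defgeneric execute-test (name)
  (:documentation "Run the test case called NAME."))

(defmacro deftest (name (&rest suites) &body body)
  (let ((arg (gensym "TEST")))
    `(progn
       ;; Step 1: a method specialized on the symbol NAME itself.
       (defmethod execute-test ((,arg (eql ',name)))
         ,@body)
       ;; Steps 2 and 3: look up each suite and record the test name in it.
       (dolist (suite-name ',suites)
         (pushnew ',name (test-cases (gethash suite-name *test-suites*))))
       ',name)))

;; Example use: the suite must exist before a test refers to it.
(setf (gethash 'MathSuite *test-suites*) (make-instance 'test-suite))

(deftest test-add (MathSuite)
  (= (+ 1 2) 3))

(execute-test 'test-add) ; dispatches on the symbol TEST-ADD and returns T
```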

The final implementation uses classes for multiple inheritance, together with a fairly simple method for traversing up and down the directed acyclic graph of any given test suite hierarchy without storing a tree structure.

Round 3 – Fixture Implementation

The figure below shows the execution of a test suite. It shows the execution paths of a test suite with and without fixtures.


From the diagram, I could see that to define a fixture, I needed a way to define an execution context that a parent suite wraps around a test case or test suite. Since Common Lisp already provides macros as an easy way to wrap one piece of code around another, all I had to do was establish a user-friendly way of doing just that.

(deffixture suitename (@plug) . body)

  1. Define a fixture for the test suite called SUITENAME.
  2. At runtime when the test suite is called, wrap the code BODY around executed tests and suites.
  3. Place the child test or suite execution call at the position identified by @PLUG.
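
One way to picture the wrapping is as a substitution of the plug symbol followed by evaluation at run time. This is a sketch under stated assumptions, not the framework's implementation: *FIXTURES* and RUN-WITH-FIXTURE are hypothetical names, and the use of SUBST and EVAL is a simplification chosen for illustration.

```lisp
(defvar *fixtures* (make-hash-table :test #'eq)
  "Maps a suite name to a function that wraps a form in that suite's fixture.")

(defmacro deffixture (suite-name (plug) &body body)
  "Store a wrapper for SUITE-NAME that places a form wherever PLUG appears in BODY."
  `(setf (gethash ',suite-name *fixtures*)
         (lambda (form)
           ;; Replace every occurrence of the plug symbol with FORM.
           (subst form ',plug '(progn ,@body)))))

(defun run-with-fixture (suite-name form)
  "Evaluate FORM inside the fixture of SUITE-NAME, or on its own if there is none."
  (let ((wrap (gethash suite-name *fixtures*)))
    (eval (if wrap (funcall wrap form) form))))

;; Example use.
(deffixture IntegerSuite (@plug)
  (let ((x 0) (y 1))
    @plug))

;; Evaluates (progn (let ((x 0) (y 1)) (< x y))) and returns T.
(run-with-fixture 'IntegerSuite '(< x y))
```

Because the wrapped form is built and evaluated when the suite runs, the same test body can see different bindings depending on which suite's fixture surrounds it.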

To end the article I will show an example of how to use this feature.

;; Test Suite Definitions
(defsuite NumberSuite ())

(defsuite FloatSuite (NumberSuite))

(defsuite IntegerSuite (NumberSuite))

(defsuite BooleanSuite (FloatSuite IntegerSuite))

;; Fixture Definitions
(deffixture IntegerSuite (@body)
  (let ((x 0) (y 1) (z 2))
    @body))

(deffixture FloatSuite (@body)
  (let ((x 0.0) (y 1.0) (z 2.0))
    @body))

;; Test Case Definition
(deftest test-bool1 (BooleanSuite)
 (assert-true (< x y z)))


Depending on whether you execute TEST-BOOL1 from the IntegerSuite or from the FloatSuite, the values of x, y, and z will be either integers or floating point numbers.

This freshly “out of the oven” unit testing framework is now out and ready for testing. You can get it here [5] from GitHub.


  1. Developing a Unit Test Framework (Part 1)
  2. Chapter 9. Practical: Building a Unit Test Framework
  3. Test Fixture
  4. Common Lisp Object System (CLOS)
  5. CLUnit


  1. 1
    Faré on Saturday 10 November, 18:08 PM #

    If your tests and suites aren’t functions, please at least make them funcallable instances, with interactive use by default, and non-interactive use with proper parameters.

    I *really* appreciate not having to type a stupid “run-test” or “run-suite”.

    • 2
      Tapiwa on Monday 12 November, 14:52 PM #

      The stupid functions “run-test” and “run-suite” bind condition handlers while the test functions and assertion forms bind condition restarts.

      The assertion forms signal assertion conditions which are caught by the handlers. By taking advantage of this condition signalling the aforementioned stupid functions can aggregate results from different suites and invoke the restarts in these tests in order to continue with the unit test.

      If you were to run the tests directly or bind the handlers in each test suite/case you would lose the ability to aggregate results.

      • 3
        Patrick Stein on Monday 12 November, 20:16 PM #

        Certainly, if I want aggregated results, I’d expect to have to take some steps to either clear out the last batch of results that I made or to use the aggregation. But, if I just want to keep trying one test until I get code that works for it, it would be nice if I could just invoke that test. I’m not so tied to this though. C-p in the REPL is usually what I want… so having to type run-test one time isn’t killer for me.

        There was already a CLUnit package around. I like yours better, but would have preferred it had a new name.

        Love your documentation. Would have loved to see a bit about how to run and get the success/fail programmatically instead of printed. I was looking to see how easy it would be to use your test package in cl-test-grid, so I was looking at the current report-format functions. I noticed that in the assert-condition formatting function the same form is prefixed with #-clisp and on the following line prefixed with #+clisp.

        • 4
          Tapiwa on Tuesday 13 November, 08:02 AM #

          Fyi, the idea of being able to call test functions as normal functions has not been completely ruled out.

          One of the reasons I like “run-test”, and indeed most of the other forms in the API, is that anyone who looks at your test code will be able to easily differentiate between calls that belong to the unit test and those that do not.

          I tried hunting down an active development branch of CLUnit but I could not find one. So instead of selecting a name in my namespace I decided to take over that one.

          CLisp’s pretty printing is a bit broken so I had to treat it specially :(

          I am looking into further splitting assert-condition into assert-failed, assert-error and assert-passed. That will simplify the internals a bit and maybe I can write a programmatic example using the new conditions.

          As for rerunning failed tests, Chris Riesbeck the author of lisp-unit suggested the following to me:
          “The other thing I like about TestNG is that one option for running is “just run the tests that failed the last time.” This is good for making the turnaround to fix some bugs fast. Then when the failed tests are fixed, you do the normal run to make sure all tests pass. No extra work for the user, just a convenient extra function for quick checking if you solved the problem.”

  2. 5
    levy on Wednesday 14 November, 09:08 AM #

    You don’t need run-test/run-suite to be able to aggregate results. You could simply wrap the test functions with the necessary code that checks and optionally creates the dynamic environment. You can also include test functions in multiple test functions, so there’s no need for multiple inheritance. I don’t even know why people need class-like things for writing tests. Why don’t tests use plain old functions? BTW, the Stefil test library has these properties.

    • 6
      Tapiwa on Wednesday 14 November, 20:35 PM #

      If you have a few test functions, sure, it’s not a problem. But if you have a lot of them it quickly becomes unwieldy having to keep track of all the places you call the test functions and manually adding a call every time you add a new test function.

      If you are still not convinced see

      • 7
        levy on Thursday 15 November, 09:09 AM #

        (deftest foo (a b) …)
        (deftest bar (a) …)
        (deftest baz () …)

        (deftest foobarbaz (a)
          (foo a 42)
          (foo a 43)
          (bar a)
          (bar (+ a 1)))

        (deftest alternative-foobarbaz (b)
          (foo 1 b)
          (foo 2 b)
          (bar 1))

        You can call any of that to test the desired parts.

        How is this more difficult than writing the to be tested program in the first place?

        BTW, stefil has automatic test inclusion in test suites, and yes they are functions too with automatically generated bodies.

        So you can write:

        (in-suite foobarbaz)

        (deftest foo …)
        (deftest bar …)
        (deftest baz …)

        and call the (foobarbaz) function.

  3. 8
    Ralph Möritz on Thursday 15 November, 10:54 AM #

    Nice, this design seems both logical and intuitive to me.

  4. 9
    Max Mikhanosha on Thursday 15 November, 12:35 PM #

    I’m reasonably happy with Stefil, but as an experiment I converted one of my projects to 5am, and after fe[nl]ix fixed some bugs (specifically, added a named lambda around each test), it works just as well in practice, if not better.

    My few pet peeves with Stefil are:
    1. The “defining test for suite in diff package” warning is a bit annoying, and makes it difficult to have suites and tests that cross packages.

    2. There is some weird forever recursion case, also related to cross-package stuff, that is only triggered for me for SBCL with debug 3, compilation-speed 3, and it blows up stack with infinite recursion (apparently somehow a test suite inheritance chain ends up being circular)

    3. Maintainer moved on to hu.dwim project, and new Stefil from there pulls entire hu.dwim integrated stack with it, which is very tightly bound together (if you looked at pjb’s cesarum tree, imagine the same, but without any documentation)

    4. Some tricky stuff if you are debugging GC and finalization, stefil tests retain the arguments that were last passed to them, took me half a day of debugging to realize that it was not a bug I had in my c++ shared ptr lisp bridge.

    Some other disorganized thoughts from top of my head:

    Note it took me approximately 1 day of work to convert large test suite from Stefil to 5am.. It really would help if everyone generally followed STEFIL like syntax and methodology.

    One thing that is a showstopper for me, is case safety. One of my projects is using case-sensitive syntax (with :invert readtable), and I can’t use any libraries that are broken by this.. Generally if one uses alexandria:symbolicate for generating symbol names, you should be safe

    Would be nice to have a large pie chart, of which systems in quicklisp use which framework.. My wild guess would be 50% stefil, 20% 5am, 30% others

    What is wrong with 5am, and what does it not do compared to your framework?

    • 10
      Tapiwa on Thursday 15 November, 15:30 PM #

      My primary goal when I set to writing this framework was to pull in all the most useful features from other unit frameworks and add what was lacking but still keep things simple.

      1. You can define symbols in different packages without any issues as far as I can tell.

      2. All test suite inheritance is done via CLOS, so circularity is flagged there.

      3. Trying to keep things simple and well documented. Complexity only in places where we haven’t found an easier way.

      4. I think I would like a specific example on this, but in this framework all your test code is dynamically re-evaluated at runtime to pull in any new inline function and macro definitions.

      On 5AM, nothing is wrong with it, as I pointed out this project is about consolidation. All the interactive debugging from Stefil is in, the informative reporting from lisp-unit is in etc. There are obviously still things missing, but I think we have at least nailed down the major ones so far.

  5. 11
    Max Mikhanosha on Saturday 17 November, 11:13 AM #

    I’ll give it a go once you have a workable beta out. I’d rather fe[nl]ix concentrate on fixing iolib and cffi bugs than tinker around with 5am.
