Young or inexperienced scientists and engineers often struggle with designing experiments and with the accompanying data management. The result is time wasted on redoing experiments because of some oversight. The problem is compounded when thorough design is thrown out the window altogether. Tears soon follow.
Yet designing experiments and managing the resulting data can be relatively simple if the correct process is followed. I shall look at some of the typical problems encountered, and then suggest a methodology for designing experiments that helps to minimise these problems. Much of what I’ll discuss here is a combination of bits of information and tips that I’ve collected over time from many people (especially my supervisors, thank you!) and from articles posted all over the internet, whose titles and authors I have, sadly, long forgotten.
There are several problems that crop up when experiments are not properly designed and managed. A typical problem is the fragmentation of data over directories and devices, which makes revisiting the results difficult at best. Another problem is the ad hoc changes made to the programs or scripts that generated the data once the realisation hits that some information was omitted. These changes make it difficult to repeat the same experiment, and usually lead to even more ad hoc hacking. You can see the snowball effect. Often, by the time a project is finished, the metadata (information about the data, such as date and author) has been lost, and without the metadata the actual data is often useless to the author or any other interested party. In the worst case the data itself is lost.
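One simple way to keep metadata from drifting away from the data is to write it to a small "sidecar" file stored next to each data file. The sketch below is only an illustration of that idea, not a description of any particular system: the function name `write_sidecar`, the `.meta.json` suffix and the chosen fields are all my own assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_path, author, description):
    """Write a JSON metadata file next to data_path and return its path.

    Hypothetical helper: the field names and the ".meta.json" suffix are
    illustrative choices, not an established convention.
    """
    data_path = Path(data_path)
    metadata = {
        "file": data_path.name,
        "author": author,
        "description": description,
        # Time the metadata was recorded, in UTC.
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
        # A checksum lets a later reader confirm the data file is unchanged.
        "sha256": hashlib.sha256(data_path.read_bytes()).hexdigest(),
    }
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar
```

Calling `write_sidecar("run1.csv", "A. Student", "baseline run")` would create `run1.csv.meta.json` in the same directory, so the date, author and description travel with the data when it is copied or archived.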
To combat these problems I propose we follow a proper design methodology. The process is much like the process patterns used in software design. The figure below shows the typical process followed when performing experiments, which can be broken down into four phases.
The four phases can be summarised as follows:
- Hypothesis — In this phase we form our hypothesis, which is a proposed explanation for the problem we are investigating with our experiment. The experimental design problems we want to address occur after this phase is complete.
- Requirements analysis — In this phase we analyse and collect the requirements needed to test the hypothesis. Requirements include: the input data needed to perform your experiment, tools such as equipment and software used to perform the experiment, and the relevant output data needed to evaluate the hypothesis.
- Construction — In this phase the requirements are obtained or implemented, after which the experiment is set up.
- Execution and evaluation — In the last phase the experiment is performed with the parameters identified in the requirements analysis phase. The evaluation of the results then helps us to accept or refute our hypothesis. In the latter case the hypothesis is either completely rejected, or otherwise revised.
Now this all sounds like common sense to most people familiar with the scientific method. However, too often one or more of these phases are rushed, and even more often there is little or no documentation to fall back on afterwards. Thus, it always pays off to take a little time and plan these phases properly.
The first step towards alleviating many of the problems described above is to perform and document the requirements analysis phase thoroughly. The act of writing and reviewing the documentation uncovers requirements that have been misinterpreted or overlooked, and lets you check that your logic and methodology are sound. As a bonus, the workload at later stages of the project is reduced, since the documentation written whilst designing the experiment can be reused with minimal editing when you cover the experiment in your thesis, paper or report. Documenting the requirements can be daunting, or seem like a waste of time. However, it will likely save you time in the long run.
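To make this a little more concrete, here is a minimal sketch of how the three kinds of requirement mentioned earlier (input data, tools and output data) could be recorded alongside the hypothesis. The class `ExperimentRequirements` and its methods are hypothetical names I made up for illustration; a plain text file or a wiki page would serve just as well.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRequirements:
    """Hypothetical record of one experiment's requirements analysis."""

    hypothesis: str
    inputs: list = field(default_factory=list)   # input data needed
    tools: list = field(default_factory=list)    # equipment and software
    outputs: list = field(default_factory=list)  # data needed for evaluation

    def missing(self):
        """Return the requirement categories that are still empty."""
        categories = ("inputs", "tools", "outputs")
        return [name for name in categories if not getattr(self, name)]

    def to_text(self):
        """Render the record as plain text for reuse in a report."""
        lines = ["Hypothesis: " + self.hypothesis]
        for name in ("inputs", "tools", "outputs"):
            lines.append(name.capitalize() + ":")
            for item in getattr(self, name):
                lines.append("  - " + item)
        return "\n".join(lines)


# Example values below are invented purely for illustration.
req = ExperimentRequirements(
    hypothesis="Caching reduces mean response time",
    inputs=["request log from the test server"],
    tools=["load generator"],
)
print(req.missing())
print(req.to_text())
```

The `missing()` check is the useful part: reviewing the record before the construction phase shows at a glance which category has been overlooked, here the output data needed to evaluate the hypothesis.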
The SINAD group at the MIH Media Lab has now begun a project to formalise our data management process, in order to design our experiments better and produce better data sets. The experimental design methodology and the suggestions for documenting the requirements analysis that I introduced in this post are used as the basis of this project. The resulting data storage and documentation framework will then be used by group members as a digital library that combines all our data sets. Some documentation of digital library models can be found here and here. In a follow-up article I shall give an overview of the design of our system.