Below are the programs for the new eyeCode experiment (#2 on Mechanical Turk). Each program type has 2-3 versions, one of which is randomly assigned to each programmer.
You may try the experiment at experiment.synesthesiam.com.
The basketball programs come from an eye-tracking experiment designed by
Teresa Busjahn (Freie Universität Berlin, Germany). The program filters a list
of heights and places those above 180 in a list called
team. With minor
changes, we transform the original recursive program to an iterative one.
bball_sub is only called once. A loop
inside this method terminates once 5 heights have been examined.
From the programming education literature, we assume that recursion imposes an
additional mental burden for novices (even though the
recursive version is
tail recursive). Thus, we expect novice programmers to take less time on this version.
The iterative_flipped version is identical to 'iterative' except that the
methods are defined in the opposite order. We expect to see
a difference between the two versions because programmers will be introduced to
aspects of the program in a different order (assuming they read from top to
bottom). We expect this difference to modulate response times, but error rates
could be affected as well.
bball_sub is called 5 times (4 times by itself)
team with the filtered heights. This version's output is identical
We expect this version to take more time for novice programmers due to the additional mental burden imposed by recursion.
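The contrast between the iterative and recursive versions can be sketched as follows. This is a minimal illustration, not the experiment's actual code: the function names, the sample heights, and the exact recursion structure are assumptions; only the threshold of 180 and the list name team come from the description above.

```python
# Hypothetical sample data; the experiment's real heights are not listed here.
HEIGHTS = [170, 185, 178, 192, 181]
THRESHOLD = 180  # heights above this value are placed in the team


def filter_iterative(heights):
    """Iterative sketch: one call, one loop over all heights."""
    team = []
    for height in heights:
        if height > THRESHOLD:
            team.append(height)
    return team


def filter_recursive(heights, index=0, team=None):
    """Tail-recursive sketch: each call examines one height, then recurses."""
    if team is None:
        team = []
    if index == len(heights):
        return team
    if heights[index] > THRESHOLD:
        team.append(heights[index])
    return filter_recursive(heights, index + 1, team)


print(filter_iterative(HEIGHTS))  # [185, 192, 181]
print(filter_recursive(HEIGHTS))  # [185, 192, 181]
```

Both functions produce the same list; only the control flow a reader has to trace differs.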
The between programs test the effect of inlined versus pulled-out
functionality. Both versions perform the same task: filter and intersect two
lists of integers.
In the functions version, the intersection (common) and filtering (between)
operations have been pulled out into helper functions in a separate
We expect this version to take less time because the necessary abstraction has already been done (i.e., the operations have been identified and named).
In the inline version, the filtering and intersection operations have been
We expect this version to take more time because the programmer must manually identify the commonality between operations.
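To make the inline versus pulled-out distinction concrete, here is a small sketch. The helper names between and common are taken from the description above, but their signatures, the bounds, and the sample lists are assumptions made for illustration.

```python
def between(numbers, low, high):
    """Return the numbers strictly between low and high (assumed semantics)."""
    return [n for n in numbers if low < n < high]


def common(first, second):
    """Return the items of first that also appear in second."""
    return [n for n in first if n in second]


# Hypothetical input lists.
x = [2, 8, 7, 9, -5, 0, 2]
y = [1, -3, 10, 0, 8, 9, 1]

# "functions" style: the operations are already named.
x_between = between(x, 2, 10)
xy_common = common(x, y)

# "inline" style: the same work, with the loops written out in place.
x_between_inline = []
for n in x:
    if 2 < n < 10:
        x_between_inline.append(n)

xy_common_inline = []
for n in x:
    if n in y:
        xy_common_inline.append(n)

print(x_between, x_between_inline)  # [8, 7, 9] [8, 7, 9]
print(xy_common, xy_common_inline)  # [8, 9, 0] [8, 9, 0]
```

In the inline style the reader has to notice that the two loops are a filter and an intersection; in the functions style the names carry that information.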
The boolean programs test hypotheses about what makes boolean expressions
more or less complex. Some research suggests that mixing operators (e.g., AND,
OR) and using more parentheses is detrimental to readability.
The easy version contains boolean expressions with fewer mixed operators and
parentheses. An intermediary variable is also used in the second expression.
We expect programmers to take less time with this version, and for novices to make fewer errors.
The hard version contains boolean expressions with more mixed operators
We expect programmers, especially novices, to take more time and make more errors in this version.
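The kind of difference being tested might look like the following sketch. The expressions are hypothetical, not the ones used in the experiment; they only illustrate how an intermediate variable and fewer nested parentheses can express the same logic.

```python
from itertools import product


def hard(a, b, c):
    """'Hard' style: mixed operators and nested parentheses in one expression."""
    return (a and (b or c)) and (not (b and c))


def easy(a, b, c):
    """'Easy' style: intermediate variables name the sub-results."""
    either = b or c
    both = b and c
    return a and either and not both


# The two styles agree on every combination of inputs.
for values in product([False, True], repeat=3):
    assert hard(*values) == easy(*values)

print(hard(True, False, True), easy(True, False, True))  # True True
```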
The counting programs test the effect of whitespace on the grouping of loop
statements. In the previous experiment, programmers were likely to assume that
However, it was not clear if this effect was being magnified by the fact that
the words "Done counting" were in the
counting_twospaces version as
counting_done and add a nearly
identical version where the
The done version is identical to the
counting_twospaces version of the
previous experiment. In that experiment, programmers were at chance when
guessing whether the final
The other version differs from
done only by a few letters in the final
but there is a chance that programmers may make fewer errors if the nonsense
words encourage them to see the
The order programs test the effect of definition/use order on programmer
efficiency. The previous experiment had a similar set of programs, but we did
not see a significant difference between the two versions, presumably because
the programs were too short. The new programs are longer and more complex,
which we hope will elicit an effect.
The inorder version defines and uses the
functions in the same order as they are called. We expect this version to take
less time because of congruence between the visual and functional layout of the
The shuffled version defines and uses the
functions in different orders. We expect this version to take longer because
programmers cannot make strong assumptions about the visual location of the
functions based on their use.
The shuffled_colors version is identical to
shuffled except that each
whitespace-separated block has a different text color. We expect programmers to
be faster in this version than the original
shuffled because the colors will
facilitate quicker visual search when locating methods. It's not known,
however, whether the hypothesized speed increase will bring them up to the
expected performance of
The nanotech programs test the effect of comments (good and bad) on program
The comments version contains comments for each part of the program. While
the first two comments are helpful, the last one is incorrect (the molecule or
formula name is printed, not the atom). We expect novices to produce more
errors in this version because of a heavier reliance on the comments.
The nocomments version is identical to
comments except that the comments
are removed. We expect novice programmers to perform better on this version
because they will not be led astray by the final comment.
The overload programs test the effect of having different semantic priming
for the same operator (e.g.,
+) within a program. The previous experiment had
three versions: (
plusmixed) numeric and string
* and string
+, and (
strings) only string
Although the strings version was found to take longer than the others, it was
not possible to know if this was due to there simply being more characters in
the output (full words were used). In this experiment, both versions of
overload have the same number of characters in their output.
The numbers version primes the programmer with the numeric version of +
and then ends with a string concatenation. We expect programmers to make more
errors in this version because they will be primed to think of
+ as being
addition, and accidentally compute 5 + 3 = 8.
The words version only uses
+ for string concatenation. We expect
programmers to make fewer errors in this version, and to correctly predict the
final line as "53".
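The two readings of + that this hypothesis relies on can be shown in a couple of lines. This is only an illustration of Python's operator behavior, not the experiment's program; the variable names are made up.

```python
# With integer operands, + is addition.
numeric_sum = 5 + 3

# With string operands, + is concatenation.
concatenated = "5" + "3"

print(numeric_sum)   # 8
print(concatenated)  # 53
```

A programmer primed by numeric uses of + may read the final string expression as addition and predict 8 instead of 53.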
The rectangle programs test the effect of domain knowledge on expectations.
In the previous experiment, programmers did very well on all versions of
rectangle despite differences in representation. In this experiment, we
introduce bugs into the code and see if long or short identifier names help in
The two bugs are:
- The area function mistakenly computes top - bottom
- The first call to area() flips the order of the
The long version uses
bottom for the names of
the rectangle sides in congruence with the definition of
area(). While long
identifiers are negatively correlated with readability (Buse, 2010), we expect
them to help in catching the two bugs. Thus, programmers should make fewer
errors on this version.
The short version uses
y2 for the names of the
rectangle sides. Although the identifiers are shorter, this is incongruent with
the definition of
area(). We expect programmers to make more errors in this
version; specifically, we expect them to not catch the two bugs.
The scope programs violate one of Soloway's Rules of Discourse: don't include
code that does nothing. All three versions contain two functions that do not
(and cannot) modify their integer parameter.
In the previous experiment, participants were at chance when predicting whether the output was 4 or 22 regardless of experience.
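The two candidate outputs can be reproduced with a short sketch. Only twice, the starting value 4, and the outputs 4 and 22 come from the text; add_1, the call sequence, and the wrapper functions are assumptions chosen so that the arithmetic works out to 22.

```python
def add_1(x):
    x = x + 1  # rebinds the local name only; the caller's value is untouched


def twice(x):
    x = x * 2  # same: no effect outside this function


def add_1_returning(x):
    return x + 1


def twice_returning(x):
    return x * 2


def run_without_saving():
    """Calls that cannot change the caller's integer."""
    value = 4
    add_1(value)
    twice(value)
    add_1(value)
    twice(value)
    return value


def run_with_saving():
    """What a reader may wrongly imagine: each result is stored back."""
    value = 4
    value = add_1_returning(value)
    value = twice_returning(value)
    value = add_1_returning(value)
    value = twice_returning(value)
    return value


print(run_without_saving())  # 4
print(run_with_saving())     # 22
```

Predicting 22 amounts to assuming that the function calls have an effect on the variable, which is exactly the expectation these programs are designed to probe.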
In the justreturn version, the
twice functions do not "pretend"
x; instead, they simply return a new value.
We expect this version to have the fewest errors because programmers will more readily see that the return values of each function call are not saved or manipulated.
The noreturn version updates
x without returning a value (closely
diffname version of
scope from the previous experiment).
We expect programmers to make the most errors with this version because of (1)
the strong expectation that included code should do something, and (2)
without a return statement,
twice can never do anything.
The return version combines the
justreturn versions by
x and then returning its value.
We expect this version to produce similar results to
justreturn, but it is
possible that the modification of
x will trigger an additional expectation
added has changed (despite the fact that Python is effectively
pass-by-value for non-mutable types such as
The whitespace programs test the effect of horizontal and vertical whitespace
on programmer performance. In the previous experiment, we tested whether having
code aligned by operators impacted order-of-operations errors. We did not find
a significant effect, prompting a more extreme investigation.
The normal version is spaced as most programmers would expect (though not
aligned by operator). We expect this version to take the least time and have
the fewest errors.
The nospace version is identical to
normal except that all whitespace has
been removed (obviously not in front of the
The nospace_highlight version is identical to
nospace but includes some
light syntax highlighting of keywords, numbers, and operators. We expect this
version to be easier than
nospace (i.e., less time and fewer errors), but it
is unknown whether the highlighting will allow programmers to achieve the same