Skip to main content

New Programs

Below are the programs for the new eyeCode experiment (#2 on Mechanical Turk). Each program type has 2-3 versions, one of which a programmer will be randomly assigned.

You may try the experiment at


The basketball programs come from an eye-tracking experiment designed by Teresa Busjahn (Freie Universität Berlin, Germany). The program filters a list of heights and places those above 180 in a list called team. With minor changes, we transform the original recursive program to an iterative one.


In the iterative version, bball_sub is only called once. A while loop inside this method terminates once 5 heights have been examined.

From the programming education literature, we assume that recursion imposes an additional mental burden for novices (even though the recursive version is tail recursive). Thus, we expect novice programmers to take less time on this version.


The iterative_flipped version is identical to 'iterative' except that the bball_sub and t methods are defined in the opposite order. We expect to see a difference between the two versions because programmers will be introduced to aspects of the program in a different order (assuming they read from top to bottom). We expect this difference to modulate response times, but error rates could be affected as well.


In the recursive version, bball_sub is called 5 times (4 times by itself) to update team with the filtered heights. This version's output is identical to the iterative version.

We expect this version to take more time for novice programmers due to the additional mental burden imposed by recursion.


The between programs test the effect of inlined versus pulled-out functionality. Both versions perform the same task: filter and intersect two lists of integers.


In the functions version, the intersection (common) and filtering (between) operations have been pulled out into helper functions in a separate file (

We expect this version to take less time because the necessary abstraction has already been done (i.e., the operations have been identified and named).


In the inline version, the filtering and intersection operations have been inlined.

We expect this version to take more time because identifying the commonality between operations must be manually identified by the programmer.


The boolean programs test hypotheses about what makes boolean expressions more or less complex. Some research suggests that mixing operators (e.g., AND, OR) and using more parentheses is detrimental to readability.


The easy version contains boolean expressions with fewer mixed operators and parentheses. An intermediary variable is also used in the second expression.

We expect programmers to take less time with this version, and for novices to make fewer errors.


The hard version contains boolean expressions with with more mixed operators and parentheses.

We expect programmers to take more time, and make more errors in this version (especially novices).


The counting programs test the effect of whitespace on the grouping of loop statements. In the previous experiment, programmers were likely to assume that the final print statement was not part of the loop when there were two blank lines above it (counting_twospaces).

However, it was not clear if this effect was being magnified by the fact that the words "Done counting" were in the print statement. In this experiment, we retain the counting_twospaces version as counting_done and add a nearly identical version where the print statement prints something nonsensical.


The done version is identical to the counting_twospaces version of the previous experiment. In that experiment, programmers were at chance when guessing whether the final print statement belonged inside or outside the loop. We expect those results to be replicated here.


The other version differs from done only by a few letters in the final print statement. We expect no difference between this and the done version, but there is a chance that programmers may make fewer errors if the nonsense words encourage them to see the print statement as part of the loop.


The order programs test the effect of definition/use order on programmer efficiency. The previous experiment had a similar set of programs, but we did not see a significant difference between the two versions, presumably because the programs were too short. The new programs are longer and more complex, hopefully eliciting an effect.


The inorder version defines and uses the swim, jump, fly, and skip functions in the same order as they are called. We expect this version to take less time because of congruence between the visual and functional layout of the functions.


The shuffled version defines and uses the swim, jump, fly, and skip functions in different orders. We expect this version to take longer because programmers cannot make strong assumptions about the visual location of the functions based on their use.


The shuffled_colors version is identical to shuffled except that each whitespace-separated block has a different text color. We expect programmers to be faster in this version than the original shuffled because the colors will facilitate quicker visual search when locating methods. It's not known, however, whether the hypothesized speed increase will bring them up to the expected performance of inorder.


The nanotech programs test the effect of comments (good and bad) on program understanding.


The comments version contains comments for each part of the program. While the first two comments are helpful, the last one is incorrect (the molecule or formula name is printed, not the atom). We expect novices to produce more errors in this version because of a heavier reliance on the comments.


The nocomments version is identical to comments except that the comments are removed. We expect novice programmers to perform better on this version because they will not be led astray by the final comment.


The overload programs test the effect of having different semantic priming for the same operator (e.g., +) within a program. The previous experiment had three versions: (plusmixed) numeric and string +, (multimixed) numeric * and string +, and (strings) only string +.

While the strings version was found to take longer than the others, it was not possible to know if this was due to there simply being more characters in the output (full words were used). In this experiment, both versions of overload have the same number of characters in their output.


The numbers version primes the programmer with the numeric version of +, and then ends with a string concatenation. We expect programmers to make more errors in this version because they will be primed to think of + as being addition, and accidentally compute 5 + 3 = 8.


The words version only uses + for string concatenation. We expect programmers to make fewer errors in this version, and to correctly predict the final line as "53".


The rectangle programs test the effect of domain knowledge on expectations. In the previous experiment, programmers did very well on all versions of rectangle despite differences in representation. In this experiment, we introduce bugs into the code and see if long or short identifier names help in catching them.

The two bugs are:

  1. The area function mistakenly computes height as top - bottom
  2. The first call to area() flips the order of the top and bottom arguments


The long version uses left, right, top, and bottom for the names of the rectangle sides in congruence with the definition of area(). While long identifiers are negatively correlated with readability (Buse, 2010), we expect them to help in catching the two bugs. Thus, programmers should make fewer errors on this version.


The short version uses x1, x2, y1, and y2 for the names of the rectangle sides. Although the identifers are shorter, this is incongruent with the definition of area(). We expect programmers to make more errors in this version; specifically, we expect them to not catch the two bugs.


The scope programs violate one of Soloway's Rules of Discourse: don't include code that does nothing. All three versions contain two functions that do not (and cannot) modify their integer parameter.

In the previous experiment, participants were at chance when predicting whether the output was 4 or 22 regardless of experience.


In the justreturn version, the add_1 and twice functions do not "pretend" to update x; instead, they simply return a new value.

We expect this version to have the fewest errors because programmers will more readily see that the return values of each function call are not saved or manipulated.


The noreturn version updates x without returning a value (closely resembling the diffname version of scope from the previous experiment).

We expect programmers to make the most errors with this version because of (1) the strong expectation that included code should do something, and (2) without a return statement, add_1 and twice can never do anything.


The return version combines the noreturn and justreturn versions by modifying x and then returning its value.

We expect this version to produce similar results to justreturn, but it is possible that the modification of x will trigger an additional expectation that added has changed (despite the fact that Python is effectively pass-by-value for non-mutable types such as int).


The whitespace programs test the effect of horizontal and vertical whitespace on programmer performance. In the previous experiment, we tested whether having code aligned by operators impacted order-of-operations errors. We did not find a significant effect, prompting a more extreme investigation.


The normal version is spaced as most programmers would expect (though not aligned by operator). We expect this version to take the least time and have the fewest errors.


The nospace version is identical to normal except that all whitespace has been removed (obviously not in front of the print keywords, though). We expect this version to be the most difficult, taking more time and producing more errors than normal or nospace_highlight.


The nospace_highlight version is identical to nospace but includes some light syntax highlighting of keywords, numbers, and operators. We expect this version to be easier than nospace (i.e., less time and fewer errors), but it is unknown whether the highlighting will allow programmers to achieve the same performance as normal.