%run prelude.ipy

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

import statsmodels.formula.api as sm
from statsmodels import graphics
from statsmodels.api import families
from patsy import Treatment

def program_names(df):
    """Returns a list of program_version names from a pandas DataFrame."""
    return list(df.apply(lambda r: "{0}_{1}".format(r["base"], r["version"]), axis=1).values)

def names_distance(df1, df2):
    """Returns the normalized edit distance between program_version names in the two pandas DataFrames."""
    from nltk.metrics import edit_distance
    names1 = program_names(df1)
    names2 = program_names(df2)
    idxs1 = np.arange(len(names1))
    idxs2 = np.array([names1.index(n) for n in names2])
    return edit_distance(idxs1, idxs2) / float(len(idxs1))

def round_fmt(x):
    return str(np.round(x, 2))
    
trials["norm_duration_ms"] = trials.duration_ms / trials.code_lines
trials["response_ge_half"] = trials.response_proportion >= 0.5
trials["keystroke_ge_one"] = trials.keystroke_coefficient >= 1

duration_meds = {}
for base, version in programs[["base", "version"]].values:
    ts = util.filter_program(trials, base, version)
    duration_meds[(base, version)] = ts.duration_ms.median()
    
duration_all_med = trials.duration_ms.median()
#trials["duration_ge_med"] = trials.apply(lambda t: t["duration_ms"] >= duration_meds[(t["base"], t["version"])], axis=1)
trials["duration_ge_med"] = trials.duration_ms >= duration_all_med

complexity_cols = ["code_lines", "cyclo_comp", "hal_effort", "hal_volume"]
performance_cols = ["grade_value", "duration_ms", "norm_duration_ms", "keystroke_coefficient","response_proportion"]
demo_cols = ["age", "py_years", "prog_years", "degree_num", "cs_major_num"]

Programs and their Complexity Metrics

Let's start off by looking at metrics for each of the programs. The following metrics are shown below:

code_chars - number of characters in the code
code_lines - number of lines in the program (includes blank lines)
cyclo_comp - McCabe's Cyclomatic Complexity (computed with PyMetrics 0.8.1)
hal_effort - Halstead's Effort (computed with PyMetrics 0.8.1)
hal_volume - Halstead's Volume (computed with PyMetrics 0.8.1)
output_chars - number of characters in the correct output
output_lines - number of lines in the correct output

show_dataframe(programs)

Of the 7 metrics listed above, only 4 commonly used as measures of complexity: lines of code, Cyclomatic Complexity, Halstead Effort, and Halstead Volume. We will focus on these 4 metrics for the rest of the analysis.

Correlations Between Metrics

Although the 4 complexity metrics measure different aspects of a program's text, they are often found to be highly correlated in large repositories of program code. Let's see how correlated they are in our 25 programs.

fmts = { c: round_fmt for c in complexity_cols }
print programs[complexity_cols].corr("pearson").to_latex(formatters=fmts)

\begin{tabular}{lrrrr}
\toprule
{} & code\_lines & cyclo\_comp & hal\_effort & hal\_volume \\
\midrule
code\_lines &        1.0 &       0.46 &       0.78 &       0.86 \\
cyclo\_comp &       0.46 &        1.0 &       0.72 &       0.77 \\
hal\_effort &       0.78 &       0.72 &        1.0 &       0.94 \\
hal\_volume &       0.86 &       0.77 &       0.94 &        1.0 \\
\bottomrule
\end{tabular}

We can see strong correlations between the Halstead metrics and everything else. There is also a medium correlation between lines of code and Cyclomatic Complexity (usually higher in larger code bases). Because the two Halstead metrics are so highly correlated with each other, we will only consider Halstead Effort moving forward.

Visualizing the Complexity Space

With 3 complexity metrics remaining, we can visualize our programs as points in a 3D complexity space. Below is a series of plots from different angles of this space.

xs, ys, zs = programs.code_lines, programs.cyclo_comp, programs.hal_effort / 1000.0
def plot_metrics(ax):
    ax.set_xlabel("Lines of Code")
    ax.set_ylabel("Cyclomatic Complexity")
    ax.set_zlabel("Halstead Effort")
    ax.scatter3D(xs, ys, zs)
    
fig = plot.misc.plot3d_views(plot_metrics, figsize=(20, 10))
fig.tight_layout()
fig.suptitle("Code Metrics Scatter Plot (all programs)")
fig

From the plots, we can see that our programs are fairly spread out along the lines of code and Cyclomatic Complexity axes. Only a few have high Halstead Effort, but this is not surprising given only a few have many lines of code (and these two metrics are strongly correlated).

Predicting Performance

If our 3 complexity metrics (lines of code, Cyclomatic Complexity, and Halstead Effort) are truly measuring the cognitive complexity of their respective programs, then we should be able to predict participants' performance using them.

For example, lines of code is often used as a proxy for cognitive complexity by assuming that human short-term memory is limited. More lines of code places a cognitive burden on the reader and, therefore, longer programs should be harder to understand.

We will attempt to use each complexity metric to predict performance in 4 different categories:

Grade
- A grade of 7 or higher (out of 10) is correct
- More complex programs should result in a lower grade
Trial duration
- Time from start to finish (reading + responding)
- More complex programs should take longer to read and respond to (higher duration)
Keystroke coefficient
- Number of actual keystrokes / required keystrokes
- More complex programs should require more keystrokes due to mistakes/corrections (higher coefficient)
Response proportion
- Time spent responding / trial duration
- More complex programs should require more reading time up front (higher proportion)

To predict performance, we will order the list of programs using the complexity metric and performance metric. The normalized edit distance between these two orderings will low (close to 0) if the orderings are very close and high (close to 1.0) otherwise.

# Compute median trial duration by program (all trials)
med_duration_all = pandas.DataFrame(
    trials.groupby(["base", "version"])\
    .apply(lambda f: (f["duration_ms"].median() / 1000.0) / f["code_lines"].mean())
    .order(ascending=False), columns=["sec_per_line"])\
    .reset_index()
    
# Compute median trial duration by program (correct trials only)    
med_duration_correct = pandas.DataFrame(
    trials[trials.grade_correct].groupby(["base", "version"])\
    .apply(lambda f: (f["duration_ms"].median() / 1000.0) / f["code_lines"].mean())
    .order(ascending=False), columns=["sec_per_line"])\
    .reset_index()
    
# Compute median keystroke coefficient by program
med_keystroke_coeff = pandas.DataFrame(
    trials.groupby(["base", "version"])\
    .apply(lambda f: f["keystroke_coefficient"].median())
    .order(ascending=False), columns=["key_coeff"])\
    .reset_index()
        
# Compute median response proportion by program
med_response_prop = pandas.DataFrame(
    trials.groupby(["base", "version"])\
    .apply(lambda f: f["response_proportion"].median())
    .order(ascending=True), columns=["resp_prop"])\
    .reset_index()

By Lines of Code

Grade

trials[["code_lines", "grade_value"]].corr()

Trial Duration

trials[["code_lines", "duration_ms"]].corr()

trials[["code_lines", "norm_duration_ms"]].corr()

trials[trials.grade_correct][["code_lines", "norm_duration_ms"]].corr()

Keystroke Coefficient

trials[["code_lines", "keystroke_coefficient"]].corr()

Response Proportion

trials[["code_lines", "response_proportion"]].corr()

By All Metrics (Regressors)

rows = []

for comp_col in complexity_cols:
    for perf_col in performance_cols:    
        for corr_alg in ["pearson", "spearman"]:
            for correct_only in [False, True]:
                ts = trials[trials.grade_correct] if correct_only else trials
                result = ts[[comp_col, perf_col]].corr(corr_alg).values[0, 1]
                rows.append([comp_col, perf_col, corr_alg, "correct" if correct_only else "all", result])
                
corr_df = pandas.DataFrame(rows, columns=["Complexity Metric", "Performance Metric",
                           "Algorithm", "Trial Filter", "Correlation"])

# Sort descending by absolute value
corr_df = corr_df.reindex(corr_df["Correlation"].abs().order(ascending=False).index)
show_dataframe(corr_df)

rows = []

for comp_col in complexity_cols:
    for perf_col in [c for c in performance_cols if c != "norm_duration_ms"]:    
        result = trials[[comp_col, perf_col]].corr().values[0, 1]
        rows.append([comp_col, perf_col, result])
                
corr_df = pandas.DataFrame(rows, columns=["Complexity Metric", "Performance Metric", "Correlation"])

# Sort descending by absolute value
#corr_df = corr_df.reindex(corr_df["Correlation"].abs().order(ascending=False).index)
#show_dataframe(corr_df)
fmts = [round_fmt] * (len(performance_cols) - 1)
print corr_df.pivot("Complexity Metric", "Performance Metric", "Correlation").to_latex(formatters=fmts)

\begin{tabular}{lrrrr}
\toprule
Performance Metric & duration\_ms & grade\_value & keystroke\_coefficient & response\_proportion \\
\midrule
Complexity Metric &             &             &                       &                     \\
code\_lines        &        0.44 &        -0.1 &                  0.03 &                0.03 \\
cyclo\_comp        &        0.34 &       -0.24 &                 -0.03 &                  -0 \\
hal\_effort        &        0.46 &       -0.21 &                 -0.07 &                0.08 \\
hal\_volume        &        0.48 &       -0.17 &                 -0.04 &                0.03 \\
\bottomrule
\end{tabular}

from eyecode import classify
cp_cols = performance_cols #["duration_ms", "response_proportion"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance
    fi_df = classify.feature_importances(trials, complexity_cols, perf_col, regressor=True)
    ax = plot.misc.feature_importances(fi_df, ax=axes[i, 0])
    ax.set_title("Feature Importances ({0})".format(perf_col))

    # Cross-validation
    cv_df = classify.cross_validation(trials, complexity_cols, perf_col, regressor=True)
    ax = plot.misc.cross_validation(cv_df, ax=axes[i, 1])
    ax.set_title("$R^2$ ({0}, CV=10)".format(perf_col))
    print perf_col, classify.cross_val_performance(cv_df).mean()
    
fig.tight_layout()
fig

grade_value 0.0801560340656
duration_ms 0.185165864585
norm_duration_ms 0.144450190949
keystroke_coefficient 0.0542231991986
response_proportion 0.237702229863

By All Metrics (Classifiers)

from eyecode import classify

importances = classify.feature_importances(trials, complexity_cols, "grade_correct")
cross_val = classify.cross_validation(trials, complexity_cols, "grade_correct", repeat=10)

axes = plot.misc.importances_and_crossval(importances, cross_val, "grade_correct",
                                          figsize=(12, 5), repeat=10)
axes[0].figure.tight_layout()
axes[0].figure

importances = classify.feature_importances(trials, complexity_cols + demo_cols, "grade_correct")
cross_val = classify.cross_validation(trials, complexity_cols + demo_cols, "grade_correct", repeat=10)

axes = plot.misc.importances_and_crossval(importances, cross_val, "grade_correct",
                                          figsize=(12, 5), repeat=10)
axes[0].figure.tight_layout()
axes[0].figure

stats.wilcox_test(cross_val[cross_val.classifier == "dummy"].score, classify.cross_val_performance(cross_val))

1.6437797589705496e-48

scores = classify.one_at_a_time(trials, complexity_cols + demo_cols, "grade_correct", norm=True)
print zip(complexity_cols + demo_cols, scores)

[('code_lines', 1.0), ('cyclo_comp', 0.97194176133322308), ('hal_effort', 0.9983173390954958), ('hal_volume', 0.98232860410350886), ('age', 0.84522499606593615), ('py_years', 0.78965941806827289), ('prog_years', 0.79526529224529796), ('degree_num', 0.78376720938573352), ('cs_major_num', 0.75866836611437904)]

cp_cols = ["grade_value", "duration_ms", "response_proportion", "keystroke_coefficient"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance and cross validation
    fi_df = classify.feature_importances(trials, complexity_cols, perf_col, regressor=True)
    cv_df = classify.cross_validation(trials, complexity_cols, perf_col, repeat=10, regressor=True)
    print perf_col, classify.cross_val_performance(cv_df).mean()
    
    # Plot both
    plot.misc.importances_and_crossval(fi_df, cv_df, perf_col,
                                       axes=[axes[i, 0], axes[i, 1]],
                                       repeat=10, regressor=True)
    
fig.tight_layout()
fig

grade_value 0.0801560340656
duration_ms 0.185165864585
response_proportion 0.237702229863
keystroke_coefficient 0.0542231991986

cp_cols = ["grade_value", "duration_ms", "response_proportion", "keystroke_coefficient"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance and cross validation
    fi_df = classify.feature_importances(trials, demo_cols, perf_col, regressor=True)
    cv_df = classify.cross_validation(trials, demo_cols, perf_col, repeat=10, regressor=True)
    print perf_col, classify.cross_val_performance(cv_df).mean()
    
    # Plot both
    plot.misc.importances_and_crossval(fi_df, cv_df, perf_col,
                                       axes=[axes[i, 0], axes[i, 1]],
                                       repeat=10, regressor=True)
    
fig.tight_layout()
fig

grade_value -0.139943427395
duration_ms -0.122278190501
response_proportion -0.154682920042
keystroke_coefficient -0.0925546543477

from eyecode import classify
cp_cols = ["grade_correct", "duration_ge_med", "response_ge_half", "keystroke_ge_one"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance and cross validation
    fi_df = classify.feature_importances(trials, complexity_cols, perf_col)
    cv_df = classify.cross_validation(trials, complexity_cols, perf_col, repeat=10)
    print perf_col, classify.cross_val_performance(cv_df).mean()
    
    # Plot both
    plot.misc.importances_and_crossval(fi_df, cv_df, perf_col,
                                       axes=[axes[i, 0], axes[i, 1]],
                                       repeat=10)
    
fig.tight_layout()
fig

grade_correct 0.708500719428
duration_ge_med 0.763862149398
response_ge_half 0.731754215876
keystroke_ge_one 0.843033038998

cp_cols = ["grade_correct", "duration_ge_med", "response_ge_half", "keystroke_ge_one"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance and cross validation
    fi_df = classify.feature_importances(trials, demo_cols, perf_col)
    cv_df = classify.cross_validation(trials, demo_cols, perf_col, repeat=10)
    
    # Plot both
    plot.misc.importances_and_crossval(fi_df, cv_df, perf_col,
                                       axes=[axes[i, 0], axes[i, 1]],
                                       repeat=10)
    
fig.tight_layout()
fig

Linear Models

complexity_cols = ["code_lines", "cyclo_comp", "hal_volume", "hal_effort"]
performance_cols = ["grade_value", "duration_sec", "keystroke_coefficient","response_proportion"]
demo_cols = ["age", "py_years", "prog_years"]

fits = {}
for perf_col in performance_cols:
    m_comp = sm.ols(formula="{0} ~ {1}".format(perf_col, "+".join(complexity_cols)), data=trials)
    m_demo = sm.ols(formula="{0} ~ {1}".format(perf_col, "+".join(demo_cols)), data=trials)
    m_comp_demo = sm.ols(formula="{0} ~ {1}".format(perf_col, "+".join(complexity_cols + demo_cols)), data=trials)
    for model, kind in zip([m_comp, m_demo, m_comp_demo], ["comp", "demo", "comp_demo"]):
        fit = model.fit()
        fits[(perf_col, kind)] = fit

sorted([(k, v.rsquared_adj) for k, v in fits.iteritems()], key=lambda x: x[1], reverse=True)

[(('duration_sec', 'comp_demo'), 0.24410930143517773),
 (('duration_sec', 'comp'), 0.23095969293024032),
 (('grade_value', 'comp_demo'), 0.10196661960638331),
 (('grade_value', 'comp'), 0.09598149152467228),
 (('keystroke_coefficient', 'comp_demo'), 0.028032441108442496),
 (('keystroke_coefficient', 'comp'), 0.024773421650415406),
 (('response_proportion', 'comp_demo'), 0.022982721295098596),
 (('response_proportion', 'comp'), 0.022889770128601494),
 (('duration_sec', 'demo'), 0.011708375030034635),
 (('grade_value', 'demo'), 0.0053168475861895548),
 (('keystroke_coefficient', 'demo'), 0.0030743177137255717),
 (('response_proportion', 'demo'), -0.00040819671687919445)]

ax = plot.misc.fit_coefficients(fits[("duration_sec", "comp_demo")], skip_intercept=True)
ax.set_title("OLS Coefficients for Duration (sec) Prediction\n(Complexity + Demographics)")
ax.set_ylabel("OLS Coefficient (95% CI)")
ax.figure.tight_layout()
ax.figure

ax = plot.misc.fit_coefficients(fits[("grade_value", "comp_demo")], skip_intercept=True)
ax.set_title("OLS Coefficients for Grade Prediction\n(Complexity + Demographics)")
ax.set_ylabel("OLS Coefficient (95% CI)")
ax.figure.tight_layout()
ax.figure

bin_perf_cols = ["grade_correct", "duration_ge_med", "keystroke_ge_one", "response_ge_half"]

fits = {}
for perf_col in bin_perf_cols:
    m_comp = sm.logit(formula="to_int({0}) ~ {1}".format(perf_col, "+".join(complexity_cols)), data=trials)
    m_demo = sm.logit(formula="to_int({0}) ~ {1}".format(perf_col, "+".join(demo_cols)), data=trials)
    m_comp_demo = sm.logit(formula="to_int({0}) ~ {1}".format(perf_col, "+".join(complexity_cols + demo_cols)), data=trials)
    for model, kind in zip([m_comp, m_demo, m_comp_demo], ["comp", "demo", "comp_demo"]):
        fit = model.fit()
        fits[(perf_col, kind)] = fit

Optimization terminated successfully.
         Current function value: 0.519114
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.570012
         Iterations 5
Optimization terminated successfully.
         Current function value: 0.515841
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.597499
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.685247
         Iterations 4
Optimization terminated successfully.
         Current function value: 0.588063
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.386580
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.471764
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.385161
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.645306
         Iterations 5
Optimization terminated successfully.
         Current function value: 0.664588
         Iterations 4
Optimization terminated successfully.
         Current function value: 0.644587
         Iterations 5

sorted([(k, v.prsquared) for k, v in fits.iteritems()], key=lambda x: x[1], reverse=True)

[(('keystroke_ge_one', 'comp_demo'), 0.18560709231567341),
 (('keystroke_ge_one', 'comp'), 0.18260707541278021),
 (('duration_ge_med', 'comp_demo'), 0.15141725170176235),
 (('duration_ge_med', 'comp'), 0.13780060404506345),
 (('grade_correct', 'comp_demo'), 0.099308533119037534),
 (('grade_correct', 'comp'), 0.093594630945558421),
 (('response_ge_half', 'comp_demo'), 0.030762995773983892),
 (('response_ge_half', 'comp'), 0.029681171729497913),
 (('duration_ge_med', 'demo'), 0.011179930751445299),
 (('grade_correct', 'demo'), 0.004722992003818316),
 (('keystroke_ge_one', 'demo'), 0.0024912951888387269),
 (('response_ge_half', 'demo'), 0.000687332978648314)]

ax = plot.misc.fit_coefficients(fits[("keystroke_ge_one", "comp_demo")], skip_intercept=True)
ax.set_title("Logit Coefficients for Grade Correct Prediction\n(Complexity + Demographics)")
ax.set_ylabel("Logit Coefficient (95% CI)")
ax.figure.tight_layout()
ax.figure

fit = sm.ols(formula="grade_value ~ code_lines + cyclo_comp + hal_effort", data=trials).fit()
fit.summary()

fit = sm.glm(formula="grade_correct ~ 1",
             data=trials, family=families.Binomial()).fit()
fit.summary()

def to_int(x):
    return x.astype(int)

fit = sm.logit(formula="to_int(grade_correct) ~ code_lines + cyclo_comp + hal_effort + hal_volume",
             data=trials).fit()
fit.summary()

Optimization terminated successfully.
         Current function value: 0.519114
         Iterations 6

ax = plot.misc.fit_coefficients(fit)
ax.set_title("OLS Coefficients for 
ax.figure.tight_layout()
ax.figure

base	version	code_chars	code_lines	cyclo_comp	hal_effort	hal_volume	output_chars	output_lines
between	functions	496	24	7	94192.063393	830.218507	33	3
between	inline	365	19	7	45596.278445	660.815630	33	3
counting	nospace	77	3	2	738.402323	82.044703	116	8
counting	twospaces	81	5	2	738.402323	82.044703	116	8
funcall	nospace	50	4	2	937.653743	109.392937	3	1
funcall	space	54	4	2	937.653743	109.392937	3	1
funcall	vars	72	7	2	1735.731282	154.287225	3	1
initvar	bothbad	103	9	3	3212.495182	212.396376	5	2
initvar	good	103	9	3	3212.495182	212.396376	6	2
initvar	onebad	103	9	3	2866.823438	208.496250	6	2
order	inorder	137	14	4	8372.306047	303.069902	6	1
order	shuffled	137	14	4	8372.306047	303.069902	6	1
overload	multmixed	78	11	1	2340.000000	120.000000	9	3
overload	plusmixed	78	11	1	3428.296498	117.206718	7	3
overload	strings	98	11	1	3428.296498	117.206718	21	3
partition	balanced	105	5	4	2896.001287	188.869649	26	4
partition	unbalanced	102	5	4	2382.342809	177.199052	19	3
partition	unbalanced_pivot	120	6	4	2707.766879	196.214991	19	3
rectangle	basic	293	18	2	18801.174998	396.335705	7	2
rectangle	class	421	21	5	43203.698685	620.148785	7	2
rectangle	tuples	277	14	2	15627.749381	403.817813	7	2
scope	diffname	144	12	3	2779.714286	188.000000	2	1
scope	samename	156	12	3	2413.342134	183.623858	2	1
whitespace	linedup	275	14	1	6480.000000	216.000000	13	3
whitespace	zigzag	259	14	1	6480.000000	216.000000	13	3

Dep. Variable:	grade_value	R-squared:	0.069
Model:	OLS	Adj. R-squared:	0.068
Method:	Least Squares	F-statistic:	39.73
Date:	Fri, 08 Nov 2013	Prob (F-statistic):	9.27e-25
Time:	10:56:04	Log-Likelihood:	-4056.9
No. Observations:	1602	AIC:	8122.
Df Residuals:	1598	BIC:	8143.
Df Model:	3

	coef	std err	t	P>\|t\|	[95.0% Conf. Int.]
Intercept	8.3994	0.282	29.756	0.000	7.846 8.953
code_lines	0.0802	0.022	3.595	0.000	0.036 0.124
cyclo_comp	-0.3135	0.067	-4.690	0.000	-0.445 -0.182
hal_effort	-2.879e-05	7.47e-06	-3.851	0.000	-4.34e-05 -1.41e-05

Omnibus:	240.363	Durbin-Watson:	1.739
Prob(Omnibus):	0.000	Jarque-Bera (JB):	362.826
Skew:	-1.165	Prob(JB):	1.63e-79
Kurtosis:	3.048	Cond. No.	9.21e+04

Dep. Variable:	['grade_correct[False]', 'grade_correct[True]']	No. Observations:	1602
Model:	GLM	Df Residuals:	1601
Model Family:	Binomial	Df Model:	0
Link Function:	logit	Scale:	1.0
Method:	IRLS	Log-Likelihood:	-917.49
Date:	Fri, 08 Nov 2013	Deviance:	1835.0
Time:	11:07:51	Pearson chi2:	1.60e+03
No. Iterations:	5

	code_lines	norm_duration_ms
code_lines	1.000000	-0.257481
norm_duration_ms	-0.257481	1.000000

	code_lines	norm_duration_ms
code_lines	1.000000	-0.296904
norm_duration_ms	-0.296904	1.000000

	code_lines	keystroke_coefficient
code_lines	1.000000	0.027206
keystroke_coefficient	0.027206	1.000000

	code_lines	response_proportion
code_lines	1.000000	0.031656
response_proportion	0.031656	1.000000

Complexity Metric	Performance Metric	Algorithm	Trial Filter	Correlation
hal_volume	duration_ms	spearman	all	0.502864
hal_volume	duration_ms	pearson	all	0.475635
code_lines	duration_ms	spearman	all	0.470349
hal_effort	duration_ms	spearman	all	0.467037
hal_effort	duration_ms	pearson	all	0.458027
hal_effort	duration_ms	pearson	correct	0.450659
hal_volume	duration_ms	spearman	correct	0.447555
hal_volume	duration_ms	pearson	correct	0.440966
code_lines	duration_ms	pearson	all	0.435540
code_lines	duration_ms	spearman	correct	0.416915
hal_effort	duration_ms	spearman	correct	0.406216
code_lines	duration_ms	pearson	correct	0.394970
cyclo_comp	duration_ms	pearson	all	0.338725
cyclo_comp	duration_ms	pearson	correct	0.284601
cyclo_comp	grade_value	pearson	all	-0.244609
hal_effort	grade_value	pearson	all	-0.212394
cyclo_comp	duration_ms	spearman	all	0.211662
hal_effort	grade_value	spearman	correct	-0.209779
cyclo_comp	grade_value	spearman	all	-0.196650
code_lines	grade_value	spearman	correct	-0.187668
hal_volume	grade_value	spearman	correct	-0.174171
hal_volume	grade_value	pearson	all	-0.168025
code_lines	grade_value	pearson	correct	-0.151874
cyclo_comp	duration_ms	spearman	correct	0.138354
cyclo_comp	response_proportion	spearman	correct	-0.134434
cyclo_comp	grade_value	pearson	correct	-0.124823
code_lines	keystroke_coefficient	spearman	all	0.115503
hal_effort	keystroke_coefficient	spearman	correct	0.113340
hal_volume	grade_value	pearson	correct	-0.112406
hal_volume	keystroke_coefficient	spearman	correct	0.107100
code_lines	grade_value	spearman	all	-0.102843
code_lines	grade_value	pearson	all	-0.098965
hal_volume	keystroke_coefficient	spearman	all	0.096389
hal_effort	response_proportion	spearman	all	0.095135
code_lines	keystroke_coefficient	spearman	correct	0.093464
cyclo_comp	response_proportion	spearman	all	-0.093067
hal_volume	grade_value	spearman	all	-0.092600
hal_effort	grade_value	spearman	all	-0.089613
hal_effort	keystroke_coefficient	spearman	all	0.086791
hal_effort	response_proportion	spearman	correct	0.085854
hal_effort	grade_value	pearson	correct	-0.082825
hal_effort	response_proportion	pearson	all	0.079122
cyclo_comp	grade_value	spearman	correct	-0.078175
hal_effort	keystroke_coefficient	pearson	all	-0.066700
cyclo_comp	response_proportion	pearson	correct	-0.057641
cyclo_comp	keystroke_coefficient	pearson	correct	-0.044185
hal_effort	keystroke_coefficient	pearson	correct	-0.040546
hal_volume	keystroke_coefficient	pearson	all	-0.036858
hal_volume	response_proportion	spearman	all	0.034812
code_lines	response_proportion	spearman	correct	0.034718
hal_volume	response_proportion	pearson	correct	-0.031945
code_lines	response_proportion	spearman	all	0.031656
hal_volume	response_proportion	pearson	all	0.029267
code_lines	response_proportion	pearson	all	0.027499
code_lines	keystroke_coefficient	pearson	all	0.027206
cyclo_comp	keystroke_coefficient	pearson	all	-0.026477
hal_effort	response_proportion	pearson	correct	0.021303
hal_volume	response_proportion	spearman	correct	0.020995
code_lines	response_proportion	pearson	correct	0.017609
code_lines	keystroke_coefficient	pearson	correct	0.012086
hal_volume	keystroke_coefficient	pearson	correct	-0.010946
cyclo_comp	keystroke_coefficient	spearman	all	-0.007864
cyclo_comp	response_proportion	pearson	all	-0.004059
cyclo_comp	keystroke_coefficient	spearman	correct	0.001981

Dep. Variable:	to_int(grade_correct)	No. Observations:	1602
Model:	Logit	Df Residuals:	1597
Method:	MLE	Df Model:	4
Date:	Fri, 08 Nov 2013	Pseudo R-squ.:	0.09359
Time:	11:10:21	Log-Likelihood:	-831.62
converged:	True	LL-Null:	-917.49
		LLR p-value:	4.415e-36

	coef	std err	z	P>\|z\|	[95.0% Conf. Int.]
Intercept	1.8043	0.233	7.745	0.000	1.348 2.261
code_lines	-0.1272	0.030	-4.213	0.000	-0.186 -0.068
cyclo_comp	-0.8971	0.103	-8.676	0.000	-1.100 -0.694
hal_effort	-8.29e-05	9.87e-06	-8.404	0.000	-0.000 -6.36e-05
hal_volume	0.0172	0.002	7.969	0.000	0.013 0.021

	code_lines	duration_ms
code_lines	1.00000	0.43554
duration_ms	0.43554	1.00000