Analysis By Code Metrics

In [33]:
%run prelude.ipy
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

In [2]:
import statsmodels.formula.api as sm
from statsmodels import graphics
from statsmodels.api import families
from patsy import Treatment
In [34]:
def program_names(df):
    """Returns a list of program_version names from a pandas DataFrame."""
    return list(df.apply(lambda r: "{0}_{1}".format(r["base"], r["version"]), axis=1).values)

def names_distance(df1, df2):
    """Returns the normalized edit distance between program_version names in the two pandas DataFrames."""
    from nltk.metrics import edit_distance
    names1 = program_names(df1)
    names2 = program_names(df2)
    idxs1 = np.arange(len(names1))
    idxs2 = np.array([names1.index(n) for n in names2])
    return edit_distance(idxs1, idxs2) / float(len(idxs1))

def round_fmt(x):
    return str(np.round(x, 2))
    
trials["norm_duration_ms"] = trials.duration_ms / trials.code_lines
trials["response_ge_half"] = trials.response_proportion >= 0.5
trials["keystroke_ge_one"] = trials.keystroke_coefficient >= 1

duration_meds = {}
for base, version in programs[["base", "version"]].values:
    ts = util.filter_program(trials, base, version)
    duration_meds[(base, version)] = ts.duration_ms.median()
    
duration_all_med = trials.duration_ms.median()
#trials["duration_ge_med"] = trials.apply(lambda t: t["duration_ms"] >= duration_meds[(t["base"], t["version"])], axis=1)
trials["duration_ge_med"] = trials.duration_ms >= duration_all_med

complexity_cols = ["code_lines", "cyclo_comp", "hal_effort", "hal_volume"]
performance_cols = ["grade_value", "duration_ms", "norm_duration_ms", "keystroke_coefficient","response_proportion"]
demo_cols = ["age", "py_years", "prog_years", "degree_num", "cs_major_num"]

Programs and their Complexity Metrics

Let's start by looking at the metrics for each program. The table below shows:

  • code_chars - number of characters in the code
  • code_lines - number of lines in the program (includes blank lines)
  • cyclo_comp - McCabe's Cyclomatic Complexity (computed with PyMetrics 0.8.1)
  • hal_effort - Halstead's Effort (computed with PyMetrics 0.8.1)
  • hal_volume - Halstead's Volume (computed with PyMetrics 0.8.1)
  • output_chars - number of characters in the correct output
  • output_lines - number of lines in the correct output
In [102]:
show_dataframe(programs)
Out[102]:
base version code_chars code_lines cyclo_comp hal_effort hal_volume output_chars output_lines
between functions 496 24 7 94192.063393 830.218507 33 3
between inline 365 19 7 45596.278445 660.815630 33 3
counting nospace 77 3 2 738.402323 82.044703 116 8
counting twospaces 81 5 2 738.402323 82.044703 116 8
funcall nospace 50 4 2 937.653743 109.392937 3 1
funcall space 54 4 2 937.653743 109.392937 3 1
funcall vars 72 7 2 1735.731282 154.287225 3 1
initvar bothbad 103 9 3 3212.495182 212.396376 5 2
initvar good 103 9 3 3212.495182 212.396376 6 2
initvar onebad 103 9 3 2866.823438 208.496250 6 2
order inorder 137 14 4 8372.306047 303.069902 6 1
order shuffled 137 14 4 8372.306047 303.069902 6 1
overload multmixed 78 11 1 2340.000000 120.000000 9 3
overload plusmixed 78 11 1 3428.296498 117.206718 7 3
overload strings 98 11 1 3428.296498 117.206718 21 3
partition balanced 105 5 4 2896.001287 188.869649 26 4
partition unbalanced 102 5 4 2382.342809 177.199052 19 3
partition unbalanced_pivot 120 6 4 2707.766879 196.214991 19 3
rectangle basic 293 18 2 18801.174998 396.335705 7 2
rectangle class 421 21 5 43203.698685 620.148785 7 2
rectangle tuples 277 14 2 15627.749381 403.817813 7 2
scope diffname 144 12 3 2779.714286 188.000000 2 1
scope samename 156 12 3 2413.342134 183.623858 2 1
whitespace linedup 275 14 1 6480.000000 216.000000 13 3
whitespace zigzag 259 14 1 6480.000000 216.000000 13 3

Of the 7 metrics listed above, only 4 are commonly used as measures of complexity: lines of code, Cyclomatic Complexity, Halstead Effort, and Halstead Volume. We will focus on these 4 metrics for the rest of the analysis.
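
As a refresher (an illustrative sketch using the standard textbook formulas, not PyMetrics' actual implementation): Cyclomatic Complexity counts the linearly independent paths through a program (roughly, decision points plus one), while the Halstead metrics are derived from operator and operand counts.

import math

def halstead_metrics(n1, n2, N1, N2):
    """Halstead Volume and Effort from operator/operand counts.

    n1, n2 -- distinct operators and distinct operands
    N1, N2 -- total operators and total operands
    """
    length = N1 + N2      # program length
    vocabulary = n1 + n2  # program vocabulary
    volume = length * math.log(vocabulary, 2)
    difficulty = (n1 / 2.0) * (N2 / float(n2))
    effort = difficulty * volume
    return volume, effort

print halstead_metrics(10, 8, 40, 30)  # -> (291.89..., 5472.98...)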

Correlations Between Metrics

Although the 4 complexity metrics measure different aspects of a program's text, they are often found to be highly correlated in large repositories of program code. Let's see how correlated they are in our 25 programs.

In [22]:
fmts = { c: round_fmt for c in complexity_cols }
print programs[complexity_cols].corr("pearson").to_latex(formatters=fmts)
\begin{tabular}{lrrrr}
\toprule
{} & code\_lines & cyclo\_comp & hal\_effort & hal\_volume \\
\midrule
code\_lines &        1.0 &       0.46 &       0.78 &       0.86 \\
cyclo\_comp &       0.46 &        1.0 &       0.72 &       0.77 \\
hal\_effort &       0.78 &       0.72 &        1.0 &       0.94 \\
hal\_volume &       0.86 &       0.77 &       0.94 &        1.0 \\
\bottomrule
\end{tabular}


We can see strong correlations between the Halstead metrics and everything else. The correlation between lines of code and Cyclomatic Complexity is more moderate (0.46); it is usually higher in larger code bases. Because the two Halstead metrics are so highly correlated with each other (0.94), we will only consider Halstead Effort moving forward.
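
For comparison, the same table can be recomputed with rank (Spearman) correlations, which are less sensitive to the heavily skewed Halstead values; both correlation types are used in the regressor analysis later on. A quick sketch reusing programs, complexity_cols, and fmts from the cell above:

print programs[complexity_cols].corr("spearman").to_latex(formatters=fmts)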

Visualizing the Complexity Space

With 3 complexity metrics remaining, we can visualize our programs as points in a 3D complexity space. Below is a series of plots from different angles of this space.

In [16]:
xs, ys, zs = programs.code_lines, programs.cyclo_comp, programs.hal_effort / 1000.0
def plot_metrics(ax):
    ax.set_xlabel("Lines of Code")
    ax.set_ylabel("Cyclomatic Complexity")
    ax.set_zlabel("Halstead Effort (thousands)")
    ax.scatter3D(xs, ys, zs)
    
fig = plot.misc.plot3d_views(plot_metrics, figsize=(20, 10))
fig.tight_layout()
fig.suptitle("Code Metrics Scatter Plot (all programs)")
fig
Out[16]:

From the plots, we can see that our programs are fairly spread out along the lines of code and Cyclomatic Complexity axes. Only a few have high Halstead Effort, but this is not surprising given that only a few have many lines of code (and these two metrics are strongly correlated).

Predicting Performance

If our 3 complexity metrics (lines of code, Cyclomatic Complexity, and Halstead Effort) are truly measuring the cognitive complexity of their respective programs, then we should be able to predict participants' performance using them.

For example, lines of code is often used as a proxy for cognitive complexity by assuming that human short-term memory is limited. More lines of code places a cognitive burden on the reader and, therefore, longer programs should be harder to understand.

We will attempt to use each complexity metric to predict performance in 4 different categories:

  • Grade
    • A grade of 7 or higher (out of 10) is correct
    • More complex programs should result in a lower grade
  • Trial duration
    • Time from start to finish (reading + responding)
    • More complex programs should take longer to read and respond to (higher duration)
  • Keystroke coefficient
    • Number of actual keystrokes / required keystrokes
    • More complex programs should require more keystrokes due to mistakes/corrections (higher coefficient)
  • Response proportion
    • Time spent responding / trial duration
    • More complex programs should require more reading time up front (higher proportion)

To predict performance, we will order the list of programs by a complexity metric and by a performance metric. The normalized edit distance between these two orderings will be low (close to 0) if the orderings are very close and high (close to 1.0) otherwise.

In [4]:
# Compute median trial duration by program (all trials)
med_duration_all = pandas.DataFrame(
    trials.groupby(["base", "version"])\
    .apply(lambda f: (f["duration_ms"].median() / 1000.0) / f["code_lines"].mean())
    .order(ascending=False), columns=["sec_per_line"])\
    .reset_index()
    
# Compute median trial duration by program (correct trials only)    
med_duration_correct = pandas.DataFrame(
    trials[trials.grade_correct].groupby(["base", "version"])\
    .apply(lambda f: (f["duration_ms"].median() / 1000.0) / f["code_lines"].mean())
    .order(ascending=False), columns=["sec_per_line"])\
    .reset_index()
    
# Compute median keystroke coefficient by program
med_keystroke_coeff = pandas.DataFrame(
    trials.groupby(["base", "version"])\
    .apply(lambda f: f["keystroke_coefficient"].median())
    .order(ascending=False), columns=["key_coeff"])\
    .reset_index()
        
# Compute median response proportion by program
med_response_prop = pandas.DataFrame(
    trials.groupby(["base", "version"])\
    .apply(lambda f: f["response_proportion"].median())
    .order(ascending=True), columns=["resp_prop"])\
    .reset_index()
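
As a concrete, illustrative sketch of the edit-distance comparison described above (names_distance is defined at the top of this notebook; the sort call assumes the pandas 0.12-era API used elsewhere in these cells, and by_loc is just a throwaway name):

# Illustrative only: compare the program ordering by lines of code
# (longest first) against the ordering by median seconds per line.
by_loc = programs.sort(columns="code_lines", ascending=False)

# med_duration_all is already ordered (descending) by sec_per_line, and
# names_distance normalizes the edit distance between orderings to [0, 1].
print names_distance(by_loc, med_duration_all)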

By Lines of Code

Grade

In [134]:
trials[["code_lines", "grade_value"]].corr()
Out[134]:
code_lines grade_value
code_lines 1.000000 -0.098965
grade_value -0.098965 1.000000

Trial Duration

In [3]:
trials[["code_lines", "duration_ms"]].corr()
Out[3]:
code_lines duration_ms
code_lines 1.00000 0.43554
duration_ms 0.43554 1.00000
In [135]:
trials[["code_lines", "norm_duration_ms"]].corr()
Out[135]:
code_lines norm_duration_ms
code_lines 1.000000 -0.257481
norm_duration_ms -0.257481 1.000000
In [136]:
trials[trials.grade_correct][["code_lines", "norm_duration_ms"]].corr()
Out[136]:
code_lines norm_duration_ms
code_lines 1.000000 -0.296904
norm_duration_ms -0.296904 1.000000

Keystroke Coefficient

In [137]:
trials[["code_lines", "keystroke_coefficient"]].corr()
Out[137]:
code_lines keystroke_coefficient
code_lines 1.000000 0.027206
keystroke_coefficient 0.027206 1.000000

Response Proportion

In [139]:
trials[["code_lines", "response_proportion"]].corr()
Out[139]:
code_lines response_proportion
code_lines 1.000000 0.031656
response_proportion 0.031656 1.000000

By All Metrics (Regressors)

Next, we correlate every complexity metric with every performance metric, using both Pearson and Spearman correlations, over all trials and over correct trials only.

In [6]:
rows = []

for comp_col in complexity_cols:
    for perf_col in performance_cols:    
        for corr_alg in ["pearson", "spearman"]:
            for correct_only in [False, True]:
                ts = trials[trials.grade_correct] if correct_only else trials
                result = ts[[comp_col, perf_col]].corr(corr_alg).values[0, 1]
                rows.append([comp_col, perf_col, corr_alg, "correct" if correct_only else "all", result])
                
corr_df = pandas.DataFrame(rows, columns=["Complexity Metric", "Performance Metric",
                           "Algorithm", "Trial Filter", "Correlation"])

# Sort descending by absolute value
corr_df = corr_df.reindex(corr_df["Correlation"].abs().order(ascending=False).index)
show_dataframe(corr_df)
Out[6]:
Complexity Metric Performance Metric Algorithm Trial Filter Correlation
hal_volume duration_ms spearman all 0.502864
hal_volume duration_ms pearson all 0.475635
code_lines duration_ms spearman all 0.470349
hal_effort duration_ms spearman all 0.467037
hal_effort duration_ms pearson all 0.458027
hal_effort duration_ms pearson correct 0.450659
hal_volume duration_ms spearman correct 0.447555
hal_volume duration_ms pearson correct 0.440966
code_lines duration_ms pearson all 0.435540
code_lines duration_ms spearman correct 0.416915
hal_effort duration_ms spearman correct 0.406216
code_lines duration_ms pearson correct 0.394970
cyclo_comp duration_ms pearson all 0.338725
cyclo_comp duration_ms pearson correct 0.284601
cyclo_comp grade_value pearson all -0.244609
hal_effort grade_value pearson all -0.212394
cyclo_comp duration_ms spearman all 0.211662
hal_effort grade_value spearman correct -0.209779
cyclo_comp grade_value spearman all -0.196650
code_lines grade_value spearman correct -0.187668
hal_volume grade_value spearman correct -0.174171
hal_volume grade_value pearson all -0.168025
code_lines grade_value pearson correct -0.151874
cyclo_comp duration_ms spearman correct 0.138354
cyclo_comp response_proportion spearman correct -0.134434
cyclo_comp grade_value pearson correct -0.124823
code_lines keystroke_coefficient spearman all 0.115503
hal_effort keystroke_coefficient spearman correct 0.113340
hal_volume grade_value pearson correct -0.112406
hal_volume keystroke_coefficient spearman correct 0.107100
code_lines grade_value spearman all -0.102843
code_lines grade_value pearson all -0.098965
hal_volume keystroke_coefficient spearman all 0.096389
hal_effort response_proportion spearman all 0.095135
code_lines keystroke_coefficient spearman correct 0.093464
cyclo_comp response_proportion spearman all -0.093067
hal_volume grade_value spearman all -0.092600
hal_effort grade_value spearman all -0.089613
hal_effort keystroke_coefficient spearman all 0.086791
hal_effort response_proportion spearman correct 0.085854
hal_effort grade_value pearson correct -0.082825
hal_effort response_proportion pearson all 0.079122
cyclo_comp grade_value spearman correct -0.078175
hal_effort keystroke_coefficient pearson all -0.066700
cyclo_comp response_proportion pearson correct -0.057641
cyclo_comp keystroke_coefficient pearson correct -0.044185
hal_effort keystroke_coefficient pearson correct -0.040546
hal_volume keystroke_coefficient pearson all -0.036858
hal_volume response_proportion spearman all 0.034812
code_lines response_proportion spearman correct 0.034718
hal_volume response_proportion pearson correct -0.031945
code_lines response_proportion spearman all 0.031656
hal_volume response_proportion pearson all 0.029267
code_lines response_proportion pearson all 0.027499
code_lines keystroke_coefficient pearson all 0.027206
cyclo_comp keystroke_coefficient pearson all -0.026477
hal_effort response_proportion pearson correct 0.021303
hal_volume response_proportion spearman correct 0.020995
code_lines response_proportion pearson correct 0.017609
code_lines keystroke_coefficient pearson correct 0.012086
hal_volume keystroke_coefficient pearson correct -0.010946
cyclo_comp keystroke_coefficient spearman all -0.007864
cyclo_comp response_proportion pearson all -0.004059
cyclo_comp keystroke_coefficient spearman correct 0.001981
In [31]:
rows = []

for comp_col in complexity_cols:
    for perf_col in [c for c in performance_cols if c != "norm_duration_ms"]:    
        result = trials[[comp_col, perf_col]].corr().values[0, 1]
        rows.append([comp_col, perf_col, result])
                
corr_df = pandas.DataFrame(rows, columns=["Complexity Metric", "Performance Metric", "Correlation"])

# Sort descending by absolute value
#corr_df = corr_df.reindex(corr_df["Correlation"].abs().order(ascending=False).index)
#show_dataframe(corr_df)
fmts = [round_fmt] * (len(performance_cols) - 1)
print corr_df.pivot("Complexity Metric", "Performance Metric", "Correlation").to_latex(formatters=fmts)
\begin{tabular}{lrrrr}
\toprule
Performance Metric & duration\_ms & grade\_value & keystroke\_coefficient & response\_proportion \\
\midrule
Complexity Metric &             &             &                       &                     \\
code\_lines        &        0.44 &        -0.1 &                  0.03 &                0.03 \\
cyclo\_comp        &        0.34 &       -0.24 &                 -0.03 &                  -0 \\
hal\_effort        &        0.46 &       -0.21 &                 -0.07 &                0.08 \\
hal\_volume        &        0.48 &       -0.17 &                 -0.04 &                0.03 \\
\bottomrule
\end{tabular}


In [58]:
from eyecode import classify
cp_cols = performance_cols #["duration_ms", "response_proportion"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance
    fi_df = classify.feature_importances(trials, complexity_cols, perf_col, regressor=True)
    ax = plot.misc.feature_importances(fi_df, ax=axes[i, 0])
    ax.set_title("Feature Importances ({0})".format(perf_col))

    # Cross-validation
    cv_df = classify.cross_validation(trials, complexity_cols, perf_col, regressor=True)
    ax = plot.misc.cross_validation(cv_df, ax=axes[i, 1])
    ax.set_title("$R^2$ ({0}, CV=10)".format(perf_col))
    print perf_col, classify.cross_val_performance(cv_df).mean()
    
fig.tight_layout()
fig
grade_value 0.0801560340656
duration_ms 0.185165864585
norm_duration_ms 0.144450190949
keystroke_coefficient 0.0542231991986
response_proportion 0.237702229863

Out[58]:

By All Metrics (Classifiers)

We can also frame prediction as classification: using the complexity (and, later, demographic) columns as features, we measure feature importances and cross-validated performance for each binary outcome, starting with grade_correct.

In [23]:
from eyecode import classify
In [152]:
importances = classify.feature_importances(trials, complexity_cols, "grade_correct")
cross_val = classify.cross_validation(trials, complexity_cols, "grade_correct", repeat=10)
In [154]:
axes = plot.misc.importances_and_crossval(importances, cross_val, "grade_correct",
                                          figsize=(12, 5), repeat=10)
axes[0].figure.tight_layout()
axes[0].figure
Out[154]:
In [155]:
importances = classify.feature_importances(trials, complexity_cols + demo_cols, "grade_correct")
cross_val = classify.cross_validation(trials, complexity_cols + demo_cols, "grade_correct", repeat=10)
In [157]:
axes = plot.misc.importances_and_crossval(importances, cross_val, "grade_correct",
                                          figsize=(12, 5), repeat=10)
axes[0].figure.tight_layout()
axes[0].figure
Out[157]:
In [161]:
stats.wilcox_test(cross_val[cross_val.classifier == "dummy"].score, classify.cross_val_performance(cross_val))
Out[161]:
1.6437797589705496e-48
In [47]:
scores = classify.one_at_a_time(trials, complexity_cols + demo_cols, "grade_correct", norm=True)
print zip(complexity_cols + demo_cols, scores)
[('code_lines', 1.0), ('cyclo_comp', 0.97194176133322308), ('hal_effort', 0.9983173390954958), ('hal_volume', 0.98232860410350886), ('age', 0.84522499606593615), ('py_years', 0.78965941806827289), ('prog_years', 0.79526529224529796), ('degree_num', 0.78376720938573352), ('cs_major_num', 0.75866836611437904)]

In [167]:
cp_cols = ["grade_value", "duration_ms", "response_proportion", "keystroke_coefficient"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance and cross validation
    fi_df = classify.feature_importances(trials, complexity_cols, perf_col, regressor=True)
    cv_df = classify.cross_validation(trials, complexity_cols, perf_col, repeat=10, regressor=True)
    print perf_col, classify.cross_val_performance(cv_df).mean()
    
    # Plot both
    plot.misc.importances_and_crossval(fi_df, cv_df, perf_col,
                                       axes=[axes[i, 0], axes[i, 1]],
                                       repeat=10, regressor=True)
    
fig.tight_layout()
fig
grade_value 0.0801560340656
duration_ms 0.185165864585
response_proportion 0.237702229863
keystroke_coefficient 0.0542231991986

Out[167]:
In [168]:
cp_cols = ["grade_value", "duration_ms", "response_proportion", "keystroke_coefficient"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance and cross validation
    fi_df = classify.feature_importances(trials, demo_cols, perf_col, regressor=True)
    cv_df = classify.cross_validation(trials, demo_cols, perf_col, repeat=10, regressor=True)
    print perf_col, classify.cross_val_performance(cv_df).mean()
    
    # Plot both
    plot.misc.importances_and_crossval(fi_df, cv_df, perf_col,
                                       axes=[axes[i, 0], axes[i, 1]],
                                       repeat=10, regressor=True)
    
fig.tight_layout()
fig
grade_value -0.139943427395
duration_ms -0.122278190501
response_proportion -0.154682920042
keystroke_coefficient -0.0925546543477

Out[168]:
In [169]:
from eyecode import classify
cp_cols = ["grade_correct", "duration_ge_med", "response_ge_half", "keystroke_ge_one"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance and cross validation
    fi_df = classify.feature_importances(trials, complexity_cols, perf_col)
    cv_df = classify.cross_validation(trials, complexity_cols, perf_col, repeat=10)
    print perf_col, classify.cross_val_performance(cv_df).mean()
    
    # Plot both
    plot.misc.importances_and_crossval(fi_df, cv_df, perf_col,
                                       axes=[axes[i, 0], axes[i, 1]],
                                       repeat=10)
    
fig.tight_layout()
fig
grade_correct 0.708500719428
duration_ge_med 0.763862149398
response_ge_half 0.731754215876
keystroke_ge_one 0.843033038998

Out[169]:
In [165]:
cp_cols = ["grade_correct", "duration_ge_med", "response_ge_half", "keystroke_ge_one"]
rows = len(cp_cols)
fig, axes = pyplot.subplots(rows, 2, figsize=(15, 5 * rows))
for i, perf_col in enumerate(cp_cols):
    # Feature importance and cross validation
    fi_df = classify.feature_importances(trials, demo_cols, perf_col)
    cv_df = classify.cross_validation(trials, demo_cols, perf_col, repeat=10)
    
    # Plot both
    plot.misc.importances_and_crossval(fi_df, cv_df, perf_col,
                                       axes=[axes[i, 0], axes[i, 1]],
                                       repeat=10)
    
fig.tight_layout()
fig
Out[165]:

Linear Models

Finally, we fit linear models predicting each performance measure from the complexity metrics, the demographic variables, and both combined: ordinary least squares for the continuous measures and logistic regression for the binary ones.

In [127]:
complexity_cols = ["code_lines", "cyclo_comp", "hal_volume", "hal_effort"]
performance_cols = ["grade_value", "duration_sec", "keystroke_coefficient","response_proportion"]
demo_cols = ["age", "py_years", "prog_years"]
In [128]:
fits = {}
for perf_col in performance_cols:
    m_comp = sm.ols(formula="{0} ~ {1}".format(perf_col, "+".join(complexity_cols)), data=trials)
    m_demo = sm.ols(formula="{0} ~ {1}".format(perf_col, "+".join(demo_cols)), data=trials)
    m_comp_demo = sm.ols(formula="{0} ~ {1}".format(perf_col, "+".join(complexity_cols + demo_cols)), data=trials)
    for model, kind in zip([m_comp, m_demo, m_comp_demo], ["comp", "demo", "comp_demo"]):
        fit = model.fit()
        fits[(perf_col, kind)] = fit
In [129]:
sorted([(k, v.rsquared_adj) for k, v in fits.iteritems()], key=lambda x: x[1], reverse=True)
Out[129]:
[(('duration_sec', 'comp_demo'), 0.24410930143517773),
 (('duration_sec', 'comp'), 0.23095969293024032),
 (('grade_value', 'comp_demo'), 0.10196661960638331),
 (('grade_value', 'comp'), 0.09598149152467228),
 (('keystroke_coefficient', 'comp_demo'), 0.028032441108442496),
 (('keystroke_coefficient', 'comp'), 0.024773421650415406),
 (('response_proportion', 'comp_demo'), 0.022982721295098596),
 (('response_proportion', 'comp'), 0.022889770128601494),
 (('duration_sec', 'demo'), 0.011708375030034635),
 (('grade_value', 'demo'), 0.0053168475861895548),
 (('keystroke_coefficient', 'demo'), 0.0030743177137255717),
 (('response_proportion', 'demo'), -0.00040819671687919445)]
In [130]:
ax = plot.misc.fit_coefficients(fits[("duration_sec", "comp_demo")], skip_intercept=True)
ax.set_title("OLS Coefficients for Duration (sec) Prediction\n(Complexity + Demographics)")
ax.set_ylabel("OLS Coefficient (95% CI)")
ax.figure.tight_layout()
ax.figure
Out[130]:
In [131]:
ax = plot.misc.fit_coefficients(fits[("grade_value", "comp_demo")], skip_intercept=True)
ax.set_title("OLS Coefficients for Grade Prediction\n(Complexity + Demographics)")
ax.set_ylabel("OLS Coefficient (95% CI)")
ax.figure.tight_layout()
ax.figure
Out[131]:
In [132]:
bin_perf_cols = ["grade_correct", "duration_ge_med", "keystroke_ge_one", "response_ge_half"]
In [134]:
fits = {}
# to_int (defined in In [21] below) casts the boolean response to 0/1
# so it can appear on the left-hand side of the logit formulas.
for perf_col in bin_perf_cols:
    m_comp = sm.logit(formula="to_int({0}) ~ {1}".format(perf_col, "+".join(complexity_cols)), data=trials)
    m_demo = sm.logit(formula="to_int({0}) ~ {1}".format(perf_col, "+".join(demo_cols)), data=trials)
    m_comp_demo = sm.logit(formula="to_int({0}) ~ {1}".format(perf_col, "+".join(complexity_cols + demo_cols)), data=trials)
    for model, kind in zip([m_comp, m_demo, m_comp_demo], ["comp", "demo", "comp_demo"]):
        fit = model.fit()
        fits[(perf_col, kind)] = fit
Optimization terminated successfully.
         Current function value: 0.519114
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.570012
         Iterations 5
Optimization terminated successfully.
         Current function value: 0.515841
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.597499
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.685247
         Iterations 4
Optimization terminated successfully.
         Current function value: 0.588063
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.386580
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.471764
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.385161
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.645306
         Iterations 5
Optimization terminated successfully.
         Current function value: 0.664588
         Iterations 4
Optimization terminated successfully.
         Current function value: 0.644587
         Iterations 5

In [138]:
sorted([(k, v.prsquared) for k, v in fits.iteritems()], key=lambda x: x[1], reverse=True)
Out[138]:
[(('keystroke_ge_one', 'comp_demo'), 0.18560709231567341),
 (('keystroke_ge_one', 'comp'), 0.18260707541278021),
 (('duration_ge_med', 'comp_demo'), 0.15141725170176235),
 (('duration_ge_med', 'comp'), 0.13780060404506345),
 (('grade_correct', 'comp_demo'), 0.099308533119037534),
 (('grade_correct', 'comp'), 0.093594630945558421),
 (('response_ge_half', 'comp_demo'), 0.030762995773983892),
 (('response_ge_half', 'comp'), 0.029681171729497913),
 (('duration_ge_med', 'demo'), 0.011179930751445299),
 (('grade_correct', 'demo'), 0.004722992003818316),
 (('keystroke_ge_one', 'demo'), 0.0024912951888387269),
 (('response_ge_half', 'demo'), 0.000687332978648314)]
In [139]:
ax = plot.misc.fit_coefficients(fits[("keystroke_ge_one", "comp_demo")], skip_intercept=True)
ax.set_title("Logit Coefficients for Grade Correct Prediction\n(Complexity + Demographics)")
ax.set_ylabel("Logit Coefficient (95% CI)")
ax.figure.tight_layout()
ax.figure
Out[139]:
In [12]:
fit = sm.ols(formula="grade_value ~ code_lines + cyclo_comp + hal_effort", data=trials).fit()
fit.summary()
Out[12]:
OLS Regression Results
Dep. Variable: grade_value R-squared: 0.069
Model: OLS Adj. R-squared: 0.068
Method: Least Squares F-statistic: 39.73
Date: Fri, 08 Nov 2013 Prob (F-statistic): 9.27e-25
Time: 10:56:04 Log-Likelihood: -4056.9
No. Observations: 1602 AIC: 8122.
Df Residuals: 1598 BIC: 8143.
Df Model: 3
coef std err t P>|t| [95.0% Conf. Int.]
Intercept 8.3994 0.282 29.756 0.000 7.846 8.953
code_lines 0.0802 0.022 3.595 0.000 0.036 0.124
cyclo_comp -0.3135 0.067 -4.690 0.000 -0.445 -0.182
hal_effort -2.879e-05 7.47e-06 -3.851 0.000 -4.34e-05 -1.41e-05
Omnibus: 240.363 Durbin-Watson: 1.739
Prob(Omnibus): 0.000 Jarque-Bera (JB): 362.826
Skew: -1.165 Prob(JB): 1.63e-79
Kurtosis: 3.048 Cond. No. 9.21e+04
In [19]:
fit = sm.glm(formula="grade_correct ~ 1",
             data=trials, family=families.Binomial()).fit()
fit.summary()
Out[19]:
Generalized Linear Model Regression Results
Dep. Variable: ['grade_correct[False]', 'grade_correct[True]'] No. Observations: 1602
Model: GLM Df Residuals: 1601
Model Family: Binomial Df Model: 0
Link Function: logit Scale: 1.0
Method: IRLS Log-Likelihood: -917.49
Date: Fri, 08 Nov 2013 Deviance: 1835.0
Time: 11:07:51 Pearson chi2: 1.60e+03
No. Iterations: 5
coef std err t P>|t| [95.0% Conf. Int.]
Intercept -1.0477 0.057 -18.386 0.000 -1.159 -0.936
In [21]:
def to_int(x):
    """Casts a boolean Series to 0/1 integers for use in model formulas."""
    return x.astype(int)
In [22]:
fit = sm.logit(formula="to_int(grade_correct) ~ code_lines + cyclo_comp + hal_effort + hal_volume",
             data=trials).fit()
fit.summary()
Optimization terminated successfully.
         Current function value: 0.519114
         Iterations 6

Out[22]:
Logit Regression Results
Dep. Variable: to_int(grade_correct) No. Observations: 1602
Model: Logit Df Residuals: 1597
Method: MLE Df Model: 4
Date: Fri, 08 Nov 2013 Pseudo R-squ.: 0.09359
Time: 11:10:21 Log-Likelihood: -831.62
converged: True LL-Null: -917.49
LLR p-value: 4.415e-36
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 1.8043 0.233 7.745 0.000 1.348 2.261
code_lines -0.1272 0.030 -4.213 0.000 -0.186 -0.068
cyclo_comp -0.8971 0.103 -8.676 0.000 -1.100 -0.694
hal_effort -8.29e-05 9.87e-06 -8.404 0.000 -0.000 -6.36e-05
hal_volume 0.0172 0.002 7.969 0.000 0.013 0.021
In [18]:
ax = plot.misc.fit_coefficients(fit)
ax.set_title("OLS Coefficients for 
ax.figure.tight_layout()
ax.figure
Out[18]: