Paired t-test demo

From VoxBoWiki


How can we code a paired t-test in the general linear model (GLM) framework? Below you'll find a short answer, an interactive session in the statistical package R that demonstrates the point, and a copy of the R script used to produce the session. R is freely available for Linux, OS X, and Windows.

Summary

To code a paired t-test in a GLM or regression, enter each observation as a separate data point. Include a group covariate (0/1) for your effect, and include a subject covariate for each subject (1 for that subject's two data points, 0 elsewhere). If you include an intercept, omit one of the subject covariates from the model. The subject covariates sum to the intercept column, so an intercept plus all of them makes the design matrix rank-deficient.
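The same recipe can be sketched in R without typing each subject covariate by hand. This is a minimal sketch, assuming a long-format data frame with one row per observation; the names `dat`, `score`, `subject`, `fit`, and `glm_row` are illustrative and not part of the original demo. Declaring `subject` as a factor lets `lm()` build the subject covariates itself. Because the model keeps its intercept, R's default treatment contrasts leave out the first subject, which matches the rule of omitting one subject covariate.

```r
# Same data as the demo below: two measures from each of 10 subjects.
g1 <- c(90, 18, 24, 79, 8, 27, 41, 29, 34, 60)
g2 <- c(12, 93, 47, 62, 84, 76, 19, 34, 87, 51)
n  <- length(g1)

# Illustrative long format: each observation is its own row.
dat <- data.frame(
  score   = c(g1, g2),
  group   = rep(c(1, 0), each = n),             # 1 = first measure, 0 = second
  subject = factor(rep(seq_len(n), times = 2))  # one level per subject
)

# Intercept + group + subject factor; treatment contrasts drop subject 1.
fit     <- lm(score ~ group + subject, data = dat)
glm_row <- coef(summary(fit))["group", ]

paired <- t.test(g1, g2, paired = TRUE)

print(glm_row)
print(c(t = unname(paired$statistic), p = paired$p.value))
```

The `group` row of `glm_row` should agree with the paired test: an estimate of -15.5 (the mean of the differences), the same t value and p-value, and 9 residual degrees of freedom.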

An interactive session in R

> # How do we carry out a paired t-test in a GLM or regression
> # framework?  Let's create two measures from each of 10 imaginary
> # subjects:
>
> g1=c(90,18,24,79,08,27,41,29,34,60)
> g2=c(12,93,47,62,84,76,19,34,87,51)
>
> # We can get the regular paired and unpaired t-tests this way:
>
> t.test(g1,g2)

        Welch Two Sample t-test

data:  g1 and g2
t = -1.2452, df = 17.918, p-value = 0.2291
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -41.65964  10.65964
sample estimates:
mean of x mean of y
     41.0      56.5

> t.test(g1,g2,paired=TRUE)

        Paired t-test

data:  g1 and g2
t = -0.9982, df = 9, p-value = 0.3443
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -50.62662  19.62662
sample estimates:
mean of the differences
                  -15.5

>
> # To accomplish the same thing in a regression framework, we need to
> # put our dependent measures in one vector:
>
> combined=c(g1,g2)
>
> # and define a group covariate:
>
> group=c(1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0)
>
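> # That gets us the unpaired test, as we can see here. (lm() fits the
> # equal-variance Student version; with equal group sizes its t value
> # matches the Welch test above, but the degrees of freedom are 18
> # rather than 17.918.)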
>
> summary(lm(combined ~ group))

Call:
lm(formula = combined ~ group)

Residuals:
   Min     1Q Median     3Q    Max
-44.50 -18.38  -6.25  21.50  49.00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   56.500      8.802   6.419 4.83e-06 ***
group        -15.500     12.447  -1.245    0.229
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 27.83 on 18 degrees of freedom
Multiple R-Squared: 0.07931,    Adjusted R-squared: 0.02816
F-statistic: 1.551 on 1 and 18 DF,  p-value: 0.229

>
> # The paired test differs in that we're also modeling variance due to
> # a "subject" factor, which we can define with this set of variables:
>
> s0=c(1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0)
> s1=c(0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0)
> s2=c(0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0)
> s3=c(0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0)
> s4=c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0)
> s5=c(0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0)
> s6=c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0)
> s7=c(0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0)
> s8=c(0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0)
> s9=c(0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1)
>
> # Now we can do the regression.  We can either include an intercept
> # and omit one of the subjects, or omit the intercept and include all
> # the subjects.
>
> summary(fm<-lm(combined ~ 0+s0+s1+s2+s3+s4+s5+s6+s7+s8+s9+group))

Call:
lm(formula = combined ~ 0 + s0 + s1 + s2 + s3 + s4 + s5 + s6 +
    s7 + s8 + s9 + group)

Residuals:
       Min         1Q     Median         3Q        Max
-4.675e+01 -1.725e+01  2.220e-16  1.725e+01  4.675e+01

Coefficients:
      Estimate Std. Error t value Pr(>|t|)
s0       58.75      25.75   2.282   0.0484 *
s1       63.25      25.75   2.456   0.0364 *
s2       43.25      25.75   1.680   0.1273
s3       78.25      25.75   3.039   0.0140 *
s4       53.75      25.75   2.087   0.0665 .
s5       59.25      25.75   2.301   0.0469 *
s6       37.75      25.75   1.466   0.1767
s7       39.25      25.75   1.524   0.1618
s8       68.25      25.75   2.650   0.0265 *
s9       63.25      25.75   2.456   0.0364 *
group   -15.50      15.53  -0.998   0.3443
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.72 on 9 degrees of freedom
Multiple R-Squared: 0.8269,     Adjusted R-squared: 0.6153
F-statistic: 3.908 on 11 and 9 DF,  p-value: 0.02508

> summary(fm<-lm(combined ~ 1+s0+s1+s2+s3+s4+s5+s6+s7+s8+group))

Call:
lm(formula = combined ~ 1 + s0 + s1 + s2 + s3 + s4 + s5 + s6 +
    s7 + s8 + group)

Residuals:
       Min         1Q     Median         3Q        Max
-4.675e+01 -1.725e+01  1.332e-15  1.725e+01  4.675e+01

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.325e+01  2.575e+01   2.456   0.0364 *
s0          -4.500e+00  3.472e+01  -0.130   0.8997
s1          -3.469e-15  3.472e+01  -1e-16   1.0000
s2          -2.000e+01  3.472e+01  -0.576   0.5787
s3           1.500e+01  3.472e+01   0.432   0.6759
s4          -9.500e+00  3.472e+01  -0.274   0.7906
s5          -4.000e+00  3.472e+01  -0.115   0.9108
s6          -2.550e+01  3.472e+01  -0.734   0.4814
s7          -2.400e+01  3.472e+01  -0.691   0.5069
s8           5.000e+00  3.472e+01   0.144   0.8887
group       -1.550e+01  1.553e+01  -0.998   0.3443
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.72 on 9 degrees of freedom
Multiple R-Squared: 0.2836,     Adjusted R-squared: -0.5124
F-statistic: 0.3563 on 10 and 9 DF,  p-value: 0.9382

>
> # Inspecting the output, you can see that the group coefficient in both
> # models reproduces the paired t-test: estimate -15.5, t = -0.998 on 9
> # degrees of freedom, p = 0.3443.
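One way to see why the subject covariates turn the unpaired model into the paired one is that they absorb each subject's overall level. Only the within-subject differences are then left to estimate the group effect. A paired t-test is a one-sample t-test on the per-subject differences, and that test is in turn an intercept-only regression on those differences. The sketch below checks that all three give the same t value; the names `d`, `one_sample`, `diff_fit`, and `diff_row` are illustrative.

```r
g1 <- c(90, 18, 24, 79, 8, 27, 41, 29, 34, 60)
g2 <- c(12, 93, 47, 62, 84, 76, 19, 34, 87, 51)
d  <- g1 - g2                       # one within-subject difference per subject

one_sample <- t.test(d)             # H0: mean difference = 0
diff_fit   <- lm(d ~ 1)             # intercept-only regression on the differences
diff_row   <- coef(summary(diff_fit))["(Intercept)", ]

paired <- t.test(g1, g2, paired = TRUE)

# All three t values should coincide.
print(c(one_sample     = unname(one_sample$statistic),
        intercept_only = unname(diff_row["t value"]),
        paired         = unname(paired$statistic)))
```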

The R script used to produce the demo

# How do we carry out a paired t-test in a GLM or regression
# framework?  Let's create two measures from each of 10 imaginary
# subjects:

g1=c(90,18,24,79,08,27,41,29,34,60)
g2=c(12,93,47,62,84,76,19,34,87,51)

# We can get the regular paired and unpaired t-tests this way:

t.test(g1,g2)
t.test(g1,g2,paired=TRUE)

# To accomplish the same thing in a regression framework, we need to
# put our dependent measures in one vector:

combined=c(g1,g2)

# and define a group covariate:

group=c(1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0)

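# That gets us the unpaired test, as we can see here. (lm() fits the
# equal-variance Student version; with equal group sizes its t value
# matches the Welch test above, but the degrees of freedom are 18
# rather than 17.918.)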

summary(lm(combined ~ group))

# The paired test differs in that we're also modeling variance due to
# a "subject" factor, which we can define with this set of variables:

s0=c(1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0)
s1=c(0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0)
s2=c(0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0)
s3=c(0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0)
s4=c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0)
s5=c(0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0)
s6=c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0)
s7=c(0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0)
s8=c(0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0)
s9=c(0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1)

# Now we can do the regression.  We can either include an intercept
# and omit one of the subjects, or omit the intercept and include all
# the subjects.

summary(fm<-lm(combined ~ 0+s0+s1+s2+s3+s4+s5+s6+s7+s8+s9+group))
summary(fm<-lm(combined ~ 1+s0+s1+s2+s3+s4+s5+s6+s7+s8+group))

# Inspecting the output, you can see that the group coefficient in both
# models reproduces the paired t-test: estimate -15.5, t = -0.998 on 9
# degrees of freedom, p = 0.3443.
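Typing the ten subject vectors works for a small demo but becomes tedious as subjects are added. A minimal sketch, assuming the same data as the script, builds the same 0/1 indicators by stacking two identity matrices. The names `S`, `fit_all`, `subj_coef`, `implied`, and `X_bad` are illustrative. The sketch also checks that an intercept plus all ten subject columns is rank-deficient, which is why one subject must be dropped when an intercept is included. It further checks that in the no-intercept model each subject coefficient equals that subject's mean minus half the group estimate. For s0 this is (90 + 12)/2 + 15.5/2 = 58.75, as in the session output.

```r
g1 <- c(90, 18, 24, 79, 8, 27, 41, 29, 34, 60)
g2 <- c(12, 93, 47, 62, 84, 76, 19, 34, 87, 51)
n  <- length(g1)

combined <- c(g1, g2)
group    <- rep(c(1, 0), each = n)

# Stacking two identity matrices puts a 1 in rows i and i + n of column i,
# the same pattern as s0..s9 in the script above.
S <- rbind(diag(n), diag(n))
colnames(S) <- paste0("s", 0:(n - 1))

# No-intercept form: all subject columns plus the group covariate.
fit_all <- lm(combined ~ 0 + S + group)
beta    <- coef(fit_all)[["group"]]

# Each subject coefficient is that subject's mean minus half the group effect.
subj_coef <- unname(coef(fit_all)[seq_len(n)])
implied   <- (g1 + g2) / 2 - beta / 2
print(cbind(subj_coef, implied))

# An intercept plus all subject columns is rank-deficient, because the
# subject columns sum to the intercept column.
X_bad <- cbind(1, S, group)
print(c(rank = qr(X_bad)$rank, columns = ncol(X_bad)))
```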
