Paired t-test demo
From VoxBoWiki
How can we code a paired t-test in the GLM framework? Below you'll find a short answer, a session in the statistical package R that demonstrates the point, and lastly, a copy of the R script. R is freely available for Linux, OS X, and Windows.
Summary
To code a paired t-test in a GLM/regression, include each observation as a separate data point. Include a group covariate (0/1) for your effect, and include a subject covariate for each subject (1 for that subject's two data points, 0 elsewhere). If you include an intercept, omit one of the subject covariates from the model; otherwise the subject covariates sum to the intercept column and the design is rank deficient.
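One way to see why this design reproduces the paired test is to write the model out. The notation below ($y_{ik}$ for subject $i$ under condition $k$, $\mu_i$ for the subject effect, $\beta$ for the group effect, $d_i$ for the within-subject difference, $s_d$ for the standard deviation of the differences) is introduced here for illustration and is not part of the original page. It assumes a balanced design with exactly two observations per subject.

```latex
\begin{align*}
y_{ik} &= \mu_i + \beta\, g_k + \varepsilon_{ik},
  \qquad i = 1,\dots,n,\quad k \in \{0,1\},\quad g_k = k \\
d_i &= y_{i1} - y_{i0} = \beta + (\varepsilon_{i1} - \varepsilon_{i0}) \\
\hat\beta &= \bar d, \qquad
\operatorname{se}(\hat\beta) = \frac{s_d}{\sqrt{n}}, \qquad
\text{residual df} = 2n - (n + 1) = n - 1
\end{align*}
```

The subject terms cancel in the difference, so the group coefficient is the mean paired difference and its t statistic is the paired t statistic. With n = 10 subjects this leaves 9 residual degrees of freedom.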
An interactive session in R
> # How do we carry out a paired t-test in a GLM or regression
> # framework? Let's create two measures from each of 10 imaginary
> # subjects:
>
> g1=c(90,18,24,79,08,27,41,29,34,60)
> g2=c(12,93,47,62,84,76,19,34,87,51)
>
> # We can get the regular paired and unpaired t-tests this way:
>
> t.test(g1,g2)
Welch Two Sample t-test
data: g1 and g2
t = -1.2452, df = 17.918, p-value = 0.2291
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-41.65964 10.65964
sample estimates:
mean of x mean of y
41.0 56.5
> t.test(g1,g2,paired=TRUE)
Paired t-test
data: g1 and g2
t = -0.9982, df = 9, p-value = 0.3443
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-50.62662 19.62662
sample estimates:
mean of the differences
-15.5
>
> # To accomplish the same thing in a regression framework, we need to
> # put our dependent measures in one vector:
>
> combined=c(g1,g2)
>
> # and define a group covariate:
>
> group=c(1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0)
>
> # That gets us the unpaired test (the equal-variance version; with
> # equal group sizes its t value matches the Welch test above):
>
> summary(lm(combined ~ group))
Call:
lm(formula = combined ~ group)
Residuals:
Min 1Q Median 3Q Max
-44.50 -18.38 -6.25 21.50 49.00
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 56.500 8.802 6.419 4.83e-06 ***
group -15.500 12.447 -1.245 0.229
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 27.83 on 18 degrees of freedom
Multiple R-Squared: 0.07931, Adjusted R-squared: 0.02816
F-statistic: 1.551 on 1 and 18 DF, p-value: 0.229
>
> # The paired test differs in that we're also modeling variance due to
> # a "subject" factor, which we can define with this set of variables:
>
> s0=c(1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0)
> s1=c(0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0)
> s2=c(0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0)
> s3=c(0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0)
> s4=c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0)
> s5=c(0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0)
> s6=c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0)
> s7=c(0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0)
> s8=c(0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0)
> s9=c(0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1)
>
> # Now we can do the regression. We can either include an intercept
> # and omit one of the subjects, or omit the intercept and include all
> # the subjects.
>
> summary(fm<-lm(combined ~ 0+s0+s1+s2+s3+s4+s5+s6+s7+s8+s9+group))
Call:
lm(formula = combined ~ 0 + s0 + s1 + s2 + s3 + s4 + s5 + s6 +
s7 + s8 + s9 + group)
Residuals:
Min 1Q Median 3Q Max
-4.675e+01 -1.725e+01 2.220e-16 1.725e+01 4.675e+01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
s0 58.75 25.75 2.282 0.0484 *
s1 63.25 25.75 2.456 0.0364 *
s2 43.25 25.75 1.680 0.1273
s3 78.25 25.75 3.039 0.0140 *
s4 53.75 25.75 2.087 0.0665 .
s5 59.25 25.75 2.301 0.0469 *
s6 37.75 25.75 1.466 0.1767
s7 39.25 25.75 1.524 0.1618
s8 68.25 25.75 2.650 0.0265 *
s9 63.25 25.75 2.456 0.0364 *
group -15.50 15.53 -0.998 0.3443
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 34.72 on 9 degrees of freedom
Multiple R-Squared: 0.8269, Adjusted R-squared: 0.6153
F-statistic: 3.908 on 11 and 9 DF, p-value: 0.02508
> summary(fm<-lm(combined ~ 1+s0+s1+s2+s3+s4+s5+s6+s7+s8+group))
Call:
lm(formula = combined ~ 1 + s0 + s1 + s2 + s3 + s4 + s5 + s6 +
s7 + s8 + group)
Residuals:
Min 1Q Median 3Q Max
-4.675e+01 -1.725e+01 1.332e-15 1.725e+01 4.675e+01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.325e+01 2.575e+01 2.456 0.0364 *
s0 -4.500e+00 3.472e+01 -0.130 0.8997
s1 -3.469e-15 3.472e+01 -1e-16 1.0000
s2 -2.000e+01 3.472e+01 -0.576 0.5787
s3 1.500e+01 3.472e+01 0.432 0.6759
s4 -9.500e+00 3.472e+01 -0.274 0.7906
s5 -4.000e+00 3.472e+01 -0.115 0.9108
s6 -2.550e+01 3.472e+01 -0.734 0.4814
s7 -2.400e+01 3.472e+01 -0.691 0.5069
s8 5.000e+00 3.472e+01 0.144 0.8887
group -1.550e+01 1.553e+01 -0.998 0.3443
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 34.72 on 9 degrees of freedom
Multiple R-Squared: 0.2836, Adjusted R-squared: -0.5124
F-statistic: 0.3563 on 10 and 9 DF, p-value: 0.9382
>
> # Inspecting the output, you can see that we get the same effect size
> # estimate and t value as with the paired t-test.
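The comparison made by eye above can also be checked programmatically. The following is a minimal sketch, not part of the original demo: it assumes the subject indicators can be replaced by a single factor (the name `subject` is introduced here), which R expands into indicator columns itself. With R's default treatment contrasts and an intercept, one subject level is dropped automatically, which corresponds to the "include an intercept and omit one of the subjects" option.

```r
# Same data as in the session above
g1 <- c(90, 18, 24, 79, 8, 27, 41, 29, 34, 60)
g2 <- c(12, 93, 47, 62, 84, 76, 19, 34, 87, 51)

combined <- c(g1, g2)
group    <- rep(c(1, 0), each = 10)        # 1 for g1, 0 for g2
subject  <- factor(rep(1:10, times = 2))   # hypothetical name; one level per subject

# Regression with a subject factor plus the group covariate
fit <- lm(combined ~ subject + group)

# Classical paired t-test for comparison
paired <- t.test(g1, g2, paired = TRUE)

# Pull the row for the group covariate out of the coefficient table
group_row <- coef(summary(fit))["group", ]

# Compare estimate, t value, and degrees of freedom
same_estimate <- all.equal(unname(group_row["Estimate"]), unname(paired$estimate))
same_t        <- all.equal(unname(group_row["t value"]),  unname(paired$statistic))
same_df       <- all.equal(fit$df.residual, unname(paired$parameter))

print(c(estimate = same_estimate, t = same_t, df = same_df))
```

Using a factor avoids typing the indicator vectors by hand, at the cost of hiding the design matrix that the demo is meant to make explicit.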
The R script used to produce the demo
# How do we carry out a paired t-test in a GLM or regression
# framework? Let's create two measures from each of 10 imaginary
# subjects:

g1=c(90,18,24,79,08,27,41,29,34,60)
g2=c(12,93,47,62,84,76,19,34,87,51)

# We can get the regular paired and unpaired t-tests this way:

t.test(g1,g2)
t.test(g1,g2,paired=TRUE)

# To accomplish the same thing in a regression framework, we need to
# put our dependent measures in one vector:

combined=c(g1,g2)

# and define a group covariate:

group=c(1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0)

# That gets us the unpaired test (the equal-variance version; with
# equal group sizes its t value matches the Welch test above):

summary(lm(combined ~ group))

# The paired test differs in that we're also modeling variance due to
# a "subject" factor, which we can define with this set of variables:

s0=c(1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0)
s1=c(0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0)
s2=c(0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0)
s3=c(0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0)
s4=c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0)
s5=c(0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0)
s6=c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0)
s7=c(0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0)
s8=c(0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0)
s9=c(0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1)

# Now we can do the regression. We can either include an intercept
# and omit one of the subjects, or omit the intercept and include all
# the subjects.

summary(fm<-lm(combined ~ 0+s0+s1+s2+s3+s4+s5+s6+s7+s8+s9+group))
summary(fm<-lm(combined ~ 1+s0+s1+s2+s3+s4+s5+s6+s7+s8+group))

# Inspecting the output, you can see that we get the same effect size
# estimate and t value as with the paired t-test.
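As a cross-check on the script, the paired test can also be reached by a much smaller regression: take the within-subject differences and fit an intercept-only model. This sketch is an addition, not part of the original script, and the names `d` and `fit_diff` are introduced here. It works because the paired t-test is a one-sample t-test on the differences, and the intercept of `lm(d ~ 1)` is their mean.

```r
# Same data as in the script above
g1 <- c(90, 18, 24, 79, 8, 27, 41, 29, 34, 60)
g2 <- c(12, 93, 47, 62, 84, 76, 19, 34, 87, 51)

# Within-subject differences (hypothetical name)
d <- g1 - g2

# Intercept-only regression on the differences
fit_diff <- lm(d ~ 1)
diff_row <- coef(summary(fit_diff))["(Intercept)", ]

# Reference: the classical paired t-test
paired <- t.test(g1, g2, paired = TRUE)

# The intercept row should agree with the paired test
t_matches <- all.equal(unname(diff_row["t value"]), unname(paired$statistic))
p_matches <- all.equal(unname(diff_row["Pr(>|t|)"]), paired$p.value)

print(diff_row)
print(c(t = t_matches, p = p_matches))
```

This form only applies when each subject contributes exactly one pair of observations. The full design with subject covariates is the one that extends to settings where the GLM has to be specified per observation.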
