## t-Tests in R

All three types of \(t\)-tests can be performed using the same `t.test` function in R. The primary arguments are the following:

- `x` and (optionally) `y`, *or* a formula, e.g. `y ~ x`. These specify the interval-level outcome variable `y` and the two-level factor variable `x`. The formula syntax can be used for the independent samples \(t\)-test. If a formula is specified, the `data` argument can also be supplied so that it is not necessary to reference the data frame using `df$x` and `df$y` notation.
- `alternative`, which specifies whether a two-tailed test (the default) or a one-sided test will be used.
- `mu`, the null-hypothesized value of the mean (one-sample test) or of the difference between means. This can be set for the one-sample test (see the example below) but is usually left at its default value of `0` for differences in means (paired or independent).
- `paired`, which specifies, when two means are compared, whether the observations are paired or independent. The default is `paired = FALSE`, i.e. the independent samples \(t\)-test.
- `var.equal`, which is set for independent samples \(t\)-tests to determine whether an adjustment should be made for unequal variances between the groups. It defaults to `FALSE`, meaning equal variances are not assumed.
- `conf.level`, the confidence level. By default this is `0.95`, corresponding to \(\alpha = 0.05\).
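Putting these arguments together, the three test types differ only in how `t.test` is called. A minimal sketch, using a small simulated data set in place of the tutorial's files (the `.sav` data aren't reproduced here):

```r
# Simulated stand-ins for the tutorial data (illustrative only)
set.seed(123)
dat <- data.frame(
  iq     = rnorm(200, mean = 105, sd = 15),
  gender = factor(rep(c("Male", "Female"), each = 100))
)
pre  <- rnorm(100, 100, 15)
post <- pre + rnorm(100, 2, 10)

t.test(dat$iq, mu = 100)                           # one-sample
t.test(iq ~ gender, data = dat, var.equal = TRUE)  # independent samples (pooled)
t.test(post, pre, paired = TRUE)                   # paired samples
```

Each call returns the same kind of result object, so the downstream workflow shown below works identically for all three.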

The data used in this tutorial can be downloaded from this GitHub repository. The one-sample and independent samples examples will use the `iq_long.sav` data, and the paired samples example will use `iq_wide.sav`. These are SPSS files and can be read in using the haven package. Assuming the data are saved in a local folder `data` inside the current working directory, the following syntax can be run:

```
library(tidyverse)
library(haven)
library(knitr)
library(broom)
iq_long <- read_sav("data/iq_long.sav") %>%
  mutate(gender = as_factor(gender))
iq_wide <- read_sav("data/iq_wide.sav")
```

Notice that we load four packages that will be used in this tutorial: `tidyverse`, `haven`, `knitr`, and `broom`. Also note that `read_sav` automatically treats the `gender` variable as numeric. The `mutate` call makes sure R knows this is a factor (categorical) variable.

## One Sample \(t\)-Test

Say we have data from 200 subjects who have taken an IQ test. We know in the general population the mean IQ is 100. We want to test the hypothesis that our sample comes from a different population, e.g. one that is more gifted than the general population. We will first look at the distribution of scores to determine if there are any outliers or if the distribution is highly skewed. Then we will test the null hypothesis that our sample comes from a population where \(\mu = 100\) against the alternative that \(\mu \neq 100\).

First, let’s use `ggplot` to look at our data using a histogram.

```
iq_long %>%
  ggplot(aes(x = iq)) +
  geom_histogram(color = "black", fill = "firebrick")
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

We’ll ignore the message that `ggplot` is defaulting to 30 bins for the histogram, since this is a reasonable choice for our data. The observations look like they may be centered above 100, and the distribution looks roughly symmetric. What are the mean and standard deviation?

```
iq_long %>%
  summarise(Mean = mean(iq),
            SD = sd(iq)) %>%
  kable(align = c("c", "c"))
```

Mean | SD |
---|---|
105.0351 | 15.78354 |

The call to `kable` produces a nicely formatted table in our output file.

Is this mean significantly different from 100? We can run a one-sample \(t\)-test to determine the answer. The syntax is the following:

`t.test(iq_long$iq, mu = 100)`

In this call, `iq_long` is our sample data frame, and `iq` is the variable we are testing. Setting `mu = 100` means that we are testing our sample against a population mean of 100.

Leaving the other arguments as their defaults results in a two-sided test and confidence level of 0.95. If we run the code, we get the following output:

```
##
## One Sample t-test
##
## data: iq_long$iq
## t = 4.5115, df = 199, p-value = 1.099e-05
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
## 102.8343 107.2359
## sample estimates:
## mean of x
## 105.0351
```

The output is a little ugly; we can get nicer output by piping the result into the `tidy` function from the `broom` package.

```
t.test(iq_long$iq, mu = 100) %>%
  tidy() %>%
  kable()
```

estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|
105.0351 | 4.511477 | 1.1e-05 | 199 | 102.8343 | 107.2359 | One Sample t-test | two.sided |

The `estimate` column gives us our sample mean. The `statistic` column tells us that our \(t\)-statistic is equal to 4.511. When compared to a \(t\)-distribution with 199 degrees of freedom (from the `parameter` column), we get a \(p\)-value that is less than .001. We also get a 95% confidence interval around our sample mean of [102.83, 107.24]. Since the \(p\)-value that we found is less than 0.05, and the 95% confidence interval does not include 100, we reject the null hypothesis.
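As a check on the arithmetic, the \(t\)-statistic and confidence interval can be reproduced by hand from the summary statistics reported above, using \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\):

```r
# Recompute the one-sample t-test from the reported mean, SD, and n
xbar <- 105.0351; s <- 15.78354; n <- 200; mu0 <- 100
se     <- s / sqrt(n)
t_stat <- (xbar - mu0) / se                             # 4.511, matching t.test
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)              # about 1.1e-05
ci     <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se  # [102.83, 107.24]
```

This is exactly what `t.test` computes internally for the one-sample case; the only inputs it needs are the sample mean, standard deviation, and size.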

If we wanted to test the null hypothesis that our sample comes from a population with a mean of 103, we would run another \(t\)-test with a different `mu` argument.

```
t.test(iq_long$iq, mu = 103) %>%
  broom::tidy() %>%
  knitr::kable()
```

estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|
105.0351 | 1.823461 | 0.0697339 | 199 | 102.8343 | 107.2359 | One Sample t-test | two.sided |

This gives us a \(p\)-value of 0.0697, so we would not reject the null hypothesis.
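If the research question were directional, the `alternative` argument described earlier would come into play. A sketch on simulated scores (the real data set isn't reproduced here, so the numbers are illustrative only):

```r
# One-sided test: H1 is that the population mean exceeds 103 (illustrative)
set.seed(42)
iq_sim <- rnorm(200, mean = 105, sd = 15)
t.test(iq_sim, mu = 103, alternative = "greater")
```

With `alternative = "greater"`, all of \(\alpha\) goes into the upper tail, and the reported confidence interval becomes one-sided.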

## Independent Samples \(t\)-Test

Say we wanted to test whether there is a significant difference in the IQs of males and females in our sample of 200 subjects. We’ll start out by visualizing the differences between groups using boxplots. This will give us an initial sense of whether differences exist and allow us to look for major outliers, skew in the distributions, or dramatically unequal variability between the two groups.

```
iq_long %>%
  ggplot(aes(x = gender, y = iq)) +
  geom_boxplot(color = "black", fill = "firebrick") +
  labs(x = "Gender", y = "IQ")
```

The female median looks higher than the male median. The distributions are similar (consistent with the equal variance assumption) and roughly symmetric. We can get the specific descriptive statistics for each gender as follows:

```
iq_long %>%
  group_by(gender) %>%
  summarise(Mean = mean(iq),
            SD = sd(iq)) %>%
  kable(align = c("c", "c"))
```

```
## `summarise()` ungrouping output (override with `.groups` argument)
```

gender | Mean | SD |
---|---|---|
Male | 103.5565 | 13.88069 |
Female | 106.4000 | 17.31133 |

The mean IQ for males is 104 (*SD* = 13.9), and the mean IQ for females is 106 (*SD* = 17.3). Are these differences statistically significant? The following syntax performs the independent samples \(t\)-test. Note that, for the independent samples \(t\)-test, we can use the formula syntax.

`t.test(iq ~ gender, data = iq_long, var.equal = TRUE)`

In this syntax, `iq` is the interval-level outcome variable, and `gender` is the two-level factor variable. By default, R will conduct a two-sided test at the 95% confidence level. Also by default, R will run the version of the \(t\)-test that adjusts for unequal variances. We saw that our two groups had similar variances, so we set the `var.equal` argument to `TRUE`. We could leave it at the default of `FALSE` if we wanted to be more conservative.
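For comparison, omitting `var.equal = TRUE` gives the Welch version of the test, which uses a fractional degrees-of-freedom correction instead of the pooled-variance \(n_1 + n_2 - 2\). A sketch on simulated two-group data (a stand-in for `iq_long`, so the numbers are illustrative):

```r
# Welch (default) vs. pooled-variance t-test on simulated two-group data
set.seed(1)
sim <- data.frame(
  iq     = rnorm(200, mean = 105, sd = 15),
  gender = factor(rep(c("Male", "Female"), each = 100))
)
welch  <- t.test(iq ~ gender, data = sim)                   # var.equal = FALSE
pooled <- t.test(iq ~ gender, data = sim, var.equal = TRUE)
welch$parameter   # fractional df from the Welch-Satterthwaite approximation
pooled$parameter  # exactly n1 + n2 - 2 = 198
```

When the group variances really are equal, the two versions give very similar results; when they differ, the Welch test keeps the Type I error rate closer to \(\alpha\).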

We’ll again run the syntax and pipe the results into the `tidy` function to get nicer output.

```
t.test(iq ~ gender, data = iq_long, var.equal = TRUE) %>%
  tidy() %>%
  kable()
```

estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|---|---|
-2.843542 | 103.5565 | 106.4 | -1.274893 | 0.203841 | 198 | -7.24196 | 1.554876 | Two Sample t-test | two.sided |

The `estimate1` and `estimate2` columns give the means for each group, and `estimate` gives their difference. `statistic` is the value of the \(t\)-statistic. When evaluated against a \(t\)-distribution with 198 degrees of freedom (listed in the `parameter` column), we get a \(p\)-value of .204. This is greater than 0.05, so we fail to reject the null hypothesis of no difference. We also see that the 95% confidence interval around the mean difference is [-7.242, 1.555]. Because this interval includes zero (equivalent to \(p > 0.05\)), we do not reject the null.

## Paired Samples \(t\)-Test

Finally, say we have IQ data collected on 100 individuals at two points in time. We want to know if an intervention that occurs between the two measurements (say, forming a study group) increases IQ scores. The null hypothesis is that the mean change (\(IQ_{t2} - IQ_{t1}\)) is zero.

To conduct a dependent (or paired) samples \(t\)-test in R, the data must be in wide format. That is, the \(t_1\) measures are in one column, the \(t_2\) measures are in another, and each row represents one subject.

```
head(iq_wide) %>%
  kable(align = rep("c", 3))
```

id | Time_1 | Time_2 |
---|---|---|
1 | 93.89 | 83.71 |
2 | 131.22 | 116.23 |
3 | 102.80 | 110.93 |
4 | 107.27 | 95.90 |
5 | 89.94 | 101.37 |
6 | 104.17 | 99.24 |

First, we’ll visualize the differences between the two time points. However, this requires the data to be in long format (\(t_1\) stacked on \(t_2\) in a single column). We can use the `gather` function from the `tidyr` package.

```
iq_for_graph <- iq_wide %>%
  gather(Time, IQ, Time_1:Time_2)
```

The first argument names the new column that will contain the original variable names (`Time_1` and `Time_2`). The second argument provides the name of the column that will contain the values. The remaining arguments name the variables from the wide-format data that will be stacked. The output looks like this:

```
head(iq_for_graph) %>%
  kable(align = rep("c", 3))
```

id | Time | IQ |
---|---|---|
1 | Time_1 | 93.89 |
2 | Time_1 | 131.22 |
3 | Time_1 | 102.80 |
4 | Time_1 | 107.27 |
5 | Time_1 | 89.94 |
6 | Time_1 | 104.17 |
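Note that `gather` has since been superseded in tidyr; `pivot_longer` is the current equivalent and produces the same long-format columns. A sketch using a few rows copied from the table above:

```r
library(tidyr)

# Same reshaping with pivot_longer, on the first rows of iq_wide shown above
iq_demo <- data.frame(id = 1:3,
                      Time_1 = c(93.89, 131.22, 102.80),
                      Time_2 = c(83.71, 116.23, 110.93))
iq_demo_long <- pivot_longer(iq_demo, cols = Time_1:Time_2,
                             names_to = "Time", values_to = "IQ")
```

In the tutorial's pipeline the equivalent call would be `iq_wide %>% pivot_longer(Time_1:Time_2, names_to = "Time", values_to = "IQ")`; the only cosmetic difference is that `pivot_longer` interleaves each subject's two rows rather than stacking all `Time_1` rows first, which does not affect the plots below.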

Now we’ll create the graph.

```
iq_for_graph %>%
  ggplot(aes(x = Time, y = IQ)) +
  geom_boxplot(color = "black", fill = "firebrick")
```

If we were going to publish this, we’d probably want to take the time to clean up the x-axis tick labels by removing the underscore. We can do this directly in `ggplot` as follows:

```
iq_for_graph %>%
  ggplot(aes(x = Time, y = IQ)) +
  geom_boxplot(color = "black", fill = "firebrick") +
  scale_x_discrete(labels = c("Time 1", "Time 2"))
```

Looking at the figure, it looks like the \(t_2\) scores are a little higher than the \(t_1\) scores, though not by much. Is this difference statistically significant? We run the paired samples \(t\)-test using the wide format of the data as follows:

```
t.test(iq_wide$Time_2, iq_wide$Time_1, paired = TRUE) %>%
  tidy() %>%
  kable()
```

estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|
3.5234 | 1.55743 | 0.1225596 | 99 | -0.9655275 | 8.012327 | Paired t-test | two.sided |

The first argument is the column containing the \(t_2\) measures, and the second argument is the column containing the \(t_1\) measures. Note that we can’t use the formula syntax here because R needs to know which \(t_1\) observations go with which \(t_2\) observations, and it can only do so if the data are in wide format. This means we have to use the syntax in which the data frame `iq_wide` is prepended to the variable name with the `$` operator.

We see that the difference in means is 3.52, which results in a \(t\)-statistic equal to 1.56. Evaluating this against a \(t\)-distribution with 99 degrees of freedom, we get a (two-sided) \(p\)-value of .123, not enough to be statistically significant. The 95% confidence interval around the estimated mean difference is [-0.966, 8.012]. Since this interval includes zero, and because \(p > 0.05\), we do not reject the null hypothesis.

Note that the paired samples \(t\)-test is equivalent to creating a difference score, \(D = IQ_{t2} - IQ_{t1}\), and then testing if the mean difference score is significantly different from zero in a one-sample \(t\)-test. To see this, first create the difference score:

```
iq_wide <- iq_wide %>%
  mutate(Difference = Time_2 - Time_1)
```

Now perform the one-sample \(t\)-test with the null hypothesis set to be \(\mu_D = 0\).

```
t.test(iq_wide$Difference, mu = 0) %>%
  tidy() %>%
  kable()
```

estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|
3.5234 | 1.55743 | 0.1225596 | 99 | -0.9655275 | 8.012327 | One Sample t-test | two.sided |

Other than the `method` column in the output table, the results are identical to the prior table from the paired samples \(t\)-test.
