Self-test Solutions

Note

Solutions to the self-test questions in chapters of Discovering Statistics Using IBM SPSS Statistics (6th edition).

Chapter 1

Self-test 1.1

Based on what you have read in this section, what qualities do you think a scientific theory should have?

A good theory should do the following:

  • Explain the existing data.
  • Explain a range of related observations.
  • Allow statements to be made about the state of the world.
  • Allow predictions about the future.
  • Have implications.

Self-test 1.2

What is the difference between reliability and validity?

Validity is whether an instrument measures what it was designed to measure, whereas reliability is the ability of the instrument to produce the same results under the same conditions.

Self-test 1.3

Why is randomization important?

It is important because it rules out confounding variables (factors that could influence the outcome variable other than the factor in which you’re interested). For example, with groups of people, random allocation of people to groups should mean that factors such as intelligence, age and gender are roughly equal in each group and so will not systematically affect the results of the experiment.

Self-test 1.4

Compute the mean but excluding the score of 234.

\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{22+40+53+57+93+98+103+108+116+121}{10} \\ &= \frac{811}{10} \\ &= 81.1 \end{aligned} \]

Self-test 1.5

Compute the range but excluding the score of 234.

Range = maximum score − minimum score = 121 − 22 = 99.

Self-test 1.6

Twenty-one heavy smokers were put on a treadmill at the fastest setting. The time in seconds was measured until they fell off from exhaustion: 18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57. Compute the mode, median, mean, upper and lower quartiles, range and interquartile range.

First, let’s arrange the scores in ascending order: 16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32, 34, 34, 36, 36, 42, 43, 46, 46, 49, 57.

  • The mode: The scores with frequencies in brackets are: 16 (1), 18 (2), 22 (2), 23 (2), 24 (1), 26 (1), 29 (1), 32 (1), 34 (2), 36 (2), 42 (1), 43 (1), 46 (2), 49 (1), 57 (1). Therefore, there are several modes because 18, 22, 23, 34, 36 and 46 seconds all have frequencies of 2, and 2 is the largest frequency. These data are multimodal (and the mode is, therefore, not particularly helpful to us).
  • The median: The median will be the (n + 1)/2th score. There are 21 scores, so this will be the 22/2 = 11th. The 11th score in our ordered list is 32 seconds.
  • The mean: The mean is 32.19 seconds:

\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{16+(2\times18)+(2\times22)+(2\times23)+24+26+29+32+(2\times34)+(2\times36)+42+43+(2\times46)+49+57}{21} \\ &= \frac{676}{21} \\ &= 32.19 \end{aligned} \]

  • The lower quartile: This is the median of the lower half of scores. If we split the data at 32 (not including this score), there are 10 scores below this value. The median of 10 scores is the 11/2 = 5.5th score. Therefore, we take the average of the 5th score and the 6th score. The 5th score is 22, and the 6th is 23; the lower quartile is therefore 22.5 seconds.
  • The upper quartile: This is the median of the upper half of scores. If we split the data at 32 (not including this score), there are 10 scores above this value. The median of 10 scores is the 11/2 = 5.5th score. Therefore, we take the average of the 5th and 6th scores above the median. The 5th score above the median is 42 and the 6th is 43; the upper quartile is therefore 42.5 seconds.
  • The range: This is the highest score (57) minus the lowest (16), i.e. 41 seconds.
  • The interquartile range: This is the difference between the upper and lower quartiles: 42.5 − 22.5 = 20 seconds.
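If you’d like to check these values without redoing the arithmetic, here’s a quick Python sketch (my addition, not something from the book). It computes the quartiles exactly as described above (the median of each half of the data, excluding the median itself); note that statistical software often uses a different quartile convention and can give slightly different values.

```python
import statistics

scores = sorted([18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
                 34, 34, 36, 36, 43, 42, 49, 46, 46, 57])
n = len(scores)                                    # 21

lower_half = scores[:n // 2]                       # the 10 scores below the median
upper_half = scores[(n + 1) // 2:]                 # the 10 scores above the median

print("modes:", statistics.multimode(scores))      # [18, 22, 23, 34, 36, 46]
print("median:", statistics.median(scores))        # 32
print("mean:", round(statistics.mean(scores), 2))  # 32.19
print("lower quartile:", statistics.median(lower_half))  # 22.5
print("upper quartile:", statistics.median(upper_half))  # 42.5
print("range:", max(scores) - min(scores))         # 41
print("IQR:", statistics.median(upper_half) - statistics.median(lower_half))  # 20.0
```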

Self-test 1.7

Assuming the same mean and standard deviation for the ice bucket example above, what’s the probability that someone posted a video within the first 30 days of the challenge?

As in the example, we know that the mean number of days was 39.68, with a standard deviation of 7.74. First we convert our value to a z-score: the 30 becomes (30 − 39.68)/7.74 = −1.25. We want the area below this value (because 30 is below the mean), but this value is not tabulated in the Appendix. However, because the distribution is symmetrical, we can ignore the minus sign and look up this value in the column labelled ‘Smaller Portion’ (i.e., the area above the value 1.25). You should find that the probability is 0.10565: put another way, there is a 10.57% chance that a video would be posted within the first 30 days of the challenge. By looking at the column labelled ‘Bigger Portion’ we can also see the probability that a video would be posted after the first 30 days of the challenge. This probability is 0.89435, an 89.44% chance that a video would be posted after the first 30 days of the challenge.
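You can sanity-check the table lookup in Python using the standard normal CDF written in terms of the error function (a quick sketch, not part of the book). The result differs from the tabulated 0.10565 in the fourth decimal place only because the table uses z rounded to 1.25.

```python
from math import erf, sqrt

mean, sd = 39.68, 7.74
z = (30 - mean) / sd                             # about -1.25

# Standard normal CDF: P(Z < z)
p_within_30_days = 0.5 * (1 + erf(z / sqrt(2)))  # about 0.1055
p_after_30_days = 1 - p_within_30_days           # about 0.8945

print(round(z, 2), round(p_within_30_days, 4), round(p_after_30_days, 4))
```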

Chapter 2

Self-test 2.1

In Section 1.6.2.2 we came across some data about the number of friends that 11 people had on Facebook. We calculated the mean for these data as 95 and standard deviation as 56.79. Calculate a 95% confidence interval for this mean. Recalculate the confidence interval assuming that the sample size was 56.

To calculate a 95% confidence interval for the mean, we begin by calculating the standard error:

\[ SE = \frac{s}{\sqrt{N}} = \frac{56.79}{\sqrt{11}}=17.12 \]

The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. For this we need the degrees of freedom, N – 1. With 11 data points, the degrees of freedom are 10. For a 95% confidence interval we can look up the value in the column labelled ‘Two-Tailed Test’, ‘0.05’ in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.23. The confidence interval is, therefore, given by:

\[ \begin{aligned} \text{lower boundary of confidence interval} &= \bar{X}-(2.23 \times 17.12) \\ &= 95 - (2.23 \times 17.12) \\ & = 56.82 \\ \text{upper boundary of confidence interval} &= \bar{X}+(2.23 \times 17.12) \\ &= 95 + (2.23 \times 17.12) \\ &= 133.18 \end{aligned} \]

Assuming now a sample size of 56, we need to calculate the new standard error:

\[ SE = \frac{s}{\sqrt{N}} = \frac{56.79}{\sqrt{56}}=7.59 \]

The sample is big now, so to calculate the confidence interval we can use the critical value of z for a 95% confidence interval (i.e. 1.96). The confidence interval is, therefore, given by:

\[ \begin{aligned} \text{lower boundary of confidence interval} &= \bar{X}-(1.96 \times 7.59) = 95 - (1.96 \times 7.59) = 80.1 \\ \text{upper boundary of confidence interval} &= \bar{X}+(1.96 \times 7.59) = 95 + (1.96 \times 7.59) = 109.8 \end{aligned} \]
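As a quick check (a sketch of mine, not part of the book), the same intervals can be computed with scipy. The boundaries differ slightly from the hand calculations because scipy uses exact critical values rather than the rounded 2.23 and 1.96.

```python
from scipy import stats

mean, s = 95, 56.79

# Small sample (n = 11): t-distribution with n - 1 degrees of freedom
n = 11
se = s / n ** 0.5                      # 17.12
t_crit = stats.t.ppf(0.975, df=n - 1)  # 2.228 (the table rounds this to 2.23)
print(round(mean - t_crit * se, 2),
      round(mean + t_crit * se, 2))    # 56.85, 133.15

# Larger sample (n = 56): normal approximation
n = 56
se = s / n ** 0.5                      # 7.59
z_crit = stats.norm.ppf(0.975)         # 1.96
print(round(mean - z_crit * se, 2),
      round(mean + z_crit * se, 2))    # 80.13, 109.87
```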

Self-test 2.2

What are the null and alternative hypotheses for the following questions: (1) ‘Is there a relationship between the amount of gibberish that people speak and the amount of vodka jelly they’ve eaten?’ (2) ‘Does reading this chapter improve your knowledge of research methods?’

‘Is there a relationship between the amount of gibberish that people speak and the amount of vodka jelly they’ve eaten?’

  • Null hypothesis: There will be no relationship between the amount of gibberish that people speak and the amount of vodka jelly they’ve eaten.
  • Alternative hypothesis: There will be a relationship between the amount of gibberish that people speak and the amount of vodka jelly they’ve eaten.

‘Does reading this chapter improve your knowledge of research methods?’

  • Null hypothesis: There will be no difference in the knowledge of research methods in people who have read this chapter compared to those who have not.
  • Alternative hypothesis: Knowledge of research methods will be different in those who have read the chapter compared to those who have not.

Self-test 2.3

Compare the plots in Figure 2.16. What effect does the difference in sample size have? Why do you think it has this effect?

The plot showing larger sample sizes has smaller confidence intervals than the plot showing smaller sample sizes. If you think back to how the confidence interval is computed, it is the mean plus or minus 1.96 times the standard error. The standard error is the standard deviation divided by the square root of the sample size (√N), therefore as the sample size gets larger, the standard error (and, therefore, confidence interval) will get smaller.
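A quick numeric illustration of this point (my addition, not from the book), using the standard deviation from the Facebook friends example:

```python
s = 56.79  # sample standard deviation

for n in (11, 56, 200, 1000):
    se = s / n ** 0.5
    width = 2 * 1.96 * se  # width of a 95% confidence interval
    print(f"n = {n:4d}, SE = {se:5.2f}, 95% CI width = {width:6.2f}")
```

As n increases, the standard error, and hence the confidence interval, shrinks.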

Chapter 3

Self-test 3.1

Based on what you have learnt so far, which of the following statements best reflects your view of antiSTATic? (A) The evidence is equivocal, we need more research. (B) All of the mean differences show a positive effect of antiSTATic, therefore, we have consistent evidence that antiSTATic works. (C) Four of the studies show a significant result (p < .05), but the other six do not. Therefore, the studies are inconclusive: some suggest that antiSTATic is better than placebo, but others suggest there’s no difference. The fact that more than half of the studies showed no significant effect means that antiSTATic is not (on balance) more successful in reducing anxiety than the control. (D) I want to go for C, but I have a feeling it’s a trick question.

If you follow NHST you should pick C because only four of the ten studies have a ‘significant’ result, which isn’t very compelling evidence for antiSTATic.

Self-test 3.2

Now you’ve looked at the confidence intervals, which of the earlier statements best reflects your view of Dr Weeping’s potion?

I would hope that some of you have changed your mind to option B: 10 out of 10 studies show a positive effect of antiSTATic (none of the means are below zero), and even though this positive effect is not always ‘significant’, it is consistently positive. The confidence intervals overlap substantially in all studies, suggesting that all studies have sampled the same population. Again, this implies great consistency in the studies: they all throw up (potential) population effects of a similar size. Look at how much of each confidence interval lies above zero across the 10 studies: even in studies for which the confidence interval includes zero (implying that the population effect might be zero) the majority of the bar is greater than zero. Again, this suggests very consistent evidence that the population value is greater than zero (i.e., antiSTATic works).

Self-test 3.3

Compute Cohen’s d for the effect of singing when a sample size of 100 was used (right-hand plot in Figure 2.21).

\[ \begin{aligned} \hat{d} &= \frac{\bar{X}_\text{singing}-\bar{X}_\text{conversation}}{\sigma} \\ &= \frac{10-12}{3} \\ &= -0.667 \end{aligned} \]

Self-test 3.4

Compute Cohen’s d for the effect in Figure 2.22. The exact mean of the singing group was 10, and for the conversation group was 10.01. In both groups the standard deviation was 3.

\[ \begin{aligned} \hat{d} &= \frac{\bar{X}_\text{singing}-\bar{X}_\text{conversation}}{\sigma} \\ &= \frac{10-10.01}{3} \\ &= -0.003 \end{aligned} \]
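Both of these effect sizes can be checked with a throwaway Python function (a sketch, not part of the book):

```python
def cohens_d(mean_1, mean_2, sd):
    """Cohen's d using a common standard deviation."""
    return (mean_1 - mean_2) / sd

print(round(cohens_d(10, 12, 3), 3))     # Self-test 3.3: -0.667
print(round(cohens_d(10, 10.01, 3), 3))  # Self-test 3.4: -0.003
```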

Self-test 3.5

Look at Figures 2.22 and Figure 2.23. Compare what we concluded about these three data sets based on p-values, with what we conclude using effect sizes.

Answer given in the text.

Self-test 3.6

Look back at Figure 3.2. Based on the effect sizes, is your view of the efficacy of the potion more in keeping with what we concluded based on p-values or based on confidence intervals?

Answer given in the text.

Self-test 3.7

Use Table 3.2 and Bayes’ theorem to calculate p(human |match).

Answer given in the text.

Self-test 3.8

What are the problems with NHST?

Answer given in the text.

Chapter 4

Self-test 4.1

Now try creating the variable death_date using what you have learned.

First, move back to the variable view using the tab at the bottom of the data editor. Move to the cell in row 3 of the column labelled Name (under the previous variable you created) and type ‘death_date’. Move into the column labelled Type using the → key on the keyboard. The cell you have moved into will show the default type of Numeric; to change this, click the cell and then the … button to activate the Variable Type dialog box, and select Date. On the right of the dialog box is a list of date formats, from which you can choose your preference; being British, I am used to the day coming before the month and have chosen dd-mmm-yyyy (i.e., 26-Dec-1963), but Americans, for example, more often put the month before the day so might select mm/dd/yyyy (12/26/1963). When you have selected a date format, click OK to return to the variable view. Finally, move to the cell in the column labelled Label and type ‘Date of death’.

Once the variable has been created, return to the data view by clicking the ‘Data View’ tab. The third column now has the label death_date; click the white cell in the row for Cliff Burton and type the value, 27-09-1986. To register this value in the cell, move down to the next cell by pressing the ↓ key. Note that SPSS automatically changes the 09 to ‘Sep’.

Self-test 4.2

Using what you have learned try creating the variable current_member using codes and labels of 1 = True and 0 = False.

To create the variable current_member, go to the variable view and move to the first empty cell under the column labelled Name (this should be the cell under instrument). Type current_member into this empty cell. Move along the row to the column called Label and give the variable a full description such as Current Member of the Band. To define the group codes, move along the row to the column labelled Values. The cell will show the default of None; click the … button to access the Value Labels dialog box (see Figure 4.10 in the book).

To add a category and attach a value and label, click the + button. A new row will be created in the table in the dialog box, with an empty cell under the column labelled Value and another under the column labelled Label. We want to create two categories (True and False), so click + twice to create the necessary rows. Click on the first empty cell in the Value column and type a code (in this case 0), then press ↓ to move to the cell underneath and type the next code (in this case 1). Having set the numeric values for the categories, we assign each category a label by filling in the cells of the Label column. Click on the first empty cell in this column and type a descriptive label (in this case ‘False’), then press ↓ to move to the cell underneath and type the next label (‘True’). The completed dialog box should look like this:

Dialog box for the value labels of current_member

Click OK to return to the variable view. Now set the level of measurement for the variable to nominal by going to the column labelled Measure and selecting Nominal from the drop-down list.

Self-test 4.3

Why is the songs_written variable a ‘scale’ variable?

It is a scale variable because the numbers represent consistent intervals and ratios along the measurement scale: the difference between having written (for example) 1 and 2 songs is the same as the difference between having written (for example) 10 and 11 songs, and a band member who has written (for example) 20 songs has written twice as many as a band member who has written only 10 songs.

Self-test 4.4

Having created the first four variables with a bit of guidance, try to enter the rest of the variables in Table 3.1 yourself.

The finished data and variable views should look like those in the figures below (more or less!). You can also download the data file (metallica.sav).

Data view of metallica.sav

Variable view of metallica.sav

Chapter 5

Self-test 5.1

What does a histogram show?

A histogram plots the values of observations on the horizontal x-axis, and the frequency with which each value occurs in the data set on the vertical y-axis.

Self-test 5.2

Produce a histogram and population pyramid for the success scores before the intervention.

First, access the Chart Builder and select Histogram in the list labelled Choose from: to bring up the gallery. This gallery has four icons representing different types of histogram, and you should select the appropriate one either by double-clicking on it or by dragging it onto the canvas. We are going to do a simple histogram first, so double-click the icon for a simple histogram. The dialog box will show a preview of the plot in the canvas area. Next, drag the variable success_pre to the X-Axis? drop zone. You will now find the histogram previewed on the canvas. To produce the histogram click OK.

The resulting histogram is shown below. Looking at the histogram, the data look fairly symmetrical and there doesn’t seem to be any sign of skew.

Histogram of success before intervention

To compare frequency distributions of several groups simultaneously we can use a population pyramid. Click the population pyramid icon (see the book chapter) to display the template for this plot on the canvas. Then from the variable list select the variable representing the success scores before the intervention and drag it into the Distribution Variable? drop zone. Then drag the variable strategy to the Split Variable? drop zone. Click OK to produce the plot.

The resulting population pyramid is shown below and looks fairly symmetrical. This indicates that both groups had a similar spread of scores before the intervention. Hopefully, this example shows how a population pyramid can be a very good way to visualise differences in distributions in different groups (or populations).

Population pyramid of success pre-intervention

Self-test 5.3

Produce boxplots for the success scores before the intervention.

To make a boxplot of the pre-intervention success scores for our two groups, double-click the simple boxplot icon, then from the variable list select the success_pre variable and drag it into the Y-Axis? drop zone, and select the variable strategy and drag it to the X-Axis? drop zone. Note that the variable names are displayed in the drop zones, and the canvas now displays a preview of our plot (e.g., there are two boxplots, one for each group). Click OK to produce the plot.

Boxplot of success before each of the two interventions

Looking at the resulting boxplots above, notice that there is a tinted box, which represents the IQR (i.e., the middle 50% of scores). It’s clear that the middle 50% of scores are more or less the same for both groups. Within the boxes, there is a thick horizontal line, which shows the median. The workers had a very slightly higher median than the wishers, indicating marginally greater pre-intervention success.

In terms of the success scores, we can see that the range of scores was very similar for both the workers and the wishers, but the workers’ scores reached slightly higher levels of success than the wishers’. Like histograms, boxplots also tell us whether the distribution is symmetrical or skewed. If the whiskers are the same length then the distribution is symmetrical (the range of the top and bottom 25% of scores is the same); however, if the top or bottom whisker is much longer than the opposite whisker then the distribution is asymmetrical (the range of the top and bottom 25% of scores is different). The scores from both groups look symmetrical because the two whiskers are of similar lengths in both groups.

Self-test 5.4

Use what you learnt in Section 5.6.3 to add error bars to this plot and to label both the x- (I suggest ‘Time’) and y-axis (I suggest ‘Mean grammar score (%)’).

See Figure 5.26 in the book.

Self-test 5.5

The procedure for producing line charts is basically the same as for bar charts. Follow the previous sections for bar charts but selecting a simple line chart instead of a simple bar chart, and a multiple line chart instead of a clustered bar chart. Produce line charts equivalents of each of the bar charts in the previous section. If you get stuck, the self-test answers on the companion website will walk you through it.

Simple Line Charts for Independent Means

Let’s use the data in notebook.sav (see book for details). Load this file now. Let’s plot the mean rating of the two films. We have one grouping variable (the film) and one outcome (the arousal); therefore, we want a simple line chart. In the Chart Builder double-click the icon for a simple line chart. On the canvas you will see a plot and two drop zones: one for the y-axis and one for the x-axis. The y-axis needs to be the dependent variable, or the thing you’ve measured, or more simply the thing for which you want to display the mean. In this case it would be arousal, so select arousal from the variable list and drag it into the Y-Axis? drop zone. The x-axis should be the variable by which we want to split the arousal data. To plot the means for the two films, select the variable film from the variable list and drag it into the X-Axis? drop zone.

Dialog boxes for a simple line chart with error bars

The figure above shows some other options for the line chart. We can add error bars to our line chart by selecting Display error bars in the Element Properties dialog box. Normally, error bars show the 95% confidence interval, and I have selected this option (Confidence intervals, Level (%): 95). Click Apply, then OK to produce the plot.

Line chart of the mean arousal for each of the two films

The resulting line chart displays the means (and the confidence intervals of those means). This plot shows us that, on average, people were more aroused by The Notebook than by a documentary about notebooks.

Multiple line charts for independent means

To do a multiple line chart for means that are independent (i.e., have come from different groups) we need to double-click the multiple line chart icon in the Chart Builder (see the book chapter). On the canvas you will see a plot as with the simple line chart, but there is now an extra drop zone: Set color. All we need to do is to drag our second grouping variable into this drop zone.

As with the previous example, drag arousal into the Y-Axis? drop zone, then drag film into the X-Axis? drop zone. Now drag gender_identity into the Set color drop zone. This will mean that lines representing those identifying as male and female will be displayed in different colours. As in the previous section, select error bars in the Element Properties dialog box and click Apply to apply them, then click OK to produce the plot.

Dialog boxes for a multiple line chart with error bars

Line chart of the mean arousal for each of the two films.

The mean arousal for The Notebook shows that males were more aroused during this film than females, indicating that they enjoyed the film more than the females did. Contrast this with the documentary, for which arousal levels are comparable in males and females.

Multiple line charts for mixed designs

To do the line plot equivalent of the bar chart we did for the social_media.sav data (see book for details) we follow the same procedure that we used to produce a bar chart of these described in the book, except that we begin the whole process by selecting a multiple line chart in the Chart Builder. Once this selection is made, everything else is the same as in the book.

Completed dialog box for an error bar plot of a mixed design

The resulting line chart shows that at baseline (before the intervention) the grammar scores were comparable in our two groups; however, after the intervention, the grammar scores were lower in those encouraged to use social media than in those banned from using it. If you compare the lines you can see that social media users’ grammar scores have fallen over the six months; compare this to the controls, whose grammar scores are similar over time. We might, therefore, conclude that social media use has a detrimental effect on people’s understanding of English grammar.

Error bar plot of the mean grammar score over six months in those encouraged to use social media versus those who were banned

Self-test 5.6

Doing a simple dot plot in the Chart Builder is quite similar to drawing a histogram. Reload the jiminy_cricket.sav data and see if you can produce a simple dot plot of the success scores after the intervention. Compare the resulting plot to the earlier histogram of the same data (Figure 5.11). Remember that your starting point is to double-click the icon for a simple dot plot in the Chart Builder (Figure 5.32), then use the instructions for plotting a histogram (Section 5.4) – there is guidance on the companion website.

First, make sure that you have loaded the jiminy_cricket.sav file and that you open the Chart Builder from this data file. Once you have accessed the Chart Builder (see the book chapter) select the Scatter/Dot in the chart gallery and then double-click the icon for a simple dot plot (again, see the book chapter if you’re unsure of what icon to click).

Like a histogram, a simple dot plot plots a single variable (x-axis) against the frequency of scores (y-axis). To do a simple dot plot of the success scores after the intervention, drag this variable to the X-Axis? drop zone as shown in the figure, then click OK.

Defining a simple dot plot (a.k.a. density plot) in the Chart Builder

The resulting density plot is shown below. Compare this with the histogram of the same data from the book. The first thing that should leap out at you is that they are very similar; they are two ways of showing the same thing. The density plot gives us a little more detail than the histogram, but essentially they show the same thing.

Density plot of the success scores after the intervention

Self-test 5.7

Doing a drop-line plot in the Chart Builder is quite similar to drawing a clustered bar chart. Reload the notebook.sav data and see if you can produce a drop-line plot of the arousal scores. Compare the resulting plot to the earlier clustered bar chart of the same data (Figure 5.21). The instructions in Section 5.6.2 should help.

To do a drop-line chart for means that are independent, double-click the drop-line chart icon in the Chart Builder (see the book chapter if you’re not sure what this icon looks like or how to access the Chart Builder). As with the clustered bar chart example from the book, drag arousal from the variable list into the Y-Axis? drop zone, drag film into the X-Axis? drop zone, and drag gender_identity into the Set color drop zone.

This will mean that the dots representing those identifying as male and female will be displayed in different colours, but if you want them displayed as different symbols then read SPSS Tip 5.3 in the book. The completed dialog box is shown in the figure; click OK to produce the plot.

Using the Chart Builder to plot a drop-line plot

The resulting drop-line plot is shown below: compare it with the clustered bar chart from the book. Hopefully it’s clear that these plots show the same information and can be interpreted in the same way (see the book).

Drop-line plot of mean arousal scores during two films for those identifying as male and female and the original clustered bar chart from the book

Now see if you can produce a drop-line plot of the social_media.sav data from earlier in this chapter. Compare the resulting plot to the earlier clustered bar chart of the same data (Figure 5.30). The instructions in Section 5.6.5 should help.

Double-click the drop-line chart icon in the Chart Builder (see the book chapter if you’re not sure what this icon looks like or how to access the Chart Builder). The repeated-measures variable is time (whether grammatical ability was measured at baseline or six months) and it is represented in the data file by two columns, one for the baseline data and the other for the follow-up data. In the Chart Builder select these two variables simultaneously and drag them into the X-Axis? drop zone as shown in the figure. (See the book for details of how to do this, if you need them.) The second variable (whether people were encouraged to use social media or were banned) was measured using different participants and is represented in the data file by a grouping variable (Social media use). Drag this variable from the variable list into the Set color drop zone. The completed Chart Builder is shown in the figure; click OK to produce the plot.

Completing the dialog box for a drop-line plot of a mixed design

The resulting drop-line plot is shown below. Compare this figure with the clustered bar chart of the same data from the book. They both show that at baseline (before the intervention) the grammar scores were comparable in our two groups. On the drop-line plot this is particularly apparent because the two dots merge into one (you can’t see the drop line because the means are so similar). After the intervention, grammar scores were lower in those encouraged to use social media than in those banned from using it. By comparing the two vertical lines, the drop-line plot makes clear that the difference between those encouraged to use social media and those banned is bigger at six months than it is pre-intervention.

Drop line plot of the mean grammar score over six months in people who were encouraged to use social media versus those who were banned

Chapter 6

Self-test 6.1

Compute the mean and sum of squared error for the new data set.

First we need to compute the mean:

\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{1+3+10+3+2}{5} \\ &= \frac{19}{5} \\ &= 3.8 \end{aligned} \]

Compute the squared errors as follows:

| Score | Error (score - mean) | Error squared |
|-------|----------------------|---------------|
| 1     | -2.8                 | 7.84          |
| 3     | -0.8                 | 0.64          |
| 10    | 6.2                  | 38.44         |
| 3     | -0.8                 | 0.64          |
| 2     | -1.8                 | 3.24          |

The sum of squared errors is:

\[ \begin{aligned} SS &= 7.84 + 0.64 + 38.44 + 0.64 + 3.24 \\ &= 50.8 \end{aligned} \]
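The same calculation takes a couple of lines of Python (a quick check, not part of the book):

```python
scores = [1, 3, 10, 3, 2]

mean = sum(scores) / len(scores)           # 3.8
ss = sum((x - mean) ** 2 for x in scores)  # 50.8
print(mean, round(ss, 2))
```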

Self-test 6.2

Using what you learnt in Chapter 5 plot a scatterplot of the day 2 scores against the day 1 scores.

First, access the Chart Builder and select Scatter/Dot in the list labelled Choose from:. We are going to do a simple scatterplot, so double-click the icon for a simple scatterplot. The dialog box will now show a preview of the plot in the canvas area. Drag the hygiene day 1 variable to the X-Axis? drop zone and the hygiene day 2 variable to the Y-Axis? drop zone as shown below; you will now find the plot previewed on the canvas. Click OK to produce the plot (which is reproduced in the book).

Defining a scatterplot in the Chart Builder

Self-test 6.3

Now we have removed the outlier in the data, re-plot the scatterplot and repeat the explore command from Section 6.9.2.

Repeat the instructions for self-test 6.2.

Self-test 6.4

Follow the main text to try to create a variable that is the natural log of the fear scores. Name this variable log_fear.

Follow the instructions in the book. Here’s a screencast to help:

Creating the variable log_fear

Self-test 6.5

Follow the main text to try to create a variable that is the square root of the fear scores. Name this variable sqrt_fear. Produce a matrix scatterplot of attractiveness against the raw scores and the two transformed versions of fear.

Follow the instructions in the book to create sqrt_fear. Here’s a screencast to help:

Creating the variable sqrt_fear

Then produce the scatterplot matrix as in the screen cast. Remember that you can select multiple variables by holding down the Ctrl key (Cmd on a Mac), and you can then drag them all onto the canvas simultaneously (on a Mac you need to keep Cmd held down as you drag).

Creating a matrix scatterplot

The resulting scatterplot is below. Look at the top row. These plots show the mean attractiveness ratings on the vertical y-axis, against different versions of the fear variable. Note that the raw fear scores have a curvilinear relationship with attractiveness ratings because the pattern of dots has a noticeable bend (first scatterplot in the top row). However, for log fear scores (second scatterplot in the top row) and square root fear scores (final scatterplot in the top row) the relationship is linear (the pattern of dots follows a straight line). These patterns suggest that the log and square root transformations have improved the linearity of the relationship between fear and attractiveness ratings.

A matrix scatterplot of attractiveness ratings against raw and transformed fear scores

Self-test 6.6

Compute the mean and variance of the attractiveness ratings. Now compute them for the 5%, 10% and 20% trimmed data.

Mean and variance

Compute the squared errors as follows:

| Score | Error (score - mean) | Error squared |
|-------|----------------------|---------------|
| 0     | -6                   | 36            |
| 0     | -6                   | 36            |
| 3     | -3                   | 9             |
| 4     | -2                   | 4             |
| 4     | -2                   | 4             |
| 5     | -1                   | 1             |
| 5     | -1                   | 1             |
| 6     | 0                    | 0             |
| 6     | 0                    | 0             |
| 6     | 0                    | 0             |
| 6     | 0                    | 0             |
| 7     | 1                    | 1             |
| 7     | 1                    | 1             |
| 7     | 1                    | 1             |
| 8     | 2                    | 4             |
| 8     | 2                    | 4             |
| 9     | 3                    | 9             |
| 9     | 3                    | 9             |
| 10    | 4                    | 16            |
| 10    | 4                    | 16            |
| Sum = 120 |                  | Sum = 152     |

To calculate the mean of the attractiveness ratings we use the equation (and the sum of the first column in the table):

\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{120}{20} \\ &= 6 \end{aligned} \]

To calculate the variance we use the sum of squares (the sum of the values in the final column of the table) and this equation:

\[ \begin{aligned} s^2 &= \frac{\text{sum of squares}}{n-1} \\ &= \frac{152}{19} \\ &= 8 \end{aligned} \]

5% trimmed mean and variance

Next, let’s calculate the mean and variance for the 5% trimmed data. We basically do the same thing as before but delete 1 score at each extreme (there are 20 scores and 5% of 20 is 1).

Compute the squared errors as follows:

| Score | Error (score - mean) | Error squared |
|-------|----------------------|---------------|
| 0     | -6.11                | 37.33         |
| 3     | -3.11                | 9.67          |
| 4     | -2.11                | 4.45          |
| 4     | -2.11                | 4.45          |
| 5     | -1.11                | 1.23          |
| 5     | -1.11                | 1.23          |
| 6     | -0.11                | 0.01          |
| 6     | -0.11                | 0.01          |
| 6     | -0.11                | 0.01          |
| 6     | -0.11                | 0.01          |
| 7     | 0.89                 | 0.79          |
| 7     | 0.89                 | 0.79          |
| 7     | 0.89                 | 0.79          |
| 8     | 1.89                 | 3.57          |
| 8     | 1.89                 | 3.57          |
| 9     | 2.89                 | 8.35          |
| 9     | 2.89                 | 8.35          |
| 10    | 3.89                 | 15.13         |
| Sum = 110 |                  | Sum = 99.74   |

To calculate the mean of the attractiveness ratings we use the equation (and the sum of the first column in the table):

\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{110}{18} \\ &= 6.11 \end{aligned} \]

To calculate the variance we use the sum of squares (the sum of the values in the final column of the table) and this equation:

\[ \begin{aligned} s^2 &= \frac{\text{sum of squares}}{n-1} \\ &= \frac{99.74}{17} \\ &= 5.87 \end{aligned} \]

10% trimmed mean and variance

Next, let’s calculate the mean and variance for the 10% trimmed data. To do this we need to delete 2 scores from each extreme of the original data set (there are 20 scores and 10% of 20 is 2).

Compute the squared errors as follows:

| Score | Error (score - mean) | Error squared |
|-------|----------------------|---------------|
| 3     | -3.25                | 10.56         |
| 4     | -2.25                | 5.06          |
| 4     | -2.25                | 5.06          |
| 5     | -1.25                | 1.56          |
| 5     | -1.25                | 1.56          |
| 6     | -0.25                | 0.06          |
| 6     | -0.25                | 0.06          |
| 6     | -0.25                | 0.06          |
| 6     | -0.25                | 0.06          |
| 7     | 0.75                 | 0.56          |
| 7     | 0.75                 | 0.56          |
| 7     | 0.75                 | 0.56          |
| 8     | 1.75                 | 3.06          |
| 8     | 1.75                 | 3.06          |
| 9     | 2.75                 | 7.56          |
| 9     | 2.75                 | 7.56          |
| Sum = 100 |                  | Sum = 46.96   |

To calculate the mean of the attractiveness ratings we use the equation (and the sum of the first column in the table):

\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{100}{16} \\ &= 6.25 \end{aligned} \]

To calculate the variance we use the sum of squares (the sum of the values in the final column of the table) and this equation:

\[ \begin{aligned} s^2 &= \frac{\text{sum of squares}}{n-1} \\ &= \frac{46.96}{15} \\ &= 3.13 \end{aligned} \]

20% trimmed mean and variance

Finally, let’s calculate the mean and variance for the 20% trimmed data. To do this we need to delete 4 scores from each extreme of the original data set (there are 20 scores and 20% of 20 is 4).

Compute the squared errors as follows:

| Score | Error (score - mean) | Error squared |
|-------|----------------------|---------------|
| 4     | -2.25                | 5.06          |
| 5     | -1.25                | 1.56          |
| 5     | -1.25                | 1.56          |
| 6     | -0.25                | 0.06          |
| 6     | -0.25                | 0.06          |
| 6     | -0.25                | 0.06          |
| 6     | -0.25                | 0.06          |
| 7     | 0.75                 | 0.56          |
| 7     | 0.75                 | 0.56          |
| 7     | 0.75                 | 0.56          |
| 8     | 1.75                 | 3.06          |
| 8     | 1.75                 | 3.06          |
| Sum = 75 |                   | Sum = 16.22   |

To calculate the mean of the attractiveness ratings we use the equation (and the sum of the first column in the table):

\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{75}{12} \\ &= 6.25 \end{aligned} \]

To calculate the variance we use the sum of squares (the sum of the values in the final column of the table) and this equation:

\[ \begin{aligned} s^2 &= \frac{\text{sum of squares}}{n-1} \\ &= \frac{16.22}{11} \\ &= 1.47 \end{aligned} \]
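If you’d like to verify all four sets of values in one go, here’s a small Python sketch (my addition, not part of the book). It keeps full precision rather than the two-decimal-place errors used in the tables above, so the 20% trimmed variance comes out as 1.48 rather than 1.47.

```python
def trimmed_stats(scores, proportion):
    """Mean and sample variance after trimming `proportion` of scores from each tail."""
    data = sorted(scores)
    k = int(len(data) * proportion)  # number of scores to drop per tail
    trimmed = data[k:len(data) - k] if k > 0 else data
    mean = sum(trimmed) / len(trimmed)
    ss = sum((x - mean) ** 2 for x in trimmed)
    return round(mean, 2), round(ss / (len(trimmed) - 1), 2)

ratings = [0, 0, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9, 9, 10, 10]
for p in (0, 0.05, 0.10, 0.20):
    print(f"{p:.0%} trimmed:", trimmed_stats(ratings, p))
```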

Chapter 7

Self-test 7.1

What are the null hypotheses for these hypotheses?

  • There is no difference in depression levels between those who drank alcohol and those who took ecstasy on Sunday.
  • There is no difference in depression levels between those who drank alcohol and those who took ecstasy on Wednesday.

Self-test 7.2

Based on what you have just learnt, try ranking the Sunday data.

The answers are in Figure 7.4. There are lots of tied ranks and the data are generally horrible.

Self-test 7.3

See whether you can use what you have learnt about data entry to enter the data in Table 7.1 into SPSS.

The solution is in the chapter (and see the file drug.sav).

Self-test 7.4

Use SPSS to test for normality and homogeneity of variance in these data.

To get the outputs in the book use the following dialog boxes:

Dialog box for the explore command

Dialog box for plots for the explore command

Self-test 7.5

Have a go at ranking the data and see if you get the same results as me.

Solution is in the book chapter (Table 7.3).

Self-test 7.6

See whether you can enter the data in Table 7.3 into SPSS (you don’t need to enter the ranks). Then conduct some exploratory analyses on the data (see Sections 6.9.4 and 6.9.6).

Data entry is explained in the book (and see soya.sav). To get the outputs in the book use the following dialog boxes:

Dialog box for the explore command

Dialog box for plots for the explore command

Self-test 7.7

Have a go at ranking the data and see if you get the same results as in Table 7.5.

Solution is in the book chapter.

Self-test 7.8

Using what you know about inputting data, enter these data into SPSS and run exploratory analyses.

Data entry is explained in the book. To get the outputs in the book use the following dialog boxes:

Dialog box for the explore command

Dialog box for plots for the explore command

Chapter 8

Self-test 8.1

Enter the advert data and use the chart editor to produce a scatterplot (number of packets bought on the y-axis, and adverts watched on the x-axis) of the data.

The finished Chart Builder should look like this:

Dialog box for a scatterplot

My scatterplot came out like this:

A horrible scatterplot

This plot looks daft because SPSS has not scaled the axes from 0. If yours looks like this too, then, as an additional task, edit it so that the axes both start at 0. While you’re at it, why not make it look Tufte style. Mine ended up like this:

A less horrible scatterplot

Self-test 8.2

Create P-P plots of the variables revise, exam, and anxiety.

To get a P-P plot use Analyze > Descriptive Statistics > P-P Plots… to access the dialog box below. There’s not a lot to say about this dialog box really because the default options will compare any variables selected to a normal distribution, which is what we want (although note that there is a drop-down list of different distributions against which you could compare your data). Drag the three variables revise, exam and anxiety from the variable list to the box labelled Variables. Click OK to draw the plots. The plots are interpreted in the book.

Dialog box for P-P plots

Self-test 8.3

Conduct a Pearson correlation analysis of the advert data from the beginning of the chapter.

Select Analyze > Correlate > Bivariate to get this dialog box:

Dialog box for a Pearson correlation

Drag adverts and packets to the box labelled Variables. Click OK to run the analysis. The output is shown in the book chapter.

Self-test 8.4

Using the roaming_cats.sav file, compute a Pearson correlation between sex and time.

Select Analyze > Correlate > Bivariate to get this dialog box:

Dialog box for a Pearson correlation

Drag time and sex to the box labelled Variables. Click Bootstrap… to get some robust confidence intervals and select these options:

Dialog box for a Pearson correlation

Click Continue to return to the main dialog box and OK to run the analysis. The output is shown in the book chapter.

Self-test 8.5

Use the split file command to compute the correlation coefficient between exam anxiety and exam performance in men and women.

To split the file, select Data > Split File…. In the resulting dialog box select the option Organize output by groups. Drag the variable Sex to the box labelled Groups Based on. The completed dialog box should look like this:

Dialog box for splitting the file

To get the correlation coefficients select Analyze > Correlate > Bivariate to get the main dialog box. Drag the variables exam and anxiety to the box labelled Variables. Click OK to run the analysis. The completed dialog box will look like this:

Dialog box for a Pearson correlation

The output for males will look like this:

Pearson correlation between anxiety and exam performance in males

For females, the output is as follows:

Pearson correlation between anxiety and exam performance in females

The book chapter has some interpretation of these findings and suggestions for how to compare the coefficients for males and females.

Chapter 9

Self-test 9.1

Produce a scatterplot of sales (y-axis) against advertising budget (x-axis). Include the regression line.

Create the scatterplot as follows:

Creating a scatterplot with linear fit line

Self-test 9.2

If a predictor had ‘no effect’, how would the outcome change as the predictor changes? What would be the corresponding value of b? What, therefore, would be a reasonable null hypothesis?

Answered in the book:

Under the null hypothesis that there is ‘no relationship’ or ‘no effect’ between a predictor and an outcome, as the predictor changes we would expect the predicted value of the outcome not to change (it is a constant value). In other words, the regression line would be flat. Therefore, ‘no effect’ equates to ‘flat line’ and ‘flat line’ equates to b = 0. If we want a hypothesis test then we can compare the ‘alternative hypothesis’ that there is an effect against this null. When there is an effect, the model will not be flat and b will not equal zero. So, we get the following hypotheses:

  • \(H_0: b = 0\). If there is no association between a predictor and outcome variable we’d expect the parameter for that predictor to be zero.
  • \(H_1: b \ne 0\). If there is a non-zero association between a predictor and outcome variable we’d expect the parameter for that predictor to be non-zero.

Self-test 9.3

Once you have read Section 9.7, fit a linear model first with all the cases included and then with case 30 deleted.

To run the analysis on all 30 cases, access the main dialog box by selecting Analyze > Regression > Linear …. The figure below shows the resulting dialog box. There is a space labelled Dependent in which you should place the outcome variable (in this example y). There is another space labelled Independent(s) in which any predictor variable should be placed (in this example, x). Click Save… and tick Unstandardized under Predicted values (see figure below), then click Continue to return to the main dialog box and OK to run the analysis.

Dialog box for regression

After running the analysis you should get the output below. (See the book chapter for an explanation of these results.)

Output for all 30 cases

To run the analysis with case 30 deleted, go to Data > Select Cases to open the dialog box in the figure below. Once this dialog box is open, select Based on time or case range and then click Range…. We want to set the range to be from case 1 to case 29, so type these numbers in the relevant boxes (see figure below). Click Continue to return to the main dialog box and OK to filter the cases.

Filtering case 30

Once you have done this, your data should look like mine below. You will see that case 30 now has a diagonal strike through it to indicate that this case will be excluded from any further analyses.

Filtered data

Now we can run the regression in the same way as we did before by selecting Analyze > Regression > Linear … (see screencast above). You should get the same output as mine below (see the book chapter for an explanation of the results).

Output for first 29 cases

Once you have run both regressions, your data view should look like mine. You can see two new columns PRE_1 and PRE_2 which are the saved unstandardized predicted values that we requested.

Filtered data

Self-test 9.4

How is the t in Output 9.4 calculated? Use the values in the table to see if you can get the same value as SPSS.

The t is computed as follows:

\[ \begin{aligned} t &= \frac{b}{SE_b} \\ &= \frac{0.096}{0.010} \\ &= 9.6 \end{aligned} \]

This value is different to the value in the SPSS output (9.979) because we’ve used the rounded values displayed in the table. If you double-click the table, and then double-click the cell for b and then the cell for the SE, we get the values to more decimal places:

\[ \begin{aligned} t &= \frac{b}{SE_b} \\ &= \frac{0.096124}{0.009632} \\ &= 9.979 \end{aligned} \]

which match the value of t computed by SPSS.
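As a two-line check in Python (not part of the book):

```python
b, se_b = 0.096124, 0.009632  # the more precise values from the SPSS table
print(round(b / se_b, 2))     # 9.98; SPSS reports 9.979 from fully unrounded values
```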

Self-test 9.5

How many albums would be sold if we spent £666,000 on advertising the latest album by Deafheaven?

Remember that advertising budget is in thousands, so we need to put £666 into the model (not £666,000). The b-values come from the SPSS output in the chapter:

\[ \begin{aligned} \widehat{\text{sales}}_i &= \hat{b}_0 + \hat{b}_1\text{advertising}_i \\ \widehat{\text{sales}}_i &= 134.14 + (0.096 \times \text{advertising}_i) \\ \widehat{\text{sales}}_i &= 134.14 + (0.096 \times 666) \\ \widehat{\text{sales}}_i &= 198.08 \end{aligned} \]
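Here’s the same prediction as a quick Python sketch (not part of the book). Remember that in this example both advertising budget and album sales are measured in thousands:

```python
b0, b1 = 134.14, 0.096  # intercept and slope from the SPSS output
adverts = 666           # a budget of £666,000, entered in thousands

sales = b0 + b1 * adverts
print(round(sales, 2))  # 198.08, i.e., roughly 198,080 album sales
```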

Self-test 9.6

Produce a matrix scatterplot of sales, adverts, airplay and image including the regression line.

Self-test 9.7

Think back to what the confidence interval of the mean represented. Can you work out what the confidence intervals for b represent?

This question is answered in the text just after the self-test box.

Chapter 10

Self-test 10.1

Enter these data into SPSS.

The file invisibility_cloak.sav shows how you should have entered the data.

Self-test 10.2

Produce some descriptive statistics for these data (using Explore)

To get some descriptive statistics using the Explore command go to Analyze > Descriptive Statistics > Explore …. The dialog box for the Explore command is shown below. First, drag any variables of interest to the box labelled Dependent List. For this example, select Mischievous acts. To split the output by the different cloak groups, drag cloak to the box labelled Factor List. If you click Statistics… a dialog box appears, but the default option is fine (it will produce means, standard deviations and so on). If you click Plots… and select the option Normality plots with tests, you will get the Kolmogorov-Smirnov test and some normal Q-Q plots in your output. Click Continue to return to the main dialog box and OK to run the analysis.

Explore dialog box

Self-test 10.3

To prove that I’m not making it up as I go along, fit a linear model to the data in invisibility_cloak.sav with cloak as the predictor and mischief as the outcome using what you learnt in the previous chapter. cloak is coded using zeros and ones as described above.

Regression dialog box

Self-test 10.4

Produce an error bar chart of the invisibility_cloak.sav data (cloak will be on the x-axis and mischief on the y-axis).

Completed dialog box

Self-test 10.5

Enter the data in Table 10.1 into the data editor as though a repeated-measures design was used.

We would arrange the data in two columns (one representing the cloak condition and one representing the no_cloak condition). You can see the correct layout in invisibility_rm.sav.

Self-test 10.6

Using the invisibility_rm.sav data, compute the differences between the cloak and no cloak conditions and check the assumption of normality with a Q-Q plot.

First compute the differences using the compute function:

Completed dialog box

Next, use Analyze > Descriptive Statistics > Explore … to get some Q-Q plots:

Completed dialog box

The Q-Q plot shows that the quantiles fall pretty much on the diagonal line (indicating normality). As such, it looks as though we can assume that our differences are fairly normal and that, therefore, the sampling distribution of these differences is normal too. Happy days!

The Q-Q plot

Self-test 10.7

Produce an error bar chart of the invisibility_rm.sav data (cloak on the x-axis and mischief on the y-axis).

Completed dialog box

Self-test 10.8

Create an error bar chart of the mean of the adjusted values that you have just made (cloak_adjusted and no_cloak_adjusted).

Completed dialog box

Chapter 11

Self-test 11.1

Follow Oliver Twisted’s instructions to create the centred variables caunts_cent and vid_game_cent. Then use the compute command to create a new variable called interaction in the video_games.sav file, which is caunts_cent multiplied by vid_game_cent.

To create the centred variables follow Oliver Twisted’s instructions for this chapter. I’ll assume that you have a version of the data file video_games.sav containing the centred versions of the predictors (caunts_cent and vid_game_cent). To create the interaction term, access the compute dialog box by selecting Transform > Compute Variable … and enter the name interaction into the box labelled Target Variable. Drag the variable caunts_cent to the area labelled Numeric Expression, then click the * (multiply) button, and then select the variable vid_game_cent and drag it across to the area labelled Numeric Expression. The completed dialog box is shown below. Click OK and a new variable will be created called interaction, the values of which are caunts_cent multiplied by vid_game_cent.

Dialog box to compute an interaction

Self-test 11.2

Assuming you have done the previous self-test, fit a linear model predicting aggress from caunts_cent, vid_game_cent and their interaction.

To do the analysis, access the main dialog box by selecting Analyze > Regression > Linear …. The resulting dialog box is shown below. Drag aggress from the list on the left-hand side to the space labelled Dependent. Drag caunts_cent, vid_game_cent and interaction from the variable list to the space labelled Independent(s). The default method of Enter is what we want, so click OK to run the basic analysis.

Dialog box for linear regression

Self-test 11.3

Assuming you did the previous self-test, compare the table of coefficients that you got with those in Output 11.1.

The output below shows the regression coefficients from the regression analysis that you ran using the centred versions of callous traits and hours spent gaming and their interaction as predictors. Basically, the regression coefficients are identical to those in Output 11.1 from using PROCESS. The standard errors differ a little from those from PROCESS, but that’s because when we used PROCESS we asked for heteroscedasticity-consistent standard errors; consequently the t-values are slightly different too (because these are computed from the standard errors: \(\frac{\hat{b}}{SE_{\hat{b}}}\)). The basic conclusion is the same though: there is a significant moderation effect, as shown by the significant interaction between hours spent gaming and callous-unemotional traits.

Output for linear regression

Self-test 11.4

Fit the three models necessary to test mediation for Lambert et al’s data: (1) a linear model predicting phys_inf from ln_porn; (2) a linear model predicting commit from ln_porn; and (3) a linear model predicting phys_inf from both ln_porn and commit. Is there mediation?

Model 1: Predicting infidelity (phys_inf) from pornography consumption (ln_porn)

Dialog box for model 1

Output for model 1

Model 2: Predicting relationship commitment (commit) from pornography consumption (ln_porn)

Dialog box for model 2

Output for model 2

Model 3: Predicting infidelity (phys_inf) from pornography consumption (ln_porn) and relationship commitment (commit)

Dialog box for model 3

Output for model 3

Interpretation

  • Model 1 shows that pornography consumption significantly predicts infidelity, \(\hat{b} = 0.59\), 95% CI [0.19, 0.98], t = 2.93, p = .004. As consumption increases, physical infidelity increases also.
  • Model 2 shows that pornography consumption significantly predicts relationship commitment, \(\hat{b} = -0.47\), 95% CI [\(-0.89\), \(-0.05\)], t = \(-2.21\), p = .028. As pornography consumption increases, commitment declines.
  • Model 3 shows that relationship commitment significantly predicts infidelity, \(\hat{b} = -0.27\), 95% CI [\(-0.39\),\(-0.16\)], t = \(-4.61\), p < .001. As relationship commitment increases, physical infidelity declines.
  • The relationship between pornography consumption and infidelity is stronger in model 1, \(\hat{b} = 0.59\), than in model 3, \(\hat{b} = 0.46\).

As such, the four conditions of mediation have been met.

Chapter 12

Self-test 12.1

To illustrate what is going on I have created a file called puppies_dummy.sav that contains the puppy therapy data along with the two dummy variables (short and long) we’ve just discussed (Table 12.4). Fit a linear model predicting happiness from short and long.

Fitting a model using dummy variables

Self-test 12.2

To illustrate these principles, I have created a file called puppies_contrast.sav in which the puppy therapy data are coded using the contrast coding scheme used in this section. Fit a linear model using happiness as the outcome and puppies_vs_none and short_vs_long as the predictor variables (leave all default options).

Fitting a model using contrast coding

Self-test 12.3

Produce a line chart with error bars for the puppy therapy data.

Completed dialog box

Self-test 12.4

Can you explain the contradiction between the planned contrasts and post hoc tests?

The answer is given in the book chapter.

Chapter 13

Self-test 13.1

Use SPSS Statistics to find the means and standard deviations of both happiness and love of puppies across all participants and within the three groups.

You could do this using the Analyze > Descriptive Statistics > Explore dialog box:

Completed dialog box

Answers are in Table 13.2 of the chapter.

Self-test 13.2

Add two dummy variables to the file puppy_love.sav that compare the 15 minutes to the control (low_control) and the 30 minutes to the control (high_control) – see Section 12.3.1 for help.

The data should look like the file puppy_love_dummy.sav.

Self-test 13.3

Fit a hierarchical regression with happiness as the outcome. In the first block enter love of puppies (puppy_love) as a predictor, and then in a second block enter both low_control and high_control (forced entry) – see Section 9.10 for help.

To get to the main regression dialog box select Analyze > Regression > Linear …. Drag the outcome variable (happiness) to the box labelled Dependent. To specify the predictor variable for the first block, drag puppy_love to the box labelled Independent(s). Underneath the Independent(s) box, there is a drop-down menu for specifying the Method of regression. The default option is forced entry, and this is the option we want.

Completed dialog box

To specify the second block, click Next. This clears the Independent(s) box so that you can enter the new predictors (you should also note that above this box it now reads Block 2 of 2, indicating that you are in the second block of the two that you have so far specified). The second block must contain both of the dummy variables, so drag low_control and high_control from the variable list to the Independent(s) box. We also want to leave the method of regression set to Enter.

Completed dialog box

Output 13.1 (in the book) shows the results that you should get, and the text in the chapter explains this output.

Self-test 13.4

Fit a model to test whether love of puppies (our covariate) is independent of the dose of puppy therapy (our independent variable).

We can do this analysis by selecting either Analyze > Compare Means > One-Way ANOVA… or Analyze > General Linear Model > Univariate…. If we do the latter then we can follow the example in the chapter but drag the covariate (puppy_love) to the box labelled Dependent Variable and exclude happiness from the model. The completed dialog box would look like this:

Completed dialog box

Self-test 13.5

Fit the model without the covariate to see whether the three groups differ in their levels of happiness.

We can do this analysis by selecting either Analyze > Compare Means > One-Way ANOVA… or Analyze > General Linear Model > Univariate…. If we do the latter then we can follow the example in the chapter but exclude the covariate (puppy_love). The completed dialog box would look like this:

Completed dialog box

The output is in the book chapter.

Self-test 13.6

Produce a scatterplot of love of puppies (horizontal axis) against happiness (vertical axis).

Completed dialog box

The scatterplot itself is in the book chapter.

Chapter 14

Self-test 14.1

The file goggles_regression.sav contains the dummy variables used in this example. Just to prove that this works, use this file to fit a linear model predicting attractiveness ratings from facetype, alcohol and the interaction variable.

Select Analyze > Regression > Linear … and complete the dialog box as below. The output is shown in Output 14.1 of the book.

Completed dialog box

Self-test 14.2

What about panels (c) and (d): do you think there is an interaction?

This question is answered in the text in the chapter.

Self-test 14.3

Use the Chart Builder to plot an error bar graph of the attractiveness ratings with alcohol consumption on the x-axis and different coloured lines to represent whether the faces being rated were unattractive or attractive.

Select Graphs > Chart Builder … and complete the dialog box as below.

Completed dialog box

Chapter 15

Self-test 15.1

What is a repeated-measures design? (Clue: it is described in Chapter 1.)

Repeated-measures is a term used when the same entities participate in all conditions of an experiment.

Self-test 15.2

Devise some contrast codes for the contrasts described in the text.

The answer is in Table 15.3 in the chapter.

Self-test 15.3

Thinking back to the order in which we specified the levels of Entity (Section 15.8.1), what groups are being compared in each contrast?

Given the order in which we specified the levels of Entity, contrast 1 compares the mannequin and human, contrast 2 compares the shapeshifter to the human, and contrast 3 compares the alien to the shapeshifter.

Chapter 16

Self-test 16.1

In the data editor create nine variables with the names and variable labels given in Figure 16.3. Create a variable strategy with value labels 0 = normal, 1 = hard to get.

The data in the file speed_date.sav show how the variables should be set up.

Self-test 16.2

Enter the data as in Table 16.1. If you have problems then use the file speed_date.sav.

The data in the file speed_date.sav show how the variables should be set up.

Self-test 16.3

Output 16.2 shows information about sphericity. Based on what you have already learnt, what would you conclude from this information?

Answers are in the text within the chapter.

Self-test 16.4

What is the difference between a main effect and an interaction?

A main effect is the unique effect of a predictor variable (or independent variable) on an outcome variable. In this context it can be the effect of strategy, charisma or looks on their own. So, in the case of strategy, the main effect is the difference between the average ratings of all dates that played hard to get (irrespective of their attractiveness or charisma) and all dates that acted typically (irrespective of their attractiveness or charisma).

The main effect of looks would be the mean rating given to all attractive dates (irrespective of their charisma, or whether they played hard to get or not), compared to the average rating given to all average-looking dates (irrespective of their charisma, or whether they played hard to get or not) and the average rating of all ugly dates (irrespective of their charisma, or whether they played hard to get or acted normally).

An interaction, on the other hand, looks at the combined effect of two or more variables: for example, were the average ratings of attractive, ugly and average-looking dates different when those dates played hard to get compared to when they did not?

Self-test 16.5

Was the assumption of homogeneity of variance met (Output 16.4)?

Answers are in the text within the chapter.

Self-test 16.6

Based on the previous section, on what you have learned in previous chapters, and on Output 16.3, can you interpret the main effect of looks?

Answers are in the text within the chapter.

Chapter 17

Self-test 17.1

What is a cross-product?

Cross-products represent a total value for the combined error between two variables (in some sense they represent an unstandardized estimate of the total correlation between two variables).
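To make this concrete, the total cross-product is the sum of the products of each variable's deviations from its own mean, \( \sum_i (x_i - \bar{x})(y_i - \bar{y}) \); dividing it by \(N - 1\) gives the covariance. A tiny sketch with made-up numbers:

```python
import numpy as np

# Made-up scores (not from the book)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# Cross-product: sum of the products of the deviations from each mean
cross_product = np.sum((x - x.mean()) * (y - y.mean()))

# Averaging the cross-product (dividing by N - 1) gives the covariance
covariance = cross_product / (len(x) - 1)
print(cross_product, covariance)
```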

Self-test 17.2

Why might the univariate tests be non-significant when the multivariate tests were significant?

The answer is in the chapter:

“The reason for the anomaly is that the multivariate test takes account of the correlation between outcome variables and looks at whether groups can be distinguished by a linear combination of the outcome variables. This suggests that it is not thoughts or actions in themselves that distinguish the therapy groups, but some combination of them. The discriminant function analysis will provide more insight into this conclusion.”

Self-test 17.3

Based on what you have learnt in previous chapters, interpret the table of contrasts in your output.

In the chapter I suggested carrying out a simple contrast that compares each of the therapy groups to the no-treatment control group. The output below shows the results of these contrasts. The table is divided into two sections conveniently labelled Level 1 vs. Level 3 and Level 2 vs. Level 3 where the numbers correspond to the coding of the group variable. If you coded the group variable using the same codes as I did, then these contrasts represent CBT vs. NT and BT vs. NT respectively. Each contrast is performed on both dependent variables separately and so they are identical to the contrasts that would be obtained from a univariate ANOVA. The table provides values for the contrast estimate and the hypothesized value (which will always be zero because we are testing the null hypothesis that the difference between groups is zero). The observed estimated difference is then tested to see whether it is significantly different from zero based on the standard error. A 95% confidence interval is produced for the estimated difference.

The first thing that you might notice (from the values of Sig.) is that when we compare CBT to NT there are no significant differences in thoughts (p = 0.104) or behaviours (p = 0.872) because both values are above the 0.05 threshold. However, comparing BT to NT, there is no significant difference in thoughts (p = 0.835) but there is a significant difference in behaviours between the groups (p = 0.044). The confidence intervals confirm these findings: they all include zero (the lower bounds are negative whereas the upper bounds are positive) except for the BT vs. NT contrast for behaviours. Assuming that these intervals are from the 95% that contain the population value, this means that all of these effects might be 0 in the population, except for the effect of BT vs. NT for behaviours. This finding is a little unexpected because the univariate ANOVA for behaviours was non-significant and so we would not expect there to be significant group differences.

Output

Chapter 18

Self-test 18.1

What is the equation of a straight line/linear model?

As shown in the book:

\[ Y_i = b_1X_{1i} + b_2X_{2i} + \ldots + b_nX_{ni} \]

Self-test 18.2

Having done this, select the Direct oblimin option in Figure 18.12 and repeat the analysis. You should obtain two outputs identical in all respects except that one used an orthogonal rotation and the other an oblique.

This should be self-explanatory from the book chapter.

Self-test 18.3

Use the case summaries command (Section 9.11.6) to list the factor scores for these data (given that there are over 2500 cases, restrict the output to the first 10).

To list the factor scores select Analyze > Reports > Case Summaries …. Drag the variables that you want to list (in this case the four columns of factor scores) to the box labelled Variables. By default, SPSS will limit the output to the first 100 cases, but let’s set this to 10 so we just look at the first few cases (as in the book chapter).

Completed dialog box

Self-test 18.4

Thinking back to Chapter 1, what are reliability and test–retest reliability?

The answer is given in the text.

Self-test 18.5

Use the compute command to reverse-score item 3 and store as a variable called question_03_rev (see Chapter 6; remember that you are changing the variable to 6 minus its original value).

To access the compute dialog box, select Transform > Compute Variable …. Enter the name of the new variable in the space labelled Target Variable (in this case, question_03_rev). Then, where it says Numeric Expression, you need to tell SPSS how to compute the new variable. We want to take each person's original score on item 3 and subtract it from 6, so we type 6 − question_03 (which means 6 minus the value found in the column labelled question_03). If you use the name of an existing variable, then when you click OK you'll get a dialog box asking whether you want to change the existing variable; click OK if you're happy for the new values to replace the old ones.

Reverse scoring question 3
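The same transformation is a one-liner in code. A sketch follows; the file name is hypothetical, and it assumes item 3 is scored 1 to 5 so that 6 minus the score flips the scale.

```python
import pandas as pd

data = pd.read_spss("saq.sav")  # hypothetical file name; requires pyreadstat

# Reverse-score item 3: on a 1-5 scale, 6 - score turns 1 into 5,
# 2 into 4, and so on.
data["question_03_rev"] = 6 - data["question_03"]
```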

Self-test 18.6

Run reliability analysis on the other three subscales.

The outputs and interpretation are in the chapter.

Chapter 19

Self-test 19.1

Fit a linear model with ln_observed as the outcome, and training, dance and interaction as the three predictors.

The multiple regression dialog box will look like the figure below. We can leave all of the default options as they are because we are interested only in the regression parameters. The regression parameters are shown in the book.

Self-test 19.2

Fit another linear model using cat_reg.sav. This time the outcome is the log of expected frequencies (ln_expected) and training and dance are the predictors (the interaction is not included).

The multiple regression dialog box will look like this:

Dialog box for the model

We can leave all of the default options as they are because we are interested only in the regression parameters. The resulting regression parameters are shown below. Note that \(\hat{b}_0 = 3.16\), the coefficient for the type of training is 1.45 and the coefficient for whether they danced is 0.49. All of these values are consistent with those calculated in the book chapter.
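You can sanity-check these parameters by noting that the model reproduces each expected frequency as \( \exp(\hat{b}_0 + \hat{b}_1\,\text{training} + \hat{b}_2\,\text{dance}) \). A quick sketch, assuming 0/1 coding for both predictors:

```python
import numpy as np

# Parameter estimates from the model of the expected frequencies
b0, b_training, b_dance = 3.16, 1.45, 0.49

# Model-implied frequency for each combination of the 0/1 dummy codes;
# these should match the expected frequencies calculated in the chapter.
for training in (0, 1):
    for dance in (0, 1):
        print(training, dance,
              round(np.exp(b0 + b_training * training + b_dance * dance), 2))
```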

Self-test 19.3

Using the cats_weight.sav data, change the frequency of cats that had food as reward and didn’t dance from 10 to 28. Re-do the chi-square test and select and interpret z-tests. Is there anything about the results that seems strange?

You need to change the score and then follow the instructions in the book to run the analysis.

The contingency table you get looks like this:

Output

In the row labelled Food as Reward, the count of 28 in the column labelled Yes has a subscript letter a, and the count of 28 in the column labelled No has a subscript letter b. These subscripts tell us the results of the z-test that we asked for: columns with different subscripts have significantly different column proportions. This is what should strike you as strange: how can two identical counts of 28 be deemed significantly different? The answer is that, despite the subscripts being attached to the counts, that isn't what they compare: they compare the proportion of each column's total frequency that falls into that row. In this case, the first column represents all the cats that danced (n = 76). Within this column, 28/76 = 36.8% had food (i.e., fell into the row representing food as a reward). In other words, of all the cats that danced, 36.8% had food. The second column represents all the cats that did not dance (n = 142). Within this column, 28/142 = 19.7% had food. In other words, of all the cats that did not dance, 19.7% had food. The significance test asks whether these two proportions differ: it tests whether 36.8% is different from 19.7%, and it is (p < 0.05), which is why the column counts have been denoted with different letters.
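If you want to verify this by hand, a two-proportion z-test on these counts gives the same conclusion; a minimal sketch using statsmodels:

```python
from statsmodels.stats.proportion import proportions_ztest

# 28 of the 76 cats that danced, and 28 of the 142 cats that did not
# dance, had food as a reward.
z, p = proportions_ztest(count=[28, 28], nobs=[76, 142])
print(z, p)  # p is below 0.05, matching the differing subscripts
```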

Self-test 19.4

Use Section 19.7.3 to help you to create a contingency table with dance as the columns, training as rows and animal as a layer.

Select Analyze > Descriptive Statistics > Crosstabs …. We have three variables in our crosstabulation table: whether the animal danced or not (dance), the type of reward given (training), and whether the animal was a cat or dog (animal). Drag training into the box labelled Row(s) (or click the transfer arrow). Next, drag dance to the box labelled Column(s) (or click the transfer arrow). Finally, drag animal to the box labelled Layer 1 of 1 (or click the transfer arrow). The completed dialog box should look like this:

Completed dialog box for Crosstabs

Finally, click Cells … and select the same options as for the cat example in the book chapter.

Self-test 19.5

Use the split file command (see Section 6.10.4) to run a chi-square test on dance and training for dogs and cats.

Select Data > Split File … and then select Organize output by groups. Once this option is selected, the Groups Based on box will activate. Drag animal into this box (or click the transfer arrow):

Dialog box for split file

To run the chi-square tests, select Analyze > Descriptive Statistics > Crosstabs …. Drag training into the box labelled Row(s) (or click the transfer arrow). Next, drag dance to the box labelled Column(s) (or click the transfer arrow). The completed dialog box should look like this:

Dialog box for crosstabs

Select the same options as in the book (for the cat example).

Chapter 20

Self-test 20.1

Using equations 20.23 and 20.25, calculate the values of Cox and Snell’s and Nagelkerke’s \(R^2\). (Remember the sample size, N, is 113.)

SPSS reports \(-2LL_\text{new}\) as 144.16 and \(-2LL_\text{baseline}\) as 154.08. The sample size, N, is 113. So Cox and Snell’s \(R^2\) is calculated as follows:

\[ \begin{aligned} R_{\text{CS}}^2 &= 1-\exp\bigg(\frac{-2LL_\text{new}-(-2LL_\text{baseline})}{n}\bigg) \\ &= 1-\exp\bigg(\frac{144.16-154.08}{113}\bigg) \\ &= 1-\exp(-0.0878) \\ &= 1-e^{-0.0878} \\ &= 0.084 \end{aligned} \]

Nagelkerke’s adjustment is calculated as:

\[ \begin{aligned} R_{\text{N}}^2 &= \frac{R_{\text{CS}}^2}{1-\exp\Big(-\frac{-2LL_\text{baseline}}{n}\Big)} \\ &= \frac{0.084}{1-\exp\Big(-\frac{154.08}{113}\Big)} \\ &= \frac{0.084}{1-e^{-1.3635}} \\ &= \frac{0.084}{1-0.2558} \\ &= 0.113 \end{aligned} \]
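These two calculations are easy to script; a minimal sketch that reproduces the values above from the −2LL statistics:

```python
import numpy as np

def cox_snell_r2(minus2ll_new, minus2ll_baseline, n):
    # R^2_CS = 1 - exp((-2LL_new - (-2LL_baseline)) / n)
    return 1 - np.exp((minus2ll_new - minus2ll_baseline) / n)

def nagelkerke_r2(minus2ll_new, minus2ll_baseline, n):
    # Rescales Cox and Snell's value so that its maximum is 1
    r2_cs = cox_snell_r2(minus2ll_new, minus2ll_baseline, n)
    return r2_cs / (1 - np.exp(-minus2ll_baseline / n))

print(round(cox_snell_r2(144.16, 154.08, 113), 3))   # 0.084
print(round(nagelkerke_r2(144.16, 154.08, 113), 3))  # 0.113
```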

Self-test 20.2

Use the case summaries function to create a table for the first 15 cases in the file eel.sav showing the values of cured, intervention, duration, the predicted probability (PRE_1) and the predicted group membership (PGR_1) for each case.

The completed dialog box should look like this:

Completed dialog box for case summaries

Self-test 20.3

Why might the model be better at classifying scored penalty kicks?

This question is answered in the book:

“The classification table gives us a clue as to why scored penalties are better predicted than missed ones. The vast majority of kicks are scored because people tend not to end up as professional soccer players unless they’re extremely good at kicking footballs. Of the 868 penalties in the data 793 are scored and only 75 missed!”

Self-test 20.4

Try creating a new variable that is the natural logs of position called ln_position.

Select Transform > Compute Variable … and complete the dialog box as follows:

Creating the log of position
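The same transformation in code is a single call to a log function; a sketch (the file name is hypothetical):

```python
import numpy as np
import pandas as pd

penalty = pd.read_spss("penalty.sav")  # hypothetical file name; requires pyreadstat

# Natural log of the position variable
penalty["ln_position"] = np.log(penalty["position"])
```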

Chapter 21

Self-test 21.1

Produce a scatterplot of days (x-axis) against post_qol (y-axis), with each clinic plotted as a different line. Include a line of best fit for each clinic.

Select Graphs > Chart Builder … and then complete the dialog box as follows

Creating a scatterplot of days against post_qol by clinic
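The equivalent plot in Python is a grouped scatterplot with one fitted line per clinic; a sketch with seaborn (the file name is hypothetical; the variable names come from the self-test):

```python
import pandas as pd
import seaborn as sns

surgery = pd.read_spss("cosmetic_surgery.sav")  # hypothetical file name

# One scatter of days against post_qol per clinic, each with its own
# least-squares line of best fit.
sns.lmplot(data=surgery, x="days", y="post_qol", hue="clinic", ci=None)
```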

Self-test 21.2

Refit the model using months instead of days.

Select Analyze > Mixed Models > Linear … and complete the dialog boxes as follows:

Fitting a multilevel model (Part 1)

Fitting a multilevel model (Part 2)

Fitting a multilevel model (Part 3)
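For comparison, a rough statsmodels equivalent of this model follows. The file name is hypothetical, and it assumes (as in the chapter's model) fixed effects of months and baseline quality of life, with a random intercept and a random slope for months across clinics.

```python
import pandas as pd
import statsmodels.formula.api as smf

surgery = pd.read_spss("cosmetic_surgery.sav")  # hypothetical file name

# Fixed effects of months and base_qol; random intercept and random
# slope for months across clinics, estimated with ML (reml=False).
model = smf.mixedlm("post_qol ~ months + base_qol", data=surgery,
                    groups="clinic", re_formula="~months").fit(reml=False)
print(model.summary())
```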

Self-test 21.3

Split the file by reason and then run a multilevel model predicting post_qol with a random intercept, random slopes for months, and with base_qol and months as predictors.

First, split the file by reason by selecting Data > Split File…. The completed dialog box should look like this:

Completed dialog box

To run the multilevel model, select Analyze > Mixed Models > Linear… and specify the contextual variable by dragging clinic to the box labelled Subjects (or click the transfer arrow).

Completed dialog box

Click Continue to move to the main dialog box. First drag post_qol to the space labelled Dependent variable (or click the transfer arrow). Next, drag months and base_qol to the space labelled Covariate(s) (or click the transfer arrow).

Completed dialog box

To add the predictors (base_qol and months) as fixed effects, click Fixed … to open the Fixed Effects dialog box. Make sure the build-term drop-down is set to Main Effects, select both variables in the Factors and Covariates list, and click Add. Click Continue to return to the main dialog box.

Completed dialog box

We now need to ask for a random intercept and random slopes for the effect of months. Click Random … in the main dialog box. Drag clinic to the area labelled Combinations (or click the transfer arrow). Select Include intercept to allow intercepts to vary across contexts (i.e., a random intercepts model). Next, add months to the model by selecting it in the list of Factors and Covariates and clicking Add. Finally, to estimate the covariance between the random slope and random intercept, click the Covariance Type drop-down list and select Unstructured.

Completed dialog box

Click Estimation … and select Maximum Likelihood (ML). Click Continue to return to the main dialog box. In the main dialog box click Statistics … and request Parameter estimates and Tests for covariance parameters. Click Continue to return to the main dialog box. To run the analysis, click OK.
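A code analogue of the split-file analysis loops over the levels of reason and fits the same model within each subset; a sketch (hypothetical file name):

```python
import pandas as pd
import statsmodels.formula.api as smf

surgery = pd.read_spss("cosmetic_surgery.sav")  # hypothetical file name

# Mimic SPSS's split file: fit the random intercept and slope model
# separately for each value of reason.
for reason, subset in surgery.groupby("reason"):
    model = smf.mixedlm("post_qol ~ months + base_qol", data=subset,
                        groups="clinic", re_formula="~months").fit(reml=False)
    print(f"reason = {reason}")
    print(model.summary())
```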