To get a better understanding of the difference we will build the R and SPSS-flavored QQ-plot from scratch.
R type
In order to calculate theoretical quantiles corresponding to the observed values, we first need to find a way to assign a probability to each value of the original data. A lot of different approaches exist for this purpose (for an overview see e.g. Castillo-Gutiérrez, Lozano-Aguilera, & Estudillo-Martínez, 2012b). They usually build on the ranks of the observed data points to calculate corresponding p-values, i.e. the plotting positions for each point. The qqnorm
function uses two formulae for this purpose, depending on the number of observations \(n\) (Blom’s mfethod, see ?qqnorm
; Blom, 1958). With \(r\) being the rank, for \(n > 10\) it will use the formula \(p = (r - 1/2) / n\), for \(n \leq 10\) the formula \(p = (r - 3/8) / (n + 1/4)\) to determine the probability value \(p\) for each observation (see the help files for the functions qqnorm
and ppoint
). For simplicity reasons, we will only implement the \(n > 10\) case here.
n <- length(x) # number of observations
r <- order(order(x)) # order of values, i.e. ranks without averaged ties
p <- (r - 1/2) / n # assign to ranks using Blom's method
y <- qnorm(p) # theoretical standard normal quantiles for p values
plot(x, y) # plot empirical against theoretical values

Before we take at look at the code, note that our plot is identical to the plot generated by qqnorm
above, except that the QQ-line is missing. The main point that makes the difference between R and SPSS is found in the command order(order(x))
. The command calculates ranks for the observations using ordinal ranking. This means that all observations get different ranks and no average ranks are calculated for ties, i.e. for observations with equal values. Another approach would be to apply fractional ranking and calculate average values for ties. This is what the function rank
does. The following codes shows the difference between the two approaches to assign ranks.
v <- c(1,1,2,3,3)
order(order(v)) # ordinal ranking used by R
## [1] 1 2 3 4 5
rank(v) # fractional ranking used by SPSS
## [1] 1.5 1.5 3.0 4.5 4.5
R uses ordinal ranking and SPSS uses fractional ranking by default to assign ranks to values. Thus, the positions do not overlap in R as each ordered observation is assigned a different rank and therefore a different p-value. We will pick up the second approach again later, when we reproduce the SPSS-flavored plot in R.
The second difference between the plots concerned the scaling of the y-axis and was already clarified above.
The last point to understand is how the QQ-line is drawn in R. Looking at the probs
argument of qqline
reveals that it uses the 1st and 3rd quartile of the original data and theoretical distribution to determine the reference points for the line. We will draw the line between the quartiles in red and overlay it with the line produced by qqline
to see if our code is correct.
plot(x, y) # plot empirical against theoretical values
ps <- c(.25, .75) # reference probabilities
a <- quantile(x, ps) # empirical quantiles
b <- qnorm(ps) # theoretical quantiles
lines(a, b, lwd=4, col="red") # our QQ line in red
qqline(x, datax=T) # R QQ line

The reason for different lines in R and SPSS is that several approaches to fitting a straight line exist (for an overview see e.g. Castillo-Gutiérrez, Lozano-Aguilera, & Estudillo-Martínez, 2012a). Each approach has different advantages. The method used by R is more robust when we expect values to diverge from normality in the tails, and we are primarily interested in the normality of the middle range of our data. In other words, the method of fitting an adequate QQ-line depends on the purpose of the plot. An explanation of the rationale of the R approach can e.g. be found here.
SPSS type
The default SPSS approach also uses Blom’s method to assign probabilities to ranks (you may choose other methods is SPSS) and differs from the one above in the following aspects:
- As already mentioned, SPSS uses ranks with averaged ties (fractional rankings) not the plain order ranks (ordinal ranking) as in R to derive the corresponding probabilities for each data point. The rest of the code is identical to the one above, though I am not sure if SPSS distinguishes between the \(n <= 10\) and \(n > 10\) case.
- The theoretical quantiles are scaled to match the estimated mean and standard deviation of the original data.
- The QQ-line goes through all quantiles with identical values on the x and y axis.
n <- length(x) # number of observations
r <- rank(x) # a) ranks using fractional ranking (averaging ties)
p <- (r - 1/2) / n # assign to ranks using Blom's method
y <- qnorm(p) # theoretical standard normal quantiles for p values
y <- y * sd(x) + mean(x) # b) transform SND quantiles to mean and sd from original data
plot(x, y) # plot empirical against theoretical values
Lastly, let us add the line. As the scaling of both axes is the same, the line goes through the origin with a slope of \(1\).
abline(0,1) # c) slope 0 through origin

The comparison to the SPSS output shows that they are (visually) identical.