**Ordination
in R**

This page covers the R functions to perform
multivariate ordination, including some methods beyond the default biplots for displaying your results graphically. Some of these methods will use functions in
the vegan package, which you should load and install (see here if you
haven’t loaded packages before). The
following four major ordination methods are covered:

1.
Principal
Components Analysis (PCA)

2.
Correspondence
Analysis (CA) and Detrended Correspondence Analysis
(DCA)

3.
Principal
Coordinates Analysis (PCoA)

4.
Non-Metric
Multidimensional Scaling (NMDS).

__Principal Components Analysis (PCA)__

This function is part of the basic R
installation, so you won’t need the vegan package yet. Before you run the function, you will need to
have a data frame containing only numerical data (there can be row names). The default arrangement is to have the
samples (or sites) in rows and the measured variables (or counts) in
columns. You can transpose a data frame
(or matrix) – swap the rows to columns and vice versa – using the transpose
function t(…):

transposed.frame<-t(dataframe)
#transposes data
frame so rows become columns and vice versa

The default setting for the PCA prcomp(…)
function performs ordination using the covariance matrix, for example:

PCA.res<-prcomp(data.frame) #where data.frame is the name of the
data frame variable

If you want to use the correlation matrix (for
example, when the variables are measured in different units or when differences
in their variances are not important for interpretation), specify:

PCA.res<-prcomp(data.frame, scale.=T)

You should store the results as a variable, an
object of class “prcomp” (don’t worry about what “class”
means) that contains specific items.
Most of the time you will want to plot the results manually using the
values provided, to customize the sample colors and other parameters, but to
get a quick plot you can use the biplot(…) command:

biplot(PCA.res)

The first two (most important) components are
plotted, samples are shown in black and the loadings of each variable are shown
by the red arrows. There are two sets of
axis scales; the ones on the right and top correspond to the axis scores for
samples, and the bottom and left axes correspond to the loadings of the
variables. Note that it uses the sample
labels for plot points, which is usually unwieldy. If you have a lot of variables, you will also
get a giant mess of loadings like in the plot above. Because of that, it is often better to
produce a custom plot. That will allow
you to customize other characteristics – grouping samples into categories by
color of the points, for example.

The PCA results object contains the following
variables:

names(PCA.res)

[1]
"sdev" "rotation"
"center"
"scale"
"x"

“Center” and “scale” simply keep track of your
choices when running the function, so “sdev,”
“rotation,” and “x” are the important data. The “sdev” variable
contains the standard deviation for each principal component axis. Because it is more traditional to talk about
variance explained (rather than standard deviation), you can easily convert sdev to variance (variance is standard deviation squared)
or proportion of variance.

You can also assess the proportion of variance
explained by asking for the summary of the PCA results:

summary(PCA.res)

Importance
of components:

PC1 PC2
PC3 PC4 PC5

Standard
deviation 0.175 0.142 0.115 0.0924
0.0845

Proportion
of Variance 0.312 0.205 0.135 0.0871 0.0728

Cumulative
Proportion 0.312
0.517 0.652 0.7395 0.8123

The output is truncated to only include the
first five components.

You can also use “sdev”
to produce a scree plot to help decide how many components to interpret:

The variable somewhat cryptically named “x” actually
contains the axis scores for each sample on each principal component. The variable named “rotation” contains the
values for the loadings, which can be used to add arrows for loadings on the biplot.

Even though PCA is not an appropriate ordination
method for this ecological data set, here is an example of the type of biplot you can make by custom plotting. For more information on adding arrows, text
labels, and legends to plots, and information on how to vary colors, see the graphing
page. I also only plotted the ten
largest loadings (the ten most important variables for these two components)
for clarity.

__Correspondence and Detrended
Correspondence Analysis (DCA)__

This function is in the vegan package and is
called “decorana” and it requires a matrix-like
object (data frame) with samples in rows and species in columns (DCA is typically
performed on taxonomic count data, but could be applied to other types of
data). The basic function call is
simple, but you should keep in mind the default settings:

DCA.res<-decorana(dataframe) #performs DCA

The default settings are to 1) perform the detrending/rescaling procedure (ira=0),
2) not downweight rare taxa (iweigh=0),
3) perform four rescaling cycles (iresc=4), and 4)
use 26 segments for detrending/rescaling (mk=26). You generally won’t want to change the
rescaling cycles or segment number, but the first two parameters can be
adjusted. The function only returns the
first four axes.

DCA.res<-decorana(dataframe,ira=1) #performs CA (actually reciprocal averaging)

DCA.res<-decorana(dataframe,iweigh=1) #performs DCA and downweights rare
taxa

The result variable is an object containing
eigenvalues, sample and species scores, etc.
There are some things to beware, however!

The eigenvalues are reported in the summary (summary(DCA.res))
and can be obtained from DCA.res$evals. In CA, these eigenvalues represent the
inertia of each axis and are conceptually analogous to the “variance explained”
in PCA. Total inertia is not given, but
it is simply based on the chi-squared statistic for the whole data frame:

total.inertia<-chisq.test(dataframe/sum(dataframe))$statistic

You can easily calculate the proportion of
inertia for each axis. In DCA, however,
the returned eigenvalues do not sum to total inertia and don’t actually have a
lot of meaning because of the detrending/rescaling
process.

Likewise, the variables DCA.res$rproj
and DCA.res$cproj seem to contain sample and species axis scores
(which you could use for plotting), but this is only true for CA (ira=1). In DCA, the detrending/rescaling procedure distorts the scores, so you
need to use a different method.

scores(DCA.res,display="sites") #extracts sample/site axis scores

scores(DCA.res,display="species") #extracts variable/species axis scores

scores(DCA.res,display="sites",choice=1:2)
#extracts
sample/site scores for axes 1 and 2 only

scores(DCA.res,display="species",choice=2:3)
#extracts
sample/site scores for axes 2 and 3 only

To plot the results, you can just use the
default plot(…) command on the results (I added the titles),
which plots the sites as open circles and the variables/species as red crosses,
but without labels, as shown below.

For more customization, you can extract the
scores (as explained above) and plot manually as done in the PCA section. You can also use the display=c("sites","species")
option combined with points(…) or text(…) to add points or labels. For example:

plot(CA.res,type="n",main="Correspondence
Analysis") #creates
empty plot(type=”n”)

text(CA.res,display="species",col="red",cex=0.6) #adds red labels for species

points(CA.res,display="sites",pch=16) #adds filled circles for sites

Note that the taxon
names plot on top of one another, resulting in a jumbled mess. Producing a publication-quality plot here
would take some more detailed work.

__Principal Coordinates Analysis (PCoA)__

PCoA is a distance-based
ordination method that can be performed via the capscale(…) function in
vegan. You can also use cmdscale(…)
in the base installation, but you will need to produce a distance matrix from
the original data (see the cluster analysis
page for more information on distances).
PCoA is also called classical multidimensional
scaling or metric multidimensional scaling.
The major benefit of PCoA is the ability to
choose a different distance measure – when Euclidean distance is used, PCoA is the same as PCA.

The capscale(…) function is designed for another purpose, so the syntax
is a bit different than the other ordination methods, but it can be used to
perform PCoA:

PCoA.res<-capscale(dataframe~1,distance="bray")
#must specify
dataframe~1 (where dataframe is the sample/variable
data matrix) to perform PCoA

#must
specify a distance from distances provided in vegdist(…)

The vegdist(…) function has more distances, including some more
applicable to (paleo)ecological data:

Distances available in vegdist(…)
are: `"manhattan"`

, `"euclidean"`

,
`"canberra"`

, `"bray"`

, `"kulczynski"`

, `"jaccard"`

, `"gower"`

,
`"altGower"`

, `"morisita"`

,
`"horn"`

, `"mountford"`

,
`"raup"`

, `"binomial"`

or `"chao" and the default is “bray” or Bray-Curtis.`

```
You can see the eigenvalues (including
proportion contribution to distance explained) and the species and sample
scores using:
```

`summary(`

`PCoA.res)`

```
Scores can be extracted in the same fashion
as DCA:
```

`scores(`

`PCoA.res,display`

`=”sites”) `

`#or display=”species”`

`The default `

`plot(`

`PCoA.res)`

` `

```
doesn’t
include labels so you can either add points(…) and text(…) as outlined above,
or extract the scores and plot manually:
```

Notice that the PCoA
plot looks similar to the PCA plot (although rotated), because both enforce a
linear relationship that is not appropriate for ecological data like this. I’m not sure why the species scores are so tightly
clustered around the origin.

__Non-Metric Multidimensional Scaling (NMDS)__

The default syntax to perform NMDS in the vegan
package is simple – the function is metaMDS(…), but there are some defaults that may or may not be what
you want. Make sure to store the results
as a variable.

1. The default distance is Bray-Curtis, but any distance available in vegdist(…)
can be used.

For example, metaMDS(dataframe,distance=”`kulczynski`

`”)`

`2. `

```
If some counts are large, the
default is to perform a square-root transformation. A Wisconsin-style double transformation
(normalizing taxa to percent abundance and then normalizing abundances to the
maximum for each species) is also the default.
To turn off these defaults, use:
```

metaMDS(dataframe,autotransform=F`)`

3. Axes are meaningless in NMDS (because it is not an eigenanalysis ordination), so the final result is rotated
using PCA to make axis 1 (on the plot) correspond to the greatest variance
among the NMDS sample points. This
doesn’t change interpretation, cannot be modified, and is a good idea, but you
should be aware of it.

4. Choice of dimensionality is important in NMDS (and must be
user-specified) because NMDS at lower dimensions is not a projection of the
higher-dimension solution. The default
is to ordinate the points in two dimensions (k=2), but you may want to look at
three dimensions or more.

metaMDS(dataframe,autotransform=F,k=3`)`

A plot of stress (a
measure of goodness-of-fit) vs. dimensionality can be used to assess the proper
choice of dimensions, in much the same way as you could use a scree plot in
PCA.

Here is a function that produces a stress vs.
dimensionality plot:

NMDS.scree<-function(x) { #where x is the name of the data
frame variable

plot(rep(1,10),replicate(10,metaMDS(x,autotransform=F,k=1)$stress/100),xlim=c(1,nrow(x)),ylim=c(0,0.5),xlab="#
of Dimensions",ylab="Stress",main=”NMDS
stress plot”)

for
(i in 1:(nrow(x)-2)) {

points(rep(i+1,10),replicate(10,metaMDS(x,autotransform=F,k=i+1)$stress/100))

}

}

Ideally, there will be an inflection point (an
“elbow”) that you can use to choose the number of dimensions (usually k+1 if
the inflection is at point k). In
practice, as in this case, it is often less clear – should you choose three
dimensions or five? The stress values
themselves (the metaMDS results are stress*100) can
be used as an indicator. Stress values
>0.2 are generally poor and potentially uninterpretable,
whereas values <0.1 are good and <0.05 are excellent, leaving little
danger of misinterpretation. Stress
values between 0.1 and 0.2 are useable but some of the distances will be
misleading.

The metaMDS
ordination results contain sample/site axis scores (NMDS.res$points)
and variable/taxon scores (NMDS.res$species) that you can use to
make a custom plot (as with previous ordinations, the default plot has species
marked by red crosses and samples by open circles, but both are unlabeled).