- NAME
PCA
- PURPOSE
Carry out a Principal Components Analysis (Karhunen-Loeve Transform)
- EXPLANATION
Results can be directed to the screen, a file, or output variables
See notes below for comparison with the intrinsic IDL function PCOMP.
Harris Geospatial has a video/blog post on using pca.pro at
http://tinyurl.com/h6ky6qy . Also see David Fanning's discussion of
PCA analysis with IDL ( http://www.idlcoyote.com/code_tips/pca.html )
- CALLING SEQUENCE
PCA, data, eigenval, eigenvect, percentages, proj_obj, proj_atr,
[MATRIX =, TEXTOUT = ,/COVARIANCE, /SSQ, /SILENT ]
- INPUT PARAMETERS
data - 2-d data matrix, data(i,j) contains the jth attribute value
for the ith object in the sample. If N_OBJ is the total
number of objects (rows) in the sample, and N_ATTRIB is the
total number of attributes (columns) then data should be
dimensioned N_OBJ x N_ATTRIB.
- OPTIONAL INPUT KEYWORD PARAMETERS
/COVARIANCE - if this keyword is set, then the PCA will be carried out
on the covariance matrix (rare), the default is to use the
correlation matrix
/SILENT - If this keyword is set, then no output is printed
/SSQ - if this keyword is set, then the PCA will be carried out on
on the sums-of-squares & cross-products matrix (rare)
TEXTOUT - Controls print output device, defaults to !TEXTOUT
textout=1 TERMINAL using /more option
textout=2 TERMINAL without /more option
textout=3 .prt
textout=4 laser.tmp
textout=5 user must open file
textout = filename (default extension of .prt)
- OPTIONAL OUTPUT PARAMETERS
eigenval - N_ATTRIB element vector containing the sorted eigenvalues
eigenvect - N_ATRRIB x N_ATTRIB matrix containing the corresponding
eigenvectors
percentages - N_ATTRIB element containing the cumulative percentage
variances associated with the principal components
proj_obj - N_OBJ by N_ATTRIB matrix containing the projections of the
objects on the principal components
proj_atr - N_ATTRIB by N_ATTRIB matrix containing the projections of
the attributes on the principal components
- OPTIONAL OUTPUT PARAMETER
MATRIX = analysed matrix, either the covariance matrix if /COVARIANCE
is set, the "sum of squares and cross-products" matrix if
/SSQ is set, or the (by default) correlation matrix. Matrix
will have dimensions N_ATTRIB x N_ATTRIB
- NOTES
This procedure performs Principal Components Analysis (Karhunen-Loeve
Transform) according to the method described in "Multivariate Data
Analysis" by Murtagh & Heck [Reidel : Dordrecht 1987], pp. 33-48.
See http://www.classification-society.org/csna/mda-sw/pca.f
Keywords /COVARIANCE and /SSQ are mutually exclusive.
The printout contains only (at most) the first seven principle
eigenvectors. However, the output variables EIGENVECT contain
all the eigenvectors
Different authors scale the covariance matrix in different ways.
The eigenvalues output by PCA may have to be scaled by 1/N_OBJ or
1/(N_OBJ-1) to agree with other calculations when /COVAR is set.
PCA uses the non-standard system variables !TEXTOUT and !TEXTUNIT.
These are automatically added if not originally present.
The intrinsic IDL function PCOMP duplicates most of the
functionality of PCA, but uses different conventions and
normalizations. Note the following:
PCOMP requires a N_ATTRIB x N_OBJ input array; this is the transpose
of what PCA expects
PCA uses standardized variables for the correlation matrix: the input
vectors are set to a mean of zero and variance of one and divided by
sqrt(n); use the /STANDARDIZE keyword to PCOMP for a direct comparison.
PCA (unlike PCOMP) normalizes the eigenvectors by the square root
of the eigenvalues.
PCA returns cumulative percentages; the VARIANCES keyword of PCOMP
returns the variance in each variable
PCOMP divides the eigenvalues by (1/N_OBJ-1) when the covariance matrix
is used.
- EXAMPLE
Perform a PCA analysis on the covariance matrix of a data matrix, DATA,
and write the results to a file
IDL> PCA, data, /COVAR, t = 'pca.dat'
Perform a PCA analysis on the correlation matrix. Suppress all
printing, and save the eigenvectors and eigenvalues in output variables
IDL> PCA, data, eigenval, eigenvect, /SILENT
- PROCEDURES CALLED
TEXTOPEN, TEXTCLOSE
- REVISION HISTORY
Immanuel Freedman (after Murtagh F. and Heck A.). December 1993
Wayne Landsman, modified I/O December 1993
Fix MATRIX output, remove GOTO statements W. Landsman August 1998
Changed some index variable to type LONG W. Landsman March 2000
Fix error in computation of proj_atr, see Jan 1990 fix in
http://www.classification-society.org/csna/mda-sw/pca.f W. Landsman Feb 2008
- Constants
- are array elements near-zero ?
- Dispatch table
- Constants
- are array elements near-zero ?
- umber of objects and attributes
- form column-means
- form column-means
- Carry out eigenreduction
- D contains diagonal, E contains off-diagonal
- D contains the eigen-values, A(*,i) -vectors
- Use TOLERANCE to decide if eigenquantities are sufficiently near zero
- Order by increasing eigenvalue
- Eigenvalues expressed as percentage variance and ...
- .. Cumulative percentage variance
- efine returned parameters
- Output eigen-values and -vectors
Open output file
- Obtain projection of row-point on principal axes (Murtagh & Heck convention)
- Use TOLERANCE again...
- Obtain projection of column-points on principal axes
- Use TOLERANCE again...
- scale by square root of eigenvalues...
- Close output file