Sunday, September 27, 2015

Discriminant Function Analysis

We've just finished logistic regression, which uses a set of variables to predict status on a two-category outcome, such as whether college students graduate or don't graduate. What if we wanted to make finer distinctions, say into three categories: graduated, dropped-out, and transferred to another school?

There is an extension of logistic regression, known as multinomial logistic regression, which compares each category with a reference category in a series of pairwise comparisons (e.g., dropped-out vs. graduated, transferred vs. graduated). See the explanatory PowerPoint in the links section to the right.
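To make the "pairwise comparisons" idea concrete, here is a minimal sketch of how log-odds against a reference category can be turned into a probability for each of the three outcomes. The function name and the log-odds values are made up for illustration; a real analysis would estimate them from the predictors.

```python
import math

def category_probabilities(log_odds_vs_reference, reference="graduated"):
    """Convert each category's log-odds vs. the reference into probabilities."""
    # The reference category has log-odds of 0 against itself, so exp(0) = 1.
    exp_terms = {reference: 1.0}
    for category, log_odds in log_odds_vs_reference.items():
        exp_terms[category] = math.exp(log_odds)
    total = sum(exp_terms.values())
    return {category: value / total for category, value in exp_terms.items()}

# Hypothetical log-odds for one student (illustrative numbers, not real data).
probs = category_probabilities({"dropped_out": -1.2, "transferred": -0.4})

for category, p in probs.items():
    print(f"{category}: {p:.3f}")
```

Because both made-up log-odds are negative, this student is more likely to graduate than to drop out or transfer, and the three probabilities sum to 1.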

Discriminant function analysis (DFA) allows you to put all three (or more) groups into one analysis. DFA uses spatial-mathematical principles to map out the three (or more) groups' spatial locations (with each group having a mean or "centroid") on a system of axes defined by weighted combinations of the predictor variables. As a result, you get neat diagrams such as this, this, and this.
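The mapping idea can be sketched in a few lines of plain Python. Everything below is hypothetical: the students, the two predictors (high-school GPA and credit hours), and especially the coefficients, which DFA software would estimate from the data rather than have typed in by hand. The sketch also assigns a new case to the nearest centroid, which is a simplification; real programs typically factor in prior probabilities as well.

```python
import math

# Hypothetical students: (high-school GPA, credit hours per term). Made-up data.
groups = {
    "graduated":   [(3.6, 15), (3.4, 16), (3.8, 14)],
    "dropped_out": [(2.1, 9), (2.4, 10), (1.9, 8)],
    "transferred": [(3.1, 12), (3.3, 11), (2.9, 13)],
}

# Hypothetical coefficients for two discriminant functions (assumed, not estimated).
functions = [
    {"constant": -7.0, "weights": (1.5, 0.2)},
    {"constant": -0.5, "weights": (-1.0, 0.3)},
]

def n_functions(n_groups, n_predictors):
    """Maximum number of functions: groups minus one, capped by predictor count."""
    return min(n_groups - 1, n_predictors)

def scores(case):
    """Composite score on each function: constant plus weighted sum of predictors."""
    return tuple(
        f["constant"] + sum(w * x for w, x in zip(f["weights"], case))
        for f in functions
    )

def centroid(cases):
    """A group's centroid is its mean score on each function."""
    all_scores = [scores(c) for c in cases]
    return tuple(sum(col) / len(col) for col in zip(*all_scores))

centroids = {name: centroid(cases) for name, cases in groups.items()}

def nearest_group(case):
    """Assign a case to the group whose centroid is closest in function space."""
    point = scores(case)
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))

print(n_functions(len(groups), 2))
print(centroids)
print(nearest_group((3.5, 15)))
```

Plotting each group's centroid on the two functions would give the kind of "map" seen in the linked diagrams.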

DFA, like statistical modeling in general, generates a somewhat oversimplified solution that is accurate for a large proportion of cases, but has some error. An example can be seen in this document (see Figure 4). Classification accuracy is one of the statistics one receives in DFA output.
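Classification accuracy is simple to compute once you have the table of actual versus predicted group membership. The counts below are invented purely to show the arithmetic; they do not come from the linked document or any real study.

```python
# Hypothetical classification table: rows are actual groups, columns are
# the groups the analysis predicted. All counts are made up.
table = {
    "graduated":   {"graduated": 40, "dropped_out": 3,  "transferred": 7},
    "dropped_out": {"graduated": 4,  "dropped_out": 20, "transferred": 6},
    "transferred": {"graduated": 8,  "dropped_out": 2,  "transferred": 10},
}

# Correctly classified cases sit on the diagonal (actual == predicted).
correct = sum(table[group][group] for group in table)
total = sum(sum(row.values()) for row in table.values())
overall_accuracy = correct / total

# Accuracy within each actual group shows where the errors concentrate.
per_group_accuracy = {
    group: row[group] / sum(row.values()) for group, row in table.items()
}

print(f"Overall: {overall_accuracy:.0%}")
for group, acc in per_group_accuracy.items():
    print(f"{group}: {acc:.0%}")
```

With these made-up counts, 70 of 100 cases are classified correctly overall, but the transferred group is classified correctly only half the time, the kind of detail a single overall percentage can hide.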

(A solution that would be accurate for all cases might be popular, but wouldn't be useful. As Nate Silver writes in his book The Signal and the Noise, you would have "an overly specific solution to a general problem. This is overfitting, and it leads to worse predictions" [p. 163].)

The axes, known as canonical discriminant functions, are defined in the structure matrix, which shows correlations between your predictor variables and the functions. An example appears in this document dealing with classification of obsidian archaeological finds (see Figure 7-17 and Table 7-18). A warning: Archaeology is a career that often ends in ruins!
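For a feel of what a structure-matrix entry is, here is a rough sketch that correlates each predictor with the scores on one discriminant function. It assumes the pooled within-groups version of the correlation (deviations are taken from each group's own mean before pooling), which is my understanding of what packages such as SPSS report. The data and the coefficients are hypothetical.

```python
import math

# Hypothetical students: (high-school GPA, credit hours per term). Made-up data.
groups = {
    "graduated":   [(3.6, 15), (3.2, 16), (3.8, 17)],
    "dropped_out": [(2.0, 9), (2.4, 11), (1.9, 10)],
    "transferred": [(3.0, 12), (3.3, 14), (2.8, 13)],
}
weights = (1.5, 0.2)  # assumed coefficients for a single function

def within_group_deviations(values_by_group):
    """Subtract each group's own mean, then pool all the deviations."""
    pooled = []
    for values in values_by_group:
        mean = sum(values) / len(values)
        pooled.extend(v - mean for v in values)
    return pooled

def pooled_within_correlation(x_by_group, y_by_group):
    """Correlation computed from deviations around each group's mean."""
    dx = within_group_deviations(x_by_group)
    dy = within_group_deviations(y_by_group)
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Function scores for every case, kept in the same group order as the data.
scores_by_group = [
    [sum(w * x for w, x in zip(weights, case)) for case in cases]
    for cases in groups.values()
]

structure = {}
for index, name in enumerate(("gpa", "credit_hours")):
    predictor_by_group = [[case[index] for case in cases] for cases in groups.values()]
    structure[name] = pooled_within_correlation(predictor_by_group, scores_by_group)

print(structure)
```

A predictor with a large correlation (in absolute value) is one that the function "is about," which is how the axes in the diagrams get their interpretation.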

[The presence of groups and coefficients may remind you of MANOVA. According to lecture notes from Andrew Ainsworth, "MANOVA and discriminant function analysis are mathematically identical but are different in terms of emphasis. [Discriminant] is usually concerned with actually putting people into groups (classification) and testing how well (or how poorly) subjects are classified. Essentially, discrim is interested in exactly how the groups are differentiated not just that they are significantly different (as in MANOVA)."]

The following article illustrates a DFA with a mainstream HDFS topic:

Hazan, C., & Shaver, P. R. (1987). Romantic love conceptualized as an attachment process. Journal of Personality and Social Psychology, 52, 511-524.

Finally, this video and this document explain how to implement and interpret DFA in SPSS. And here's our latest song...

Discriminant! 
Lyrics by Alan Reifman
May be sung to the tune of “Notorious” (LeBon/Rhodes/Taylor for Duran Duran)

Disc-disc-discriminant, discriminant!
Disc-disc-discriminant!

(Funky bass groove)

You’ve got multiple groups, all made from categories,
To predict membership, IV’s can tell their stories,
A technique, you can use,
It’s called discriminant -- the results are imminent,
You get an equation, for who belongs in the sets,

Number of functions, you subtract one, from sets,
To form the functions, you get the coefficients,
These weight the IV’s, to yield a composite score,
These scores determine, how it sorts the people,
That’s how, discriminant runs,

Disc-disc...

You can see in a graph, how all the groups are deployed,
Each group has a home base, which is known, as a “centroid,”
Weighted IV’s on axes, how you keep track -- it's just like, you're reading a map,
See how each group differs, from all the other ones there,

Number of functions, you subtract one, from sets,
To form the functions, you get the coefficients,
These weight the IV’s, to yield a composite score,
These scores determine, how it sorts the people,
That’s how, discriminant runs,

Disc-
Disc-disc...

(Brief interlude)

Discriminant,

Number of functions, you subtract one, from sets,
To form the functions, you get the coefficients,
These weight the IV’s, to yield a composite score, 
These scores determine, how it sorts the people,

Number of functions, you subtract one, from sets,
To form the functions, you get the coefficients,
These weight the IV’s, to yield a composite score,
These scores determine, how it sorts the people,
That’s how, discriminant runs,

Disc-discriminant,
Disc-Disc,
That’s how, discriminant runs,

Disc-
Yeah, that’s how, discriminant runs,

Disc-Disc,

(Sax improvisation)

Yeah...That’s how, discriminant runs,
Disc-discriminant,
Disc-disc-discriminant,
That’s how, discriminant runs,
Disc-discriminant,
Disc-disc-discriminant...