Python for Data Science - Explanatory factor analysis

Chapter 5 - Dimensionality Reduction Methods

Segment 1 - Explanatory factor analysis

Factor Analysis

A method that explores a dataset to find the underlying root causes that explain why the data behaves the way it does

Factors (or latent variables): meaningful variables that are inferred from the data rather than directly observed
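To make the idea of a latent variable concrete, here is a minimal simulation sketch. It is not part of the original notes: the variable names, the loadings (0.9 and 0.8), and the noise scale (0.3) are all made up for illustration. It shows how one unobserved factor can induce a strong correlation between two observed variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical latent factor: in real data this is never observed directly.
latent = rng.normal(size=n)

# Two observed variables, each driven by the latent factor plus independent noise.
# The loadings (0.9, 0.8) and the noise scale (0.3) are illustrative values only.
x1 = 0.9 * latent + 0.3 * rng.normal(size=n)
x2 = 0.8 * latent + 0.3 * rng.normal(size=n)

# x1 and x2 never reference each other, yet they end up strongly correlated
# because they share the same underlying factor.
observed_corr = np.corrcoef(x1, x2)[0, 1]
print(f"correlation between x1 and x2: {observed_corr:.2f}")
```

Factor analysis works in the opposite direction: it starts from correlated observed variables and tries to infer the factors behind them.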

Factor Analysis Assumptions

  • Features are metric
  • Features are continuous or ordinal
  • Features in your dataset are correlated with r > 0.3
  • You have more than 100 observations and more than 5 observations per feature
  • Sample is homogeneous
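The sample-size and correlation assumptions above can be checked quickly before fitting. The sketch below does this for the Iris data. It takes r > 0.3 to mean that each feature has an absolute Pearson correlation above 0.3 with at least one other feature, which is one reasonable reading of the rule rather than the only one.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Sample-size checks: more than 100 observations, more than 5 per feature
n_obs, n_features = df.shape
enough_rows = n_obs > 100 and n_obs / n_features > 5

# Pearson correlation matrix; mask the diagonal so a feature is not
# compared with itself
corr = df.corr()
off_diag = corr.abs().where(~np.eye(n_features, dtype=bool))

# Strongest absolute correlation each feature has with any other feature
max_abs_corr = off_diag.max()
has_correlated_partner = max_abs_corr > 0.3

print(f"observations: {n_obs}, per feature: {n_obs / n_features:.1f}")
print(f"sample size OK: {enough_rows}")
print(max_abs_corr.round(2))
print(f"every feature passes r > 0.3: {bool(has_correlated_partner.all())}")
```

This check does not cover the homogeneity assumption, which depends on how the sample was collected.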

The Iris Dataset

Iris flowers (labels):

  • Setosa
  • Versicolour
  • Virginica

Attributes (predictive features):

  • Sepal length
  • Sepal width
  • Petal length
  • Petal width

Factor Loading

  • ~ -1 or 1 = The factor has a strong influence on the variable
  • ~ 0 = The factor has a weak influence on the variable
  • > 1 = The factors are highly correlated

import pandas as pd
import numpy as np

import sklearn
from sklearn.decomposition import FactorAnalysis

from sklearn import datasets

Factor analysis on the Iris dataset

iris = datasets.load_iris()

X = iris.data
variable_names = iris.feature_names

X[0:10,]
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])
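
# Fit with the default n_components=None, which keeps one factor per feature (four here)
factor = FactorAnalysis().fit(X)

# Each row of components_ holds one factor's loadings on the four features
DF = pd.DataFrame(factor.components_, columns=variable_names)
print(DF)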
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0           0.706989         -0.158005           1.654236           0.70085
1           0.115161          0.159635          -0.044321          -0.01403
2          -0.000000          0.000000           0.000000           0.00000
3          -0.000000          0.000000           0.000000          -0.00000
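In the output above, only the first two rows carry non-zero loadings, and one loading (petal length on factor 0) exceeds 1 because the model was fitted on the raw centimetre values. A possible follow-up, sketched below, standardizes the features first and keeps only two factors. Neither step is in the original notes: `StandardScaler`, `n_components=2`, and the row labels are assumptions added for illustration.

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Assumption: rescale each feature to zero mean and unit variance so the
# loadings are on a comparable scale across features.
X_std = StandardScaler().fit_transform(iris.data)

# Assumption: keep two factors, matching the two non-zero rows seen above.
factor = FactorAnalysis(n_components=2, random_state=0).fit(X_std)

# Label rows by factor so the table is easier to read
loadings = pd.DataFrame(
    factor.components_,
    index=["factor 1", "factor 2"],
    columns=iris.feature_names,
)

# Sum of squared loadings per feature. On standardized data this roughly
# indicates how much of each feature's variance the factors account for.
communality = (loadings ** 2).sum(axis=0)

print(loadings.round(3))
print(communality.round(3))
```

Loadings close to -1 or 1 in this table can then be read as a strong influence of that factor on the feature, and loadings near 0 as a weak one.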
posted @ 2021-01-24 15:21  晨风_Eric