As a recent convert to the merits of dimensionality reduction and noise filtering, I can't keep myself from preaching the good word of PCA.

Principal Component Analysis, like many ML techniques, has a self-descriptive name: we take the principal (or most important) components of the data we are given. PCA is therefore, first and foremost, a dimensionality reduction algorithm.

PCA is also used for noise reduction, feature extraction and feature engineering.



While it is entirely possible to learn ML with only a superficial understanding of mathematical concepts such as linear algebra, developing a deep understanding of them is paramount if you want to truly grasp the ins and outs of ML.

Higher-dimensional data can be reduced to a lower dimensionality by zeroing out the least important principal components. The aim is to do this while preserving as much of the data's variance as possible.
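To make the idea concrete, here is a minimal NumPy-only sketch, assuming mean-centred data and an SVD-based PCA: it projects the data onto the k leading principal components and reports the share of the variance those components keep. The helper `pca_reduce` and the random toy data are hypothetical, for illustration only.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X onto its k leading principal components (NumPy-only sketch)."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (len(X) - 1)               # variance along each principal axis, largest first
    # Keeping only the first k coordinates is the same as zeroing the remaining components
    Z = Xc @ Vt[:k].T
    retained = variances[:k].sum() / variances.sum()
    return Z, retained

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples of 5 correlated features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Z, retained = pca_reduce(X, k=2)
print(Z.shape, f"{retained:.1%} of the variance kept")
```

Because the principal axes come out sorted by variance, the first k of them keep at least as much variance as any other choice of k directions, including any k of the original features.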

In this case, we opted for two principal components and reduced the data to a two-dimensional dataframe.

In doing this, the model learns some quantities from the data, most importantly the ‘components’ and the ‘explained variance’.

Both the components and the explained variance can then be read from the fitted model.
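As a sketch, assuming scikit-learn's `PCA` class and pandas (the toy data and the column names `PC1`/`PC2` are hypothetical), reducing a dataset to two components and reading off the learned quantities could look like this:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical toy data: 100 samples, 4 correlated features
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

pca = PCA(n_components=2)            # keep two principal components
X_2d = pca.fit_transform(X)          # fit the model, then project the data onto the components
df_2d = pd.DataFrame(X_2d, columns=["PC1", "PC2"])

print(pca.components_)               # shape (2, 4): one unit-length principal axis per row
print(pca.explained_variance_)       # variance captured along each axis, largest first
print(df_2d.head())
```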

It is worth noting, however, that PCA does not simply discard elements of a dataset. Such naive dimensionality reduction is avoided because it can distort the data. Instead, PCA chooses optimal basis functions such that adding up just a few of them is enough to reconstruct the bulk of the dataset, which gives a substantial reduction in dimensionality with little loss of information.
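A quick way to see why PCA beats naively discarding features is to compare reconstruction errors. The sketch below assumes scikit-learn and hypothetical toy data driven by two hidden factors; it contrasts keeping two raw features (filling the rest with their column means) with keeping two principal components and mapping back via `inverse_transform`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical correlated data: 6 features driven by 2 hidden factors plus a little noise
factors = rng.normal(size=(300, 2))
X = factors @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(300, 6))

k = 2
# Naive reduction: keep the first k raw features, replace the others by their means
naive = np.tile(X.mean(axis=0), (len(X), 1))
naive[:, :k] = X[:, :k]

# PCA: keep k components, then reconstruct in the original feature space
pca = PCA(n_components=k).fit(X)
recon = pca.inverse_transform(pca.transform(X))

naive_err = np.mean((X - naive) ** 2)
pca_err = np.mean((X - recon) ** 2)
print(f"naive MSE {naive_err:.4f} vs PCA MSE {pca_err:.4f}")
```

For the same number of retained dimensions, the PCA error can never exceed the naive one, because PCA's projection is the best rank-k approximation of the centred data in the least-squares sense.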


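To decide how many components to keep, think of the cumulative explained variance ratio as a function of the number of components (or, conversely, of the number of components needed to reach a given cumulative variance).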

The explained_variance_ratio tells us what fraction of the total variance each component accounts for, revealing the components with the most variance, those with the least, and possibly some with none at all.
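As an illustration, assuming scikit-learn: in the hypothetical data below the third feature is exactly the sum of the first two, so one direction carries no variance at all, and `explained_variance_ratio_` shows it as a (numerically) zero entry.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
a = rng.normal(scale=3.0, size=500)
b = rng.normal(scale=1.0, size=500)
# The third feature is a linear combination of the first two, so it adds no new variance
X = np.column_stack([a, b, a + b])

pca = PCA().fit(X)                      # keep all components
ratio = pca.explained_variance_ratio_   # fraction of the total variance per component, largest first
print(ratio)
```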

We just need to make a line plot with the cumulative explained variance on the y-axis and the number of components on the x-axis. Such a plot puts into perspective how little some parts of the provided data contribute.
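A sketch of that curve, assuming scikit-learn and its bundled handwritten-digits dataset (`load_digits`, 64 pixel features) as stand-in data. Matplotlib is only needed for the optional helper `plot_cumulative`, a hypothetical name, and is imported inside it so the rest runs without it.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                         # 1797 images, 64 pixel features each
pca = PCA().fit(X)                             # keep all components
cumulative = np.cumsum(pca.explained_variance_ratio_)

def plot_cumulative(cumulative):
    """Line plot: number of components on x, cumulative explained variance on y."""
    import matplotlib.pyplot as plt            # only required when actually plotting
    components = np.arange(1, len(cumulative) + 1)
    plt.plot(components, cumulative)
    plt.xlabel("number of components")
    plt.ylabel("cumulative explained variance")
    plt.show()
```

Calling `plot_cumulative(cumulative)` draws the curve.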

In practice, you estimate the number of components needed by checking where the curve flattens off: beyond that point, each additional component adds little to no explained variance.
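Reading the flattening point off a plot is a judgement call, so a common numeric stand-in is a variance threshold. The sketch below assumes scikit-learn, the digits data again, and a 95% cutoff chosen purely for illustration; `gains` shows how little each extra component adds once the curve has flattened.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 95%
k_95 = int(np.argmax(cumulative >= 0.95)) + 1
# Variance gained by each additional component; close to zero where the curve is flat
gains = np.diff(cumulative)
print(k_95, gains[-5:])
```

scikit-learn can also make this choice itself: passing a float between 0 and 1 as `n_components` (with `svd_solver="full"`) keeps the smallest number of components whose explained variance exceeds that fraction.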


The idea behind using PCA for noise filtering is that any component whose variance is much larger than that of the noise will remain relatively undistorted by it.
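To see the effect, here is a sketch under stated assumptions: scikit-learn, and hypothetical data whose signal lives in a 2-D subspace of a 20-D space, plus isotropic Gaussian noise with standard deviation 0.7. The two signal components stand well above the noise floor, while the remaining components have variances around the noise variance of 0.49.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n, d = 500, 20
# Hypothetical signal: standard deviation 3 along each of two orthonormal directions
basis, _ = np.linalg.qr(rng.normal(size=(d, 2)))     # 20x2 matrix with orthonormal columns
clean = (3.0 * rng.normal(size=(n, 2))) @ basis.T
noisy = clean + 0.7 * rng.normal(size=(n, d))        # isotropic noise, variance 0.49 per feature

variances = PCA().fit(noisy).explained_variance_     # sorted, largest first
print(np.round(variances, 2))
```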

In practice, we fit PCA so that it keeps just enough components to retain half (or slightly more) of the variance.

The fitted model then tells us how many components that required.

Now we filter out the noise by reducing the data to those components and then applying the inverse transformation to reconstruct the filtered data.
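Putting those steps together, a minimal sketch assuming scikit-learn and the same kind of hypothetical low-rank-plus-noise data: a float `n_components=0.50` keeps enough components to explain more than half of the variance, `n_components_` reports how many that was, and `transform` followed by `inverse_transform` yields the filtered data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n, d = 500, 20
# Hypothetical data: a 2-D signal embedded in 20-D space, plus isotropic Gaussian noise
basis, _ = np.linalg.qr(rng.normal(size=(d, 2)))
clean = (3.0 * rng.normal(size=(n, 2))) @ basis.T
noisy = clean + 0.7 * rng.normal(size=(n, d))

# Keep just enough components to explain more than 50% of the variance
pca = PCA(n_components=0.50, svd_solver="full").fit(noisy)
print(pca.n_components_)                     # how many components were kept

reduced = pca.transform(noisy)               # project onto the kept components
filtered = pca.inverse_transform(reduced)    # map back to the original 20-D space

print(np.mean((noisy - clean) ** 2), np.mean((filtered - clean) ** 2))
```

Because the discarded components contain almost nothing but noise, the reconstruction lands much closer to the clean signal than the noisy input does.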

I am an aspiring Data Scientist who enjoys talking about the topic.