Central Limit Theorem

Statistics: A major concept in building ML (Source: https://blog.capterra.com/)

We often face situations where we do not know whether the data we have is normally distributed. In those cases it helps to have a theorem that gives us some freedom: whatever distribution the data comes from, a suitably constructed quantity still ends up approximately normally distributed.

The central limit theorem applied to any distribution (Source: Wikipedia)

The central limit theorem (CLT) helps here. It says that when we add up independent random variables and properly normalize the sum, the distribution of that normalized sum tends toward a normal distribution as the number of variables grows, even if the individual variables are not normally distributed. This is significant because it implies that statistical methods and concepts that work for normal distributions can also be applied to problems involving other types of distributions.
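For readers who want the statement in symbols, here is a sketch of the classical (Lindeberg-Lévy) form of the theorem. The assumptions that the variables are identically distributed and have a finite variance, and the symbols $X_i$, $\mu$, $\sigma$, $n$ and $\bar{X}_n$, are not spelled out in the text above; they are added here for illustration.

```latex
% Assumption: X_1, ..., X_n are independent and identically distributed,
% with mean \mu and finite variance \sigma^2 > 0.
\[
  \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i ,
  \qquad
  \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}
  \;\xrightarrow{\,d\,}\; \mathcal{N}(0, 1)
  \quad \text{as } n \to \infty .
\]
```

Here "properly normalized" means subtracting the mean $\mu$ and dividing by $\sigma / \sqrt{n}$, the standard deviation of the sample mean.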

Normally Distributed Histogram (Source: https://www.lsssimplified.com/normal-distribution-for-lean-six-sigma/)

So what actually happens in the CLT? A simple process flow makes it easy to see. Suppose we are given a distribution, say a uniform distribution, where every value within a certain range is equally likely to be selected. We draw a random sample of 20 values from this distribution, calculate the mean of that sample, and plot it on a histogram on a separate graph. With only one mean, the histogram is not very impressive. As we collect more samples of 20 values, compute the mean of each, and add those means to the histogram, it starts to look more interesting. As we keep increasing the number of samples, the histogram of the sample means takes on the bell shape of a normal distribution.
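The experiment described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a definitive implementation: the range Uniform(0, 1), the number of samples (10,000), the seed, and the variable names are all assumptions chosen for the example. Only the sample size of 20 comes from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable (assumption)

sample_size = 20      # values drawn per sample, as in the text
num_samples = 10_000  # how many samples (and therefore means) we collect (assumption)

# Each row is one sample of 20 values from Uniform(0, 1).
samples = rng.uniform(low=0.0, high=1.0, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)  # one mean per sample

# Theory for Uniform(0, 1): mean 0.5 and variance 1/12, so the sample mean
# has standard deviation sqrt((1/12) / sample_size).
expected_std = np.sqrt(1.0 / 12.0 / sample_size)

# For a normal distribution about 68% of values fall within one standard
# deviation of the mean, so the sample means should come close to that.
within_one_std = np.mean(np.abs(sample_means - 0.5) <= expected_std)

print(f"mean of sample means: {sample_means.mean():.4f}")
print(f"std of sample means:  {sample_means.std():.4f} (theory: {expected_std:.4f})")
print(f"fraction within one std: {within_one_std:.3f}")

# A rough text histogram of the sample means; it should look bell-shaped.
counts, edges = np.histogram(sample_means, bins=15)
for count, left_edge in zip(counts, edges[:-1]):
    bar = "#" * int(60 * count / counts.max())
    print(f"{left_edge:5.2f} | {bar}")
```

Lowering `num_samples` to a handful reproduces the "not very impressive" histogram from the start of the experiment, and raising it shows the bell shape filling in.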

We can repeat this experiment for other kinds of distributions, such as the exponential distribution. It turns out that, as long as the distribution has a finite variance, it does not matter what distribution we start with: if we collect enough reasonably large samples from it, the sample means will be approximately normally distributed.
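To make the exponential case concrete, the sketch below compares the skewness of the raw exponential data with the skewness of the sample means. Skewness is 0 for a symmetric distribution such as the normal, so a value near 0 suggests the means have lost most of the lopsided shape of the original data. The helper `skewness`, the sample size of 50, the number of samples, and the seed are assumptions made for this illustration.

```python
import numpy as np

def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    centered = values - values.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

rng = np.random.default_rng(seed=1)  # fixed seed for repeatability (assumption)

sample_size = 50     # values per sample (assumption)
num_samples = 5_000  # number of samples, hence number of means (assumption)

# Each row is one sample from an exponential distribution with mean 1.
samples = rng.exponential(scale=1.0, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)

raw_skew = skewness(samples.ravel())  # the raw data is strongly right-skewed
means_skew = skewness(sample_means)   # the means are much closer to symmetric

print(f"skewness of raw exponential data: {raw_skew:.2f}")
print(f"skewness of the sample means:     {means_skew:.2f}")
```

The skewness of the means does not drop to exactly 0 at this sample size; it shrinks as `sample_size` grows, which is the "tends toward" part of the theorem.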

So the final question is: what is the practical implication of knowing that the means are normally distributed?

Small concept brings great impact on learning (Source: https://www.facebook.com/pg/STBILeaders/photos/)

When we run an experiment, we often do not know which distribution our data comes from. The CLT provides an immediate answer: because the sample means are approximately normally distributed, we do not need to worry much about the distribution the data came from, provided the sample is large enough. We can then use this normal distribution to perform t-tests, build confidence intervals, and run ANOVA.
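As one example of this in practice, the sketch below builds an approximate 95% confidence interval for a mean using the normal approximation, then checks how often that interval contains the true mean when the data is actually exponential. The function `mean_confidence_interval`, the multiplier 1.96 (the normal quantile; a t-quantile would give a slightly wider interval), the true mean of 2.0, and the experiment sizes are assumptions chosen for illustration.

```python
import numpy as np

def mean_confidence_interval(data, z=1.96):
    """Approximate 95% interval for the mean, relying on the CLT.

    z=1.96 is the normal quantile for 95% coverage (assumption: the sample
    is large enough for the normal approximation to be reasonable).
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean()
    standard_error = data.std(ddof=1) / np.sqrt(data.size)
    return mean - z * standard_error, mean + z * standard_error

rng = np.random.default_rng(seed=2)  # fixed seed for repeatability (assumption)

true_mean = 2.0          # mean of the exponential distribution we simulate
num_experiments = 2_000  # how many times we repeat the experiment
sample_size = 100        # observations per experiment

covered = 0
for _ in range(num_experiments):
    # The data is not normal, but the interval is built as if the mean were.
    data = rng.exponential(scale=true_mean, size=sample_size)
    low, high = mean_confidence_interval(data)
    covered += int(low <= true_mean <= high)

coverage = covered / num_experiments
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

If the normal approximation is doing its job, the printed fraction should land near 0.95 even though no individual data set is normally distributed.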

To summarize, we have seen how simple the CLT is, and yet how effectively it supports further statistical analysis just by performing a simple operation, taking the mean, over samples from a given distribution.