Someone from an IRC (Internet Relay Chat) channel needed a set of data normalized. All he knew was that it fell loosely into a bell curve (or was supposed to, anyway). With that information, we can take the data set and calculate the equation needed to graph the bell curve. Once we have the equation, we can compare each ‘y’ value in the data with the corresponding value generated by the equation.

Our sample data set for now is:

[0, 0, 0, 0, 0, 0, 6, 13, 15, 18, 20, 19, 17, 15, 16, 13, 11, 11, 9, 7, 4, 2, 0, 0]

(You will find that this is quite a small data set for any kind of accurate statistical curve matching. The resulting curve will appear off due to the few datapoints. You are welcome to try a larger data set. But I digress…)

First we need to know WHAT the equation for a bell curve is. I found this to be the most fitting for normalizing small data sets.

(1 / (sigma * sqrt(2 * pi))) * e^(-(x - mu)^2 / (2 * sigma^2))

Where ‘sigma’ is the standard deviation and ‘mu’ is the mean (simple average) of the data.
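As a rough sketch, the standard bell curve (the normal density) can be written as a small Python function. The name `bell_curve` and its parameter names are my own choices for illustration, not part of the original script.

```python
from math import exp, pi, sqrt


def bell_curve(x, mu, sigma):
    """Normal (Gaussian) density at x for mean mu and standard deviation sigma."""
    # 1 / (sigma * sqrt(2 * pi)) scales the curve so the area under it is 1.
    coefficient = 1 / (sigma * sqrt(2 * pi))
    # The exponent is 0 at x == mu and grows more negative as x moves away from it.
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * exp(exponent)


# The curve peaks at the mean and is symmetric around it.
print(bell_curve(0, 0, 1))
print(bell_curve(-1, 0, 1) == bell_curve(1, 0, 1))
```

A larger sigma flattens and widens the curve, while mu only shifts it left or right.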

So the first step we need to take is to calculate the standard deviation. This is generally found by following 3 steps: Find the mean of the data, use that to find the variance, and then take the square root of the variance. The script below should explain a bit better.

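from math import sqrt

# data is the list of values shown above
simple_mean = sum(data) / len(data)  # mean: total divided by the number of values
squared_differences = []
for i in data:
    # '**' is exponentiation in Python; '^' would be bitwise XOR
    squared_differences.append((i - simple_mean) ** 2)
# sample variance: divide by n - 1
variance = sum(squared_differences) / (len(squared_differences) - 1)
std_deviation = sqrt(variance)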

I trimmed the data set I was given by removing some outliers, in this case the leading and trailing zeros (save for one at each end). Given the small data set and extreme outliers, the curve was far too skewed to be useful (although technically it was correctly normalized). So the new data set I have now is:

[0, 6, 13, 15, 18, 20, 19, 17, 15, 16, 13, 11, 11, 9, 7, 4, 2, 0]
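The trimming here was done by hand, but one possible way to reproduce it in code is sketched below. The helper name `trim_zero_runs` is hypothetical, and the rule it applies (collapse the run of zeros at each end to a single zero) is my reading of the two lists in this post, not a general outlier test.

```python
def trim_zero_runs(values):
    """Collapse the runs of zeros at both ends of `values` to one zero each.

    Assumes `values` contains at least one nonzero entry.
    """
    # Index of the first and last nonzero value.
    first = next(i for i, v in enumerate(values) if v != 0)
    last = len(values) - 1 - next(
        i for i, v in enumerate(reversed(values)) if v != 0
    )
    # Keep one zero before the first nonzero value and one after the last, if present.
    start = max(first - 1, 0)
    end = min(last + 2, len(values))
    return values[start:end]


original = [0, 0, 0, 0, 0, 0, 6, 13, 15, 18, 20, 19, 17, 15, 16, 13, 11, 11, 9, 7, 4, 2, 0, 0]
trimmed = trim_zero_runs(original)
print(trimmed)
```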

Using those numbers I arrived at the following results:

mu ~= 10.9
sigma ~= 6.5
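These values can be cross-checked with Python's built-in `statistics` module. This is only a quick sanity check, assuming the sample standard deviation (dividing by n - 1) is the one intended, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Trimmed data set from this post.
data = [0, 6, 13, 15, 18, 20, 19, 17, 15, 16, 13, 11, 11, 9, 7, 4, 2, 0]

mu = mean(data)      # simple average
sigma = stdev(data)  # sample standard deviation (divides by n - 1)

print(round(mu, 1), round(sigma, 1))  # 10.9 6.5
```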

This gives us the final equation of

(1 / (6.5 * sqrt(2 * pi))) * e^(-(x - 10.9)^2 / (2 * 6.5^2))
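For a quick look at the numbers without opening a graph, the curve with the rounded values mu = 10.9 and sigma = 6.5 can be evaluated using `statistics.NormalDist` from the standard library (Python 3.8 or newer). The sample x values below are arbitrary picks for illustration.

```python
from statistics import NormalDist

# Rounded parameters from this post.
curve = NormalDist(mu=10.9, sigma=6.5)

# Print the height of the curve at a few x values, including the peak at x = mu.
for x in (0, 5, 10.9, 15, 20):
    print(x, round(curve.pdf(x), 4))
```

Note that these heights are densities (the area under the whole curve is 1), so they are much smaller than the raw values in the data set.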

Desmos Link | Interactive Graph

The image above shows a working graph in case the Desmos link fails in the future.

TODO :: Write the script to compare y values. (Part 2 maybe?)

2017-12-15
