Something I've been working on has me calculating running averages over data arriving into a system. I don't know in advance how many data points there will be, which complicates the calculation. The operation needs to work for both floating-point and integer values.
The floating point solution turned out to be fairly simple:
```java
double average = 0d;
int count = 0;

// when a number comes in, do this:
average += (new_number - average) / ++count;
```
Here's how it works: when a new number comes in, it pulls the average slightly higher or lower, and the expression `(new_number - average) / ++count` is exactly that pull. Since `count` keeps growing, numbers arriving later have less and less influence on the average unless they are far from it (as it should be).
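The update above can be sketched as a small class (the class and method names are mine, for illustration):

```java
// Incremental floating-point average: each new sample nudges the
// running mean toward itself by its distance from the mean,
// divided by the new count.
public class FloatAverage {
    private double average = 0d;
    private int count = 0;

    public void add(double newNumber) {
        average += (newNumber - average) / ++count;
    }

    public double get() {
        return average;
    }

    public static void main(String[] args) {
        FloatAverage avg = new FloatAverage();
        for (double x : new double[] {10, 20, 30}) {
            avg.add(x);
        }
        System.out.println(avg.get()); // 20.0, the true mean of 10, 20, 30
    }
}
```

Note that no running sum is kept, so this also sidesteps the overflow you would eventually hit by summing everything and dividing at the end.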
The integer solution looks different even though it is essentially the same thing. What complicates the problem is that I want to avoid floating-point math if at all possible. There are several different computations that would work, but here is the one that seemed to track the true average most closely:
```java
long average = 0, remainder = 0;
int count = 0;

// when a number comes in, do this:
count++;
long delta = new_number - average + remainder;
average += delta / count;
remainder = delta % count;
```
Here is how it works: since I cannot add fractional pieces to a whole number as they arrive (as I did in the floating-point example), those fractional chunks are tracked in `remainder`. Eventually, if the inputs are fairly uniform, enough delta builds up in `remainder` to add or subtract one from the average; it happens right away if the new number is significantly higher or lower than the average.
I wanted to verify the correctness of the algorithm, so I wrote a simulator. I found that early on the average varies a bit from the true average, but this variance goes away as `count` increases. To compensate, I keep track of the sum of all incoming numbers and calculate the average the normal way (sum divided by count) until either `count` or the sum exceeds a predetermined threshold.
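The post doesn't show the hybrid code, but one way it might look is the following sketch; the threshold value and names here are arbitrary, not from the original:

```java
// Hybrid: keep an exact sum while count is small (where the
// incremental form varies most from the true average), then switch
// to the incremental update. THRESHOLD is an arbitrary cutoff.
public class HybridAverage {
    private static final int THRESHOLD = 1000;
    private long sum = 0;
    private long average = 0;
    private long remainder = 0;
    private int count = 0;

    public void add(long newNumber) {
        count++;
        if (count <= THRESHOLD) {
            // exact phase: plain sum / count
            sum += newNumber;
            average = sum / count;
            remainder = sum % count;
        } else {
            // incremental phase: carries on seamlessly, because
            // average * count + remainder == sum at the switch point
            long delta = newNumber - average + remainder;
            average += delta / count;
            remainder = delta % count;
        }
    }

    public long get() {
        return average;
    }
}
```

A sum threshold could be added alongside the count threshold to force the switch before `sum` risks overflowing a `long`.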