Since writing my earlier sample statistics blog post I’ve learned a few things and thought I’d provide an update. Firstly, a quick recap.

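For a population of size $N$, the population variance is defined by

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

where

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

and for a sample of size $n$, the sample variance is defined by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

where

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$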

The divisor $n-1$ is used instead of $n$ (the sample size) because that’s what makes the **average** sample variance equal to the population variance when the samples are taken with replacement (as discussed in detail in my first sample statistics blog post).
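The with-replacement claim can be checked exactly on a small example. The sketch below is only an illustration: the population `[1, 2, 4, 7]`, the sample size of 2, and the function names are my own assumptions, not anything from the earlier post. It enumerates every equally likely ordered sample drawn with replacement and averages the sample variance, using exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def population_variance(values):
    """Variance with divisor N, the population size."""
    N = len(values)
    mu = Fraction(sum(values), N)
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(sample):
    """Variance with divisor n - 1, one less than the sample size."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)


def mean_sample_variance_with_replacement(population, n):
    """Average s^2 over all N**n equally likely ordered samples."""
    samples = list(product(population, repeat=n))
    return sum(sample_variance(s) for s in samples) / len(samples)


population = [1, 2, 4, 7]  # hypothetical example population
print(population_variance(population))                       # 21/4
print(mean_sample_variance_with_replacement(population, 2))  # 21/4
```

The two printed values agree, which is what the $n-1$ divisor is designed to achieve. Replacing `(n - 1)` with `n` in `sample_variance` makes the average come out too small.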

What I didn’t consider in the previous blog post was the case where the samples are taken without allowing repetition (which is often the way that real-life sampling is done). I didn’t because at the time I didn’t know how to perform the relevant analysis. Since then I’ve figured it out (and, as far as I know, it’s not explained elsewhere). It turns out that the divisor isn’t $n-1$, and it’s not $n$ either. With $N$ the population size and $n$ the sample size, it’s

$$\frac{N(n-1)}{N-1}$$

Here’s how to see that, together with some ideas that I think are conceptually helpful when dealing with these matters.
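Before the derivation, the without-replacement claim can be checked exactly on a small example. This is a sketch under my own assumptions: the population `[1, 2, 4, 7]`, the sample size of 2, and the function names are purely illustrative, and the divisor tested is $N(n-1)/(N-1)$, with $N$ the population size and $n$ the sample size. The code enumerates every equally likely sample of distinct population members, averages the sum of squared deviations from the sample mean, and divides by each candidate divisor in turn.

```python
from fractions import Fraction
from itertools import combinations


def population_variance(values):
    """Variance with divisor N, the population size."""
    N = len(values)
    mu = Fraction(sum(values), N)
    return sum((x - mu) ** 2 for x in values) / N


def sum_of_squares(sample):
    """Sum of squared deviations from the sample mean (no divisor yet)."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


def mean_sum_of_squares_without_replacement(population, n):
    """Average sum of squares over all C(N, n) equally likely samples."""
    samples = list(combinations(population, n))
    return sum(sum_of_squares(s) for s in samples) / len(samples)


population = [1, 2, 4, 7]  # hypothetical example population
N, n = len(population), 2
mean_ss = mean_sum_of_squares_without_replacement(population, n)

print(population_variance(population))        # 21/4
print(mean_ss / n)                            # 7/2, too small
print(mean_ss / (n - 1))                      # 7, too large
print(mean_ss / Fraction(N * (n - 1), N - 1)) # 21/4, matches
```

Only the last divisor reproduces the population variance. Because `combinations` picks elements by position, the check also works for populations that contain repeated values.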