If outliers are to be considered (and they are in fact based on distance from the median, not distance from the mean), the fairest way would need to factor in percentiles. Even in statistics there is no universal rule that states exactly where the border lies between an outlier and a non-outlier, but any such rule would need to involve percentiles. In the absence of a universal rule, the standard (if arbitrary) convention is to treat a value as an outlier when it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile.
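
To make the 1.5 times IQR convention concrete, here is a minimal Python sketch. The function name `find_outliers`, the sample ratings, and the choice of the "inclusive" quartile method are my own assumptions for illustration; different quartile methods can put the fences in slightly different places.

```python
import statistics
from typing import List, Sequence


def find_outliers(ratings: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the ratings that fall outside the k * IQR fences.

    Hypothetical helper: k = 1.5 is the common convention, not a law.
    """
    # Quartiles via the standard library; "inclusive" treats the data
    # as the whole population rather than a sample of a larger one.
    q1, _median, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [r for r in ratings if r < lower_fence or r > upper_fence]


# Made-up ratings for one quest: six fairly consistent votes and one very low one.
example_ratings = [1, 7, 8, 8, 9, 9, 10]
print(find_outliers(example_ratings))
```

With these made-up numbers the quartiles are 7.5 and 9, so the fences sit at 5.25 and 11.25 and only the rating of 1 is flagged.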

That much is a given, and I don't think that's where the issue lies. I think the other problem is the way the ratings are being used. Of course I did not ask every member how they factor in the ratings when browsing the database, but from a statistical point of view they are averaged in order to be comparable to each other. However, I cannot really compare an average from a sample of five reviews with an average from a sample of twenty reviews. They are different things, and I cannot say that quest A is better or worse than quest B based on these two averages. At about N=30 we can assume that the averages are approximately normally distributed (after having taken care of outliers), and *then* we can compare them with each other in a meaningful way. There's no getting around this problem, because the community is not big enough to gather that much data on all quests, at which point I wonder whether having the rating system in place is useful at all. It *looks* like it is saying something useful, but it isn't. So in a way it is even somewhat deceptive.
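
As a rough illustration of why averages from five and from twenty reviews are not on equal footing, the sketch below computes the standard error of each average. The name `mean_with_margin` and the ratings are made up for the example, and the 1.96 multiplier is a normal approximation that is itself shaky for samples this small (a t-based margin would be wider still), so read the numbers as indicative only.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_margin(ratings: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (average, rough 95% margin) for a list of ratings.

    Hypothetical helper: margin = z * standard error of the mean,
    where the standard error is sample stdev / sqrt(N).
    """
    n = len(ratings)
    average = statistics.mean(ratings)
    standard_error = statistics.stdev(ratings) / math.sqrt(n)
    return average, z * standard_error


# Made-up data: both quests average 8, but one has 5 reviews and the other 20.
quest_a = [6, 7, 8, 9, 10]
quest_b = quest_a * 4

for name, ratings in (("quest A", quest_a), ("quest B", quest_b)):
    average, margin = mean_with_margin(ratings)
    print(f"{name}: N={len(ratings)}, average={average:.2f}, margin=+/-{margin:.2f}")
```

Both made-up quests show the same average, but the margin around the five-review average comes out at roughly twice that of the twenty-review one, which is the part a bare average hides.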

