# NavList:

## A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding

**Re: Rejecting outliers**

**From:** Peter Hakel

**Date:** 2011 Jan 4, 12:46 -0800

I think we are in general agreement. Eq2 is indeed a "kludge" which I only adopted due to the absence of the statistically proper sigmas. If there is a better, more rigorous version of Eq2, I would certainly work with it. My initial value for "Scatter" is 0.1', a semi-educated guess for the best-case scenario. "Scatter" should not be less than 0.1' (higher for lower-quality sextants) and can certainly be higher yet if difficult conditions affect the data spread even more.

As you pointed out, an indicator of a good fit is chi_squared = N - M, where N is the number of data points and M is the number of parameters to be fitted (M=2 in my case). Equivalently, one can look at the so-called normalized chi-squared,

normalized_chi_squared = chi_squared / (N-M)

and aim at getting a number around 1 for it.
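For readers who want to try this check numerically, here is a minimal sketch in Python. It assumes the fit is a straight line in time (M=2, slope and intercept) and that every altitude carries a known sigma; the function name and arguments are illustrative and do not come from the spreadsheet under discussion.

```python
import numpy as np

def normalized_chi_squared(t, h, sigma, n_params=2):
    """Weighted straight-line fit h ~ intercept + slope*t.

    Returns (chi_squared, chi_squared / (N - M)).
    t: times, h: altitudes, sigma: scalar or per-point uncertainty.
    All names here are hypothetical, chosen for illustration only.
    """
    t = np.asarray(t, dtype=float)
    h = np.asarray(h, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), t.shape)

    # np.polyfit expects weights of 1/sigma (not 1/sigma**2).
    slope, intercept = np.polyfit(t, h, 1, w=1.0 / sigma)

    residuals = h - (intercept + slope * t)
    chi2 = float(np.sum((residuals / sigma) ** 2))
    dof = t.size - n_params  # N - M degrees of freedom
    return chi2, chi2 / dof
```

A result well above 1 suggests the assumed sigmas are too small (or the data contain outliers); a result well below 1 suggests the sigmas are too generous.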

Thanks for your comments!

Peter Hakel

**From:** George B <gbrandenburg@rcn.com>

**To:** NavList@fer3.com

**Sent:** Tue, January 4, 2011 11:05:36 AM

**Subject:** [NavList] Re: Rejecting outliers

Hi Peter H,

As to my point 7, I wasn't objecting to your Eq 3, which is the correct definition of chi square. My problem was with the redefinition of standard deviation in Eq 2, which I now understand better from your response.

My rewording of your Eq 2 would be that you are setting the measurement uncertainty on your altitudes to be the value of your Scatter parameter, namely 0.1', except in the case where the measurement is further from the fit line than Scatter. In this case you reset the measurement uncertainty to the residual (distance from fit line), which then down-weights the point in your fit.
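The rewording above can be sketched in Python as follows. This is only an illustration of the rule as described in this paragraph, sigma_i = max(Scatter, |residual_i|), applied by refitting a few times; the actual form of Eq 2, the iteration count, and all names below are assumptions, not the original implementation.

```python
import numpy as np

def downweighted_line_fit(t, h, scatter=0.1, n_iter=10):
    """Straight-line fit in which each point's sigma is reset to
    max(scatter, |residual|) after every pass (hypothetical sketch).

    t: times, h: altitudes in arc-minutes, scatter: floor on sigma.
    Returns (slope, intercept, sigma) from the final pass.
    """
    t = np.asarray(t, dtype=float)
    h = np.asarray(h, dtype=float)
    sigma = np.full(t.shape, float(scatter))  # start at best-case scatter

    for _ in range(n_iter):
        # np.polyfit expects weights of 1/sigma (not 1/sigma**2).
        slope, intercept = np.polyfit(t, h, 1, w=1.0 / sigma)
        residuals = h - (intercept + slope * t)
        # Points farther from the line than `scatter` get a larger sigma,
        # and therefore less weight in the next pass.
        sigma = np.maximum(scatter, np.abs(residuals))

    return slope, intercept, sigma
```

With one sight that is several arc-minutes off the line, its sigma grows to roughly the size of its residual while the remaining sights stay at the Scatter floor, so the bad sight barely moves the fitted line.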

This definitely has the desired effect of weakening the effect of any measurements that don't "fall in line", but I wouldn't say that it's a valid statistical procedure! It is similar in intent to the practice of multiplying fit parameter errors by kludge = sqrt(chi square/number of degrees of freedom), except when kludge is less than one. Both procedures are attempting to somehow compensate for measurements that are not conforming to a Gaussian distribution, where the sigma is given by the estimated measurement uncertainty.
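The error-scaling practice mentioned here can be written in a few lines. The sketch below assumes the fit has already produced a chi-square value, a number of degrees of freedom, and a list of parameter errors; the function and argument names are made up for illustration.

```python
import math

def inflate_parameter_errors(param_errors, chi2, dof):
    """Multiply fit-parameter errors by sqrt(chi2 / dof),
    but only when that factor exceeds one (hypothetical helper)."""
    kludge = math.sqrt(chi2 / dof)
    factor = max(1.0, kludge)  # never shrink the errors
    return [err * factor for err in param_errors]
```

For example, with chi2 = 8 and 2 degrees of freedom the factor is 2, so the errors double; with chi2 equal to or below the number of degrees of freedom the errors are left unchanged.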

But formalities aside, if there is always going to be "Kurtosis" in our measurements, maybe your method is a good one for down-weighting those measurements that look bad even though we don't know why. If this is the case, then I would set your Scatter parameter to at least the estimated measurement error, if not larger, so that the only down-weighted measurements are the ones truly in the tails.

Cheers,

George B