# NavList:

## A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding

**Re: Rejecting outliers**

**From:** George Brandenburg

**Date:** 2011 Jan 4, 11:05 -0800

Hi Peter H,

As to my point 7, I wasn't objecting to your Eq 3, which is the correct definition of chi square. My problem was with the redefinition of standard deviation in Eq 2, which I now understand better from your response.

My rewording of your Eq 2 would be that you are setting the measurement uncertainty on your altitudes to be the value of your Scatter parameter, namely 0.1', except in the case where the measurement is further from the fit line than Scatter. In this case you reset the measurement uncertainty to the residual (distance from fit line), which then down-weights the point in your fit.
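Written out, this reading of Eq 2 would look roughly as follows. It is a paraphrase, since the original equation is not reproduced in this message, and the symbols $r_i$ (residual of point $i$ from the current fit line), $s$ (the Scatter parameter), and $w_i$ (the fit weight) are introduced here only for illustration:

$$
\sigma_i = \max\bigl(s,\ |r_i|\bigr), \qquad w_i = \frac{1}{\sigma_i^{2}}
$$

With $s = 0.1'$, a point lying $0.05'$ from the line keeps $\sigma_i = 0.1'$, while a point lying $0.5'$ away gets $\sigma_i = 0.5'$ and therefore a weight 25 times smaller.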

This definitely weakens the influence of any measurements that don't "fall in line", but I wouldn't say that it's a valid statistical procedure! It is similar in intent to the practice of multiplying fit parameter errors by kludge = sqrt(chi square / number of degrees of freedom), which is applied only when kludge is greater than one. Both procedures attempt to compensate for measurements that do not conform to a Gaussian distribution whose sigma is given by the estimated measurement uncertainty.
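As a minimal sketch of that error-scaling practice (the function name `error_scale_factor` is hypothetical, chosen here for illustration), the factor could be computed like this:

```python
import math


def error_scale_factor(chi_squared: float, dof: int) -> float:
    """Return the factor by which fit-parameter errors would be inflated.

    The factor is sqrt(chi_squared / dof), but it is never allowed to
    shrink the errors, so values below 1 are replaced by 1.
    """
    if dof <= 0:
        raise ValueError("need at least one degree of freedom")
    kludge = math.sqrt(chi_squared / dof)
    return max(1.0, kludge)
```

A fit with chi square below the number of degrees of freedom leaves the quoted errors unchanged; only a fit that is worse than expected inflates them.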

But formalities aside, if there is always going to be "Kurtosis" in our measurements, maybe your method is a good one for down-weighting those measurements that look bad even though we don't know why. If this is the case, then I would set your Scatter parameter to at least the estimated measurement error, if not larger, so that the only down-weighted measurements are the ones truly in the tails.

Cheers,

George B

[NavList] Re: Rejecting outliers

From: pmh099---com

Date: 3 Jan 2011 18:38

George B,

Re: 1) Yes, your attachment contains the math that leads to equations used to compute the weighted linear fit. The iterations in my spreadsheet occur due to the need for a self-consistent determination of the weights and the fit.
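For readers without the spreadsheet, the self-consistent iteration might be sketched as below. This is an illustration under stated assumptions, not the spreadsheet's actual formulas: the function names, the equal-weight starting point, the convergence tolerance, and the iteration cap are all choices made here, and the weights are left unnormalized.

```python
def weighted_line_fit(x, y, w):
    """Closed-form weighted least-squares fit of y = intercept + slope * x."""
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * sxx - sx * sx
    slope = (s * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta
    return slope, intercept


def iterative_fit(x, y, scatter, max_iter=50, tol=1e-10):
    """Alternate between fitting the line and recomputing the weights.

    Assumed weighting rule: sigma_i = max(scatter, |residual_i|) and
    weight_i = 1 / sigma_i**2, so no weight exceeds 1 / scatter**2.
    """
    weights = [1.0] * len(x)  # assumed starting point: equal weights
    slope, intercept = weighted_line_fit(x, y, weights)
    for _ in range(max_iter):
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        sigmas = [max(scatter, abs(r)) for r in residuals]
        weights = [1.0 / sig ** 2 for sig in sigmas]
        new_slope, new_intercept = weighted_line_fit(x, y, weights)
        converged = (abs(new_slope - slope) < tol
                     and abs(new_intercept - intercept) < tol)
        slope, intercept = new_slope, new_intercept
        if converged:
            break
    # The returned weights are the ones used in the final fit.
    return slope, intercept, weights
```

On clean straight-line data with a single gross outlier, the outlier's weight ends up far below the others and the fitted line settles very close to the line through the remaining points.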

Re: 2) You made a good point about how much information is really needed to plot an LOP. I characterize the entire fit (including the slope) so that users have the option of choosing any UT in range (for whatever reason) to get their averaged altitude.

Re: 6) I am fitting both the slope and the intercept, so chi_squared should be near N - 2.

For Gary LaPook's Venus data set #1 (6 data points) I calculate final chi_squared as 3.3 (ideally 4 = 6 - 2).

For Peter Fogg's Canopus data (9 data points) I calculate final chi_squared as 5.8 (ideally 7 = 9 - 2).
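One way such a check could be reported to the user is as chi_squared divided by the degrees of freedom, which should be near 1 for a good fit. A small sketch using only the numbers quoted above (the function name `reduced_chi_squared` is hypothetical):

```python
def reduced_chi_squared(chi_squared, n_points, n_params=2):
    """chi_squared per degree of freedom; n_params=2 for slope and intercept."""
    dof = n_points - n_params
    if dof <= 0:
        raise ValueError("need more data points than fit parameters")
    return chi_squared / dof


# Values quoted above for the two data sets.
venus = reduced_chi_squared(3.3, 6)    # 3.3 / (6 - 2)
canopus = reduced_chi_squared(5.8, 9)  # 5.8 / (9 - 2)
```

Both ratios come out a little above 0.8.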

I will think about adding this result to inform the user how well the fit is doing. Thank you for reminding me of this property; it may provide a rather rigorous way of determining the "Scatter" parameter and therefore maybe allay George Huxtable's concerns about "magic."

Re: 7) I am not sure why you would object to my Eq3, since that is the very same definition of chi_squared included in your attachment. Perhaps you see it as problematic in connection with my Eq2, which, as I acknowledged, has been introduced in lieu of unavailable bona fide standard deviations. These "effective" uncertainties are not allowed to drop below a certain threshold controlled by the "Scatter" parameter, no matter how close to any data point the current fit may pass. Initially I selected this "Scatter" value to be 0.1', consistent with the number of decimals usually given in CelNav angular data. This choice worked OK for Gary LaPook's Venus data, but for Peter Fogg's Canopus data (apparently taken under more adverse conditions) I ended up using 2.5'.

You are also concerned about chi_squared becoming equal only to the number of measurements. Again, with appropriately chosen "Scatter" some (but not all!) data points will hit the 1/Scatter^2 ceiling for their weights (represented by 1.000 in the yellow column D, and overriding the weight = 1 / diff^2 relation), so the substitution of Eq2 into Eq3 will not actually result in chi_squared = N (also, see above in Re: 6).
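A short numerical illustration of this point, assuming the effective uncertainty is max(Scatter, |diff|) as described above. The function name and the residual values used with it are made up for the example:

```python
def chi_squared_with_floor(residuals, scatter):
    """Sum of (diff / sigma)**2 with sigma = max(scatter, |diff|).

    A point farther from the line than `scatter` contributes exactly 1;
    a point at the weight ceiling (|diff| < scatter) contributes less.
    """
    total = 0.0
    for r in residuals:
        sigma = max(scatter, abs(r))
        total += (r / sigma) ** 2
    return total


# Hypothetical residuals in arcminutes; only the first lies inside Scatter.
example = chi_squared_with_floor([0.05, -0.2, 0.1, 0.3], scatter=0.1)
```

Every term is at most 1, so the sum can reach N only if no point sits at the weight ceiling; each point that does pulls the total below N.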

I realize that the reasonableness of Eq2 is debatable, so if there is a more sensible way to replace the unknown sigmas then all the better! I wanted to detect a deviation from the prevailing linear trend which has the same units as the measured quantity, hence I came up with sigma_i -> | diff_i |. I will think about other monotonically increasing functions sigma -> f( | diff | ) as possible, more appropriate replacements.

Peter Hakel
