# NavList:

## A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding

**Re: Rejecting outliers**

**From:** George Brandenburg

**Date:** 2011 Jan 3, 15:14 -0800

In response to posts by Antoine, George H, Peter H, and John H, I'd like to make a few comments about the least squares solution to the problem of several sights taken at different times. As has been noted, in this case the altitude measurements can be assumed to have a linear dependence on the time, provided the time interval isn't too great.

1. This is a simple example of the case where the function describing the data is linear in its parameters. Here the two parameters are the slope (a) and the intercept (b). In this case the solution can easily be found algebraically - there is no need to perform any iterations. If you are interested in the algebra, see the attachment.
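The closed-form solution can be sketched in a few lines of Python. This is only an illustration of the standard algebraic formulas, not the attachment's derivation, and the sight data below are invented:

```python
# Closed-form (non-iterative) least-squares fit of h = a*t + b.
# The data values are made up for illustration only.

def fit_line(t, h):
    """Fit h = a*t + b by least squares; returns (slope, intercept)."""
    n = len(t)
    t_bar = sum(t) / n
    h_bar = sum(h) / n
    # Slope from the normal equations; intercept then follows directly.
    a = (sum((ti - t_bar) * (hi - h_bar) for ti, hi in zip(t, h))
         / sum((ti - t_bar) ** 2 for ti in t))
    b = h_bar - a * t_bar
    return a, b

times = [0.0, 1.0, 2.0, 3.0]      # minutes since first sight (made up)
alts  = [30.0, 30.2, 30.4, 30.6]  # degrees (made up, exactly linear)
a, b = fit_line(times, alts)
print(a, b)  # slope ~0.2 deg/min, intercept ~30.0 deg
```

No iteration is needed because the normal equations for a straight line can be solved exactly in one step.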

2. For the case where both the slope and the intercept are to be determined, the result is most easily summarized (as Antoine noted on Jan 1) by the observation that the fit line goes through the "pivot" of the data, namely the point defined by the average altitude measurement and the average time of the measurements. If you also want the slope you need some additional calculation (see attachment), but this isn't necessary to determine the LOP.

3. If you assume a fixed value for the slope and use the least squares method only to find the intercept, the result is exactly the same as in point 2: the fit line goes through the pivot.

4. The least squares method is the best estimator of the fit line; in fact it is equivalent to the maximum likelihood method with Gaussian errors assumed for the measurements. But as pointed out in points 2 and 3, because of the assumed linear dependence of the altitude on the time, it is also equivalent to the intuitive method of finding the pivot point by simply averaging the altitudes and times (assuming you don't require the slope value).

5. The averages taken in the calculation should be weighted averages, where the weights are given by 1/(standard deviation)^2 (as in Eq 1 from Peter H on Dec 31). The standard deviation of each measurement is the best estimate of the uncertainty in that altitude measurement. If all the measurements are estimated to have the same uncertainty, which should usually be the case, then the weights are all equal and the fitted result will not depend on them. However, the propagated uncertainty on the fit result and the value of chi square will depend on the estimated measurement uncertainties.
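The weighted average can be sketched as follows. The altitude values and uncertainties here are invented; the point is that equal weights reproduce the plain average, while a smaller sigma pulls the mean toward that sight:

```python
# Weighted average with weights w_i = 1/sigma_i^2, as in point 5.
# Numbers are invented for illustration.

def weighted_mean(x, sigma):
    w = [1.0 / s ** 2 for s in sigma]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

alts = [45.10, 45.25, 45.40]  # degrees (made up)

# Equal uncertainties: the weights cancel and we get the plain average.
equal = weighted_mean(alts, [0.1, 0.1, 0.1])

# Last sight judged twice as precise: it pulls the mean toward itself.
unequal = weighted_mean(alts, [0.1, 0.1, 0.05])
print(equal, unequal)  # ~45.25 vs a value pulled toward 45.40
```

This is why, with equal uncertainties, the fitted result is independent of the weights even though the propagated uncertainty is not.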

6. As I noted in a "cocked hat" post, the chi square value for the fit to the measurements can be calculated and could in principle be a useful estimator of the "quality" of the set of measurements. In particular if the slope is assumed to be fixed, then for an "average" set of measurements chi square should have a value near N-1, where N is the number of measurements. If chi square is much larger than N-1 then either the measurement uncertainties were underestimated or the set of measurements includes some particularly unlucky (or poorly done) points. (If both slope and intercept are calculated then the average value of chi square should be N-2.)
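A minimal sketch of the chi square quality check, again with made-up data and an assumed measurement uncertainty of 0.1 degrees. With both slope and intercept fitted, a reasonable data set should give chi square near N-2:

```python
# Chi square as a fit-quality estimator, as in point 6. Data are made up.

def fit_line(t, h):
    n = len(t)
    t_bar, h_bar = sum(t) / n, sum(h) / n
    a = (sum((ti - t_bar) * (hi - h_bar) for ti, hi in zip(t, h))
         / sum((ti - t_bar) ** 2 for ti in t))
    return a, h_bar - a * t_bar

def chi_square(t, h, sigma, a, b):
    """Sum of squared residuals, each scaled by its estimated sigma."""
    return sum(((hi - (a * ti + b)) / si) ** 2
               for ti, hi, si in zip(t, h, sigma))

times = [0.0, 1.0, 2.0]   # minutes (made up)
alts  = [1.0, 2.1, 2.9]   # degrees above some base (made up, roughly linear)
sigma = [0.1, 0.1, 0.1]   # assumed measurement uncertainty, degrees

a, b = fit_line(times, alts)
chi2 = chi_square(times, alts, sigma, a, b)
print(chi2)  # compare with N - 2 = 1
```

A chi2 far above N-2 would suggest the sigmas were underestimated or that some sights were unlucky or poorly done, as described above.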

7. I seriously disagree with Eqs 2 & 3 from Peter H on Dec 31. First, the least squares method is only valid if the weights are based on sensible estimates of the measurement uncertainty. (This can be seen from its connection with the maximum likelihood method, as noted in point 4.) Second, PH's Eq 2, which equates the standard deviation to the "residual", i.e. the distance of the measurement from the fit line, is a very bad estimator of the uncertainty of a specific measurement. In particular, if a measurement happens to fall right on the fit line it is assigned an absurd uncertainty of zero (and infinite weight). Finally, when PH's Eq 2 is substituted into chi square in Eq 3, the numerator and denominator of each term cancel, with the result that chi square equals the number of measurements and nothing else. I may have badly misunderstood what is intended here, but nonetheless I conclude that the correct best estimate is just the simple one given in points 2 and 3 above.
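The cancellation described in point 7 can be seen directly. If each sigma is set equal to the magnitude of that measurement's own residual, every chi square term is identically 1, so chi square always equals N no matter how good or bad the fit is (the residual values below are arbitrary):

```python
# Demonstration of the cancellation in point 7: with sigma_i set equal
# to |residual_i|, each chi-square term (r_i/sigma_i)^2 is exactly 1,
# so chi2 = N regardless of the fit. Residual values are arbitrary.

residuals = [-0.05, 0.10, -0.02, 0.30]  # any nonzero values
chi2 = sum((r / abs(r)) ** 2 for r in residuals)
print(chi2)  # always equals len(residuals), here 4.0
```

This is why chi square computed that way carries no information about the quality of the measurements.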

8. And last but not least I agree with my experimental physicist colleagues George H and John H that discarding measurements is a dangerous business and should be done only when there is reason to believe that a measurement was faulty. And this should preferably be done before looking at the results.

Sorry to run on so long - I hope this clarifies a few things without having further muddied the water.

Happy New Year!

George B

----------------------------------------------------------------

NavList message boards and member settings: www.fer3.com/NavList
