# NavList:

## A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding

**Re: Rejecting outliers**

**From:** Peter Hakel

**Date:** 2011 Jan 11, 18:55 -0800

In response to the comments from George Huxtable and Gary LaPook I produced a new weighted least-squares spreadsheet that allows the navigator to use a precomputed slope. Thus I now have two types of spreadsheets:

average1.xls (one fitted parameter, precomputed fixed slope. Only the difference in cell F3 actually matters in the calculation.)

average2.xls (two fitted parameters, including the slope - i.e. the original version)

This new average1.xls is also different from average2.xls in that it takes Hs (and not Ho) on input in column B. The precomputed altitudes in cells F2 and F3 are expected to take into account all effects such as refraction, declination change, vessel motion, etc. The calculated average altitude therefore must be processed as any other Hs in order to generate the Ho and, subsequently, the LOP.
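With the slope held fixed, the weighted least-squares problem in average1.xls reduces to a single fitted parameter. As a rough illustration (not the spreadsheet's actual cell formulas, and with made-up numbers), minimizing the weighted sum of squared residuals over the offset alone collapses to a weighted mean of the de-trended altitudes:

```python
import numpy as np

def fit_fixed_slope(t, h, w, slope):
    """One-parameter weighted least-squares fit of h = a + slope*t.

    With 'slope' fixed in advance, minimizing sum w_i*(h_i - a - slope*t_i)^2
    over 'a' gives a closed form: the weighted mean of (h - slope*t).
    """
    t = np.asarray(t, dtype=float)
    h = np.asarray(h, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.sum(w * (h - slope * t)) / np.sum(w)

# Illustrative only: five altitudes (in arcminutes) over 60 seconds,
# a precomputed slope of -0.2'/s, and equal weights.
t = [0, 15, 30, 45, 60]
h = [28.3, 25.4, 22.2, 19.3, 16.4]
a = fit_fixed_slope(t, h, [1] * 5, slope=-0.2)
# The average altitude at any chosen time T is then a + slope*T.
```

Because only one parameter is fitted, the averaging is insensitive to a constant shift of all altitudes; this is consistent with only the *difference* in cell F3 mattering in the calculation.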

For Gary's Venus data I calculate the average:

Hs = 10d 28.3' at 9:16:05 (see attached average1_venus.xls, Scatter=0.2')

Ho = 10d 16.3'

The resulting intercept is 0.5A and azimuth 102.6.
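For context, the intercept follows from comparing Ho with the computed altitude Hc (intercept = Ho − Hc, labeled Toward or Away). The Hc value below is implied by, not stated in, the post: a 0.5' Away intercept with Ho = 10d 16.3' corresponds to Hc = 10d 16.8'.

```python
def intercept_arcmin(ho, hc):
    """Intercept in arcminutes: positive = Toward the body, negative = Away."""
    return ho - hc

# Ho = 10d 16.3' = 616.3'; the quoted 0.5A implies Hc = 10d 16.8' = 616.8'.
d = intercept_arcmin(616.3, 616.8)   # negative, i.e. plotted Away
```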

My original average2.xls (fitted slope) yielded intercept 0.7A and azimuth 102.7 (with Scatter=0.1').

The differences between these two results are minuscule, which is mainly due to the high quality of Gary's observations.

Peter Fogg's Canopus data have more scatter, which makes for an interesting study. There is no declination change and no vessel motion, so I filled column B in both spreadsheets with the same altitude values. This allows a direct comparison between the two fitting methods encoded in the spreadsheets, working with the same data. Each spreadsheet calculates a slightly different average UT; hence I calculate the average altitude for the time 5:29:00 adopted by Peter Fogg (see attached):

average1_canopus.xls: 66d 30.4'

average2_canopus.xls: 66d 27.7'

These two values differ noticeably because of the difference between the precomputed slope obtained from DR (average1) and the slope from the two-parameter fit in average2. Peter Fogg used his graph to adopt the altitude 66d 30', which agrees well with my average1 result. Therefore, at least in this one case, his plot and eye guided him to essentially the same answer that was produced by a statistically well-grounded numerical procedure. In summary:

1) the weighted least-squares method detects outliers by deciding the importance (weight) of each data point,

2) weights are determined by a procedure that relies on the prevailing trend in the data set (this part is heuristic but reasonable),

3) data points are not simply accepted or rejected; the weights operate in a continuum of the degree of their influence,

4) the "Scatter" input parameter is not really free; the user is guided by the normalized Q^2 computed on output, which is a standard criterion for judging the quality of a fit. The weights in column D also carry information about how the fit is treating each point.
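Items 2) through 4) can be sketched numerically. The spreadsheets' exact weighting formula is not reproduced here; the sketch below assumes a Gaussian down-weighting w = exp(-(r/Scatter)^2) of each residual r, iterated until the fit settles, with the normalized Q^2 reported as chi-squared per degree of freedom:

```python
import numpy as np

def robust_line_fit(t, h, scatter, iters=10):
    """Two-parameter weighted fit with heuristic reweighting (a sketch).

    Weights are recomputed on each pass from the residuals relative to the
    'scatter' scale, so outliers are smoothly down-weighted rather than
    accepted or rejected outright (an assumed Gaussian weight formula).
    """
    t = np.asarray(t, dtype=float)
    h = np.asarray(h, dtype=float)
    w = np.ones_like(h)
    for _ in range(iters):
        # Weighted least-squares solution for intercept a and slope b.
        W = np.sum(w)
        tbar = np.sum(w * t) / W
        hbar = np.sum(w * h) / W
        b = np.sum(w * (t - tbar) * (h - hbar)) / np.sum(w * (t - tbar) ** 2)
        a = hbar - b * tbar
        r = h - (a + b * t)
        w = np.exp(-(r / scatter) ** 2)   # smooth down-weighting, never a hard cut
    # Normalized Q^2: chi-squared per degree of freedom (n minus 2 fitted params).
    q2 = np.sum((r / scatter) ** 2) / (len(h) - 2)
    return a, b, w, q2

# Illustrative only: six points on a 0.1'/s trend plus one 2.0' outlier.
t = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
h = [10.0, 11.0, 12.0, 15.0, 14.0, 15.0]   # the point at t=30 is 2.0' high
a, b, w, q2 = robust_line_fit(t, h, scatter=0.2)
# The outlier's weight collapses toward zero while the clean points keep
# weights near 1, so it influences (but is never hard-rejected from) the fit.
```

A large normalized Q^2 on output signals that the chosen Scatter underestimates the true noise, which is how that input parameter stops being a free knob.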

So there is really no "magic" left in this fitting and averaging method, unless one wants to nitpick on item 2), in which case I am open to suggestions for improvement.

I think that the main point of this whole exercise is that it provides an argument against dismissing Peter Fogg's graphical method of averaging sights as somehow ad hoc or arbitrary. In fact, the opposite appears to be true; our "pattern-recognizing neural network" (more commonly called "eyes+brain" :-) ) involved in his procedure has a similarly well-performing counterpart, which is based on good mathematics.

Peter Hakel
