NavList:
A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding
Re: Rejecting outliers
From: John Huth
Date: 2011 Jan 1, 17:51 -0500
I had the benefit of looking over tons of sights done by students this last semester. I think I became pretty good at catching obvious errors in their work just from the way the points were distributed. In a lot of cases, they were simple transcription errors.
I've spoken with a number of surveyors who will attest to this problem - they'll place an object to within a few mils, but will be off by 1.00 inches.
In a lot of cases, I only had to look at the students' data to spot the mistakes. In one exercise, they had to take a series of Sun shots over the course of a day, and then we cooked up a parabolic fit to the data to extract the meridian altitude and also the time of meridian passage. I think I looked at about 100 fits, and could spot a transcription error a mile away by the time I was done.
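For anyone who wants to try the parabolic fit, here is a minimal sketch in Python with NumPy. It is not the code used in the class; the function name `fit_meridian_passage`, the time units (minutes) and the altitude units (degrees) are my own assumptions for illustration. It fits h(t) = a*t^2 + b*t + c to the shots, reads off the vertex of the parabola as the time and altitude of meridian passage, and returns the residuals so a point that sits far from the curve stands out.

```python
import numpy as np

def fit_meridian_passage(times_min, altitudes_deg):
    """Fit a parabola to (time, altitude) shots taken around local noon.

    Hypothetical helper for illustration. Returns the time of the peak,
    the peak altitude, and the residual of every shot from the fit.
    """
    t = np.asarray(times_min, dtype=float)
    h = np.asarray(altitudes_deg, dtype=float)

    # Center the times so the polynomial fit is numerically well behaved.
    t0 = t.mean()
    a, b, c = np.polyfit(t - t0, h, 2)
    if a >= 0:
        raise ValueError("fitted parabola has no maximum")

    # Vertex of a*x^2 + b*x + c is at x = -b/(2a), with height c - b^2/(4a).
    t_peak = t0 - b / (2.0 * a)
    h_peak = c - b**2 / (4.0 * a)

    residuals = h - np.polyval([a, b, c], t - t0)
    return t_peak, h_peak, residuals
```

A shot with a transcription mistake usually shows up as one residual much larger than the rest, which is the same thing the eye picks up from the plot. The residual only flags the point; whether to drop it is still a judgment call.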
I did my own version of this exercise the week before, so I could make sure the assignment was correct. I found that things worked pretty well, except that one of my points at the end of the day looked like it was way off. I was feeling uneasy about that shot at the time, but didn't know why. I suspect it was because a haze had drifted over the Sun. I still used it in my fit, however.
By the way, I know it's not common usage, but I had a professor who insisted on making a distinction between "uncertainties" and "errors". He said "errors are mistakes, we correct errors, uncertainties are part of the process of measurement." When I was teaching statistics to the students I was fairly careful to not use the word "error" unless I was specifically talking about a "mistake".
I believe that there are a number of physicists on this board, so I'll mention an experiment done at Stanford, looking for fractionally charged quarks. The group levitated a superconducting sphere and tried to measure residual charges. They claimed evidence for fractionally charged particles. As part of their technique, they rejected some data points because a truck passed by or something strange happened. The only problem was that they knew the value of the charge when they accepted or rejected the data. A famous physicist suggested that the numbers be randomized by a cypher so that they could accept or reject the data and then remove the cypher later on. This was meant to convince the skeptics. They did this, but never reported the results. I managed to get hold of a copy of the thesis written by the student who did the measurements. In an appendix he described the result of the cypher version of the experiment. They didn't see *any* fractional charges - BUT, they didn't retract the original result, which I found irritating.
That's a good example of the dangers of rejecting data points, but I don't know of a 100% foolproof way to do this, because there are so many variables at play. E.g. you find out that a low-lying fog bank was distorting your horizon 10 minutes after taking a sighting - you already see that the shot seems off, and you have a good rationale for rejecting it, but you already know the number.
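To make the cypher idea concrete, here is a minimal sketch in Python. I don't know what cypher the Stanford group actually used; the constant hidden offset below, and the names `blind` and `unblind`, are assumptions of mine for illustration only. The point is just the order of operations: hide the true values, decide which measurements to keep based on the conditions (truck, haze, fog bank), and only then remove the offset.

```python
import random

def blind(values, seed=None):
    """Add one secret offset to every value (hypothetical scheme).

    The offset is returned so that someone other than the analyst can
    hold on to it until the accept/reject decisions are final.
    """
    rng = random.Random(seed)
    offset = rng.uniform(-1.0, 1.0)
    return [v + offset for v in values], offset

def unblind(blinded, keep_mask, offset):
    """Remove the offset, returning only the measurements marked as kept."""
    return [v - offset for v, keep in zip(blinded, keep_mask) if keep]

# Usage: the keep/reject list is written down while the values are hidden.
blinded, secret = blind([0.01, 0.34, -0.02], seed=1)
keep = [True, False, True]   # decided from the logbook notes, not the numbers
final = unblind(blinded, keep, secret)
```

A single offset hides where the values sit in absolute terms but not how far apart they are, so a wild point still looks wild next to its neighbors. That is probably fine for the charge measurement, and less of a cure for the fog bank case, where the shot already looks off relative to the others.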