Welcome to the NavList Message Boards.

NavList:

A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding

    Re: Rejecting outliers
    From: George Huxtable
    Date: 2011 Jan 3, 11:46 -0000

    Thanks to Peter Hakel for providing an unlocked version of his spreadsheet:
    I can now read it all, though without understanding exactly what it's
    doing.
    
    Perhaps Peter can clarify for me a point about his earlier posting, on 31
    December, in which he wrote-
    
    Eq(1):      weight = 1 / variance = 1 / (standard deviation squared)
    
    It may only be a matter of words, but it seems to me that the weight has to
    be assessed individually for each member of the set. Isn't "standard
    deviation" a measure of the set as a whole, not just a single member?
    Shouldn't Eq(1) read something like-
    "= 1/ (deviation squared)", not
    "= 1/ (standard deviation squared)"?
    
    Otherwise, I fail to follow it.
    
    I think he is giving himself an unnecessarily hard time by allowing the
    slope to be a variable in his fitting routine. What's more, he is diverging
    from an important prior-constraint of the problem, which is that the true
    slope must represent an altitude change of 32' over a 5 minute period, and
    NOTHING ELSE WILL DO. To that extent, his analysis is inappropriate to the
    problem.
    
    Knowing that variation with time, we can eliminate time from the problem
    before we even start to tackle it, by subtracting from each altitude value
    an amount that increases linearly with time, with a slope of 32', from some
    arbitrary base-value, chosen for convenience.
    
    This then results in a set of nine simple numbers, of which the time and
    even the ordering is now unimportant. Peter's task then is to find some way
    of processing those numbers to determine a result that represents the true,
    unperturbed, initial value, better than a simple mean-of-9 does. In the
    case we're presented with, there's no evidence that the distribution is
    anything other than a simple Gaussian, which makes his task more difficult.
    If there were obvious "outliers", it could be more straightforward.
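    [This elimination of the time variable can be sketched in a few lines of
    Python. The times and altitudes below are illustrative placeholders, not
    the nine sights under discussion.]

```python
# Sketch of the detrending step: remove the known 32'-per-5-minute trend
# before doing any statistics. Times and altitudes are made-up placeholders,
# not the actual data set from the thread.

SLOPE = 32.0 / 5.0  # known altitude change: minutes of arc per minute of time

def detrend(times_min, alts_arcmin, slope=SLOPE):
    """Subtract a linear ramp of known slope from each altitude.

    The result is a set of numbers whose time order no longer matters;
    they should all scatter around a single value."""
    base = times_min[0]  # arbitrary base value, chosen for convenience
    return [alt - slope * (t - base) for t, alt in zip(times_min, alts_arcmin)]

times = [0.0, 1.0, 2.0, 3.0, 4.0]           # minutes
alts = [100.0, 106.4, 112.8, 119.2, 125.6]  # arcminutes, exactly on trend

print(detrend(times, alts))  # each value comes out at (approximately) 100.0
```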
    
    Now for Peter's weighting function. In a simple least-squares analysis the
    weight given is the same to each observation, so the weight factor is a
    constant =1, whatever the deviation.
    
    If a limit is set, outside which data is excluded, it becomes a square
    distribution, around a best-estimate of a central value, within which the
    weighting is taken as 1, and outside it, either side of the centre beyond a
    deviation specified, for example, as 3 standard deviations, the weighting
    is zero. It's somewhat unphysical and arbitrary, but at least the
    conditions can be clearly specified.
    
    Peter modifies a square-box weighting function, as above, to add an
    inverse-square fall-off beyond its shoulders. Those sharp shoulders also
    seem somewhat unphysical. What I would like to follow is how the half-width
    between those shoulders relates to the standard deviation, and to his
    "scatter" parameter.
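    [The two weighting shapes under discussion can be sketched as follows.
    Treating the half-width of the box as a free parameter is deliberate:
    how it relates to the standard deviation and to the "scatter" parameter
    is exactly the open question here.]

```python
def box_weight(dev, half_width):
    """Plain square-box weighting: 1 inside +/- half_width of the centre,
    0 outside (e.g. half_width = 3 standard deviations)."""
    return 1.0 if abs(dev) <= half_width else 0.0

def shouldered_weight(dev, half_width):
    """Box with an inverse-square fall-off beyond its shoulders, in the
    spirit of Peter's modification: weight 1 inside the box, then falling
    as (half_width / dev)**2 outside. The relation of half_width to the
    standard deviation is left open, as in the discussion."""
    if abs(dev) <= half_width:
        return 1.0
    return (half_width / dev) ** 2
```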
    
    It seems that Peter wishes to leave it to the individual, to choose the
    scatter parameter that's most appropriate to a particular data set, after
    viewing some results. If I understand that right, he hasn't yet eliminated
    all the "magic" from the operation.
    
    George.
    
    contact George Huxtable, at  george{at}hux.me.uk
    or at +44 1865 820222 (from UK, 01865 820222)
    or at 1 Sandy Lane, Southmoor, Abingdon, Oxon OX13 5HX, UK.
    
    ----- Original Message -----
    From: "P H" 
    To: 
    Sent: Monday, January 03, 2011 1:17 AM
    Subject: [NavList] Re: Rejecting outliers
    
    
    | George,
    |
    | I forgot to mention that you can unlock the spreadsheet by turning off
    | its protection; there is no password. I attach the unlocked version, so
    | you can skip that step. The PNG file is an image, a screenshot of the
    | color-coded portion of the spreadsheet where input and output data are
    | concentrated. This is the part with which a user would interact; I
    | attached it for those readers who may be interested in the main points
    | but don't want to bother with Excel in detail. My Excel is Office 2004
    | for Mac, so hopefully compatibility will not be a problem.
    |
    | I gave the details of the procedure in:
    |
    | http://www.fer3.com/arc/m2.aspx?i=115086&y=201012
    |
    | Step 1: Calculate the standard (non-weighted) least-squares linear fit
    | through the data.
    |
    | Now iterate:
    | Step 2: Calculate altitude differences "diff" between the data and the
    | latest available linear fit.
    | Step 3: Calculate new weights as 1 / diff^2 for each data point.
    | Step 4: Calculate a new linear fit using weights from Step 3.
    | Repeat until convergence.
    |
    | "diff" could turn up small, or even zero, which would cause numerical
    | problems; that is why the weights have a ceiling controlled by the
    | "Scatter" parameter. The weight=1.000 means that the data point has hit
    | this ceiling and contributes to the result with maximum influence
    | allowed by the procedure. For Scatter=7.0', all nine data points reach
    | this ceiling and we are stuck at Step 1.
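    [The four steps, with the ceiling, can be sketched as a short iteratively
    reweighted least-squares loop. This is one reading of the description
    rather than the spreadsheet itself: the cap is assumed to take the form
    weight = min(1, (Scatter/diff)^2), and a fixed iteration count stands in
    for "repeat until convergence". The data in any call are placeholders.]

```python
# Sketch of the iteratively reweighted least-squares loop of Steps 1-4,
# with a weight ceiling. The exact form of the "Scatter" cap is an
# assumption: weight = 1 for |diff| <= scatter, else (scatter/diff)^2.

def weighted_linear_fit(t, h, w):
    """Weighted least-squares straight line h ~ a + b*t."""
    sw = sum(w)
    st = sum(wi * ti for wi, ti in zip(w, t))
    sh = sum(wi * hi for wi, hi in zip(w, h))
    stt = sum(wi * ti * ti for wi, ti in zip(w, t))
    sth = sum(wi * ti * hi for wi, ti, hi in zip(w, t, h))
    b = (sw * sth - st * sh) / (sw * stt - st * st)
    a = (sh - b * st) / sw
    return a, b

def irls_fit(t, h, scatter, iterations=20):
    w = [1.0] * len(t)                       # Step 1: plain least squares
    for _ in range(iterations):              # fixed count in lieu of a
        a, b = weighted_linear_fit(t, h, w)  # convergence test
        diffs = [hi - (a + b * ti) for ti, hi in zip(t, h)]  # Step 2
        # Step 3 with ceiling: 1/diff^2, capped so that weight <= 1
        cap2 = scatter * scatter
        w = [1.0 if d * d <= cap2 else cap2 / (d * d) for d in diffs]
    return a, b
```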
    |
    | I have not gotten around to fitting Gaussian-scattered data, as you
    | suggested. I may do so in the future, time permitting.
    |
    | The procedure of weighted least squares has a very solid theoretical
    | background; see, e.g.,
    |
    | http://en.wikipedia.org/wiki/Least_squares#Weighted_least_squares
    | http://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
    |
    | The one admittedly heuristic detail is Step 3; weight should be
    | 1/variance, but that is unknown in this case. Step 3 seems like a
    | reasonable replacement, effectively substituting |diff| for the
    | standard deviation of altitudes at the given UT.
    |
    | This procedure is capable of eliminating lopsidedness to a certain
    | extent, as I have shown previously. However, if there are too many
    | "lopsided" data points, the result will follow them to the new "middle"
    | defined by them. I don't know how we would weed that out without
    | additional information about where the correct "middle" really is. As I
    | said earlier, in the absence of an independent check, we must rely on
    | the assumption that a sufficient majority of data points are "good."
    |
    | My motivation was to see what information can be extracted from the
    | data set alone, without any additional information such as DR. I think
    | that Peter Fogg's approach of precomputing the slope is fine, and would
    | most likely give a better practical result. After all, "position
    | tracking" is preferable to "position establishing from scratch" and it
    | is indeed what happens in real life. But I think you will agree that
    | academic curiosity has its benefits, too. :-)
    |
    |
    | Peter Hakel
    |
    |
    |
    |
    |
    | ________________________________
    | From: George Huxtable 
    | To: NavList@fer3.com
    | Sent: Sun, January 2, 2011 3:12:28 PM
    | Subject: [NavList] Re: Rejecting outliers
    |
    | I'm failing to understand some aspects of Peter Hakel's approach,
    | though at least it seems numerical, definable, and repeatable.
    |
    | First, though, I should say that problems arise when viewing his Excel
    | output. It looks as though some columns need to be widened to see their
    | contents, but it seems that the file is deliberately crippled to prevent
    | me taking many of the actions I am familiar with in Excel. But I should add
    | that mine is an old 2000 version, which may be part of the problem. The
    | alternative .png version opens with "paint" but only allows me to see the
    | top-left area of the sheet. Is that all I need?
    |
    | Now to more substantive matters-
    |
    | I don't understand how the various weighting factors have been derived,
    | and why many of them are exactly 1.0000, when others are much less.
    |
    | Presumably, they depend on the divergence of each data point from some
    | calculated line, which is then readjusted by iteration, but I have failed
    | to follow how that initial straight-line norm was assessed, or what
    | algorithm was used to obtain the weights. Answers in words rather than in
    | statistical symbols would be most helpful to my simple mind.
    |
    | You seem to have ended up with a best-fit slope of about 24' in a 5-minute
    | period, as I did when allowing Excel to make a best-fit, when giving it
    | freedom to alter the slope as it thought fit. But the slope can be
    | pre-assessed with sufficient accuracy from known information, and unless
    | there is some error in the information given, such as an (unlikely) star
    | mis-identification, we can be sure that the actual slope is nearer to 32',
    | and the apparent lower figure is no more than a product of scatter in the
    | data. This is a point that Peter Fogg keeps reiterating, perhaps the only
    | valid point in all he has written on this topic.
    |
    | As a result, we could, if we wished, subtract off that line of known
    | constant slope from all the data, and end up with a set of numbers, all of
    | which should be roughly equal, simply scattering around some mean value
    | that we wish to discover. Then the statistical task of weeding outliers
    | becomes somewhat simpler.
    |
    | =================
    |
    | If you apply your procedure to an artificially-generated data-set,
    | scattering in a known Gaussian manner about a known mean (which could be
    | zero), and known to contain no non-Gaussian outliers, what is the
    | resulting scatter in the answer? How does it compare with the predicted
    | scatter from simple averaging? I suspect (predict) that it can only be
    | worse, though perhaps not by much.
    |
    | This is the way I picture it. If there is any lopsidedness in the
    | distribution, then each added observation that is in the direction of the
    | imbalance will be given greatest weight, whereas any observation that would
    | act to rebalance it, on the other side, will be attenuated, being further
    | from the trend. So there will be some effect, however small, that acts to
    | enhance any unbalance, though probably not to the extent of causing
    | instability. Does that argument make sense to you? It could be checked out
    | by some Monte Carlo procedure.
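    [The Monte Carlo check suggested here can be sketched as below. To keep
    it self-contained it uses the simplified, detrended form of the problem
    (estimating a single centre from Gaussian numbers), with capped
    inverse-square-deviation weights standing in for the spreadsheet's
    scheme; the cap value and trial counts are arbitrary illustrative
    choices.]

```python
# Monte Carlo sketch: generate Gaussian samples with no outliers, estimate
# the centre both by a plain mean and by iterated, capped 1/dev^2
# reweighting, and compare the scatter of the two estimators.

import random
import statistics

def reweighted_mean(xs, cap=1.0, iterations=10):
    """Iterated weighted mean, weight = 1 for |dev| <= cap, else (cap/dev)^2."""
    m = statistics.fmean(xs)
    cap2 = cap * cap
    for _ in range(iterations):
        w = []
        for x in xs:
            dev2 = (x - m) ** 2
            w.append(1.0 if dev2 <= cap2 else cap2 / dev2)
        m = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return m

def monte_carlo(n_trials=2000, n_points=9, sigma=1.0, seed=42):
    """Return (scatter of plain means, scatter of reweighted means)."""
    rng = random.Random(seed)
    plain, weighted = [], []
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n_points)]
        plain.append(statistics.fmean(xs))
        weighted.append(reweighted_mean(xs))
    return statistics.pstdev(plain), statistics.pstdev(weighted)

# George's prediction: the reweighted estimator's scatter can only be
# worse (larger) than the plain mean's, though perhaps not by much.
```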
    |
    | I presume that the proposed procedure is entirely empirical, and has no
    | theoretical backing, though there may not be anything wrong with that, if
    | it works.
    |
    | George.
    |
    | contact George Huxtable, at  george{at}hux.me.uk
    | or at +44 1865 820222 (from UK, 01865 820222)
    | or at 1 Sandy Lane, Southmoor, Abingdon, Oxon OX13 5HX, UK.
    |
    |
    |
    
    
    
    
    

       