A Community Devoted to the Preservation and Practice of Celestial Navigation and Other Methods of Traditional Wayfinding
From: Frank Reed
Date: 2018 Oct 21, 06:56 -0700
Using polynomial fits for observational data can be grossly misleading and even dangerous. Noise, which is always present in observational data, may be severely exaggerated and amplified by a polynomial fit. But you have discovered here one important use, which is interpolation (and narrowly-limited extrapolation) of exact, calculated data, like the positions in an ephemeris.
A use-case in celestial navigation: suppose you want to generate a Nautical Almanac-equivalent database of positions of the Sun and planets. The standard N.A. lists the positions on an hourly basis. You interpolate linearly between hourly pairs to get intermediate values. That hourly tabulation of data is efficient and interpolation is easy, but it might lead to a database larger than you can manage in some device contexts. Instead, you can list positions at 6-hour intervals for Venus and Mars and daily intervals for the Sun and other navigational planets (even larger intervals are possible!) and then interpolate quadratically (equivalent to a quadratic "regression" for values in some interval of time), and you will reproduce the accuracy of the almanac. The result is a much smaller database. The cost is very low since computation is "cheap" in almost all contexts in the 21st century.
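A minimal sketch of that quadratic interpolation, using the three-point (Newton forward-difference) form over values tabulated at a fixed interval. The sample declination values below are hypothetical, not real almanac data:

```python
def quad_interp(t0, h, y0, y1, y2, t):
    """Quadratic interpolation through three tabulated values
    y0, y1, y2 at times t0, t0+h, t0+2h (Newton forward-difference form)."""
    s = (t - t0) / h          # normalized time; 0 <= s <= 2 inside the table
    d1 = y1 - y0              # first difference
    d2 = y2 - 2*y1 + y0       # second difference
    return y0 + s*d1 + s*(s - 1)/2 * d2

# Hypothetical declination values tabulated at 6-hour intervals,
# interpolated to 2.5 hours past the first entry:
dec = quad_interp(0.0, 6.0, 12.500, 12.620, 12.735, 2.5)
```

Because the underlying data are exact calculated values rather than noisy observations, the parabola through three tabulated points reproduces the tabulated accuracy without the risks a fit to observed data carries.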
Decades ago, back when desktop computers were just becoming common and in the era when users still had to purchase optional "math coprocessors" for their machines, the Nautical Almanac office briefly published annual "Chebyshev polynomials" for the ephemerides of the planets. This was a compact and clever system, but the data distribution capabilities were not up to the task (diskettes were not as convenient as the internet), and the cost of computation was still very high. It was a worthy experiment, but it fizzled. Today, coders "roll their own" versions of this sort of thing, and there is a trade-off between data size and the polynomial order of interpolation that depends on the limitations of the system you're coding for. No single solution fits all needs.
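For anyone rolling their own, the usual way to evaluate a Chebyshev series is Clenshaw's recurrence, mapping the time interval onto [-1, 1]. The coefficients below are made up for illustration; they are not real ephemeris coefficients:

```python
def cheb_eval(coeffs, a, b, t):
    """Evaluate sum of c_k * T_k(x), where x maps the time t
    from the interval [a, b] onto [-1, 1] (Clenshaw's recurrence)."""
    x = (2.0*t - a - b) / (b - a)   # map [a, b] -> [-1, 1]
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):  # run the recurrence from the top coefficient down
        b1, b2 = 2.0*x*b1 - b2 + c, b1
    return x*b1 - b2 + coeffs[0]

# Hypothetical coefficients for some coordinate over a one-day interval:
value = cheb_eval([25.137, -0.412, 0.0031], 0.0, 24.0, 7.5)
```

A handful of coefficients per coordinate per interval replaces pages of hourly tabulations, which is exactly the compactness the almanac office was after.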
Back to the original case (running regressions on observed altitudes), there are many problems with this concept. Consider an observer in the North Atlantic. This observer is not lost, so the latitude is known to within half a degree, even in the most extreme case of uncertainty. If you take some sights over a ten-minute interval of a star climbing in the east, you will know beforehand, by calculation, the rate at which its altitude will increase. You'll be able to compare your observed altitudes of that star rising over ten minutes with a line that has a known slope. You slide your data points up or down until they have the best possible fit to that known slope. That is your best estimate of the behavior of the object's altitude in this case (note: not near the meridian in this case). From that you'll be able to interpolate or extrapolate (within limits) to some chosen time.

By contrast, if you run a regression on those altitude points, the math will determine its own slope for the line that "best fits" all the points, and that slope can easily be strongly skewed by normal noise in the data. You should not run simple regressions on observed altitudes.
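The "slide the points up or down" procedure is just a least-squares fit in which the slope is fixed at its computed value and only the intercept is free. A sketch, with hypothetical times and altitudes standing in for real sights:

```python
def fit_known_slope(times, alts, slope):
    """Least-squares intercept for a line whose slope is fixed in advance.
    Minimizing sum of (alt_i - (slope*t_i + b))**2 over b alone gives
    b = mean(alt) - slope * mean(t)."""
    n = len(times)
    return sum(alts)/n - slope * sum(times)/n

# Ten minutes of hypothetical sights of a star rising at a computed
# rate of 0.2 degrees per minute:
t = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]              # minutes
h = [30.01, 30.42, 30.79, 31.22, 31.58, 32.02]   # degrees
b = fit_known_slope(t, h, 0.2)
alt_at_5 = 0.2*5.0 + b   # best-estimate altitude at t = 5 minutes
```

Note what this does not do: it never estimates a slope from the noisy points. An ordinary regression would fit both slope and intercept, and with only a few sights over ten minutes the fitted slope is at the mercy of the noise.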