Version 5, 2001 Feb 16:
Ancient and Historical Fits have been relegated
to the bottom of the page. Don't bother with them!
All sorts of things including basic fitting theory and
comparisons between least squares fitting methods and the
aperture fitting method used by "Star" have been moved to
a separate page
The Current Model
The current model uses the sum of two elliptical Gaussians.
The ellipses could be characterized by their major and minor axes,
a and b, and the angle of the major axis. However, this formulation
gives serious numerical problems with indeterminate angles for nearly
circular sources. Instead, an ellipse is here characterized by a
radius - the radius of the circle having the same area - and
two components of ellipticity:
The two-ellipse model thus has 8 parameters per source; radius and two ellipticities for each of the two ellipses plus the two components of the separation between their centres.
A realistic description of the variation of the radius requires a quadratic.
r^2 = quadratic in X1, X2
This requires six terms; constant, X1, X2, X1^2, X1X2, X2^2. These give a considerable range of behaviours from the plain and boring;
through the mundane:
to the downright bizarre i.e. just about enough range to cope with Tom's lenses.
The minimum number of parameters needed to describe the variation of each component of the displacement and ellipticity components across the image in a moderately realistic manner is three: constant plus X1 and X2 gradients.
Constant ellipticity can represent trail in RA while matched linear gradients can give either a radial pattern ...
... or a tangential one. Quite complicated behaviour can be modelled.
The 8 parameters per source thus require a minimum of 30 parameters to represent them across the image: 2x6 for the two radii and 6x3 for the 6 displacement and ellipticity components. Put them all together and one can do a fair job of representing not just a centred image but even the real thing!
The fit on the right above is for an actual V image. I've labelled it typical, which in this case does not mean the best I could find! The fit is still inadequate especially at the bottom (South) of the image where the central core and comatic tail don't overlap in this representation. The situation isn't disastrously bad - the components don't really have sharp edges and they extend well past the contour that is shown in the diagrams which is at exp(-0.5) or 0.6065 compared to the peak value for the component but one would like to add another component. Really, however, things are not too bad! The components have been magnified by a factor of 200 compared to the size of the image: the red circles in the synthetic fit on the left, supposedly on the same scale, are 2 pixels in diameter.
I originally used the non-linear Simplex method (Amoeba in "Numerical Methods") which may be charitably described as thrashing around until a fit is found. For N parameters, this required of order 6N^2 evaluations. Early attempts had horrible problems with false optima and the like. It is now realised that these problems arose from attempting to fit ill-chosen parameters or from fitting a single source to data that was really two bright sources. Careful choice of parameters, a few tricks like replacing x^2 by x^2 - 1/3 so as to have a zero mean over -1..1 and not using stars with close companions allows the use of simpler methods which do not work at all in the general case. If all the parameters were orthogonal over the image area, the optimum could be found in around 3N evaluations; with the current choice of parameters, a reasonable fit can be reached in around 9N to 20N evaluations starting from a good initial estimate - a large time saving for N = 30 over the 6N^2 of the Simplex method.
The current method consists of adjusting each parameter in turn by the standard naval gunner's method of one short, one over and one down the funnel. This takes just 3N+1 evaluations to do every parameter, if the starting guess is good enough. If the parameters were all orthogonal, this would arrive at the solution. In the real world, one iterates.
In detail, fitting proceeds as follows:
1) Sources are selected from the cleaned, Dark/Flat processed image that are below saturation (arbitrarily defined after examining values for obviously saturated sources) and above some arbitrarily chosen value for their brightest pixel. Mean X1 and X2 and amplitude A defined as the sum of all connected pixel values above some threshold are computed.
2) These sources are matched with the Tycho2 catalog. This gives preliminary astrometry and photometry.
3) The brightest 30 or so sources that do not have close bright neighbours according to the Tycho2 catalog are selected.
4) A trial PSF is set up.
5) For each source, a goodness-of-fit parameter is computed, optimizing X1, X2 and A using the current PSF.
6) An overall goodness-of-fit parameter is computed. The individual goodness-of-fit parameters are examined; if the worst source is much worse than the typical sources, that source is eliminated and the process restarted at step 4). This gets rid of, for example, close pairs that the catalog matching process missed.
7) PSF parameters are systematically adjusted on the basis of the overall goodness-of-fit parameters.
8) Steps 5), 6) and 7) are repeated until the change resulting from a complete pass through all the parameters is sufficiently small. Or until the owner of the computer gets tired of waiting.
9) The optimal positions for all the sources, obtained using the optimum PSF, are rematched with the Tycho2 catalog to give final astrometry and improved (one hopes) photometry. The probable errors in RA, DEC and magnitude then provide assurance that the whole process has been successful. Or do not, as the case may be.
The latest fits have overall goodness-of-fit parameters around 1%. That is, if I have the scalings right, the parameters that I am fitting account for 99% of the variance. This is actually not very good. 1% residual variance implies residuals around 10% which is nowhere near as good as is required for things like image subtraction, I think.
Those with long memories will remember that I was claiming 1 to 2% goodness-of-fit a long time ago ... a lot of bugs have been found since then, including quite a large factor in scaling residuals. These results are a lot better than the old ones. Honest!
Here is a key to the solid mass of figures. Or you may just look at the pretty pictures!
R1 R1_1 R1_2 R1_11 R1_12 R1_22 e11 e11_1 e11_2 e12 e12_1 e12_2 R2 R2_1 R2_2 R2_11 R2_12 R2_22 e21 e21_1 e21_2 e22 e22_1 e22_2 D1 D1_1 D1_2 D2 D2_1 D2_2
R1 is the radius of the core of the source. Well, actually it is the square of the radius, in pixels. R2 is the square of the radius of the halo. Both components are, as discussed above, taken to be elliptical Gaussians: the radius is the radius of the circular Gaussian of the same area.
The suffices _1, _2 etc denote the coefficients of the X1, X2 terms. X1 and X2 have been scaled to the range -1 to +1 so the numerical values of all terms are the peak contribution of that term, in pixels. e11, e12 are the two components of ellipticity of the core; e21, e22 the two components for the halo. D1, D2 are the two components of the displacement between core and halo.
An ideal system might give parameters:
R1 R1_1 R1_2 R1_11 R1_12 R1_22 e11 e11_1 e11_2 e12 e12_1 e12_2 R2 R2_1 R2_2 R2_11 R2_12 R2_22 e21 e21_1 e21_2 e22 e22_1 e22_2 D1 D1_1 D1_2 D2 D2_1 D2_2 0.6 0 0 0 0 0 0 0 0 0 0 0 1.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The best lens so far:
R1 R1_1 R1_2 R1_11 R1_12 R1_22 e11 e11_1 e11_2 e12 e12_1 e12_2 R2 R2_1 R2_2 R2_11 R2_12 R2_22 e21 e21_1 e21_2 e22 e22_1 e22_2 D1 D1_1 D1_2 D2 D2_1 D2_2 CD15: H3R1659.828 0.621 -0.067 0.101 0.026 0.101 0.085 -0.023 -0.042 0.007 -0.339 -0.194 0.059 1.232 -0.161 0.264 0.174 0.309 0.172 -0.026 -0.050 -0.008 -0.082 -0.084 0.037 1.080 -0.686 0.132 -0.143 -0.058 -0.841
The corresponding I-band lens.
H4R1659.828 0.880 0.066 0.209 -0.032 0.089 0.051 -0.020 0.006 0.062 -0.181 0.032 0.108 2.176 0.388 0.804 -0.109 0.109 0.040 -0.050 0.030 -0.027 0.047 -0.034 0.030 -0.788 -0.482 -0.028 -1.068 0.022 -0.767
For the old comatic lenses:
CD10: H3R1591.592 0.752 -0.019 0.158 0.064 0.067 0.190 0.006 -0.061 0.014 -0.093 -0.008 -0.062 3.014 -0.343 0.639 0.829 0.465 1.233 -0.021 0.092 0.076 -0.060 0.002 0.064 0.073 -2.495 0.217 -1.193 -0.118 -1.899
H4R1591.592 0.704 0.029 0.051 0.155 -0.025 0.067 -0.006 -0.015 -0.008 -0.067 0.023 -0.005 4.130 -0.076 0.193 0.637 0.024 1.070 -0.086 -0.073 -0.084 -0.034 0.062 -0.106 0.471 -2.073 -0.030 -0.344 -0.074 -1.910
The early MK IV images had a lot of coma. The stars in the corners were approximately circular with a bright core sitting on the edge of the circular comatic patch furthest from the centre of the image. The edge of the comatic image was quite well defined and the centre of the comatic patch was slightly less bright than the edge. My first model attempting to capture this behaviour consisted of superposed doughnuts.
All right: that's not a very clear diagram. Here is a cross section:
The big difference from the Mk III is that the MK IV PSF varies a lot with position in the image. My first model had just 2 radius dependent parameters; radial and tangential length scales for the coma. Now there are 8 externally adjustable parameters ... and lots of internal fiddles. My 8 parameters are:
Sigma. A width parameter for a gaussian core near the centre of the image.
SigmaFac Width = Sigma*Sqrt(1 + SigmaFac*R^2). This gives a good approximation to the variation of core width across the image. R is the radius, defined below.
The PSF is actually built up by summing "pancakes".
Herbert Johnson suggests:
Actually, a closer metaphor would be the "towers of Hanoi" stack of disks. It is also a child's toy, a simple stack of disks of increasing size on a peg.O.K. I'll go along with that. Thanks, Herbert.
All the disks have the same gaussian blurring parameter. The current model uses 4 disks with fluxes forming a geometric progression. The central "disk" is the gaussian core; it does not have a flat central region as the others do. The positions and sizes of the intermediate disks are tied to those of the outer disk and do not introduce additional parameters. Adding a fifth disk gave only an insignificant further improvement.
RScale, DScale
The radii of the disks are RScale*Sqrt(R^2 + ROff^2)*Internal parameter In previous versions of the model, the radii were taken as
RScale*R*Internal parameter
The factor ROff is required for two reasons. Firstly, the
original formula is not admissible - any physically admissible
form must have d/dR = 0 at R = 0.
Secondly, the new form gave a significantly improved fit for
small radius in a preliminary fit that gave ROff = 0.1. This value
appeared non-critical so it is currently fixed at this value
DScale controls the displacement of the
doughnut centre from the position of the core:
Displacement = DScale*R*Internal parameter.
It turns out that
DScale is numerically about the same size as RScale. As the coma
is towards the centre of the image, DScale is actually negative.
(L1[1],L2[1]), (L1[2],L2[2]). These two points specify
a 3-point convolution used to model trailing. In previous models,
uniform linear trailing was assumed. This method is more flexible.
The third point (L1[3],L2[3]) is obtained by setting
L1[3] = -(L1[1] + L1[2]); L2[3] = -(L2[1] + L2[2])
[2000 Feb 18: cut down to a 2-point convolution with only two
adjustable parameters. For the excellent images used, the improvement
with the extra point was insignificant. This would definitely not
be true for some of the other images.]
Two additional parameters have also been fitted but are currently held constant:
C1, C2. Actual optical centre of the image from which P is measured. Especially in the case of the V-band images, the four corners look somewhat different. Even with the V-band images, where the difference between the centre of the image and the optical centre is quite small, the fit is improved significantly. [These parameters were allowed to vary for the fits of 2000 Feb 18. This gave a significant improvement especially for Lens #2 (see below). This may or may not mean that the camera went together differently each time!]
Initial individual fits to bright (but not saturated) sources gave correlation coefficients of over 0.98 [2000 Feb 18: that is now up to 0.99 over the whole image.] between the fitted a_i and the directly measured response. I was astonished since I had a pretty steep learning curve to climb with the Mk III fits starting with correlation coefficients around 0.7. Better yet, the fit parameters didn't vary much between the different bright sources. Wow!! Coma can be modelled!
I have cleaned up my Simplex optimization code so that it can now do a global fit of the 8 PSF parameters at the same time as optimizing X1, X2 and amplitude for over a hundred bright sources. That's optimizing over 300 parameters. Over 400 if you count the implicit background estimates. The time is still around a minute per bright source for the global fit. As I have remarked before - "there are no slow computers; only slow algorithms". I don't intend to let it worry me that Tom can produce 30 images per camera in the time it takes me to analyse one!
Values of Neff computed for the fits to the V-images on CD7 are as follows
NOTE: 2000 Feb 18. There was a bug in the code that detected convergence; some of these fits are not as well converged as they should have been ...
Lens #1; no extra spacing; 0.010" focussing steps
File centre R=0.5 R=1 corner H3R1551.506 8.8 21.3 42.0 66.5 H3R1551.508 6.7 17.2 34.6 54.6 H3R1551.510 7.2 18.2 36.6 58.0 H3R1551.512 12.4 28.0 52.3 80.1 H3R1551.513 15.4 30.4 49.5 67.9 H3R1551.515 24.0 42.9 66.2 87.5 H3R1551.517 53.9 81.3 116.0 146.4 H3R1551.518 153.5 193.4 249.1 292.1 H3R1551.520 177.6 218.3 278.6 327.2
Lens #1; 0.016" extra spacing; 0.010" focussing steps
File centre R=0.5 R=1 corner H3R1551.463 13.3 29.2 55.4 86.5 H3R1551.465 10.4 24.1 46.8 73.6 H3R1551.466 13.7 30.1 55.2 83.1 H3R1551.468 8.0 19.8 39.6 62.9 H3R1551.470 8.0 19.8 39.3 62.3 H3R1551.471 9.3 22.2 43.3 68.0 H3R1551.473 90.3 129.4 167.6 185.2 H3R1551.475 83.3 133.5 178.9 197.5 H3R1551.477 135.4 206.3 272.4 300.2
Lens #1; 0.032" extra spacing; 0.010" focussing steps
File centre R=0.5 R=1 corner H3R1551.533 117.6 167.9 228.8 271.0 H3R1551.534 31.5 51.1 69.1 77.2 H3R1551.536 17.8 32.3 46.0 52.9 H3R1551.538 14.4 27.3 39.7 46.1 H3R1551.540 8.2 17.2 26.4 32.1 H3R1551.541 8.0 16.8 26.2 32.3 H3R1551.543 21.6 38.1 55.5 68.0 H3R1551.545 51.9 76.8 102.1 116.7 H3R1551.547 134.8 171.9 214.5 235.7
Lens #2; No extra spacing; 0.005" focussing steps
File centre R=0.5 R=1 corner H3R1551.707 22.8 40.8 58.8 69.8 H3R1551.709 14.7 28.6 42.6 50.8 H3R1551.711 14.6 28.3 42.3 51.0 H3R1551.713 9.5 20.1 30.6 36.5 H3R1551.715 8.8 18.8 28.7 34.3 H3R1551.717 10.2 21.1 32.3 39.4 H3R1551.719 13.1 25.9 39.4 48.9 H3R1551.721 21.3 38.2 56.4 69.5 H3R1551.723 36.7 59.3 82.8 98.3
More fully converged fits for the best focus cases: 2000 Feb 18
Lens File centre R=0.5 R=1 corner #1 no extra H3R1551.508 6.4 14.5 28.7 45.2 0.016" extra H3R1551.470 7.7 18.8 37.0 58.1 0.032" extra H3R1551.540 6.4 14.1 28.2 44.9 #2 H3R1551.715 5.5 13.8 29.6 48.6
1) The position of best focus (smallest Neff) is the same all the way from the centre to the corners in every case. This is good news: the field of the lens is flat.
2) As can be seen by comparing the original fits with the latest ones, there is a good deal of uncertainty. In the original fits, Lens #1 without any extra spacer came out sharper than #2 in the middle and rather worse in the corners. In the final fits, this is reversed, and #2 is a little sharper in the middle. If there is any real difference, it is small. The final fits appear to confirm the odd result for Lens #1 that the response is worse with the 0.016" extra spacer than it is either with no extra spacer or with 0.032" extra. I think this is a real effect but I am not sure. It would help if my fitting program provided some estimate of the accuracy of its fit. The residuals are converged to one part in ten thousand. This corresponds to no better than 1% at best for the most significant fitted parameter and probably 10% or worse for most. And even that convergence is overkill compared to the statistical uncertainty even with 40,000 degrees of freedom ...
The least squares fitting program uses a 3-point convolution to take out the effect of trailing. Two points are fitted independently and the third is calculated from them so as to keep the centre of gravity of the central core of the source fixed. If all three points were allowed to roam, the fitted source position wouldn't mean much! The fitting program does not care how the points are numbered so, for easier comparison, I have shuffled them into X2 order. The estimated trail parameters for the four sharpest images are
File L1[1] L2[1] L1[2] L2[2] L1[3] L2[3] H3R1551.508 -0.259 -0.717 0.052 0.088 0.207 0.630 H3R1551.470 -0.279 -0.525 -0.130 -0.140 0.409 0.664 H3R1551.540 -0.273 -0.561 0.004 -0.033 0.270 0.594 H3R1551.715 -0.199 -0.829 0.248 0.387 -0.049 0.443
The total trail is a bit over one pixel (7.5 arc seconds) which is impressively good and very much better than for earlier images. The trail is also very consistent, which implies that it will get even better when Tom finally has nothing better to do than tweaking up the alignment and drive rates! Some of the other CD6 and CD7 images, not selected here, show larger trail (up to 3 or 4 pixels) so not everything is perfect. Yet.
The other four fitted parameters for the same four sharpest images are
File Sigma SigFac RScale DScale H3R1551.508 0.549 1.113 5.9 -6.4 NEff 6.4,14.8,30.2,48.6 H3R1551.470 0.594 0.926 5.9 -6.5 NEff 7.7,18.8,37.0,58.1 H3R1551.540 0.566 0.962 5.3 -5.4 NEff 6.3,13.7,27.6,44.2 H3R1551.715 0.573 0.699 5.3 -5.4 NEff 6.9,14.8,28.1,43.2 11 parameters, with C1, C2 H3R1551.508 0.556 0.976 5.1 -6.4 NEff 6.4,14.5,28.7,45.2 H3R1551.540 0.563 0.955 5.3 -5.4 NEff 6.4,14.1,28.2,44.9 H3R1551.715 0.514 1.222 5.3 -5.4 NEff 5.5,13.8,29.6,48.6
RScale and DScale vary very little and are both numerically about 6 pixels; the comatic images are about 12 pixels in diameter at the edges of the images. The negative values of DScale reflect the fact that the coma points towards the centre of the image. The similar numerical values of RScale and DScale expresses the fact that the bright core is on the edge of the approximately circular comatic image. The broad comatic disk/pancake contains about one third of the total flux.
Computation time is still awful; an hour or two per image on a Pentium. Ah well - there are no slow computers, only slow algorithms. My Simplex code got pretty creaky with 8 parameters and now I am fitting hundreds. There are faster methods. Computers are getting faster. Not a long term problem.
The non-linear Simplex method is quite robust (after one removes various bugs such as the one which occasionally came up with a negative sum of squares ...) but it can (and does) still home in on a local minimum which is quite different from the desired optimum. Every "solution" therefore has to be checked - the standard technique is to run the whole thing twice with different starting points but even that is not foolproof, let alone PhD-proof.