TN 0075: Data Set T

Andrew Bennett
Mar 2, 2001
Keywords: photometry astrometry

Version 1, 2001 Feb 19.

Version 2, 2001 Feb 26: Photometric Corrections

Version 3, 2001 Mar 2:

This is a first write-up of work on the 65 CD ROM data set that Tom has given out to four of us to keep us amused ...

The first analysis run

The current version of my program does the following:

1) Reads the name of the first data file on the CD and checks to see if this CD has already been processed. If it has not, a new CD information file is set up with a list of all the data files together with their median data values and the median value of some edge pixels; from these, flags are generated and stored in the same information file. The flags are include the colour band and a best guess as to whether a particular file is image or dark data. If the program gets it wrong (one detected case so far) one can manually edit the CD information file and try again ... hopefully before too much screwed up data has been merged into the final output!

2) Finds the first dark file of each colour. The names of these files give the names of the processed Dark and Flat files. If these files are not found, they are generated by the usual median process. If there are no dark files on the CD (one case so far) the most recent previous Dark file is used instead.

3) Processes the image data files as listed in the information file set up in 1) above. For each image file, there is a corresponding image information file: if one does not yet exist, it is set up with bare minimum data from the FITS header. If the image information file exists, the program checks for a 'processing complete' flag. If this is set, processing procedes to the next file.

4) The image file, after Dark/Flat processing, is processed as previously described in Coma to give a least-squares estimate of the PSF and of the amplitudes of the stars. The amplitudes are converted to magnitudes using a calibration based on the Tycho2 catalog: a simple median of all the sources which match. A much better conversion is done in the post-processing (below). I see that Arne has come up with a new version of the magic numbers required to convert Tycho B & V to TASS V and I. I am still using his older numbers

Pictures

One of the numbers that the analysis program comes up with is a measure of the total system noise, sigma. This depends on the sky background and exposure time and on the underlying system noise that Tom has worked so hard to reduce. Here are plots of the consecutive measures, in time sequence.

V Sigma I Sigma

For comparison, here are the magnitude calibrations: the estimated magnitude corresponding to a one adu source. The larger the value of Mag_0, the longer the exposure must have been - I have not yet parsed Tom's log files to find what the exposure really was.

V Mag0 I Mag0

I think these make some kind of sense (apart from the details!): the lowest noise corresponds to the lowest Mag_0 values which are the shortest exposures. The higher sigma levels are with the system still cooling down or with higher sky brightness near dusk and dawn. In most cases, where the system has cooled down properly, the overall noise is dominated by the sky contribution. Only those with much darker skies need worry about the electronic noise.

The only things I read from the FITS header at the moment are the reputed RA and Declination - these are needed by the astrometry program so as to know where to start matching. The FITS header positions are not very good ...

V position error I position error

Oops! That didn't come out too well the first time on my browser! Check what you are getting! RA is red and Dec is blue. The scale on the left running -6 degrees to +6 degrees applies to both sets of RA and the scale on the right, 0.6 to 1.8 degrees is for declination. My browser initially plotted the two figures overlapping a bit in the middle. Your mileage may vary - let me know if they are coming out garbled on your browser and I'll see what I can do! Anyway, it looks as though the RA in the header, which changes from image to image even when the images are really at the same RA, is the RA that the telescope would have been looking at if it had been pointing south rather than where it is actually pointing. Something for Tom's list of things to fix.

Mostly this doesn't matter too much. Michael Richmond's amazing matching package doesn't really care so long as one specifies a big enough search box. I have not yet managed to get these programs to run with my Pascal system (it is now said to be possible ...) and my replacement routines are a lot fussier. So I have spent too much time hand-feeding my matching routines. This is mostly because my programs assume that the orientation stays the same. It doesn't.

Rotation

This plot gives the angle of the image relative to N-S/E-W. The units are funny: 0.0218 of my funny units = 0.01 radians counterclockwise. The V data is plotted in green and I in red. The trends within each set of images and during the course of a night's observations suggest that the alignment is less than perfect. This would presumably begin to show in the image quality for longer exposures but doesn't make much difference for the lengths of exposure used in this data set.



Assuming that the star images should be more or less round, the E-W elongation of the central cores of the star images gives an estimate of the RA driving errors. These used to be huge: 4 pixels peak to peak for some of the early CDs that Tom released and who knows how much worse on the others! This data set ain't at all bad.

Trail

The units are pixels. Yes - there is a minus sign and I think I may have plotted peak rather than peak-to-peak. But it's still pretty small apart from one night's data. V is green and I is red, as before.



The overall quality of an image is best judged by the Neff factor (which I discussed elsewhere)

V Neff I Neff

Focus

Looking at just the central core gives a measure of how well focussed things are. The measure plotted is the square of the radius measured in pixels. The central core is arbitrarily defined as containing half the total flux, the other half being the halo.


Halo

This is the radius of the outer part of the star image. Or rather, the square of the radius. If you click back, you will see that it correlates quite strongly with Neff: far more strongly than the inner core radius does. I have no idea what this means - whatever the changes in the lens are, they leave the central core sharp while defocussing the outer halo.

Yet more confusion is yet to come;



V Misalignment I Misalignment

The misalignment plotted is the separation in pixels between the centres of the core and of the halo at the centre of the image. For the V images, misalignment correlates very strongly with Neff. For I, the correlation is much stronger between halo radius and Neff.

I have labelled this quantity "misalignment" because obvious sources are off-centre lens elements, elements mounted at an angle and tilt of the lens assembly. The measured values seem rather large to me. The variation between different night's runs is a puzzle and the variation within some night's runs a complete mystery!

Photometric Corrections

Cleaning and sorting the raw star list.

Various horrid things happened during the processing - in accordance with the well-known rule that all significant programs have at least one bug. Despite my best endeavours, the analysis had a few problems getting from 359 degrees to zero degrees and the routine that sorted final results in RA continues to believe that, for example, 123.999875 lies between 124.0 and 125.0. It also believes that 359.999875 lies between 0.0 and 1.0. The first steps in processing the raw star lists, therefore, is to resort the garbled star lists back into order of RA. Oddly, the code that does this successfully is identical to the code that produced the xxx.999 problem in the first place.

Cleaning Finding List.

The Finding List started out as the Tycho2 catalog; stars were added as they were found during processing. This is a hairy enough procedure at best, because of problems with close pairs. With the RA folding problems, things got distinctly messy. And then there is a bug, so far not tracked down, which occasionally loses large chunks of the Finding List. I have been reduced to writing a new program to regenerate the Finding List from the raw star list. Not nice. The current version starts from the Tycho2 catalog, as before, and adds any star seen in the same place a sufficient number of times - "the same place" being defined as a circular area fairly large compared to the 0.2 arcsec astrometric accuracy but smaller than the 15 arcsec image size. Yes - that's pretty subjective. The current version ignores the flags set by the fitting routines.

At this stage, I still have all the observations, good bad and indifferent, and a list of places where I might find enough observations to do something with. The observations include the flags set at various stages of the processing. These range from disastrous ("Star outside image area", "No fit") through warnings ("Bad pixel excluded from analysis") to the informative {"One of pair"). It was hoped that some analysis could be done on some of the flagged sources. For this preliminary analysis, it quickly emerged that there were problems with some sources even with the mildest of "informative" flags set so all flagged sources have been excluded from further analysis. This is a pity: the most serious loss is the weak sources that were sometimes found (having a pixel above threshold) but more often added from the finding list (setting a flag). This applies to a large number of very red sources for which the V magnitude is close to the analysis limit even though the I magnitude is bright enough for precise analysis.

Extraction of paired observations: Preliminary Welch-Stetson statistics.

It is then possible to extract all the data for stars that are seen in both V- and I-bands on more than a specified number of occasions. Welch-Stetson statistics can be calculated for these batches of paired data. A quick search for GCVS stars at this point gave some matches! Wow!! The whole system works ... well, more or less. Closer examination showed lots more "variables" - far more than could possibly have escaped notice! Most of these were relatively bright; stars for which the estimated random error in magnitude was around 0.002 magnitudes. This happens because I was using the estimated random error in computing the Welch-Stetson statistics when the error at this point is actually dominated by the systematic variations from image to image, which are a lot worse than 0.002 magnitudes.

Photometric correction

The sets of paired observations already extracted give mean V and I magnitudes. Subtracting these give an estimate of the V and I residuals. Not a perfect estimate because there is no reason to believe that the systematic errors average to zero - we may have to revisit this point! Anyway, pretending for the moment that MV - MVbar does give a useful estimate of the error, we can collect all the estimates for one image. Each estimate comes with an estimated standard deviation calculated during the fitting process. Then we can fit e.g. 2-D polynomials in X1, X2 giving a mean error polynomial for each image that has enough stars meeting the criterion of a minimum number of paired measurements. Subtracting the polynomials gives, hopefully, corrected magnitudes.

The first fits turned out to be fairly plausible. This is actually a bit of a surprise since the data do not really determine a 2-D correction! For most of the images, the telescope was set to the same declination so that most observations of one star have about the same value of X1. The observations do not determine the X1 variation. Quite a few images have paired observations meeting the selection criterion only for a limited range of RA/X2 and so do not determine the X2 variation either. There were, indeed, some spectacularly awful "fits": coefficients of 30,000 for terms expected to be less than 0.05.

Replacing the regular polynomial fit with a Singular Value Decomposition routine lifted from "Numerical Methods" completely solved this problem. This routine produces identical results to the regular method if it can; unlike the regular method, it knows when it is failing! It then does the equivalent of fitting fewer coefficients until things work again. Absolutely painless with just one adjustable parameter that it uses to judge when things have fallen apart. All I had to do was crank up this parameter till the divergent cases were recognised. Over the complete set of images, fitting the 21 parameters of a 2-D 5th order polynomial, up to 10 parameters were dropped in the worst cases but the average loss was only 0.2 parameters. Amazing!

I expected this process to pick out the images with the tree branches. It doesn't. The fits all have remarkably similar residuals whereas one would expect tree branch images to give significantly different fits with larger residuals. Um! Maybe the tree branch effect is being masked by the ice crystal effect?

All data: errors normalised to the estimated random error.
          V     I
Raw  pe  1.44  2.39
Fit  pe  1.00  1.15 expect 0.674
     sd  1.74  2.13 expect 1.000

Outliers pruned at 3 sigma.
Raw  pe  1.43  2.37
Fit  pe  0.98  1.12 expect 0.674
     sd  1.56  1.80 expect 1.000

Lots of things to notice here!

Outlier pruning removed points outside 3 standard deviations. With the numbers of points involved, one would have expected to lose about one point per image for a Gaussian distribution. The actual loss was around 4 points per image. This is about as close to a Gaussian as I have ever seen in the real world! The ratios of probable error to standard deviation are also quite close to those expected for a Gaussian.

The probable error after fitting is well under twice the theoretical amount. The initial probable error is only some 40% greater. Most of this difference is absorbed by the first order fit. Adding orders of fit beyond the first gives small but statistically significant improvements. For example, the mean coefficients of all orders including the 5th are mostly significantly different from zero. There is some suggestion in this steady improvement of fit with order that there is more to be gained by a more complicated correction process. However, one would not expect to achieve the theoretically expected values:

1) There really are some variable stars out there!

2) I think my code underestimates the "theoretical" values ... where "Star" overestimates the error in estimating the background level, my code probably underestimates it. I will get back to this when I have fixed the more serious bugs!

This suggests that there is not a great deal more to be had. Not quite true!

Standard Deviations of the Paired Observations

Here are some standard deviation measures for an uncrowded region (RA 0 to 10 degrees):

V sd

V-band

Yes: we have the notorious noise floor. One expects far smaller errors for the brightest sources. The standard deviation should be around 0.002 magnitudes for the brightest sources. Instead it is more like 0.01 magnitudes.



I sd

I-band

Same sort of thing in I with the added twist that the observed standard deviation actually increases for the brightest sources. They can't all be variables. These sources are not close to saturation - all such sources were flagged and were not used in the analysis. Suggestions, please!



V sd: crowded

Now for a crowded area (290 to 300 degrees). I-band.

Quite a bit more scatter. Not too unexpected.



I sd: crowded

I-band

That strange increase in error for the brightest sources is still there. It's just masked by the greater scatter.



Welch-Stetson statistics.

The corrected magnitudes give much better looking Welch-Stetson statistics: more GCVS variables turn up where they should and a lot fewer bogus variables are found. I still have a fiddle factor in my code to boost the error estimates for the bright sources: even after the photometric correction, there are still some problems ...

Welch-Stetson

Here are the results for the uncrowded area. With around 40 pairs, one expects most Welch-Stetson I-parameters to fall in the range -0.5 to 0.5 for constant sources of any magnitude. The actual range is rather greater than this for the faint sources. This is exactly what one would expect in view of the observed higher than theoretical noise. The observed range is a good deal smaller for the bright sources: this is the direct result of my having artificially boosted the error estimates for these sources. Obviously, I have rather overdone things. I'll do better next time. Most of the positive outliers are variable stars.



Welch-Stetson; crowded

Pretty similar for the crowded area. Not much suggestion that the increased standard deviations for bright stars is the result of there being lots of bright variables.

Most of the points with a Welch-Setson paramater above two are probably variable. Yes - there are lots of them. Not to mention the really variable ones off the top of the plot; several hundred of them in all.

NOTE: they are not all variables. I've found at least one case where it looks as though a pair of stars was sometimes seen as one (with no error flags) and sometimes as two (without setting the "pair" warning flag.) That's one out of the 60 I've looked at ... out of 240,000. There may be a few more.


Variable Stars

The Welch-Stetson statistics picked out 60 stars out of 240,000 with an I-parameter greater than 20, even with my fiddle factor. Without the manual increase in the error estimate supplied for the brightest sources, the number would have been very much greater: too great for me to look at them all. 24 out of the 60 are in Arne Henden's collection Tassvars.cat; most of these are in GCVS. Of the remaining 36, at least one is probably the result of the varying resolution that also plagued TENXCAT. Most, however, look like real variables with similar properties to those of the catalog variables. Here are three of the catalog variables; for these, I have periods so I am able to plot out the cyclic variation. Since these are all short period variables, the raw plot against time is pretty messy. Aliased to hell and back.

I have plotted all the data twice so as to bring out the cyclic pattern. I'm sorry to say the magnitude increases upwards which is backwards. I do know how to get it to go the other way but I'd have to change all 63 plots and I am too lazy.


RY Psc

RY Psc: period = 0.5297 days



RR Cet

RR Cet: period = 0.553 days

During the very rapid rise in brightness (decrease in magnitude) the change in magnitude between exposures is well resolved.



CY Aqr

CY Aqr: period = 0.06104 days

The period is so short that one can see changes in magnitude between adjacent exposures even in the less rapidly changing part of the cycle. Times are not heliocentric so this might clean up a bit if things were done right.

Yes - there are two kinds of points plotted. The solid ones are the ones with no flags and the crosses had some kind of information flag. Points with more serious error flags have been removed.