Technical Note 81: Data Set 20

Andrew Bennett

Version 1, 2002 Feb 3

Version 2, 2002 Feb 11

Version 3, 2002 Feb 25

Older versions below

I've made some major changes in the Ensemble code and things are beginning to work a little better. The biggest change is to add in an estimate of how bad each image is: the analysis code generates an estimate of the magnitude error for each source and it looks as though this is fairly reasonable for a good image but wildly low for the poorer images - the ones with high sky background levels indicating haze or clouds. I am now scaling the estimated error by a somewhat arbitrary factor based on the measured noise level and sky backround level. It works! Weighting down the poorer data of the D2239 and D2246 observations allows them to be combined with the excellent D2199 data. This, as will be seen, allows the detection of longer period variables.


New raw Welch-Stetson


Here are the new raw Welch-Stetson statistics. The improvement over the original processing is very significant even without the Ensemble correction. Compare this with the original plot below.  The scatter is much reduced: note especially the big reduction in the number of spurious negative outliers. Every point is plotted here whereas extreme outliers in both directions were cut out of the original plot (it can now be revealed ... )


New ensemble Welch-Stetson


Ensemble corrections, which amount to as much as 0.2 magnitudes in some places (flat fielding? what's that??) and first order colour corrections give further improvement. Note the reduction in the number of points with Welch-Stetson statistic around one to two. Close examination showed that almost all of these points (39 out of the 42 I looked at before I got tired of looking) were in one corner of the D2239 image where the ensemble correction was around 0.2 magnitudes in both Visual and Infrared. When plotted, the raw magnitudes look just like a longer period variable but everything straightens out with the ensemble correction.


The Latest Variables ... and Maybes

The final Ensemble-corrected Welch-Stetson plot has just 10 with the statistic greater than 1.0 so I have plotted all of them. Note that the X-axis is "image number" which is a count starting from one. 1-42 are D2199; 58-89 are D2239 and 100 up are D2246 (I didn't analyse all of D2246 - I got impatient to look at the results!) Note also that there are 22 or 23 days between D2199 and D2239, not 40 as one might suppose. And maybe 8 days between D2239 and D2246. Or thereabouts.


SP Variable


This is the same one found previously below.  A large amplitude short period variable.


Eclipser?


This one was also seen before: see below.  This one appears constant, give or take the scatter, on the second and third days. An eclipser?. Remember I am still plotting magnitude increasing upwards so that rise means the star is getting fainter.


Another SP


The third short period variable seen before.


Another SP


A new one, not measured on D2199. The variation in magnitude on D2239 is highly significant. This is another short period variable.


LP n1271


Longer period - not varying appreciably in two hours.


LP n1147


Longer period. The variation during the two hours of observations on D2199 is probably significant so the period is not all that long.


LP n1148


Longer period.


LP n1162


Or is this one just residual calibration errors?


LP n2448


And another doubtful one.


LP n1288


Um! Constant except for those ragged points on D2246.


End of New Section

Changes and improvements in processing

I am slowly getting things organised to run from a DOS Batch File. Not very exciting stuff and things are still pretty fluid, needing lots of manual intervention, so I won't call it a "Pipeline" just yet and I won't bore you with all the details - they will all have changed by next week. One major change is that all data organisation is done manually, by putting files into systematically named directories. The attempt to write a program that could work out for itself what was on a CD and do the right things with the data, as attempted with the Data Set T data has been abandoned. Now, when V data comes out with an I file name, one simply sticks it in a properly named V directory and the problem is solved. Bad files (clouds, sunrise ...) are simply deleted: not easy to do with the old system that worked directly from the CD.

There have been some major changes in the programs. My touchy matching routines have been replaced at last with Michael Richmond's robust ones, which I translated into Pascal. They got through Data Set 20 without a glitch, though I have not yet tested them on some of the earlier data that completely bamboozled my routines.

The entire PSF based magnitude evaluation has been rewritten to do things more consistently. The previous version evaluated the PSF for points that were above a threshold. This meant that more points were used for the brighter stars, giving magnitude dependent errors. Overlapping of two stars was also judged by examining points above a threshold so faint sources had to be much closer together before their closeness was detected. The new algorithm first computes an estimate of position and then defines the PSF evaluation window and possible overlap relative to this position. New improved code was also supposed to cope with a source having up to two close neighbours ... but see below!

Results of 2002 Feb 3

Does all this do any good? Well - here is a plot of the V magnitude standard deviation:

V Sigma

If you compare this with the corresponding plot for Data Set T, you might persuade yourself that I am doing a little better. Not exactly the major improvement I'd been hoping for.


I Sigma

Here's the corresponding I data


Ensemble Photometry

Another major improvement should result from recalibrating the photometry using the same ensemble of calibration sources on each image. Using the same sources each time should eliminate systematic errors arising from errors in the calibration source magnitudes and using an ensemble centred on the position of the source being measured should drastically reduce errors arising from the variation of the true calibration across the image that arises from the inadequacies of the flat fielding. Does it?

Ensemble V Sigma

Well - if you compare this closely with the plot above, you might see a small improvement around magnitude 11 ... and a small deterioration around magnitude 9! But what is really obvious is that there are far fewer sources in the ensemble plot. The ensemble consists only of sources that appear in all the images containing the source being calibrated and which have no error, warning or information flags. This gets rid of most sources!

It therefore looks as though I have to look at inhomogeneous ensembles - calibration groups that do not contain exactly the same sources on each image. I just don't know how to program this. Given stars S_i on images Im_j with measurements MV_ij, MI_ij with flags FV_ij, FI_ij, select a subset of i, j giving the optimum calibration ... or even any reasonable calibration. My simple minded algorithm - throwing away any star with any flag set - left insufficient stars for calibration. So I need a more complicated algorithm. I presume that this is a known problem and somebody can give me a solution.


To give some idea of the problem, consider the following:

Observed flags

170076 No Flags 
 43235 BadLow: fit diverged to small amplitude
  6119 BadFit: large fit residuals
  4117 BadPix: One or more hot/noisy pixels
  4370 OddM: Too few pixels. Probable cosmic ray
     5 OddP: Too many pixels. Galaxy or blend
   715 TooBright: One or more pixels close to saturation
 81054 MultProc: Processed as pair or group
  2194 Multi: More than two close neighbours - not processed

Most of the "BadLow" flags are also "MultProc" sources. The multiple source processing is fairly good at getting rid of misidentified peaks.

So most of the flags involve stars that were judged to be close enough together to interfere. As will be seen below, including these gives a big increase in the magnitude scatter: my multi-star processing just does not work very well yet. The ability to cope with crowded fields is supposed to be the big advantage of the PSF fitting technique. Ah well - I must just try harder!


Ensemble I Sigma

The I band data similarly shows a small improvement around magnitude 10 and no improvement for the few surviving bright sources. Clearly, I'm going to have to work on this because the potential improvement is as much as a factor of five for the brightest sources!


The above plots were all obtained by rejecting all the sources with any flags, including the information flags such as the flag saying that the source was analysed with the new improved multi-source analysis code that is supposed to cope when a source has one or two close neighbours. If one accepts sources with information flags set the results are:

V sigma

Um! Back to the drawing board!

Including the sources with information flags - mostly, as discussed above, those involved in multi star processing - has produced a large number of scattered, high standard deviation measurements. There are correspondingly large numbers of high Welch-Stetson estimates (not shown,) virtually all of them bogus. A major improvement is therefore needed in my multi-star processing.


Variables ... and otherwise!

Welch-Stetson stats

As with the corresponding plot with the Data Set T data, the Welch-Stetson statistic has been "cooked" somewhat by adding some variance in an attempt to allow for the noise floor of the magnitude observations. With the Data Set T data, I overdid things a bit. This time I have erred in the other direction, undercompensating for the (unexplained) extra noise. You will remember that typically a constant source will give a Welch-Stetson statistic in the range -0.5 to + 0.5 approximately and larger positive values suggest variability. The plot shows that the scatter increases for the bright sources, which it shouldn't. I have arbitrarily taken 4.0 as the limit above which I look at the sources in detail. As you will see, this criterion is not really strict enough for the bright sources but does well enough for the faint ones.

Just 13 out of the 1696 sources seen often enough with no flags set to give a Welch-Stetson statistic give a value exceding 4.0. I plotted all of these in the previous version of this Note but I've taken the opportunity of pruning them - there's only one real variable. I've left in one of the others as a Horrible Warnings

They are still plotted with magnitude increasing upwards. I know that I boasted last time that I knew how to plot magnitudes the right way up. I was wrong: it would probably be easier if astronomers started measuring brightness in sensible units that increase when brightness increases ...

Note that there is only one day's data here. My automatic cloud rejection code eliminated all the data from the other two days of Data Set 20. Obviously, I need to go back and see if I can recover anything at all for the second source below - the one which I claim to be a real variable.


Variable

A real variable at last!


Oh! Nasty!

Oops! No - this is not a variable. It is a source transitting some bad pixels on the CCD. I have set up means in the program to mark these as bad pixels - but this has to be done manually, by setting up a text file giving the bad locations. I have now done this and the problem has gone away. Be warned!


One genuine variable, a Horrible Warning and eleven more that I am not going to show you: lots of noisy data.

Results of 2002 Feb 11

It turned out on closer inspection that some of the extra scatter in the magnitudes of the bright sources resulted from incomplete convergence of the PSF fitting iteration. This has been fixed by tightening the internal tolerance of the iteration and by adding a second outer iteration that continues until the source stops moving around. This ensures that the fitting window is properly centred on the source. This procedure approximately doubles the running time: the first inner iteration is a little slower than before because the tolerances are tighter but the second inner iteration is a little faster because it is starting from a good estimate. Most isolated sources converge in two passes so the total time is about twice what it would have been before if multiple sources had been ignored. The previous, non-working, multi-source processing was very slow so the actual time increase is quite small.

The other major problem was the poor processing of close sources. I previously tried to fit simultaneously all the sources which were close enough to affect each other as pairs, triples etc. This did not work very well and was very slow. A pair took something like five timas as long as two single sources and triples were much slower again. Using an eight pixel PSF fitting radius, stars interfered within 16 pixels radius or about 800 pixels area. 3,000 sources in four million pixels gives 1,333 pixels per source so a significant fraction of sources had to be analysed as multiples.

The new code uses succesive subtraction - not exactly original! At each iteration, every source is examined to check whether it has converged, as discussed above, and to determine whether it should be combined with another or split to make two sources. While most single sources converge on the second iteration, those involved in more complicated situations may require 20 iterations before they converge. Since the proportion requiring large numbers of iterations is small, this whole process adds only around 30% to the total time as compared with simply ignoring confused sources. The overall increase in total fitting time as compared with the previous processing with its poor multi-source processing is around 50%: around 3 minutes per image on a 1 GHz Pentium.

A higher percentage of sources emerge from this process without error flags.

Observed flags
New    Previous
213388 170076 No Flags 
  8236  43235 BadLow: fit diverged to small amplitude
  7419   6119 BadFit: large fit residuals
  5477   4117 BadPix: One or more hot/noisy pixels
   278   4370 OddM: Too few pixels. Probable cosmic ray
    63      5 OddP: Too many pixels. Galaxy or blend
  2981    715 TooBright: One or more pixels close to saturation
  1135  81054 MultProc: pair or group
    31   2194 Multi: More than two close neighbours
252348    n/a Total

A large number of changes have been made so some of these numbers have changed their significance.
The threshold for BadLow has been reduced, which is why there are fewer even though the combining/splitting process generates small amplitude sources.
Most of the BadFit sources are now generated by the splitting process: many of the large amplitude BadFits have been successfully split.
I have no idea why the number of OddM, OddP sources has changed so much. A number of thresholds and scale parameters have been adjusted ...
Bright streaks now generate TooBright sources where previously they were simply lost. Three per image, 84 images. OK - it doesn't account for the increased number.
MultProc and Multi are now simply informational flags, counting the sources found to have one or two or more than two neighbours respectively. In the previous processing, where distinct processing was used - which did not work very well - they had to be considered error flags.
I'm sorry that the total number of stars analysed didn't get recorded for the previous analysis ... well, actually it did but not in a form I can get at without doing some work.

Anyway, the proof of the pudding is in the pictures. The noise floors for the V- and I-magnitudes are marginally better than they were but not enough to be worth replotting. The plots above give an adequate picture. The ensemble correction works a little better because there are far more sources available that don't have error flags for any of the images. The noise floor for the ensemble correction is better but again it's not that much better than before. But the data is cleaner, allowing one to spot variables that were previously missed: here's one.


new variable

A new variable. Well - new to me anyway. It turns out that this one was in the previous results too, even though I missed it:


Previous version

Here's the previous version. Yes - it's a bit messy!


And that's not the end of it.

Variable #2


Variable #3
Wow! Three variables. But before I get too cocky, I should point out that the one I found before didn't turn up this time. My excuse? Well - I changed a few thresholds to try to go a little deeper and ended up not going quite as deep! The variable is there but it didn't get picked up until most of the way through the analysis: there just aren't enough points to analyse.



Bennett's TASS Page