TN 0082: Gaps in the magnitude histogram

Michael Richmond
April 8, 2002

Keywords: photometry

Tom Droege recently raised a question concerning some Mark IV data he had reduced with my pipeline: "Why are there gaps in the distribution of magnitudes?" The figure below illustrates his concern; it plots calibrated magnitude for one set of Mark IV data on the y-axis versus time (or image index number) on the x-axis.

Each vertical column of dots represents the measured magnitudes of stars in a single image. Note the horizontal features in the diagram at small magnitudes (that is, among the bright stars): they represent "gaps" in magnitude, values which no star has. Why should they be present? And why should they appear in image after image?

As Tom wrote:

> The mag distribution for a single frame is suspicious.  
> There are blank spots common from frame to frame.

I believe I know the explanation. The reason the blank spots appear is ... because there are no stars with those magnitude values.

No, really! In each frame, the pipeline detects between 1000 and 3000 stars. Most of the stars are faint, with magnitudes close to the plate limit. Only a few stars appear between, say, magnitude 7 and magnitude 10. If one makes a histogram of the observed distribution of magnitudes, counting the number of stars in bins of width, say, 0.05 mag, then there will be many bins at bright magnitudes which are empty. At fainter magnitudes, no bins will be empty, but the number of stars in each bin will be small enough that small-number statistics will cause large relative variations in number from bin to bin. Near the plate limit, the number of stars in each bin will be large, which reduces the relative size of the variations from bin to bin.
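The bin-counting argument above can be tried out with a small sketch. Everything numerical here is an assumption made for illustration, not a property of the Mark IV data: the count law (numbers growing as 10^(0.6 m), as for identical stars spread uniformly in space), the bright end at magnitude 7, the sharp limit at magnitude 13, and the helper names `sample_magnitudes` and `histogram`.

```python
import math
import random

SLOPE = 0.6                    # assumed: star counts grow as 10**(0.6 * m)
M_BRIGHT, M_FAINT = 7.0, 13.0  # assumed bright end and sharp faint limit
BIN_WIDTH = 0.05               # bin width mentioned in the text


def sample_magnitudes(n_stars, rng):
    """Draw magnitudes by inverting the assumed cumulative count law."""
    a = 10 ** (SLOPE * M_BRIGHT)
    b = 10 ** (SLOPE * M_FAINT)
    return [math.log10(a + rng.random() * (b - a)) / SLOPE
            for _ in range(n_stars)]


def histogram(mags, lo, hi, width=BIN_WIDTH):
    """Count stars with lo <= m < hi in bins of the given width."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for m in mags:
        if lo <= m < hi:
            counts[min(int((m - lo) / width), n_bins - 1)] += 1
    return counts


rng = random.Random(82)               # fixed seed so the run is repeatable
mags = sample_magnitudes(1500, rng)   # roughly one frame's worth of stars

bright = histogram(mags, 7.0, 10.0)   # 60 bins at the bright end
faint = histogram(mags, 12.0, 13.0)   # 20 bins near the limit
empty_bright = bright.count(0)
empty_faint = faint.count(0)

print("stars with 7 <= m < 10:", sum(bright), " empty bins:", empty_bright)
print("stars with 12 <= m < 13:", sum(faint), " empty bins:", empty_faint)
```

With only a few dozen stars spread over 60 bright bins, most of those bins must be empty, while the bins near the limit each hold many stars.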

One way to look at it is like this: consider a single bin of some width in magnitudes. If the number of stars in that bin is a small number, then if we look at another area of the sky, we should see another small number. The two numbers probably won't be exactly the same, since there are random statistical variations in the exact number of stars in some little patch of the sky within a particular range of magnitudes. If the number of stars in the bin is drawn from a random distribution (probably a Poisson distribution) with some mean value N, then the number will fluctuate from sample to sample with a standard deviation sqrt(N).

When N is small, then the relative size of the fluctuations is large:

         N = 36:      sqrt(N) / N  =  6 / 36  =  1/6

That means we expect to see fluctuations from one bin to the next which look big compared to the value in each bin. A histogram of magnitude values would look "jagged".

On the other hand, when N is large, the relative size of the fluctuations is small:

         N = 3600:    sqrt(N) / N  =  60 / 3600  =  1/60

In this case, the fluctuations from one bin to the next should look small. A histogram of the magnitude values would look "smooth".
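The two cases above can be checked numerically. The sketch below is only an illustration: it throws stars at random into a set of equally likely bins, so that each bin's count is very nearly Poisson with the chosen mean, and then measures the bin-to-bin scatter relative to the mean. The function name `relative_scatter` and the choice of 200 bins are my own assumptions.

```python
import random
import statistics


def relative_scatter(mean_per_bin, n_bins=200, seed=1):
    """Scatter of bin counts divided by their mean, for random placement.

    Stars are dropped uniformly into n_bins bins, so each bin's count is
    very nearly Poisson-distributed with mean `mean_per_bin`.
    """
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(mean_per_bin * n_bins):
        counts[rng.randrange(n_bins)] += 1
    return statistics.pstdev(counts) / statistics.mean(counts)


rel_small = relative_scatter(36)     # prediction: about 1/6
rel_large = relative_scatter(3600)   # prediction: about 1/60

print("N = 36:   measured", round(rel_small, 4), " predicted", round(1 / 6, 4))
print("N = 3600: measured", round(rel_large, 4), " predicted", round(1 / 60, 4))
```

The measured values will not match sqrt(N)/N exactly, since the scatter is itself estimated from a finite number of bins, but they should land close to it.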

Below are three graphs showing magnitude histograms of real Mark IV data and of simulated data.

The real data consist of measurements of 1584 stars in a single frame. The simulations have identical underlying populations of stars, but contain different numbers of observed stars:

  1. 1500 stars
  2. 15,000 stars
  3. 150,000 stars

I've offset the real and simulated values so that they don't overlap; it's easier to compare them that way.

Simulated measurements of 1500 stars.

Simulated measurements of 15,000 stars.

Simulated measurements of 150,000 stars.

As you can see, the more distinct stars which one "observes", the smoother the magnitude histogram. The "jaggedness" of the real Mark IV histogram is similar to that of the simulation containing 1500 stars.
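A rough version of this experiment is easy to repeat. The sketch below is not the simulation used for the graphs above; it assumes a simple count law (numbers growing as 10^(0.6 m) between magnitude 7 and a sharp limit at magnitude 13) and defines "jaggedness" as the RMS fractional difference between the observed count and the smooth expected count in each 0.05 mag bin between magnitudes 9 and 12. All of those choices, and the function names, are assumptions for illustration.

```python
import math
import random

SLOPE = 0.6                    # assumed: star counts grow as 10**(0.6 * m)
M_BRIGHT, M_FAINT = 7.0, 13.0  # assumed bright end and sharp faint limit
BIN_WIDTH = 0.05
A = 10 ** (SLOPE * M_BRIGHT)
B = 10 ** (SLOPE * M_FAINT)


def sample_magnitudes(n_stars, rng):
    """Draw magnitudes by inverting the assumed cumulative count law."""
    return [math.log10(A + rng.random() * (B - A)) / SLOPE
            for _ in range(n_stars)]


def expected_count(n_stars, m_lo):
    """Smooth model prediction for the bin starting at m_lo."""
    top = 10 ** (SLOPE * (m_lo + BIN_WIDTH))
    bottom = 10 ** (SLOPE * m_lo)
    return n_stars * (top - bottom) / (B - A)


def jaggedness(n_stars, rng, lo=9.0, hi=12.0):
    """RMS of (observed - expected) / expected over the bins in [lo, hi)."""
    n_bins = round((hi - lo) / BIN_WIDTH)
    counts = [0] * n_bins
    for m in sample_magnitudes(n_stars, rng):
        if lo <= m < hi:
            counts[min(int((m - lo) / BIN_WIDTH), n_bins - 1)] += 1
    total = 0.0
    for i, observed in enumerate(counts):
        expected = expected_count(n_stars, lo + i * BIN_WIDTH)
        total += ((observed - expected) / expected) ** 2
    return math.sqrt(total / n_bins)


rng = random.Random(82)
results = {n: jaggedness(n, rng) for n in (1500, 15000, 150000)}
for n, value in results.items():
    print(n, "stars: RMS fractional deviation", round(value, 3))
```

If the counts in each bin are Poisson-distributed, each tenfold increase in the number of stars should reduce this measure by a factor of roughly sqrt(10).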

Now, you may notice that the two histograms have overall shapes which don't match very well at the faint end: the number of stars actually observed falls gradually over a span of about 2 magnitudes, while the simulated histogram has a very sharp edge at the faint end. That's a consequence of the simple nature of the simulation:

The sharp edge represents the fact that no stars were placed more than a certain distance from the Earth, and so none are fainter than a particular value.

In real life, stars have a range of absolute luminosities. In addition, stars are distributed to VERY large distances from the Earth, but any particular instrument will detect only those above a certain limiting brightness. In fact, stars near this limit will sometimes be detected and sometimes be missed: the probability of detection decreases as a star's brightness approaches the limit. Thus, a real histogram of magnitudes will always show a gradual decrease from its peak.
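The effect of a detection probability that falls off near the limit can be sketched as follows. This is an illustration under stated assumptions, not a model of the Mark IV hardware: the logistic form of `detection_probability`, the limiting magnitude of 12.5, the roll-off width of 0.3 mag, and the 10^(0.6 m) count law extending to magnitude 15 are all invented for the example.

```python
import math
import random

SLOPE = 0.6                  # assumed: star counts grow as 10**(0.6 * m)
M_BRIGHT, M_MAX = 7.0, 15.0  # simulated stars extend well past the limit
M_LIMIT = 12.5               # assumed limiting magnitude
SOFTNESS = 0.3               # assumed width (mag) of the detection roll-off
BIN_WIDTH = 0.25


def detection_probability(m):
    """Assumed logistic roll-off: near 1 for bright stars, 0.5 at the limit."""
    return 1.0 / (1.0 + math.exp((m - M_LIMIT) / SOFTNESS))


def bins_in_tail(counts):
    """Bins fainter than the peak holding more than 10% of the peak count."""
    peak = counts.index(max(counts))
    return sum(1 for c in counts[peak + 1:] if c > 0.1 * counts[peak])


rng = random.Random(82)
a = 10 ** (SLOPE * M_BRIGHT)
b = 10 ** (SLOPE * M_MAX)
n_bins = round((M_MAX - M_BRIGHT) / BIN_WIDTH)
hard = [0] * n_bins   # every star brighter than the limit is detected
soft = [0] * n_bins   # each star is detected with the assumed probability

for _ in range(200_000):
    m = math.log10(a + rng.random() * (b - a)) / SLOPE
    i = min(int((m - M_BRIGHT) / BIN_WIDTH), n_bins - 1)
    if m < M_LIMIT:
        hard[i] += 1
    if rng.random() < detection_probability(m):
        soft[i] += 1

tail_hard = bins_in_tail(hard)
tail_soft = bins_in_tail(soft)
print("hard cutoff: bins in the faint tail =", tail_hard)
print("soft cutoff: bins in the faint tail =", tail_soft)
```

With a hard cutoff the histogram ends abruptly at its peak; with a probabilistic cutoff the counts decline over several bins past the peak, which is qualitatively what the real data show.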

The bottom line is that a set of a few thousand stars will contain very few bright ones, so we should expect to see gaps and large relative variations from bin to bin in a histogram of magnitudes.

Now, one may look at Tom's plot again and ask, "But why should gaps appear at the SAME magnitudes for a whole bunch of frames?" The answer is (I hope!) that the frames Tom reduced all show the SAME FIELD. Therefore, although there are tens of thousands of measurements of magnitudes in the pipeline's .cal output file, they represent only a few thousand distinct stars. If the pipeline does its job properly, it will yield very nearly the same magnitudes for each star in each frame. Gaps should appear in a plot like Tom's, if one is analyzing a set of "follow frames" of the same field over and over again.
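This last point can also be sketched numerically. The example below compares 20 frames of one field against 20 frames of 20 different fields, and counts how many bright-end bins stay empty in every frame. The numbers are assumptions for illustration only: 1500 stars per frame, a 10^(0.6 m) count law between magnitudes 7 and 13, and a frame-to-frame measurement scatter of 0.01 mag.

```python
import math
import random

SLOPE = 0.6                    # assumed: star counts grow as 10**(0.6 * m)
M_BRIGHT, M_FAINT = 7.0, 13.0  # assumed bright end and sharp faint limit
BIN_WIDTH = 0.05
LO, HI = 7.0, 10.0             # bright range in which to look for gaps
N_BINS = round((HI - LO) / BIN_WIDTH)
NOISE = 0.01                   # assumed frame-to-frame scatter, in mag
N_STARS, N_FRAMES = 1500, 20


def sample_field(rng):
    """True magnitudes of the stars in one randomly chosen field."""
    a = 10 ** (SLOPE * M_BRIGHT)
    b = 10 ** (SLOPE * M_FAINT)
    return [math.log10(a + rng.random() * (b - a)) / SLOPE
            for _ in range(N_STARS)]


def occupied_bins(mags):
    """Indices of bins in [LO, HI) that contain at least one star."""
    return {min(int((m - LO) / BIN_WIDTH), N_BINS - 1)
            for m in mags if LO <= m < HI}


def persistent_gaps(frames):
    """Number of bins that are empty in every one of the frames."""
    filled = set()
    for frame in frames:
        filled |= occupied_bins(frame)
    return N_BINS - len(filled)


rng = random.Random(82)

# Same field every time: identical stars, slightly different measurements.
field = sample_field(rng)
same_field = [[m + rng.gauss(0.0, NOISE) for m in field]
              for _ in range(N_FRAMES)]

# A different field in every frame: a fresh set of stars each time.
different_fields = [sample_field(rng) for _ in range(N_FRAMES)]

gaps_same_field = persistent_gaps(same_field)
gaps_different_fields = persistent_gaps(different_fields)
print("bins empty in all frames, same field:      ", gaps_same_field)
print("bins empty in all frames, different fields:", gaps_different_fields)
```

When the frames show the same stars, the empty bins line up from frame to frame; when each frame shows a new field, the bright stars land in different bins each time and far fewer gaps survive across the whole set.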