TN 55: Photometric-quality compression of TASS image data

Herb Johnson
July 5, 1999
Revision: #0 990705
......... #2 990608
Keywords: image, compression, astrometry, photometry, techniques, computation

Abstract:

This is a compilation of TASS maillist discussions in 1998-1999 of compression of Mark III and of estimated Mark IV star images. The discussion's focus was how to reduce the size of raw images for archiving, without compromising future photometry and astrometry produced from those images. This document lists my summary of discussion points, followed by highlights with links to the selected TASS email discussions from 20 Aug 1998 to 27 Feb 1999. The discussions contain numerous studies by TASS members and references studies by others on photometric images and software to compress them. Editorial comments from me are generally written within square brackets and identified with my name. My thanks to all the participants in our two-year discussion of this subject in the TASS newsgroup. Corrections and/or updates would be appreciated.

This note was reorganized and updated in June 2000 to revise some Web links and to add further discussion regarding Mark IV imaging, since Mark IV image data was available by then. Discussion from that period is at the end of this note. I will likely append new work there rather than re-edit the note again.

Summary of points made in 1998-99 discussion

1) Herb Johnson et al. find that ZIP losslessly reduces the size of Mark III data files by about 36-38%, and of dark frames by about 56-57%.

2) Jure Skvarc did a thorough analysis of fcompress and Mark III image data. He suggested that lossy compression by a factor of 5 or even 10 will not produce a great loss of data when the decompressed data is analyzed by TASS methods. TASS members respond to his results in the discussion.

3) Arne offers this analysis, which I quote in part here:

    (1) compression for the mark III,[was] discarded because
        the images were small enough that the current storage media (CDROM)
        could hold several night's worth of data.....
    (2) For the mark IV, with its 1.6GB/night data rate, archiving would
        require multiple CDs or some other (most likely non-portable)
        storage medium.  For this case, you can (a) bite the bullet and
        use lots of CDROMs; (b) extract pertinent data and throw away the
        CCD frames; (c) extract pertinent data and keep some form of the
        CCD frames that might permit reprocessing at a later date.
    (3) While my original thoughts were to choose option(b), Chris suggested
        data compression during another discussion, and that made option(c)
        become viable if the compression ratio were large enough to bring
        us back into the realm of the mark III data rate.
    (4) Lossless compression ....is not sufficient for CD-R media
    (5) Lossy compression ....appears to not compromise greatly photometry or
        astrometry if the compression ratio is around 10:1.  This is exactly
        the ratio we need to store 4-5 nights of data on a single CDROM.
    (6) therefore, use a good lossy compression technique,
    (7) The real question in my mind has been what that precision loss is
        at various levels of compression and with real mark IV data.

4) Compression is discussed at length, including testing. Andrew Bennett ran tests with Sextractor on compressed and uncompressed Mark III data, with these results:

     Compression    Size    Size     Stars
          factor  (kbyte)    (%)         N   Common   Extra   Missing
             Raw     1408   100.0      549
               1      723    51.3      549      549       0         0
              10      515    36.6      557      544      13         5
             100      221    15.7      544      527      17        22
             200      142    10.1      526      504      22        45
             999       21     1.5      468      400      68       149

5) FITS does not support a compression scheme.
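
The star counts in the table under point 4 are internally consistent if "Common" means stars found in both the raw and the compressed image, "Extra" means stars found only in the compressed image, and "Missing" means raw-image stars that are no longer found. That reading is an assumption, since the original message does not define the columns. A minimal Python sketch that checks it against the table (the function names are illustrative only):

```python
# Rows from Andrew Bennett's table: (factor, N, common, extra, missing).
RAW_STARS = 549
ROWS = [
    (1, 549, 549, 0, 0),
    (10, 557, 544, 13, 5),
    (100, 544, 527, 17, 22),
    (200, 526, 504, 22, 45),
    (999, 468, 400, 68, 149),
]

def row_is_consistent(n, common, extra, missing, raw=RAW_STARS):
    """Assumed column meanings: N = common + extra, missing = raw - common."""
    return n == common + extra and missing == raw - common

def fraction_recovered(common, raw=RAW_STARS):
    """Fraction of raw-image stars still detected after compression."""
    return common / raw

for factor, n, common, extra, missing in ROWS:
    print(factor, row_is_consistent(n, common, extra, missing),
          round(fraction_recovered(common), 3))
```

Under this reading, every row of the table satisfies both relations.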

Highlights with links to the email discussion:

TASS Email Discussions

ZIP and Mark III

To: tass@wwa.com
From: hjohnson@pluto.njcc.com (Herbert R Johnson)
Subject: Re: CD vs. Travan: compression
Date: Sun, 16 Aug 1998 12:17:18 -0400

On Sat, 15 Aug 1998 15:43:28 -0400 (EDT), Nick Beser (beser@aplcomm.jhuapl.edu)  wrote:
*
* On Fri, 14 Aug 1998, Herbert R Johnson wrote:
*
*   ZIP is a pretty good compression scheme: my casual reading of compression
*   literature in years past is that it is hard to compete with for GENERAL
*   compression. But compression schemes that "know" about the actual
*   features of the data will beat it.
[snip]
*
*   A few tests on real data (and media) will show if any of the above
*   remarks hold. We certainly don't need yet another software project.
*
*   Herb Johnson
* Herb,
*
* I have to put my two cents in here. I teach image compression and packet
* video in Hopkins Electrical Engineering graduate school. ZIP is designed
* to perform very efficient text compression. [details snipped]
*
* Hcompress is a lossy compression method which approximates the original
* data. The amount of compression (ratio of compression) will indicate the
* number of compression artifacts caused by the algorithm.
*
* Hcompress has some very nice features. First off, it is capable of
* reading 16 bit FITS files... [and has some] acceptance by the astronomy
* community.
*
* I am planning to run an experiment using the star program..
* ... At some point the ratio of compression will start to produce
* errors in the program. This might be a good indicator as to the limits of
* hcompress.
*
* Nick

All this may be of some interest to some, but is it necessary in general?
If so, what are the goals, what is the need, are reasonable tools already
available, and what are "acceptable" losses in data quality? I have
some thoughts on all this, some from prior TASS discussions
on archiving data or throwing out raw data.

First, is there any specific "goal" in this discussion of
compression, other than to save archival storage space? If not, I'd
suggest casually that *50% compression* is so often achieved on ordinary
data and programs that it seems to be a reasonable criterion. In this regard
it has not been established that a LOSSLESS compression scheme (like ZIP)
will not accomplish the (undefined) task: this saves us from the trouble of
deciding what are "acceptable data losses" or, worse, "errors" in the
output of STAR. I'm not sure we even have a standard for "good" output
for STAR or other star list producing programs!

Now for some real tests. Here are my results of using Zip 2.04g
(PKZIP, MS-DOS shareware version) on some old TASS datafiles from
Tom Droege's site: all Mark III files are 1437K bytes, percentages
are from PKZIP's report

Data files: G0483977.FTS  36% to 927K
            G0493946.FTS  38% to 902K
            G1493669.FTS  36% to 929K
Darks:      D0500244.FTS  57% to 618K
            D2500244.FTS  56% to 636K

The dark files have a smaller range of values so I would expect more
compression. PKZIP of course is near-universal, loses no data, and
even checksums to ensure ZIP files are not corrupted. (Other folks
who use compression schemes on their backups or on file systems or
compressed drives may want to report THEIR results.) So, it's clear
we can gain a third in archival storage with no sweat. A further
investigation of data representation in Mark III cameras may improve
this (or be comparable as a separate scheme). And, as suggested, other
non-lossy schemes may do better.

If a TASS camera owner needs much more reduction in data storage,
like factors of 10 or 100, then lossy schemes are the only choice.
WHY this is necessary is not clear except as a matter of convenience
for archiving. So far, no Mark III site is discarding observations (I
raised the issue some time ago); and the costs of storage are dropping
by factors of two to several about every six months. (If the issue
is a simple distribution of sample data: why would anyone with a serious
interest want raw but "lossy" data? If so give them .JPG files.)

As for saving archival "space", mass storage on or off-line is a few
hundred dollars. In my mind, any problem that can be solved with a purchase
at your favorite consumer electronics store is a solved problem.
(The issue of "best media" has been well-hashed here before.)

Therefore, I don't see much point to yet another software project or
a discussion of archiving or "acceptable" losses in data. Of course,
I'm not in charge; and if someone wants to offer a piece of software
that saves storage space without compromising data quality - and they
can "prove" what they mean by "acceptable data" to someone else -
then of course some TASS camera sites may use it. Research into the
quality of star-list producing programs, the "prime customer" of
TASS Mark III data, would be in my opinion more useful than considering
the impact of compression IN ADDITION TO all the other factors of
TASS observations.

My opinions of course. - Herb Johnson
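
The kind of lossless test described in the message above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: it assumes zlib's DEFLATE (the method used in ZIP archives) as a stand-in for PKZIP, and the function names are hypothetical.

```python
import zlib

def percent_saved(original_size, compressed_size):
    """Size reduction in percent, the kind of figure a ZIP report quotes."""
    return 100.0 * (1.0 - compressed_size / original_size)

def deflate_percent_saved(data, level=9):
    """Compress a bytes object with zlib's DEFLATE and report percent saved."""
    return percent_saved(len(data), len(zlib.compress(data, level)))

# Sizes in Kbytes taken from the results listed in the message above.
print(percent_saved(1437, 927))   # data file G0483977.FTS
print(percent_saved(1437, 618))   # dark file D0500244.FTS
```

Recomputing from the rounded Kbyte sizes gives figures within about a percentage point of the percentages PKZIP reported. To test a real image, one would read the file's bytes and pass them to deflate_percent_saved.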

Rice compression and other notes

Date: Sun, 16 Aug 1998 15:26:28 -0400
From: Stupendous Man (richmond@stupendous.rit.edu)
To: tass@wwa.com
Subject: new Tech Note 44, plus short comments on compression

  On compression: one way to look at the issue is to ask "How much
loss-less compression is _possible_ for an image?"  In typical astronomical
images, the great majority of pixels are empty sky.  In that case,
one can compare

          how many bits of each pixel are filled with random noise
to
          how many bits there are in each pixel

  The Mark IV images will be sky-limited, even for short exposures.
That means that the main contributor to the noise in each pixel is the
shot noise of photons striking the pixel -- that is, even though the 
average number of photons might be 5000, one pixel might collect
5050 and another 4950, due simply to random fluctuations in the
number of photons entering the pixel each second.

  Now, in the simplest case, if N sky photons strike a pixel during an
exposure, the random fluctuations will be roughly sqrt(N) in size.
If a CCD produced one "count" in an image for each photon striking it,
this would mean that pixels in the empty sky would vary around the
mean value by +/- sqrt(N).

  Example: if N = 10,000 photons, then the image would show random
variations of +/- 100 counts in "blank sky" pixels.  This would mean
that the bottom 7 bits of each 16-bit pixel would vary wildly 
AND RANDOMLY from pixel to pixel.  This, in turn, would imply that
any compression scheme would be stumped by these 7 bits out of every 16.
Therefore, the maximum loss-less compression we might expect for such
an image would be
                    1 - 7/16  =  56 percent

  In real life, most CCDs do not produce one "count" per photon.  
Instead, they are set via a parameter called "gain" to produce perhaps
one "count" per 5 photons detected.  In fact, as shown in 
Tech Notes 8 and 9, the Mark III systems use a "gain" of about 5.
In this case, one can still calculate the maximum loss-less compression 
factor, but it takes an extra step.

  Example: N = 10,000 photons per pixel in the blank sky.
           random variations of +/- sqrt(N) photons = +/- 100 photons
           converted to random variations of 100/5 = +/- 20 counts
           this is about 4 bits per pixel
           hence, maximum loss-less compression ratio is about

                   1 - 4/16  =  75 percent

  Cameras located at dark sites will produce data which may be compressed
to a greater degree loss-lessly, since fewer sky photons will strike
their pixels during each exposure.  Likewise, images taken when the
moon is up and the sky is bright, will compress less well.

  I agree with Nick that hcompress is the best way to go, even though
I've written some compression code myself.  You can find a version
of Rice compression on my home page, designed specifically for 
16-bit FITS images:

           http://stupendous.rit.edu/richmond/rice/rice.html

But please do use hcompress -- it allows lossless or lossy compression,
and has much better support and documentation.


                                         Michael Richmond
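
Richmond's estimate above can be written as a short calculation. The sketch below follows his simple model only (sky shot noise of sqrt(N) photons, divided by the gain, occupies about log2 of that many counts in bits); the function names and the default of 16 bits per pixel are illustrative assumptions.

```python
import math

def noise_bits(sky_photons, gain=1.0):
    """Bits per pixel taken up by sky shot noise in Richmond's simple model.

    sky_photons: photons collected per blank-sky pixel (N).
    gain: photons per count (about 5 for the Mark III, per the text).
    """
    noise_counts = math.sqrt(sky_photons) / gain
    if noise_counts <= 1.0:
        return 0.0  # noise below one count occupies no whole bits
    return math.log2(noise_counts)

def max_lossless_saving(sky_photons, gain=1.0, bits_per_pixel=16):
    """Largest fraction of the file that lossless compression could remove."""
    return 1.0 - noise_bits(sky_photons, gain) / bits_per_pixel

print(max_lossless_saving(10000, gain=1.0))
print(max_lossless_saving(10000, gain=5.0))
```

Without rounding to whole bits, the two examples come out near 58 percent and 73 percent; Richmond rounds the noise to 7 and 4 bits, which gives his 56 and 75 percent.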
[Editor's note: Sometime in 1998, Michael Richmond wrote a compression document for the "software" section of the TASS Web page, including references to various "FITS data compression" schemes. See
http://stupendous.rit.edu/tass/software/software.html#compress

As of 2000, Richmond's document is still available via this link, and some links in that document have been updated. One link is to the Space Telescope Science Institute, where hcompress was developed. - Herb]

Hcompress and sources at Space Telescope Science Institute

[Editor's note: Some TASS members (myself included) have had trouble accessing the Space Telescope Science Institute's links; others have not. Chris Albertson offered to supply the hcompress files to those who have such access troubles, as discussed below. As his note includes a description of hcompress, I quote it here for convenience. Other TASS members have found other sources for hcompress via Web searches. - Herb]

Chris Albertson says in May 2000, regarding the Space Telescope Science Institute's Web site:

The .. link 

ftp://ftp.stsci.edu/software/hcompress/

still works for me _if_ I use a "real" FTP client.
Netscape seems to have problems.  As a test I just downloaded
the entire directory ftp://ftp.stsci.edu/software/hcompress/*

If anyone has trouble I can supply the files.  Below is the
summary by the author. I downloaded it from the above URL as a test
two minutes ago:

  This directory contains HCOMPRESS, the image compression package
  written by Richard L. White for use at the Space Telescope Science
  Institute (rlw@stsci.edu).  Briefly, the method used is:

        (1) a wavelet transform called the H-transform (a Haar transform
                generalized to two dimensions), followed by
        (2) quantization that discards noise in the image while retaining
                the signal on all scales, followed by
        (3) quadtree coding of the quantized coefficients.

  The technique gives very good compression for astronomical images and
  is fast, requiring about 4 seconds for compression or decompression of
  a 512x512 image on a Sun SPARCstation 1.  The calculations are carried
  out using integer arithmetic and are entirely reversible....
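
Steps (1) and (2) of the summary above can be illustrated on a single 2x2 block of pixels. The sketch below is not White's code: it uses unnormalized sums and differences (an assumption made here so that the integer round trip is exact), applies only one level of the transform, and omits the quadtree coding of step (3). All function names are hypothetical.

```python
def h_transform_2x2(a00, a01, a10, a11):
    """One Haar-like step on a 2x2 block: a sum plus three differences."""
    h0 = a00 + a01 + a10 + a11   # block total (smooth part)
    hx = a10 + a11 - a00 - a01   # difference along one axis
    hy = a01 + a11 - a00 - a10   # difference along the other axis
    hc = a00 + a11 - a01 - a10   # diagonal difference
    return h0, hx, hy, hc

def inverse_h_transform_2x2(h0, hx, hy, hc):
    """Recover the four pixels from the four coefficients."""
    a00 = (h0 - hx - hy + hc) / 4
    a01 = (h0 - hx + hy - hc) / 4
    a10 = (h0 + hx - hy - hc) / 4
    a11 = (h0 + hx + hy + hc) / 4
    return a00, a01, a10, a11

def quantize(coeffs, scale):
    """Lossy step: round each coefficient to the nearest multiple of scale."""
    return tuple(scale * round(c / scale) for c in coeffs)

block = (100, 104, 98, 110)
coeffs = h_transform_2x2(*block)
print(inverse_h_transform_2x2(*coeffs))                # exact round trip
print(inverse_h_transform_2x2(*quantize(coeffs, 8)))   # lossy round trip
```

Small differences (mostly noise) quantize to zero, which is what makes the coefficients compress well. In this sketch each pixel's error is bounded by half the scale times four coefficients divided by four, that is, by half the scale.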

Peter McCullough discusses lossy compression


From: "Peter R. McCullough" (pmcc@astro.uiuc.edu)
Date: Mon, 17 Aug 1998 12:13:04 -0500 (CDT)
To: tass@wwa.com
Subject: lossy compression

On lossy compression, I can't help but make this comment from my Stardial
experience: it is a pleasure to store only the compressed files.
 Let the compressed files be THE ONLY existing record
of the data; that way, there's no double-guessing what the results might be
if I go back to the original uncompressed data...because they don't exist.
It's not as nice scientifically, but it allows you to move on with your life.

If you run a data reduction program on the uncompressed data, and then
save only the lossy-compressed data, then you can't recreate the data
reduction anymore verbatim.

Another comment: I think that you can lossy-compress, decompress, lossy-compress
as many times as you want with the Hcompress code without successive lossiness
on each recompression so long as you use the same "scale" each and every time.

Final comment: we are debugging a JAVA applet prototype that does the
H-decompress algorithm for Stardial's archive. We'll let you know when it's
released. I think it will be in the public domain, but I'm not sure because
I'm not the one writing it.

- Peter McCullough


[Editor's note: Subsequently,  Peter R. McCullough (pmcc@astro.uiuc.edu) wrote to me in Feb 1999:]


"I noticed you are preparing a technical note on compression.
 I can add two things:
  A) I found the same sort of thing (but with less thorough analysis)
     for Stardial images and selected 8-10x lossy Hcompression for
     convenience, market share, and quality. And because I trusted
     Rick White to be good at that sort of thing. [See Richmond's note
     in the next quote - Herb.]
  B) I recently read an article by C.N. Sabbey (Yale) on a competitor
     to Hcompress called 'encode.' For lossless compression, encode
     runs 10x faster than hcompress and encode does slightly
     better at compression factor too. And (this is important for
     long skinny images like from drift scans) encode works line-by-line
     whereas hcompress works on rectangles (ultimately as large as
     the entire image (I think)) - so encode can be embedded in the
     software that writes out the images:

              CCD --> memory --> 'encode' --> hard disk.

More on 'encode' can be found from C.N. Sabbey. I got his paper from
ADASS98, which was held here at UIUC. (You can email Sabbey at
sabbey@astro.yale.edu apparently). Or see PASP 110, 1067 (1998).
Available at ADS [Astrophysics Data System]:
http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1998PASP..110.1067S&db_key=AST&high=34e9d1214b18115

[Editor's note: Peter later provided this information. - Herb]

Title:  Data Acquisition for a 16 CCD Drift-Scan Survey
Authors:   Sabbey, C. N.; Coppi, P.; Oemler, A.
Journal:  The Publications of the Astronomical Society of the Pacific, Volume 110, Issue 751, pp. 1067-1080.
Publication Date: 09/1998

[Stardial information can be found at www.stardial.com ]


From: aah@nofs.navy.mil
To: tass@wwa.com
Subject: Re: lossy compression
Date: Mon, 17 Aug 98 14:21:19 -0700

  Peter McCullough wrote that Stardial uses compression, which makes
sense due to the Internet accessibility to his data.  I agree that:
 If you run a data reduction program on the uncompressed data, and then
 save only the lossy-compressed data, then you can't recreate the data
 reduction anymore verbatim.
  Like I mentioned, my original thought for the mark IV was to throw away
the CCD images entirely after extracting starlists.  I am in the process
of modifying my philosophy, especially if Nick/Chris/etc. show that there
is minimal scientific loss in using moderate compression, so that you can
afford to archive something that might have use later.  I see two real needs to
look at the old data:  (1) someone wants to see an image of some non-extracted
feature (galaxy, faint star, trailed object, etc.), and (2) a real extraction
error that necessitates a complete reprocessing experience.  Compression
should not hurt #1 very much, and if things are so bad that you have to
reprocess, then you can afford to lose a little from #2 (but not a lot, so
that is why you have to check how much effect compression has).
  I disagree with Peter regarding when you extract the scientific data.
He feels you compress and then extract, so that reprocessing will use
the same exact, archived, image.  I feel you want to do the best job you
can at the beginning and so should use the real data.
  The Java applet would be really neat if we are to 'serve the pixels' to the
world (an interesting idea, but one that would require 18GB of storage for
a single filter of the entire sky at 7.5arcsec resolution).
Otherwise, I'd rather stick with C-based hcompress subroutines
so that we have an absolute standard language that should last through
a few years of software upgrade.  Plus, the compression/uncompression
will most likely be a part of the reduction software and therefore should
be implemented in the same language.
Arne


From: aah@nofs.navy.mil
To: tass@wwa.com
Subject: Re: lossy compression
Date: Tue, 18 Aug 98 08:20:37 -0700

Peter Mount wrote, and gave some interesting uses of Java for displaying
the database.  Note that Michael has most of these features already
running on the Web interface.  However, Peter goes on to say:

 You wouldn't need the 18Gb of storage, and the user could have the image of
 interest in their local cdrom drive.

This I don't quite understand.  How do you know ahead of time what will be
the 'image of interest'?  For those without TASS cameras, either you would
have to have a complete collection (the 18GB I mentioned covers the sky once
in a single filter, uncompressed), or have someone load images for a few
specific fields onto a CD for you.  That is the problem with serving
pixels: if you serve any, you have to serve them all.  The DSS covers the
entire sky at 2 arcsec resolution (I think), and takes 100 CDs with
compression.  A TASS set would not be any improvement over this (brighter
limiting magnitude, bigger pixels), except that you could calibrate it
photometrically and perhaps include more than one filter.  I'm not
proposing that we do this -- I just wanted to point out that serving
images can turn out to be a complex and expensive proposition.  Peter
has it much easier by only scanning a single strip with a single camera,
and has different scientific goals.  At the same time, it would be kinda
fun to have the TASS sky on-line, and since the pixels are 16x larger
than the DSS, you could theoretically get the whole sky on 8 CDs now
or one DVD later.  How one then builds a GUI for access is TBD, and two
years from now when such a database would be available, there may be
yet another contender besides Java.
  An equally fun exercise is to think about how one could build a master
mark III image of the equatorial zone.  My first thoughts would be to make
an array, say 9degrees x 360 degrees, with 15x15arcsec 'pixels' (this would
be 2160 x 86400 x 16bits, plus an ancillary byte count array).  Then,
take each image, flatten it, undistort it using the astrometric reference
frame, adjust the DN level by using the Tycho reference stars, and add it to
the master array (16bits is enough as long as you only keep 'averaged'
pixels in the master array.  There would be some truncation, so you might
consider using floats for the array, and then write it as 16bits in the
end).  Since this image would be 370MB in size, you could easily put it
on a CD.  Easy in concept, perhaps hard in execution, but doable now.
Arne
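
The array dimensions and the 370MB figure quoted above follow from simple arithmetic, sketched here in Python. The function names are illustrative, and the ancillary count array mentioned in the message is left out of the total.

```python
def master_array_shape(dec_deg=9, ra_deg=360, pixel_arcsec=15):
    """Rows and columns of a rectangular grid with square 'pixels'."""
    rows = dec_deg * 3600 // pixel_arcsec
    cols = ra_deg * 3600 // pixel_arcsec
    return rows, cols

def array_bytes(rows, cols, bytes_per_pixel=2):
    """Storage for the pixel array alone (16 bits = 2 bytes per pixel)."""
    return rows * cols * bytes_per_pixel

rows, cols = master_array_shape()
print(rows, cols)                     # 2160 86400
print(array_bytes(rows, cols) / 1e6)  # 373.248 (megabytes)
```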

Arne summarizes some of the Mark IV compression issues


aah@nofs.navy.mil wrote to TASS, prior to 18 Aug 1998:

  Just to summarize:
    (1) we discussed compression for the mark III, and discarded it because
        the images were small enough that the current storage media (CDROM)
        could hold several night's worth of data.  Lossless compression would
        be ok, but adds the complexity of compression/uncompression and the
        need to keep a version of the uncompress algorithm available on the
        current computer/OS as long as the old archived disks survive.
        Factors of 2 are never enough for those headaches.
    (2) For the mark IV, with its 1.6GB/night data rate, archiving would
        require multiple CDs or some other (most likely non-portable)
        storage medium.  For this case, you can (a) bite the bullet and
        use lots of CDROMs; (b) extract pertinent data and throw away the
        CCD frames; (c) extract pertinent data and keep some form of the
        CCD frames that might permit reprocessing at a later date.
    (3) While my original thoughts were to choose option(b), Chris suggested
        data compression during another discussion, and that made option(c)
        become viable if the compression ratio were large enough to bring
        us back into the realm of the mark III data rate.
    (4) Lossless compression has been studied extensively, and gives ratios
        of 1.4-2.0 for typical CCD frames.  This is not sufficient for
        efficient use of CD-R media with the mark IV.
    (5) Lossy compression has been used in a limited number of astronomical
        projects, such as the Digital Sky Survey.  It appears to not
        compromise greatly photometry or astrometry if the compression ratio
        is around 10:1.  This is exactly the ratio we need to store 4-5nights
        of data on a single CDROM.
    (6) My suggestion, therefore, was to use a good lossy compression
        technique, such as hcompress, for archival AFTER the best current
        extraction had been made on the original images.  This suffers from
        the requirement of keeping hcompress running for several years, but
        gives you reprocessing capability, albeit at some loss in precision.
    (7) The real question in my mind has been what that precision loss is
        at various levels of compression and with real mark IV data.  I don't
        like blanket statements like '"moderate" (10:1 lossy) compression has
        little effect on astrometry or photometry.'  That may be the case for
        the POSS, where 0.3arcsec astrometry and 0.2mag photometry is ok.
        I want to see it tested on science-grade data, or for someone to
        find the appropriate published article that discusses this case.
    (8) Nick and Chris have started testing hcompress.  Good!  The compression
        ratios look about as mentioned above.  Now we need to check the
        astrometry and photometry.  As I said before, we can do that with
        typical science-grade CCD frames as a first cut, comparing astrometry
        and photometry at various compression levels.  By then, Tom may have
        some mark IV data to do a more definitive test.
    (9) Note that I keep talking about science-grade data.  Tom has made a
        number of improvements between the mark III and the mark IV, and the
        mark IV should be capable of 0.01mag photometry and 0.2arcsec
        astrometry.  You need to keep this in mind when looking at
        compression degradation.
  Arne
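
The storage argument in points (2), (4) and (5) above can be checked with a short calculation. The 650 MB disc capacity used below is an assumption (a common CD-R size at the time) and is not stated in the message; the function name is illustrative.

```python
def nights_per_disc(night_gb=1.6, compression_ratio=10.0, disc_mb=650.0):
    """How many nights of Mark IV data fit on one disc after compression."""
    compressed_mb = night_gb * 1000.0 / compression_ratio
    return disc_mb / compressed_mb

print(nights_per_disc(compression_ratio=10.0))  # lossy, about 10:1
print(nights_per_disc(compression_ratio=2.0))   # best-case lossless
print(nights_per_disc(compression_ratio=1.4))   # typical lossless
```

Under that assumption, 10:1 compression fits about four nights on a disc, while lossless ratios of 1.4 to 2.0 do not fit even a single night.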

From: ankovacs@netcom.ca
Date: Tue, 18 Aug 1998 11:57:06 -0400
To: tass@wwa.com
Subject: [Re: CD vs. Travan: compression]

I am a "would be player" in the TASS game who would like to raise my
concerns about using lossy data compression to store this project's
data. Below I will explain why I am concerned. In addition, I will also
give a brief explanation of why I think CD-ROMs rather than tapes would
make a better storage media.

   Rather than just walking into the debate with talk and no action, I
would like to have an opportunity to experiment with real data and
lossless compression. Can anyone suggest a source for any quantity of
real data?

   Lossy Compression

   Lossy compression and looking for variable stars does not sound to
me like a good match. All compression algorithms look for patterns in
data, however lossy compression introduces patterns even if they are
not really there. Lossy compression does not introduce roundoff error
but instead what would appear as random error. This is fine for
photographs but of great concern for scientific data in general and
looking for patterns in variable stars in particular.

   Tape vs CD-ROM

   There are several CD-ROM standards, but they are all just that,
standards. Different manufacturers' tape drives may use the same tapes,
but this does not mean data is stored in a standard format. 

   In my work, our product has evolved through three manufacturers of
tape drives. All have used travan tapes but we have never been able to
reliably read data from one manufacturer's backup on another brand of
tape.
    Best regards
    Andy K.

Date: Tue, 18 Aug 1998 18:02:52 +0000
From: Chris Albertson (chrisja@jps.net)
To: ankovacs@netcom.ca
CC: tass@wwa.com
Subject: Re: compression

ankovacs@netcom.ca wrote:

     Tape vs CD-ROM

     In my work, our product has evolved through three manufacturers of
  tape drives. All have used travan tapes but we have never been able to
  reliably read data from one manufacturer's backup on another brand of
  tape.

That is a problem with low-end PC tape drives (like Travan) but not with
the tapes typical of larger systems.  DAT, 8mm, DLT, nine track and the
"old" 1/4 inch tapes are all interchangeable.  I have never heard of a
problem like you describe with these tape media.  "tar" is a pretty
universal data format.  It is about 20 years old and likely to stay
around.
FITS is another standard for writing to tapes that is likely to be
around for a while.

For keeping pixel data around I think camera operators will use whatever
they have.  Likely tapes.  But for data exchange CD-ROMs are the way to
go.  It is just that CDs are not big enough to hold the pixel data
unless you compress at least 10:1.

--    --Chris Albertson

Mark III and fcompress

Date: Thu, 20 Aug 1998 00:55:01 +0200 (MET-DST)
From: Jure Skvarc (SKVARC@eros.ijs.si)
Subject: Fcompress and photometry
To: tass@wwa.com
Cc: jure.skvarc@ijs.si

Hello

I made some analysis of the influence fcompress has on photometry of
stars in Mark III images.  Following this introduction are the
details.

The same text is also in HTML-ized version on
http://kronos.ijs.si/~jure/fitsblink/fcompress/report.html
In short, you really can compress Mark III images without much loss of
information.  

[Editor's note: the link as of June 2000 is:

http://www-rcp.ijs.si/~jure/fitsblink/fcompress/report.html

Herb]

Regarding the dilemma whether raw images produced by TASS should be
preserved, I would vote that they should be.  In a short time, this
will make an impressive image library of the night sky.  I am sure
that in few years the capacity of data storage media or data
transmission speeds will present no problem.

				regards,
				   Jure Skvarc



Following are the boring details of the analysis:
--------------------------------------------------------------------

Analysis of an effect of the fcompress program on photometry of
g0483977.fts and g0493955.fts images.  
Jure Skvarc


I analyzed two of the images (g0483977.fts and g0493955.fts) that
Michael R. kindly uploaded to his ftp server. First I used fitsblink
to make star lists and match them to the GSC catalog.  The two images
happen to be partially overlapped so here is an opportunity to compare
the same stars on the two images.  I made a small awk program to join
the records from the two lists which corresponded to the same GSC
stars.  For beginning I made a graph which compares the measured
magnitudes with the magnitudes of the GSC stars
(http://kronos.ijs.si/~jure/fitsblink/fcompress/mag-gsc.gif).  The
transformation between the two magnitude values was

20 + 0.8 * (-2.5 log10(v)), 

where v is the instrumental star intensity and log10 is the base 10
logarithm.  At the moment I do not know why the transformation between
the intensities is not linear.
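
The transformation quoted above is easy to evaluate directly. In this Python sketch the function name and parameter names are illustrative; the constants 20 and 0.8 are the ones given in the text, and a slope of 1.0 would correspond to the usual -2.5 log10 magnitude relation.

```python
import math

def fitted_magnitude(v, zero_point=20.0, slope=0.8):
    """Catalog-scale magnitude from instrumental intensity v (v > 0)."""
    return zero_point + slope * (-2.5 * math.log10(v))

print(fitted_magnitude(1.0))     # 20.0
print(fitted_magnitude(100.0))   # 16.0
```

With the 0.8 slope, a factor of 100 in intensity spans 4 magnitudes rather than the usual 5, which is the departure noted in the text.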

Below are the numbers of detected catalog stars which are present in
both images for each magnitude.  Note that fitsblink did not detect
stars close to the left and right edges.

=====================
 mag.   no. of stars
---------------------
   6    2
   7    0
   8    3
   9    8
  10   20
  11   41
  12   70
  13   54
  14   11
  15    2
====================

In the next step I compared star magnitudes of the GSC stars in the
two images.  The graph showing this can be found at
http://kronos.ijs.si/~jure/fitsblink/fcompress/two-image.gif 

The average magnitude differences and standard deviations are shown in
the table below:

=========================
 mag. n   dif.    std.
-------------------------
  6   2  0.0802  0.0044
  8   3  0.0575  0.0090
  9   8 -0.0056  0.0100
 10  21  0.0481  0.0167
 11  41 -0.0112  0.0092
 12  71  0.0091  0.0398
 13  55 -0.0360  0.0648
 14  11 -0.0620  0.0931
 15   2  0.2554  0.0022
=========================

Here mag. means the magnitude, n is the number of stars in that
magnitude range, dif. is the average difference and std. is the standard
deviation.
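
A table of this kind can be produced from matched star pairs along the following lines. This is a sketch only, not the awk program mentioned above: binning by the integer part of the catalog magnitude and using the population standard deviation are both assumptions, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def binned_differences(pairs):
    """Summarize magnitude differences per catalog-magnitude bin.

    pairs: iterable of (catalog_mag, mag_a, mag_b) for matched stars.
    Returns {mag_bin: (n, mean_difference, std)}.
    """
    bins = defaultdict(list)
    for catalog_mag, mag_a, mag_b in pairs:
        bins[int(catalog_mag)].append(mag_a - mag_b)
    return {b: (len(d), mean(d), pstdev(d)) for b, d in sorted(bins.items())}

# Made-up example values, for illustration only.
example = [(12.3, 12.0, 11.9), (12.8, 12.5, 12.3), (9.1, 9.0, 9.0)]
for mag_bin, (n, dif, std) in binned_differences(example).items():
    print(mag_bin, n, round(dif, 4), round(std, 4))
```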


Now everybody wonders what has the whole story to do with data
compression.  This: I made a similar procedure as before, only that
this time I compared results of magnitude measurements for the same
image. 

In the first experiment, I used a value 60 for the -s (scale)
fcompress switch, which gave compression factors of 4.9 and 4.8 for
g0483977.fts and g0493955.fts, respectively.  The following tables
show average differences of star magnitudes in non-compressed and
compressed images for different magnitude values.  Only stars matched
with the GSC were taken into account.

g0483977.fts                  g0493955.fts
=========================     =========================
 mag. n   dif.    std.	       mag. n   dif.    std.   
-------------------------     -------------------------
  6   1 -0.0075  0.0000	        6   1 -0.0007  0.0000  
  7   2 -0.0013  0.0000	        7   1 -0.0023  0.0000  
  8   7 -0.0040  0.0000	        8   7  0.0065  0.0005  
  9  13 -0.0016  0.0002	        9  16 -0.0067  0.0000  
 10  40 -0.0080  0.0001	       10  44 -0.0079  0.0001  
 11  81 -0.0026  0.0010	       11  85 -0.0136  0.0008  
 12 138 -0.0102  0.0057	       12 168 -0.0191  0.0023  
 13 159 -0.0329  0.0120	       13 171 -0.0199  0.0046  
 14  31 -0.0216  0.0145	       14  76 -0.0229  0.0072  
 15   5 -0.0019  0.0046	       15   6 -0.0211  0.0094  
=========================     =========================


In the second experiment, I used fcompress -s 200, which gave
compression factors of 10.0 and 9.6. 


g0483977.fts                  g0493955.fts
=========================     =========================
 mag. n   dif.    std.	       mag. n   dif.    std.   
-------------------------     -------------------------
  6   2  0.0189  0.0004	        6   1 -0.0024  0.0000  
  7   1 -0.0047  0.0000	        7   1 -0.0053  0.0000  
  8   6 -0.0015  0.0000	        8   7  0.0284  0.0065  
  9  13 -0.0065  0.0001	        9  16 -0.0080  0.0002  
 10  40 -0.0101  0.0005	       10  44 -0.0138  0.0004  
 11  80 -0.0166  0.0052	       11  82 -0.0087  0.0017  
 12 125 -0.0301  0.0119	       12 160 -0.0192  0.0070  
 13 122 -0.0832  0.0353	       13 132 -0.0320  0.0237  
 14  21 -0.0905  0.0588	       14  56 -0.0632  0.0319  
 15   2 -0.0742  0.0003	       15   3  0.0330  0.0297  
=========================     =========================

We can see that image compression with fcompress lowers star
intensities and that a compression factor of 10 gives 2-3 times larger
magnitude scatter than a compression factor of 5.  However, even the
higher compression factor gives lower scatter than that found in the
comparison of two different images.

The fdecompress program has an option (-s) which enables image
smoothing.  Let's check (for compression factor of 10) what happens if
we use it:

g0483977.fts                  g0493955.fts
=========================     =========================
 mag. n   dif.    std.	       mag. n   dif.    std.   
-------------------------     -------------------------
  6   2 -0.1881  0.0001         6   2 -0.1346  0.0008
  7   1 -0.2159  0.0000	        7   1 -0.1734  0.0000
  8   6 -0.1620  0.0002	        8   7 -0.1492  0.0019
  9  13 -0.1596  0.0005	        9  15 -0.1646  0.0004
 10  40 -0.1423  0.0011	       10  44 -0.1500  0.0006
 11  79 -0.1173  0.0054	       11  84 -0.1161  0.0017
 12 124 -0.0939  0.0139	       12 159 -0.0751  0.0095
 13 122 -0.0903  0.0330	       13 134 -0.0560  0.0292
 14  20 -0.0472  0.0373        14  48 -0.0762  0.0353
 15   2 -0.0812  0.0001        15   4  0.1059  0.0165
=========================     =========================


It seems that no significant change in magnitude scatter is achieved,
but we can see that a few faint stars are missing from the statistics and,
more apparent, that star magnitudes decrease by 0.1 to 0.2 magnitudes,
which leads to the loss of some stars.


Conclusion: it seems to me that no significant loss in information
would happen if Mark 3 images were compressed by a factor of 5 or even
10 using lossy compression with the fcompress program.  Certainly,
this conclusion cannot be simply extrapolated and a similar analysis
should be performed for every specific telescope/CCD/observatory
combination.


Date: Thu, 20 Aug 1998 14:27:45 +0200 (MET-DST)
From: Jure Skvarc (SKVARC@eros.ijs.si)
Subject: Re: Fcompress and photometry
To: tass@wwa.com
Cc: jure.skvarc@ijs.si

I'd like to thank Arne and Herbert for their comments about my
fcompress analysis.  Here are some answers:

  ...
   Jure, how are your magnitudes calculated
 in fitsblink? Aperture?  Star is somewhat different, and it would be
  ...

When I started the development of star detection routines for
fitsblink, my primary interest was that it should resolve closely
lying stars and determine their magnitudes in a rather robust manner
but I didn't need a very high accuracy.  The images I needed to
analyze at that time had pixel size of some 12 arcseconds so we could
not talk about any profiles and consequently fitting was out of
question.  I also didn't like aperture photometry because closely
lying stars could influence each other too strongly.  Instead of this
I determine the background as a function of coordinate separately and
then detect areas which are above some threshold plus some tricks to
resolve conglomerations of closely lying stars.  The fcompress
analysis was actually the first time when I compared photometry data
more quantitatively because doing photometry was not my initial goal.

  ...
   I would have guessed the compressed
 images to give a fainter image (compress - uncompress = positive#) since
 the smoothing would trim off the profile peak.  Likewise, the compressed
 image should spread the image out a little more, and therefore you would
 have less light in the aperture, again making the star fainter.
 Your results show just the opposite.  I wonder why.

The explanation for this is probably that fcompress also compresses
the background and thus reduces the variation of the background (I
checked this).  Since the detection threshold depends on the variation
of the background, you may gain some contrast instead of losing it.
It would be interesting to see how other algorithms respond to
smoothing by compression.

   Michael -- are these two images flattened?  Not flatcomp'd though.

Quite some time ago I proposed establishment of some image library for
the purpose of algorithm testing.  I still think it would be nice to
have at least an order of 100 or so images available online.  Also,
the images should be processed (dark and flat) and without the blind
pixels and telemetry information in the image area.

   As a professional, I honestly don't see a huge benefit in making
 'an impressive image library of the night sky' down to 14th magnitude.
 The DSS does a much better job, and surveys like SDSS will have even
 ....

I agree that having a library of images which go to mag 14 is not all
that impressive.  It is still my opinion that the impressive part
comes by adding a time dimension by imaging the sky over and over
again.  This is how you plan to find all those variable stars, anyway.

Jure

Date: Fri, 21 Aug 1998 14:27:49 +0200 (MET-DST)
From: Jure Skvarc (SKVARC@eros.ijs.si)
Subject: Re: Fcompress and photometry
To: hjohnson@pluto.njcc.com

Herbert

 "How many decimal places" is what I meant. Did STAR report 8.1234 or 8.123
 or 8.12? Were the GSC magnitudes 7.12 or 7.1234?

I see.  Fitsblink, which was used to extract magnitudes, actually
reports intensities, which were then transformed into magnitude using
the formula presented in my report.  GSC has magnitudes stored in
Fortran F5.2 format, i.e. to two decimal places.

 ...
 and GSC magnitudes: it would be useful to say that "the error in
 compression is comparable to our errors in observing" for instance.

My impression on the basis of the results is, that compression errors
are smaller and probably negligible in comparison to observation
errors for the Mark III images.  For other images (with better optics)
it may be different.  Observation quality also depends strongly on
observation conditions which may change quite a lot during such long
exposures and may also change across the field of view.

				Jure

To: Jure Skvarc (SKVARC@eros.ijs.si)
Cc: SKVARC@eros.ijs.si
From: hjohnson@pluto.njcc.com (Herbert R Johnson)
Subject: Re: Fcompress and photometry
Date: Fri, 21 Aug 1998 17:07:18 -0400

On Fri, 21 Aug 1998 14:27:49 +0200 (MET-DST), Jure Skvarc wrote:
* Herbert
*
*  "How many decimal places" is what I meant. Did STAR report 8.1234 or 8.123
*  or 8.12? Were the GSC magnitudes 7.12 or 7.1234?
*
* I see.  Fitsblink, which was used to extract magnitudes, actually
* reports intensities, which were then transformed into magnitude using
* the formula presented in my report.  GSC has magnitudes stored in
* Fortran F5.2 format, i.e. to two decimal places.

OK, I suggest in your "final" report you note the GSC values are to
two decimal places, and the results of STAR are to four decimal
places. By the way, when you "binned" the stars by magnitudes, I presume
"mag 7" is anything from 7.00 to 7.99, and so on. Again, a nice thing to
note in a complete report. (I've studied statistics and scientific
reporting, and old data, so these points are not always obvious.)
You might also confirm that TASS Mark III raw data is only "good" to
12 bits, that is one part in 4096. Magnitudes are a log scale so
I'm not sure what "precision" that corresponds to.
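
One rough way to put a number on that last question, offered as a sketch rather than a statement about the Mark III hardware: since a magnitude difference is 2.5 times the base 10 logarithm of an intensity ratio, a change of one count at a given signal level can be converted directly. The function name is hypothetical, and the calculation ignores noise and the fact that a star's light is summed over several pixels.

```python
import math

def magnitude_step(counts, step=1):
    """Magnitude change caused by a change of `step` counts at signal level `counts`."""
    if counts <= 0:
        raise ValueError("counts must be positive")
    return 2.5 * math.log10((counts + step) / counts)
```

At a full-scale signal of 4096 counts, one count corresponds to roughly 0.0003 magnitude; at 100 counts the same single count is about 0.011 magnitude. The precision in magnitudes therefore depends strongly on how bright the star is.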

*  and GSC magnitudes: it would be useful to say that "the error in
*  compression is comparable to our errors in observing" for instance.
*
* My impression on the basis of the results is, that compression errors
* are smaller and probably negligible in comparison to observation
* errors for the Mark III images.  For other images (with better optics)
* it may be different.  Observation quality also depends strongly on
* observation conditions which may change quite a lot during such long
* exposures and may also change across the field of view.


Right, I would say (from participating in the email and my casual work
on some of the data) that observational errors predominate. Still,
it's nice to know the instrumental limits. Small apertures traditionally
"punch through" atmospheric turbulence that would reduce seeing for
larger apertures (say several to 10 inches or more).

Herb

ATFTools discussion

To: TASS
Subject: Compression
From: andrew.bennett@ns.sympatico.ca (Andrew Bennett)
Date: Wed, 26 Aug 1998 17:39:43 GMT

I have been playing around with compression using the FITS
images provided. I have not been able to use Star etc (is there
a way to read FITS images back into Star?) but used SExtractor
to analyse the images - some of the large standard deviations
may be associated with problems with SExtractor?

I think the results below confirm my personal biases:
archive the raw data if this is humanly possible! Any
useful degree of compression introduces uncertainties
that one would prefer to do without.

Lossless compression is, of course, fine.

Note the very large standard deviations as compared with
the probable errors. Not nice.

*************************************************************
       Image G0483977
       Compressed using ATFTools V1.0 H compression
       Processed with SExtractor V2.0.0 1998Mar30 using defaults
       SExtractor claimed to detect 624 and sextract 267 from the file.
       But there were 554 entries in .CAT:
       4 with M = 99 and 1 outside the image area, leaving 549.

          Factor    Size        %        N   Common   Extra   Missing
            Raw     1408    100.0      549
              1      723     51.3      549      549        0        0
             10      515     36.6      557      544       13        5
            100      221     15.7      544      527       17       22
            200      142     10.1      526      504       22       45
            999       21      1.5      468      400       68      149

         Factor       Probable error             Standard Deviation
                      DX       DY       DM       DX       DY       DM
             10    0.003    0.003    0.015    0.076    0.142    0.130
            100    0.021    0.023    0.028    0.088    0.132    0.136
            200    0.052    0.061    0.049    0.130    0.141    0.184
            999    0.310    0.332    0.188    0.868    1.022    0.538

       The standard deviations are dominated by the large errors.
       The distribution of errors is pretty wild.
       Errors are not confined to the weakest sources.

Andrew Bennett, Avondale Vineyard, Nova Scotia, Canada.
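
The gap between the probable errors and the standard deviations in the table above is what a few large outliers would produce. The sketch below illustrates this under an assumption: the message does not define "probable error", so the classical meaning is used here (the deviation that half of the errors fall within, about 0.67 standard deviations for a Gaussian distribution). The function names are hypothetical and this is not how SExtractor or ATFTools compute anything.

```python
from statistics import median, pstdev

def probable_error(errors):
    """Median absolute deviation from the median: half the errors lie within it."""
    centre = median(errors)
    return median(abs(e - centre) for e in errors)

def spread_summary(errors):
    """Return (probable_error, standard_deviation) for a list of errors."""
    return probable_error(errors), pstdev(errors)
```

For example, `spread_summary([-0.02, -0.01, 0.0, 0.01, 0.02, 1.5])` gives a probable error near 0.015 but a standard deviation above 0.5, because the single error of 1.5 dominates the standard deviation while barely moving the median.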

Subsequent discussion of ATFTools

A note in May 2000 from Andrew Bennett says he tried a program and method called "ATFTools V1.0 H compression". The note shows his results at various levels of compression. There is/was no link to ATFTools.

Andrew says "I deliberately didn't give a link because if I did it would be out of date by the time somebody wanted to use it! Go to the TASS software page and look for the Automated Telescope Facility. The link today is http://www.tass-survey.org/tass/software/software.html#atf but that may well be a broken link by now ...

I think this compression is exactly the same algorithm as several of the others."

Meanwhile, in a subsequent message from Shawn Dvorak:

"The ATF Tools set (from my alma mater, the U of Iowa) can be downloaded from ftp://ftp-astro.physics.uiowa.edu/pub/software/atftools/. I've played with this in the past but haven't made much use of it. If I remember correctly, it includes an hcompress routine in addition to a number of photometry and WCS routines. I don't believe that source code is included."

FITS standards and "differences" compression

To: mgutzwiller@lanvision.com, tass@wwa.com
From: hjohnson@pluto.njcc.com (Herbert R Johnson)
Subject: Re: Puzzle Solution and Compression
Date: Thu, 22 Oct 1998 19:48:30 -0400

On Wed, 21 Oct 1998 14:32:53 -0400, mgutzwiller@lanvision.com wrote:
* Tom et al.,
*
* A co-worker of mine actually got interested in compressing Mark III images
* using a scheme similar to what Herb suggested.  By dropping the lower 4 bits
* and using differences between pixel values he was able to get compression
* ratios of about 3.5 to 1.  Dropping the lower 4 bits made no difference to
* the Star program's results since the noise was on the order of 6 bits anyway
* in my light polluted skies.  A similar approach might be usable for the Mark
* IV.
*
* Note that using only the upper 12 bits would still give us a dynamic range
* of 4096 to 1.
*
* Mike G.

Differences is not a bad scheme, that is what GIF formats do. If the
difference is only a few bits, you just use the deltas. If the difference
is large, you represent it directly and start the differences again.
It's a classic algorithm. And, since most of our image is dark, most of
the differences are small. Is there a FITS format that supports this?

Herb Johnson
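
The scheme discussed in this exchange (drop the noisy low-order bits, then store differences, falling back to the full value when a difference is too large) can be sketched as follows. This is a minimal illustration and not the co-worker's program: the escape marker, the 8-bit delta range and all names are assumptions, and the output is a list of tokens rather than a packed byte stream.

```python
ESCAPE = -128  # assumed marker: the next token is a full pixel value

def drop_low_bits(pixels, bits=4):
    """Discard the lowest `bits` bits of each pixel (lossy step)."""
    return [p >> bits for p in pixels]

def delta_encode(values):
    """Emit small signed deltas; emit ESCAPE plus the raw value when the delta is too large."""
    tokens = []
    previous = 0
    for v in values:
        delta = v - previous
        if -127 <= delta <= 127:
            tokens.append(delta)
        else:
            tokens.extend((ESCAPE, v))
        previous = v
    return tokens

def delta_decode(tokens):
    """Invert delta_encode exactly."""
    values = []
    previous = 0
    stream = iter(tokens)
    for token in stream:
        previous = next(stream) if token == ESCAPE else previous + token
        values.append(previous)
    return values
```

Only `drop_low_bits` loses information; the delta step itself is reversible. On a mostly dark row nearly every token fits in one byte, which is where the saving would come from once the tokens are packed.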

From: aah@nofs.navy.mil
Date: Fri, 23 Oct 1998 08:22:05 -0700
To: tass@wwa.com
Subject: Re: Puzzle Solution and Compression

I agree with Tom -- the mark IV will have more dynamic range than the mark III, and
can be run in many different modes (rather than the single, 468sec exposure
mode of the mark III drift scan).  You cannot just throw the lower 4 bits away
for the mark IV.  Lossless compression of scientific CCD data is difficult beyond the
normal 1.5x that you get from Unix compress / hcompress / etc.

Herb:  there was considerable discussion when we were voting on the FITS standard
regarding the inclusion of compression, and no decision was made.  In other words,
you cannot compress internally a FITS image and have it meet any standard.  You
can take FITS images and zip/tar them together if you want.  The problem with
internal compression is how to make it archive-safe.  This would entail adding
the entire compression algorithm in the header.

Arne

Date: Fri, 23 Oct 1998 14:26:42 -0700
From: Chris Albertson (chris@topdog.pas1.logicon.com)
To: hjohnson@pluto.njcc.com, tass@wwa.com
Subject: Re: Puzzle Solution and Compression

Herbert R Johnson wrote:
       
  Differences is not a bad scheme, that is what GIF formats do. If the
  difference is only a few bits, you just use the deltas. If the difference
  is large, you represent it directly and start the differences again.
  It's a classic algorithm. And, since most of our image is dark, most of
  the differences are small. Is there a FITS format that supports this?

No, FITS is a plain n dimensional array.  In the FITS user guide
is a reference to a _proposed_ compression standard.(Warnock et al)
The proposal was in 1990 so likely it was dropped.  gzip and 
hcompress seem to be what is used.  (I just had to look this up
because I remembered reading about a FITS compression standard.)  

Didn't someone here on this list look into compression and its
effect on photometry?  I'm sure I saw it written up complete
with some plots.


Delta encoding works OK for images (After whacking off 
the noisy low order bits) but it does not take advantage of the
2 (or more) dimensional-ness of the data.  In an image, if the left
and right pixels tend to be about equal then so are the ones just
above and below.  I think there are two classes of 2D compression.
Those that apply some 2d transform to the image then compress in
transform space and Quad Trees.

I don't think this is a big deal.  Currently you can keep _all_ the
raw MkIV data at a pretty low cost on tape or CD. 
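
The point about one-dimensional deltas ignoring the rows above and below can be illustrated with a simple predictor. Note that this is neither of the two classes named in the message (transform coding or quad trees); it is a swapped-in, simpler technique chosen only to show the effect, and every name in it is hypothetical.

```python
def residuals_1d(image):
    """Row-wise delta: each pixel minus its left neighbour (0 at the row start)."""
    return [[row[x] - (row[x - 1] if x else 0) for x in range(len(row))]
            for row in image]

def residuals_2d(image):
    """Each pixel minus the prediction left + above - upper_left."""
    out = []
    for y, row in enumerate(image):
        res = []
        for x, pixel in enumerate(row):
            left = row[x - 1] if x else 0
            above = image[y - 1][x] if y else 0
            corner = image[y - 1][x - 1] if x and y else 0
            res.append(pixel - (left + above - corner))
        out.append(res)
    return out

def total_magnitude(residuals):
    """Sum of absolute residuals, a crude stand-in for coded size."""
    return sum(abs(r) for row in residuals for r in row)
```

On a smooth gradient the two-dimensional residuals are much smaller in total than the row-wise ones. On a noisy sky background the gain may be smaller or absent, since the prediction combines the noise of three pixels.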

To: Chris Albertson (chris@topdog.pas1.logicon.com) , tass@wwa.com
From: hjohnson@pluto.njcc.com (Herbert R Johnson)
Subject: Re: Compression
Date: Mon, 26 Oct 1998 13:43:17 -0500

On Fri, 23 Oct 1998 14:26:42 -0700, Chris Albertson chris@topdog.pas1.logicon.com  wrote:
*
* No, FITS is a plain n dimensional array.  In the FITS user guide
* is a reference to a _proposed_ compression standard.(Warnock et al)..

As Arne says, there is no FITS standard.

* Didn't someone here on this list look into compression and its
* effect on photometry?  I'm sure I saw it written up complete
* with some plots.

It was well discussed.

* Delta encoding works OK for images (After whacking off
* the noisy low order bits) but it does not take advantage of the
* 2 (or more) dimensional-ness of the data.

That is certainly true. Compression schemes that "match" the data to
be compressed do better. Schemes that produce "acceptable" data loss
do better still.

* I don't think this is a big deal.  Currently you can keep _all_ the
* raw MkIV data at a pretty low cost on tape or CD.

As previously discussed, a night's viewing on Mark III will fit on ONE CD-ROM:
but Mark IV's proposed night's data may not. There was no consensus on tape
in our discussion; the consensus was that CD-ROM was a very convenient
medium that did not suffer from incompatibilities as did tape. I don't
care to reopen that discussion as I have nothing to add to it: but
I did want to introduce the notion to (eventually) establish the
dynamic range (i.e. reliable number of bits) from the Mark IV.

Herb Johnson

Mark IV compression Discussion for year 2000

Recent astronomical papers on compression

On Wed, 17 May 2000 11:35:25 -1000 (HST), Jim Heasley 
(heasley@hoku.ifa.hawaii.edu) wrote:

Herb,

I don't know if you've seen it (presumably Mike has) but there are
several papers on data compression in the ASP volume Astronomical Data
Analysis & Software Systems VIII.  One of them is also discussed by
Sabbey, Coppi, and Oemler 1998, PASP, 1067 (which makes reference to
something on Mike's web site).  This latter paper also has code available
to play with at

		ftp://www.astro.yale.edu/pub/sabbey/encode.tar.gz

Jim Heasley, Institute for Astronomy University of Hawaii

[Editor's note: a subsequent message from Jim to me follows below. - Herb]

Certainly feel free to include my comments in your technote.  As it
happens, I'm sending this message from home and have the volume of the
ASP Conference Series here with me as I write. 

The volume is Astronomical Data Analysis Software and Systems VIII,
Astronomical Society of the Pacific Conference Series, Vol. 172,
Eds. Mehringer, Plante and Roberts.

The papers I thought to be of interest are:

White & Greenfield, A Scheme for Compressing Floating-Point Images, p
125. 

Sabbey, Adaptive, Lossless Compression of 2-D Astronomical Images, p
129.

It is this second paper that is using a scheme that is due to a fellow
named Rice, and in the PASP paper I mentioned in my earlier email, Sabbey
makes a reference to Mike Richmond's web page. The PASP paper has the
web links to the new method, and other oldies but goodies like hcompress
and fitspress.

A friend in the computer sciences business tells me that the bzip2
program in linux does very well for general compression.  I tried it on
a couple of images and it did a bit better than hcompress (10% or so)
for lossless compression, but I haven't done the sort of extensive
testing Sabbey did on the "standard" set of FITS images.  I guess I
could check this out as I did download those images from NOAO for this
purpose. 

Jim Heasley
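
A comparison of this kind between general-purpose lossless compressors can be sketched with Python's standard library, which provides bzip2 (`bz2`) and the deflate method used by gzip and ZIP (`zlib`). hcompress is not part of the standard library and is not covered here. The function name is hypothetical, and the sketch makes no claim about what ratios real TASS images would give.

```python
import bz2
import zlib

def lossless_sizes(raw):
    """Return sizes in bytes of the raw data and of its zlib and bzip2 compressed forms."""
    packed_zlib = zlib.compress(raw, 9)
    packed_bz2 = bz2.compress(raw, 9)
    # Lossless means the round trip restores every byte.
    assert zlib.decompress(packed_zlib) == raw
    assert bz2.decompress(packed_bz2) == raw
    return {"raw": len(raw), "zlib": len(packed_zlib), "bz2": len(packed_bz2)}
```

To try it on an image, read the file in binary mode and pass the bytes in, for example `lossless_sizes(open("image.fts", "rb").read())`, where the file name is a placeholder.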