This note was reorganized and updated in June 2000 to revise some Web links and to add
further discussion regarding Mark IV imaging, since Mark IV image data was available
by that time. Discussion from that period is at the end of this note.
I will likely just append new work to it rather than re-edit it again.
1) Herb Johnson et al find that ZIP will compress Mark III data without loss by 36-38%, and darks by 56-57%.
2) Jure Skvarc did a thorough analysis of fcompress and Mark III image data. He suggested that lossy compression by a factor of 5 or even 10 will not produce a great loss of data when the decompressed data is analyzed by TASS methods. TASS members respond to his results in the discussion.
3) Arne offers this analysis, which I quote in part here:
(1) compression for the mark III,[was] discarded because
the images were small enough that the current storage media (CDROM)
could hold several night's worth of data.....
(2) For the mark IV, with its 1.6GB/night data rate, archiving would
require multiple CDs or some other (most likely non-portable)
storage medium. For this case, you can (a) bite the bullet and
use lots of CDROMs; (b) extract pertinent data and throw away the
CCD frames; (c) extract pertinent data and keep some form of the
CCD frames that might permit reprocessing at a later date.
(3) While my original thoughts were to choose option(b), Chris suggested
data compression during another discussion, and that made option(c)
become viable if the compression ratio were large enough to bring
us back into the realm of the mark III data rate.
(4) Lossless compression ....is not sufficient for CD-R media
(5) Lossy compression ....appears to not compromise greatly photometry or
astrometry if the compression ratio is around 10:1. This is exactly
the ratio we need to store 4-5 nights of data on a single CDROM.
(6) therefore, use a good lossy compression technique,
(7) The real question in my mind has been what that precision loss is
at various levels of compression and with real mark IV data.
4) Compression is discussed at length, including testing. Andrew Bennett does some tests with compressed/uncompressed Mark III data and Sextractor as follows:
compression   kbyte            (stars)
Factor        Size      %      N    Common  Extra  Missing
Raw           1408   100.0    549
1              723    51.3    549     549      0       0
10             515    36.6    557     544     13       5
100            221    15.7    544     527     17      22
200            142    10.1    526     504     22      45
999             21     1.5    468     400     68     149
5) FITS does not support a compression scheme.
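Editor's sketch: the Common/Extra/Missing columns in item 4 come from comparing the star list of the raw image with the star list of each decompressed image. The following is only a minimal illustration of that bookkeeping, not Andrew Bennett's actual procedure. The function name cross_match, the matching radius of 1.5 pixels, and the sample positions are all assumptions made for the example.

```python
import math

def cross_match(reference, candidate, tol=1.5):
    """Count common, extra, and missing detections between two star lists.

    reference, candidate: lists of (x, y) pixel positions.
    tol: matching radius in pixels (an assumed value, not from the note).
    """
    unused = list(candidate)
    common = 0
    for rx, ry in reference:
        best, best_d = None, tol
        # Find the closest unused candidate within the matching radius.
        for i, (cx, cy) in enumerate(unused):
            d = math.hypot(cx - rx, cy - ry)
            if d <= best_d:
                best, best_d = i, d
        if best is not None:
            unused.pop(best)
            common += 1
    extra = len(unused)                  # detected only after compression
    missing = len(reference) - common    # lost after compression
    return common, extra, missing

# Made-up positions, purely for illustration.
raw = [(10.0, 10.0), (50.0, 20.0), (80.0, 75.0)]
decompressed = [(10.3, 9.8), (80.1, 75.2), (30.0, 30.0)]
print(cross_match(raw, decompressed))   # (2, 1, 1)
```

In the table, N is Common plus Extra, and Missing is the raw count (549) minus Common, which is the same relationship this function returns.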
To: tass@wwa.com
From: hjohnson@pluto.njcc.com (Herbert R Johnson)
Subject: Re: CD vs. Travan: compression
Date: Sun, 16 Aug 1998 12:17:18 -0400
On Sat, 15 Aug 1998 15:43:28 -0400 (EDT), Nick Beser (beser@aplcomm.jhuapl.edu) wrote:
*
* On Fri, 14 Aug 1998, Herbert R Johnson wrote:
*
* ZIP is a pretty good compression scheme: my casual reading of compression
* literature in years past is that it is hard to compete with for GENERAL
* compression. But compression schemes that "know" about the actual
* features of the data will beat it.
[snip]
*
* A few tests on real data (and media) will show if any of the above
* remarks hold. We certainly don't need yet another software project.
*
* Herb Johnson
* Herb,
*
* I have to put my two cents in here. I teach image compression and packet
* video in Hopkins Electrical Engineering graduate school. ZIP is designed
* to perform very efficient text compression. [details snipped]
*
* Hcompress is a lossy compression method which approximates the original
* data. The amount of compression (ratio of compression) will indicate the
* number of compression artifacts caused by the algorithm.
*
* Hcompress has some very nice features. First off, it is capable of
* reading 16 bit FITS files... [and has some] acceptance by the astronomy
* community.
*
* I am planning to run an experiment using the star program..
* ... At some point the ratio of compression will start to produce
* errors in the program. This might be a good indicator as to the limits of
* hcompress.
*
* Nick
All this may be of some interest to some, but is it necessary in general?
If so, what are the goals, what is the need, are reasonable tools already
available, and what are "acceptable" losses in data quality? I have
some thoughts on all this, some from prior TASS discussions
on archiving data or throwing out raw data.
First, is there any specific "goal" in this discussion of
compression, other than to save archival storage space? If not, I'd
suggest casually that *50% compression* is so often achieved on ordinary
data and programs that it seems to be a reasonable criterion. In this regard
it has not been established that a LOSSLESS compression scheme (like ZIP)
will not accomplish the (undefined) task: this saves us from the trouble of
deciding what are "acceptable data losses" or, worst, "errors" in the
output of STAR. I'm not sure we even have a standard for "good" output
for STAR or other star list producing programs!
Now for some real tests. Here are my results of using Zip 2.04g
(PKZIP, MS-DOS shareware version) on some old TASS datafiles from
Tom Droege's site: all Mark III files are 1437K bytes, percentages
are from PKZIP's report
Data files: G0483977.FTS 36% to 927K
G0493946.FTS 38% to 902K
G1493669.FTS 36% to 929K
Darks: D0500244.FTS 57% to 618K
D2500244.FTS 56% to 636K
The dark files have a smaller range of values so I would expect more
compression. PKZIP of course is near-universal, loses no data, and
even checksums to ensure ZIP files are not corrupted. (Other folks
who use compression schemes on their backups or on file systems or
compressed drives may want to report THEIR results.) So, it's clear
we can gain a third in archival storage with no sweat. A further
investigation of data representation in Mark III cameras may improve
this (or be comparable as a separate scheme). And, as suggested, other
non-lossy schemes may do better.
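Editor's sketch: anyone who wants to repeat this kind of lossless test without PKZIP could try something like the following. It uses Python's zlib module, which implements the same DEFLATE method that ZIP uses, so the percentages should be broadly comparable but will not be identical to PKZIP's report. The function names and the synthetic test data are assumptions for illustration; to test a real image, pass a path to file_percent_saved.

```python
import os
import zlib

def percent_saved(data: bytes, level: int = 9) -> float:
    """Percentage by which DEFLATE shrinks the given bytes (lossless)."""
    compressed = zlib.compress(data, level)
    return 100.0 * (1.0 - len(compressed) / len(data))

def file_percent_saved(path: str) -> float:
    """Same measurement for a file on disk, e.g. a FITS image."""
    with open(path, "rb") as f:
        return percent_saved(f.read())

# Two extreme synthetic cases: perfectly flat data and pure random noise.
flat = bytes(100_000)
noisy = os.urandom(100_000)
print("flat :", round(percent_saved(flat), 1), "% saved")
print("noisy:", round(percent_saved(noisy), 1), "% saved")
```

Flat data compresses almost completely and random bytes hardly at all; real frames fall in between, with darks closer to the flat case, which fits the larger savings reported for the dark files above.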
If a TASS camera owner needs much more reduction in data storage,
like factors of 10 or 100, then lossy schemes are the only choice.
WHY this is necessary is not clear except as a matter of convenience
for archiving. So far, no Mark III site is discarding observations (I
raised the issue some time ago); and the costs of storage are dropping
by factors of two to several about every six months. (If the issue
is a simple distribution of sample data: why would anyone with a serious
interest want raw but "lossy" data? If so give them .JPG files.)
As for saving archival "space", mass storage on or off-line is a few
hundred dollars. In my mind, any problem that can be solved with a purchase
at your favorite consumer electronics store is a solved problem.
(The issue of "best media" has been well-hashed here before.)
Therefore, I don't see much point to yet another software project or
a discussion of archiving or "acceptable" losses in data. Of course,
I'm not in charge; and if someone wants to offer a piece of software
that saves storage space without compromising data quality - and they
can "prove" what they mean by "acceptable data" to someone else -
then of course some TASS camera sites may use it. Research into the
quality of star-list producing programs, the "prime customer" of
TASS Mark III data, would be in my opinion more useful than considering
the impact of compression IN ADDITION TO all the other factors of
TASS observations.
My opinions of course. - Herb Johnson
Date: Sun, 16 Aug 1998 15:26:28 -0400
From: Stupendous Man (richmond@stupendous.rit.edu)
To: tass@wwa.com
Subject: new Tech Note 44, plus short comments on compression
On compression: one way to look at the issue is to ask "How much
loss-less compression is _possible_ for an image?" In typical astronomical
images, the great majority of pixels are empty sky. In that case,
one can compare
how many bits of each pixel are filled with random noise
to
how many bits there are in each pixel
The Mark IV images will be sky-limited, even for short exposures.
That means that the main contributor to the noise in each pixel is the
shot noise of photons striking the pixel -- that is, even though the
average number of photons might be 5000, one pixel might collect
5050 and another 4950, due simply to random fluctuations in the
number of photons entering the pixel each second.
Now, in the simplest case, if N sky photons strike a pixel during an
exposure, the random fluctuations will be roughly sqrt(N) in size.
If a CCD produced one "count" in an image for each photon striking it,
this would mean that pixels in the empty sky would vary around the
mean value by +/- sqrt(N).
Example: if N = 10,000 photons, then the image would show random
variations of +/- 100 counts in "blank sky" pixels. This would mean
that the bottom 7 bits of each 16-bit pixel would vary wildly
AND RANDOMLY from pixel to pixel. This, in turn, would imply that
any compression scheme would be stumped by these 7 bits out of every 16.
Therefore, the maximum loss-less compression we might expect for such
an image would be
1 - 7/16 = 56 percent
In real life, most CCDs do not produce one "count" per photon.
Instead, they are set via a parameter called "gain" to produce perhaps
one "count" per 5 photons detected. In fact, as shown in
Tech Notes 8 and 9, the Mark III systems use a "gain" of about 5.
In this case, one can still calculate the maximum loss-less compression
factor, but it takes an extra step.
Example: N = 10,000 photons per pixel in the blank sky.
random variations of +/- sqrt(N) photons = +/- 100 photons
converted to random variations of 100/5 = +/- 20 counts
this is about 4 bits per pixel
hence, maximum loss-less compression ratio is about
1 - 4/16 = 75 percent
Cameras located at dark sites will produce data which may be compressed
to a greater degree loss-lessly, since fewer sky photons will strike
their pixels during each exposure. Likewise, images taken when the
moon is up and the sky is bright, will compress less well.
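Editor's sketch: the two worked examples above can be written as a small function. This is only a restatement of the estimate in the text; the function name is made up, and rounding the number of noise bits to the nearest whole bit is an assumption chosen because it reproduces the "7 bits" and "about 4 bits" figures quoted above.

```python
import math

def max_lossless_saving(sky_photons, gain=1.0, bits_per_pixel=16):
    """Estimate the noise bits per pixel and the best-case lossless saving.

    sky_photons: photons collected per blank-sky pixel.
    gain: photons per count (1.0 means one count per photon).
    """
    noise_counts = math.sqrt(sky_photons) / gain      # shot noise in counts
    # Rounding to the nearest bit is an assumption that matches the text.
    noise_bits = max(0, round(math.log2(noise_counts)))
    return noise_bits, 1.0 - noise_bits / bits_per_pixel

print(max_lossless_saving(10_000))         # (7, 0.5625)
print(max_lossless_saving(10_000, 5.0))    # (4, 0.75)
```

Lowering sky_photons, as at a dark site, reduces the noise bits and raises the possible saving; a bright moonlit sky does the opposite.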
I agree with Nick that hcompress is the best way to go, even though
I've written some compression code myself. You can find a version
of Rice compression on my home page, designed specifically for
16-bit FITS images:
http://stupendous.rit.edu/richmond/rice/rice.html
But please do use hcompress -- it allows lossless or lossy compression,
and has much better support and documentation.
Michael Richmond
[Editor's notes: Sometime in 1998, Michael Richmond wrote a compression document
for the TASS Web page "software" section, including references to various
"FITS data compression" schemes: look at
http://stupendous.rit.edu/tass/software/software.html#compress
In year 2000 Richmond's document is still available via
this link. Some links in that document have been updated. One link is to the
Space Telescope Institute where hcompress was developed.]
[Editor's note: Some TASS members
(myself) had trouble accessing the Space Telescope Institute's links,
some have not. Chris Albertson offered to supply their info on hcompress
to those who have such access troubles as discussed below. As his note includes
a description of hcompress I will quote it for convenience here. Other TASS members
have found other sources for hcompress via Web searches. - Herb]
A note in May 2000 from Andrew Bennett says he tried a program
and method called "ATFTools V1.0 H compression". The note
shows his results at various levels of compression. There
is/was no link to ATFTools.
Andrew says "I deliberately didn't give a link because if I did it
would be out of date by the time somebody wanted to
use it! Go to the TASS software page and look for the Automated
Telescope Facility. The link today is
http://www.tass-survey.org/tass/software/software.html#atf
but that may well be a broken link by now ...
I think this compression is exactly the same algorithm as
several of the others."
Meanwhile, in a subsequent message from Shawn Dvorak:
"The ATF Tools set (from my alma mater, the U of Iowa) can be downloaded from
ftp://ftp-astro.physics.uiowa.edu/pub/software/atftools/. I've played with
this in the past but haven't made much use of it. If I remember correctly,
it includes an hcompress routine in addition to a number of photometry and
WCS routines. I don't believe that source code is included."
Hcompress and sources at Space Telescope Institute
Chris Albertson says in May 2000, regarding the Space Telescope Institute's Web site:
The .. link
ftp://ftp.stsci.edu/software/hcompress/
still works for me _if_ I use a "real" FTP client.
Netscape seems to have problems. As a test I just downloaded
the entire directory ftp://ftp.stsci.edu/software/hcompress/*
If anyone has trouble I can supply the files. Below is the
summary by the author. I downloaded it from the above URL as a test
two minutes ago:
This directory contains HCOMPRESS, the image compression package
written by Richard L. White for use at the Space Telescope Science
Institute (rlw@stsci.edu). Briefly, the method used is:
(1) a wavelet transform called the H-transform (a Haar transform
generalized to two dimensions), followed by
(2) quantization that discards noise in the image while retaining
the signal on all scales, followed by
(3) quadtree coding of the quantized coefficients.
The technique gives very good compression for astronomical images and
is fast, requiring about 4 seconds for compression or decompression of
a 512x512 image on a Sun SPARCstation 1. The calculations are carried
out using integer arithmetic and are entirely reversible....
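Editor's sketch: to give a feel for steps (1) and (2) in the summary above, here is a toy version that works on a single 2x2 block of pixels. It is not taken from the hcompress source and should not be read as its implementation: the function names, the choice of unnormalized sums and differences, and the simple divide-and-round quantizer are assumptions for illustration. Step (3), the quadtree coding, is not shown at all.

```python
def h_transform_2x2(a00, a01, a10, a11):
    """Haar-style sums and differences of one 2x2 block (integers only)."""
    h0 = a11 + a10 + a01 + a00      # block total (smooth part)
    hx = a11 + a10 - a01 - a00      # difference along one axis
    hy = a11 - a10 + a01 - a00      # difference along the other axis
    hc = a11 - a10 - a01 + a00      # diagonal difference
    return h0, hx, hy, hc

def inverse_2x2(h0, hx, hy, hc):
    """Undo h_transform_2x2; exact when the coefficients are unmodified."""
    a00 = (h0 - hx - hy + hc) // 4
    a01 = (h0 - hx + hy - hc) // 4
    a10 = (h0 + hx - hy - hc) // 4
    a11 = (h0 + hx + hy + hc) // 4
    return a00, a01, a10, a11

def quantize(coeffs, scale):
    """Divide by the scale and round; small (noise-like) values become 0."""
    return tuple(round(c / scale) for c in coeffs)

def dequantize(quantized, scale):
    return tuple(q * scale for q in quantized)

block = (100, 102, 98, 105)
coeffs = h_transform_2x2(*block)
print(inverse_2x2(*coeffs))                                  # lossless
print(inverse_2x2(*dequantize(quantize(coeffs, 8), 8)))      # lossy
```

With a scale of 1 the block is recovered exactly, which is the reversible case mentioned in the summary. With a larger scale the small difference coefficients are rounded away and the recovered pixels are only close to the originals.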
Peter McCullough discusses lossy compression
From: "Peter R. McCullough" (pmcc@astro.uiuc.edu)
Date: Mon, 17 Aug 1998 12:13:04 -0500 (CDT)
To: tass@wwa.com
Subject: lossy compression
On lossy compression, I can't help but make this comment from my Stardial
experience: it is a pleasure to store only the compressed files.
Let the compressed files be THE ONLY existing record
of the data; that way, there's no double-guessing what the results might be
if I go back to the original uncompressed data...because they don't exist.
It's not as nice scientifically, but it allows you to move on with your life.
If you run a data reduction program on the uncompressed data, and then
save only the lossy-compressed data, then you can't recreate the data
reduction anymore verbatim.
Another comment: I think that you can lossy-compress, decompress, lossy-compress
as many times as you want with the Hcompress code without successive lossiness
on each recompression so long as you use the same "scale" each and every time.
Final comment: we are debugging a JAVA applet prototype that does the
H-decompress algorithm for Stardial's archive. We'll let you know when it's
released. I think it will be in the public domain, but I'm not sure because
I'm not the one writing it.
- Peter McCullough
[Editor's note: Subsequently, Peter R. McCullough (pmcc@astro.uiuc.edu) wrote to me in Feb 1999:]
"I noticed you are preparing a technical note on compression.
I can add two things:
A) I found the same sort of thing (but with less thorough analysis)
for Stardial images and selected 8-10x lossy Hcompression for
convenience, market share, and quality. And because I trusted
Rick White to be good at that sort of thing. [See Richmond's note
in the next quote - Herb.]
B) I recently read an article by C.N. Sabbey (Yale) on a competitor
to Hcompress called 'encode.' For lossless compression, encode
runs 10x faster than hcompress and encode does slightly
better at compression factor too. And (this is important for
long skinny images like from drift scans) encode works line-by-line
whereas hcompress works on rectangles (ultimately as large as
the entire image (I think)) - so encode can be embedded in the
software that writes out the images:
CCD --> memory --> 'encode' --> hard disk.
More on 'encode' can be found from C.N. Sabbey. I got his paper from
ADASS98, which was held here at UIUC. (You can email Sabbey at
sabbey@astro.yale.edu apparently). Or see PASP 110, 1067 (1998).
Available at ADS [Astrophysics Data System]
http://adsabs.harvard.edu/cgi-bin/
Arne summarizes some of the Mark IV compression issues
aah@nofs.navy.mil wrote to TASS, prior to 18 Aug 1998 :
Just to summarize:
(1) we discussed compression for the mark III, and discarded it because
the images were small enough that the current storage media (CDROM)
could hold several night's worth of data. Lossless compression would
be ok, but adds the complexity of compression/uncompression and the
need to keep a version of the uncompress algorithm available on the
current computer/OS as long as the old archived disks survive.
Factors of 2 are never enough for those headaches.
(2) For the mark IV, with its 1.6GB/night data rate, archiving would
require multiple CDs or some other (most likely non-portable)
storage medium. For this case, you can (a) bite the bullet and
use lots of CDROMs; (b) extract pertinent data and throw away the
CCD frames; (c) extract pertinent data and keep some form of the
CCD frames that might permit reprocessing at a later date.
(3) While my original thoughts were to choose option(b), Chris suggested
data compression during another discussion, and that made option(c)
become viable if the compression ratio were large enough to bring
us back into the realm of the mark III data rate.
(4) Lossless compression has been studied extensively, and gives ratios
of 1.4-2.0 for typical CCD frames. This is not sufficient for
efficient use of CD-R media with the mark IV.
(5) Lossy compression has been used in a limited number of astronomical
projects, such as the Digital Sky Survey. It appears to not
compromise greatly photometry or astrometry if the compression ratio
is around 10:1. This is exactly the ratio we need to store 4-5 nights
of data on a single CDROM.
(6) My suggestion, therefore, was to use a good lossy compression
technique, such as hcompress, for archival AFTER the best current
extraction had been made on the original images. This suffers from
the requirement of keeping hcompress running for several years, but
gives you reprocessing capability, albeit at some loss in precision.
(7) The real question in my mind has been what that precision loss is
at various levels of compression and with real mark IV data. I don't
like blanket statements like '"moderate" (10:1 lossy) compression has
little effect on astrometry or photometry.' That may be the case for
the POSS, where 0.3arcsec astrometry and 0.2mag photometry is ok.
I want to see it tested on science-grade data, or for someone to
find the appropriate published article that discusses this case.
(8) Nick and Chris have started testing hcompress. Good! The compression
ratios look about as mentioned above. Now we need to check the
astrometry and photometry. As I said before, we can do that with
typical science-grade CCD frames as a first cut, comparing astrometry
and photometry at various compression levels. By then, Tom may have
some mark IV data to do a more definitive test.
(9) Note that I keep talking about science-grade data. Tom has made a
number of improvements between the mark III and the mark IV, and the
mark IV should be capable of 0.01mag photometry and 0.2arcsec
astrometry. You need to keep this in mind when looking at
compression degradation.
Arne
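Editor's sketch: the storage arithmetic behind points (2), (4) and (5) can be checked with a few lines. The 1.6GB/night rate and the compression ratios come from Arne's summary; the 650 MB disc capacity and the use of 1 GB = 1000 MB are assumptions for this example, as is the function name.

```python
def nights_per_disc(gb_per_night, ratio, disc_mb=650.0):
    """Nights of data that fit on one disc at a given compression ratio.

    disc_mb: assumed CD-R capacity in megabytes (not stated in the note).
    """
    mb_per_night = gb_per_night * 1000.0 / ratio
    return disc_mb / mb_per_night

for ratio in (1.0, 1.4, 2.0, 10.0):
    print(f"{ratio:5.1f}:1  ->  {nights_per_disc(1.6, ratio):.2f} nights per disc")
```

Under these assumptions the lossless ratios of 1.4 to 2.0 still leave less than one night per disc, while 10:1 gives about four nights, in line with the "4-5 nights" figure above.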
From: ankovacs@netcom.ca
Date: Tue, 18 Aug 1998 11:57:06 -0400
To: tass@wwa.com
Subject: [Re: CD vs. Travan: compression]
I am a "would be player" in the TASS game who would like to raise my
concerns about using lossy data compression to store this project's
data. Below I will explain why I am concerned. In addition, I will also
give a brief explanation of why I think CD-ROMs rather than tapes would
make a better storage medium.
Rather than just walking into the debate with talk and no action, I
would like to have an opportunity to experiment with real data and
lossless compression. Can anyone suggest a source for any quantity of
real data?
Lossy Compression
Lossy compression and looking for variable stars does not sound to
me like a good match. All compression algorithms look for patterns in
data, however lossy compression introduces patterns even if they are
not really there. Lossy compression does not introduce roundoff error
but instead what would appear as random error. This is fine for
photographs but of great concern for scientific data in general and
looking for patterns in variable stars in particular.
Tape vs CD-ROM
There are several CD-ROM standards, but they are all just that,
standards. Different manufacturers' tape drives may use the same tapes,
but this does not mean data is stored in a standard format.
In my work, our product has evolved through three manufacturers of
tape drives. All have used travan tapes but we have never been able to
reliably read data from one manufacturer's backup on another brand of
tape.
Best regards
Andy K.
Date: Tue, 18 Aug 1998 18:02:52 +0000
From: Chris Albertson (chrisja@jps.net)
To: ankovacs@netcom.ca
CC: tass@wwa.com
Subject: Re: compression
ankovacs@netcom.ca wrote:
Tape vs CD-ROM
In my work, our product has evolved through three manufacturers of
tape drives. All have used travan tapes but we have never been able to
reliably read data from one manufacturer's backup on another brand of
tape.
That is a problem with low-end PC tape drives (like Travan) but not with
the tapes typical of larger systems. DAT, 8mm, DLT, nine track and the
"old" 1/4 inch tapes are all interchangeable. I have never heard of a
problem like you describe with these tape media. "tar" is a pretty
universal data format. It is about 20 years old and likely to stay
around.
FITS is another standard for writing to tapes that is likely to be
around for a while.
For keeping pixel data around I think camera operators will use whatever
they have. Likely tapes. But for data exchange CD-ROMs are the way to
go. It is just that CDs are not big enough to hold the pixel data
unless you compress at least 10:1.
-- --Chris Albertson
Mark III and fcompress
Date: Thu, 20 Aug 1998 00:55:01 +0200 (MET-DST)
From: Jure Skvarc (SKVARC@eros.ijs.si)
Subject: Fcompress and photometry
To: tass@wwa.com
Cc: jure.skvarc@ijs.si
Hello
I made some analysis of the influence fcompress has on photometry of
stars in Mark III images. Following this introduction are the
details.
The same text is also in HTML-ized version on
http://kronos.ijs.si/~jure/fitsblink/fcompress/report.html
In short, you really can compress Mark III images without much loss of
information.
[Editor's note: the link as of June 2000 is:
http://www-rcp.ijs.si/~jure/fitsblink/fcompress/report.html
Herb]
Regarding the dilemma whether raw images produced by TASS should be
preserved, I would vote that they should be. In a short time, this
will make an impressive image library of the night sky. I am sure
that in few years the capacity of data storage media or data
transmission speeds will present no problem.
regards,
Jure Skvarc
Following are the boring details of the analysis:
--------------------------------------------------------------------
Analysis of an effect of the fcompress program on photometry of
g0483977.fts and g0493955.fts images.
Jure Skvarc
I analyzed two of the images (g0483977.fts and g0493955.fts) that
Michael R. kindly uploaded to his ftp server. First I used fitsblink
to make star lists and match them to the GSC catalog. The two images
happen to be partially overlapped so here is an opportunity to compare
the same stars on the two images. I made a small awk program to join
the records from the two lists which corresponded to the same GSC
stars. To begin, I made a graph which compares the measured
magnitudes with the magnitudes of the GSC stars
(http://kronos.ijs.si/~jure/fitsblink/fcompress/mag-gsc.gif). The
transformation between the two magnitude values was
20 + 0.8 * (-2.5 log10(v)),
where v is the instrumental star intensity and log10 is the base 10
logarithm. At the moment I do not know why the transformation between
the intensities is not linear.
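Editor's sketch: the transformation above written as a function, for anyone who wants to apply it to a list of fitsblink intensities. The function name is made up; the constants 20 and 0.8 are the ones quoted in the formula. Note that the usual magnitude relation is -2.5 log10(v) plus a zero point, so the factor 0.8 is the part that departs from it.

```python
import math

def fitted_magnitude(v):
    """Magnitude from instrumental intensity v, using the formula above."""
    return 20.0 + 0.8 * (-2.5 * math.log10(v))

for v in (1.0, 100.0, 10000.0):
    print(v, fitted_magnitude(v))
```

With this formula a factor of 100 in intensity corresponds to 4 magnitudes rather than the usual 5.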
Below are the numbers of detected catalog stars which are present in
both images for each magnitude. Note that fitsblink did not detect
stars close to the left and right edges.
=====================
mag.   no. of stars
---------------------
   6        2
   7        0
   8        3
   9        8
  10       20
  11       41
  12       70
  13       54
  14       11
  15        2
=====================
In the next step I compared star magnitudes of the GSC stars in the
two images. The graph showing this can be found at
http://kronos.ijs.si/~jure/fitsblink/fcompress/two-image.gif
The average magnitude differences and standard deviations are shown in
the table below:
=============================
mag.     n      dif.     std.
-----------------------------
   6     2    0.0802   0.0044
   8     3    0.0575   0.0090
   9     8   -0.0056   0.0100
  10    21    0.0481   0.0167
  11    41   -0.0112   0.0092
  12    71    0.0091   0.0398
  13    55   -0.0360   0.0648
  14    11   -0.0620   0.0931
  15     2    0.2554   0.0022
=============================
Here mag. means the magnitude, n is the number of stars in that
magnitude range, dif. is the average difference and std. is the standard
deviation.
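Editor's sketch: a table of this kind can be produced by binning matched stars on their catalog magnitude and taking the mean and standard deviation of the differences in each bin. This is not Jure's awk program. The function name, the binning by whole magnitude (so that "mag 10" means 10.00 to 10.99), the sign convention (first image minus second) and the use of the population standard deviation are all assumptions, and the three sample stars are invented.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

def bin_differences(stars):
    """Summarize magnitude differences per whole-magnitude bin.

    stars: iterable of (catalog_mag, mag_image1, mag_image2).
    Returns {bin: (n, mean difference, standard deviation)}.
    """
    bins = defaultdict(list)
    for catalog_mag, mag1, mag2 in stars:
        bins[math.floor(catalog_mag)].append(mag1 - mag2)
    return {m: (len(d), mean(d), pstdev(d)) for m, d in sorted(bins.items())}

# Invented measurements, purely for illustration.
sample = [(10.2, 10.30, 10.25), (10.7, 10.80, 10.70), (11.1, 11.00, 11.05)]
for mag, (n, dif, std) in bin_differences(sample).items():
    print(f"{mag:4d} {n:5d} {dif:9.4f} {std:8.4f}")
```

A bin with a single star gets a standard deviation of zero under this choice, so small-n rows should be read with care.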
Now everybody wonders what the whole story has to do with data
compression. This: I followed a similar procedure as before, only
this time I compared the magnitudes measured for the same image
before and after compression.
In the first experiment, I used a value of 60 for the -s (scale)
fcompress switch, which gave compression factors of 4.9 and 4.8 for
g0483977.fts and g0493955.fts, respectively. The following tables
show average differences of star magnitudes in non-compressed and
compressed images for different magnitude values. Only stars matched
with the GSC were taken into account.
g0483977.fts                     g0493955.fts
=============================    =============================
mag.     n      dif.     std.    mag.     n      dif.     std.
-----------------------------    -----------------------------
   6     1   -0.0075   0.0000       6     1   -0.0007   0.0000
   7     2   -0.0013   0.0000       7     1   -0.0023   0.0000
   8     7   -0.0040   0.0000       8     7    0.0065   0.0005
   9    13   -0.0016   0.0002       9    16   -0.0067   0.0000
  10    40   -0.0080   0.0001      10    44   -0.0079   0.0001
  11    81   -0.0026   0.0010      11    85   -0.0136   0.0008
  12   138   -0.0102   0.0057      12   168   -0.0191   0.0023
  13   159   -0.0329   0.0120      13   171   -0.0199   0.0046
  14    31   -0.0216   0.0145      14    76   -0.0229   0.0072
  15     5   -0.0019   0.0046      15     6   -0.0211   0.0094
=============================    =============================
In the second experiment, I used fcompress -s 200, which gave
compression factors of 10.0 and 9.6.
g0483977.fts                     g0493955.fts
=============================    =============================
mag.     n      dif.     std.    mag.     n      dif.     std.
-----------------------------    -----------------------------
   6     2    0.0189   0.0004       6     1   -0.0024   0.0000
   7     1   -0.0047   0.0000       7     1   -0.0053   0.0000
   8     6   -0.0015   0.0000       8     7    0.0284   0.0065
   9    13   -0.0065   0.0001       9    16   -0.0080   0.0002
  10    40   -0.0101   0.0005      10    44   -0.0138   0.0004
  11    80   -0.0166   0.0052      11    82   -0.0087   0.0017
  12   125   -0.0301   0.0119      12   160   -0.0192   0.0070
  13   122   -0.0832   0.0353      13   132   -0.0320   0.0237
  14    21   -0.0905   0.0588      14    56   -0.0632   0.0319
  15     2   -0.0742   0.0003      15     3    0.0330   0.0297
=============================    =============================
We can see that image compression with fcompress lowers star
intensities and that a compression factor of 10 gives 2-3 times larger
magnitude scatter than a compression factor of 5. However, even the higher
compression factor gives lower scatter than that found when comparing
two different images.
The fdecompress program has an option (-s) which enables image
smoothing. Let's check (for compression factor of 10) what happens if
we use it:
g0483977.fts                     g0493955.fts
=============================    =============================
mag.     n      dif.     std.    mag.     n      dif.     std.
-----------------------------    -----------------------------
   6     2   -0.1881   0.0001       6     2   -0.1346   0.0008
   7     1   -0.2159   0.0000       7     1   -0.1734   0.0000
   8     6   -0.1620   0.0002       8     7   -0.1492   0.0019
   9    13   -0.1596   0.0005       9    15   -0.1646   0.0004
  10    40   -0.1423   0.0011      10    44   -0.1500   0.0006
  11    79   -0.1173   0.0054      11    84   -0.1161   0.0017
  12   124   -0.0939   0.0139      12   159   -0.0751   0.0095
  13   122   -0.0903   0.0330      13   134   -0.0560   0.0292
  14    20   -0.0472   0.0373      14    48   -0.0762   0.0353
  15     2   -0.0812   0.0001      15     4    0.1059   0.0165
=============================    =============================
It seems that no significant change in magnitude scatter is achieved,
but we can see that a few faint stars are missing from the statistics and,
more apparent, that star magnitudes decrease by 0.1 to 0.2 magnitudes,
which leads to the loss of some stars.
Conclusion: it seems to me that no significant loss in information
would happen if Mark 3 images were compressed by a factor of 5 or even
10 using lossy compression with the fcompress program. Certainly,
this conclusion cannot be simply extrapolated and a similar analysis
should be performed for every specific telescope/CCD/observatory
combination.
Date: Thu, 20 Aug 1998 14:27:45 +0200 (MET-DST)
From: Jure Skvarc (SKVARC@eros.ijs.si)
Subject: Re: Fcompress and photometry
To: tass@wwa.com
Cc: jure.skvarc@ijs.si
I'd like to thank Arne and Herbert for their comments about my
fcompress analysis. Here are some answers:
...
Jure, how are your magnitudes calculated
in fitsblink? Aperture? Star is somewhat different, and it would be
...
When I started the development of star detection routines for
fitsblink, my primary interest was that it should resolve closely
lying stars and determine their magnitudes in a rather robust manner
but I didn't need a very high accuracy. The images I needed to
analyze at that time had pixel size of some 12 arcseconds so we could
not talk about any profiles and consequently fitting was out of
question. I also didn't like aperture photometry because closely
lying stars could influence each other too strongly. Instead of this
I determine the background as a function of coordinate separately and
then detect areas which are above some threshold plus some tricks to
resolve conglomerations of closely lying stars. The fcompress
analysis was actually the first time when I compared photometry data
more quantitatively because doing photometry was not my initial goal.
...
I would have guessed the compressed
images to give a fainter image (compress - uncompress = positive#) since
the smoothing would trim off the profile peak. Likewise, the compressed
image should spread the image out a little more, and therefore you would
have less light in the aperture, again making the star fainter.
Your results show just the opposite. I wonder why.
The explanation for this is probably that fcompress also compresses
the background and thus reduces the variation of the background (I
checked this). Since the detection threshold depends on the variation
of the background, you may gain some contrast instead of losing it.
It would be interesting to see how other algorithms respond to
smoothing by compression.
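Editor's sketch: the effect Jure describes, where a smoother background lowers the detection threshold, can be illustrated with synthetic numbers. This is not fitsblink's detection code and not what fcompress actually does to an image. The threshold rule (mean plus 3 standard deviations), the 3-point box smoothing used as a stand-in for compression, the sky level of 1000 counts with noise of 20 counts, and the function names are all assumptions.

```python
import random
from statistics import mean, pstdev

def detection_threshold(background, k=3.0):
    """Assumed rule: background level plus k times its scatter."""
    return mean(background) + k * pstdev(background)

def box_smooth(values, width=3):
    """Simple moving average, used here as a stand-in for lossy smoothing."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

random.seed(1)
sky = [random.gauss(1000.0, 20.0) for _ in range(2000)]   # blank-sky pixels

raw_thr = detection_threshold(sky)
smooth_thr = detection_threshold(box_smooth(sky))
print("threshold on raw background     :", round(raw_thr, 1))
print("threshold on smoothed background:", round(smooth_thr, 1))
```

The smoothed background has less scatter, so the threshold sits closer to the sky level. A detection method built on such a threshold could pick up slightly more of each star's light, which may be one way to get the unexpected sign discussed above.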
Michael -- are these two images flattened? Not flatcomp'd though.
Quite some time ago I proposed establishment of some image library for
the purpose of algorithm testing. I still think it would be nice to
have at least an order of 100 or so images available online. Also,
the images should be processed (dark and flat) and without the blind
pixels and telemetry information in the image area.
As a professional, I honestly don't see a huge benefit in making
'an impressive image library of the night sky' down to 14th magnitude.
The DSS does a much better job, and surveys like SDSS will have even
....
I agree that having a library of images which go to mag 14 is not all
that impressive. It is still my opinion that the impressive part
comes by adding a time dimension by imaging the sky over and over
again. This is how you plan to find all those variable stars, anyway.
Jure
Date: Fri, 21 Aug 1998 14:27:49 +0200 (MET-DST)
From: Jure Skvarc (SKVARC@eros.ijs.si)
Subject: Re: Fcompress and photometry
To: hjohnson@pluto.njcc.com
Herbert
"How many decimal places" is what I meant. Did STAR report 8.1234 or 8.123
or 8.12? Were the GSC magnitudes 7.12 or 7.1234?
I see. Fitsblink, which was used to extract magnitudes, actually
reports intensities, which were then transformed into magnitude using
the formula presented in my report. GSC has magnitudes stored in
Fortran F5.2 format, i.e. to two decimal places.
...
and GSC magnitudes: it would be useful to say that "the error in
compression is comparable to our errors in observing" for instance.
My impression on the basis of the results is, that compression errors
are smaller and probably negligible in comparison to observation
errors for the Mark III images. For other images (with better optics)
it may be different. Observation quality also depends strongly on
observation conditions which may change quite a lot during such long
exposures and may also change across the field of view.
Jure
To: Jure Skvarc (SKVARC@eros.ijs.si)
Cc: SKVARC@eros.ijs.si
From: hjohnson@pluto.njcc.com (Herbert R Johnson)
Subject: Re: Fcompress and photometry
Date: Fri, 21 Aug 1998 17:07:18 -0400
On Fri, 21 Aug 1998 14:27:49 +0200 (MET-DST), Jure Skvarc wrote:
* Herbert
*
* "How many decimal places" is what I meant. Did STAR report 8.1234 or 8.123
* or 8.12? Were the GSC magnitudes 7.12 or 7.1234?
*
* I see. Fitsblink, which was used to extract magnitudes, actually
* reports intensities, which were then transformed into magnitudes using
* the formula presented in my report. GSC has magnitudes stored in
* Fortran F5.2 format, i.e. to two decimal places.
OK, I suggest in your "final" report you note the GSC values are to
two decimal places, and the results of STAR are to four decimal
places. By the way, when you "binned" the stars by magnitudes, I presume
"mag 7" is anything from 7.00 to 7.99, and so on. Again, a nice thing to
note in a complete report. (I've studied statistics and scientific
reporting, and old data, so these points are not always obvious.)
You might also confirm that TASS Mark III raw data is only "good" to
12 bits, that is one part in 4096. Magnitudes are a log scale so
I'm not sure what "precision" that corresponds to.
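[Illustrative sketch, not part of the original message.] One way to turn "one part in 4096" into magnitudes is to ask how far the magnitude moves when the signal changes by a single count. This assumes the usual relation m = -2.5 log10(F) and looks only at the quantization step of a single measured signal. It ignores noise and the fact that a star's light is summed over many pixels. The function name is hypothetical.

```python
import math

def magnitude_step(signal_counts, step_counts=1.0):
    """Magnitude change when a signal of signal_counts changes by step_counts."""
    return 2.5 * math.log10(1.0 + step_counts / signal_counts)

for counts in (4095, 1000, 100, 10):
    print(f"{counts:5d} counts: one-count step = {magnitude_step(counts):.4f} mag")
```

Under these assumptions a one-count step is about 0.0003 mag for a signal near full scale (4095 counts) and about 0.01 mag for a signal of 100 counts, so the magnitude "precision" of a 12-bit value depends on how bright the source is.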
* and GSC magnitudes: it would be useful to say that "the error in
* compression is comparable to our errors in observing" for instance.
*
* My impression on the basis of the results is, that compression errors
* are smaller and probably negligible in comparison to observation
* errors for the Mark III images. For other images (with better optics)
* it may be different. Observation quality also depends strongly on
* observation conditions which may change quite a lot during such long
* exposures and may also change across the field of view.
Right, I would say (from participating in the email and my casual work
on some of the data) that observational errors predominate. Still,
it's nice to know the instrumental limits. Small apertures traditionally
"punch through" atmospheric turbulence that would reduce seeing for
larger apertures (say several to 10 inches or more).
Herb
ATFTools discussion
To: TASS
Subject: Compression
From: andrew.bennett@ns.sympatico.ca (Andrew Bennett)
Date: Wed, 26 Aug 1998 17:39:43 GMT
I have been playing around with compression using the FITS
images provided. I have not been able to use Star etc (is there
a way to read FITS images back into Star?) but used SExtractor
to analyse the images - some of the large standard deviations
may be associated with problems with SExtractor?
I think the results below confirm my personal biases:
archive the raw data if this is humanly possible! Any
useful degree of compression introduces uncertainties
that one would prefer to do without.
Lossless compression is, of course, fine.
Note the very large standard deviations as compared with
the probable errors. Not nice.
*************************************************************
Image G0483977
Compressed using ATFTools V1.0 H compression
Processed with SExtractor V2.0.0 1998Mar30 using defaults
SExtractor claimed to detect 624 and sextract 267 from the file.
But there were 554 entries in .CAT:
4 with M = 99 and 1 outside the image area, leaving 549.
Factor   Size      %     N   Common   Extra   Missing
Raw      1408  100.0   549
1         723   51.3   549      549       0         0
10        515   36.6   557      544      13         5
100       221   15.7   544      527      17        22
200       142   10.1   526      504      22        45
999        21    1.5   468      400      68       149
Factor      Probable error          Standard deviation
           DX      DY      DM      DX      DY      DM
10      0.003   0.003   0.015   0.076   0.142   0.130
100     0.021   0.023   0.028   0.088   0.132   0.136
200     0.052   0.061   0.049   0.130   0.141   0.184
999     0.310   0.332   0.188   0.868   1.022   0.538
The standard deviations are dominated by the large errors.
The distribution of errors is pretty wild.
Errors are not confined to the weakest sources.
Andrew Bennett, Avondale Vineyard, Nova Scotia, Canada.
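[Illustrative sketch, not part of the original message.] The gap between the two sets of columns above is what one expects when a few gross mismatches sit among many small errors. The message does not say how "probable error" was computed. The sketch below assumes the classical definition (the deviation that half of the errors fall within, i.e. the median absolute deviation), and the error values are made up for illustration.

```python
import statistics

def probable_error(errors):
    """Median absolute deviation from the median: half the errors are smaller."""
    centre = statistics.median(errors)
    return statistics.median([abs(e - centre) for e in errors])

# Hypothetical position errors in pixels: mostly tiny, plus two gross mismatches.
errors = [0.002, -0.003, 0.001, -0.002, 0.004, -0.001, 0.003, -0.004, 0.9, -1.1]

pe = probable_error(errors)
sd = statistics.pstdev(errors)
# For Gaussian errors the probable error is about 0.6745 of the standard deviation.
gaussian_sd_from_pe = pe / 0.6745

print(f"probable error:               {pe:.4f}")
print(f"standard deviation:           {sd:.4f}")
print(f"sd expected if Gaussian:      {gaussian_sd_from_pe:.4f}")
```

When the measured standard deviation is many times larger than the Gaussian value implied by the probable error, as here, the scatter is being set by a handful of outliers rather than by the typical source.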
Subsequent discussion of ATFTools
FITS standards and "differences" compression
To: mgutzwiller@lanvision.com, tass@wwa.com
From: hjohnson@pluto.njcc.com (Herbert R Johnson)
Subject: Re: Puzzle Solution and Compression
Date: Thu, 22 Oct 1998 19:48:30 -0400
On Wed, 21 Oct 1998 14:32:53 -0400, mgutzwiller@lanvision.com wrote:
* Tom et al.,
*
* A co-worker of mine actually got interested in compressing Mark III images
* using a scheme similar to what Herb suggested. By dropping the lower 4 bits
* and using differences between pixel values he was able to get compression
* ratios of about 3.5 to 1. Dropping the lower 4 bits made no difference to
* the Star program's results since the noise was on the order of 6 bits anyway
* in my light polluted skies. A similar approach might be usable for the Mark
* IV.
*
* Note that using only the upper 12 bits would still give us a dynamic range
* of 4096 to 1.
*
* Mike G.
Differences is not a bad scheme, that is what GIF formats do. If the
difference is only a few bits, you just use the deltas. If the difference
is large, you represent it directly and start the differences again.
It's a classic algorithm. And, since most of our image is dark, most of
the differences are small. Is there a FITS format that supports this?
Herb Johnson
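[Illustrative sketch, not part of the original messages.] The scheme described above (drop the low bits, send small differences as deltas, send large jumps directly and restart) might look like the following. The byte layout, the escape marker and the sample pixel values are assumptions made for this example. They are not the format Mike's co-worker used, and no particular compression ratio is implied.

```python
ESCAPE = -128  # hypothetical marker: the next two bytes hold an absolute value

def drop_low_bits(pixels, bits=4):
    """Lossy step: discard the lowest `bits` bits of each 16-bit pixel."""
    return [p >> bits for p in pixels]

def delta_encode(values):
    """Signed 1-byte delta when it fits, else ESCAPE plus a 2-byte absolute value."""
    out = bytearray()
    previous = 0
    for v in values:
        delta = v - previous
        if -127 <= delta <= 127:
            out += delta.to_bytes(1, "big", signed=True)
        else:
            out += ESCAPE.to_bytes(1, "big", signed=True)
            out += v.to_bytes(2, "big")
        previous = v
    return bytes(out)

def delta_decode(data):
    """Inverse of delta_encode; recovers the truncated values exactly."""
    values = []
    previous = 0
    i = 0
    while i < len(data):
        delta = int.from_bytes(data[i:i + 1], "big", signed=True)
        i += 1
        if delta == ESCAPE:
            previous = int.from_bytes(data[i:i + 2], "big")
            i += 2
        else:
            previous += delta
        values.append(previous)
    return values

# Hypothetical row: dark sky with one bright star in the middle.
row = [1003, 1010, 998, 1005, 42000, 30110, 1012, 1001]
truncated = drop_low_bits(row)
encoded = delta_encode(truncated)
print(len(row) * 2, "bytes raw ->", len(encoded), "bytes encoded")
```

Only the bit-dropping step loses information. The delta step is exactly reversible, so the decoded row matches the truncated row value for value.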
From: aah@nofs.navy.mil
Date: Fri, 23 Oct 1998 08:22:05 -0700
To: tass@wwa.com
Subject: Re: Puzzle Solution and Compression
I agree with Tom -- the mark IV will have more dynamic range than the mark III, and
can be run in many different modes (rather than the single, 468sec exposure
mode of the mark III drift scan). You cannot just throw the lower 4 bits away
for the mark IV. Lossless compression of scientific CCD data is difficult beyond the
normal 1.5x that you get from Unix compress / hcompress / etc.
Herb: there was considerable discussion when we were voting on the FITS standard
regarding the inclusion of compression, and no decision was made. In other words,
you cannot compress internally a FITS image and have it meet any standard. You
can take FITS images and zip/tar them together if you want. The problem with
internal compression is how to make it archive-safe. This would entail adding
the entire compression algorithm in the header.
Arne
Date: Fri, 23 Oct 1998 14:26:42 -0700
From: Chris Albertson (chris@topdog.pas1.logicon.com)
To: hjohnson@pluto.njcc.com, tass@wwa.com
Subject: Re: Puzzle Solution and Compression
Herbert R Johnson wrote:
Differences is not a bad scheme, that is what GIF formats do. If the
difference is only a few bits, you just use the deltas. If the difference
is large, you represent it directly and start the differences again.
It's a classic algorithm. And, since most of our image is dark, most of
the differences are small. Is there a FITS format that supports this?
No, FITS is a plain n dimensional array. In the FITS user guide
is a reference to a _proposed_ compression standard (Warnock et al).
The proposal was in 1990 so likely it was dropped. gzip and
hcompress seem to be what is used. (I just had to look this up
because I remembered reading about a FITS compression standard.)
Didn't someone here on this list look into compression and its
effect on photometry? I'm sure I saw it written up complete
with some plots.
Delta encoding works OK for images (After whacking off
the noisy low order bits) but it does not take advantage of the
2 (or more) dimensional-ness of the data. In an image, if the left
and right pixels tend to be about equal then so are the ones just
above and below. I think there are two classes of 2D compression.
Those that apply some 2d transform to the image then compress in
transform space and Quad Trees.
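[Illustrative sketch, not part of the original message.] The point about two-dimensional structure can be seen with a very small example. It compares plain left-neighbour differences with a simple planar predictor (left + up - upper-left). That predictor is neither of the two classes named above (transform coding or quad trees). It is used here only as the smallest way to show what the second dimension buys. The test image is a made-up smooth gradient.

```python
def horizontal_residuals(image):
    """1-D delta: each pixel minus its left neighbour (first column kept as is)."""
    return [[row[x] - (row[x - 1] if x > 0 else 0) for x in range(len(row))]
            for row in image]

def planar_residuals(image):
    """2-D prediction: left + up - upper-left; missing neighbours count as 0."""
    def px(y, x):
        return image[y][x] if y >= 0 and x >= 0 else 0
    return [[image[y][x] - (px(y, x - 1) + px(y - 1, x) - px(y - 1, x - 1))
             for x in range(len(image[y]))]
            for y in range(len(image))]

def total_magnitude(residuals):
    """Sum of absolute residuals: a rough proxy for how much is left to encode."""
    return sum(abs(r) for row in residuals for r in row)

# Hypothetical smooth sky gradient, 8 x 8 pixels.
image = [[100 + 3 * y + 2 * x for x in range(8)] for y in range(8)]

print("1-D delta residual total:", total_magnitude(horizontal_residuals(image)))
print("2-D planar residual total:", total_magnitude(planar_residuals(image)))
```

On this gradient the 1-D scheme has to restate the sky level at the start of every row, while the 2-D predictor leaves residuals only along the top row and left column. Real frames with noise and stars will show a smaller difference.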
I don't think this is a big deal. Currently you can keep _all_ the
raw MkIV data at a pretty low cost on tape or CD.
To: Chris Albertson (chris@topdog.pas1.logicon.com) , tass@wwa.com
From: hjohnson@pluto.njcc.com (Herbert R Johnson)
Subject: Re: Compression
Date: Mon, 26 Oct 1998 13:43:17 -0500
On Fri, 23 Oct 1998 14:26:42 -0700, Chris Albertson chris@topdog.pas1.logicon.com wrote:
*
* No, FITS is a plain n dimensional array. In the FITS user guide
* is a reference to a _proposed_ compression standard.(Warnock et al)..
As Arne says, there is no FITS standard.
* Didn't someone here on this list look into compression and its
* effect on photometry? I'm sure I saw it written up complete
* with some plots.
It was well discussed.
* Delta encoding works OK for images (After whacking off
* the noisy low order bits) but it does not take advantage of the
* 2 (or more) dimensional-ness of the data.
That is certainly true. Compression schemes that "match" the data to
be compressed do better. Schemes that produce "acceptable" data loss
do better still.
* I don't think this is a big deal. Currently you can keep _all_ the
* raw MkIV data at a pretty low cost on tape or CD.
As previously discussed, a night's viewing on Mark III will fit on ONE CD-ROM:
but Mark IV's proposed night's data may not. There was no consensus on tape
in our discussion; the consensus was that CD-ROM was a very convenient
medium that did not suffer from incompatibilities as did tape. I don't
care to reopen that discussion as I have nothing to add to it: but
I did want to introduce the notion to (eventually) establish the
dynamic range (i.e. reliable number of bits) from the Mark IV.
Herb Johnson
Mark IV compression Discussion for year 2000
recent astronomical papers on compression
On Wed, 17 May 2000 11:35:25 -1000 (HST), Jim Heasley
(heasley@hoku.ifa.hawaii.edu) wrote:
Herb,
I don't know if you've seen it (presumably Mike has) but there are
several papers on data compression in the ASP volume Astronomical Data
Analysis & Software Systems VIII. One of them is also discussed by
Sabbey, Coppi, and Oemler 1998, PASP, 1067 (which makes reference to
something on Mike's web site). This latter paper also has code available
to play with at
ftp://www.astro.yale.edu/pub/sabbey/encode.tar.gz
Jim Heasley, Institute for Astronomy University of Hawaii
[Editor's note: a subsequent message from Jim to me follows below. - Herb]
Certainly feel free to include my comments in your technote. As it
happens, I'm sending this message from home and have the volume of the
ASP Conference Series here with me as I write.
The volume is Astronomical Data Analysis Software and Systems VIII,
Astronomical Society of the Pacific Conference Series, Vol. 172,
Eds. Mehringer, Plante and Roberts.
The papers I thought to be of interest are:
White & Greenfield, A Scheme for Compressing Floating-Point Images, p 125.
Sabbey, Adaptive, Lossless Compression of 2-D Astronomical Images, p 129.
It is this second paper that is using a scheme that is due to a fellow
named Rice, and in the PASP paper I mentioned in my earlier email, Sabbey
makes a reference to Mike Richmond's web page. The PASP paper has the
web links to the new method, and other oldies but goodies like hcompress
and fitspress.
A friend in the computer sciences business tells me that the bzip2
program in Linux does very well for general compression. I tried it on
a couple of images and it did a bit better than hcompress (10% or so)
for lossless compression, but I haven't done the sort of extensive
testing Sabbey did on the "standard" set of FITS images. I guess I
could check this out as I did download those images from NOAO for this
purpose.
Jim Heasley
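[Illustrative sketch, not part of the original message.] A quick comparison of the kind described above can be run with the general-purpose compressors in the Python standard library. hcompress is not included, since it is a separate program. The synthetic frame (flat sky plus Gaussian noise) is an assumption standing in for real FITS pixel data, so the sizes printed say nothing about TASS images.

```python
import bz2
import random
import struct
import zlib

random.seed(0)
# Hypothetical 64 x 64 frame of 16-bit pixels: flat sky plus Gaussian noise.
pixels = [max(0, min(65535, int(random.gauss(1000, 15)))) for _ in range(64 * 64)]
raw = struct.pack(f">{len(pixels)}H", *pixels)  # big-endian, as FITS stores integers

results = {
    "zlib": zlib.compress(raw, 9),
    "bz2": bz2.compress(raw, 9),
}
for name, packed in results.items():
    print(f"{name}: {len(raw)} -> {len(packed)} bytes")

# Both methods are lossless: decompression returns the original bytes.
restored = {
    "zlib": zlib.decompress(results["zlib"]),
    "bz2": bz2.decompress(results["bz2"]),
}
```

To repeat the test on real data, replace the synthetic bytes with the pixel array read from an actual image and compare the sizes over a set of frames rather than one or two.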