[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: new tech note
Hey Chris,
Thanks for taking the time to review
and comment. One thing I wanted to do was
stay above implementation details. I think what I'll do is add a section for
implementation suggestions. I had envisioned linking each table with foreign keys,
and having cascading rules on so you could delete a night and have all the
associated data removed also. I like to do it the right (my? :-)) way before I
implement for speed.
>
> OK, some comments. I went through this procedure, asking for
> comments, for
the Mk III database. Got lots of good comments.
> I'll pass them
on plus some things I learned. I used Postgresql
> also. Seems
everyone here like it.
>
> 1) The orientation
of the chip "belongs" to an __image__ not a
> telescope. Telescops
are likely to be modified, fixed, moved
While I agree, I don't believe that
the data would change for a night, so I'd be more
likely to keep it in a separate grouping. This seems to be more of an implementation
detail though, as long as the information is stored. Personally, I tend to normalize
the data.
> or whatever. Also IMO the best way to descibe the orientation
> is to copy the
data direct from the FITS header.
I just look at 1 picture, hira2246788.fits,
and could not find chip orientation data, but I
may not know what I'm looking for :-) What info is this, or are we not currently
putting this in the header?
> The other option
is to assign each telescope a new serial number
> whenever some
part is swapped out, You'll never keep track.
Not a serial number, but a new entry
into one table (telescope), linked to the next
(detector).
> Best to just
remember the FITS keywords from the image.
> The pipeline
__should__ (must?) pass the image header through
> so that it shows
up in the starlist header.
How much though. For instance,
do we keep the ra/dec in the header, even when
it's a suggestion of where the telescope is thought to be pointed? In general though,
I agree that the fits header info should be in the dB in some form or another.
>
> 2) Again, where
you say "for each detector". It will be easyer to
> keep this data
for each image. (quick calculation: 10 year
Why to you say easier? And,
that normalization thing pops up again. I'd rather keep
300 copies of the information (if the detector info changed nightly), rather than 6
million copies. That will save lots of space (maybe at to large of a computational
penalty for the extra joins though)..
> project lifetime,
20 cameras, 300 nights per year, 100 images
> per night = 6
million images = no big deal. 6M row tables are
> easy and we will
not be at 6M for ten years.
>
> 3) Again, in
place of "for each night" use "for each frame". I'd
> keep the information
on processing along with the dataset.
> There is always
the option of deleting data, re-proceesing it and
> adding it back
in. Say we find a bug in version 2.3 or the
> data reduction
software. We will want to delete all data proceesed
> by the software.
I'd go the normalizing route again.
>
> So, given my
comments above. I'd have a table called "FRAMES"
> that lists eveything
we know about a frame. Which telescope
> was used, what
software was used, clouds cover, the filter
> the image scale
and so on. Also each frame is given an ID.
>
> Now, where you
say "for each star", I think you want to say
> "for each
observation of an object" because we will likely
I need to spell this out a little
better in the TN. While I wasn't showing a schema, I
was implicitly linking each "For each" to the previous one, so each "Star" would
be
linked to one "Image file", linked to one "night", etc...
> see the same
star many times. OK, next I think you need a
> "FRAME ID"
to be stored along with each row. Now you can trace
> every observation
back to the camera and FITS image file.
Implementation section - soon to come.
>
> Not done yet.
>
> Next we need
a table called "stars" or "catalog" or "index".
> This has one
row for ever object TASS has ever seen. (That
> is, one row for
each physical star.) We assign a arbitrary
> ID to each start
(use 1, 2, 3, or a name made up from the
> location. But
I like 1, 2, 3 because the exact location will
> change as we
average more observations or as we "split"
> dobles.) Call
it "TASS Catalog ID".
Yup, I gottcha there.
>
> Going back to
the table called "for each observation of an
> object" We need to assign a "TASS Catalog ID" to each
> observation. Now the "catalog" will contain summary statistics
> for each star
while the "observations" table contains the
> data points that
make up a light curve.
Yup, implementation, or maybe a derived
data section.
>
> ++++++
>
> The algorithm
I used to process a star list first added one row
> to the "frames"
table, just copy the file header. next for
> each row in the
starlist I added it to the big "observation"
> table. Next
I had to asign a "TASS Catalog ID". So I looked
> in the catalog
for a close match. If none was found I figured
> this was the
first time this star was seen an added a new
> catalog entry. If I found a match I'd recompute the summary
> statistics for
that star.
Similar to what I'm doing now. I
update the star_id in the observation table, and the
trigger on that table will update the statistics for that linked star.
>
> The above was
not easy to implement and I still don't like the
> way I did it. A run could take many hours. I would make a
> frame a "transaction"
in DBMS terms.
Hmm... The imports I do have
are transactioned. When I process a night (or many
nights), I find every observation which isn't matched to a star. Ah heck. I've attached
the perl/sql script I currently use. Amazingly, I actually commented this one as I went
along. Hopefully, it's readable enough to get the general idea. I transaction each set
of observations which update or create a star. Ignore the CVS comments in the
header, they came from a bad commit and I have not removed them yet...
>
> I also had a
set of "standard" catalogs. Get as many as you
> want. Tycho,
variable star catalogs and so on. The more the
> better. Now
for each catalog you make a "Cross ID Table"
> that has two
colums listing matches between the Tycho ID and
> the TASS Catalog
ID.
Not doing this one (yet?).
>
> With the above
you can see that with one SQL query I could
> quicky make a
light curve for any object I could name or even
> querry for stars
with a summary statistic within some range.
> like "SELECT
... where ColorIndex between 2.0 and 3.0;"
Again, thanks for the input. New
revision coming sometime... Tom also indicated
(and I agree) that a lot of data from his ADC system should be added, although I'm
thinking that this would be in a completely separate and unlinked (except for site,
which I've forgotten to add) as the reading of these values can (and should?) take
place independently of actually using the CCD's.
I've more work to do, that's for sure.
Thanks,
Rob
#!/usr/bin/perl
#
# $Name: $
#
# $Log: mergeStars.pl,v $
# Revision 1.2 2001/12/11 06:01:09 robc
# Add CVS header data and log from last commit
#
# Add compile time ability to do mean, median or inter quartile mean
# Add setting of bscale/bzero for SHORT_IMG so values are always positive
# 'Fix' asci column to accept all file names used in createion of flat
# Remove file before writing
# General clean up
#
# Revision 1.1 2001/11/23 16:11:29 robc
# First version
#
use warnings;
use strict;
use IO;
use DBI;
use constant ID => q$Id: mergeStars.pl,v 1.2 2001/12/11 06:01:09 robc Exp $;
use constant MIN_STARS => 11;
use constant VACUUM => 10000;
my $min_smag = .1;
my $radius = 0.004;
STDOUT->autoflush;
my $version = join ( ' ', ( split ( ' ', ID ) )[ 1 .. 3 ] );
$version =~ s/,v\b//;
$version =~ s/(\S+)$/($1)/;
my $file = "/tass/src/tassiv_reduce/TASSIV.data";
unless ( my $return = do $file )
{
die "Couldn't parse $file: $@" if $@;
die "Couldn't do $file: $!" unless defined $return;
die "Couldn't run $file: $!" unless $return;
}
my $dbh =
DBI->connect( "DBI:Pg:dbname=tassiv", "", "",
{ RaiseError => 1, AutoCommit => 0 } );
sub done;
$SIG{ INT } = \&done;
my $vacuum = <<__VACUUM__;
VACUUM ANALYZE comp_loc;
VACUUM ANALYZE comp_u;
VACUUM ANALYZE comp_b;
VACUUM ANALYZE comp_v;
VACUUM ANALYZE comp_r;
VACUUM ANALYZE comp_i;
VACUUM ANALYZE observations;
__VACUUM__
#
# This one looks for stars in the observations which haven't matched yet but
# are within the radius of 'this' star
my $sql = <<__END_FIND_MATCHING_STARS__;
SELECT ra, decl, obs_id
FROM observations
WHERE circle_contain_pt(
circle( point( ?, ? ), $radius ),
point( ra, decl ) )
AND ra BETWEEN (? - $radius) AND (? + $radius)
AND decl BETWEEN (? - $radius) AND (? + $radius)
AND star_id ISNULL
AND smag < $min_smag
__END_FIND_MATCHING_STARS__
my $sth_find_matching_stars = $dbh->prepare( $sql );
#
# This one looks into the frozen stars to make sure the 'current' star doesn't
# match an existing one
$sql = <<__END_FIND_EXISTING_STARS__;
SELECT star_id, ra, decl
FROM frozen_loc
WHERE circle_contain_pt(
circle( point( ra, decl ), $radius ),
point( ?, ? ) )
AND ra BETWEEN (? - $radius) AND (? + $radius)
AND decl BETWEEN (? - $radius) AND (? + $radius)
__END_FIND_EXISTING_STARS__
my $sth_find_existing_stars = $dbh->prepare( $sql );
#
# Set up some inserts now...
my $sth_insert_comp_loc =
$dbh->prepare( "INSERT INTO comp_loc( ra, decl ) VALUES( ?, ? )" );
my $sth_insert_comp_u =
$dbh->prepare( "INSERT INTO comp_u( star_id ) VALUES( ? )" );
my $sth_insert_comp_b =
$dbh->prepare( "INSERT INTO comp_b( star_id ) VALUES( ? )" );
my $sth_insert_comp_v =
$dbh->prepare( "INSERT INTO comp_v( star_id ) VALUES( ? )" );
my $sth_insert_comp_r =
$dbh->prepare( "INSERT INTO comp_r( star_id ) VALUES( ? )" );
my $sth_insert_comp_i =
$dbh->prepare( "INSERT INTO comp_i( star_id ) VALUES( ? )" );
#
# and a select...
my $sth_star_id_select =
$dbh->prepare( "SELECT currval('comp_loc_star_id_seq')" );
#
# and another select...
my $sth_ra_decl_select =
$dbh->prepare( "SELECT star_id, ra, decl, count FROM comp_loc" .
" WHERE star_id = ?" );
#
# and another insert...
my $sth_frozen_loc_insert =
$dbh->prepare( "INSERT INTO frozen_loc( star_id, ra, decl, count ) " .
"VALUES( ?, ?, ?, ? )" );
#
# and an update
my $sth_observation_update =
$dbh->prepare( "UPDATE observations SET star_id = ? WHERE obs_id = ?" );
#
# and a delete
my $sth_welch_index_delete =
$dbh->prepare( "DELETE FROM welch_index WHERE star_id = ?" );
#
# First off, lets find all the stars which have no star_id
my $sth_null_stars = $dbh->prepare(
"SELECT ra, decl, obs_id FROM observations WHERE star_id ISNULL".
" AND smag < $min_smag" );
print "Finding unassigned stars: ";
$sth_null_stars->execute;
#
# With the following bind, after each fetch, we'll end up with a hash of
# the columns from the select. If we add to the select, we don't have
# to change the bind either...
my %null_select;
$sth_null_stars->bind_columns(
\( @null_select{ @{ $sth_null_stars->{ NAME_lc } } } ) );
printf "found %d of them\n", $sth_null_stars->rows;
#
# This hash will keep track of obs_id's which have already been assigned
# to a star in this session, thus reducing the number of queries we need
my %seen;
my $new = 0;
my $new_total = 0;
my $update = 0;
my $many = 0;
my $not_enough = 0;
my $count = 0;
my $stats = 0;
my $last_vacuum = 0;
print " ";
while( $sth_null_stars->fetch )
{
#
# If we've seen it, remove it (obs_id's are unique, so we'll never see
# it again), and continue with the next one
if ( exists $seen{ $null_select{ obs_id } } )
{
delete $seen{ $null_select{ obs_id } };
next;
}
my $ra = $null_select{ ra };
my $dec = $null_select{ decl };
#
# If this star matches 1 existing one (from the frozen table),
# then we can add this star to it's influence. While we're at it,
# lets go ahead and add all appropriate stars to it's influence
$sth_find_existing_stars->execute( $ra, $dec, $ra, $ra, $dec, $dec );
if ( $sth_find_existing_stars->rows == 1 )
{
#
# Grab the id of what we're matching to
my ( $star_id, $lra, $ldec ) = $sth_find_existing_stars->fetchrow_array;
#
# Since we're adding a star, delete the welch index so it'll be
# calculated again
$sth_welch_index_delete->execute( $star_id );
#
# Here, we'll find all the stars to update
$sth_find_matching_stars->execute( $lra, $ldec, $lra, $lra, $ldec, $ldec );
my %matching_select;
$sth_find_matching_stars->bind_columns(
\( @matching_select{
@{ $sth_find_matching_stars->{ NAME_lc } } } ) );
#
# Now, go update all the stars which match. This will fire the
# trigger in the database to actually do the work of bringing the
# data from the observations table into the comp_X tables
while ( $sth_find_matching_stars->fetch )
{
#
# If we've seen it, remove it (obs_id's are unique, so we'll never see
# it again), and continue with the next one
if ( exists $seen{ $matching_select{ obs_id } } )
{
delete $seen{ $matching_select{ obs_id } };
next;
}
$seen{ $matching_select{ obs_id } } = 1;
++$count;
++$update;
printf "\b\b\b\b\b\b\b\b\b%8.4f%%",
100 * $count / $sth_null_stars->rows;
$sth_observation_update->execute( $star_id,
$matching_select{ obs_id } );
}
$dbh->commit;
next;
}
#
# If it matches more than one, then we don't update anything (yet)
elsif ( $sth_find_existing_stars->rows > 1 )
{
$seen{ $null_select{ obs_id } } = 1;
++$count;
++$many;
printf "\b\b\b\b\b\b\b\b\b%8.4f%%",
100 * $count / $sth_null_stars->rows;
next;
}
#
# Here, we matched no existing stars, so we'll see if we can create
# a new one, given we have enough stars matched. Note that this is
# total stars matching in all bands.
$sth_find_matching_stars->execute( $ra, $dec, $ra, $ra, $dec, $dec );
if ( $sth_find_matching_stars->rows >= MIN_STARS )
{
++$new;
my %matching_select;
$sth_find_matching_stars->bind_columns(
\( @matching_select{
@{ $sth_find_matching_stars->{ NAME_lc } } } ) );
#
# Here, we create the entries in the comp_X tables so the update
# trigger can do it's work
$sth_insert_comp_loc->execute( $ra, $dec );
$sth_star_id_select->execute;
my $star_id = $sth_star_id_select->fetchrow_array;
$sth_insert_comp_u->execute( $star_id );
$sth_insert_comp_b->execute( $star_id );
$sth_insert_comp_v->execute( $star_id );
$sth_insert_comp_r->execute( $star_id );
$sth_insert_comp_i->execute( $star_id );
#
# Now, go update all the stars which match. This will fire the
# trigger in the database to actually do the work of bringing the
# data from the observations table into the comp_X tables
while ( $sth_find_matching_stars->fetch )
{
#
# If we've seen it, remove it (obs_id's are unique, so we'll never see
# it again), and continue with the next one
if ( exists $seen{ $matching_select{ obs_id } } )
{
delete $seen{ $matching_select{ obs_id } };
next;
}
$seen{ $matching_select{ obs_id } } = 1;
++$new_total;
++$count;
printf "\b\b\b\b\b\b\b\b\b%8.4f%%",
100 * $count / $sth_null_stars->rows;
$sth_observation_update->execute( $star_id,
$matching_select{ obs_id } );
}
$sth_ra_decl_select->execute( $star_id );
my @frozen_loc = $sth_ra_decl_select->fetchrow_array( );
$sth_frozen_loc_insert->execute( @frozen_loc );
$dbh->commit;
}
else
{
$seen{ $null_select{ obs_id } } = 1;
++$count;
++$not_enough;
printf "\b\b\b\b\b\b\b\b\b%8.4f%%",
100 * $count / $sth_null_stars->rows;
}
#
# All done with this round
if ( ($count - $last_vacuum) > VACUUM )
{
$last_vacuum = $count;
print " Vacuuming";
local $dbh->{AutoCommit} = 1;
$dbh->do( $vacuum );
print "\b\b\b\b\b\b\b\b\b\b";
print " ";
print "\b\b\b\b\b\b\b\b\b\b";
}
}
{
print " Vacuuming";
local $dbh->{AutoCommit} = 1;
$dbh->do( $vacuum );
print "\b\b\b\b\b\b\b\b\b\b";
print " ";
print "\b\b\b\b\b\b\b\b\b\b";
}
done( );
sub done
{
$sth_find_matching_stars->finish;
$sth_find_existing_stars->finish;
$sth_star_id_select->finish;
$sth_ra_decl_select->finish;
$sth_null_stars->finish;
$dbh->disconnect;
print "\nCreated $new new stars from $new_total\n";
print "Didn't use $not_enough stars because there were't enough to make new ones\n";
print "Updated $update stars\n";
print "Couldn't update $many stars because they matched more than one\n";
};