
RE: size musings




When you have LOTS of data you have to choose between
storing it so that it is compact or so that it can be
searched quickly.  For example, suppose you have 1E9
photometric measurements (not an unreasonable number).
The most compact storage method is (say) one 40-byte
record per observation in one large flat file.  This
sets a lower limit of 40GB.  Now suppose you need to
find "all stars with a color index > XX brighter than
mag=12."  Well, you are sunk.  The only method that would
work is to read all 1E9 records, which would take a long
time.  As soon as you demand fast access to selected
parts of your data you need index files.  In fact you need
many index files and some method of keeping them up to date.
Now your data is not so compact.  The index files will
be at least as big as the data, and more likely three or
more times the size of the data itself.
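
To make the flat-file case concrete, here is a rough sketch in
Python.  The record layout (star id, Julian date, V and I
magnitudes, padding) and the use of V-I as the color index are
assumptions made up for illustration; only the 40-byte record
size comes from the text above.

```python
import struct

# Hypothetical 40-byte record: star id (8), Julian date (8),
# V magnitude (4), I magnitude (4), 16 bytes of padding.
RECORD = struct.Struct("<qdff16x")

def flat_file_bytes(n_records):
    """Size of a flat file holding n_records fixed-size records."""
    return n_records * RECORD.size

def scan(buf, min_color, max_mag):
    """Full scan: every record is decoded whether it matches or not."""
    hits = []
    for star_id, _jd, v_mag, i_mag in RECORD.iter_unpack(buf):
        # "Brighter than" means a numerically smaller magnitude.
        if v_mag - i_mag > min_color and v_mag < max_mag:
            hits.append(star_id)
    return hits
```

The cost of scan() grows with the total number of records, not
with the number of hits, which is exactly the problem.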

The upside is that if you need a subset of the data, the
time to read it out can be proportional to the size of the
subset and (mostly) independent of the total amount of data.
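
A minimal sketch of what such an index buys you, again in
Python.  The function names are invented for this example, and
a real index would live on disk rather than in a list, but the
idea is the same: keep the keys sorted, find the cut point by
binary search, and read only the matching record numbers.

```python
import bisect

def build_mag_index(mags):
    """Return (sorted_mags, record_numbers), both ordered by magnitude."""
    order = sorted(range(len(mags)), key=mags.__getitem__)
    return [mags[i] for i in order], order

def brighter_than(index, max_mag):
    """Record numbers of all entries with magnitude < max_mag."""
    sorted_mags, record_numbers = index
    # Binary search costs O(log N); the slice costs O(size of subset).
    cut = bisect.bisect_left(sorted_mags, max_mag)
    return record_numbers[:cut]
```

Note that the index stores one key plus one record number per
observation, so each extra index adds storage comparable to a
fair fraction of the data itself.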

The good news is that 100 or 200GB disk subsystems and computers
with 1GB RAM are becoming personally affordable.  Yes, I think
the Mk IV database will come in the 100~200GB range.

> -----Original Message-----
> From: Creager, Robert S [mailto:CreagRS@LOUISVILLE.STORTEK.COM]
> Sent: Friday, February 16, 2001 1:48 PM
> To: 'Tass Mailing List'
> Cc: 'dgo'
> Subject: size musings
> 
> 
> 
> Pardon, but I was explaining the size problem to a co-worker, and
> starting musing about possibilities of the entire system...  These
> numbers might be on the low side - any comments?
> 
> 2000 stars per image
> 
> 2 images per picture set
> 100 picture sets per night
> 100 nights per year
> 
> 20,000 images to be analyzed a year
> 8Mb per image
> 160 Gigabytes of data per site to be analyzed per year
> 
> 40,000,000 stars per site per year, and if we keep every observation
> in a database:
> 26 bytes per star (observation list only)
> 1 Gigabyte per site per year of star data.
> 
> 12 sites...
> 
> 
> Robert Creager
> Senior Software Engineer
> Client Server Library
> 303.673.2365 V
> 303.661.5379 F
> 888.912.4458 P
> StorageTek
> INFORMATION made POWERFUL
> 
> 
>
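
The arithmetic in the quoted estimate can be checked with a few
lines of Python.  This sketch assumes "8Mb" means 8 megabytes
and uses decimal units (1 GB = 10**9 bytes); the variable names
are invented here.

```python
STARS_PER_IMAGE = 2000
IMAGES_PER_SET = 2
SETS_PER_NIGHT = 100
NIGHTS_PER_YEAR = 100
BYTES_PER_IMAGE = 8 * 10**6   # assumes "8Mb" means 8 megabytes
BYTES_PER_OBSERVATION = 26
SITES = 12

# Per site, per year.
images_per_year = IMAGES_PER_SET * SETS_PER_NIGHT * NIGHTS_PER_YEAR
image_bytes = images_per_year * BYTES_PER_IMAGE
observations = STARS_PER_IMAGE * images_per_year
star_bytes = observations * BYTES_PER_OBSERVATION

# All sites, per year.
all_sites_star_bytes = SITES * star_bytes

print(f"{images_per_year:,} images, {image_bytes / 10**9:.0f} GB of images")
print(f"{observations:,} observations, {star_bytes / 10**9:.2f} GB of star data")
print(f"{all_sites_star_bytes / 10**9:.2f} GB of star data across {SITES} sites")
```

Under those assumptions the quoted figures hold up: the star
data alone is about 1 GB per site per year before any index
files are added.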