
Re: We need to be 1,000 times faster.



I don't think the Sloan guys did a bad job.  (I had better not say so; a good 
friend on the project may be monitoring this.)  They were working with the 
material of their day and the thinking of their day.  The Sloan had 
specific goals, and their design met them at the costs of the day when the 
design was made (7-8 years ago?).  They do not need to scale up, just meet 
the old goals.  But the stuff has become obsolete during the time it took to 
get the telescope built.  Now they have to maintain really old stuff.

This is really an argument for speed in the engineering process.  One 
should think about it a lot, then design and build just as fast as one can, 
with all the timelines converging at the same point.  It is also an 
argument for understanding exactly how long it takes to engineer 
something.  In big science there is an almost conscious effort not to do 
things in a standard way.  Thus no one learns how long it takes to do 
things.  Then when you have two or more groups working on a big project (as 
at Fermilab) the timelines just do not converge.  All the WBS (work 
breakdown structure) systems in the world do not help, as there is no 
engineering experience.

I am really thinking on paper about how we should do the design process, 
and I have had a plan all along: just build the hardware and deliver 
it.  This then forces the software writers to do what can be done on the 
time scale of the operator/programmers' attention span.  There is 
just no use in trying to do much until the hardware arrives.  But now it is 
close.

Since I started this, Moore's law has helped a lot.  It looks like disk 
drives are now cheap enough for this project, at least at the home 
locations.  But I expect us to get smarter too.  The fact is that the present 
(Mark III) software is good enough with present computers to do all the 
processing I can imagine on the data.
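
A rough check of the Moore's law arithmetic in this thread, as a minimal Python sketch.  It assumes the doubling time of 18 months quoted below and the 1,000x target from the subject line; the function names are made up for illustration.

```python
import math


def years_to_speedup(factor, doubling_months=18.0):
    """Years needed to gain `factor` if capacity doubles every `doubling_months`."""
    doublings = math.log2(factor)
    return doublings * doubling_months / 12.0


def doubling_months_for(factor, years):
    """Doubling time (in months) implied by gaining `factor` over `years`."""
    return years * 12.0 / math.log2(factor)


if __name__ == "__main__":
    wait = years_to_speedup(1000)
    print(f"1000x at an 18-month doubling time: about {wait:.1f} years")
    implied = doubling_months_for(1000, 12)
    print(f"A 12-year wait implies doubling every {implied:.1f} months")
```

Under the 18-month assumption the wait for 1,000x comes out near 15 years; the 12-year figure in the quoted message corresponds to a somewhat faster doubling time.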

We really only have a problem in making the data available to the 
public.  Let's face it.  There are not so many users out there that we 
cannot respond to requests like "give me all the data with V-I > 2."  OK, 
so I might have to load a stack of CD ROMs into a jukebox and run for a 
week.  That is OK with me.  Unlike Arne, who wants to do his own 
science, I am content to be a science server.
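
A brute-force request like "V-I > 2" could be sketched as below.  This is only an illustration: it assumes each disc holds star lists as CSV text with columns named star_id, V and I, which is a made-up layout and not the project's actual star list format.

```python
import csv
import io


def select_red_stars(star_lists, min_color=2.0):
    """Scan star lists one at a time and yield (star_id, V-I) above min_color.

    `star_lists` is any iterable of open text files (one per disc or file).
    The column names star_id, V and I are assumptions for this sketch.
    """
    for handle in star_lists:
        for row in csv.DictReader(handle):
            try:
                star_id = row["star_id"]
                color = float(row["V"]) - float(row["I"])
            except (KeyError, ValueError, TypeError):
                continue  # skip rows with missing or unparsable magnitudes
            if color > min_color:
                yield star_id, color


if __name__ == "__main__":
    # Two in-memory "discs" standing in for files read from the jukebox.
    disc1 = io.StringIO("star_id,V,I\nA,12.5,10.0\nB,11.0,10.5\n")
    disc2 = io.StringIO("star_id,V,I\nC,15.0,12.0\nD,13.0,\n")
    for star_id, color in select_red_stars([disc1, disc2]):
        print(star_id, color)
```

Because the scan is a generator that touches one list at a time, memory use stays flat no matter how many discs are loaded; the cost is purely elapsed time.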

To those who might say that this is an awful way of doing science, I argue 
that a small database will serve to explore concepts.  One could explore a 
1% or 5% online database to decide what data set is really 
interesting, and then make a request as above.  Just brute force, but 
it should work.
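
One way such a 1% or 5% subset might be chosen, as a sketch only: hash each star identifier so that the same stars land in the online subset on every night and at every site.  The identifier format and the function name are hypothetical.

```python
import hashlib


def in_online_sample(star_id, fraction=0.01):
    """Return True if this star belongs to the small online subset.

    The decision depends only on the identifier, so every site picks the
    same stars without coordinating.  `fraction` is the share of stars kept.
    """
    digest = hashlib.sha256(star_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


if __name__ == "__main__":
    ids = [f"star-{i}" for i in range(10000)]  # made-up identifiers
    kept = [s for s in ids if in_online_sample(s, fraction=0.05)]
    print(f"kept {len(kept)} of {len(ids)} stars")
```

Hashing rather than random sampling keeps complete light curves for the chosen stars, which matters if the subset is used to decide what full request to make.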

So I don't think we have to start out with a full database with all the 
data in it.  I still want to have a jukebox that holds 200 CD ROMs.  They 
exist for music.  How hard can it be to interface them to a computer?  I 
keep hoping that the Napster folks will make people want such a thing 
attached to their computers so they can hold a zillion titles.  Then I will 
be able to buy one for $199.95 and respond to data requests.

Tom Droege

At 09:59 PM 8/4/00 +0000, you wrote:

>Sounds like the Sloan folks did not apply my rule of asking
>"What happens if we need to scale up by a huge factor?"
>What they did was design to a static requirement.  I'm doing
>this here at work right now.  We are saying our system is to
>contain "N" 1U tall rack mount computers, and will be a
>mix of Sun SPARC and Intel Pentium.  Initially we'll have
>six computers but we will make absolutely certain that the
>design will scale up in a huge way.  We could go to 50
>computers easily.  Our disk array will start at 200GB but it
>has the capacity to scale by 10x or more.
>
>Moore's law (100% growth every 18 months) will not help us.
>We can't just wait until this gives us a 1000x.  It would
>take too long.
>
>We can just continue on but what we will have is a bunch of
>camera systems all operated independently.  Just ask Michael
>what he would do if both you (six cameras) and Arne (two
>cameras but 300 clear nights) sent in all your star lists
>to him.  Could he handle that much data?  Maybe if he
>invested in a much bigger PC and got some local help.  Now
>suppose all 20 Mk IV cameras came on line; could he handle that?
>Michael, are you reading this?  What I'm afraid of is that
>the CDs would just pile up on the floor.
>
>In another sense you could be right.  If we just move on,
>things will work, as we will find some way to make it work.
>Yes we will, and I outlined it in my last e-mail.  All that
>text boils down to just this:
>
>We don't send _all_ our data to Michael, we only send him
>our "best" data and only in the volume he can handle.  There
>is little choice here.
>
>As I said, I think camera control and data reduction scale
>well.  There is a computer and an operator for each camera.
>Even with 10,000 we'd be OK here.  A problem only exists
>if we want to continue to run a central database using
>the current method.  We can't do this and we can't wait,
>as current trends predict a 12 year wait.  So my suggestion
>was simply to not use the current method for this function.
>Use a smarter one.
>
>
 > >I was just talking to someone about the Sloan.  The Fermilab computing
 > >group designed the data acquisition system.  They nearly met the
 > >schedule.  As a result, they were 3-5 years too early and the system is
 > >designed around computers so obsolete that they have to be special ordered
 > >for spares.  They find that they can spend x dollars and get 20% spares for
 > >the old design.  The same x dollars would buy 1/5 the number of computers
 > >that would run 5x as fast and do the whole job.  So they are faced with
 > >buying something slightly different and upgrading software, etc., or buying
 > >old stuff for spares and replacements.
> >
> >It seems like it will always be thus in the computer biz.
> >
 > >I say don't worry about it.  We can now just barely do the Mark III
 > >job.  By the time that the Mark IVs are spitting out all that data, we will
 > >just barely be able to handle it.  Note that I thought about all this when
 > >I started this project.  If anything the computer cost for the project has
 > >decreased relatively from what I expected.  When I started this, I was
 > >running with 20Mbyte disk drives.
> >
 > >What we don't want to do is to be tied to any particular bit of
 > >hardware.  Sloan has a lot of special hardware.  We only have the need for
 > >a single ISA slot.  You can still buy computers with ISA slots (probably
 > >not for long).  We only need this on the very front end so we should be
 > >able to make do with old PCs as the telescope controller.
> >
 > >So I say don't be afraid of the size of the problem.  It will get smaller
 > >with time.  Let's take good data and do what we can to reduce it - at least
 > >to calibrated star lists.  I think each of us can do that at our sites with
 > >present technology.  We can then always process some of it to a data
 > >base.  We just do what we can.  If this data base is useful, pressure will
 > >grow to process all the data.  A way will be found.  Data in astronomy has
 > >time value.  You can never again take data for the year 2000.  So if we
 > >have it in an archive, some of it will be wanted.
> >
> >The big problem is to figure out how to store the data so that it can be
> >processed at a later date.
> >
> >Tom Droege
> >
> >At 11:52 AM 8/4/00 -0700, you wrote:
> >>Some problems with Chris' scheme, though in general
> >>I agree that distributed database processing is what
> >>will be needed.  First, a site such as NOFS or Tom's
> >>6X will produce more data per year than is currently in Michael's
> >>entire database.  If Michael has problems updating it, then
> >>at the same level each site will have problems in maintaining
> >>their own individual database.  Second, while
> >>FITS tables format is certainly a convenient,
> >>standard way to keep data available that might be
> >>needed for queries, such tables are not SQL-accessible
> >>and I wonder how a centralized server could handle them
> >>in an efficient manner.  The total database is not that
> >>much different in size from SDSS.  Let's see what ARNE
> >>brings to the table.  Perhaps the time really has come
> >>to get Microsoft, etc. involved.
> >>Arne