Re: We need to be 1,000 times faster.
Sounds like the Sloan folks did not apply my rule of asking
"What happens if we need to scale up by a huge factor?"
What they did was design to a static requirement. I'm doing
this here at work right now. We are saying our system is to
contain "N" 1U-tall rack-mount computers, a mix of Sun
SPARC and Intel Pentium machines. Initially we'll have
six computers, but we will make absolutely certain that the
design will scale up in a huge way. We could go to 50
computers easily. Our disk array will start at 200GB, but it
has the capacity to scale by 10x or more.
Moore's law (100% growth every 18 months) will not help us.
We can't just wait until it gives us a 1000x improvement; it
would take too long.
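A quick back-of-the-envelope check of that claim (a sketch I am adding, not from the original message; the only inputs are the 1000x target and the 18-month doubling period stated above):

```python
import math

# If performance doubles every 18 months (100% growth / 18 months),
# how long until a 1000x improvement arrives on its own?
doubling_period_years = 1.5
doublings_needed = math.log2(1000)        # about 9.97 doublings
wait_years = doublings_needed * doubling_period_years
print(round(wait_years, 1))               # roughly 15 years
```

So waiting it out costs on the order of 12-15 years depending on the exact growth rate you assume, which is consistent with "it would take too long."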
We can just continue on, but what we will have is a bunch of
camera systems all operated independently. Just ask Michael
what he would do if both you (six cameras) and Arne (two
cameras but 300 clear nights) sent all your star lists
to him. Could he handle that much data? Maybe, if he
invested in a much bigger PC and got some local help. Now
suppose all 20 Mk IV cameras came on line; could he handle that?
Michael, are you reading this? What I'm afraid of is that
the CDs would just pile up on the floor.
In another sense you could be right. If we just move on,
things will work, as we will find some way to make them work.
Yes we will, and I outlined it in my last e-mail. All that
text boils down to just this:
We don't send _all_ our data to Michael; we only send him
our "best" data, and only in the volume he can handle. There
is little choice here.
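In code, that policy might look something like the following sketch. The record fields (`mag`, `snr`) and the cutoff values are illustrative assumptions on my part, not part of any agreed format:

```python
# Sketch of "send only our best data, and only in the volume the
# central site can handle". Field names and limits are hypothetical.

def select_for_central_db(star_list, max_records, min_snr=10.0):
    """Keep only high-quality detections, then cap the volume."""
    good = [s for s in star_list if s["snr"] >= min_snr]
    # Brightest first (smallest magnitude), so the cap keeps the best.
    good.sort(key=lambda s: s["mag"])
    return good[:max_records]

# Example: three detections, central site can take at most two.
stars = [
    {"id": 1, "mag": 9.1,  "snr": 40.0},
    {"id": 2, "mag": 11.5, "snr": 8.0},   # too noisy: dropped
    {"id": 3, "mag": 10.2, "snr": 25.0},
]
print([s["id"] for s in select_for_central_db(stars, max_records=2)])
# -> [1, 3]
```

Each site applies its own cut locally, so the central load is bounded no matter how many cameras come on line.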
As I said, I think camera control and data reduction scale
well. There is a computer and an operator for each camera.
Even with 10,000 cameras we'd be OK here. A problem only exists
if we want to continue to run a central database using
the current method. We can't do this, and we can't wait,
as current trends predict a 12-year wait. So my suggestion
was simply not to use the current method for this function.
Use a smarter one.
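The scaling argument can be put in arithmetic form. The record counts and the central site's capacity below are assumptions for the sake of the example, not real numbers from the project:

```python
# Per-site load never changes: each camera has its own computer and
# operator. Only the central database's load grows with camera count.

records_per_camera_per_night = 100_000   # assumed per-camera output
central_capacity_per_night = 1_000_000   # assumed central-ingest limit

for cameras in (6, 50, 10_000):
    central_load = cameras * records_per_camera_per_night
    ok = central_load <= central_capacity_per_night
    print(cameras, central_load, ok)

# With these assumed numbers the current central method tops out at:
max_cameras = central_capacity_per_night // records_per_camera_per_night
print(max_cameras)   # -> 10
```

Whatever the real numbers are, the shape is the same: per-site work is constant, central work is linear in the number of cameras, so the central method is the part that breaks first.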
>I was just talking to someone about the Sloan. The Fermilab computing
>group designed the data acquisition system. They nearly met the
>schedule. As a result, they were 3-5 years too early and the system is
>designed around computers so obsolete that they have to be special ordered
>for spares. They find that they can spend x dollars and get 20% spares for
>the old design. The same x dollars would buy 1/5 the number of computers
>that would run 5x as fast and do the whole job. So they are faced with
>buying something slightly different and upgrading software, etc., or buying
>old stuff for spares and replacements.
>It seems like it will always be thus in the computer biz.
>I say don't worry about it. We can now just barely do the Mark III
>job. By the time that the Mark IVs are spitting out all that data, we will
>just barely be able to handle it. Note that I thought about all this when
>I started this project. If anything the computer cost for the project had
>decreased relatively from what I expected. When I started this, I was
>running with 20Mbyte disk drives.
>What we don't want to do is to be tied to any particular bit of
>hardware. Sloan has a lot of special hardware. We only have the need for
>a single ISA slot. You can still buy computers with ISA slots (probably
>not for long). We only need this on the very front end so we should be
>able to make do with old PCs as the telescope controller.
>So I say don't be afraid of the size of the problem. It will get smaller
>with time. Let's take good data and do what we can to reduce it - at least
>to calibrated star lists. I think each of us can do that at our sites with
>present technology. We can then always process some of it to a data
>base. We just do what we can. If this data base is useful, pressure will
>grow to process all the data. A way will be found. Data in astronomy has
>time value. You can never again take data for the year 2000. So if we
>have it in an archive, some of it will be wanted.
>The big problem is to figure out how to store the data so that it can be
>processed at a later date.
>At 11:52 AM 8/4/00 -0700, you wrote:
>>Some problems with Chris' scheme, though in general
>>I agree that distributed database processing is what
>>will be needed. First, a site such as NOFS or Tom's
>>6X will produce more data per year than is currently in Michael's
>>entire database. If Michael has problems updating it, then
>>at the same level each site will have problems in maintaining
>>their own individual database. Second, while
>>FITS tables format is certainly a convenient,
>>standard way to keep data available that might be
>>needed for queries, such tables are not SQL-accessible
>>and I wonder how a centralized server could handle them
>>in an efficient manner. The total database is not that
>>much different in size from SDSS. Let's see what ARNE
>>brings to the table. Perhaps the time really has come
>>to get Microsoft, etc. involved.