[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [TASS] New Data Handling Paradigms?



On Mon, 20 Sep 1999, Chris Albertson wrote:

> Peter Mount wrote:
> >
>
> > Looking at these figures, do you think I should look at stress testing
> > PostgreSQL to make sure it can handle this amount of data?
>
> Peter,
>
> It would be nice to know what to expect.  The test should
> be realistic.  Here is an idea of what (IMO) would make for a
> realistic test.
>
> Make two tables.
> 1) Observations: 200 million rows about 100 bytes per row.

Wow, thats just over 18.6 Gig!

> 2) Catalog:       20 million rows about 100 bytes per row
> Each table has a primary integer key "TASS_ID"
> Build index as required
>
> I'd be interested to know the number of transactions/minute
> you can do.  A transaction is
> 1) Append two or more rows to "Observations"
> 2) update on row of "Catalog"

This shouldn't be difficult to do, although I don't have the ammount of
free space yet. I can push about 60 million rows at 100 bytes per row.

> With a _good_ off-line (non DBMS based) pre-matching program we
> can do all appends to "Observations" for a given TASS_ID then only
> one update to "Catalog".  I have found COPY to be much faster
> then INSERT.

COPY always has been. This is because we have to parse every INSERT. If
prepared statements are implemented, we could cut out the query parse, but
unless someone (on the PostgreSQL side of things) takes it, it will be
some time.

For the ammount of data we have, COPY is the better solution.

> So the _basic_ algorithm I see using using is this:
> 1) Performed at each camera site.
>    a) Match all observations for a given period (one or more nights)
>       to a local copy to the TASS_Catalog.  Assign a TASS_ID to
>       each observation.
>    b) Sort all observations (I and V together) by TASS_ID.
>    c) Throw away any star that was not seen at least twice.
>    d) Build the list of UPDATE commands to be applied to the
>       central database's copy of "tass_catalog".
>    e) Transmit to Michael's site.
> 2) Performed at central database site.
>    a) START TRANSACTION
>    b) Copy to Observations table two or more rows
>    c) update tass_catalog
>    d) COMMIT TRANSACTION
>
> There are more special cases not covered by the above but to purposes of
> performance testing  just doing steps 2a --> 2d above with random data
> on a large database is enough.

Yes, 2a->2d would be the main area to look at.

> If you can script the test then others can try it out on their hardware.
> I'd be very interested to see how well this works on a range of hardware.

Yes this would be useful. Ok, I'll hack something up that's portable.

When I tested the 2Gb problem, I used a simple bash script that ran psql
and used COPY to load the data. I'll probably use a varient of that
script.

Peter

--
       Peter T Mount peter@retep.org.uk
      Main Homepage: http://www.retep.org.uk
PostgreSQL JDBC Faq: http://www.retep.org.uk/postgres
 Java PDF Generator: http://www.retep.org.uk/pdf