
RE: Lots of data




Yo,

I'm keeping the timings from importing every frame (in records/second), and
the operations/second on the merge stage (currently in sets of 5000
operations), along with what types of merges occurred (new star, updating
current one, not enough to create a new one), as the types of operations
dramatically affect the processing speed.
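A minimal sketch of that kind of bookkeeping, in Python for illustration only (the actual import code is not shown here). The class name, the merge-type labels and the `db_size` argument are all hypothetical. It closes a batch every 5000 operations and stores the throughput with the per-type counts, so speed can later be plotted against database size.

```python
import time
from collections import Counter

# Hypothetical labels for the three merge outcomes described above.
MERGE_TYPES = ("new_star", "update", "insufficient")


class MergeTimer:
    """Summarize merge throughput once per fixed-size batch of operations."""

    def __init__(self, batch_size=5000, clock=time.perf_counter):
        self.batch_size = batch_size
        self.clock = clock          # injectable so the timer can be tested
        self.counts = Counter()     # merge-type counts for the open batch
        self.start = clock()        # start time of the open batch
        self.rows = []              # one summary dict per completed batch

    def record(self, merge_type, db_size):
        """Count one merge; close the batch when it reaches batch_size."""
        self.counts[merge_type] += 1
        if sum(self.counts.values()) >= self.batch_size:
            now = self.clock()
            elapsed = now - self.start
            row = {
                "db_size": db_size,
                "ops_per_sec": (self.batch_size / elapsed
                                if elapsed > 0 else float("inf")),
            }
            for kind in MERGE_TYPES:
                row[kind] = self.counts.get(kind, 0)
            self.rows.append(row)
            self.counts.clear()
            self.start = now
```

Each entry in `rows` is one point for a throughput-versus-size plot, and the per-type counts show whether a slow batch was simply heavy on one kind of merge.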

You are right on the COPY.  I thought about using COPY, but modifying the
current code was quicker.  I'll get a COPY version this weekend, along with
the random shifting.  I'll shift the ra from 1 to 355 (no wrapping to worry
about), and the dec from -14 to +14 (or should dec be larger?).  I'm using
INSERT now 'cause I was thinking eventually of a multi-user DB, and didn't
want to try to manage the SERIAL field I have in the observation table.
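A rough sketch of the shifting step, assuming a frame is just a list of (ra, dec, mag) tuples in degrees. That layout, the function names and the 10 arc second bound on small offsets are assumptions for illustration, not the real schema. It picks one offset per frame, mixing small and large shifts as suggested in the quoted message, and emits tab-separated lines of the kind COPY reads in its default text format. No wrapping is applied, so the caller has to make sure the shifted values stay in range.

```python
import random

ARCSEC = 1.0 / 3600.0  # one arc second, in degrees


def random_offset(rng, small_fraction=0.5):
    """Return one (d_ra, d_dec) offset in degrees for a whole frame.

    With probability small_fraction the offset is a few arc seconds
    (the +/-10 arcsec bound is an assumption); otherwise it uses the
    ranges from the message: ra 1..355 degrees, dec -14..+14 degrees.
    """
    if rng.random() < small_fraction:
        return (rng.uniform(-10.0, 10.0) * ARCSEC,
                rng.uniform(-10.0, 10.0) * ARCSEC)
    return rng.uniform(1.0, 355.0), rng.uniform(-14.0, 14.0)


def shifted_copy_lines(frame, d_ra, d_dec):
    """Yield one tab-separated text line per record, shifted by the offset.

    No wrapping is done: a small negative d_ra on a frame near ra = 0
    would give a negative ra, so such frames need separate handling.
    """
    for ra, dec, mag in frame:
        yield f"{ra + d_ra:.6f}\t{dec + d_dec:.6f}\t{mag:.3f}\n"
```

Usage would look something like `d_ra, d_dec = random_offset(random.Random())`, then writing `shifted_copy_lines(frame, d_ra, d_dec)` to whatever stream feeds the COPY.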

I know fsync is off, and the buffers are set to some very large number, but
I don't remember off the top of my head.  All told, I think I've allocated
nearly 500 MB to PostgreSQL, but I'll have to check.

Bye,
Rob

> -----Original Message-----
> From: Chris Albertson [mailto:chrisalbertson90278@yahoo.com]
> 
> I don't know the details of PostgreSQL's quadtree index algorithm
> but with identical data I suspect a kind of unnatural tree will
> be built.  I like your idea of adding an offset to the whole
> frame.  I'd mix small (arc seconds) offsets with large (degrees)
> offsets.  This way your database will simulate a real one.  My
> guess is index tree depth has an effect on timing.  Using identical
> data may create trees that look more like linear lists (one-legged
> trees), but maybe not, as they claim to use a "balanced quad tree".
> 
> Keep data so you can plot processing time vs.
> size of database.  It would be good to know the shape of the
> curve: linear, log, or whatever.  In fact, I think the shape
> is of more interest than the absolute value.
> 
> On another subject, if all you want is a large database why use
> such a slow method to populate it?  I would just start a
> COPY and then write random data.  This typically goes at
> 500 rows per second.  In fact you could run this at the same
> time as your current test.
> 
> One more question:
> What command line options do you use for PostgreSQL?  Specifically
> the -B and -F (buffers and fsync) options.  Actually the fact that
> you are CPU bound says you have these about right, as CPU bound is
> where you want to be.
>