RE: Lots of data
Yo,
I'm keeping the timings from importing every frame (in records/second), and
the operations/second on the merge stage (currently in sets of 5000
operations), along with what types of merges occurred (new star, updating
current one, not enough to create a new one), as the types of operations
dramatically affect the processing speed.
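
For anyone who wants to log the same numbers, here is a rough sketch of the bookkeeping described above: one row per batch of merge operations, with the rate and the mix of merge types. The class name, the merge-type labels and the injectable clock are all made up for illustration; this is not the actual import code.

```python
import time
from collections import Counter


class MergeStats:
    """Collect operations/second per batch, plus counts per merge type."""

    def __init__(self, batch_size=5000, clock=time.perf_counter):
        self.batch_size = batch_size      # 5000 matches the batches mentioned above
        self.clock = clock                # injectable so the class can be tested
        self.total_ops = 0                # running total, a proxy for database size
        self.counts = Counter()           # merge types seen in the current batch
        self.batch_start = clock()
        self.rows = []                    # (total_ops, ops_per_sec, type_counts)

    def record(self, merge_type):
        """Call once per merge operation, e.g. record("new_star")."""
        self.counts[merge_type] += 1
        self.total_ops += 1
        if sum(self.counts.values()) >= self.batch_size:
            self._flush()

    def _flush(self):
        now = self.clock()
        elapsed = now - self.batch_start
        n = sum(self.counts.values())
        rate = n / elapsed if elapsed > 0 else float("inf")
        self.rows.append((self.total_ops, rate, dict(self.counts)))
        self.counts.clear()
        self.batch_start = now
```

Each row in `rows` pairs the running operation count with the rate for that batch, so the same data can later be plotted as processing speed against database size.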
You are right on the COPY. I thought about using COPY, but modifying the
current code was quicker. I'll get a COPY version this weekend, along with
the random shifting. I'll shift the ra from 1 to 355 (no wrapping to worry
about), and the dec from -14 to +14 (or should dec be larger?). I'm using
INSERT now 'cause I was thinking eventually of a multi-user DB, and didn't
want to try to manage the SERIAL field I have in the observation table.
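
A minimal sketch of the shifting plus COPY idea, assuming each frame is a list of (ra, dec, mag) tuples in degrees with ra small enough that adding up to 355 never passes 360. The function names, the three-column layout, the 50/50 mix and the 30 arc second limit are assumptions for illustration only. It mixes small (arc second) and large (degree) offsets, and emits tab-separated lines, which is the default text format COPY reads.

```python
import random

ARCSEC = 1.0 / 3600.0  # one arc second, in degrees


def random_offset(rng, p_small=0.5, small_arcsec=30.0):
    """Pick one (d_ra, d_dec) shift in degrees for a whole frame.

    With probability p_small the shift is a few arc seconds; otherwise it is
    a large shift: ra by 1..355 degrees, dec by -14..+14 degrees.
    """
    if rng.random() < p_small:
        lim = small_arcsec * ARCSEC
        # Keep the small ra shift non-negative so ra never drops below zero.
        return rng.uniform(0.0, lim), rng.uniform(-lim, lim)
    return rng.uniform(1.0, 355.0), rng.uniform(-14.0, 14.0)


def shift_frame(frame, d_ra, d_dec):
    """Apply the same offset to every (ra, dec, mag) record in the frame."""
    return [(ra + d_ra, dec + d_dec, mag) for ra, dec, mag in frame]


def to_copy_text(rows):
    """Format rows as tab-separated lines, one record per line."""
    return "".join(f"{ra:.6f}\t{dec:.6f}\t{mag:.3f}\n" for ra, dec, mag in rows)
```

The text from `to_copy_text` could be written to a file or piped to a COPY ... FROM STDIN; how the SERIAL column gets filled in that case is left open here.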
I know fsync is off, and the buffers are set to some very large number, but
I don't remember off the top of my head. All told, I think I've allocated
nearly 500 MB to PostgreSQL, but I'll have to check.
Bye,
Rob
> -----Original Message-----
> From: Chris Albertson [mailto:chrisalbertson90278@yahoo.com]
>
> I don't know the details of PostgreSQL's quadtree index algorithm
> but with identical data I suspect a kind of unnatural tree will
> be built. I like your idea of adding an offset to the whole
> frame. I'd mix small (arc seconds) offsets with large (degrees)
> offsets. This way your database will simulate a real one. My
> guess is index tree depth has an effect on timing. Using identical
> data may create trees that look more like linear lists (one-legged
> trees), but maybe not, as they claim to use a "balanced quad tree".
>
> Keep data so you can plot processing time vs.
> size of database. It would be good to know the shape of the
> curve: linear, log or whatever. In fact, I think the shape
> is of more interest than the absolute value.
>
> On another subject, if all you want is a large database why use
> such a slow method to populate it? I would just start a
> COPY and then write random data. This typically goes at
> 500 rows per second. In fact you could run this at the same
> time as your current test.
>
> One more question:
> What command line options do you use for PostgreSQL? Specifically
> the -B and -F (buffers and fsync) options. Actually, the fact that
> you are CPU bound says you have these about right, as CPU bound is
> where you want to be.
>
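
On the question of the curve's shape quoted above, one simple way to compare "linear" against "log" once the timings exist is to fit a straight line to time vs. size and to time vs. log(size), then see which leaves the smaller squared error. The sketch below uses plain least squares; the function names are hypothetical and the two candidate shapes are just the ones named in the message.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit y = slope*x + intercept; also return the SSE."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def best_shape(sizes, times):
    """Return the better-fitting shape name ("linear" or "log") and both fits."""
    candidates = {
        "linear": list(sizes),
        "log": [math.log(s) for s in sizes],
    }
    fits = {name: fit_line(xs, times) for name, xs in candidates.items()}
    best = min(fits, key=lambda name: fits[name][2])
    return best, fits
```

Here `sizes` would be the database size at each measurement and `times` the processing time per batch at that size. With only a handful of points, or noisy ones, the two errors may be too close to call.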