Radko, could you please look up some backing material (like what you said you have already read), e.g. what might be a good initial guess for batch sizes, or what should be taken into consideration?
I have been searching for the resources; [1] is the best document on the topic that I remember. Unfortunately, it only deals with loading data into an empty table with no indices, which is not our case. There was also a measurement indicating that with indices the advantage of multiple-value INSERTs diminishes, while it does not for multiple-INSERT transactions, but I cannot find it.
Then there is [2], showing (Images 3, 4) that for tables with many columns (such as ours), multiple-value INSERTs are less efficient. It goes into some detail on this, but basically it boils down to the size of the cleartext representation of the query, which can be large in our case (e.g. the LaBrea events).
To sum it up: multiple INSERTs per transaction will help, and prepared statements might also (although this will mostly lower the CPU load, not disk IO). Multiple-row INSERTs might not be as efficient in our case. Should we make measurements to confirm this?
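For the prepared-statement part, something along these lines could be measured. This is only a minimal sketch assuming psycopg2; the events_meta table, its columns and the database name are placeholders, not our actual schema:

```python
# Minimal sketch only: psycopg2 and the events_meta table/columns are
# placeholders for illustration, not the real schema.
import psycopg2

conn = psycopg2.connect("dbname=mentat_bench")
cur = conn.cursor()

# Plan the INSERT once on the server side...
cur.execute("PREPARE ins (text, timestamp) AS "
            "INSERT INTO events_meta (source, detect_time) VALUES ($1, $2)")

# ...and reuse the plan for every event. This saves parsing/planning (CPU),
# but the disk IO per inserted row stays the same.
for event in (("a.example.com", "2017-01-01 00:00:00"),
              ("b.example.com", "2017-01-01 00:00:01")):
    cur.execute("EXECUTE ins (%s, %s)", event)

conn.commit()
cur.execute("DEALLOCATE ins")
conn.commit()
cur.close()
conn.close()
```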
However, the second source is for MS SQL Server, isn't it?
How about COPY data FROM STDIN WITH BINARY? poc here
Programmatically it would be a similar concept: instead of INSERTs, binary data would get appended to a BytesIO buffer and, when the counters expire, written into the DB in one COPY statement. However, it depends on the complexity of the data conversion. A simpler and still feasible option could be COPY with CSV/TSV data.
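A rough sketch of the simpler CSV variant (psycopg2 is assumed; the events_meta table and its columns are made up for illustration):

```python
# Rough sketch of the CSV variant -- psycopg2 and the events_meta table
# with (source, detect_time) columns are assumptions for illustration.
import csv
import io
import psycopg2

def flush_batch(conn, rows):
    """Write the accumulated rows in a single COPY statement."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)            # rows of already text-formatted values
    buf.seek(0)
    with conn.cursor() as cur:
        cur.copy_expert(
            "COPY events_meta (source, detect_time) FROM STDIN WITH CSV",
            buf)
    conn.commit()

conn = psycopg2.connect("dbname=mentat_bench")
batch = [("a.example.com", "2017-01-01 00:00:00"),
         ("b.example.com", "2017-01-01 00:00:01")]
flush_batch(conn, batch)              # called when the counters/timer expire
```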
At least a crude test would be cool, to know we're not chasing ghosts. How much work would it take?
> However, the second source is for MS SQL Server, isn't it?
Yeah, obviously it is. Sorry, I did a quick search and thought I had found the sources I used before (quite some time ago). I will search for the relevant sources; it is a pity I cannot find my notes on this topic.
> How about COPY data FROM STDIN WITH BINARY? poc here
> Programmatically it would be a similar concept: instead of INSERTs, binary data would get appended to a BytesIO buffer and, when the counters expire, written into the DB in one COPY statement. However, it depends on the complexity of the data conversion. A simpler and still feasible option could be COPY with CSV/TSV data.
I do not like the binary mode; our source data is in text format, not binary as in the poc case. Also, I am not sure COPY actually generates that much less load when indices are present. I will research this more.
> At least a crude test would be cool, to know we're not chasing ghosts. How much work would it take?
Not that much. And it can generally be done in parallel with other, more involved work.
Taking this over for testing. A test script is in the mentat-dbbench repo, and the preliminary results are interesting: grouping multiple events into one INSERT is more efficient even with indices and our table structure, but the difference compared to grouping INSERTs in a transaction is not big, and as the latter is much easier to implement, I would choose it. Detailed results soon.
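To illustrate the parameterization behind the numbers below (this is not the actual mentat-dbbench script, just a minimal sketch; the table, column and database names are placeholders):

```python
# Not the actual mentat-dbbench script -- just a sketch of the kind of loop
# behind the "X transactions, Y INSERTs per transaction, Z rows per INSERT"
# numbers below. Table/column names are placeholders.
import time
import psycopg2

def run_test(conn, total_rows, inserts_per_txn, rows_per_insert):
    sql = ("INSERT INTO events_meta (source, detect_time) VALUES "
           + ", ".join(["(%s, now())"] * rows_per_insert))
    args = ["test-source"] * rows_per_insert
    transactions = total_rows // (inserts_per_txn * rows_per_insert)
    started = time.monotonic()
    with conn.cursor() as cur:
        for _ in range(transactions):
            for _ in range(inserts_per_txn):
                cur.execute(sql, args)
            conn.commit()             # one COMMIT per batch of INSERTs
    print("Inserting %d rows in %d transactions took %.3f s"
          % (total_rows, transactions, time.monotonic() - started))

conn = psycopg2.connect("dbname=mentat_bench")
run_test(conn, 40000, 20, 1)          # e.g. 2000 transactions, 20 INSERTs each
```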
The tests were run on mentat-dev; here are the numbers (the tests were done on a metadata-only table, i.e. without the event column):
Inserting 40000 rows in 40000 transactions, 1 INSERTs per transaction, 1 rows per INSERT, with indices
> Took 231.945 s, 223.771 s, 235.124 s
> WALwriter: 2.05 M
> IO-wait: 6-7%, 200-250 writes/s, Utilization: 80%
> User: 1-2%
This models the current approach: each INSERT is done in its own transaction. The tests were done on 40k events, as that is roughly the amount amassed during an overload. Insertion like this can load the disk subsystem considerably and is heavily IO-bound.
Inserting 40000 rows in 2000 transactions, 20 INSERTs per transaction, 1 rows per INSERT, with indices
> Took 29.112 s, 28.976 s
> WALwriter: 2.06 M, 2032.00 K
> IO-wait: 2-3%, 70-80 writes/s, Utilization: 25-30%
> User: 5-6%
The first approach is to perform multiple INSERTs in one transaction. With 20, roughly the number of events received per second, we see a great reduction in both run time and disk load; much less time is spent waiting for disk operations to finish.
Inserting 40000 rows in 1000 transactions, 40 INSERTs per transaction, 1 rows per INSERT, with indices
> Took 24.888 s, 24.736 s
> WALwriter: 2.88 M, 2.14 M
> IO-wait: 2%, 40-50 writes/s, Utilization: 20%
> User: 6%
With the increase to 2 seconds' worth of data in each transaction, there is still some improvement, albeit just a marginal one.
Inserting 40000 rows in 2000 transactions, 1 INSERTs per transaction, 20 rows per INSERT, with indices
> Took 24.335 s, 24.145 s
> WALwriter: 1176.00 K, 1200.00 K
> IO-wait: 3-4%, 80-90 writes/s, Utilization: 35-40%
> User: 4-5%
Another approach is to bundle multiple events into one INSERT statement. This lowers the write-ahead log overhead (sequential writes only), as the number of modification commands is lower, but it does not offer the disk-load reduction of the first approach. The process is still somewhat IO-bound at this level. (A psycopg2 helper for building such statements is sketched after these results.)
Inserting 40000 rows in 1000 transactions, 1 INSERTs per transaction, 40 rows per INSERT, with indices
> Took 17.627 s, 17.297 s
> WALwriter: 1216.00 K
> IO-wait: 2-3%, 60 writes/s, Utilization: 25%
> User: 6%
With the increase to 2 seconds' worth of data, the disk is no longer the limiting factor; the disk load is still somewhat higher than with the first approach, though.
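For completeness: psycopg2 (2.7 or newer) ships a helper that builds such multi-row INSERTs, so the VALUES list does not have to be assembled by hand. A minimal sketch with placeholder table and column names:

```python
# psycopg2's execute_values builds the multi-row VALUES list for us;
# the events_meta table and its columns are placeholders.
from psycopg2.extras import execute_values

def insert_many(conn, events):
    """events: list of (source, detect_time) tuples."""
    with conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO events_meta (source, detect_time) VALUES %s",
            events,
            page_size=40)             # rows bundled into one INSERT statement
    conn.commit()
```

The page_size argument controls how many rows end up in a single statement, which maps directly to the "rows per INSERT" dimension measured above.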
Conclusion:
Performing bulk INSERTs is definitely a good idea; they offer much better performance. The two alternative ways look roughly the same, with multiple INSERTs per transaction allowing for lower disk load and multiple rows per INSERT leading to a better total run time. The difference is small but measurable, so the biggest argument should be the ease of implementation. For the problem we are solving, bundling multiple INSERTs into one transaction seems to work somewhat better.
Remarks:
A combination of the two methods is also possible; the improvements are in line with the data provided above. In my opinion, the added implementation complexity is not worth the effort.
Another way of implementing data import in PostgreSQL is COPY FROM. According to [1] (the whole thread is a good read; although it is older - 2005 - no indication was found that this has changed since), it is not very different from bundling all the data into one transaction, so similar to the first approach. The differences are in harder error handling, skipping the parser (not that useful for IO-bound cases) and the on-wire protocol (related to latency - not an issue on localhost).
Conclusion from the meeting: the write load during peaks might go down by an order of magnitude, so let's implement it.
Let's go for the easier variant: txn start, N * execute INSERT, txn commit/rollback.
It would be reasonable to have two limits, max event INSERTs per txn and max time per txn, and close the txn on whichever comes first. That should be doable by means of the event-driven daemon architecture.
The limits should be configurable: one second and some correspondingly high number of INSERTs (50? 100?).
This provides the same error reporting and recovery capabilities as the currently deployed approach; just the COMMITs will be batched, allowing for the benefits described in detail above.
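A minimal sketch of that batching logic on the Python side (the class, table and column names are hypothetical; the real code would hook into the daemon's event loop and its timers):

```python
# Minimal sketch of the "commit after N INSERTs or T seconds, whichever comes
# first" logic; class, table and column names are hypothetical.
import time

class BatchingWriter:

    def __init__(self, conn, max_inserts=100, max_seconds=1.0):
        self.conn = conn
        self.cur = conn.cursor()
        self.max_inserts = max_inserts    # configurable limit 1
        self.max_seconds = max_seconds    # configurable limit 2
        self.pending = 0
        self.opened = None

    def insert(self, event):
        if self.pending == 0:
            self.opened = time.monotonic()    # first INSERT opens the txn
        self.cur.execute(
            "INSERT INTO events_meta (source, detect_time) VALUES (%s, %s)",
            event)
        self.pending += 1
        if (self.pending >= self.max_inserts
                or time.monotonic() - self.opened >= self.max_seconds):
            self.flush()

    def flush(self):
        """Close the open transaction; also called from a periodic timer."""
        if not self.pending:
            return
        try:
            self.conn.commit()
        except Exception:
            self.conn.rollback()    # roll the whole batch back, let the caller handle it
            raise
        finally:
            self.pending = 0
```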
The WHENEVER approach is overkill; it would complicate the Python code needlessly.
Why so? My understanding was that WHENEVER could fire a stored procedure, which could swallow the error and note it somewhere, without affecting the other writes. The Python code could then just pick up the noted info after the transaction, log it and move the affected events from tmp to the error dir. Am I misguided?
The DB department suggests using the following strategy:
[...]
This provides the same error reporting and recovery capabilities as the currently deployed approach; just the COMMITs will be batched, allowing for the benefits described in detail above.