JSONB data occupy a lot of precious RAM (even though compressed by PostgreSQL), especially because they have to be fetched into RAM together with the rest of the db tuples during table scan and index scan queries (which we unfortunately rely on).
After usage of JSONB data is minimized within the Mentat code (see #4274), the events table can be split and the 'extend' part of the API modified accordingly. Only small parts of the full-length jsonb IDEA data will then be hot in memory: presumably only single events viewed by the user in the Hawat 'show' interface, plus at most hundreds of events from reporting, which are usually hours old (days at most, if relapse kicks in).
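A possible target schema for the split could look as follows. This is only a sketch; the table and column names (`events_json`, `id`, `event`) are assumptions, not the actual Mentat schema, and the payload is kept as `bytea` to match the current `events.event` column:

```sql
-- Metadata stays in the existing events table; only the bulky payload moves out.
-- The full IDEA payload lives in its own table and is fetched only when a
-- single event is shown in Hawat or pulled by the reporter.
CREATE TABLE events_json (
    id    text PRIMARY KEY REFERENCES events(id),
    event bytea NOT NULL  -- compressed IDEA JSON, as in the current events.event
);
```

Keeping the payload table keyed by the same id duplicates only the ID index (accounted for in the statistics below), while table scans over metadata no longer drag the payload through the buffer cache.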
From a teleconference brainstorming with Radko: the upgrade of the db and code could be done with relatively short downtime (the longest step will be CLUSTER, which can be postponed to off-hours).

1. Create the new empty table.
2. Enable the API 'extend' code to deal with jsonb in the separate table, but fall back to jsonb in the original table (to be able to cope with the transition period).
3. Stop incoming data.
4. Run the last run of the reporter and statistician.
5. Alter the old table to allow NULL in its jsonb column.
6. Set up the new storage code (splitting into a metadata table and a jsonb data table).
7. Enable incoming data (the jsonb column in the metadata table now gets NULLs).
8. Enable the reporter and statistician.
9. Repeat: select old jsonb into new jsonb LIMIT XXX; until done (the limit is to prevent lock congestion).
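The backfill step above can be sketched as the following statement, repeated until it inserts 0 rows. Table and column names and the batch size are assumptions for illustration:

```sql
-- Copy payloads into the new table in small batches to avoid lock congestion;
-- the NOT EXISTS anti-join makes each run pick up only not-yet-copied rows,
-- so the statement is safe to re-run until it reports "INSERT 0 0".
INSERT INTO events_json (id, event)
SELECT e.id, e.event
FROM events e
WHERE e.event IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM events_json j WHERE j.id = e.id)
LIMIT 10000;
```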
Statistics gathering was performed to better understand this. The following numbers are for mentat-hub on 2018-10-24.
Current table size: 171GB
Number of rows: 125.3M (125 314 536)
Aggregate size of event BYTEA: 110GB (118 092 392 057) - obtained by SELECT sum(octet_length(event)) FROM events;
Aggregate indices size: 34.8GB (note: REINDEX has not been run for a long time, so the indices are considerably bloated)
ID index size: 11GB (this would be duplicated for the data table)
That leaves us with (171 - 110) + 34.8 + 11 = 106.8GB per 125.3M events, i.e. roughly 0.85GB per 1M rows. With the current physical memory and long-term memory load, this would get us to at least 260M events fully cached in RAM. A considerable improvement.
The VACUUM and CLUSTER commands are very important as those are the points where the events table is compacted. They should be part of the automatic migration.
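The compaction commands might be run as sketched below; the index name `events_pkey` is an assumption, and the actual clustering index should be whichever one the queries order by:

```sql
-- CLUSTER rewrites the table in index order, reclaiming dead space left by
-- the migration; it takes an exclusive lock, hence the off-hours scheduling.
CLUSTER events USING events_pkey;
-- Refresh planner statistics and visibility information after the rewrite.
VACUUM ANALYZE events;
```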