Compressing dbmail_headervalue table

Author: Michael Monnerie

Currently, dbmail_headervalue stores a value to each header. And even if the value already exists, it is stored again for the next mail. For example, on a system with dbmail_messages count = 233936, I have dbmail_headervalue count = 8530030, that's 8.5 million headervalues for just 234k mails! Imagine how the table grows with more headers…

SQL schema changes

We currently have physmessage → headervalue → headername

If we split it up, we can do physmessage → headernamevalue → headername+headervalue

PostgreSQL 8.3

CREATE SEQUENCE dbmail_headervalue_id_seq;
CREATE TABLE dbmail_headervalue (
        value_id INT8 NOT NULL DEFAULT nextval('dbmail_headervalue_id_seq'),
        value   TEXT NOT NULL DEFAULT '',
        PRIMARY KEY (value_id)
);
CREATE UNIQUE INDEX dbmail_headervalue_1 ON dbmail_headervalue(value);

CREATE SEQUENCE dbmail_headername_id_seq;
CREATE TABLE dbmail_headername (
        name_id  INT8 NOT NULL DEFAULT nextval('dbmail_headername_id_seq'),
        name    VARCHAR(100) NOT NULL DEFAULT 'BROKEN_HEADER',
        PRIMARY KEY (name_id)
);
CREATE UNIQUE INDEX dbmail_headername_1 on dbmail_headername(lower(name));

CREATE TABLE dbmail_headernamevalue (
        physmessage_id      INT8 NOT NULL
              REFERENCES dbmail_physmessage(id)
              ON UPDATE CASCADE ON DELETE CASCADE,
        name_id  INT8 NOT NULL
              REFERENCES dbmail_headername(name_id)
              ON UPDATE CASCADE ON DELETE RESTRICT,
        value_id      INT8 NOT NULL
              REFERENCES dbmail_headervalue(value_id)
              ON UPDATE CASCADE ON DELETE RESTRICT,
        PRIMARY KEY (physmessage_id,name_id,value_id)
);

What to take care of

With that construct, a value or name only needs to be inserted once, and only it's reference is stored into headernamevalue. Shouldn't be a lot of work, but would save a lot of storage, and improve search speed on headers tremendously.

Currently there is substring(dbmail_headervalue(value),0,255) on the dbmail_headername_1 index, at least for PostgreSQL 8.3 that could be increased to 8191. We'd have to look into the other supported RDBMs, I'll do that if somebody steps up to make the C part.

C Coder needed

That shouldn't be a lot of change to the C code, so if you are able to code in C, please contact the dbmail mailing list on https://mailman.fastxs.net/mailman/listinfo/dbmail

 
compress_headervalue_table.txt · Last modified: 2012/02/27 21:51 by bas
 
DBMail is developed by Paul J Stevens together with developers world-wide