Enhancing performance with dbmail 2.2 and PostgreSQL

Author: Michael Monnerie

This a collection of performance enhancements we did on dbmail 2.2 using PostgreSQL 8.1.

Cleanup indices

Currently there are 2 indices dbmail_messageblks_physmessage_idx and dbmail_messageblks_physmessage_is_header_idx which are almost identical, so we can remove one. This saves accesses on every insert/update/delete on that table. It has no penalties, as the dbmail_messageblks_physmessage_is_header_idx has the same information. Also, on dbmail_headervalue there is one (very big) index that is actually never used, and can be remove to gain lots of disk space.

DROP INDEX

DROP INDEX dbmail_messageblks_physmessage_idx ;
DROP INDEX dbmail_headervalue_3 ;

After this, you might get this log entry in PostgreSQL:

query failed [DELETE FROM dbmail_physmessage WHERE id = 1676958] : 
[ERROR:  could not open relation with OID 102021 CONTEXT:  SQL statement 
"DELETE FROM ONLY "public"."dbmail_messageblks" WHERE "physmessage_id" = $1" ]

But I got the tip from the pgsql-admin mailing list, to just restart all db connections (aka restart postgresql *and* dbmail), and this worked. No problem arises from deleting these indices.

CREATE INDEX for speedup

CREATE INDEX zmi_messages_clusterindex ON dbmail_messages (mailbox_idnr, message_idnr);

This dramatically speeds up new message delivery.

Optimizing tables with CLUSTER

This is a huge benefit for the overall performance of dbmail. It tell PostgreSQL to resort all data in tables by a given index. If then dbmail accesses that table in that order, and need 5 consecutive entries from a table, PostgreSQL can transfer them all with a single physical disk read, as the OS and RAID controller offer read-ahead. On a server with 50+ users, you can even “feel” the performance gain. Everything is much more smooth.

CLUSTER commands for PostgreSQL 8.1, part 1

/* for the first statement, you must have created the index as described above */
CLUSTER zmi_messages_clusterindex ON dbmail_messages ;

CLUSTER dbmail_mailboxes_owner_name_idx ON dbmail_mailboxes ;
CLUSTER dbmail_headername_1 ON dbmail_headername ;
CLUSTER dbmail_physmessage_pkey ON dbmail_physmessage ;
CLUSTER dbmail_aliases_domain_alias ON dbmail_aliases ;
CLUSTER dbmail_users_domain ON dbmail_users ;
CLUSTER dbmail_subscription_pkey ON dbmail_subscription ;
CLUSTER dbmail_acl_pkey ON dbmail_acl ;
CLUSTER dbmail_datefield_2 ON dbmail_datefield ;
# Attention - this could take long, depending on your amount of headervalues:
CLUSTER dbmail_headervalue_1 ON dbmail_headervalue ;

CLUSTER: most performance from table dbmail_messageblks (part 2)

dbmail_messageblks is the one where all messages are stored. If you make PostgreSQL store messages in the order they are read, it makes the best out of this. But by default, there's no index that we could use for the CLUSTER command, so we have to create one:

# Attention - these two take rather long! Make it during no accesses, as it blocks the table!
CREATE UNIQUE INDEX zmi_messageblks ON dbmail_messageblks (physmessage_id, messageblk_idnr);
CLUSTER zmi_messageblks ON dbmail_messageblks ;

CLUSTER commands for PostgreSQL 8.3

All versions of PostgreSQL up to 8.2 used CLUSTER in this form:

CLUSTER indexname ON tablename;

While from 8.3, the format changed to

CLUSTER tablename [ USING indexname ];

So you need to change the formatting a little. It's just the parameter order that changed, the command still does the same.

/* for the first statement, you must have created the index as described above */
CLUSTER dbmail_messages USING zmi_messages_clusterindex;

CLUSTER dbmail_mailboxes USING dbmail_mailboxes_owner_name_idx;
CLUSTER dbmail_headername USING dbmail_headername_1 ;
CLUSTER dbmail_physmessage USING dbmail_physmessage_pkey ;
CLUSTER dbmail_aliases USING dbmail_aliases_domain_alias ;
CLUSTER dbmail_users USING dbmail_users_domain ;
CLUSTER dbmail_subscription USING dbmail_subscription_pkey ;
CLUSTER dbmail_acl USING dbmail_acl_pkey ;
CLUSTER dbmail_datefield USING dbmail_datefield_2 ;
# Attention - this could take long, depending on your amount of headervalues:
CLUSTER dbmail_headervalue USING dbmail_headervalue_1 ;
# Attention - this two take rather long! Make it during no accesses, as it blocks the table!
CREATE UNIQUE INDEX zmi_messageblks ON dbmail_messageblks (physmessage_id, messageblk_idnr);
CLUSTER dbmail_messageblks USING zmi_messageblks;

Optimizing backup time

We've always had a backup time of 5 hours, which didn't disturb as it was running at 1-6 in the morning. But then we got a customer needing 24/7 access, and they felt the performance drop, as they are heavy access users.

VACUUM

So we investigated, and found we used this in postgresql.conf:

vacuum_cost_delay = 250

This should help minimize the performance impact the VACUUM command has on the db, which we ran every night before the backup. But the problem is that the table is locked, and on the biggest table dbmail_messageblks it takes a very long lock then, which locks many other queries as well, as most use that table. The solution was to make VACUUM run at high speed:

vacuum_cost_delay = 0

Like this, our backup now lasts only 20-30 minutes, from which 5-6 minutes are needed for vacuum. But we've also implemented other changes, see below.

AUTOVACUUM

This very nice feature makes PostgreSQL optimize tables in the background during the day. Errorneously, we've had if off, as there are many places where people recommend this. But we've talked with the PostgreSQL devs, and they strongly suggest using it, as it has many andvantages. So now we use:

 autovacuum = on            # enable autovacuum subprocess?
 autovacuum_naptime = 1200  # time between autovacuum runs, in secs

This makes PostgreSQL autovacuum run every 20 minutes. As we have two other databases (template1, postgres) on that server, it means that every hour the dbmail database is optimized.

CLUSTER

Since we implemented the CLUSTER commands from above, we also want to have their advantage. Once a week, before backup, we run this simple command now:

CLUSTER

This tells PostgreSQL to optimize all tables where a CLUSTER was defined before. It takes about 15 minutes to run per 5GB e-mails.

dbmail-util

Of course, you must not forget to use dbmail-util, as this is the command that drops old mails. We run this daily:

dbmail-util -ubdsy

and once a week before the CLUSTER, but after the backup commands:

dbmail-util -tpubdsy ; dbmail-util -py

Drop messages from deleted mailbox immediately

Originally, dbmail doesn't delete messages from a mailbox when that mailbox gets deleted. Instead, you must run “dbmail-util -ay” to cleanup old messages. That can be a lot of occupied space, especially if you got many users. In PostgreSQL, it's simple to create a RULE to do the magic automatically:

CREATE RULE drop_messages_with_mailbox AS ON DELETE TO dbmail_messages DO
  DELETE FROM dbmail_physmessage WHERE id = OLD.physmessage_id AND id NOT IN (SELECT physmessage_id from dbmail_messages);

Clean up headervalues

dbmail_headervalues grows rapidly and quickly. You can remove entries which will not be IMAP SEARCHed from that table. This is a tip from Paul J Stevens on http://blog.dbmail.eu/:

CREATE VIEW header_count AS 
  SELECT count(1) AS count, n.id, n.headername 
  FROM dbmail_headervalue v 
  LEFT JOIN dbmail_headername n ON v.headername_id=n.id 
  GROUP BY n.id, n.headername;

After that you can do:

SELECT * FROM header_count ORDER BY count; 

and delete all headernames from dbmail_headername for those headers you deem unlikely to ever be used in IMAP search.

This is a cleanup for more than half of our headervalue records:

DELETE FROM dbmail_headername WHERE headername = 'received' 
or headername = 'x-virus-scanned' or headername like 'x-spam%';
 
performance.txt · Last modified: 2012/09/07 11:55 by zmi
 
DBMail is developed by Paul J Stevens together with developers world-wide