Email store
Emails are core to DBMail, and make up the bulk of database storage.
Optimising storage of an email is a non trivial task, but the benefits are worth the additional complexity. This is the first of a two part article and focuses on storage of an email, the second companion piece is on caching for fast and efficient retrieval.
When DBMail receives an email, it's split up into its constituent parts and stored in the database. DBMail relies heavily on the gmime library to extract those parts for storage.
If SIEVE automation is enabled, then users can store automation scripts to filter and process incoming mail.
The following services receive emails:
An email user agent (MUA) such as Mozilla Thunderbird or K-9 Mail can also store emails using IMAP, for example when a draft email is saved or when an email is sent via SMTP the email is usually copied to a sent folder.
When DBMail is asked for a list of emails, an email, or one of its mimeparts, that information is sent to the MUA via one of the following services:
It's also possible to export an email into mbox format from a command terminal using dbmail-export.
Where is my email? You'll find dbmail_mimeparts contains almost everything, but see dbmail_header and the next section on caching on how best to find what you're looking for.
Database structure
Users have one or more folders and emails are associated with folders. As an email is deconstructed into its constituent parts for storage, the database reflects these aspects in its schema.
All emails are included in the dbmail_messages table. This table links to a user's mailbox (dbmail_mailboxes) and the actual message (dbmail_physmessage). The message is divided into parts (dbmail_mimeparts) and those parts are itemised (dbmail_partlists), linking back to the message (dbmail_physmessage).
Table.Column | Description |
---|---|
dbmail_mailboxes.mailbox_idnr | Primary key |
dbmail_mailboxes.owner_idnr | Owner of the mailbox |
dbmail_mailboxes.name | Mailbox name |
dbmail_mailboxes.*flag | Various flags |
dbmail_mailboxes.permission | Mailbox permission |
dbmail_mailboxes.seq | Sequence used for Quick Mailbox Resynchronization |
dbmail_messages.message_idnr | Primary key |
dbmail_messages.mailbox_idnr | Mailbox |
dbmail_messages.physmessage_id | Reference to the message |
dbmail_messages.*flag | Various flags |
dbmail_messages.unique_id | Internally generated unique id |
dbmail_messages.status | 0 (new), 1 (seen), 2 (deleted) or 3 (purge) |
dbmail_messages.seq | Sequence used for Quick Flag Changes Resynchronization |
dbmail_partlists.physmessage_id | Reference to the message |
dbmail_partlists.is_header | Flag to identify mail headers |
dbmail_partlists.part_key | Used to reconstruct the message |
dbmail_partlists.part_depth | Used to reconstruct the message |
dbmail_partlists.part_order | Used to reconstruct the message |
dbmail_partlists.part_id | Reference to the part in dbmail_mimeparts |
dbmail_mimeparts.id | Primary key |
dbmail_mimeparts.hash | Cryptographic hash |
dbmail_mimeparts.data | The contents of the part |
dbmail_mimeparts.size | Size of this part |
It's straightforward to reconstruct an email using one of the following queries to select the data in the right order:
PostgreSQL
SELECT list.part_key, list.part_depth, list.part_order, list.is_header, ph.internal_date, convert_from(parts.data, 'utf-8') AS data FROM dbmail_mimeparts AS parts INNER JOIN dbmail_partlists AS list ON parts.id = list.part_id INNER JOIN dbmail_physmessage ph ON ph.id = list.physmessage_id WHERE list.physmessage_id = ? ORDER BY list.part_key, list.part_order ASC, list.part_depth DESC
Sqlite, MySQL
SELECT list.part_key, list.part_depth, list.part_order, list.is_header, ph.internal_date, parts.data AS data FROM dbmail_mimeparts AS parts INNER JOIN dbmail_partlists AS list ON parts.id = list.part_id INNER JOIN dbmail_physmessage ph ON ph.id = list.physmessage_id WHERE list.physmessage_id = ? ORDER BY list.part_key, list.part_order ASC, list.part_depth DESC
Simple listing of a user's emails
Note that the dbmail status is similar but not the same as the flags.
SELECT m.message_idnr, m.seen_flag AS seen, m.answered_flag AS answered, m.deleted_flag AS deleted, m.flagged_flag AS flagged, m.recent_flag AS recent, m.draft_flag AS draft, CASE m.status WHEN '0' THEN 'New' WHEN '1' THEN 'Seen' WHEN '2' THEN 'Deleted' WHEN '3' THEN 'Purge' ELSE 'Unknown' END AS status, ( SELECT hv.headervalue FROM dbmail_header AS hd INNER JOIN dbmail_headervalue AS hv ON hd.headervalue_id = hv.id INNER JOIN dbmail_headername AS hnd ON hd.headername_id = hnd.id AND hnd.headername = 'date' WHERE hd.physmessage_id = m.physmessage_id ) AS email_date, ( SELECT hv.headervalue FROM dbmail_header AS hs INNER JOIN dbmail_headervalue AS hv ON hs.headervalue_id = hv.id INNER JOIN dbmail_headername AS hns ON hs.headername_id = hns.id AND hns.headername = 'subject' WHERE hs.physmessage_id = m.physmessage_id ) AS subject FROM dbmail_messages AS m INNER JOIN dbmail_mailboxes AS b ON m.mailbox_idnr = b.mailbox_idnr INNER JOIN dbmail_users AS u ON u.user_idnr = b.owner_idnr
WHERE u.userid = 'User''s login id' AND b.name = 'INBOX' LIMIT 500;
For the curious the query plan confirms it's fast:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=29.35..20376.16 rows=500 width=116) (actual time=2.981..16.763 rows=500 loops=1) Buffers: shared hit=8151 -> Nested Loop (cost=29.35..20945.87 rows=514 width=116) (actual time=2.980..16.694 rows=500 loops=1) Buffers: shared hit=8151 -> Nested Loop (cost=0.28..12.29 rows=1 width=8) (actual time=0.050..0.051 rows=1 loops=1) Buffers: shared hit=5 -> Seq Scan on dbmail_users u (cost=0.00..3.30 rows=1 width=8) (actual time=0.026..0.026 rows=1 loops=1) Filter: ((userid)::text = 'alan'::text) Rows Removed by Filter: 22 Buffers: shared hit=2 -> Index Scan using dbmail_mailboxes_owner_name_idx on dbmail_mailboxes b (cost=0.28..8.29 rows=1 width=16) (actual time=0.021..0.021 rows=1 loops=1) Index Cond: ((owner_idnr = u.user_idnr) AND ((name)::text = 'INBOX'::text)) Buffers: shared hit=3 -> Bitmap Heap Scan on dbmail_messages m (cost=29.08..1206.74 rows=1750 width=38) (actual time=2.879..3.078 rows=500 loops=1) Recheck Cond: (mailbox_idnr = b.mailbox_idnr) Heap Blocks: exact=64 Buffers: shared hit=132 -> Bitmap Index Scan on dbmail_messages_7 (cost=0.00..28.64 rows=1750 width=0) (actual time=2.119..2.119 rows=20494 loops=1) Index Cond: (mailbox_idnr = b.mailbox_idnr) Buffers: shared hit=68 SubPlan 1 -> Nested Loop (cost=0.86..19.17 rows=1 width=162) (actual time=0.007..0.014 rows=1 loops=500) Buffers: shared hit=4007 -> Nested Loop (cost=0.43..10.72 rows=1 width=8) (actual time=0.004..0.010 rows=1 loops=500) Buffers: shared hit=2007 -> Seq Scan on dbmail_headername hnd (cost=0.00..2.26 rows=1 width=8) (actual time=0.001..0.007 rows=1 loops=500) Filter: ((headername)::text = 'date'::text) Rows Removed by Filter: 100 Buffers: shared hit=500 -> Index Only Scan using dbmail_header_pkey on dbmail_header hd (cost=0.43..8.45 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=500) Index Cond: ((physmessage_id = m.physmessage_id) AND (headername_id = hnd.id)) Heap Fetches: 6 Buffers: shared hit=1507 -> Index Scan using dbmail_headervalue_pkey on dbmail_headervalue hv (cost=0.43..8.45 rows=1 width=170) (actual time=0.002..0.002 rows=1 loops=500) Index Cond: (id = hd.headervalue_id) Buffers: shared hit=2000 SubPlan 2 -> Nested Loop (cost=0.86..19.17 rows=1 width=162) (actual time=0.006..0.012 rows=1 loops=500) Buffers: shared hit=4007 -> Nested Loop (cost=0.43..10.72 rows=1 width=8) (actual time=0.003..0.009 rows=1 loops=500) Buffers: shared hit=2007 -> Seq Scan on dbmail_headername hns (cost=0.00..2.26 rows=1 width=8) (actual time=0.002..0.007 rows=1 loops=500) Filter: ((headername)::text = 'subject'::text) Rows Removed by Filter: 100 Buffers: shared hit=500 -> Index Only Scan using dbmail_header_pkey on dbmail_header hs (cost=0.43..8.45 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=500) Index Cond: ((physmessage_id = m.physmessage_id) AND (headername_id = hns.id)) Heap Fetches: 6 Buffers: shared hit=1507 -> Index Scan using dbmail_headervalue_pkey on dbmail_headervalue hv_1 (cost=0.43..8.45 rows=1 width=170) (actual time=0.002..0.002 rows=1 loops=500) Index Cond: (id = hs.headervalue_id) Buffers: shared hit=2000 Planning: Buffers: shared hit=112 Planning Time: 2.323 ms Execution Time: 16.911 ms (56 rows)