Email store

Emails are core to DBMail, and make up the bulk of database storage.

Optimising storage of an email is a non trivial task, but the benefits are worth the additional complexity. This is the first of a two part article and focuses on storage of an email, the second companion piece is on caching for fast and efficient retrieval.

When DBMail receives an email, it's split up into its constituent parts and stored in the database. DBMail relies heavily on the gmime library to extract those parts for storage.

If SIEVE automation is enabled, then users can store automation scripts to filter and process incoming mail.

The following services receive emails:

An email user agent (MUA) such as Mozilla Thunderbird or K-9 Mail can also store emails using IMAP, for example when a draft email is saved or when an email is sent via SMTP the email is usually copied to a sent folder.

When DBMail is asked for a list of emails, an email, or one of its mimeparts, that information is sent to the MUA via one of the following services:

It's also possible to export an email into mbox format from a command terminal using dbmail-export.

Where is my email? You'll find dbmail_mimeparts contains almost everything, but see dbmail_header and the next section on caching on how best to find what you're looking for.

Database structure

DBMail Message ER diagram

Users have one or more folders and emails are associated with folders. As an email is deconstructed into its constituent parts for storage, the database reflects these aspects in its schema.

All emails are included in the dbmail_messages table. This table links to a user's mailbox (dbmail_mailboxes) and the actual message (dbmail_physmessage). The message is divided into parts (dbmail_mimeparts) and those parts are itemised (dbmail_partlists), linking back to the message (dbmail_physmessage).

Table.Column Description
dbmail_mailboxes.mailbox_idnr Primary key
dbmail_mailboxes.owner_idnr Owner of the mailbox
dbmail_mailboxes.name Mailbox name
dbmail_mailboxes.*flag Various flags
dbmail_mailboxes.permission Mailbox permission
dbmail_mailboxes.seq Sequence used for Quick Mailbox Resynchronization
   
dbmail_messages.message_idnr Primary key
dbmail_messages.mailbox_idnr Mailbox
dbmail_messages.physmessage_id Reference to the message
dbmail_messages.*flag Various flags
dbmail_messages.unique_id Internally generated unique id
dbmail_messages.status 0 (new), 1 (seen), 2 (deleted) or 3 (purge)
dbmail_messages.seq Sequence used for Quick Flag Changes Resynchronization
   
dbmail_partlists.physmessage_id Reference to the message
dbmail_partlists.is_header Flag to identify mail headers
dbmail_partlists.part_key Used to reconstruct the message
dbmail_partlists.part_depth Used to reconstruct the message
dbmail_partlists.part_order Used to reconstruct the message
dbmail_partlists.part_id Reference to the part in dbmail_mimeparts
   
dbmail_mimeparts.id Primary key
dbmail_mimeparts.hash Cryptographic hash
dbmail_mimeparts.data The contents of the part
dbmail_mimeparts.size Size of this part

It's straightforward to reconstruct an email using one of the following queries to select the data in the right order:

PostgreSQL

SELECT
  list.part_key,
  list.part_depth,
  list.part_order,
  list.is_header,
  ph.internal_date,
  convert_from(parts.data, 'utf-8') AS data
FROM dbmail_mimeparts AS parts
INNER JOIN dbmail_partlists AS list
ON parts.id = list.part_id
INNER JOIN dbmail_physmessage ph
ON ph.id = list.physmessage_id
WHERE list.physmessage_id = ?
ORDER BY list.part_key, list.part_order ASC, list.part_depth DESC

Sqlite, MySQL

SELECT
  list.part_key,
  list.part_depth,
  list.part_order,
  list.is_header,
  ph.internal_date,
  parts.data AS data
FROM dbmail_mimeparts AS parts
INNER JOIN dbmail_partlists AS list
ON parts.id = list.part_id
INNER JOIN dbmail_physmessage ph
ON ph.id = list.physmessage_id
WHERE list.physmessage_id = ?
ORDER BY list.part_key, list.part_order ASC, list.part_depth DESC

Simple listing of a user's emails

Note that the dbmail status is similar but not the same as the flags.

SELECT
	m.message_idnr,
	m.seen_flag AS seen,
	m.answered_flag AS answered,
	m.deleted_flag AS deleted,
	m.flagged_flag AS flagged,
	m.recent_flag AS recent,
	m.draft_flag AS draft,
	CASE m.status
		WHEN '0' THEN 'New'
		WHEN '1' THEN 'Seen'
		WHEN '2' THEN 'Deleted'
		WHEN '3' THEN 'Purge'
		ELSE 'Unknown'
	END AS status,
	(
		SELECT hv.headervalue
		FROM dbmail_header AS hd

		INNER JOIN dbmail_headervalue AS hv
		ON hd.headervalue_id = hv.id

		INNER JOIN dbmail_headername AS hnd
		ON hd.headername_id = hnd.id
		AND hnd.headername = 'date'

		WHERE hd.physmessage_id = m.physmessage_id
	) AS email_date,
	(
		SELECT hv.headervalue
		FROM dbmail_header AS hs

		INNER JOIN dbmail_headervalue AS hv
		ON hs.headervalue_id = hv.id

		INNER JOIN dbmail_headername AS hns
		ON hs.headername_id = hns.id
		AND hns.headername = 'subject'

		WHERE hs.physmessage_id = m.physmessage_id
	) AS subject

FROM dbmail_messages AS m
INNER JOIN dbmail_mailboxes AS b
ON m.mailbox_idnr = b.mailbox_idnr

INNER JOIN dbmail_users AS u
ON u.user_idnr = b.owner_idnr
WHERE u.userid = 'User''s login id' AND b.name = 'INBOX' LIMIT 500;

For the curious the query plan confirms it's fast:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=29.35..20376.16 rows=500 width=116) (actual time=2.981..16.763 rows=500 loops=1)
   Buffers: shared hit=8151
   ->  Nested Loop  (cost=29.35..20945.87 rows=514 width=116) (actual time=2.980..16.694 rows=500 loops=1)
         Buffers: shared hit=8151
         ->  Nested Loop  (cost=0.28..12.29 rows=1 width=8) (actual time=0.050..0.051 rows=1 loops=1)
               Buffers: shared hit=5
               ->  Seq Scan on dbmail_users u  (cost=0.00..3.30 rows=1 width=8) (actual time=0.026..0.026 rows=1 loops=1)
                     Filter: ((userid)::text = 'alan'::text)
                     Rows Removed by Filter: 22
                     Buffers: shared hit=2
               ->  Index Scan using dbmail_mailboxes_owner_name_idx on dbmail_mailboxes b  (cost=0.28..8.29 rows=1 width=16) (actual time=0.021..0.021 rows=1 loops=1)
                     Index Cond: ((owner_idnr = u.user_idnr) AND ((name)::text = 'INBOX'::text))
                     Buffers: shared hit=3
         ->  Bitmap Heap Scan on dbmail_messages m  (cost=29.08..1206.74 rows=1750 width=38) (actual time=2.879..3.078 rows=500 loops=1)
               Recheck Cond: (mailbox_idnr = b.mailbox_idnr)
               Heap Blocks: exact=64
               Buffers: shared hit=132
               ->  Bitmap Index Scan on dbmail_messages_7  (cost=0.00..28.64 rows=1750 width=0) (actual time=2.119..2.119 rows=20494 loops=1)
                     Index Cond: (mailbox_idnr = b.mailbox_idnr)
                     Buffers: shared hit=68
         SubPlan 1
           ->  Nested Loop  (cost=0.86..19.17 rows=1 width=162) (actual time=0.007..0.014 rows=1 loops=500)
                 Buffers: shared hit=4007
                 ->  Nested Loop  (cost=0.43..10.72 rows=1 width=8) (actual time=0.004..0.010 rows=1 loops=500)
                       Buffers: shared hit=2007
                       ->  Seq Scan on dbmail_headername hnd  (cost=0.00..2.26 rows=1 width=8) (actual time=0.001..0.007 rows=1 loops=500)
                             Filter: ((headername)::text = 'date'::text)
                             Rows Removed by Filter: 100
                             Buffers: shared hit=500
                       ->  Index Only Scan using dbmail_header_pkey on dbmail_header hd  (cost=0.43..8.45 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=500)
                             Index Cond: ((physmessage_id = m.physmessage_id) AND (headername_id = hnd.id))
                             Heap Fetches: 6
                             Buffers: shared hit=1507
                 ->  Index Scan using dbmail_headervalue_pkey on dbmail_headervalue hv  (cost=0.43..8.45 rows=1 width=170) (actual time=0.002..0.002 rows=1 loops=500)
                       Index Cond: (id = hd.headervalue_id)
                       Buffers: shared hit=2000
         SubPlan 2
           ->  Nested Loop  (cost=0.86..19.17 rows=1 width=162) (actual time=0.006..0.012 rows=1 loops=500)
                 Buffers: shared hit=4007
                 ->  Nested Loop  (cost=0.43..10.72 rows=1 width=8) (actual time=0.003..0.009 rows=1 loops=500)
                       Buffers: shared hit=2007
                       ->  Seq Scan on dbmail_headername hns  (cost=0.00..2.26 rows=1 width=8) (actual time=0.002..0.007 rows=1 loops=500)
                             Filter: ((headername)::text = 'subject'::text)
                             Rows Removed by Filter: 100
                             Buffers: shared hit=500
                       ->  Index Only Scan using dbmail_header_pkey on dbmail_header hs  (cost=0.43..8.45 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=500)
                             Index Cond: ((physmessage_id = m.physmessage_id) AND (headername_id = hns.id))
                             Heap Fetches: 6
                             Buffers: shared hit=1507
                 ->  Index Scan using dbmail_headervalue_pkey on dbmail_headervalue hv_1  (cost=0.43..8.45 rows=1 width=170) (actual time=0.002..0.002 rows=1 loops=500)
                       Index Cond: (id = hs.headervalue_id)
                       Buffers: shared hit=2000
 Planning:
   Buffers: shared hit=112
 Planning Time: 2.323 ms
 Execution Time: 16.911 ms
(56 rows)