I'm going to start this page by just pasting in three emails between Geo Carncross, Kevin Baker and Aaron Stone. These three seem to hash out the core of the mechanism. We should clean up this page and make it into an implementation document.
On Tue, 2005-05-31 at 17:15 +0000, Aaron Stone wrote:
I've added a page to the wiki with your last three emails. I think we're
at the point where we should hammer out a plan for the implementation.
There's just one piece left to properly refute, and I think that we need
an actual analysis (Geo, the one you provided regarding UIDVALIDITY,
inodes and file offsets would be ideal).
I'm not completely sure what you want here.
`` Originally, UW/IMAP could make the optimization of using the UID as the binary offset within the mbox file and UIDVALIDITY as the inode number. Even if it was never actually implemented that way, because it can be represented that way, it's the same kind of problem.
That means any solution to one problem will work for the other.
So, think of it this way: can you build a file system that distributes across multiple machines and, IF A MACHINE IS DOWN, still guarantee that they increase the length of the file at the same time? ''
Is this what you were looking for?
Are you looking for a rigorous proof that the problems are identical? Or that they are generally unsolvable (without some kind of arbiter)?
Because on that page it is currently stated:
Contrary to earlier thoughts this would be consistent with the RFC. This
is due to the fact that the RFC does not require 'increment-only' id's as
defined in the current DBMail implementation. The unique_id's detailed in
the 'RFC Overview' below can be GUIDs; they must be unique and ascending,
but do not have to be sequential.
I don't think I said sequential, but I did say “strictly ascending”. Note that even in my similar-problem example, I point out that UID numbers could be file offsets in an mbox file; this doesn't suggest that they be sequential, does it?
In any event, since the UID numbers will be generated from a token ring, they might as well be sequential.
That said, dbmail has its OWN NEED to maintain a certain amount of referential integrity, and I think it's perfectly sane and wise to use GUIDs for that.
However. Users need to see UIDs, which means at some level, a map is required. One might as well use UID numbers as that reference, but if tokens are going to be lost frequently, this would mean more overhead.
On Mon, 2005-05-30 at 20:11 +0000, Aaron Stone wrote:
DBMail 2.0.x currently inserts a message with ownership set to an internal
user, then performs the delivery resolution and uid generation, and copies
the message over to any user scheduled to receive it.
In a clustered setting, we add one more part to the behind-the-scenes
phase: getting a token from the cluster. Once the machine has the token,
it can generate uid's for all of the messages it has queued, pass off the
token to the next machine, and then deliver all of the messages.
Kind of, yeah.
Each machine would thus queue incoming messages which were otherwise ready
for delivery, wait for the token, then deliver them.
Sort-of. I don't suggest dbmail implement the queue itself, but rather write each message to the DB with an activation-state field (the states are spelled out below).
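The queue-then-stamp flow above can be sketched roughly like this. Everything here is illustrative, not DBMail code: the names Message, Node and on_token are assumptions, and a plain list stands in for the DB rows with their activation-state field.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Message:
    body: str
    uid: Optional[int] = None  # no valid UID until we hold the token
    state: int = 0             # 0 = queued/invisible, 1 = UID assigned

@dataclass
class Node:
    queue: list = field(default_factory=list)

    def deliver(self, body):
        # Messages are written immediately but stay invisible (state=0)
        # until a token pass assigns them a UID.
        self.queue.append(Message(body))

    def on_token(self, next_uid):
        # Holding the token: stamp UIDs for everything queued, then the
        # caller passes the token (carrying the advanced counter) onward.
        for msg in self.queue:
            if msg.state == 0:
                msg.uid = next_uid
                msg.state = 1
                next_uid += 1
        return next_uid  # the new counter travels with the token

node = Node()
node.deliver("hello")
node.deliver("world")
next_uid = node.on_token(100)  # token arrives carrying counter 100
```

Because UIDs are only ever stamped while a node holds the token, two nodes can never hand out the same number, which is what makes the lockless multimaster replication safe.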
Database replication would be handled by the database, and would be
lockless multimaster because we have already done the work of guaranteeing
that uid's won't conflict.
As you mention, however, the APPEND command would also have to be
rewritten to wait for a token. Is that a problem from a user interactivity
perspective? I figure it might sometimes take more than a few seconds to
receive a token. Will this cause a problem if the OK response is delayed
until a token is received?
The client can get an OK; the message just won't be visible yet. RFC2060 doesn't say that an APPENDed message has to be visible to a subsequent SEARCH/FETCH immediately :)
However, I suspect clients are foolish here, so it might be wise to set a flag on APPEND, and if a SEARCH occurs on that mailbox while any message is at state < 2 (and an APPEND was done recently!), block until nothing in the box is below state 2.
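That blocking behaviour could look something like the sketch below. The function name, the polling approach and the timeout are all assumptions; a real server would wake on token arrival rather than poll.

```python
import time

def wait_for_visibility(mailbox, timeout=10.0, poll=0.1):
    # Hold a SEARCH until recently APPENDed messages in this mailbox
    # have been validated (state == 2) by two token passes, or until
    # the timeout expires (better to answer late than hang forever).
    deadline = time.monotonic() + timeout
    while any(m.state < 2 for m in mailbox):
        if time.monotonic() >= deadline:
            break
        time.sleep(poll)
    # Only proven-valid messages are visible to the client.
    return [m for m in mailbox if m.state == 2]
```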
Also, for the token algorithm, is there a mechanism to ask for the token?
In order to prevent token-storming, each machine holds the token until it
either 1) processes a queue of messages or 2) waits for a timeout. If you
don't have a timeout, then you have tokens flying about eating up all
bandwidth and doing nothing. But if there's a machine on the other side of
the cluster that has messages queued, and the token is several seconds away,
perhaps the token should be requested with a packet that flies full speed
backwards, telling each machine that it should pass the token forwards ASAP.
Right?
That's why the token-passing algorithm has a regenerate-token operation built in. You can simply attempt to regenerate the token every time you “want” the token. If a machine still holds the token, it will stop the regeneration :)
Now that timeout should only be a couple (max: 5) seconds.
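A toy model of the regenerate-whenever-you-want-it rule, under stated assumptions: the class and method names are made up, and a boolean stands in for the real liveness check that would happen over the network.

```python
class Ring:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.holder = self.nodes[0]  # token starts at the first node

    def request_token(self, node_id, holder_alive):
        # Wanting the token == attempting to regenerate it. A live
        # holder vetoes the attempt; a dead holder can't, so the
        # token is simply recreated at the requesting node.
        if holder_alive:
            return False  # vetoed: wait for the normal token pass
        self.holder = node_id
        return True
```

Note that no separate "request" message type exists: the regeneration attempt is the request, which is why it also doubles as the failure detector.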
On Mon, 2005-05-30 at 13:15 -0700, Kevin Baker wrote:
It might be good to make the machine with the token hold onto it until
another machine requests it. This way each machine on the cluster will
broadcast token requests when it has messages queued for delivery. The
machine with the token can just hand it off first come, first served, or
give it to the machine that has been without the token longest, to balance
the token load a bit. This should prevent the token from zipping around in
circles for no reason.
What happens if the machine with the token dies?
No; the token pass acts as the heartbeat and the global lock at the same time. Adding messages to ask for the token is unnecessary: if a machine thinks it's been long enough, it doesn't need to coordinate anything, it just attempts to regenerate the token. If a machine legitimately has the token, it can stop the regeneration; if that machine is dead, the regeneration will go through.
Broadcasts might be possible except for the requirement that nodes may be very distant from each other. Besides, broadcasts don't guarantee delivery either.
Under the same delivery concept it might be good to have a
time-to-wait threshold for the token so that no one
machine can hold onto it for too long.
No. Defeating cheaters is not possible with a global lock. Adding a method to take the token away from a machine without the regeneration step is really bad. It's bad for the same reason that pthread_pause() doesn't exist: forcibly yanking a lock out from under a holder that believes it still has it leaves whatever it was doing half-done.
I want to clarify the states here:
1. write the message - state=0
at state==0, a message doesn't have a valid UID.
2. wait for token - generate uid, state → 1
at state==1, a message might have a valid UID, but it isn't proven to be valid- that is, it's still not allowed to be seen by clients.
3. wait for token to prove validity, state → 2 (now visible)
This way, delivery always writes the message with a junk UID and state=0;
when the host holds the token it moves the message to state=1, and when it sees that same token a second time it moves it to state=2.
You can encode the tokens seen in the database so you just say:
UPDATE db_tokens SET state=2 WHERE state=1 AND token=? AND host=?;
UPDATE db_tokens SET uid=?, token=?, state=1 WHERE state < 2 AND host=?;
whenever you see a token.
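Those two statements can be exercised end-to-end. Here's a sketch using SQLite: the db_tokens table follows the example above, but the function name, column types and surrounding plumbing are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE db_tokens (uid INTEGER, token TEXT, state INTEGER, host TEXT)")

def on_token_seen(db, token, host, next_uid):
    # Second sighting of this token: UIDs stamped on the previous pass
    # are now proven valid, so promote them to visible (state=2).
    db.execute("UPDATE db_tokens SET state=2 WHERE state=1 AND token=? AND host=?",
               (token, host))
    # First sighting: stamp a UID and record which token issued it.
    db.execute("UPDATE db_tokens SET uid=?, token=?, state=1 WHERE state < 2 AND host=?",
               (next_uid, token, host))
    db.commit()

# A freshly delivered message starts with a junk UID at state=0.
db.execute("INSERT INTO db_tokens VALUES (0, '', 0, 'hostA')")
on_token_seen(db, "tokA", "hostA", 101)  # first pass:  state 0 -> 1
on_token_seen(db, "tokA", "hostA", 102)  # second pass: state 1 -> 2
```

The ordering matters: the promotion to state=2 runs first so that the rows it confirms are excluded (by state < 2) from being re-stamped in the same pass.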
I might not have specified adequately what the token actually contains:
the junk-string (#1) is generated with the token and remains the same as long as the token is valid. It's what my example db_tokens table refers to as “token”
UID number is the 31-bit unsigned IMAP UID number.
host is the local host's identifier. Use whatever you like, just so long as two hosts will NEVER use the same value.
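Put together, the token's payload might look like this. The class name and representation are assumptions (DBMail itself is C); only the three fields come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Token:
    junk: str  # random string generated with the token; constant for the
               # token's whole lifetime (the "token" column in db_tokens)
    uid: int   # 31-bit unsigned IMAP UID counter
    host: str  # originating host's identifier; must never collide

    def __post_init__(self):
        # IMAP UIDs are 31-bit unsigned values, so enforce the range.
        assert 0 <= self.uid < 2**31, "IMAP UIDs are 31-bit unsigned"
```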
I hope this gives everyone a good idea on what exactly is going on.