
Notes on email message threading

I sent an email to Jamie Zawinski with feedback on his venerable email threading algorithm. Perhaps my commentary will be a useful reference to others implementing email threading.

You can see my implementation of his algorithm at https://git.lukeshu.com/www/tree/cmd/generate/mailstuff/thread_alg.go (and a use of it at https://git.lukeshu.com/www/tree/cmd/generate/mailstuff/thread.go).

To: Jamie Zawinski <jwz@jwz.org>
Subject: message threading
Date: Sat, 08 Jun 2024 22:34:41 -0600
Message-ID: <87tti2ybry.wl-lukeshu@lukeshu.com>

Hi,

I'm implementing message threading, and have been referencing both your document <https://www.jwz.org/doc/threading.html> and RFC 5256. I'm not sure whether you're interested in updating a document that's more than 25 years old, but if you are: I hope you find the following feedback valuable.

You write that the algorithm in RFC 5256 is merely a restating of your algorithm, but I noticed 3 (minor) differences:

  1. In your step 1.C, the RFC says to check whether this would create a loop, and if it would to skip creating the link; your version only says to perform this check in step 1.B.

  2. The RFC says to sort the messages by date between your steps 4 and 5; that is: when grouping by subject, containers in the root set should be processed in date order (you do not specify an order), and if a container in the root set is empty, then the subject should be taken from its earliest-dated child (you say to use an arbitrary child).

  3. The RFC precisely states how to trim a subject down to a "base subject," rather than simply saying "Strip ``Re:'', ``RE:'', ``RE[5]:'', ``Re: Re[4]: Re:'' and so on."
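
The loop check from item 1 can be sketched in Go roughly as follows. This is only an illustration, not the code from either document or from the implementation linked above: the Container type and the wouldLoop and setParent names are hypothetical. It walks up from the proposed parent and refuses the link if it reaches the child, which is one simple way to detect that a new link would close a cycle.

```go
package main

import "fmt"

// Container is a hypothetical node type for this sketch.
type Container struct {
	ID       string
	Parent   *Container
	Children []*Container
}

// wouldLoop reports whether making child a child of parent would create
// a cycle, that is, whether child is parent itself or one of its ancestors.
func wouldLoop(parent, child *Container) bool {
	for c := parent; c != nil; c = c.Parent {
		if c == child {
			return true
		}
	}
	return false
}

// setParent follows the step 1.C behaviour: any existing parent of child
// is replaced, unless the new link would create a loop, in which case
// nothing changes. It reports whether the link was made.
func setParent(child, parent *Container) bool {
	if wouldLoop(parent, child) {
		return false
	}
	if old := child.Parent; old != nil {
		// Unlink child from its old parent's child list.
		for i, c := range old.Children {
			if c == child {
				old.Children = append(old.Children[:i], old.Children[i+1:]...)
				break
			}
		}
	}
	child.Parent = parent
	parent.Children = append(parent.Children, child)
	return true
}

func main() {
	a := &Container{ID: "<a@example.com>"}
	b := &Container{ID: "<b@example.com>"}
	c := &Container{ID: "<c@example.com>"}
	fmt.Println(setParent(b, a)) // true: a -> b
	fmt.Println(setParent(c, b)) // true: a -> b -> c
	fmt.Println(setParent(a, c)) // false: would close the loop a -> b -> c -> a
}
```

Walking the Parent pointers is enough here because every container has at most one parent; a structure without parent pointers would need to search down the children instead.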

Additionally, there are two minor points on which I found the RFC's version to be clearer:

  1. The RFC specifies how to handle messages without a Message-Id or with a duplicate Message-Id (on page 9), as well as how to normalize a Message-Id (by referring to RFC 2822). This is perhaps out-of-scope of your algorithm document, but I feel that it would be worth mentioning in your background or definitions section.

  2. In your step 1.B, I did not understand what "If they are already linked, don't change the existing links" meant until I read the RFC, which words it as "If a message already has a parent, don't change the existing link." It was unclear to me what "they" was referring to in your version.
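
As a rough sketch of the Message-Id handling from item 1 of this list, in Go: this assumes the RFC's rule is that the first message keeps a given Message-Id and that any message with a missing or duplicate Message-Id is assigned a fresh unique one. The Message type, the assignIDs name, and the format of the synthetic IDs are all hypothetical, and strings.TrimSpace is only a stand-in for real RFC 2822 normalization, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical stand-in for a parsed message.
type Message struct {
	MessageID string // raw Message-Id header value; may be empty
	Subject   string
}

// assignIDs returns one threading ID per message, in input order.
// The first message to use a given Message-Id keeps it; a message with
// a missing or already-seen Message-Id gets a synthetic unique ID.
func assignIDs(msgs []Message) []string {
	seen := make(map[string]bool, len(msgs))
	ids := make([]string, len(msgs))
	for i, m := range msgs {
		// Simplification: trimming whitespace is NOT full normalization.
		id := strings.TrimSpace(m.MessageID)
		if id == "" || seen[id] {
			// Invent an ID, retrying until it collides with nothing seen.
			for n := 0; ; n++ {
				id = fmt.Sprintf("<synthetic-%d-%d@invalid>", i, n)
				if !seen[id] {
					break
				}
			}
		}
		seen[id] = true
		ids[i] = id
	}
	return ids
}

func main() {
	msgs := []Message{
		{MessageID: "<a@example.com>", Subject: "hello"},
		{MessageID: "", Subject: "no id"},
		{MessageID: "<a@example.com>", Subject: "duplicate id"},
	}
	for i, id := range assignIDs(msgs) {
		fmt.Printf("%q -> %s\n", msgs[i].Subject, id)
	}
}
```

Doing this before step 1 means the rest of the algorithm can treat every message as having exactly one unique ID.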

--
Happy hacking,
~ Luke T. Shumaker