dovecot sieve duplicates detection

11 messages

dovecot sieve duplicates detection

André Rodier-2
Hello,

I have tested the sieve duplicate script with success so far, but I have
a question.

I would like to know if the "duplicate" sieve flag in Dovecot is global
to all folders, or specific to one folder only.

For instance, if I copy an email from one folder to another and I have
a discard action for duplicate emails, will this action (in this case,
discard) be applied or not?

If duplicate detection is global to all folders, is there a way to
restrict the search to one folder only?

Thanks for your help.
André

Re: dovecot sieve duplicates detection

Stephan Bosch-2


On 11-4-2018 at 23:58, André Rodier wrote:
> Hello,
>
> I have tested the sieve duplicate script with success so far, but I have
> a question.

Sieve duplicate script? You mean the Sieve duplicate extension (RFC 7352)?

> I would like to know if the "duplicate" sieve flag in Dovecot is global
> to all folders, or specific to one folder only.

It uses the lda-dupes file in the user's home directory. So, it is not
normally related to folders, although the identifier used for duplicate
matching could be composed of the mailbox name if you want.

> For instance, if I copy an email from one folder to another and I have
> a discard action for duplicate emails, will this action (in this case,
> discard) be applied or not?

Are you talking about IMAPSieve now? I am not sure "duplicate" is
currently even allowed in that context.

> If duplicate detection is global to all folders, is there a way to
> restrict the search to one folder only?

You can set the :uniqueid parameter accordingly.

Regards,

Stephan.
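To illustrate Stephan's point, here is a minimal sketch (not taken from the thread). Per RFC 7352, a bare "duplicate" test with neither ":header" nor ":uniqueid" uses the Message-ID header as the identifier, and Dovecot keeps the seen IDs per user in the lda-dupes file, so the check is not tied to any particular folder.

```sieve
require "duplicate";

# No :header or :uniqueid is given, so the test keys on the Message-ID
# header (the RFC 7352 default). Dovecot tracks seen IDs per user, so a
# Message-ID seen earlier matches regardless of which folder is involved.
if duplicate {
  discard;
  stop;
}
```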

Re: dovecot sieve duplicates detection

André Rodier-2
On 23/04/18 14:18, Stephan Bosch wrote:

> Sieve duplicate script? You mean the Sieve duplicate extension (RFC 7352)?
>
> Are you talking about IMAPSieve now? I am not sure "duplicate" is
> currently even allowed in that context.
>
> You can set the :uniqueid parameter accordingly.

Thank you, Stephan.

Yes, I meant the Sieve duplicate extension.

I am using a program to import email (mbsync), which uses the IMAP APPEND
command. Sometimes the import fails and I have to restart the program.
Unfortunately, the same emails are then imported again.

I found a fix by using a Dovecot IMAPSieve script executed on APPEND
(https://wiki.dovecot.org/Pigeonhole/Sieve/Plugins/IMAPSieve). I wrote a
custom Sieve script that discards the messages detected as duplicates.
It worked very well, and emails were no longer imported twice.

However, there was a huge side effect: archiving an email with
Thunderbird no longer works, and the email is even lost! As far as I
can tell, this is what happens:

1. When archiving an email with Thunderbird, it is first copied (APPEND)
into the archive folder, but it is not yet expunged from the original folder.
2. The Sieve script detects the email as a duplicate and discards it.
3. When the original folder is expunged, the source email is lost...

My conclusion was that duplicate detection is global to all folders.

If I could restrict the detection of duplicates in the current folder
only, this would let me run the import program again without error.

Kind regards,
André.

Re: dovecot sieve duplicates detection

Stephan Bosch-2


On 23/04/2018 at 22:03, André Rodier wrote:

> If I could restrict the detection of duplicates in the current folder
> only, this would let me run the import program again without error.

Specify the ID used for duplicate checking explicitly using the
:uniqueid argument (https://tools.ietf.org/html/rfc7352#section-3.1).
Using the variables extension, compose the uniqueid from the message-id
and the mailbox name.

Regards,

Stephan.
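A minimal sketch of this suggestion, assuming the script runs under Dovecot's IMAPSieve plugin on APPEND, that the mailbox name is read from the "imap.mailbox" environment item (RFC 6785), and that "duplicate" is permitted in that context (Stephan was not sure of this earlier in the thread, though André reports it worked). The variable names "mailbox" and "msgid" are arbitrary. Because the key includes the mailbox name, the same message appended to another folder, such as Thunderbird's archive folder, should no longer be treated as a duplicate.

```sieve
require ["duplicate", "variables", "environment", "imapsieve"];

# Capture the name of the mailbox the message is being stored in.
if environment :matches "imap.mailbox" "*" {
  set "mailbox" "${1}";
}

# Only deduplicate messages that have a Message-ID header; otherwise all
# messages lacking one would share the same key for this mailbox.
if header :matches "message-id" "*" {
  set "msgid" "${1}";

  # The key combines mailbox name and Message-ID, so the same message
  # stored in a different folder does not count as a duplicate.
  if duplicate :uniqueid "${mailbox} ${msgid}" {
    discard;
    stop;
  }
}
```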


Re: dovecot sieve duplicates detection

André Rodier-2
On 25/04/18 20:20, Stephan Bosch wrote:

> Specify the ID used for duplicate checking explicitly using the
> :uniqueid argument (https://tools.ietf.org/html/rfc7352#section-3.1).
> Using the variables extension, compose the uniqueid from the message-id
> and the mailbox name.

Thank you, I will try this.

André

Re: dovecot sieve duplicates detection

James Cassell

On Wed, Apr 25, 2018, at 3:20 PM, Stephan Bosch wrote:

> Specify the ID used for duplicate checking explicitly using the
> :uniqueid argument (https://tools.ietf.org/html/rfc7352#section-3.1).
> Using the variables extension, compose the uniqueid from the message-id
> and the mailbox name.

In my experience with dovecot's implementation, you can set the ID only once in a script.  If you try to filter duplicates based on multiple IDs, only the first (or last, I don't remember) takes effect.

V/r,
James Cassell

Re: dovecot sieve duplicates detection

Stephan Bosch-2


On 25/04/2018 at 22:49, James Cassell wrote:
> On Wed, Apr 25, 2018, at 3:20 PM, Stephan Bosch wrote:
>>
>> Specify the ID used for duplicate checking explicitly using the
>> :uniqueid argument (https://tools.ietf.org/html/rfc7352#section-3.1).
>> Using the variables extension, compose the uniqueid from the message-id
>> and the mailbox name.
>>
> In my experience with dovecot's implementation, you can set the ID only once in a script.  If you try to filter duplicates based on multiple IDs, only the first (or last, I don't remember) takes effect.
>

Do you have a detailed example of the supposed wrong behavior?

Regards,

Stephan.

Re: dovecot sieve duplicates detection

James Cassell


On Mon, May 14, 2018, at 4:52 PM, Stephan Bosch wrote:

> Do you have a detailed example of the supposed wrong behavior?

I don't have them readily available. Basically, the result of the first duplicate test in a script is taken as the result of any future duplicate test, even if the parameters to that future duplicate test in the same script are different and would otherwise result in a different output. The duplicate test is only evaluated once and its results are substituted everywhere.

For example, I might want to flag a message as a new conversation if I have not seen another message with the same subject. In the same script, I might want to discard messages that are exact duplicates (identical Message-ID, among other things). Dovecot's behavior would then be to discard every message whose subject matches that of a previously received message.


V/r,
James Cassell
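A sketch of the kind of script James describes, with two independent "duplicate" tests in one script. The handle names and the "$NewConversation" keyword are made up for illustration. The ":handle" argument (RFC 7352) keeps the two tests' tracking records apart. With the bug Stephan later confirms (fix released in 2.3.9), the second test reused the first test's result, so the discard would hit every message whose subject had been seen before.

```sieve
require ["duplicate", "imap4flags"];

# Test 1: flag the message as the start of a conversation when no earlier
# message carried the same Subject. Tracked under its own handle.
if not duplicate :handle "subject" :header "subject" {
  addflag "$NewConversation";
}

# Test 2: discard exact re-deliveries, keyed on Message-ID under a
# separate handle. Its result should not depend on test 1.
if duplicate :handle "message-id" :header "message-id" {
  discard;
  stop;
}
```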

Re: dovecot sieve duplicates detection

Stephan Bosch-2


On 14/05/2018 at 23:03, James Cassell wrote:

> I don't have them readily available. Basically, the result of the first duplicate test in a script is taken as the result of any future duplicate test, even if the parameters to that future duplicate test in the same script are different and would otherwise result in a different output. The duplicate test is only evaluated once and its results are substituted everywhere.
>
> For example, I might want to flag a message as a new conversation if I have not seen another message with the same subject. In the same script, I might want to discard messages that are exact duplicates (identical Message-ID, among other things). Dovecot's behavior would then be to discard every message whose subject matches that of a previously received message.

I finally managed to review this issue and I can confirm that this is a bug.

Regards,

Stephan.


Re: dovecot sieve duplicates detection

Dovecot mailing list


On 17/08/2018 09:14, Stephan Bosch wrote:

> I finally managed to review this issue and I can confirm that this is
> a bug.

Fix released in 2.3.9.

Regards,

Stephan.


Re: dovecot sieve duplicates detection

Dovecot mailing list

On Wed, Dec 4, 2019, at 1:14 PM, Stephan Bosch via dovecot wrote:

> Fix released in 2.3.9.

Awesome! Thanks for the followup!

V/r,
James Cassell