Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)


Dovecot mailing list
I am wondering why sieve-filter is so slow compared to gnu sieve.

I run mpop (like getmail) to download from a pop3 server to a local
mbox file: ~/mail/email-incoming-unsorted

This step is very fast.

The next step, I throw the email-incoming-unsorted mbox file at a
sieve processor, to sort the emails from that mbox, into other
mboxes, according to the sieve rules file.

Up until a couple days ago I was using Gnu sieve.

Gnu sieve balks on emails which have no x-message-id (?? something
like this) header field, so after a few years, I finally decided to
switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.

Using Gnu sieve, this mbox sorting step was even faster than mpop (/
getmail) - and mpop and getmail are really fast (compared with
fetchmail), since they pipeline the email downloads.

Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds
at most. Super fast.

Using sieve-filter, all emails are being processed - including those
without "message id header". This is good.

But sieve-filter is also much slower - slower than the download step
by an order of magnitude or two.

See below for details, any ideas appreciated.

To add to the below, I added:

mbox_very_dirty_syncs = yes

to the sieve-filter config, which slightly improves performance, but
not by much (in comparison with Gnu sieve).

TIA,



----- Forwarded message from Zenaan Harkness <[hidden email]> -----

From: Zenaan Harkness <[hidden email]>
To: [hidden email]
Date: Thu, 12 Sep 2019 08:06:12 +1000
Subject: Re: Gnu sieve vs Dovecot sieve-filter - sieve-filter extremely slow at lda (writing emails to local mbox files)

On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:

> Why is Gnu sieve so extremely fast at batch processing an mbox file,
> while Dovecot's sieve-filter is an order of magnitude slower?
>
> Sequence:
>
>  - mpop or getmail to pipeline download emails into temp mbox file
>  - filter that file
>
> Gnu sieve just flies through a local mbox file, saving emails to
> other local mbox files.
>
> Gnu sieve rejects too many emails with "malformed" errors, so after a
> few years I bit the bullet and upgraded to Dovecot's sieve-filter.
>
> Dovecot's sieve-filter, at present, is an order of magnitude slower.
>
> Here's my filter command (one line):
>
> /usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted
>
> The sieve script is fine now that I have the correct "require"
> clauses (hint: "capability strings").
>
> File ~/etc/email/sieve-dovecot-config.conf:
>
>   protocols = pop
>   lda_mailbox_autocreate = yes
>   lda_mailbox_autosubscribe = yes
>   mail_fsync = never
>
> There's no re-sending of emails into my local Postfix SMTP server - I
> checked the system logs and confirmed this (journalctl -f).
>
> I suspect that Gnu sieve was directly writing each email to the
> appropriate sieve-determined mbox file (perhaps with only a sync at
> the end of a single batch process - what I've attempted to achieve
> above with sieve-filter), and that sieve-filter is instead passing
> each email through some (dovecot) lda?
>
> Here's the output for a sieve-filter batch processing of 11 emails:
>
> $ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions /home/zen/etc/email/sieve.rc email-incoming-unsorted
> # PS0 Timestamp: 20190912@07:02:23
> info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'.
> info: msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=[hidden email]>: stored mail into mailbox 'l/cp/cp'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
> info: msgid=<[hidden email]>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<[hidden email]>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r...'.
> info: msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=[hidden email]>: stored mail into mailbox 'l/gl/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'.
> info: msgid=<[hidden email]>: stored mail into mailbox 'l/deb/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<[hidden email]>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<[hidden email]>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<[hidden email]>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re: [asterisk-users] Playing MP3's in...'.
> info: msgid=<[hidden email]>: stored mail into mailbox 'l/as/users'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re: [Hyperledger Fabric] a primitive ...'.
> info: msgid=<[hidden email]>: stored mail into mailbox 'l/hl/fabric'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<[hidden email]>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> 2 ▶︎️ zen@eye 20190912@07:02:30 ~ $
>
>
> So about 3/4 of a second is spent by dovecot's sieve-filter, on each
> email that it processes - watching it is painful given how fast Gnu
> sieve has been for the last few years - it's almost (but not quite)
> as slow as my previous fetchmail email download per-email time.
>
> Attached is a -D debug run of sieve-filter on 20 emails - slightly
> longer than the above, and took roughly 15 seconds to run.
>
> Any help appreciated...


On another test run of ~600 emails, sieve-filter is consistently
running ~100% of one CPU (for about 4 minutes) to process these
emails, which suggests that, despite what looks like it should be a
batch process, sieve-filter is perhaps reloading the
rules for every single email that it processes, even though I gave it
a whole mbox, and not a single email, to process.

Can sieve-filter work the way it should / the way I want it / batch
process a whole mbox - without reloading the sieve rules for every
email?
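
To put the figures from this thread side by side, here is a minimal shell sketch that converts a total run time and a message count into a per-message cost. The function name per_message_ms is made up for illustration; the inputs are the rough numbers reported above (about 15 seconds for 20 emails, about 4 minutes for ~600 emails).

```shell
#!/bin/sh
# Hypothetical helper (not part of Dovecot or Gnu sieve): average cost per
# message in milliseconds, given total seconds and number of messages.
per_message_ms() {
    total_seconds=$1
    count=$2
    # Integer arithmetic only; the result is truncated to whole milliseconds.
    echo $(( total_seconds * 1000 / count ))
}

per_message_ms 15 20     # the -D debug run: ~15 s for 20 emails
per_message_ms 240 600   # the larger run: ~4 min for ~600 emails
```

Both runs work out to several hundred milliseconds per message, which is the figure to compare against a Gnu sieve run over the same mbox.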

----- End forwarded message -----

Dovecot mailing list
Don't use mbox.

It is a very slow format when mails need to be deleted from the middle: basically the whole mbox file is rewritten each time.

Use sdbox instead.
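
As a rough illustration of why this hurts, here is a small shell sketch of the worst case. It assumes, without having measured Dovecot itself, that every expunge rewrites all messages stored after the removed one; the function name mbox_rewrite_bytes and the 14000-byte average (close to the sizes in the sieve-filter log earlier in the thread) are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical worst-case model, not a measurement of Dovecot: assume that
# expunging a message from an mbox rewrites every message stored after it.
# Moving n messages out of the source file one at a time, front to back,
# then rewrites (n-1) + (n-2) + ... + 0 messages in total.
mbox_rewrite_bytes() {
    n=$1          # number of messages in the source mbox
    avg_bytes=$2  # assumed average message size in bytes
    echo $(( avg_bytes * n * (n - 1) / 2 ))
}

mbox_rewrite_bytes 11 14000    # a small batch like the 11-mail run
mbox_rewrite_bytes 600 14000   # a batch like the ~600-mail run
```

Under that assumption the amount of data copied grows with the square of the batch size, while a format that keeps one file per message only has to unlink the file being removed.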

Sami



Dovecot mailing list
(I did subscribe to this mailing list, albeit with zen at
freedbms.net, so either way I'm getting all your emails - thank you
-so- much for replying...)

MUA is mutt, reading email in a terminal (sorry, forgot to mention this before).

For many years now my email folder (mbox files) collection has grown
to many GiB, mostly mailing lists.

If I am to change email storage format, it should be mutt compatible;
looking at https://wiki2.dovecot.org/MailboxFormat I see that only
DJB's Maildir is compatible with both Dovecot ("a reliable choice"
says the wiki), and mutt.

I can imagine that sdbox or mdbox could be made "mutt compatible" so
to speak, by running some sort of local IMAP server, and accessing my
email from mutt that way; this is undesirable to my mind because this
would require:

 1) a new learning curve wrt mutt and reading email on IMAP servers
 2) a new learning curve to set up a local IMAP server (securely)
 3) the inability to use mutt without a local IMAP server to read my local email

but such a setup would also have some quite desirable benefits:

 1) once set up, multiple MUAs could be used, and I'd have a beginning
grasp on setting up an IMAP server and front ends (this is something
on my bucket list, to assist my local church with)
 2) simpler remote "online" access to my local "offline" email store
(e.g. using my mobile phone when on the road) by setting up a webmail
server (much simpler (read "possible" to use on a mobile phone) than
using a vpn and mutt...), thus freeing me up from the behemoth web
email providers...

Next, I do not know how to "pipe the messages to the dovecot lda".
After downloading from my POP3 provider into a local mbox file (this
is my step 1), then I sort the emails (this is my step 2): the
following should be on a single line:

/usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted

As you can see from the above command, sieve-filter is given the name
of the mbox ("mail folder") to sort, as its very last argument on the
command line - so in this instance, sieve-filter really has no excuse,
and should not be re-reading the sieve rules script for each email
- now perhaps that's not happening; I only assumed so because of a CPU
core hitting 100% for a minute or two just to process a few hundred
emails...

What could also be happening (again, an assumption), is that
sieve-filter is written to assume dovecot index files to be in
existence.

I disabled those with the "INDEX=" clause you see in the command
above, which obviously has been given no value.

The reason I figured out how to disable the creation of the indexes in
the .imap directories, is that for my setup, Gnu sieve has proven that
I should not need such indexes - with mbox files, just append each
email to the end of the target "mailbox folder" mbox file, and we're
done! This literally should not cost 100% CPU, even for one
millisecond! But more importantly, because my working email folder is
~30GiB, without disabling this index creation step, sieve-filter
forced the creation of indexes, which "took so long I gave up and hit
CTRL-C, which did not work, so I kill -9'ed the sieve-filter and
whatever other process was not stopping".
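
The "just append" model described above can be sketched in a few lines of shell. This is an illustration only: mbox_append is a made-up name, the envelope sender is a placeholder, and there is no locking, so it should not be read as what Gnu sieve or Dovecot actually do internally.

```shell
#!/bin/sh
# Hypothetical helper: append one message file to an mbox file.
# No locking and no index updates; a sketch, not a replacement for an LDA.
mbox_append() {
    mbox_file=$1
    message_file=$2
    {
        # mbox separator line; "MAILER-DAEMON" is a placeholder sender.
        printf 'From MAILER-DAEMON %s\n' "$(LC_ALL=C date '+%a %b %e %H:%M:%S %Y')"
        # Quote any line starting with "From " so that it cannot be
        # mistaken for a separator line.
        sed 's/^From />From /' "$message_file"
        # Blank line terminating the message.
        printf '\n'
    } >> "$mbox_file"
}

# Usage with throwaway files: append the same message twice, then count
# the separator lines in the resulting mbox.
work_dir=$(mktemp -d)
printf 'Subject: test\n\nFrom here on, body text.\n' > "$work_dir/msg"
mbox_append "$work_dir/target.mbox" "$work_dir/msg"
mbox_append "$work_dir/target.mbox" "$work_dir/msg"
grep -c '^From ' "$work_dir/target.mbox"
rm -rf "$work_dir"
```

Appending only touches the end of the target file, so its cost does not depend on how large the mailbox already is.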

Last year someone on debian-user recommended I upgrade to using
Dovecot/Pigeonhole's sieve-filter (rather than Gnu sieve) due to the
issues with Gnu sieve.

I am starting to think that I should perhaps try to figure out if it's
possible to (re)process the emails Gnu sieve has a problem with, to
massage them into a shape that Gnu sieve accepts - then my immediate
problem would certainly be solved...

Thank you all again..
Zenaan

Dovecot mailing list
Oh, one last bit for now regarding piping:

Given my current sieve-filter command:

MLOC="mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions"
SCRIPT=~/etc/email/sieve.rc

sieve-filter -veWD -c $SIEVE_CONF -o $MLOC $SCRIPT emails-incoming

I can imagine trying to do a pipe as suggested, as follows:

cat ~/mail/emails-incoming | sieve-filter -veW -c $SIEVE_CONF -o $MLOC $SCRIPT

But, I see no suggestion in the sieve-filter man page that this would
work. ISTM that sieve-filter just is not designed to work in a local
mbox email environment.

Dovecot mailing list
On Sep 12, 2019, at 12:57 AM, Zenaan Harkness <[hidden email]> wrote:
> The next step, I throw the email-incoming-unsorted mbox file at a
> sieve processor, to sort the emails from that mbox, into other
> mboxes, according to the sieve rules file.

I would expect mbox is the worst possible format choice for this.

> Gnu sieve balks on emails which have no x-message-id (?? something
> like this) header field, so after a few years, I finally decided to
> switch "up" to Dovecot/Pigeonhole's "sieve-filter" command.
>
> Using Gnu sieve, this mbox sorting step was even faster than mpop (/
> getmail) - and mpop and getmail are really fast (compared with
> fetchmail), since they pipeline the email downloads.

Perhaps because of its reliance on the header allowing it to index?

> Even with 100s of emails, Gnu sieve would take only 10 to 20 seconds
> at most. Super fast.

That doesn’t sound fast. I processed a few thousand messages through sieve in less than 10 seconds, if I recall correctly.

> See below for details, any ideas appreciated.

The first thing I would do is download to Maildir and see what the difference is.



--
What we have here is a failure to communicate.