warning: NFS hangs with dovecot 2.3.8 on Debian buster


Dovecot mailing list
A warning to those considering an upgrade to Debian 10 (buster): we have seen occasional NFS hangs with Dovecot when using the stock Debian buster kernel (4.19.67-2+deb10u1).

When we downgrade to the Debian stretch kernel (4.9.189-3+deb9u1), the issue does not occur. Note that we *only* downgraded the kernel; the rest of the OS is still Debian buster, running Dovecot 2.3.8.

A little more info: we have a Dovecot cluster using mdbox for storage on an NFS server (NetApp Cmode version 9.6P2). We use a Dovecot director layer, so a user is always connected to the same back-end Dovecot server. The NFS hang occurs on the back-end server.

Once the process hangs, other processes trying to write to the same mailbox get an error like this:

Timeout (180s) while waiting for lock for transaction log file /var/mail/.../index/storage/dovecot.map.index.log (WRITE lock held by pid XXXX)
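
For anyone who wants to pull the lock-holding pid out of their logs, here is a minimal sketch. It assumes the message has exactly the shape quoted above; the function name parse_lock_timeout and the regular expression are illustrative only and are not part of Dovecot.

```python
import re

# Pattern modelled on the single log line quoted above (an assumption:
# other Dovecot versions may word the message differently).
LOCK_TIMEOUT_RE = re.compile(
    r"Timeout \((?P<seconds>\d+)s\) while waiting for lock for "
    r"transaction log file (?P<path>\S+) "
    r"\((?P<mode>\w+) lock held by pid (?P<pid>\d+)\)"
)


def parse_lock_timeout(line):
    """Return timeout, log path, lock mode and holder pid, or None if no match."""
    match = LOCK_TIMEOUT_RE.search(line)
    if match is None:
        return None
    return {
        "seconds": int(match.group("seconds")),
        "path": match.group("path"),
        "mode": match.group("mode"),
        "pid": int(match.group("pid")),
    }
```

Feeding each log line through parse_lock_timeout and collecting the "pid" values gives a list of candidate stuck processes to inspect.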

The stuck process itself does not appear to do anything: it sits in "D" (uninterruptible sleep) state, and "strace" shows nothing (after attaching, strace itself needs a kill -KILL to stop). The only way to unwedge the process seems to be a kill -KILL of the stuck process. Reading from the mailbox is still possible.
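
To spot processes in "D" state on a back-end, something along these lines may help. It is a rough sketch that reads the Linux /proc/<pid>/stat files directly; the helper names parse_stat and find_d_state are made up for illustration. Note that a process can be in "D" state briefly during normal I/O, so only one that stays there across repeated checks is a candidate for the hang described here.

```python
import os


def parse_stat(stat_text):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so locate the first '(' and the last ')'.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    stuck = []
    if not os.path.isdir(proc_root):
        return stuck  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling find_d_state() a few times, some seconds apart, and keeping only the pids that show up every time narrows the list to processes that are genuinely wedged.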

We are in the process of contacting the linux-nfs folks about this, but I'm trying to reproduce this on a test-cluster first, to be able to present a well-documented case. Since this hang doesn't happen immediately, but takes a few hours to a day to occur in the wild, or a few thousand writes to the same mailbox, it's a bit hard to debug.

--
Jan-Pieter Cornet <[hidden email]>
Systeembeheer XS4ALL Internet bv
www.xs4all.nl




Re: warning: NFS hangs with dovecot 2.3.8 on Debian buster

On 25-10-19 19:41, Jan-Pieter Cornet via dovecot wrote:
> We are in the process of contacting the linux-nfs folks about this, but I'm trying to reproduce this on a test-cluster first, to be able to present a well-documented case. Since this hang doesn't happen immediately, but takes a few hours to a day to occur in the wild, or a few thousand writes to the same mailbox, it's a bit hard to debug.

Just FTR, I finally sent mail to the linux-nfs list about this. See e.g. https://marc.info/?l=linux-nfs&m=157260601632323&w=2

No replies yet - if^H^Hwhen this gets resolved I'll report back to this list.

--
Jan-Pieter Cornet <[hidden email]>
Systeembeheer XS4ALL Internet bv
www.xs4all.nl


