Global FTS index?

Global FTS index?

Patrick Nagel-2
Hi,

I tried the FTS (and FTS Squat) plugin today, and it works as advertised.

But on the maildir I use for testing (13000 folders, 160000 mails), the
speed increase is not as big as one would wish (it still takes several
minutes to complete a search).

Is my assumption correct, that there is no way to do a search over a big IMAP
folder hierarchy in a reasonable amount of time, because each folder has to
be 'selected', and only one folder can be selected at once?

Patrick.

--
STAR Software (Shanghai) Co., Ltd.            http://www.star-group.net/
Phone:    +86 (21) 3462 7688 x 826             Fax:   +86 (21) 3462 7779

PGP key:         https://stshacom1.star-china.net/keys/patrick_nagel.asc
Fingerprint:           E09A D65E 855F B334 E5C3 5386 EF23 20FC E883 A005


Re: Search over big folder hierarchy (was: Global FTS index?)

Patrick Nagel-2
On Tuesday 24 June 2008, Patrick Nagel wrote:

> Hi,
>
> I tried the FTS (and FTS Squat) plugin today, and it works as advertised.
>
> But: On my 13000 folders with 160000 mails maildir I use for testing, the
> speed increase is not as big as one would wish (it still takes several
> minutes to complete a search).
>
> Is my assumption correct, that there is no way to do a search over a big
> IMAP folder hierarchy in a reasonable amount of time, because each folder
> has to be 'selected', and only one folder can be selected at once?
>
> Patrick.
I think this subject is more suitable for the mail's content. At first I
focussed on the FTS index, and changed the mail later on, after which I
forgot to change the subject...

Patrick.



Re: Search over big folder hierarchy (was: Global FTS index?)

Timo Sirainen
On Tue, 2008-06-24 at 15:29 +0800, Patrick Nagel wrote:

> On Tuesday 24 June 2008, Patrick Nagel wrote:
> > Hi,
> >
> > I tried the FTS (and FTS Squat) plugin today, and it works as advertised.
> >
> > But: On my 13000 folders with 160000 mails maildir I use for testing, the
> > speed increase is not as big as one would wish (it still takes several
> > minutes to complete a search).
> >
> > Is my assumption correct, that there is no way to do a search over a big
> > IMAP folder hierarchy in a reasonable amount of time, because each folder
> > has to be 'selected', and only one folder can be selected at once?
Yes, they have to be selected. There isn't any way currently in IMAP to
search from multiple mailboxes using a single command, so even if
Dovecot implemented a Squat index that indexed mails from all mailboxes,
you'd still have to implement a non-standard extension to use that.
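
To illustrate the per-mailbox constraint described above, here is a minimal
Python sketch of what a client has to do with plain IMAP: SELECT each mailbox
in turn and run the same SEARCH in it. It relies only on the standard
imaplib calls select() and search(). The function name search_all_mailboxes
is an assumption made for this example, and the mailbox names are passed in
by the caller so that the sketch does not have to parse LIST responses.

```python
import imaplib


def search_all_mailboxes(conn, mailboxes, *criteria):
    """Run the same SEARCH in every mailbox, one SELECT at a time.

    conn is an authenticated imaplib.IMAP4 (or compatible) object.
    Returns {mailbox name: [message sequence numbers as bytes]}.
    """
    hits = {}
    for name in mailboxes:
        # Only one mailbox can be selected per connection, so each
        # mailbox costs at least one SELECT plus one SEARCH round trip.
        # Naive quoting: names containing '"' are not handled here.
        status, _ = conn.select('"%s"' % name, readonly=True)
        if status != "OK":
            continue  # mailbox could not be selected, skip it
        status, data = conn.search(None, *criteria)
        if status == "OK" and data and data[0]:
            hits[name] = data[0].split()
    return hits
```

With 13000 folders this loop issues at least 26000 commands for one query,
which is the overhead a per-mailbox index alone cannot remove. A call might
look like search_all_mailboxes(conn, names, "TEXT", "dovecot"), where conn
and names come from your own login and folder listing code.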

Hmm. Or v1.2 has virtual mailboxes - you could create a single virtual
mailbox from all your other mailboxes and then search it. I think if
Squat is enabled it'll create a single index from all the mails. I'm not
sure if I want to leave it like that though..

I have also been thinking about making Squat indexes global for all
mailboxes. If done well it should reduce disk space as well as enable
fast multi-mailbox searches, but I'm a bit worried about memory usage
and other slowness when updating the index. The Squat building/updating
could use more work, but I haven't yet figured out a great solution for
it.


Re: Search over big folder hierarchy (was: Global FTS index?)

Patrick Nagel-2
On Tuesday 24 June 2008, Timo Sirainen wrote:
> Yes, they have to be selected. There isn't any way currently in IMAP to
> search from multiple mailboxes using a single command, so even if
> Dovecot implemented a Squat index that indexed mails from all mailboxes,
> you'd still have to implement a non-standard extension to use that.

I see. That's what I thought. :(

> Hmm. Or v1.2 has virtual mailboxes - you could create a single virtual
> mailbox from all your other mailboxes and then search it. I think if
> Squat is enabled it'll create a single index from all the mails. I'm not
> sure if I want to leave it like that though..

How about making it configurable? I'm sure there are scenarios where it's not
desirable to have an index for each virtual mailbox (which sounds like a very
cool concept, by the way) - but like in my case, it would be a great
workaround :)

> I have also been thinking about making Squat indexes global for all
> mailboxes. If done well it should reduce disk space as well as enable
> fast multi-mailbox searches, but I'm a bit worried about memory usage
> and other slowness when updating the index. The Squat building/updating
> could use more work, but I haven't yet figured out a great solution for
> it.

I'm not sure if it would reduce disk space usage... I'm thinking of the
following:

Now (fictitious; I don't know what dovecot.index.search really looks like):

mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder/dovecot.index.search:
INDEX UID
word 12345
ord 12345
rd 12345
d 12345
...

Then (of course also fictitious):

dovecot.global.index.search:
INDEX MAILBOX UID
word mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder 12345
ord mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder 12345
rd mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder 12345
d mailbox1.in.a.subfolder.of.a.subfolder.of.a.subfolder 12345
...

Of course this would be very compressible, but in uncompressed form it
would probably be much bigger than all the current dovecot.index.search files
together. This would create the need for mailbox UIDs, so that the path names
only need to be stored once in a map... or something along those lines.
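
The "store each path name once in a map" idea can be sketched in a few lines
of Python. This is a toy model only, not how Squat actually lays out its
data: the class GlobalIndex and its methods are hypothetical names, and the
suffix indexing simply mirrors the fictitious "word / ord / rd / d" listing
above.

```python
from collections import defaultdict


class GlobalIndex:
    """Toy global index: substring -> set of (mailbox id, message UID)."""

    def __init__(self):
        self.mailbox_ids = {}     # mailbox name -> small integer id
        self.mailbox_names = []   # id -> mailbox name (each stored once)
        self.postings = defaultdict(set)

    def _mailbox_id(self, name):
        # Assign the next free id the first time a mailbox is seen.
        if name not in self.mailbox_ids:
            self.mailbox_ids[name] = len(self.mailbox_names)
            self.mailbox_names.append(name)
        return self.mailbox_ids[name]

    def add(self, mailbox, uid, word):
        mid = self._mailbox_id(mailbox)
        # Index every suffix of the word under the compact (id, uid) pair.
        for i in range(len(word)):
            self.postings[word[i:]].add((mid, uid))

    def lookup(self, key):
        # Translate ids back to names only when returning results.
        return sorted((self.mailbox_names[mid], uid)
                      for mid, uid in self.postings.get(key, ()))
```

Each posting then carries a small integer instead of the full path, so the
long mailbox name appears once in mailbox_names no matter how many index
entries point at it.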

Anyway, I think improved (= faster) search capabilities are a huge plus for an
IMAP server, because the possibility to search in old mails is what makes
people keep their mails (available, on the server) in the first place...

Patrick.



Re: Global FTS index?

Jason Fesler
In reply to this post by Patrick Nagel-2
> Is my assumption correct, that there is no way to do a search over a big IMAP
> folder hierarchy in a reasonable amount of time, because each folder has to
> be 'selected', and only one folder can be selected at once?

This isn't 100% what you're looking for, but consider looking at
"mairix". It does full-text search of mbox and maildir; it is a
command-line tool. Give it a search query, and it populates a mail folder
(using hardlinks if possible) with the results.

I use it when I want to search my archives; the speed makes up for it not
being native to the imap client.

http://www.rc0.org.uk/mairix/


Re: Search over big folder hierarchy (was: Global FTS index?)

Asheesh Laroia
In reply to this post by Timo Sirainen
On Tue, 24 Jun 2008, Timo Sirainen wrote:

> Hmm. Or v1.2 has virtual mailboxes - you could create a single virtual
> mailbox from all your other mailboxes and then search it. I think if
> Squat is enabled it'll create a single index from all the mails. I'm not
> sure if I want to leave it like that though..

I hope that the index is shared - that you "index the index" by inode
number, not filename or message UID in a mailbox, since that way you can
avoid duplicate storage of index data between virtual mailboxes and normal
ones.
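
As a rough illustration of keying index data by inode rather than by file
name, here is a small Python sketch. It assumes the same message is reachable
under several paths that share one underlying file (for example hardlinks in
a maildir); whether Dovecot's virtual mailboxes expose messages that way is
not stated in this thread. The names message_key and DedupIndex are
hypothetical, and inode numbers can be reused after a file is deleted, which
a real implementation would have to handle.

```python
import os


def message_key(path):
    """Identify a message file by (device, inode) rather than by name."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


class DedupIndex:
    def __init__(self):
        self.indexed = {}   # (device, inode) -> extracted index data

    def add(self, path, extract):
        """Index path unless the same underlying file was already indexed.

        Returns True if extract() actually ran, False if the data was
        already present under another name.
        """
        key = message_key(path)
        if key in self.indexed:
            return False
        self.indexed[key] = extract(path)
        return True
```

Two paths that point at the same file produce the same key, so the second
add() call is a cheap dictionary lookup instead of a second indexing pass.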

> I have also been thinking about making Squat indexes global for all
> mailboxes. If done well it should reduce disk space as well as enable
> fast multi-mailbox searches, but I'm a bit worried about memory usage
> and other slowness when updating the index. The Squat building/updating
> could use more work, but I haven't yet figured out a great solution for
> it.

Well, I think it would be okay - deploy it and we'll all tell you. (-;

(As always, thanks for this amazing software.)

-- Asheesh.

--
The wonderful thing about a dancing bear is not how well he dances,
but that he dances at all.