remove duplicate emails


Martin Klaffenboeck
Hi there!

Duplicate emails result from accidentally downloading an email from a
POP server more than once, or (not by accident) from someone sending a
mail to two or more of my addresses that are collected into one folder,
or from someone pressing 'reply to all' when answering one of my mailing
list posts.

How can I get rid of these duplicate messages?

Balsa has a 'remove duplicates' menu entry which works fine.  Is there
a way to do this in Evolution?

Thanks,
Martin

_______________________________________________
Evolution-list mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/evolution-list


Re: remove duplicate emails

Chenthill-2
We also get a lot of duplicate emails when subscribed to many mailing
lists. There is already a feature request filed for this:
http://bugzilla.gnome.org/show_bug.cgi?id=253244 . It would be really
good to have this. Partha?

thanks, Chenthill.

On Thu, 2005-12-01 at 12:40 +0100, Martin Klaffenboeck wrote:

> Hi there!
>
> Duplicate emails result from accidentally downloading an email from a
> POP server more than once, or (not by accident) from someone sending a
> mail to two or more of my addresses that are collected into one folder,
> or from someone pressing 'reply to all' when answering one of my mailing
> list posts.
>
> How can I get rid of these duplicate messages?
>
> Balsa has a 'remove duplicates' menu entry which works fine.  Is there
> a way to do this in Evolution?
>
> Thanks,
> Martin

Re: remove duplicate emails

Martin Klaffenboeck
Oh, the second entry on that page is from me.  I asked for this once
before...

thanks,
Martin

On Thursday, 2005-12-01 at 18:13 +0530, chen wrote:

> We also get a lot of duplicate emails when subscribed to many mailing
> lists. There is already a feature request filed for this:
> http://bugzilla.gnome.org/show_bug.cgi?id=253244 . It would be really
> good to have this. Partha?
>
> thanks, Chenthill.
>
> On Thu, 2005-12-01 at 12:40 +0100, Martin Klaffenboeck wrote:
> > Hi there!
> >
> > Duplicate emails result from accidentally downloading an email from a
> > POP server more than once, or (not by accident) from someone sending a
> > mail to two or more of my addresses that are collected into one folder,
> > or from someone pressing 'reply to all' when answering one of my mailing
> > list posts.
> >
> > How can I get rid of these duplicate messages?
> >
> > Balsa has a 'remove duplicates' menu entry which works fine.  Is there
> > a way to do this in Evolution?
> >
> > Thanks,
> > Martin

Re: remove duplicate emails

Graham Campbell
In reply to this post by Martin Klaffenboeck
On Thu, 2005-12-01 at 12:40 +0100, Martin Klaffenboeck wrote:

> Hi there!
>
> Duplicate emails result from accidentally downloading an email from a
> POP server more than once, or (not by accident) from someone sending a
> mail to two or more of my addresses that are collected into one folder,
> or from someone pressing 'reply to all' when answering one of my mailing
> list posts.
>
> How can I get rid of these duplicate messages?
>
Note that there is an ambiguity in how you define duplicate messages.
Downloading a message from a POP server more than once results in true
duplicates, but "reply to all" generally does not (the headers will have
minor differences).

Here is a Perl script for removing duplicates (defined in yet a third
way). It requires the Mail::Util module (part of the MailTools
distribution on CPAN). Watch out for folded (wrapped) lines in the
script. Find the relevant folders in .evolution, stop Evolution, run the
script, and let Evolution fix up the indexes, etc. the next time you
open it.

#!/usr/bin/perl
# Takes one argument, the name of a mailbox file (in standard Unix mbox
# format). It eliminates duplicate messages from this file by reading the
# file, then writing only one copy of each message back to the file.
# It defines duplicate messages as those having identical "From " lines,
# which must be the first line of the message.
use strict;
use warnings;
use Mail::Util qw(read_mbox);

my $file = $ARGV[0];
die "Usage: $0 mailbox-file\n" unless defined $file;

$| = 1;    # unbuffered output, so progress appears immediately
print "Working on file $file ";

# read_mbox returns a list of array references, one per message;
# each array holds the lines of that message.
my @messages = read_mbox($file);
my $cnt      = scalar @messages;
print "$cnt messages\n";
exit if $cnt == 0;    # nothing to do for an empty mailbox

# Keep the first message seen for each "From " line, in mailbox order.
my ( %seen, @unique );
foreach my $msg (@messages) {
    my $first = $msg->[0];
    print "Malformed first line:\n$first" unless $first =~ /^From /;
    push @unique, $msg unless $seen{$first}++;
}
my $ucnt = scalar @unique;
print "$ucnt unique messages\n";

# Overwrite the mailbox with the unique messages only.
open my $out, '>', $file or die "Cannot write $file: $!";
foreach my $msg (@unique) {
    print {$out} @$msg;
    print {$out} "\n";
}
close $out or die "Cannot close $file: $!";
exit;



--
Graham Campbell <[hidden email]>


Re: remove duplicate emails

Martin Klaffenboeck
On Thursday, 2005-12-01 at 09:32 -0500, Graham Campbell wrote:
>  
> Note that there is an ambiguity in how you define duplicate messages.
> Multiple downloading from pop server results in true duplicates, but the
> "reply to all" generally does not (the headers will have minor
> differences).

Thanks for that.  I didn't tell you that I use IMAP, so the duplicates
have to be deleted over there.  But there is indeed a local copy of the
mails, because I often use Evolution in offline mode (on my notebook)
and have all my emails here.  Do you think the script will work there
too?

Do you think the two emails from a 'reply to all' have different
Message-IDs?  I think every email has a Message-ID.  Why not just delete
duplicates by Message-ID?  I think the Message-ID is also the same for
copies that arrive via a mailing list.

Martin


Re: remove duplicate emails

Lee Revell
In reply to this post by Graham Campbell
On Thu, 2005-12-01 at 09:32 -0500, Graham Campbell wrote:

> On Thu, 2005-12-01 at 12:40 +0100, Martin Klaffenboeck wrote:
> > Hi there!
> >
> > Duplicate emails result from accidentally downloading an email from a
> > POP server more than once, or (not by accident) from someone sending a
> > mail to two or more of my addresses that are collected into one folder,
> > or from someone pressing 'reply to all' when answering one of my mailing
> > list posts.
> >
> > How can I get rid of these duplicate messages?
> >
> Note that there is an ambiguity in how you define duplicate messages.
> Multiple downloading from pop server results in true duplicates, but the
> "reply to all" generally does not (the headers will have minor
> differences).
>
> Here is a perl script for removing duplicates (defined in yet a third
> way).

Perl scripts are all well and good, but I think the original poster was
asking why there isn't a way to do this inside Evolution.  Mutt can do
it.

If you use Evo over dialup or another unreliable connection, then
downloading messages from the POP server multiple times is a real
problem.  Often, if Evo is interrupted or hangs during a download and has
to be killed, it will just download all the messages again.

Lee


Re: remove duplicate emails

Karsten Bräckelmann-2
In reply to this post by Martin Klaffenboeck

> > Note that there is an ambiguity in how you define duplicate messages.
> > Multiple downloading from pop server results in true duplicates, but the
> > "reply to all" generally does not (the headers will have minor
> > differences).
>
> Thanks for that.  I didn't tell you that I use IMAP, so the duplicates
> have to be deleted over there.  But there is indeed a local copy of the
> mails, because I often use Evolution in offline mode (on my notebook)
> and have all my emails here.  Do you think the script will work there
> too?
>
> Do you think the two emails from a 'reply to all' have different
> Message-IDs?

No. "The headers will have minor differences" does not refer to the
Message-Id, but to the Received headers at a minimum.
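
To make that distinction concrete: two copies of one mail that differ
only in transport headers are not byte-identical, but they compare equal
once those headers are ignored. Below is a minimal Python sketch of that
idea (Python rather than the Perl used earlier in the thread, only
because its standard library parses mail headers). The name
`fingerprint` and the `TRANSPORT_HEADERS` set are assumptions made for
illustration. Real list copies often differ in more than these headers
(added list headers, appended footers), which this sketch does not
handle.

```python
import hashlib
from email import message_from_string

# Assumption for this sketch: headers that are added in transit or on
# delivery and therefore differ between copies of the same mail.
TRANSPORT_HEADERS = {"received", "return-path", "delivered-to", "x-original-to"}


def fingerprint(raw_message: str) -> str:
    """Hash a message while ignoring the transport headers listed above."""
    raw = raw_message.replace("\r\n", "\n")
    msg = message_from_string(raw)
    # Keep every header except the transport ones, in original order.
    kept = [
        f"{name.lower()}: {value}"
        for name, value in msg.items()
        if name.lower() not in TRANSPORT_HEADERS
    ]
    # The body is everything after the first blank line.
    _, _, body = raw.partition("\n\n")
    data = "\n".join(kept) + "\n\n" + body
    return hashlib.sha256(data.encode("utf-8", "replace")).hexdigest()
```

Two copies with different Received lines then produce the same
fingerprint, while a plain hash of the raw text would differ.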


> I think every email has a Message-ID.  Why not just delete duplicates
> by Message-ID?  I think the Message-ID is also the same for copies
> that arrive via a mailing list.

Yes, they will be the same when Replying to All. Anyway, Message-Id's
are not *guaranteed* to be unique. Although, granted, identical
Message-Id's for different mails are very rare. Yes, we had this
discussion pretty often in the past...
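
As a rough sketch of the Message-ID approach discussed here, the
following Python function removes later copies of a Message-ID from a
local mbox file using the standard `mailbox` module. The function name
is hypothetical, it only touches a local file (not an IMAP server), and
it inherits the caveat above: it trusts Message-IDs to be unique, which
is not guaranteed. As with the Perl script, stop Evolution and work on a
backup copy first.

```python
import mailbox


def remove_duplicates_by_message_id(path: str) -> int:
    """Delete later copies of messages whose Message-ID was already seen.

    Returns the number of messages removed. Messages without a
    Message-ID header are always kept.
    """
    box = mailbox.mbox(path)
    removed = 0
    try:
        seen = set()
        # Copy the key list first, because we remove entries while looping.
        for key in list(box.keys()):
            message_id = box[key].get("Message-ID")
            if message_id is None:
                continue
            message_id = message_id.strip()
            if message_id in seen:
                box.remove(key)
                removed += 1
            else:
                seen.add(message_id)
        box.flush()  # write the changes back to the file
    finally:
        box.close()
    return removed
```

This sketch does no file locking; a real tool would probably call
`box.lock()` before modifying a mailbox that another program might open.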

...guenther


--
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: remove duplicate emails

Lee Revell
On Sat, 2005-12-10 at 21:30 +0100, guenther wrote:

> > > Note that there is an ambiguity in how you define duplicate messages.
> > > Multiple downloading from pop server results in true duplicates, but the
> > > "reply to all" generally does not (the headers will have minor
> > > differences).
> >
> > Thanks for that.  I didn't tell you that I use IMAP, so the duplicates
> > have to be deleted over there.  But there is indeed a local copy of the
> > mails, because I often use Evolution in offline mode (on my notebook)
> > and have all my emails here.  Do you think the script will work there
> > too?
> >
> > Do you think the two emails from a 'reply to all' have different
> > Message-IDs?
>
> No. "The headers will have minor differences" does not refer to the
> Message-Id, but to the Received headers at a minimum.
>
>
> > I think every email has a Message-ID.  Why not just delete duplicates
> > by Message-ID?  I think the Message-ID is also the same for copies
> > that arrive via a mailing list.
>
> Yes, they will be the same when Replying to All. Anyway, Message-Id's
> are not *guaranteed* to be unique. Although, granted, identical
> Message-Id's for different mails are very rare. Yes, we had this
> discussion pretty often in the past...

Because we don't WANT a mailing list message that's also CC'ed directly
to the recipient to be considered a duplicate mail.  This is the
desired behavior on most Linux development mailing lists like LKML, where
most recipients procmail the list mail into a separate folder, so you
get messages CC'ed to you directly in your inbox, but the list folder
preserves the threading.

Personally I think we should just copy Mutt's implementation; I've never
had it do the wrong thing.

Lee


Re: remove duplicate emails

Karsten Bräckelmann-2

> > > I think every email has a Message-ID.  Why not just delete duplicates
> > > by Message-ID?  I think the Message-ID is also the same for copies
> > > that arrive via a mailing list.
> >
> > Yes, they will be the same when Replying to All. Anyway, Message-Id's
> > are not *guaranteed* to be unique. Although, granted, identical
> > Message-Id's for different mails are very rare. Yes, we had this
> > discussion pretty often in the past...
>
> Because we don't WANT a mailing list message that's also cc'ed directly
> to the recipients to be considered a duplicate mail.  This is the
> desired behavior on most linux development mailing lists like LKML where
> most recipients procmail the list mail into a separate folder, so you
> get messages CC'ed to you directly in your inbox, but the list folder
> preserves the threading.

Using the proper Mailing List Filters in Evo will result in the same
behavior. :)

> Personally I think we should just copy Mutt's implementation, I've never
> had it do the wrong thing.

Hmm, I don't know how exactly Mutt manages to do this, but...

Did you just say "volunteer"? ;-))

...guenther




Re: remove duplicate emails

Lee Revell
On Sat, 2005-12-10 at 22:31 +0100, guenther wrote:

> > > > I think every email has a Message-ID.  Why not just delete duplicates
> > > > by Message-ID?  I think the Message-ID is also the same for copies
> > > > that arrive via a mailing list.
> > >
> > > Yes, they will be the same when Replying to All. Anyway, Message-Id's
> > > are not *guaranteed* to be unique. Although, granted, identical
> > > Message-Id's for different mails are very rare. Yes, we had this
> > > discussion pretty often in the past...
> >
> > Because we don't WANT a mailing list message that's also cc'ed directly
> > to the recipients to be considered a duplicate mail.  This is the
> > desired behavior on most linux development mailing lists like LKML where
> > most recipients procmail the list mail into a separate folder, so you
> > get messages CC'ed to you directly in your inbox, but the list folder
> > preserves the threading.
>
> Using the proper Mailing List Filters in Evo will result in the same
> behavior. :)
>

I know, that's what I do.  I am just saying that preferring Reply-To-All
vs. stripping CCs is a convention that varies from mailing list to
mailing list.  So we need to make sure any "remove dupes" implementation
doesn't just blithely remove dupes by Message-ID, but can tell that two
messages with identical bodies, one which came directly from the sender
and one from the list, are not dupes.
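
One way to express that rule, sketched in Python under stated
assumptions: treat two messages as duplicates only if both the
Message-ID and the list they arrived through match. Here the `List-Id`
header (RFC 2919) stands in for "came from the list"; not every list
sets it, and this is not a description of how Mutt decides. `dedupe_key`
and `find_duplicates` are hypothetical names.

```python
from email import message_from_string


def dedupe_key(raw_message: str):
    """Return (Message-ID, List-Id) for a raw message; missing values are ''."""
    msg = message_from_string(raw_message)
    message_id = (msg.get("Message-ID") or "").strip()
    list_id = (msg.get("List-Id") or "").strip().lower()
    return (message_id, list_id)


def find_duplicates(raw_messages):
    """Return indexes of messages repeating an earlier (Message-ID, List-Id) pair."""
    seen = set()
    duplicates = []
    for index, raw in enumerate(raw_messages):
        key = dedupe_key(raw)
        if not key[0]:
            continue  # no Message-ID: never treat as a duplicate
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates
```

With this key, a direct copy and a list copy of the same mail are both
kept, while a second download of the list copy is flagged.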

> > Personally I think we should just copy Mutt's implementation, I've never
> > had it do the wrong thing.
>
> Hmm, I don't know how exactly Mutt manages to do this, but...
>
> Did you just say "volunteer"? ;-))

No, I don't have the bandwidth.  I was just suggesting that there's
already an IMHO ideal implementation out there, so whoever implements
this doesn't reinvent the wheel (poorly).

Lee


Re: remove duplicate emails

Karsten Bräckelmann-2
On Sat, 2005-12-10 at 16:50 -0500, Lee Revell wrote:

> On Sat, 2005-12-10 at 22:31 +0100, guenther wrote:
> > > > > I think every email has a Message-ID.  Why not just delete duplicates
> > > > > by Message-ID?  I think the Message-ID is also the same for copies
> > > > > that arrive via a mailing list.
> > > >
> > > > Yes, they will be the same when Replying to All. Anyway, Message-Id's
> > > > are not *guaranteed* to be unique. Although, granted, identical
> > > > Message-Id's for different mails are very rare. Yes, we had this
> > > > discussion pretty often in the past...
> > >
> > > Because we don't WANT a mailing list message that's also cc'ed directly
> > > to the recipients to be considered a duplicate mail.  This is the
> > > desired behavior on most linux development mailing lists like LKML where
> > > most recipients procmail the list mail into a separate folder, so you
> > > get messages CC'ed to you directly in your inbox, but the list folder
> > > preserves the threading.
> >
> > Using the proper Mailing List Filters in Evo will result in the same
> > behavior. :)
>
> I know, that's what I do.  I am just saying that preferring Reply-To-All
> vs. stripping CC's is a convention that varies from mailing list to
> mailing list so we need to make sure any "remove dupes" implementation
> doesn't just blithely remove dupes by message ID, but rather can tell
> that two messages with identical bodies, one which came directly from
> the sender and one from the list, are not dupes.

Yes, I understood this the first time, and I totally agree with you. :)


> > > Personally I think we should just copy Mutt's implementation, I've never
> > > had it do the wrong thing.
> >
> > Hmm, I don't know how exactly Mutt manages to do this, but...
> >
> > Did you just say "volunteer"? ;-))
>
> No, I don't have the bandwidth.  I was just suggesting that there's
> already an IMHO ideal implementation out there, so whoever implements
> this doesn't reinvent the wheel (poorly).

Too bad. ;-)

Hints like this preferably should be added to the proper bug report in
bugzilla, though. Posts to the list tend to be forgotten...

...guenther




Re: remove duplicate emails

Lee Revell
On Sat, 2005-12-10 at 23:17 +0100, guenther wrote:
> Hints like this preferably should be added to the proper bug report in
> bugzilla, though. Posts to the list tend to be forgotten...

Added to bug 253244.

Lee


Re: remove duplicate emails

Lee Revell
In reply to this post by Karsten Bräckelmann-2
On Sat, 2005-12-10 at 23:17 +0100, guenther wrote:
> Hints like this preferably should be added to the proper bug report in
> bugzilla, though. Posts to the list tend to be forgotten...

Do you know if there's a bug open for "evolution sometimes downloads
dupes from the POP server" already?  It should be set as related to this
one.  ;-)

Lee


Re: remove duplicate emails

Andre Klapper
hi lee,

On Saturday, 2005-12-10 at 17:27 -0500, Lee Revell wrote:
> Do you know if there's a bug open for "evolution sometimes downloads
> dupes from the POP server" already?  It should be set as related to this
> one.  ;-)

most of them should be NEEDINFO or NOTABUG, e.g.
http://bugzilla.gnome.org/show_bug.cgi?id=317599

cheers,
andre

--
 mailto:[hidden email] | failed!
 http://www.iomc.de


Re: remove duplicate emails

Lee Revell
On Sat, 2005-12-10 at 23:35 +0100, Andre Klapper wrote:

> hi lee,
>
> On Saturday, 2005-12-10 at 17:27 -0500, Lee Revell wrote:
> > Do you know if there's a bug open for "evolution sometimes downloads
> > dupes from the POP server" already?  It should be set as related to this
> > one.  ;-)
>
> most of them should be NEEDINFO or NOTABUG, e.g.
> http://bugzilla.gnome.org/show_bug.cgi?id=317599
>

Yeah, I can't reproduce it 100% of the time.  It seems to be timing
sensitive, and the more unreliable the network connection, the more of a
problem it is (it happens once a week on dialup).  It seems that if Evo
is disconnected at exactly the wrong time, some messages will be
downloaded again.

Unfortunately this is useless as a bug report...

Lee


Re: remove duplicate emails

Karsten Bräckelmann-2

> > > Do you know if there's a bug open for "evolution sometimes downloads
> > > dupes from the POP server" already?  It should be set as related to this
> > > one.  ;-)
> >
> > most of them should be NEEDINFO or NOTABUG, e.g.
> > http://bugzilla.gnome.org/show_bug.cgi?id=317599
>
> Yeah I can't reproduce it 100%, it seems to be timing sensitive, and the
> more unreliable the network connection the more of a problem it is (it
> happens once a week on dialup).  It seems that if Evo is disconnected at
> the exact wrong time then some messages will be downloaded again.

Well, there are two different kinds of such reports.

* Broken server. Some POP3 servers seem to alter some headers every time
  a client fetches the mails "left on the POP3 account". Probably hard
  to work around this, as it is  a) abusing the POP3 protocol and  b) a
  broken server anyway.

* Bad line. Dropping connections resulting in the same message being
  downloaded twice even without "leave on server". Reproducing this
  should be easy.


> Unfortunately this is useless as a bug report...

I don't think so. Your description above offers sufficient information
to debug this. :-)

Steps to reproduce:

* Send a couple of *large* mails to your POP3 account. Large means that,
  depending on your line speed, downloading any such mail takes at least
  several seconds.
* Fetch the mails using Evo. Watch the fetching progress.
* Kick the connection *before* Evo has downloaded the last message:
  pull the network plug / ifdown the interface / kill the server / etc.
* Restore the connection and fetch again. Get duplicates.
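
One plausible explanation for why these steps produce duplicates,
offered as an assumption and not as a statement about Evolution's code:
in POP3 (RFC 1939) a DELE only marks a message, and the server removes
marked messages when the client sends QUIT. If the line drops first, the
marks are discarded and every message is offered again. The toy Python
model below (no real network I/O; `FakePopServer` and `fetch` are
made-up names) shows the effect, and how remembering UIDL identifiers
locally would avoid the repeated download.

```python
class FakePopServer:
    """Toy model of a POP3 maildrop: DELE only marks, QUIT commits."""

    def __init__(self, messages):
        self.maildrop = dict(messages)  # unique id -> message text
        self.marked = set()             # ids marked for deletion

    def uidl(self):
        return [uid for uid in self.maildrop if uid not in self.marked]

    def retr(self, uid):
        return self.maildrop[uid]

    def dele(self, uid):
        self.marked.add(uid)

    def quit(self):
        # Clean end of session: marked messages are really removed.
        for uid in self.marked:
            del self.maildrop[uid]
        self.marked.clear()

    def drop_connection(self):
        # Session ended without QUIT: deletion marks are discarded.
        self.marked.clear()


def fetch(server, inbox, seen_uids=None, fail_after=None):
    """Download all messages into inbox; return True on a clean QUIT.

    seen_uids, if given, is a set of ids already stored locally.
    fail_after simulates the line dropping after that many messages
    have been processed.
    """
    for count, uid in enumerate(server.uidl()):
        if fail_after is not None and count == fail_after:
            server.drop_connection()
            return False
        if seen_uids is not None and uid in seen_uids:
            server.dele(uid)  # already stored locally, only clean up
            continue
        inbox.append(server.retr(uid))
        if seen_uids is not None:
            seen_uids.add(uid)
        server.dele(uid)
    server.quit()
    return True
```

Calling `fetch` once with `fail_after=2` and then again without it
leaves the first two messages in the inbox twice; passing the same
`seen_uids` set to both calls leaves each message exactly once.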


This should be sufficient to reliably reproduce this. And it should not
be necessary for me to explain this to any developer. If a bug involving
a bad line is closed because the developer can't reproduce it, he most
likely did not try hard enough.

...guenther


