Joel H. Simplex @ 2008-05-12 22:59:00
After having fetched all of my gmail messages via POP3 into an mbox file, I decided to check to make sure that the number of email messages received matches the number in the mbox file.
So I ran this command to count the number of messages:
$ egrep -c "^From \w+@\w+\.\w+ " gmail-archive/gmail-backup.mbox
48858
I have about 11,800 messages in my inbox, according to Gmail, so something is obviously wrong. Either the command I used is wrong, or something about the way I downloaded is wrong, or both. What is wrong?
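One thing worth checking is whether the egrep pattern itself undercounts. Here is a minimal sketch in Python, assuming the file uses plain "From " separator lines; the function name `count_separators` is made up for illustration. It counts separator lines twice: once with the same pattern as the egrep command above, and once loosely, as any line starting with "From ". The strict pattern skips envelope senders such as `first.last@mail.example.org` or `MAILER-DAEMON`, while the loose count can overcount if a message body contains an unescaped line starting with "From ".

```python
import re

# Same pattern as the egrep command: one word, "@", one word, ".", one word.
STRICT = re.compile(rb"^From \w+@\w+\.\w+ ")


def count_separators(mbox_path):
    """Return (strict, loose) counts of mbox 'From ' separator lines."""
    strict = 0
    loose = 0
    with open(mbox_path, "rb") as handle:
        for line in handle:
            # Loose rule: any line that begins with "From ".
            if line.startswith(b"From "):
                loose += 1
                # Strict rule: the line must also match the egrep pattern.
                if STRICT.match(line):
                    strict += 1
    return strict, loose
```

If the two numbers differ, the gap is the number of separator lines the egrep pattern does not recognize. That alone would not explain the difference from Gmail's figure.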
Also, the mbox file is 186.6 MiB, while Gmail says that only 165 MB of my account is being used. What's up with that?
I used the getmail program to get the email in many batches (Gmail only allows about 1,000 messages to be fetched at a time), if that matters.
ETA[0]: What's a good way to either edit the mbox file or extract just certain messages from it that meet certain criteria? Specifically, I want just the messages within a certain time frame. I can write my own script, but would prefer something simpler.
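If it does come down to a script, a short one may be enough. The following is only a sketch using Python's standard `mailbox` module; the function name `extract_date_range` and its parameters are hypothetical. It assumes the `Date` header is a good enough stand-in for "time frame", it skips messages whose `Date` header is missing or unparseable, and it treats dates without a time zone as UTC.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def extract_date_range(src_path, dst_path, start, end):
    """Copy messages with start <= Date < end from src_path to dst_path.

    start and end must be timezone-aware datetimes.
    Returns the number of messages copied.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if it does not exist
    copied = 0
    try:
        for message in src:
            header = message.get("Date")
            if header is None:
                continue  # no Date header: skip (assumption)
            try:
                sent = parsedate_to_datetime(str(header))
            except (TypeError, ValueError):
                continue  # unparseable Date header: skip (assumption)
            if sent.tzinfo is None:
                # Dates without a usable zone are treated as UTC (assumption).
                sent = sent.replace(tzinfo=timezone.utc)
            if start <= sent < end:
                dst.add(message)
                copied += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return copied
```

For example, `extract_date_range("gmail-archive/gmail-backup.mbox", "subset.mbox", datetime(2008, 1, 1, tzinfo=timezone.utc), datetime(2008, 4, 1, tzinfo=timezone.utc))` would write the first quarter of 2008 to `subset.mbox` (a made-up output name) and leave the original file untouched.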
ETA[1]: I tried grepmail and it gave discouraging results:
$ grepmail . -r gmail-archive/gmail-backup.mbox
gmail-archive/gmail-backup.mbox: 50205
$ mboxgrep.exe -c . gmail-archive/gmail-backup.mbox
51601
This could mean something about how getmail has fetched the emails. Also, I'm a bit concerned that I can't get consistent counts from anything.
ETA[2]: Duplicates filtered:
$ mboxgrep.exe -c -nd . gmail-archive/gmail-backup.mbox
48807
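As a cross-check on the duplicate-filtered number, here is a rough Python sketch that counts distinct `Message-ID` headers. I don't know what rule mboxgrep uses to decide that two messages are duplicates, so comparing by `Message-ID` is an assumption, and the function name `count_unique_message_ids` is hypothetical. Messages with no `Message-ID` are each counted as unique.

```python
import mailbox


def count_unique_message_ids(mbox_path):
    """Return (total messages, messages with a distinct Message-ID)."""
    box = mailbox.mbox(mbox_path, create=False)
    seen = set()
    total = 0
    try:
        for message in box:
            total += 1
            message_id = message.get("Message-ID")
            if message_id:
                key = str(message_id).strip()
            else:
                # No Message-ID: key on position so it counts as unique.
                key = ("no-id", total)
            seen.add(key)
    finally:
        box.close()
    return total, len(seen)
```

If the second number is far below the first, the batches probably overlapped. If the two are close, the extra messages are more likely real mail that Gmail's inbox count does not include.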