Hey all! I wanted to help out and felt the best way to do so would be technically. If you're a L*MP developer you'll dig this...
I've created a few command line utilities in PHP (so they should work in Windoze) that will import email dump files into an Elastic Stack. Elastic Stack is a really cool and free indexing/search engine (Elasticsearch), with a built-in visualization tool (Kibana). If you don't understand this paragraph, turn back now.
The full source code is posted on GitHub: https://github.com/ChaoticWave/LeakyThoughts
I'm working on making it a one-step install and run thing. As it stands you need to have a PHP runtime environment and an available Elastic Stack. I was thinking about making a Vagrant box with the install. But purists, like myself, would prefer to build their own. Anyway, feel free to contribute to the project, or make new requests through the Issues tab on GitHub.
You don't have to be an ubergeek to work with the data once it's been imported. The Kibana tool has a ton of built-in features to visualize the data. That, however, is not one of my strong points. So if someone can think of a good way to visualize the data, I'd love to hear it.
The best thing I could think of was a top 500 word cloud: http://imgur.com/a/p64iQ (mirror: http://archive.is/673Rq)
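For anyone who wants to sanity-check a word cloud outside of Kibana, here's a rough Python sketch of the underlying idea: count words across message bodies and keep the top N. This is not how Kibana computes it and it's not code from the repo. The `top_words` name, the tiny stop-word list, and the "skip words shorter than 3 letters" rule are all assumptions made just for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real run would want a much fuller one.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in",
    "is", "it", "for", "on", "that", "this",
}


def top_words(texts, n=500):
    """Return the n most frequent words across an iterable of strings.

    Words are lowercased, stop words are dropped, and anything shorter
    than three characters is ignored (an arbitrary choice for this sketch).
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["The server is down", "server logs for the server"]
    for word, count in top_words(sample, n=5):
        print(word, count)
```

The resulting (word, count) pairs are what a word cloud sizes its terms by, so they're handy for checking that a visualization looks sane.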
Last thing: the tools are not specific to any one particular dump. That is to say, any leaked or personal email files can be used as long as they are in the standard mailbox format. The README has a link to a suggested data source.
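To give a feel for what an import like this boils down to, here's a minimal Python sketch (again, not the PHP code from the repo). It assumes the dump is an mbox file, reads it with Python's standard `mailbox` module, and emits lines in the newline-delimited JSON shape that Elasticsearch's bulk API accepts. The index name `emails`, the field names, and the helper names are hypothetical.

```python
import json
import mailbox
import sys


def message_body(msg):
    """Return the first text/plain part of a message as a string ('' if none)."""
    part = msg
    if msg.is_multipart():
        for candidate in msg.walk():
            if candidate.get_content_type() == "text/plain":
                part = candidate
                break
        else:
            return ""
    payload = part.get_payload(decode=True)
    if payload is None:
        return ""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label in the message; fall back to UTF-8.
        return payload.decode("utf-8", errors="replace")


def mbox_to_bulk_lines(path, index="emails"):
    """Yield bulk-format lines: one action line, then one document line, per message."""
    box = mailbox.mbox(path, create=False)  # create=False: fail on a missing file
    try:
        for msg in box:
            doc = {
                "from": str(msg.get("From", "")),
                "to": str(msg.get("To", "")),
                "subject": str(msg.get("Subject", "")),
                "date": str(msg.get("Date", "")),
                "body": message_body(msg),
            }
            yield json.dumps({"index": {"_index": index}})
            yield json.dumps(doc)
    finally:
        box.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python mbox_to_bulk.py dump.mbox > bulk.ndjson
    for line in mbox_to_bulk_lines(sys.argv[1]):
        print(line)
```

The output file can then be POSTed to the cluster's `_bulk` endpoint. A real importer would also want batching, date parsing, and attachment handling, all of which are left out here.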
Cheers.
FoxMcCloud11
Have you seen this project by MIT?
https://clinton.media.mit.edu/
chaoticWave
I just checked that out. Pretty cool. What I built isn't anything like that really. More for text analysis.
I did notice, however, that the email [email protected] is curiously not present in that MIT database, yet it shows up in WikiLeaks searches. Hmmm...
chaoticWave
Oh wow, way to go MIT! Now I feel completely worthless... lol
Thanks though, that is really cool.