Done uploading. I'll host the backup for 24 hours. I encourage everyone to download copies. I've included a zip of the backup in the main folder.
I skipped all of the metadata scans I normally do and removed all my auxiliary scripts from the files, but kept the main script I use to gather the data from a sub. It will not work out of the box: anyone who wants to use it needs to set up OAuth2 on their own. I removed that section of the script because it contains the OAuth2 info for my main account.
There were also a few scans I couldn't do, such as finding overlapping users who also participate in other subreddits, since the sub was banned less than 5 minutes after I completed the main scan. I BARELY made it.
I need to work out a few kinks in part of the script to get a complete user list, but that info is available in the database. Once it's fixed, it will be a complete list of every user who has ever interacted with the sub, sorted from most to least active. It might be useful to mass PM users to tell them about the new site, but that might also piss Reddit off, so I will not be including it for public consumption. I don't want to contribute to any witch hunts.
Here is the main folder:
http://gigabytegenocide.com/pizzagate/
Here is where the HTML versions are:
http://gigabytegenocide.com/pizzagate/redmash/
Here is an example of one of the files, sorted by date:
http://gigabytegenocide.com/pizzagate/redmash/pizzagate_date.html
The comment links on the list lead to the HTML folder, where the comments are threaded in the final order they had on the sub, based on comment score:
http://gigabytegenocide.com/pizzagate/redmash/html/t3_5brme2.html
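For anyone who wants to rebuild the threaded view from the raw data, here is a minimal sketch of score-ordered threading. It is not the logic of the original script; it assumes each comment is a dict with hypothetical `id`, `parent_id` and `score` keys, and it relies on Reddit's fullname convention, where a parent ID starting with `t3_` is the post itself and `t1_` is another comment.

```python
from collections import defaultdict


def thread_order(post_fullname, comments):
    """Return [(depth, comment)] in threaded display order.

    Siblings are sorted by score, highest first, at every level.
    Assumes hypothetical dict keys: 'id', 'parent_id', 'score'.
    """
    # Group replies under the fullname of whatever they reply to.
    children = defaultdict(list)
    for c in comments:
        children[c["parent_id"]].append(c)
    for replies in children.values():
        replies.sort(key=lambda c: c["score"], reverse=True)

    ordered = []

    def walk(parent, depth):
        # .get() avoids inserting empty lists into the defaultdict.
        for c in children.get(parent, []):
            ordered.append((depth, c))
            # A comment's own fullname is 't1_' + its id.
            walk("t1_" + c["id"], depth + 1)

    walk(post_fullname, 0)
    return ordered


if __name__ == "__main__":
    sample = [
        {"id": "a", "parent_id": "t3_example", "score": 3},
        {"id": "b", "parent_id": "t3_example", "score": 10},
        {"id": "c", "parent_id": "t1_a", "score": 1},
    ]
    for depth, c in thread_order("t3_example", sample):
        print("  " * depth + c["id"])
```

Depth-first recursion matches how a thread reads on the page; for very deep threads an explicit stack would avoid Python's recursion limit.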
Sorry about the crappy formatting. Something went wrong and I am running out of time to troubleshoot it. There is just a bunch of ''' at the top of the page.
The raw .db file can be found in the database folder, and if anyone knows their way around those types of files, you can extract all the info I gathered.
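As a starting point for poking at the database, here is a minimal sketch that lists every table and its row count. It assumes the .db file is a SQLite database (the extension suggests this, but it is an assumption); it makes no assumptions about table names and opens the file read-only so nothing gets modified.

```python
import sqlite3


def describe_database(path):
    """Return {table_name: row_count} for every table in a SQLite file.

    Assumes the file is SQLite. Opened read-only via a URI so a wrong
    path raises an error instead of silently creating an empty database.
    """
    con = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        names = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names cannot be bound as parameters, so they are quoted.
        return {
            name: con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        con.close()
```

Once you know the table names, `con.execute("SELECT * FROM some_table LIMIT 5")` (with a real table name in place of the placeholder `some_table`) shows what the columns look like.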
If anyone has webdev skills and wants to contribute, I've been wanting to clean up the formatting and add a search function, multiple pages, and a more Reddit-like design. I used some copy/paste JS to make the list sortable and some super basic CSS. But it works.
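Splitting the list across multiple pages is mostly a matter of slicing the sorted entries before writing each HTML file. Here is a minimal sketch, assuming the entries are already sorted in a Python list; the `per_page` default of 500 is an arbitrary choice, not something from the original script.

```python
import math


def paginate(entries, per_page=500):
    """Split a sorted list into pages: [(page_number, items), ...], numbered from 1.

    per_page=500 is an arbitrary, hypothetical default.
    """
    if per_page < 1:
        raise ValueError("per_page must be positive")
    page_count = math.ceil(len(entries) / per_page)
    return [
        (i + 1, entries[i * per_page:(i + 1) * per_page])
        for i in range(page_count)
    ]
```

With the 2,294 posts mentioned later in the thread and 500 per page, this gives 5 pages, the last holding 294 entries. Each page can then be written to its own file, so the browser never has to load tens of thousands of rows at once.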
Any questions, let me know. Thanks for all the praise and appreciation. I do this as a hobby, and luckily I already had the tools ready to go in time to get the sub before it was gone.
I'm not a participant in your sub, but I truly hope this helps you guys.
Pisstubes0351 ago
SIAP: Those gigabyte sites are 404'd now.
nathanwblair ago
@erktheerk the links are all dead (they return 404s). Where's the torrent?
poxlox ago
ALL LINKS ARE 404ING, WAT DO?
erktheerk ago
Can you share the code and any related files? I would love to integrate that into my script and auto generate the website for every scan I do. Would save me a ton of work and the multiple pages will fix the issues I have with large subs breaking trying to load tens of thousands of entries at once.
I would like to play with it and add threaded comments too.
5PY_HUN73R ago
Just signaling that I've downloaded and backed up the DB. I'm also a bit curious about your Python script in there. Can you explain a little about how to use it?
erktheerk ago
Man. That's a lot nicer than what I've come up with. Maybe use Bootstrap so it's responsive? It doesn't scale well on mobile and won't let me zoom. But that looks great.
Anson ago
What tool did you use to copy this stuff?
erktheerk ago
The script is included. It's called timesearch.py
erktheerk ago
Nice. I would definitely be interested in using your work in the future, especially if you add search and pages
OldShepThePirate ago
Saved. Just in case.
larry_b ago
Please forgive my ignorance, but since I'm a bit illiterate in this and I'd like to help, I have a couple of questions:
Which file should I download?
If something happened (e.g., /v/pizzagate dies or whatever), what could/should I do with the file I downloaded?
Thank you!
erktheerk ago
Pizzagate.zip
I have no experience with Voat or the community.
erktheerk ago
If you look in the Redmash folder and view the HTML file it will have a link that says "comments". This will take you to the threaded view of that post.
Joseph-Guillotin ago
/u/erktheerk Thank you very much for doing this! We'll never know the extent, but you may have saved many children's lives. One question: are you sure you crawled everything? It seems like we would have had a bigger DB than 16 MB altogether. Did you back up all threads/comments that ever appeared on that sub, or only within a specific timestamp interval? What about vote counts? Thanks for the info.
erktheerk ago
All of it. It's all essentially just text, so 16 MB is about right for a little over 2,000 posts plus their comments. About half of the posts only had 1-3 comments.
Retainedbylucifer ago
Shameless request for upvotes to spread the voting power
erktheerk ago
What are you trying to do? Have you looked at the timesearch.py script?
EDIT: Never mind, I just noticed you already made a comment on it.
If you PM me I might be able to help or at least ask the dev who wrote it for me. It's been a growing project for several years now and has gotten more and more complex. Even I need assistance most of the time when things go wrong.
Stellaris ago
For when he takes it down, here are upload mirrors: http://multimirrorupload.com/nniqzecm9513/pizzagate.zip http://www.mirrorupload.net/file/0WT2RCIL/#!pizzagate.zip
RebelSkum ago
Absolutely painful to download, but some still work.
erktheerk ago
Thank you.
erktheerk ago
What do you mean you cannot save them?
zkqv5djz6rfifreq ago
Please release a torrent if you need more people to have it.
erktheerk ago
Another user created one, but it is stalled. I will make one tomorrow if it does not complete, but I will delete it from my server either way. I do not want to be responsible for hosting this. I'm just doing this because I could and was in the right place at the right time.
Isis_Pizza ago
Good stuff, everybody keep this on hand. The more people that have it, the harder it will be to erase from the world.
erktheerk ago
That's exactly where it is hosted now. I am putting a limit on it because I have not personally vetted the info. I do not want to be responsible for hosting it permanently.
erktheerk ago
Awesome. Can you link me to that when you do? Here or on Reddit, same username.
AugmentedDragon ago
You, sir, are a goddamn hero. While I remain hesitant about this whole pizzagate thing, the fact that the subreddit and a bunch of others just got banned like that is definitely bad for free speech in general. You having backed it all up proves to me that this is information that someone doesn't want us to see and that we need to keep it out there for the sake of everyone involved
redsunfex ago
2,294 posts, correct? Excellent job!
erktheerk ago
Yes. Or view the HTML files for the next 24 hours before I remove it.
erktheerk ago
Appreciate it.
SelfReferenceParadox ago
Gonna archive this, maybe make a torrent.
erktheerk ago
I have two seedboxes. Send me the magnet link if you do. I'll seed it.
SelfReferenceParadox ago
Deal.
erktheerk ago
Huh...let me check the zip. Worked for me earlier.
Also I am only a hobbyist in webdev. It's hacked together. Any help is gladly accepted.
erktheerk ago
I know. It's a VPS paid for with bitcoin. If you actually see data please let me know but I do not think there is anything directly connected to my IRL info.
PerusingTheOoze ago
Bitcoin is used on the Deep DarkWeb for illegal purchases. Brock Pierce ( who was a business associate of Marc Collins-Rector and Bryan Singer) is on the board of directors for BitCoin.
WorldPeace ago
Do you all know the guy who invented Bitcoin is involved? https://youtu.be/Gi6ryNOg8z0
Millennial_Falcon ago
I tried to open a link on the date-sorted page, and it went to a Reddit "forbidden" page.
erktheerk ago
That means it was a self post. You have to click on the comment link to see those.
BoycotReddit ago
Excellent job!!
P2 ago
Thanks. It was genius. I would like to see SPEZ's face and whoever paid the money
playzfahdayz ago
Epic, well done Sir.
MichaelWesten ago
Here is an archive copy that will run offline: https://voat.co/v/pizzagate/1428191
erktheerk ago
The zip I have on the page will also run offline. But thanks for sharing. I am only hosting it for a set time.
MichaelWesten ago
This version I linked preserves formatting. Not sure if that's important to folks, but hey.
erktheerk ago
Mine does too, and it can sort in multiple ways. I have also collected all comments in the original threaded format, sorted by karma.
ThrowThisAwayPi ago
Wow! Stellar job! Thank you very VERY much!
justice4all ago
Thank you. We all appreciate your hard work.