Hello people. I am the person behind the GitHub repos that were originally on reddit collab tools and are now on the sidebar, and that have been getting a lot of downloads/views. I was running into an issue with saving voat posts on this sub-verse since there are so many coming in daily. I noticed that as a community we fell behind on archiving every post to archive.is. My guess is we were only archiving about 30% of them. The top posts were getting archived, but a few diamond-in-the-rough posts that fell through were not. Another big issue I ran into when trying to retrieve older posts was that the voat admins recently disabled pagination past page 19. There is a lot of talk about it on /v/voatdev/ and it may get restored. The API is also not ready for production use, so I was not able to get a key. I am also working with one of the people on /v/voatdev/ to get a full backup of the older posts, so that we will for sure have 100% of the data backed up all over the world and on multiple sites.
The bot will go through pages 1-19 of new every day on a cron job and make a folder for that day. It will then push to the git repos once done. Every HTML page will be downloaded with wget and saved under its post ID in the posts folder for that day. There is also a file called ids.txt in every day folder that will have the unique post IDs. Each post will also be automatically archived at archive.is through a POST request.
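For anyone who wants to picture the daily folder layout, here is a rough Python sketch of it. This is not the bot's actual code. The folder name format (YYYY-MM-DD), the .html extension, and the function names day_folder, save_post and write_ids are all assumptions made up for illustration. Only the posts folder and ids.txt come from the description above.

```python
from datetime import date
from pathlib import Path


def day_folder(base: Path, day: date) -> Path:
    """Create and return the folder for one day's run, including posts/.

    Assumption: the folder is named after the ISO date (e.g. 2016-12-01).
    """
    folder = base / day.isoformat()
    (folder / "posts").mkdir(parents=True, exist_ok=True)
    return folder


def save_post(folder: Path, post_id: str, html: str) -> Path:
    """Save one downloaded page under posts/, named by its post ID.

    Assumption: the file gets an .html extension.
    """
    path = folder / "posts" / f"{post_id}.html"
    path.write_text(html, encoding="utf-8")
    return path


def write_ids(folder: Path, post_ids: list[str]) -> list[str]:
    """Write the unique post IDs to ids.txt, one per line, in first-seen order."""
    unique = list(dict.fromkeys(post_ids))  # drops duplicates, keeps order
    (folder / "ids.txt").write_text(
        "".join(f"{post_id}\n" for post_id in unique), encoding="utf-8"
    )
    return unique
```

Downloading the pages and the archive.is POST request are left out on purpose, since both depend on details not covered here.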
One thing I discovered last week about http://archive.is/https://voat.co/v/pizzagate/* is that it also has pagination issues. If someone could send an email about this issue to [email protected] I would really appreciate it. Make sure to post below that you sent an email so the person does not receive multiple emails. We should request to be able to view all of the archived posts, since 950-1000 is not enough. The good thing though is that posts are archived even when they do not show up in the pagination (I checked with a few older posts). As long as we have all the post IDs we can easily backtrack. I am going to try to create a master-post-ids.txt file in the main folder of the repo that will have every post ID ever on here. I brought this up just so you are all aware.
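One way the master-post-ids.txt file could be built is by merging the ids.txt from every day folder. The sketch below is only an idea of how that might look, not the bot's actual code. It assumes each day folder sits directly under the repo root and that ids.txt has one ID per line. The function name build_master_ids is made up for illustration.

```python
from pathlib import Path


def build_master_ids(repo: Path, master_name: str = "master-post-ids.txt") -> list[str]:
    """Merge every <day>/ids.txt under repo into one de-duplicated master file.

    Assumption: day folders are direct children of the repo root and each
    ids.txt holds one post ID per line.
    """
    seen: dict[str, None] = {}  # dict keeps insertion order and drops repeats
    for ids_file in sorted(repo.glob("*/ids.txt")):
        for line in ids_file.read_text(encoding="utf-8").splitlines():
            post_id = line.strip()
            if post_id:
                seen.setdefault(post_id, None)
    master = list(seen)
    (repo / master_name).write_text(
        "".join(f"{post_id}\n" for post_id in master), encoding="utf-8"
    )
    return master
```

Since a post can show up on new on more than one day, the same ID may appear in several day folders, which is why the merge de-duplicates.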
NOTE: PLEASE STILL USE ARCHIVE.IS THOUGH, BECAUSE WE NEED TO BACK UP POSTS WITH MULTIPLE SCREENSHOTS SINCE PEOPLE ADD COMMENTS, DELETE COMMENTS ETC. THE BOT WON'T BE ABLE TO GET THE NEWEST ACTIVITY, SO PLEASE KEEP ARCHIVING WHEN POSTS GET COMMENTS ETC. ALSO KEEP SAVING POSTS LOCALLY. DO NOT JUST RELY ON ME AND MY BOT.
Here are the repos: https://github.com/pizzascraper/pizzagate-voat-backup https://gitlab.com/pizzascraper/pizzagate-scraper
TO DO: Need to figure out CSS/JS/IMG assets. Viewing an HTML post locally currently does not load any stylesheets/scripts/images, since the URLs in the HTML files are not absolute, so it looks pretty plain. This is not critical as it can always be fixed later. What is important is preserving the data. If you have an idea on how to fix this, please file an issue or comment here. Also, if you have any suggestions or ideas on how to improve this, please let me know. I really appreciate all the help I can get.
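One possible fix for the asset problem is to rewrite the relative href/src URLs in the saved files into absolute ones. Below is a minimal Python sketch of that idea using urljoin from the standard library. It is not what the bot does today. The base URL https://voat.co/ and the function name absolutize are assumptions, and a regex is a rough tool for HTML, so treat it as a starting point only.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src='...' (but not data-src etc.), with either quote style.
ATTR = re.compile(r"""(?<![\w-])(href|src)=(["'])(.*?)\2""", re.IGNORECASE)


def absolutize(html: str, base_url: str = "https://voat.co/") -> str:
    """Rewrite relative href/src values so they point at base_url.

    Assumption: base_url is the site the pages were downloaded from.
    """

    def repl(match: re.Match) -> str:
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        # Leave empty values and in-page anchors alone.
        if not url or url.startswith("#"):
            return match.group(0)
        # urljoin returns URLs that are already absolute unchanged.
        return f"{attr}={quote}{urljoin(base_url, url)}{quote}"

    return ATTR.sub(repl, html)
```

Another route may be wget's own --page-requisites and --convert-links options, which download the assets and rewrite the links for local viewing, at the cost of a bigger repo.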
Can be cloned:
git clone https://github.com/pizzascraper/pizzagate-voat-backup.git
or
git clone https://gitlab.com/pizzascraper/pizzagate-scraper.git
Non-tech users can download a zip of the repo by going to https://github.com/pizzascraper/pizzagate-voat-backup/archive/master.zip .
Normality1 ago
THANK YOU for your work!
gittttttttttttttttt ago
Sure thing. I try to contribute where I can. Thank you! I really appreciate all of you on here doing research/upvoting/downvoting etc. It is hard work.
Mooka_Molaka ago
Wow, Thank You so much for all of your work on this! I wish I knew how to do things like this, or even assist somehow.
I'm going to share something with you & if it's dumb/doesn't apply please let me know & I'll delete it or update it with correct info. So here goes ~
Back in the first week of October in 2014, #GamerGate was about 5-6 weeks old. Much of the same gusto, comradeship & determination for research & evidence was flowing through us like we have here.
I can't remember who it was atm, but he had been putting together a massive amount of research & notes etc. on GitHub to make it that much easier to access info, add new stuff etc. it was great! Until Jake Boxer heard about it. Let me first say that there was NO dox(x) information on it nor ANY types of "attack plans", nor anything encouraging of violence. In fact there was absolutely NOTHING that broke any TOS or site rules, but most SJW's will bend over backwards to do some serious white knighting & scream to their social circles & networks about what wonderful people they are as they fuck over anyone who disagrees with them. They will treat us like "human garbage", target us, label us all the usual #RACIST! #SEXIST! #MISOGYNIST! etc, etc. Whatever it takes to be Top Virtue Signaler of the Week!
Ok, sorry. I didn't mean to write so much & go off about those tools. I guess I'm still burnt & disgusted by them! But anyway, the important reason that little history lesson re: #GamerGate matters is that OUR ENTIRE GITHUB WAS DELETED WITHOUT WARNING! By a virtue signaling Whiteknight extraordinaire.
https://haegarr.wordpress.com/2014/10/04/github-deletes-repo-because-he-personally-doesnt-like-it/
I would hate to see something like this happen again. I worry you might lose all of the hard work you put into this. I don't have a plan or answer on how to know if it's coming or what to do if it happens, I just saw your post & felt it is important to let you know what has happened before, when certain people aren't happy with your personal opinions & may just trash your work without a care & without warning or a way for you to have a backup made.
I hope I'm wrong & there won't be any issues. But just in case I wanted you & the rest of us who care about #PizzaGate to be aware of the possibility.
💖God Bless You & Thank You 💖 for all of your efforts towards exposing the heinous crimes that are #PizzaGate
Here are a few more write ups about the entire #GG GitHub being deleted; (this isn't my comment permanent-link from KiA, but I felt that just copypasta-ing each of the links would be like taking credit from their post ~ I hope that's ok ^_^).
https://www.reddit.com/r/KotakuInAction/comments/3fq180/github_history_one_tweet_one_lie_and_a_gamergate/ctr8j22/
http://adland.tv/adnews/gamergate-op-deleted-github-official-reply-why/1331743980
http://gamergate.wikia.com/wiki/Github
https://gitgud.net/gamergate/gamergateop/tree/master/Current-Happenings#-oct-3rd-friday
https://archive.is/uypn2 (Pipedot.org thread)
http://facepunch.com/showthread.php?t=1421478&p=46148933&viewfull=1#post46148933
http://theralphretort.com/github-censors-gamergate/ (yea it's Ralph but it's relevant)
http://i.imgur.com/DeNiCiO.png (Jake Boxer tweets)
http://www.reddit.com/r/KotakuInAction/comments/2i8jzr/the_gamergate_github_was_deleted_due_to_a_github/
https://archive.is/QkyiB
http://www.reddit.com/r/KotakuInAction/comments/2i85z1/update_what_just_happened_to_the_github_repository/
http://www.reddit.com/r/KotakuInAction/comments/2i8wra/github_confirms_that_its_deleted_the_gamergate/
Also there's still the case of an alleged GitHub employee trying to snoop around using the email address of a user who tried to contact them.
https://www.reddit.com/r/KotakuInAction/comments/2i8zwe/after_emailing_github_this_guy_got_his_email/
http://www.twitlonger.com/show/n_1sce3fa
https://archive.is/uaaYR
another mature github employee https://archive.is/FDDWi
gittttttttttttttttt ago
Thanks for the heads up. I know all about the SJW problem at GitHub. I am just using github because it gets the most traffic and SEO's very well. I have the repo on many other providers so we don't have a single point of failure with github.
Mooka_Molaka ago
Excellent ^_^