Hello people. I am the person behind the GitHub repos that were originally on the reddit collab tools list, are now on the sidebar, and have been getting a lot of downloads/views. I was running into an issue with saving Voat posts on this subverse since there are so many coming in daily. I noticed that as a community we fell behind on archiving every post to archive.is. My guess is we were only archiving about 30% of them. The top posts were getting archived, but a few diamond-in-the-rough posts fell through the cracks. Another big issue I ran into when trying to retrieve older posts is that the Voat admins recently disabled pagination past page 19. There is a lot of talk about it on /v/voatdev/ and it may get restored. The API is also not ready for production use, so I was not able to get a key. I am also working with one of the people on /v/voatdev/ to get a full backup of the older posts, so that we can be sure 100% of the data is backed up all over the world and on multiple sites.
The bot will go through pages 1-19 of /new every day on a cron job and make a folder for that day, then push to the git repos once done. Every HTML page will be downloaded with wget and saved under its post ID in that day's posts folder. There is also a file called ids.txt in each day's folder that holds the unique post IDs. Each post will also be automatically archived at archive.is through a POST request.
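Roughly, the daily job boils down to something like this (a simplified sketch, not the exact script; the archive.is submit endpoint here is from memory, so double-check it before reusing):

#!/bin/sh
# Make a dated folder like archives/2016-12-12-18:09:01/posts
DAY=$(date +%F-%T)
mkdir -p "archives/$DAY/posts"
# Collect the unique post IDs from pages 1-19 of /v/pizzagate/new
for page in $(seq 1 19); do
    curl -s "https://voat.co/v/pizzagate/new?page=$page" \
        | grep -oE '/v/pizzagate/[0-9]+' | grep -oE '[0-9]+' \
        >> "archives/$DAY/ids.txt"
done
sort -u -o "archives/$DAY/ids.txt" "archives/$DAY/ids.txt"
# Save each post's HTML under its ID and ask archive.is to snapshot it (POST request)
while read -r id; do
    wget -q -O "archives/$DAY/posts/$id.html" "https://voat.co/v/pizzagate/$id"
    curl -s -d "url=https://voat.co/v/pizzagate/$id" https://archive.is/submit/ > /dev/null
done < "archives/$DAY/ids.txt"
# Push the day's folder to the git repos
git add archives && git commit -m "backup $DAY" && git push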
One thing I discovered last week about http://archive.is/https://voat.co/v/pizzagate/* is that they also have pagination issues. If someone could send an email about this issue to [email protected] I would really appreciate it. Make sure to post below that you sent an email so the person does not receive multiple. We should request to be able to view all of them, and point out that 950-1000 is not enough. The good thing, though, is that posts are archived even when they don't show up in the pagination (I checked with a few older posts). As long as we have all the post IDs we can easily backtrack. I am going to try to create a master-post-ids.txt file in the main folder of the repo that will hold every post ID ever posted here. I brought this up just so you are all aware.
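Since every daily folder already has an ids.txt, building that master file should be a one-liner along these lines (assuming the archives/ layout described above):

cat archives/*/ids.txt | sort -u > master-post-ids.txt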
NOTE: PLEASE STILL USE ARCHIVE.IS, BECAUSE WE NEED TO BACK UP POSTS WITH MULTIPLE SCREENSHOTS, SINCE PEOPLE ADD COMMENTS, DELETE COMMENTS, ETC. THE BOT WON'T BE ABLE TO GET THE NEWEST ACTIVITY, SO PLEASE KEEP ARCHIVING WHEN POSTS GET COMMENTS. ALSO KEEP SAVING POSTS LOCALLY. DO NOT RELY ONLY ON ME AND MY BOT.
Here are the repos: https://github.com/pizzascraper/pizzagate-voat-backup https://gitlab.com/pizzascraper/pizzagate-scraper
TO DO: Need to figure out CSS/JS/IMG assets. Viewing the HTML posts locally currently doesn't load any stylesheets/scripts/images, since the URLs in the HTML files are not absolute, so the pages look pretty plain. This is not critical and can always be fixed later; what is important is preserving the data. If you have an idea on how to fix this, please file an issue or comment here. Also, if you have any suggestions or ideas on how to improve this, please let me know. I really appreciate all the help I can get.
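One stopgap I might try is injecting a <base> tag into each saved page so the browser resolves the relative URLs against the live site. Untested sketch:

sed -i 's|<head>|<head><base href="https://voat.co/">|' archives/*/posts/*.html

That would make the saved pages render properly, but the assets would still load from voat.co rather than locally, so it doesn't help if the site itself goes down.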
Can be cloned:
git clone https://github.com/pizzascraper/pizzagate-voat-backup.git
or
git clone https://gitlab.com/pizzascraper/pizzagate-scraper.git
Non-tech users can download it by going to https://github.com/pizzascraper/pizzagate-voat-backup/archive/master.zip.
Pizzafundthrowaway ago
@gittttttttttttttttt @adam_danischewski @pizza_gator @bikergang_accountant @freetibet @totesgoats908234 @Ivegotredditcancer @ParadiseFaIIs @pembo210
Okay, guys. I've tagged everyone who has expressed an interest in, or an ability to contribute to, this effort. My offer of $200 in bitcoin for delivering a solution by consensus stands. Please work together and propose a solution.
In the meantime, I'm hitting a wall with this bitcoin thing. I've looked at bitcoin.org and CORE. I found the download, but I'm having trouble figuring out how to get my hands on a signed copy of the download for the wallet. I am utterly clueless about cryptography. I followed the guides, but for some reason, my stupid brain can't figure out how to validate the signature of any wallet download. Is there a complete idiot's guide for how to do this? I have googled my ass off, but haven't been able to find a simpleton pleb's guide to get started. Sorry for my ignorance, and thanks in advance for any pointers!
pizza_gator ago
Look into using Electrum as your wallet instead of Core. I think it's more straightforward, plus you don't have to wait for the blockchain to sync.
Pizzafundthrowaway ago
I got Electrum, but buying bitcoin anonymously seems virtually impossible. Every exchange has a ridiculous privacy policy. Is there a way to buy bitcoin anonymously?
pizza_gator ago
localbitcoins.com would be the way to go
bikergang_accountant ago
For the wallet just use Electrum. You don't need a full client like Core.
Pizzafundthrowaway ago
I got Electrum. Now how do I buy bitcoin anonymously? The exchanges I found are asking for too much personal information and demand that I allow them to share my info with just about anyone they want.
bikergang_accountant ago
You can clean coins. You can also use LocalBitcoins. They ask for information, but it's optional.
The exchanges that take your information have the lowest fees, and then you can clean the coins in a casino. You just set your risk to almost zero (that's what the casinos are actually for, and they are cryptographically provably fair).
The other factor is that bitcoin is pretty fucking anonymous. The address the exchange pays you with is recycled over and over, and you can set your Electrum client to behave more anonymously.
Let's put it this way: a VPN puts your IP into a certain geography. The FBI could see that 99% of your traffic is going to that geography and a single IP, they could see traffic increasing and decreasing on your end, and they could see similar patterns out of the VPN to certain sites. Bitcoin, even without cleaning, is orders of magnitude more anonymous than a VPN. You could even pay random amounts into change addresses.
Third option: Dash is like bitcoin but designed from the ground up to be anonymous. You could use something like ShapeShift or even one of those exchanges to go bitcoin -> Dash -> bitcoin and it will be clean. Or USD -> Dash -> bitcoin would save you a transaction. Or you could pay someone in Dash.
Pizzafundthrowaway ago
Zip download link is broken
gittttttttttttttttt ago
Hey, thanks. GitHub suspended the repo. A new repo has been created and a second zip URL, on GitLab, has been added to the post. Everything should work now.
Pizzafundthrowaway ago
@kingkongwaswrong @millenial_falcon
Can we get this thread stickied?
gittttttttttttttttt ago
@kingkongwaswrong Sidebar would probably be better
Pizzafundthrowaway ago
As a token of gratitude for your hard work, I'm going to give you $50 USD worth of bitcoin.
To support further development and completion of the project, I'm offering an additional $50 for you (or for a team to split).
Send me a PM for how you would like to receive payment.
gittttttttttttttttt ago
Going to DM you. Thanks kindly.
soundbyyte ago
Would there be a way for you to send me the folders for each day, and then the older posts once you're able to archive those, just so I can skim through and see if anything external needs to be archived? As you mentioned, a lot of the daily posts do fall through the cracks. Plus, the more people who have the data, the more people we can share it with, and the less likely it is that anyone can shut us down.
gittttttttttttttttt ago
Hey, all external links will be archived automatically. Just did 1600 of them, super fast, from today.
soundbyyte ago
Awesome, that's fantastic to hear!
gittttttttttttttttt ago
A new folder just got added about an hour ago: https://github.com/pizzascraper/pizzagate-voat-backup/tree/master/archives/2016-12-12-18:09:01/posts . I will archive the older posts very soon. Are you familiar with git? You could git pull once a day. You could even do it automatically with a cron job like this:
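# Run crontab -e and add a line like this (adjust the path to wherever you cloned the repo):
0 6 * * * cd /path/to/pizzagate-voat-backup && git pull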
I will try to add an external link archive feature soon. Backing up the data would be great. If you are not familiar with git, I recommend watching a YouTube tutorial or two. All you really need to learn is git clone and git pull.
Thanks for wanting to help!
8_billion_eaters ago
And apparently an upvoat bot, too.
gittttttttttttttttt ago
Lol no I didn't. The post did get a lot of upvotes though.
MAGAphobia ago
Black people could never do this.
VictorSteinerDavion ago
This is awesome work, thank you for the valuable effort!
IWishIWasFoxMulder ago
Everything you've just done is going to make it at least 100 times more difficult to take down this site and this subverse. You are weaponized autism at its finest and you inspire me to be a better autist, so thank you. This is what web and Silicon Valley people mean when they talk about redundancy, in the truest sense. Is there any way for you to back up videos? I'm trying to figure out a way we could back up James Alefantis' short film from Sundance that appears to be on Vimeo at the moment.
gittttttttttttttttt ago
Haha thanks :)
The best way to back up videos, and I encourage you to do so, is to use a command line tool called youtube-dl. There are ways to download many videos by specifying a keyword (you can check the readme for all the options), and it can also easily download from Vimeo, etc. If you have bandwidth and extra space on your machine then this would be a great idea. Then what I would recommend is finding some smaller video sites with less security/bot checking built in and figuring out how to send POST requests, or creating a bot/macro to upload all the videos as mirrors. If this sounds doable to you and you just need a little help to get it going then DM me.
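For example, something like this (off the top of my head, so double-check the readme; the Vimeo URL is just a placeholder):

# Download a single video (works for Vimeo, YouTube and many other sites):
youtube-dl https://vimeo.com/VIDEO_ID
# Search YouTube by keyword and download the first 20 results:
youtube-dl "ytsearch20:your keyword here"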
derram ago
https://ghostbin.com/paste/xreqq
That's the stuff on the frontpage, just use this and go through the links until you have everything: https://gitgud.io/derram/arcitall
Make sure to copy the output somewhere between runs 'cause it gets overwritten.
PuttItOut ago
I might be able to help you with archiving tasks. Shoot me an email at hello @ voat and I'll see what I can help with.
gittttttttttttttttt ago
Thanks kindly. Just sent you a DM.
Votescam ago
I don't mean my comment to suggest that you contributed to the current system -- which I think was very well intended, but somewhat confusing. Thank you for whatever you've done to improve things here.
A few questions about what is meant exactly by "bots": are you saying that there are no humans involved in posting for CTR? They've allegedly got $1 million for their propaganda, but they don't actually want to pay a living person with an actual brain to post for them? Are they as obsessed with money as we think they are?
Why all the gimmickry here with upvotes and downvotes anyway, which you have to earn elsewhere... because someone on the website wanted to encourage people "to post"? People post when the articles are interesting and when they have a valid comment. I don't think you want comments just for the sake of comment. Why would you? Drop the gimmickry and show the names of people upvoting -- and when it comes to downvotes, insist that they be accompanied by a human/intellectual comment. In other words, steal some good ideas on this from other websites. :)
Julian55 ago
Dude, seriously, thank you.
SaneGoatiSwear ago
y'all need to meet @derram yo bro pizzagate's made a backup bot and auto-archiver!
y'all should be voat goat buddies!
then some other goat can draw you guys all working hard at investigating and preserving the data well and condensing it into awesome meme magic to bring the truth to the world!
justiceforever ago
YOU DA REAL MVP
Pizzatemp420 ago
Wow, thank you so much for your work!
Edit: Removed my previous comment because after finally getting home and being able to read the entire post, I realized what the bot's limitations were, and my post became completely irrelevant.
hedy ago
This is amazing work. Highly appreciated.
wecanhelp ago
Thank you so much for your work, this project is a huge relief for the community.
As for the assets: Voat seems to be using relative URLs. So when you archive a given page, could you parse (or grep) the HTML for <script /> and <link rel="stylesheet" /> tags pointing to the static assets requested by the page, and make an up-to-date copy of those assets every day, maintaining the folder structure as found in the src/href attributes? That way, when opening one of the .html files locally, the browser would look up the appropriate local copies of the scripts and stylesheets, and load them.
I'm sure this is overly simplistic, and problems will arise as you go, specifically with assets that are loaded on the fly. But do you see a problem with the initial logic?
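A rough sketch of what I have in mind (untested, and assumes the asset paths are root-relative):

# Collect every root-relative script/style/image path referenced by the
# saved pages, then mirror each one locally with the same folder structure.
grep -ohE '(src|href)="/[^"]+\.(js|css|png|jpg|gif)"' posts/*.html \
    | sed -E 's/^(src|href)="//; s/"$//' | sort -u \
    | while read -r path; do
        mkdir -p ".$(dirname "$path")"
        wget -q -O ".$path" "https://voat.co$path"
    done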
Edit: Have you tried
wget -r -p -nc "https://voat.co/v/pizzagate/new?page="{1..19}
? Theoretically, this should download each page with all of its assets, and prevent wget from overwriting a file that already exists. Now, there seems to be a question as to whether wget will actually skip the superfluous HTTP request when a given file already exists, or carry out the request regardless and simply discard the duplicate. The latter behavior would, of course, result in a lot of unwanted traffic, but if the former is the case then this could be a good starting point.
gittttttttttttttttt ago
Thanks for the pointers. Will give this a go in a little and test it out.
Sonic_fan1 ago
If anyone wants to try this on their end, another interesting one to try is WebHTTrack... it can be set to recursively follow links, it'll handle CSS and all that (at least, it used to... haven't used it for a while), and it can be set to follow links however many levels deep you want. I've used it to fully download a friend's website, and it'll even change links from absolute (http://me.com/img/1.jpg) to relative to the folder structure (foldername/img/1.jpg), and it'll give you a browsable site. But what you have now is awesome! If you don't have to change it, don't. Everyone is right, having any sort of complete archive of all this is the biggest thing, even if someone who looks at it has to wade through a little HTML. Thumbs up
Also, I don't know if it's possible... maybe have the bot just sit and monitor the site for any time something changes, because if the site has everything after page 19 disabled and we have a busy day around here, important stuff might get bumped off by the newer stuff. Maybe have the bot compare the front page of 'New' to the last archived page of 'New' (dates, or maybe thread titles) and if there's any difference, have it just slurp down the newest posts or pages or something.
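Something like this could run every few minutes as a crude change detector (just an idea, untested; run_backup.sh stands in for whatever the scrape script is actually called):

# Fetch the current front page of /new and compare it to the last snapshot;
# if anything changed, kick off a fresh scrape.
curl -s "https://voat.co/v/pizzagate/new" > new_now.html
if ! cmp -s new_now.html new_last.html; then
    ./run_backup.sh
fi
mv new_now.html new_last.html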
And, this would be a great use of that old DLT4 tape drive someone has sitting around (which reminds me, I should get that tower from ma's at some point)... 800gigs on a single tape, as long as I could get Windows to recognize it... nightly or weekly backups of the GitHub. And, I know it's possible to get a tapedrive working under Win10 (I have a Travan 10/20 that works for backup, uses ZDatDump Free... software limits to 12gigs backed up without paying, but it works).
iceboob ago
Nice work, but I think it would be easier to have the bot back up each post to archive.is. They'll take care of the assets and you can build a database of voat posts -> archive.is links. Just my 2 cents.
wecanhelp ago
He's already POSTing the Voat URLs to archive.is for automatic backup as a "side effect", but for redundancy it is important that we don't rely only on archive.is. If it gets compromised and we only have a database table with Voat posts pointing to archive.is links then we're screwed. That's why local (and redundant) copies are key.
CrackerJacks ago
@gittttttttttttttttt Will your new thing automatically back up everything in this sub > https://voat.co/v/pizzagate/1459779 ? Just asking as some of it may be important later.
Edit: Thanks for what you've done.
gittttttttttttttttt ago
No it won't. Will try adding something like that in the future. Keep archiving all the external links.
gittttttttttttttttt ago
Hey, all external links will be backed up now.
Htaed ago
Very nice.
PizzaGate-Is-Real ago
Archive of this Voat post and the GitHub repository landing page for good measure:
http://archive.is/https://voat.co/v/pizzagate/1479493
http://archive.is/5SgDJ
RedGreenAlliance ago
Very diligent, great to see people use specialised skills for the cause. Hat is duly tipped
THE_LIES_OH_THE_LIES ago
Awesome. Thank you so much.
VieBleu ago
Thank you thank you thank you.
One thing - I have a Skillz page, limited though it is, and I add to it all the time. For extreme newbies and others, could you please explain, step by step, exactly how to archive a post? I will put it on the page. Also, I honestly don't know how myself. Thanks.
gittttttttttttttttt ago
1. Copy the URL you want to back up.
2. Go to archive.is.
3. Paste the URL and press "save the page".
4. It is then archived online at archive.is.
5. You can also save it locally by pressing Ctrl+S, or by clicking the screenshot tab, right-clicking the image, and saving the image.
VieBleu ago
Why are we so sure of archive.is? Surely it can be compromised?
gittttttttttttttttt ago
It can be, you are absolutely right. We aren't 100% sure and never can be. That is why this bot is backing up locally too, and there are git distributions all over. Anything you back up to archive.is you should also save locally. We know archive.org is compromised; archive.is has been good to us so far.
disclosuretimes ago
Thank you so much!
gittttttttttttttttt ago
I am glad I can help. Thanks for contributing to the sub.
(note to self - check this person's posts for Ted Gunderson data)
disclosuretimes ago
I'll do some research into the CF after my exams. We are going to reveal this.
jbooba ago
Thank you!
gittttttttttttttttt ago
Sure thing. Thanks for your work and posts. Please keep contributing to the sub and I will do my best coding/tech-wise to help out. We all have different skills, so it's important we utilize them to help save the children!
stunknife ago
This is HIGHLY appreciated. I've also noticed many good threads that weren't archived got buried. This definitely helps in case threads start disappearing on here.
gittttttttttttttttt ago
Yes, agreed. There is so much data it's going to take a LONG TIME for us all to go through it. Preserving it now is crucial. We need to make sure we keep checking the new section so stuff is less likely to slip through.
Normality1 ago
THANK YOU for your work!
gittttttttttttttttt ago
Sure thing. I try to contribute where I can. Thank you! I really appreciate all of you on here doing research/upvoting/downvoting etc. It is hard work.
Mooka_Molaka ago
Wow, Thank You so much for all of your work on this! I wish I knew how to do things like this, or even assist somehow.
I'm going to share something with you & if it's dumb/doesn't apply please let me know & I'll delete it or update it with correct info. So here goes ~
Back in the first week of October 2014, #GamerGate was about 5-6 weeks old. Much of the same gusto, comradeship & determination for research & evidence was flowing through us like we have here.
I can't remember who it was atm, but he had been putting together a massive amount of research & notes etc. on GitHub to make it that much easier to access info, add new stuff etc. It was great! Until Jake Boxer heard about it. Let me first say that there was NO dox(x) information on it, nor ANY type of "attack plans", nor anything encouraging violence. In fact there was absolutely NOTHING that broke any TOS or site rules, but most SJWs will bend over backwards to do some serious white knighting & scream to their social circles & networks about what wonderful people they are as they fuck over anyone who disagrees with them. They will treat us like "human garbage", target us, label us all the usual #RACIST! #SEXIST! #MISOGYNIST! etc., etc. Whatever it takes to be Top Virtue Signaler of the Week!
Ok, sorry. I didn't mean to write so much & go off about those tools. I guess I'm still burnt & disgusted by them! But anyway, the important reason that little history lesson re: #GamerGate matters is that OUR ENTIRE GITHUB WAS DELETED WITHOUT WARNING! By a virtue signaling white knight extraordinaire.
https://haegarr.wordpress.com/2014/10/04/github-deletes-repo-because-he-personally-doesnt-like-it/
I would hate to see something like this happen again, and I worry you might lose all of the hard work you put into this. I don't have a plan or an answer for how to know if it's coming or what to do if it happens; I just saw your post & felt it was important to let you know what has happened before, when certain people aren't happy with your personal opinions & may just trash your work without a care & without warning or a way for you to have a backup made.
I hope I'm wrong & there won't be any issues. But just in case I wanted you & the rest of us who care about #PizzaGate to be aware of the possibility.
💖God Bless You & Thank You 💖 for all of your efforts towards exposing the heinous crimes that are #PizzaGate
Here are a few more write-ups about the entire #GG GitHub being deleted (this isn't my comment permalink from KiA, but I felt that just copypasta-ing each of the links would be like taking credit for their post ~ I hope that's ok ^_^):
https://www.reddit.com/r/KotakuInAction/comments/3fq180/github_history_one_tweet_one_lie_and_a_gamergate/ctr8j22/
http://adland.tv/adnews/gamergate-op-deleted-github-official-reply-why/1331743980
http://gamergate.wikia.com/wiki/Github
https://gitgud.net/gamergate/gamergateop/tree/master/Current-Happenings#-oct-3rd-friday
https://archive.is/uypn2 (Pipedot.org thread)
http://facepunch.com/showthread.php?t=1421478&p=46148933&viewfull=1#post46148933
http://theralphretort.com/github-censors-gamergate/ (yea it's Ralph but it's relevant)
http://i.imgur.com/DeNiCiO.png (Jake Boxer tweets)
http://www.reddit.com/r/KotakuInAction/comments/2i8jzr/the_gamergate_github_was_deleted_due_to_a_github/
https://archive.is/QkyiB
http://www.reddit.com/r/KotakuInAction/comments/2i85z1/update_what_just_happened_to_the_github_repository/
http://www.reddit.com/r/KotakuInAction/comments/2i8wra/github_confirms_that_its_deleted_the_gamergate/
Also there's still the case of an alleged GitHub employee trying to snoop around using the email address of a user who tried to contact them.
https://www.reddit.com/r/KotakuInAction/comments/2i8zwe/after_emailing_github_this_guy_got_his_email/
http://www.twitlonger.com/show/n_1sce3fa
https://archive.is/uaaYR
Another mature GitHub employee: https://archive.is/FDDWi
gittttttttttttttttt ago
Thanks for the heads up. I know all about the SJW problem at GitHub. I am just using GitHub because it gets the most traffic and ranks very well in search engines. I have the repo on several other providers so we don't have a single point of failure with GitHub.
Mooka_Molaka ago
Excellent ^_^
Wellwerefucked ago
It's massively appreciated.