worldofmadness ago

Really? You decide to shut down one of the few censorship sites left on the net at this very pivotal moment in history where sharing true information is more important than ever? Strategic timing to shut down I would say. Kikes are definitely rubbing their hands like crazy over this announcement..

Muh-Shugana ago

We at least need a way to site-rip the text posts from this whole website; there's important shit stored here!

meowski ago

Just finished this Python scraper. Got my posts and comments all saved down with it.

Get it while you can.

https://github.com/fwet/voat_scraper

operation_wetvac ago

Thanks, and just in the nick of time.

Currently running it. I know this isn't the proper channel for bug reports, but on Windows at least you also need to:

pip install requests_html

Also, when using the dry-run option, you need to create/touch the following files first, as the script tries to read them before they exist:

comments_test.html
submissions_test.html
submission_test.html
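
On Linux (or WSL) the placeholders can be created in one shot; run this from the directory you launch the scraper in:

```shell
# create the empty placeholder files the dry run expects
touch comments_test.html submissions_test.html submission_test.html
```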

When doing a live run, the script tries to open its output files before creating the output directory:

commentsOutFilePath = outFolder + '/' + userName + '_comments_out.csv'
commentsOutFileHandle = open(commentsOutFilePath, 'a', newline='')
commentsWriter = csv.writer(commentsOutFileHandle)

I just moved these two lines to right before that section and it's working:

logging.debug("Creating output folder "+outFolder)
Path(outFolder).mkdir(parents=True, exist_ok=True)

meowski ago

Thanks, I updated the repo accordingly

MadWorld ago

There is a 100-page limit on how much you can save. To get around it on voat.co itself, you would have to start deleting submissions/comments as you go. A better way is to save from searchvoat.co instead, incrementally changing either the from or the to date parameter, depending on the order you save in. That lets users download all available submissions/comments without touching voat.co. The downsides are that voting scores are not preserved in the data, and you have to parse the timestamp at the end of each 100-page window.
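
As a rough sketch of that approach, the loop below just generates dated searchvoat.co URLs in fixed 30-day windows. The query parameter names (user, from) are assumptions, not verified against the site, and a real scraper would read the last timestamp on each fetched page to pick the next window instead of stepping blindly:

```shell
#!/usr/bin/env bash
# Sketch only: emit searchvoat.co URLs in 30-day windows (needs GNU date).
# The 'user' and 'from' parameter names are guesses, not verified.
start=2014-01-01
end=2021-01-01

d=$start
: > urls.txt
while [ "$d" \< "$end" ]; do
  echo "https://searchvoat.co/?user=MadWorld&from=$d" >> urls.txt
  d=$(date -d "$d + 30 days" +%F)   # ISO dates compare correctly as strings
done
```

Feed each URL to curl (with your session cookie, as below) and parse out the submissions/comments.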

It can all be done without Voat. I think the more pressing matter is to save users' saved and PM data:

https://voat.co/u/[username]/saved?page=[n]

https://voat.co/messages/private?page=[n]

https://voat.co/messages/sent?page=[n]

meowski ago

i added private message support so it saves inbox and sent messages

meowski ago

Oh look at that. that's unfortunate. guess i could make a destructive option to delete them and page through that way

MadWorld ago

I took the lazy option and curled from the terminal to fetch my private data with copypasta:

seq 0 99 | while read n; do curl -s -L -b "$cookie" -c "$cookie" "https://voat.co/u/MadWorld/saved?page=$n" > inbox.saved.page.$n.html; done;
seq 0 99 | while read n; do curl -s -L -b "$cookie" -c "$cookie" "https://voat.co/messages/private?page=$n" > inbox.private.page.$n.html; done;
seq 0 99 | while read n; do curl -s -L -b "$cookie" -c "$cookie" "https://voat.co/messages/sent?page=$n"    > inbox.sent.page.$n.html; done;

And did a global search-and-replace afterwards, to get the pages displayed properly:

rArray[0,0]='/css/min/edon-\(dark\|light\).min.css?v=1M_upzpM_U7-VQSviyTgGJuwxOW61ihjwLllwf5ChVk'
rArray[0,1]='https://web.archive.org/web/20201222002454/https://voat.co/css/min/edon-light.min.css?v=1M_upzpM_U7-VQSviyTgGJuwxOW61ihjwLllwf5ChVk'
rArray[1,0]='/css/ui-themes/autocomplete/autocompletebundle.min.css'
rArray[1,1]='https://web.archive.org/web/20201222152033/https://voat.co/css/ui-themes/autocomplete/autocompletebundle.min.css'
rArray[2,0]='/js/min/jquery.min.js?v=ALWrmkcUjo2DhFyYpbKmRHhILsFU9vtv2821SYosqdc'
rArray[2,1]='https://web.archive.org/web/20201222152033/https://voat.co/js/min/jquery.min.js?v=ALWrmkcUjo2DhFyYpbKmRHhILsFU9vtv2821SYosqdc'
rArray[3,0]='/js/min/bootstrap.min.js?v=W42kLKGGAhRDuqWgjj38bHc8ieQdOO4nB5n-NObv5Jc'
rArray[3,1]='https://web.archive.org/web/20201222124856/https://voat.co/js/min/bootstrap.min.js?v=W42kLKGGAhRDuqWgjj38bHc8ieQdOO4nB5n-NObv5Jc'
rArray[4,0]='/js/min/edon.min.js?v=cYc_QWv1N70th2CkxPW4E9Cg3pzwmbQYdwF0m4-xP1g'
rArray[4,1]='https://web.archive.org/web/20201222142014/https://voat.co/js/min/edon.min.js?v=cYc_QWv1N70th2CkxPW4E9Cg3pzwmbQYdwF0m4-xP1g'
rArray[5,0]='/host/voat/images/logo.png'
rArray[5,1]='https://web.archive.org/web/20201222172210/https://voat.co/host/voat/images/logo.png'
rArray[6,0]='/lib/signalr/signalr.min.js?v=1yGr-J4NjYMQVt6LP6vocjHIWhO-BFQLfdCs8x1Mv-U'
rArray[6,1]='https://web.archive.org/web/20201223034504/https://voat.co/lib/signalr/signalr.min.js?v=1yGr-J4NjYMQVt6LP6vocjHIWhO-BFQLfdCs8x1Mv-U'
rArray[7,0]='/js/edon.sockets.notifications.js?v=85dEE_C-FOOj9Rn55P7o6lG3s5cmV31C7Yd5KhBH6c0'
rArray[7,1]='https://web.archive.org/web/20201223034817/https://voat.co/js/edon.sockets.notifications.js?v=85dEE_C-FOOj9Rn55P7o6lG3s5cmV31C7Yd5KhBH6c0'
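
For anyone repeating this, the replace step can be scripted with sed. A minimal sketch using just the logo pair from the list above (bash associative arrays, GNU sed -i); it writes a one-line demo page first so the loop has something to rewrite, and you would extend rArray and the loop bound to cover all eight pairs:

```shell
#!/usr/bin/env bash
# Sketch: apply one of the search-and-replace pairs above with sed.
# Demo input: a minimal saved page referencing the relative logo path.
printf '<img src="/host/voat/images/logo.png">\n' > inbox.demo.page.0.html

declare -A rArray
rArray[0,0]='/host/voat/images/logo.png'
rArray[0,1]='https://web.archive.org/web/20201222172210/https://voat.co/host/voat/images/logo.png'

for f in inbox.*.page.*.html; do
  for ((i=0; i<1; i++)); do
    # '|' as the sed delimiter, since the URLs are full of '/'
    sed -i "s|${rArray[$i,0]}|${rArray[$i,1]}|g" "$f"
  done
done
```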

MadWorld ago

It only affects users with more than 100 pages or 2500 entries per message type.

> i added private message support so it saves inbox and sent messages

Much appreciated!! You are the hero Voat needs in its last days... Thank you!!

IslamicStatePatriot ago

WinKey + R, then run wsl

That launches the Windows Subsystem for Linux, so Windows users should be able to run this no problem.

Wonder_Boy ago

Can you build one that scrapes all the posts we've saved? To me, that's an even bigger deal. I want to get all the stuff others have posted that I've saved.

meowski ago

that didn't even cross my mind. i have never saved a post or used sets before either.

it would be easy to modify this to save saved posts since they're in the same format

Wonder_Boy ago

Bro, any luck with that code?

meowski ago

i added the private messages but didn't get to do the saved posts.

at this point i'd recommend saving those html files manually. i don't think i'll have time for that before tomorrow. got xmas family stuff going on

Wonder_Boy ago

It's really something I've wanted to do for a long time. I save people's posts & links all the time. There's a mountain of data & info on Voat that is frankly irreplaceable.

Do what you can, please.

Thanks!

StopTheEvilAgenda ago

Agreed

Muh-Shugana ago

The hero we need.