
totesgoats908234 ago

Downloading a website like Voat can be tricky without it taking up a huge amount of space. I have a web crawler that I've used to archive sites that use an MVC-style framework to generate their content, while trying to take up as little space as possible. I'll give it a shot and let you know if it works here.

If I were to just recursively download the whole subverse with HTTP GETs, it would need a lot of extra space. Google says there are about 7,600 pages in this subverse, and downloading one generates ~800KB, so doing it this way you would need ~6GB of space to capture what is here to date.

If you use Google to restrict the search to a single day (I used the 14th), it doesn't show the total number of pages at the top, but there are 10 results per page across 32 pages, so estimate ~300 new posts a day. At ~800KB each, that means ~240MB of daily delta snapshot data you will need space for.

This is just my guesstimate, using Google and the size of a single page download as a rule of thumb. I'm not 100% sure how accurate a single-day search is for estimating pages per day, but meh.
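The back-of-envelope math above can be checked with a quick script. Note that all three inputs (page count, per-page size, posts per day) are the rough figures from Google, not measured values:

```shell
#!/bin/sh
# Rough sizing estimates; all inputs are guesses, not measurements.
PAGES=7600         # total pages Google reports for the subverse
KB_PER_PAGE=800    # approx. size of one downloaded page
POSTS_PER_DAY=300  # approx. new posts per day (from a single-day search)

TOTAL_MB=$((PAGES * KB_PER_PAGE / 1024))          # full archive
DAILY_MB=$((POSTS_PER_DAY * KB_PER_PAGE / 1024))  # daily delta

echo "full archive: ~${TOTAL_MB} MB"   # ~5937 MB, i.e. roughly 6GB
echo "daily delta:  ~${DAILY_MB} MB"   # ~234 MB, i.e. roughly 240MB
```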

Other than that, security is a concern. I would put it behind a hidden service to make it harder to take down.
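As a rough sketch, a Tor hidden service only needs a couple of lines in torrc; the directory path and local port below are placeholders, assuming the mirror is served by a local web server on port 8080:

```
# /etc/tor/torrc -- publish the local mirror as a hidden service
HiddenServiceDir /var/lib/tor/voat_mirror/
HiddenServicePort 80 127.0.0.1:8080
```

After restarting Tor, the generated .onion hostname appears in the `hostname` file inside HiddenServiceDir.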

Pizzafundthrowaway ago

Thank you so much for your willingness to help. Someone is already working on this, but he needs help. Can you follow the link in the updated OP and see what you can do? I'm willing to split the final $50 between the original developer and anyone else who helps him.

totesgoats908234 ago

To be fair, whoever is creating that backup script you linked to is not familiar with software development. It is just a one-off shell script: it has no functions, no flow control, and no error handling, so it will be prone to breaking and won't be reliable. No offense intended, but it is a hacked-together bash script. I'd prefer to use an existing, well-developed tool like HTTrack ( https://www.httrack.com/ ), which can be compiled for Linux (what all web hosts will be using). It already has built-in support for updating changed pages and is easy to script for a cron job.
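For illustration, scheduling HTTrack from cron could look something like the following; the URL, output directory, and schedule are placeholders, and the flags are HTTrack's standard mirror/update options:

```shell
# Initial mirror (run once): crawl the subverse into a local directory,
# limiting the crawl to that path with a filter.
httrack "https://voat.co/v/pizzagate" -O /srv/mirror/voat \
    "+voat.co/v/pizzagate/*"

# Cron entry (add via crontab -e): refresh only changed pages nightly
# at 03:00. --update re-runs the existing mirror without prompting.
0 3 * * * httrack --update -O /srv/mirror/voat
```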

I'm willing to set this up if you use all the mentioned funds to purchase anonymous hosting; I won't ask for payment to set up and secure the server. I noticed that another member offered to set up a clone with a database backend; if you don't hear back from them, I will offer to set up a simpler backup using HTTrack. I can set up a prototype on a spare Linode server that I have if you PM me from your throwaway account.

I found other Voat-specific projects on GitHub for backing up Voat posts and comments ( https://github.com/voatarc/voatarc ), but setting up such a replica would take more work than what I will do for free. The source code for Voat itself is also available for anyone to fork ( https://github.com/voat/voat ). There are also existing scrapers for copying a website ( https://github.com/guillemhs/ScraperBot ), which is a better solution than the other user's approach of creating a one-off shell script for backup and replication.

Test using httrack to replicate: https://i.sli.mg/6RZAhv.png - https://i.sli.mg/XvfaLT.png - https://i.sli.mg/SJj64X.png - https://i.sli.mg/OxsHpB.png

Pizzafundthrowaway ago

Thank you very much for the suggestion. I still haven't issued any payment yet. I'm learning about how to use bitcoin (my first time). I'm also learning about the alternative suggestions people have made.

Please, can you guys work together on this? With all the censorship and content takedowns, I'm really afraid this site is in serious jeopardy.

I'm still learning to pick the best option, but if you and the others who offered their services can arrive at a consensus about how to tackle this, and deliver proof of concept, I'll double my contribution.

Teamwork, please. And consensus. We can do this!