
anonOpenPress

1) Automated archiving & automated webpage screenshot

PizzagateBot

:)

I'm about 10k posts into a full archive of some 23k posts, which I found by crawling ~400k possible ID numbers and checking whether each one is a post. I'll upload it somewhere once it's done, and maybe seed a torrent of it. Because each WARC saves its own copy of the CSS, the total will be bloated, something like 8GB.
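The ID crawl boils down to "probe every candidate number, keep the ones that resolve to a post." Here's a rough stdlib-only sketch of that loop, not the script I actually ran. The URL template is a made-up placeholder, and it assumes the site answers 200 for real posts and an error status (like 404) for unused IDs.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def make_http_checker(url_template: str, timeout: float = 10.0) -> Callable[[int], bool]:
    """Build a predicate that says whether a candidate ID resolves to a post.

    url_template is a hypothetical placeholder, e.g. "https://example.com/post/{}".
    Assumption: the site returns 200 for real posts and an HTTP error status
    (e.g. 404) for unused IDs. Connection failures are not caught here.
    """
    def is_post(post_id: int) -> bool:
        req = urllib.request.Request(url_template.format(post_id), method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status == 200
        except urllib.error.HTTPError:
            return False  # 404 and other error statuses mean "not a post"
    return is_post


def find_existing_ids(candidate_ids: Iterable[int],
                      is_post: Callable[[int], bool],
                      delay: float = 0.0) -> List[int]:
    """Probe every candidate ID in order and return the ones that are posts."""
    found: List[int] = []
    for post_id in candidate_ids:
        if is_post(post_id):
            found.append(post_id)
        if delay:
            time.sleep(delay)  # be polite to the server between probes
    return found
```

Something like `find_existing_ids(range(1, 400_001), make_http_checker("https://example.com/post/{}"), delay=0.5)` would do the whole sweep. With ~23k hits out of ~400k probes, only about 6% of IDs are live, so most of the time goes to misses. Keeping the predicate separate from the loop also means you can swap HEAD for a GET plus a content check if a site serves "not found" pages with a 200.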

I am using the WARC proxy that webrecorder.io provides because it is the first proxy I've found that archives everything, including JavaScript, so that you can still click on and expand comments when replaying. I tried others that claim the same thing, but they never worked. I also like that webrecorder.io lets you download the WARC, so I could include a WARC link in each post, and if you thought that post was interesting or worth saving, you could just click and download the WARC.
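Roughly, a recording proxy writes one WARC record for every HTTP response that passes through it, scripts included, which is why the JavaScript that expands comments can run again on replay. Below is a minimal stdlib sketch of a single WARC/1.0 "response" record, just to show what gets stored. It is not webrecorder.io's code, the URL is made up, and real tools (e.g. the warcio library) also write request records, digests and gzip members.

```python
import uuid
from datetime import datetime, timezone


def make_warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Serialize one WARC/1.0 'response' record wrapping a raw HTTP response.

    A recording proxy would emit one of these per fetched resource
    (HTML, CSS, JavaScript, XHR responses, ...).
    """
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http;msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    # The content block is followed by two CRLFs, as the WARC format requires.
    return head + http_response + b"\r\n\r\n"
```

Since records are self-delimiting, appending many of these to one file still gives a valid (uncompressed) WARC.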

I use webarchiveplayer (https://github.com/ikreymer/webarchiveplayer) to replay WARC files, like this: https://files.catbox.moe/vgd5z5.png. I can also combine multiple WARCs into a single index that is searchable by link, like this: https://files.catbox.moe/g8tiqd.png. To open a browser on the index of the WARCs I merged for testing, I just ran this command:

webarchiveplayer WARCMerge20170329215008930044.warc
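The "single index searchable by link" idea can be sketched as: walk every record in each WARC and map its target URL to the file and byte offset where the response lives. This is only a stdlib illustration of the concept, not how webarchiveplayer does it (pywb-based tools build CDX-style indexes), and it assumes uncompressed .warc files like the one above; .warc.gz would need per-record gzip handling.

```python
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (byte offset, lower-cased headers) for each record of an
    uncompressed WARC stream, skipping over each record's content block."""
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # the CRLF CRLF that terminates the previous record
        if not line.startswith(b"WARC/"):
            raise ValueError(f"no WARC version line at offset {offset}")
        headers: Dict[str, str] = {}
        for raw in iter(stream.readline, b""):
            if not raw.strip():
                break  # blank line ends the header block
            name, _, value = raw.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Jump over the content block without reading it into memory.
        stream.seek(int(headers.get("content-length", "0")), io.SEEK_CUR)
        yield offset, headers


def build_url_index(paths: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each captured URL to every (warc_path, offset) holding a response for it."""
    index: Dict[str, List[Tuple[str, int]]] = {}
    for path in paths:
        with open(path, "rb") as f:
            for offset, headers in iter_warc_records(f):
                uri = headers.get("warc-target-uri")
                if headers.get("warc-type") == "response" and uri:
                    index.setdefault(uri, []).append((path, offset))
    return index
```

Looking up a post's URL in the returned dict tells you which WARC holds it and where to seek, which is the same job a CDX index does for a replay tool, just without the sorting and compression tricks.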