Thanks to the help of a fellow anonymous Redditor, I've released a new version of RedditScrape. This version uses the Pushshift API to gather huge amounts of data for you to download, which means we no longer need to provide any form of Reddit credentials.

While the previous version was hard capped at 1,000 posts using the Reddit API, this new version has no limits at all, other than what resources and disk space you have.
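
To illustrate why paging through an archive API removes the 1,000-post ceiling, here is a rough sketch of timestamp-based pagination. It is not RedditScrape's actual code: `iter_posts`, `fetch_page`, and `fake_fetch` are hypothetical names, and the only assumption about the data is that each post carries a `created_utc` timestamp and that pages come back newest first.

```python
from typing import Callable, Dict, Iterator, List, Optional

Page = List[Dict]


def iter_posts(fetch_page: Callable[[Optional[int]], Page]) -> Iterator[Dict]:
    """Yield every post by repeatedly asking for posts older than the oldest one seen.

    fetch_page(before) is a hypothetical callable that returns one page of
    posts strictly older than `before` (or the newest page when before is None).
    """
    before: Optional[int] = None
    while True:
        page = fetch_page(before)
        if not page:
            return  # an empty page means the archive is exhausted
        yield from page
        # Move the cursor to the oldest timestamp on this page.
        before = min(post["created_utc"] for post in page)


# Offline stand-in for a real API call, so the sketch runs without network access.
_FAKE_ARCHIVE = [{"id": i, "created_utc": 100 - i} for i in range(7)]


def fake_fetch(before: Optional[int], size: int = 3) -> Page:
    older = [p for p in _FAKE_ARCHIVE if before is None or p["created_utc"] < before]
    return older[:size]


all_ids = [post["id"] for post in iter_posts(fake_fetch)]
print(all_ids)
```

Because the cursor is strictly "older than", posts that share the exact timestamp of a page boundary could be skipped; a real scraper would probably need to de-duplicate on post ID or handle that edge case some other way.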

For example, if you're brave enough to try to scrape something like gonewild, you'll find it takes DAYS just to get all of the data back from Pushshift. The JSON text alone is over 9 GB (3.3 million posts) and climbing.

Running this is now a two-step process, but it results in a substantially larger set of media from your favorite subs.

Instructions can be found here. I hope I've fixed a few of the problems that people had with the first iteration along the way.

Comments (4)

Hello /u/nsfwutils! Thank you for posting in r/DataHoarder.

When running threaded-aquire.py with python as sudo, every sub gives me this error:

Unexpected error for sub (sub name): Expecting value: line 1 column 1 (char 0)

No file saved

Failed to retrieve data for these subs

And then it lists every sub I had in my subs file.
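
For context, "Expecting value: line 1 column 1 (char 0)" is the message Python's `json` module produces when the text it is asked to parse is empty or is not JSON at all (for example, an HTML error page). The sketch below reproduces that message and shows one way a script could report what it actually received. `parse_body` is a hypothetical helper, not a function from RedditScrape.

```python
import json
from typing import Any, Optional, Tuple


def parse_body(body: str) -> Tuple[Optional[Any], Optional[str]]:
    """Return (parsed_json, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        # Include the start of the body so the real cause (empty reply,
        # HTML error page, rate-limit notice) is visible in the log.
        preview = body[:80] or "<empty body>"
        return None, f"response was not JSON ({exc}); body starts with: {preview!r}"


# An empty response and an HTML page both trigger the same decoder message.
for sample in ["", "<html>Service Unavailable</html>", '{"data": []}']:
    data, error = parse_body(sample)
    print(error if error else data)
```

If every sub fails this way, the likely common factor is the response itself rather than the subs file, so checking what the server returned is a reasonable first step.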

Hi, thanks for RedditScrape. It works great! It would be even better if I could download posts from friends. There is the /r/friends "sub", which seems to be a virtual sub filled with friends' posts. I tried adding that to the "subs" file for RedditScrape, but it ended up in the list of bad subs in the output. Do you know whether there's a workaround for this?

Does the Pushshift API work? I heard it's down atm.