Downloads and archives content from reddit

Ali Parlakci 90f25da666 FileNameTooLong exception added for Gfycat and Direct classes		2018-07-10 00:51:26 +03:00
src	FileNameTooLong exception added for Gfycat and Direct classes	2018-07-10 00:51:26 +03:00
_config.yml	Initial commit	2018-07-09 22:58:11 +03:00
.gitignore	Initial commit	2018-07-09 22:58:11 +03:00
LICENSE	Initial commit	2018-07-09 22:58:11 +03:00
README.md	Initial commit	2018-07-09 22:58:11 +03:00
script.py	Parsing arguments two times is fixed	2018-07-09 23:46:04 +03:00

Bulk Downloader for Reddit

This program downloads imgur, gfycat and direct image and video links of saved posts from a reddit account. It is written in Python 3.

PLEASE post any issue you have with the script to the Issues tab. Since I don't have any testers or contributors, I need your feedback.

Requirements
Configuring the APIs
- Creating an imgur app
Program Modes
Running the script
- Using the command line arguments
- Examples
FAQ
Changelog
- release-1.0.0

Requirements

Python 3.x*

You can install Python 3 here: https://www.python.org/downloads/

You have to check the "Add Python 3 to PATH" option during installation for the script to run correctly.

*The latest version of Python is suggested, but 3.6.5 also works, since the script has been run on that version without problems.

Configuring the APIs

Because this is not a commercial app, you need to create your own imgur developer app for the API to work.

Creating an imgur app

Go to https://api.imgur.com/oauth2/addclient
Enter a name into the Application Name field.
Pick Anonymous usage without user authorization as the Authorization type.*
Enter your email into the Email field.
Complete the CAPTCHA.
Click the submit button.

It should redirect to a page which shows your imgur_client_id and imgur_client_secret.

*If the page says "Authorization callback URL: required", first select OAuth 2 authorization without a callback URL, then select Anonymous usage without user authorization again.
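
To illustrate what the client ID is for, here is a minimal sketch of an anonymous imgur API request. It only builds the request and does not send it. The `Client-ID` authorization header and the `/3/image/{id}` endpoint follow imgur's public API documentation; the function name and the placeholder values are assumptions made for illustration and are not taken from script.py.

```python
import urllib.request

IMGUR_API = "https://api.imgur.com/3"


def build_imgur_request(image_id, client_id):
    """Build (but do not send) an anonymous request for one imgur image."""
    url = f"{IMGUR_API}/image/{image_id}"
    # Anonymous usage authenticates with the client ID only; no user token is needed.
    headers = {"Authorization": f"Client-ID {client_id}"}
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Both values below are placeholders, not real credentials.
    request = build_imgur_request("abc123", "YOUR_IMGUR_CLIENT_ID")
    print(request.full_url)
    print(request.get_header("Authorization"))
```

The imgur_client_secret is not needed for this kind of anonymous request.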

Program Modes

All program modes are activated with command-line arguments, as shown in the help page under "Using the command line arguments".

saved mode

In saved mode, the program gets the given user's saved posts.

submitted mode

In submitted mode, the program gets the given user's submitted posts.

upvoted mode

In upvoted mode, the program gets the given user's upvoted posts.

subreddit mode

In subreddit mode, the program gets posts from the given subreddits*, sorted by the given type and limited to the given number.

*Multiple subreddits can be given.

You may also use search in this mode. See py -3 script.py --help.
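
As a rough illustration of how subreddits, sort type, time filter and limit combine, the sketch below builds a public reddit JSON listing URL. This is not how script.py is known to fetch posts (it may well use an API wrapper instead); the function name `subreddit_listing_url` and its parameters are assumptions made for this example.

```python
from urllib.parse import urlencode


def subreddit_listing_url(subreddits, sort="hot", time="all", limit=None):
    """Return a public reddit JSON listing URL for one or more subreddits."""
    if list(subreddits) == ["frontpage"]:
        base = f"https://www.reddit.com/{sort}.json"
    else:
        # reddit accepts several subreddits joined with "+", e.g. r/gifs+pics.
        base = f"https://www.reddit.com/r/{'+'.join(subreddits)}/{sort}.json"
    params = {}
    # The time filter only applies to the "top" and "controversial" sorts.
    if sort in ("top", "controversial"):
        params["t"] = time
    if limit is not None:
        params["limit"] = limit
    return f"{base}?{urlencode(params)}" if params else base


if __name__ == "__main__":
    print(subreddit_listing_url(["gifs", "pics"], sort="top", time="week", limit=10))
```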

multireddit mode

In multireddit mode, the program gets posts from the given multireddit of the given user, sorted by the given type and limited to the given number.

link mode

In link mode, the program gets posts from the given reddit link.

You may customize the behaviour with --sort, --time, --limit.

You may also use search in this mode. See py -3 script.py --help.

log read mode

Two log files are created each time script.py runs.

POSTS: Saves all the posts without filtering.
FAILED: Keeps track of posts that the script tried to download but could not.

In log read mode, the program takes a log file that it created itself, reads the posts in it and tries to download them again.

Running log read mode on the FAILED.json file once after the download is complete is HIGHLY recommended, as unexpected problems may occur during a run.
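
The idea of re-reading a log can be sketched as below. The real layout of FAILED.json is not described in this README, so the structure used here (a JSON object whose values are post records) is purely an assumption, as is the function name `read_logged_posts`.

```python
import json
from pathlib import Path


def read_logged_posts(log_path):
    """Return the post records stored in a log file.

    ASSUMPTION: the log is a JSON object whose values are post records
    (dicts). The real FAILED.json layout may differ.
    """
    data = json.loads(Path(log_path).read_text(encoding="utf-8"))
    # Skip any entries that are not records, such as metadata strings.
    return [entry for entry in data.values() if isinstance(entry, dict)]
```

Each returned record could then be handed back to whatever download routine produced the log in the first place.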

Running the script

WARNING: DO NOT run more than one instance of the script at a time, as parallel instances interfere with the imgur request rate limit.
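
One common way to guard against accidental parallel runs is a lock file. The sketch below is a generic pattern and is not part of script.py; the lock file name and both function names are hypothetical.

```python
import os
import tempfile

# Hypothetical lock file name; script.py does not define one.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "bulk_downloader.lock")


def acquire_lock(path=LOCK_PATH):
    """Return True if this process created the lock file, False if it already exists."""
    try:
        # O_EXCL makes creation fail when the file is already present.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))
    return True


def release_lock(path=LOCK_PATH):
    """Remove the lock file if it exists."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A wrapper would call `acquire_lock()` before starting and `release_lock()` in a `finally` block. Note that a crash can leave a stale lock file behind, which then has to be deleted by hand.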

Using the command line arguments

If no arguments are passed, the program prompts you for the arguments listed below, so you can also start the script by double-clicking it (at least on Windows).

Open up the terminal and navigate to where script.py is. If you are unfamiliar with changing directories in the terminal, see Change Directories in this article.

ATTENTION: When giving relative paths, use .\ for the current directory and ..\ for the parent directory; otherwise the script might behave unexpectedly.

Run the script.py file from the terminal with command-line arguments. Here is the help page:

$ py -3 script.py --help
usage: script.py [-h] [--link link] [--saved] [--submitted] [--upvoted]
                 [--log LOG FILE] [--subreddit SUBREDDIT [SUBREDDIT ...]]
                 [--multireddit MULTIREDDIT] [--user redditor]
                 [--search query] [--sort SORT TYPE] [--limit Limit]
                 [--time TIME_LIMIT] [--NoDownload]
                 DIRECTORY

This program downloads media from reddit posts

positional arguments:
  DIRECTORY             Specifies the directory where posts will be downloaded
                        to

optional arguments:
  -h, --help            show this help message and exit
  --link link, -l link  Get posts from link
  --saved               Triggers saved mode
  --submitted           Gets posts of --user
  --upvoted             Gets upvoted posts of --user
  --log LOG FILE        Triggers log read mode and takes a log file
  --subreddit SUBREDDIT [SUBREDDIT ...]
                        Triggers subreddit mode and takes subreddit's name
                        without r/. use "frontpage" for frontpage
  --multireddit MULTIREDDIT
                        Triggers multireddit mode and takes multireddit's name
                        without m/
  --user redditor       reddit username if needed. use "me" for current user
  --search query        Searches for given query in given subreddits
  --sort SORT TYPE      Either hot, top, new, controversial, rising or
                        relevance default: hot
  --limit Limit         default: unlimited
  --time TIME_LIMIT     Either hour, day, week, month, year or all. default:
                        all
  --NoDownload          Just gets the posts and store them in a file for
                        downloading later
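
The help page above has the shape of `argparse` output, so the option set can be approximated with the sketch below. It is a reconstruction from the help text only, not the parser in script.py: the argument types, the defaults and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Rebuild a parser whose options mirror the help page above."""
    parser = argparse.ArgumentParser(
        prog="script.py",
        description="This program downloads media from reddit posts",
    )
    parser.add_argument("directory", metavar="DIRECTORY",
                        help="directory where posts will be downloaded to")
    parser.add_argument("--link", "-l", metavar="link", help="get posts from link")
    parser.add_argument("--saved", action="store_true", help="triggers saved mode")
    parser.add_argument("--submitted", action="store_true", help="gets posts of --user")
    parser.add_argument("--upvoted", action="store_true",
                        help="gets upvoted posts of --user")
    parser.add_argument("--log", metavar="LOG FILE", help="triggers log read mode")
    # nargs="+" lets several subreddit names follow one --subreddit flag.
    parser.add_argument("--subreddit", nargs="+", metavar="SUBREDDIT")
    parser.add_argument("--multireddit", metavar="MULTIREDDIT")
    parser.add_argument("--user", metavar="redditor")
    parser.add_argument("--search", metavar="query")
    # ASSUMPTION: defaults and types below are inferred from the help text only.
    parser.add_argument("--sort", metavar="SORT TYPE", default="hot")
    parser.add_argument("--limit", metavar="Limit", type=int, default=None)
    parser.add_argument("--time", metavar="TIME_LIMIT", default="all")
    parser.add_argument("--NoDownload", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["NEW_FOLDER", "--subreddit", "gifs", "pics", "--sort", "new"]
    )
    print(args.directory, args.subreddit, args.sort)
```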

Examples

Don't include the `py -3 script.py` part if you start the script by double-clicking.

py -3 script.py .\\NEW_FOLDER --sort new --time all --limit 10 --link "https://www.reddit.com/r/gifs/search?q=dogs&restrict_sr=on&type=link&sort=new&t=month"

py -3 script.py .\\NEW_FOLDER --link "https://www.reddit.com/r/learnprogramming/comments/7mjw12/"

py -3 script.py .\\NEW_FOLDER --search cats --sort new --time all --subreddit gifs pics --NoDownload

py -3 script.py .\\NEW_FOLDER --user [USER_NAME] --submitted --limit 10

py -3 script.py .\\NEW_FOLDER --multireddit good_subs --user [USER_NAME] --sort top --time week --limit 250

py -3 script.py .\\NEW_FOLDER\\ANOTHER_FOLDER --saved --limit 1000

py -3 script.py C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER\\FAILED.json

FAQ

I can't start the script no matter what.

Try python3, python or py -3, since the name of the Python command differs between installations.
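
If some Python interpreter already runs, a small sketch like the following can show which of those command names are available on your PATH. It uses only the standard library; the function name `find_python_launchers` is made up for this example.

```python
import shutil


def find_python_launchers(candidates=("py", "python3", "python")):
    """Return the candidate launcher names that exist on PATH."""
    return [name for name in candidates if shutil.which(name)]


if __name__ == "__main__":
    found = find_python_launchers()
    print("Available launchers:", ", ".join(found) if found else "none found")
```

If `py` is listed, run the script with `py -3 script.py`; otherwise use whichever of the other names was found.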

Changelog

v1.0.0

Initial release