Compare commits

...

50 Commits

Author SHA1 Message Date
Serene
17350ba9b6
Merge pull request #847 from Soulsuck24/development 2023-04-30 11:33:37 +10:00
Soulsuck24
8d6101112b
Redgif coverage
better coverage for thumbs subdomains and direct links to images.
2023-04-29 14:08:08 -04:00
Serene
72d5b1acaf
Merge pull request #845 from OMEGARAZER/catbox-downloader 2023-04-28 11:04:09 +10:00
OMEGARAZER
38bef1d1e0
Add Catbox downloader
Adds downloader for catbox.moe collections.
2023-04-27 16:14:17 -04:00
Serene
7e9afc2883
Merge pull request #831 from OMEGARAZER/doc-fixes
Update docs
2023-04-07 10:10:18 +10:00
OMEGARAZER
b90ab1b02a
Update docs 2023-04-06 02:20:01 -04:00
Serene
d8f79d79dc
Merge pull request #819 from OMEGARAZER/test-fix 2023-03-12 09:58:27 +10:00
OMEGARAZER
7fbd001e8a
Fix Youtube test
Tested video is now private. Updated to a new video that should stay up.
2023-03-11 10:26:01 -05:00
Serene
b6bc94bdff
Merge pull request #810 from OMEGARAZER/user-agent 2023-03-08 10:55:08 +10:00
OMEGARAZER
1705884dce
Use proper user agent string 2023-03-05 17:22:44 -05:00
Serene
38e6c45792
Merge pull request #807 from OMEGARAZER/Scripts-tests 2023-03-02 16:02:36 +10:00
OMEGARAZER
a5b445945a
Partial revert of 98aa3d7
Reverts some quote changes for awk commands, as the prints do not function the same with double quotes when the field number is over 10.
2023-03-01 23:46:20 -05:00
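
A quick illustration (not from the repository) of the quoting behaviour this revert addresses: inside double quotes the shell, not awk, expands `$12`, and because the shell reads it as `${1}2`, the field reference never reaches awk.

```bash
# Illustration only: single vs. double quotes around an awk program.
echo 'a b c d e f g h i j k l' | awk '{ print $12 }'   # awk receives $12 and prints "l"
echo 'a b c d e f g h i j k l' | awk "{ print $12 }"   # the shell expands $12 as ${1}2 first;
                                                        # with $1 unset, awk receives "{ print 2 }" and prints "2"
```
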
OMEGARAZER
74d842e6da
Scripts testing/fixes
Adds Bats and Pester testing for bash and PowerShell scripts

Updates powershell scripts to match bash scripts in logic

Adds missing score filter lookup for the PowerShell script
2023-03-01 23:40:12 -05:00
Serene
a16622e11e
Merge pull request #804 from thomas694/update-readme 2023-03-01 09:23:34 +10:00
Serene
9fdd41851b
Merge pull request #802 from Soulsuck24/development 2023-02-28 10:36:04 +10:00
Soulsuck24
dd283130e3
Imgur fixes
Update regex for links styled as direct and album
2023-02-27 15:53:47 -05:00
Serene
6df7906818
Merge pull request #796 from OMEGARAZER/empty-dir-fix 2023-02-27 17:25:40 +10:00
OMEGARAZER
005454a5c2
Don't create directory if not needed
Moves creation of parent directories after dupe check so directories are not made if not needed.
2023-02-26 16:52:39 -05:00
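
A minimal sketch of the reordering this commit describes, using names (`master_hash_list`, `destination`, `resource_hash`) that follow the downloader diff further down; the surrounding logic is simplified:

```python
from pathlib import Path


def write_resource(destination: Path, content: bytes, resource_hash: str,
                   master_hash_list: dict[str, Path], no_dupes: bool) -> None:
    # Dupe check happens first, so nothing touches the filesystem for skipped files.
    if no_dupes and resource_hash in master_hash_list:
        return
    # Parent directories are only created once we know the file will actually be written.
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(content)
    master_hash_list[resource_hash] = destination
```
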
Thomas
987172c1ce
Update README.md 2023-02-23 00:32:02 +01:00
Serene
60e9e58664
Merge pull request #785 from Soulsuck24/development 2023-02-20 07:58:25 +10:00
Soulsuck24
80c66c8c78
Erome fixes
Fixes searched class to not include thumbnails of other albums.
2023-02-18 23:59:32 -05:00
Serene
911410608a
Merge pull request #780 from OMEGARAZER/Quote-cleanup 2023-02-19 10:39:15 +10:00
Serene
3d07ffb6df
Merge pull request #781 from OMEGARAZER/requests 2023-02-19 10:10:41 +10:00
OMEGARAZER
8f9bed0874
Harden requests
Moves bare requests out of site downloaders into the base_downloader and adds timeout exception handling now that there is a default timeout.
2023-02-18 17:36:00 -05:00
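
A simplified sketch of the pattern described above: requests go through one helper that applies a default timeout and converts failures into the project's `SiteDownloaderError`. The shape mirrors the base_downloader diff shown further down, but the details here are illustrative:

```python
import requests


class SiteDownloaderError(Exception):
    pass


def retrieve_url(url: str, headers: dict = None, timeout: int = 10) -> requests.Response:
    try:
        res = requests.get(url, headers=headers, timeout=timeout)
    except requests.exceptions.RequestException as e:
        raise SiteDownloaderError(f"Failed to get page {url}") from e
    if res.status_code != 200:
        raise SiteDownloaderError(f"Server responded with {res.status_code} at {url}")
    return res
```
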
OMEGARAZER
98aa3d7cb6
Quote cleanup
Cleanup of some strings to prioritize outer double quotes when both are present, or to switch to double quotes when only single quotes are present.
2023-02-18 16:06:32 -05:00
OMEGARAZER
5c57de7c7d
B907 Cleanup/updates
Cleans up some double-quoted locations based on bugbear B907 and adds the same formatting to some other locations where the emphasis may be helpful.
2023-02-18 15:58:05 -05:00
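
For context, flake8-bugbear's B907 flags f-string interpolations wrapped in hand-written quotes; the `!r` conversion used across this changeset lets `repr()` supply the quotes instead. A small illustration:

```python
scope = "history"

# Hand-quoted interpolation, the pattern B907 flags:
print(f"Scope '{scope}' is not known to reddit")

# !r conversion produces the same emphasis without manual quotes:
print(f"Scope {scope!r} is not known to reddit")

# Both lines print: Scope 'history' is not known to reddit
```
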
Serene
5cf2c81b0c
Merge pull request #779 from OMEGARAZER/development 2023-02-18 07:31:51 +10:00
OMEGARAZER
a3b9e78f53
Update tests
Seems the thumbs subdomain was changed for these cases.
2023-02-17 15:17:25 -05:00
OMEGARAZER
9eeff73a12
Update yt-dlp
Update yt-dlp to 2023.2.17 as it contains updates to vreddit downloading for better coverage.
2023-02-17 14:50:40 -05:00
Serene-Arc
673076ed2e Add requirement for template 2023-02-14 14:41:16 +10:00
Serene-Arc
183f592ad8 Update CONTRIBUTING bug report requirements 2023-02-14 14:37:55 +10:00
Serene
0051877e01
Merge pull request #769 from OMEGARAZER/gfycat-api 2023-02-12 11:45:25 +10:00
Serene
e5b184ef9a
Merge pull request #762 from OMEGARAZER/pyupgrade-lint 2023-02-12 11:45:00 +10:00
OMEGARAZER
55384cd0f0
UP032 2023-02-11 15:30:54 -05:00
OMEGARAZER
0e28c7ed7c
Gfycat API
Moves Gfycat to use API via site access key.

Adds cachetools as a dependency to reuse API keys for Gfycat/Redgifs at 95% of their TTL. Includes tests to verify caching.

Updates versions of requests/yt-dlp/black/isort/pytest.

Adds a default timeout to requests calls.

Adds validate-pyproject and blacken-docs to pre-commit and updates hook versions.
2023-02-11 15:23:08 -05:00
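
A hedged sketch of the caching approach described above. The TTL values in the diff (3420 s for Gfycat, 82080 s for Redgifs) are 95% of one hour and of one day respectively, so a cached token is refreshed just before the service would expire it; the endpoint and payload below are placeholders, not the real API:

```python
import json

import requests
from cachetools import TTLCache, cached

TOKEN_TTL = int(3600 * 0.95)  # refresh at 95% of an assumed one-hour token lifetime


@cached(cache=TTLCache(maxsize=5, ttl=TOKEN_TTL))
def get_auth_token() -> str:
    # Placeholder endpoint and key; the real downloaders call each site's own auth API.
    response = requests.post("https://example.com/oauth/token", json={"access_key": "..."}, timeout=10)
    return json.loads(response.text)["access_token"]
```
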
OMEGARAZER
0bf44e5d82
UP012 2023-02-11 15:07:26 -05:00
Serene
980cf5df9d
Merge pull request #773 from Serene-Arc/enhancement_771
Resolves https://github.com/aliparlakci/bulk-downloader-for-reddit/issues/771
2023-02-06 09:56:56 +10:00
Serene-Arc
4e15af637f Make option descriptions clearer 2023-02-06 09:54:52 +10:00
Serene-Arc
1895d2f22a Add warning for --search-existing 2023-02-06 09:53:13 +10:00
Serene
c8a859a747
Merge pull request #768 from OMEGARAZER/logged-version 2023-02-04 18:07:35 +10:00
OMEGARAZER
4a91ff6293
Version logged
Adds the type of run and the version of BDFR to the end log message

REF: https://github.com/aliparlakci/bulk-downloader-for-reddit/issues/764#issuecomment-1414552069
2023-02-03 23:39:25 -05:00
Serene
7676ff06a3
Merge pull request #766 from bunny-foofoo/development 2023-02-03 13:20:40 +10:00
Serene
708b22d8a1
Merge pull request #765 from OMEGARAZER/unsave 2023-02-03 10:26:39 +10:00
OMEGARAZER
a535fee574
Black update
Black version 23.1.0 updates
2023-02-02 11:50:47 -05:00
Bunny
07e38a7709 fix #753 2023-02-01 19:26:16 -08:00
OMEGARAZER
afd2f88f91
Update test_direct.py
If this changes again, another link should probably be found, as this one is only a few days old.
2023-02-01 14:36:23 -05:00
OMEGARAZER
730856934b
Update unsaveposts.py
Makes some updates to the unsaveposts script and updates the flake8 exclude now that there is a Python script in the scripts directory.

Also adds the scripts directory to the actions test ignore, as any changes there shouldn't have any effect on the tests that are performed.
2023-02-01 14:18:20 -05:00
OMEGARAZER
086f4090d4
UP034 2023-01-28 23:59:53 -05:00
OMEGARAZER
95c8c72271
UP008 2023-01-28 23:58:36 -05:00
OMEGARAZER
247ea5ddd0
UP009 2023-01-28 23:56:23 -05:00
92 changed files with 556 additions and 275 deletions

View File

@ -7,9 +7,10 @@ assignees: ''
---
- [ ] I have read the [Opening an issue](https://github.com/aliparlakci/bulk-downloader-for-reddit/blob/master/docs/CONTRIBUTING.md#opening-an-issue)
- [ ] I am reporting a bug.
- [ ] I am running the latest version of BDfR
- [ ] I have read the [Opening an issue](https://github.com/aliparlakci/bulk-downloader-for-reddit/blob/master/docs/CONTRIBUTING.md#opening-an-issue)
- [ ] I am not asking a question about the BDFR (please use Discussions for this)
## Description

29
.github/workflows/scripts-test.yml (new file)
View File

@ -0,0 +1,29 @@
name: Scripts Test
on:
push:
paths:
- "scripts/*.sh"
- "scripts/*.ps1"
pull_request:
paths:
- "scripts/*.sh"
- "scripts/*.ps1"
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
submodules: 'true'
- name: Bats tests
run: |
cd scripts/tests/
bats/bin/bats *.bats
- name: Pester tests
shell: pwsh
run: |
cd scripts/tests/
Invoke-Pester -CI -PassThru .
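
The same tests can be run locally; this assumes the repository is checked out with its submodules (the workflow sets `submodules: 'true'`, and the `bats/bin/bats` path suggests Bats is vendored under `scripts/tests/`) and that Pester is available in `pwsh`:

```bash
git submodule update --init --recursive
cd scripts/tests/
bats/bin/bats *.bats                              # Bats tests for the bash scripts
pwsh -Command "Invoke-Pester -CI -PassThru ."     # Pester tests for the PowerShell scripts
```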

View File

@ -7,12 +7,14 @@ on:
- "**.md"
- ".markdown_style.rb"
- ".mdlrc"
- "scripts/"
pull_request:
branches: [ master, development ]
paths-ignore:
- "**.md"
- ".markdown_style.rb"
- ".mdlrc"
- "scripts/"
jobs:
test:

View File

@ -2,13 +2,18 @@
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/abravalheri/validate-pyproject
rev: v0.12.1
hooks:
- id: validate-pyproject
- repo: https://github.com/psf/black
rev: 22.12.0
rev: 23.1.0
hooks:
- id: black
- repo: https://github.com/pycqa/isort
rev: 5.11.4
rev: 5.12.0
hooks:
- id: isort
name: isort (python)
@ -23,3 +28,9 @@ repos:
rev: v0.12.0
hooks:
- id: markdownlint
- repo: https://github.com/adamchainz/blacken-docs
rev: 1.13.0
hooks:
- id: blacken-docs
additional_dependencies: [black>=23.1.0]

View File

@ -80,11 +80,11 @@ bdfr download ./path/to/output --user reddituser --submitted -L 100
```
```bash
bdfr download ./path/to/output --user me --saved --authenticate -L 25 --file-scheme '{POSTID}'
bdfr download ./path/to/output --user me --saved --authenticate -L 25 --file-scheme "{POSTID}"
```
```bash
bdfr download ./path/to/output --subreddit 'Python, all, mindustry' -L 10 --make-hard-links
bdfr download ./path/to/output --subreddit "Python, all, mindustry" -L 10 --make-hard-links
```
```bash
@ -92,7 +92,7 @@ bdfr archive ./path/to/output --user reddituser --submitted --all-comments --com
```
```bash
bdfr archive ./path/to/output --subreddit all --format yaml -L 500 --folder-scheme ''
bdfr archive ./path/to/output --subreddit all --format yaml -L 500 --folder-scheme ""
```
Alternatively, you can pass options through a YAML file.
@ -191,13 +191,13 @@ The following options are common between both the `archive` and `download` comma
- This is the name of a multireddit to add as a source
- Can be specified multiple times
- This can be done by using `-m` multiple times
- Multireddits can also be used to provide CSV multireddits e.g. `-m 'chess, favourites'`
- Multireddits can also be used to provide CSV multireddits e.g. `-m "chess, favourites"`
- The specified multireddits must all belong to the user specified with the `--user` option
- `-s, --subreddit`
- This adds a subreddit as a source
- Can be used mutliple times
- This can be done by using `-s` multiple times
- Subreddits can also be used to provide CSV subreddits e.g. `-m 'all, python, mindustry'`
- Subreddits can also be used to provide CSV subreddits e.g. `-m "all, python, mindustry"`
- `-t, --time`
- This is the time filter that will be applied to all applicable sources
- This option does not apply to upvoted or saved posts when scraping from these sources
@ -233,11 +233,12 @@ The following options apply only to the `download` command. This command downloa
- The default is 120 seconds
- See [Rate Limiting](#rate-limiting) for details
- `--no-dupes`
- This flag will not redownload files if they were already downloaded in the current run
- This flag will skip writing a file to disk if that file was already downloaded in the current run
- This is calculated by MD5 hash
- `--search-existing`
- This will make the BDFR compile the hashes for every file in `directory`
- The hashes are used to remove duplicates if `--no-dupes` is supplied or make hard links if `--make-hard-links` is supplied
- The hashes are used to skip duplicate files if `--no-dupes` is supplied or make hard links if `--make-hard-links` is supplied
- **The use of this option is highly discouraged due to inefficiency**
- `--file-scheme`
- Sets the scheme for files
- Default is `{REDDITOR}_{TITLE}_{POSTID}`
@ -407,7 +408,7 @@ Modules can be disabled through the command line interface for the BDFR or more
- `Vidble`
- `VReddit` (Reddit Video Post)
- `Youtube`
- `YoutubeDlFallback`
- `YtdlpFallback` (Youtube DL Fallback)
### Rate Limiting

View File

@ -1,4 +1,3 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
__version__ = "2.6.2"

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
import sys
@ -78,12 +77,16 @@ def _add_options(opts: list):
return wrap
def _check_version(context, param, value):
def _check_version(context, _param, value):
if not value or context.resilient_parsing:
return
current = __version__
latest = requests.get("https://pypi.org/pypi/bdfr/json").json()["info"]["version"]
print(f"You are currently using v{current} the latest is v{latest}")
try:
latest = requests.get("https://pypi.org/pypi/bdfr/json", timeout=10).json()["info"]["version"]
print(f"You are currently using v{current} the latest is v{latest}")
except TimeoutError:
logger.exception(f"Timeout reached fetching current version from Pypi - BDFR v{current}")
raise
context.exit()
@ -117,10 +120,10 @@ def cli_download(context: click.Context, **_):
reddit_downloader = RedditDownloader(config, [stream])
reddit_downloader.download()
except Exception:
logger.exception("Downloader exited unexpectedly")
logger.exception(f"Downloader exited unexpectedly - BDFR Downloader v{__version__}")
raise
else:
logger.info("Program complete")
logger.info(f"Program complete - BDFR Downloader v{__version__}")
@cli.command("archive")
@ -138,10 +141,10 @@ def cli_archive(context: click.Context, **_):
reddit_archiver = Archiver(config, [stream])
reddit_archiver.download()
except Exception:
logger.exception("Archiver exited unexpectedly")
logger.exception(f"Archiver exited unexpectedly - BDFR Archiver v{__version__}")
raise
else:
logger.info("Program complete")
logger.info(f"Program complete - BDFR Archiver v{__version__}")
@cli.command("clone")
@ -160,10 +163,10 @@ def cli_clone(context: click.Context, **_):
reddit_scraper = RedditCloner(config, [stream])
reddit_scraper.download()
except Exception:
logger.exception("Scraper exited unexpectedly")
logger.exception("Scraper exited unexpectedly - BDFR Scraper v{__version__}")
raise
else:
logger.info("Program complete")
logger.info("Program complete - BDFR Cloner v{__version__}")
@cli.command("completion")
@ -183,7 +186,7 @@ def cli_completion(shell: str, uninstall: bool):
Completion(shell).uninstall()
return
if shell not in ("all", "bash", "fish", "zsh"):
print(f"{shell} is not a valid option.")
print(f"{shell!r} is not a valid option.")
print("Options: all, bash, fish, zsh")
return
if click.confirm(f"Would you like to install {shell} completions for BDFR"):

View File

@ -1,2 +1 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from abc import ABC, abstractmethod
from typing import Union

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
@ -12,7 +11,7 @@ logger = logging.getLogger(__name__)
class CommentArchiveEntry(BaseArchiveEntry):
def __init__(self, comment: praw.models.Comment):
super(CommentArchiveEntry, self).__init__(comment)
super().__init__(comment)
def compile(self) -> dict:
self.source.refresh()

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
@ -12,7 +11,7 @@ logger = logging.getLogger(__name__)
class SubmissionArchiveEntry(BaseArchiveEntry):
def __init__(self, submission: praw.models.Submission):
super(SubmissionArchiveEntry, self).__init__(submission)
super().__init__(submission)
def compile(self) -> dict:
comments = self._get_comments()

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json
import logging
@ -27,7 +26,7 @@ logger = logging.getLogger(__name__)
class Archiver(RedditConnector):
def __init__(self, args: Configuration, logging_handlers: Iterable[logging.Handler] = ()):
super(Archiver, self).__init__(args, logging_handlers)
super().__init__(args, logging_handlers)
def download(self):
for generator in self.reddit_lists:
@ -66,7 +65,7 @@ class Archiver(RedditConnector):
return [supplied_submissions]
def get_user_data(self) -> list[Iterator]:
results = super(Archiver, self).get_user_data()
results = super().get_user_data()
if self.args.user and self.args.all_comments:
sort = self.determine_sort_function()
for user in self.args.user:
@ -95,7 +94,7 @@ class Archiver(RedditConnector):
elif self.args.format == "yaml":
self._write_entry_yaml(archive_entry)
else:
raise ArchiverError(f"Unknown format {self.args.format} given")
raise ArchiverError(f"Unknown format {self.args.format!r} given")
logger.info(f"Record for entry item {praw_item.id} written to disk")
def _write_entry_json(self, entry: BaseArchiveEntry):

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from collections.abc import Iterable
@ -16,7 +15,7 @@ logger = logging.getLogger(__name__)
class RedditCloner(RedditDownloader, Archiver):
def __init__(self, args: Configuration, logging_handlers: Iterable[logging.Handler] = ()):
super(RedditCloner, self).__init__(args, logging_handlers)
super().__init__(args, logging_handlers)
def download(self):
for generator in self.reddit_lists:

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import subprocess
from os import environ

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from argparse import Namespace
@ -14,7 +13,7 @@ logger = logging.getLogger(__name__)
class Configuration(Namespace):
def __init__(self):
super(Configuration, self).__init__()
super().__init__()
self.authenticate = False
self.config = None
self.opts: Optional[str] = None
@ -64,7 +63,7 @@ class Configuration(Namespace):
self.parse_yaml_options(context.params["opts"])
for arg_key in context.params.keys():
if not hasattr(self, arg_key):
logger.warning(f"Ignoring an unknown CLI argument: {arg_key}")
logger.warning(f"Ignoring an unknown CLI argument: {arg_key!r}")
continue
val = context.params[arg_key]
if val is None or val == ():
@ -85,6 +84,6 @@ class Configuration(Namespace):
return
for arg_key, val in opts.items():
if not hasattr(self, arg_key):
logger.warning(f"Ignoring an unknown YAML argument: {arg_key}")
logger.warning(f"Ignoring an unknown YAML argument: {arg_key!r}")
continue
setattr(self, arg_key, val)

View File

@ -1,14 +1,13 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import configparser
import importlib.resources
import itertools
import logging
import logging.handlers
import platform
import re
import shutil
import socket
from abc import ABCMeta, abstractmethod
from collections.abc import Callable, Iterable, Iterator
from datetime import datetime
@ -22,6 +21,7 @@ import praw.exceptions
import praw.models
import prawcore
from bdfr import __version__
from bdfr import exceptions as errors
from bdfr.configuration import Configuration
from bdfr.download_filter import DownloadFilter
@ -65,7 +65,6 @@ class RedditConnector(metaclass=ABCMeta):
self.reddit_lists = self.retrieve_reddit_lists()
def _setup_internal_objects(self):
self.parse_disabled_modules()
self.download_filter = self.create_download_filter()
@ -77,6 +76,7 @@ class RedditConnector(metaclass=ABCMeta):
self.file_name_formatter = self.create_file_name_formatter()
logger.log(9, "Create file name formatter")
self.user_agent = praw.const.USER_AGENT_FORMAT.format(":".join([platform.uname()[0], __package__, __version__]))
self.create_reddit_instance()
self.args.user = list(filter(None, [self.resolve_user_name(user) for user in self.args.user]))
@ -117,7 +117,7 @@ class RedditConnector(metaclass=ABCMeta):
self.args.filename_restriction_scheme = self.cfg_parser.get(
"DEFAULT", "filename_restriction_scheme", fallback=None
)
logger.debug(f"Setting filename restriction scheme to '{self.args.filename_restriction_scheme}'")
logger.debug(f"Setting filename restriction scheme to {self.args.filename_restriction_scheme!r}")
# Update config on disk
with Path(self.config_location).open(mode="w") as file:
self.cfg_parser.write(file)
@ -127,7 +127,7 @@ class RedditConnector(metaclass=ABCMeta):
disabled_modules = self.split_args_input(disabled_modules)
disabled_modules = {name.strip().lower() for name in disabled_modules}
self.args.disable_module = disabled_modules
logger.debug(f'Disabling the following modules: {", ".join(self.args.disable_module)}')
logger.debug(f"Disabling the following modules: {', '.join(self.args.disable_module)}")
def create_reddit_instance(self):
if self.args.authenticate:
@ -140,6 +140,7 @@ class RedditConnector(metaclass=ABCMeta):
scopes,
self.cfg_parser.get("DEFAULT", "client_id"),
self.cfg_parser.get("DEFAULT", "client_secret"),
user_agent=self.user_agent,
)
token = oauth2_authenticator.retrieve_new_token()
self.cfg_parser["DEFAULT"]["user_token"] = token
@ -151,7 +152,7 @@ class RedditConnector(metaclass=ABCMeta):
self.reddit_instance = praw.Reddit(
client_id=self.cfg_parser.get("DEFAULT", "client_id"),
client_secret=self.cfg_parser.get("DEFAULT", "client_secret"),
user_agent=socket.gethostname(),
user_agent=self.user_agent,
token_manager=token_manager,
)
else:
@ -160,7 +161,7 @@ class RedditConnector(metaclass=ABCMeta):
self.reddit_instance = praw.Reddit(
client_id=self.cfg_parser.get("DEFAULT", "client_id"),
client_secret=self.cfg_parser.get("DEFAULT", "client_secret"),
user_agent=socket.gethostname(),
user_agent=self.user_agent,
)
def retrieve_reddit_lists(self) -> list[praw.models.ListingGenerator]:
@ -241,7 +242,7 @@ class RedditConnector(metaclass=ABCMeta):
pattern = re.compile(r"^(?:https://www\.reddit\.com/)?(?:r/)?(.*?)/?$")
match = re.match(pattern, subreddit)
if not match:
raise errors.BulkDownloaderException(f"Could not find subreddit name in string {subreddit}")
raise errors.BulkDownloaderException(f"Could not find subreddit name in string {subreddit!r}")
return match.group(1)
@staticmethod
@ -287,7 +288,7 @@ class RedditConnector(metaclass=ABCMeta):
)
)
logger.debug(
f'Added submissions from subreddit {reddit} with the search term "{self.args.search}"'
f"Added submissions from subreddit {reddit} with the search term {self.args.search!r}"
)
else:
out.append(self.create_filtered_listing_generator(reddit))
@ -303,7 +304,7 @@ class RedditConnector(metaclass=ABCMeta):
logger.log(9, f"Resolved user to {resolved_name}")
return resolved_name
else:
logger.warning('To use "me" as a user, an authenticated Reddit instance must be used')
logger.warning("To use 'me' as a user, an authenticated Reddit instance must be used")
else:
return in_name

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
import re
@ -34,9 +33,9 @@ class DownloadFilter:
if not self.excluded_extensions:
return True
combined_extensions = "|".join(self.excluded_extensions)
pattern = re.compile(r".*({})$".format(combined_extensions))
pattern = re.compile(rf".*({combined_extensions})$")
if re.match(pattern, resource_extension):
logger.log(9, f'Url "{resource_extension}" matched with "{pattern}"')
logger.log(9, f"Url {resource_extension!r} matched with {pattern!r}")
return False
else:
return True
@ -45,9 +44,9 @@ class DownloadFilter:
if not self.excluded_domains:
return True
combined_domains = "|".join(self.excluded_domains)
pattern = re.compile(r"https?://.*({}).*".format(combined_domains))
pattern = re.compile(rf"https?://.*({combined_domains}).*")
if re.match(pattern, url):
logger.log(9, f'Url "{url}" matched with "{pattern}"')
logger.log(9, f"Url {url!r} matched with {pattern!r}")
return False
else:
return True

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import hashlib
import logging.handlers
@ -26,7 +25,7 @@ logger = logging.getLogger(__name__)
def _calc_hash(existing_file: Path):
chunk_size = 1024 * 1024
md5_hash = hashlib.md5()
md5_hash = hashlib.md5(usedforsecurity=False)
with existing_file.open("rb") as file:
chunk = file.read(chunk_size)
while chunk:
@ -38,7 +37,7 @@ def _calc_hash(existing_file: Path):
class RedditDownloader(RedditConnector):
def __init__(self, args: Configuration, logging_handlers: Iterable[logging.Handler] = ()):
super(RedditDownloader, self).__init__(args, logging_handlers)
super().__init__(args, logging_handlers)
if self.args.search_existing:
self.master_hash_list = self.scan_existing_files(self.download_directory)
@ -67,7 +66,7 @@ class RedditDownloader(RedditConnector):
):
logger.debug(
f"Submission {submission.id} in {submission.subreddit.display_name} skipped"
f' due to {submission.author.name if submission.author else "DELETED"} being an ignored user'
f" due to {submission.author.name if submission.author else 'DELETED'} being an ignored user"
)
return
elif self.args.min_score and submission.score < self.args.min_score:
@ -124,12 +123,12 @@ class RedditDownloader(RedditConnector):
)
return
resource_hash = res.hash.hexdigest()
destination.parent.mkdir(parents=True, exist_ok=True)
if resource_hash in self.master_hash_list:
if self.args.no_dupes:
logger.info(f"Resource hash {resource_hash} from submission {submission.id} downloaded elsewhere")
return
elif self.args.make_hard_links:
destination.parent.mkdir(parents=True, exist_ok=True)
try:
destination.hardlink_to(self.master_hash_list[resource_hash])
except AttributeError:
@ -139,6 +138,7 @@ class RedditDownloader(RedditConnector):
f" in submission {submission.id}"
)
return
destination.parent.mkdir(parents=True, exist_ok=True)
try:
with destination.open("wb") as file:
file.write(res.content)
@ -156,7 +156,7 @@ class RedditDownloader(RedditConnector):
@staticmethod
def scan_existing_files(directory: Path) -> dict[str, Path]:
files = []
for (dirpath, _dirnames, filenames) in os.walk(directory):
for dirpath, _dirnames, filenames in os.walk(directory):
files.extend([Path(dirpath, file) for file in filenames])
logger.info(f"Calculating hashes for {len(files)} files")

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
class BulkDownloaderException(Exception):

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import datetime
import logging
@ -38,7 +37,7 @@ class FileNameFormatter:
restriction_scheme: Optional[str] = None,
):
if not self.validate_string(file_format_string):
raise BulkDownloaderException(f'"{file_format_string}" is not a valid format string')
raise BulkDownloaderException(f"{file_format_string!r} is not a valid format string")
self.file_format_string = file_format_string
self.directory_format_string: list[str] = directory_format_string.split("/")
self.time_format_string = time_format_string
@ -154,6 +153,7 @@ class FileNameFormatter:
max_path_length = max_path - len(ending) - len(str(root)) - 1
out = Path(root, filename + ending)
safe_ending = re.match(r".*\..*", ending)
while any(
[
len(filename) > max_file_part_length_chars,
@ -162,6 +162,8 @@ class FileNameFormatter:
]
):
filename = filename[:-1]
if not safe_ending and filename[-1] != ".":
filename = filename[:-1] + "."
out = Path(root, filename + ending)
return out

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import configparser
import logging
@ -17,22 +16,27 @@ logger = logging.getLogger(__name__)
class OAuth2Authenticator:
def __init__(self, wanted_scopes: set[str], client_id: str, client_secret: str):
self._check_scopes(wanted_scopes)
def __init__(self, wanted_scopes: set[str], client_id: str, client_secret: str, user_agent: str):
self._check_scopes(wanted_scopes, user_agent)
self.scopes = wanted_scopes
self.client_id = client_id
self.client_secret = client_secret
@staticmethod
def _check_scopes(wanted_scopes: set[str]):
response = requests.get(
"https://www.reddit.com/api/v1/scopes.json", headers={"User-Agent": "fetch-scopes test"}
)
def _check_scopes(wanted_scopes: set[str], user_agent: str):
try:
response = requests.get(
"https://www.reddit.com/api/v1/scopes.json",
headers={"User-Agent": user_agent},
timeout=10,
)
except TimeoutError:
raise BulkDownloaderException("Reached timeout fetching scopes")
known_scopes = [scope for scope, data in response.json().items()]
known_scopes.append("*")
for scope in wanted_scopes:
if scope not in known_scopes:
raise BulkDownloaderException(f"Scope {scope} is not known to reddit")
raise BulkDownloaderException(f"Scope {scope!r} is not known to reddit")
@staticmethod
def split_scopes(scopes: str) -> set[str]:
@ -58,10 +62,10 @@ class OAuth2Authenticator:
if state != params["state"]:
self.send_message(client)
raise RedditAuthenticationError(f'State mismatch in OAuth2. Expected: {state} Received: {params["state"]}')
raise RedditAuthenticationError(f"State mismatch in OAuth2. Expected: {state} Received: {params['state']}")
elif "error" in params:
self.send_message(client)
raise RedditAuthenticationError(f'Error in OAuth2: {params["error"]}')
raise RedditAuthenticationError(f"Error in OAuth2: {params['error']}")
self.send_message(client, "<script>alert('You can go back to terminal window now.')</script>")
refresh_token = reddit.auth.authorize(params["code"])
@ -83,13 +87,13 @@ class OAuth2Authenticator:
@staticmethod
def send_message(client: socket.socket, message: str = ""):
client.send(f"HTTP/1.1 200 OK\r\n\r\n{message}".encode("utf-8"))
client.send(f"HTTP/1.1 200 OK\r\n\r\n{message}".encode())
client.close()
class OAuth2TokenManager(praw.reddit.BaseTokenManager):
def __init__(self, config: configparser.ConfigParser, config_location: Path):
super(OAuth2TokenManager, self).__init__()
super().__init__()
self.config = config
self.config_location = config_location

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import hashlib
import logging
@ -49,7 +48,7 @@ class Resource:
self.create_hash()
def create_hash(self):
self.hash = hashlib.md5(self.content)
self.hash = hashlib.md5(self.content, usedforsecurity=False)
def _determine_extension(self) -> Optional[str]:
extension_pattern = re.compile(r".*(\..{3,5})$")
@ -68,7 +67,7 @@ class Resource:
max_wait_time = 300
while True:
try:
response = requests.get(url, headers=headers)
response = requests.get(url, headers=headers, timeout=10)
if re.match(r"^2\d{2}", str(response.status_code)) and response.content:
return response.content
elif response.status_code in (408, 429):

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import configparser

View File

@ -1,2 +1 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from abc import ABC, abstractmethod
@ -28,10 +27,39 @@ class BaseDownloader(ABC):
@staticmethod
def retrieve_url(url: str, cookies: dict = None, headers: dict = None) -> requests.Response:
try:
res = requests.get(url, cookies=cookies, headers=headers)
res = requests.get(url, cookies=cookies, headers=headers, timeout=10)
except requests.exceptions.RequestException as e:
logger.exception(e)
raise SiteDownloaderError(f"Failed to get page {url}")
except TimeoutError as e:
logger.exception(e)
raise SiteDownloaderError(f"Timeout reached attempting to get page {url}")
if res.status_code != 200:
raise ResourceNotFound(f"Server responded with {res.status_code} at {url}")
return res
@staticmethod
def post_url(url: str, cookies: dict = None, headers: dict = None, payload: dict = None) -> requests.Response:
try:
res = requests.post(url, cookies=cookies, headers=headers, json=payload, timeout=10)
except requests.exceptions.RequestException as e:
logger.exception(e)
raise SiteDownloaderError(f"Failed to post to {url}")
except TimeoutError as e:
logger.exception(e)
raise SiteDownloaderError(f"Timeout reached attempting to post to page {url}")
if res.status_code != 200:
raise ResourceNotFound(f"Server responded with {res.status_code} to {url}")
return res
@staticmethod
def head_url(url: str, cookies: dict = None, headers: dict = None) -> requests.Response:
try:
res = requests.head(url, cookies=cookies, headers=headers, timeout=10)
except requests.exceptions.RequestException as e:
logger.exception(e)
raise SiteDownloaderError(f"Failed to check head at {url}")
except TimeoutError as e:
logger.exception(e)
raise SiteDownloaderError(f"Timeout reached attempting to check head at {url}")
return res

View File

@ -0,0 +1,39 @@
import logging
from itertools import chain
from typing import Optional
import bs4
from praw.models import Submission
from bdfr.exceptions import SiteDownloaderError
from bdfr.resource import Resource
from bdfr.site_authenticator import SiteAuthenticator
from bdfr.site_downloaders.base_downloader import BaseDownloader
logger = logging.getLogger(__name__)
class Catbox(BaseDownloader):
def __init__(self, post: Submission) -> None:
super().__init__(post)
def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> list[Resource]:
links = self.get_links(self.post.url)
if not links:
raise SiteDownloaderError("Catbox parser could not find any links")
links = [Resource(self.post, link, Resource.retry_download(link)) for link in links]
return links
@staticmethod
def get_links(url: str) -> set[str]:
content = Catbox.retrieve_url(url)
soup = bs4.BeautifulSoup(content.text, "html.parser")
collection_div = soup.find("div", attrs={"class": "imagecontainer"})
images = collection_div.find_all("a")
images = [link.get("href") for link in images]
videos = collection_div.find_all("video")
videos = [link.get("src") for link in videos]
audios = collection_div.find_all("audio")
audios = [link.get("src") for link in audios]
resources = chain(images, videos, audios)
return set(resources)

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from typing import Optional

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from typing import Optional

View File

@ -1,11 +1,11 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re
import urllib.parse
from bdfr.exceptions import NotADownloadableLinkError
from bdfr.site_downloaders.base_downloader import BaseDownloader
from bdfr.site_downloaders.catbox import Catbox
from bdfr.site_downloaders.delay_for_reddit import DelayForReddit
from bdfr.site_downloaders.direct import Direct
from bdfr.site_downloaders.erome import Erome
@ -27,7 +27,7 @@ class DownloadFactory:
sanitised_url = DownloadFactory.sanitise_url(url).lower()
if re.match(r"(i\.|m\.|o\.)?imgur", sanitised_url):
return Imgur
elif re.match(r"(i\.|thumbs\d\.|v\d\.)?(redgifs|gifdeliverynetwork)", sanitised_url):
elif re.match(r"(i\.|thumbs\d{1,2}\.|v\d\.)?(redgifs|gifdeliverynetwork)", sanitised_url):
return Redgifs
elif re.match(r"(thumbs\.|giant\.)?gfycat\.", sanitised_url):
return Gfycat
@ -37,6 +37,8 @@ class DownloadFactory:
return Direct
elif re.match(r"erome\.com.*", sanitised_url):
return Erome
elif re.match(r"catbox\.moe", sanitised_url):
return Catbox
elif re.match(r"delayforreddit\.com", sanitised_url):
return DelayForReddit
elif re.match(r"reddit\.com/gallery/.*", sanitised_url):
@ -83,7 +85,7 @@ class DownloadFactory:
"php3",
"xhtml",
)
if re.match(rf'(?i).*/.*\.({"|".join(web_extensions)})$', url):
if re.match(rf"(?i).*/.*\.({'|'.join(web_extensions)})$", url):
return True
else:
return False

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
import re
@ -38,7 +37,7 @@ class Erome(BaseDownloader):
def _get_links(url: str) -> set[str]:
page = Erome.retrieve_url(url)
soup = bs4.BeautifulSoup(page.text, "html.parser")
front_images = soup.find_all("img", attrs={"class": "lasyload"})
front_images = soup.find_all("img", attrs={"class": "img-front"})
out = [im.get("data-src") for im in front_images]
videos = soup.find_all("source")

View File

@ -1,2 +1 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from abc import ABC, abstractmethod

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from typing import Optional
@ -17,7 +16,7 @@ logger = logging.getLogger(__name__)
class YtdlpFallback(BaseFallbackDownloader, Youtube):
def __init__(self, post: Submission):
super(YtdlpFallback, self).__init__(post)
super().__init__(post)
def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> list[Resource]:
out = Resource(

View File

@ -1,10 +1,8 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from typing import Optional
import requests
from praw.models import Submission
from bdfr.exceptions import SiteDownloaderError
@ -42,8 +40,7 @@ class Gallery(BaseDownloader):
possible_extensions = (".jpg", ".png", ".gif", ".gifv", ".jpeg")
for extension in possible_extensions:
test_url = f"https://i.redd.it/{image_id}{extension}"
response = requests.head(test_url)
if response.status_code == 200:
if Gallery.head_url(test_url).status_code == 200:
out.append(test_url)
break
return out

View File

@ -1,11 +1,10 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json
import re
from typing import Optional
from bs4 import BeautifulSoup
from cachetools import TTLCache, cached
from praw.models import Submission
from bdfr.exceptions import SiteDownloaderError
@ -21,6 +20,20 @@ class Gfycat(Redgifs):
def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> list[Resource]:
return super().find_resources(authenticator)
@staticmethod
@cached(cache=TTLCache(maxsize=5, ttl=3420))
def _get_auth_token() -> str:
headers = {
"content-type": "text/plain;charset=UTF-8",
"host": "weblogin.gfycat.com",
"origin": "https://gfycat.com",
}
payload = {"access_key": "Anr96uuqt9EdamSCwK4txKPjMsf2M95Rfa5FLLhPFucu8H5HTzeutyAa"}
token = json.loads(
Gfycat.post_url("https://weblogin.gfycat.com/oauth/webtoken", headers=headers, payload=payload).text
)["access_token"]
return token
@staticmethod
def _get_link(url: str) -> set[str]:
gfycat_id = re.match(r".*/(.*?)(?:/?|-.*|\..{3-4})$", url).group(1)
@ -28,18 +41,33 @@ class Gfycat(Redgifs):
response = Gfycat.retrieve_url(url)
if re.search(r"(redgifs|gifdeliverynetwork)", response.url):
url = url.lower() # Fixes error with old gfycat/redgifs links
url = url.lower()
return Redgifs._get_link(url)
soup = BeautifulSoup(response.text, "html.parser")
content = soup.find("script", attrs={"data-react-helmet": "true", "type": "application/ld+json"})
auth_token = Gfycat._get_auth_token()
if not auth_token:
raise SiteDownloaderError("Unable to retrieve Gfycat API token")
headers = {
"referer": "https://gfycat.com/",
"origin": "https://gfycat.com",
"content-type": "application/json",
"Authorization": f"Bearer {auth_token}",
}
content = Gfycat.retrieve_url(f"https://api.gfycat.com/v1/gfycats/{gfycat_id}", headers=headers)
if content is None:
raise SiteDownloaderError("Could not read the API source")
try:
out = json.loads(content.contents[0])["video"]["contentUrl"]
response_json = json.loads(content.text)
except json.JSONDecodeError as e:
raise SiteDownloaderError(f"Received data was not valid JSON: {e}")
try:
out = response_json["gfyItem"]["mp4Url"]
except (IndexError, KeyError, AttributeError) as e:
raise SiteDownloaderError(f"Failed to download Gfycat link {url}: {e}")
except json.JSONDecodeError as e:
raise SiteDownloaderError(f"Did not receive valid JSON data: {e}")
return {
out,
}

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json
import re
@ -41,7 +40,7 @@ class Imgur(BaseDownloader):
if link.endswith("/"):
link = link.removesuffix("/")
if re.search(r".*/(.*?)(gallery/|a/)", link):
imgur_id = re.match(r".*/(?:gallery/|a/)(.*?)(?:/.*)?$", link).group(1)
imgur_id = re.match(r".*/(?:gallery/|a/)(.*?)(?:/.*|\..{3,4})?$", link).group(1)
link = f"https://api.imgur.com/3/album/{imgur_id}"
else:
imgur_id = re.match(r".*/(.*?)(?:_d)?(?:\..{0,})?$", link).group(1)

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from typing import Optional

View File

@ -1,11 +1,10 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json
import re
from typing import Optional
import requests
from cachetools import TTLCache, cached
from praw.models import Submission
from bdfr.exceptions import SiteDownloaderError
@ -22,14 +21,19 @@ class Redgifs(BaseDownloader):
media_urls = self._get_link(self.post.url)
return [Resource(self.post, m, Resource.retry_download(m), None) for m in media_urls]
@staticmethod
@cached(cache=TTLCache(maxsize=5, ttl=82080))
def _get_auth_token() -> str:
token = json.loads(Redgifs.retrieve_url("https://api.redgifs.com/v2/auth/temporary").text)["token"]
return token
@staticmethod
def _get_id(url: str) -> str:
try:
if url.endswith("/"):
url = url.removesuffix("/")
redgif_id = re.match(r".*/(.*?)(?:#.*|\?.*|\..{0,})?$", url).group(1).lower()
if redgif_id.endswith("-mobile"):
redgif_id = redgif_id.removesuffix("-mobile")
redgif_id = re.sub(r"(-.*)$", "", redgif_id)
except AttributeError:
raise SiteDownloaderError(f"Could not extract Redgifs ID from {url}")
return redgif_id
@ -38,7 +42,7 @@ class Redgifs(BaseDownloader):
def _get_link(url: str) -> set[str]:
redgif_id = Redgifs._get_id(url)
auth_token = json.loads(Redgifs.retrieve_url("https://api.redgifs.com/v2/auth/temporary").text)["token"]
auth_token = Redgifs._get_auth_token()
if not auth_token:
raise SiteDownloaderError("Unable to retrieve Redgifs API token")
@ -48,7 +52,6 @@ class Redgifs(BaseDownloader):
"content-type": "application/json",
"Authorization": f"Bearer {auth_token}",
}
content = Redgifs.retrieve_url(f"https://api.redgifs.com/v2/gifs/{redgif_id}", headers=headers)
if content is None:
@ -62,15 +65,13 @@ class Redgifs(BaseDownloader):
out = set()
try:
if response_json["gif"]["type"] == 1: # type 1 is a video
if requests.get(response_json["gif"]["urls"]["hd"], headers=headers).ok:
if Redgifs.head_url(response_json["gif"]["urls"]["hd"], headers=headers).status_code == 200:
out.add(response_json["gif"]["urls"]["hd"])
else:
out.add(response_json["gif"]["urls"]["sd"])
elif response_json["gif"]["type"] == 2: # type 2 is an image
if response_json["gif"]["gallery"]:
content = Redgifs.retrieve_url(
f'https://api.redgifs.com/v2/gallery/{response_json["gif"]["gallery"]}'
)
if gallery := response_json["gif"]["gallery"]:
content = Redgifs.retrieve_url(f"https://api.redgifs.com/v2/gallery/{gallery}")
response_json = json.loads(content.text)
out = {p["urls"]["hd"] for p in response_json["gifs"]}
else:
@ -80,7 +81,4 @@ class Redgifs(BaseDownloader):
except (KeyError, AttributeError):
raise SiteDownloaderError("Failed to find JSON data in page")
# Update subdomain if old one is returned
out = {re.sub("thumbs2", "thumbs3", link) for link in out}
out = {re.sub("thumbs3", "thumbs4", link) for link in out}
return out

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from typing import Optional

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import itertools
import logging
@ -7,7 +6,6 @@ import re
from typing import Optional
import bs4
import requests
from praw.models import Submission
from bdfr.exceptions import SiteDownloaderError
@ -37,7 +35,7 @@ class Vidble(BaseDownloader):
if not re.search(r"vidble.com/(show/|album/|watch\?v)", url):
url = re.sub(r"/(\w*?)$", r"/show/\1", url)
page = requests.get(url)
page = Vidble.retrieve_url(url)
soup = bs4.BeautifulSoup(page.text, "html.parser")
content_div = soup.find("div", attrs={"id": "ContentPlaceHolder1_divContent"})
images = content_div.find_all("img")

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from typing import Optional

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
import tempfile

View File

@ -10,12 +10,16 @@ All communication on GitHub, Discord, email, or any other medium must conform to
**Before opening a new issue**, be sure that no issues regarding your problem already exist. If a similar issue exists, try to contribute to the issue.
**If you are asking a question** about the functioning of the BDFR or the interface, please use the discussions page. Bug reports are not the right medium for asking and answering questions, and the discussions page makes it much easier to discuss, answer, and save questions and responses for others going forwards.
### Bugs
When opening an issue about a bug, **please provide the full log file for the run in which the bug occurred**. This log file is named `log_output.txt` in the configuration folder. Check the [README](../README.md) for information on where this is. This log file will contain all the information required for the developers to recreate the bug.
If you do not have or cannot find the log file, then at minimum please provide the **Reddit ID for the submission** or comment which caused the issue. Also copy in the command that you used to run the BDFR from the command line, as that will also provide helpful information when trying to find and fix the bug. If needed, more information will be requested in the issue thread.
Adding this information is **not optional**. If a bug report is opened without this information, it cannot be replicated by developers. The logs will be asked for once and if they are not supplied, the issue will be closed due to lack of information.
### Feature requests
In the case of requesting a feature or an enhancement, there are fewer requirements. However, please be clear in what you would like the BDFR to do and also how the feature/enhancement would be used or would be useful to more people. It is crucial that the feature is justified. Any feature request without a concrete reason for it to be implemented has a very small chance to get accepted. Be aware that proposed enhancements may be rejected for multiple reasons, or no reason, at the discretion of the developers.

View File

@ -25,27 +25,28 @@ classifiers = [
dependencies = [
"appdirs>=1.4.4",
"beautifulsoup4>=4.10.0",
"cachetools>=5.3.0",
"click>=8.0.0",
"dict2xml>=1.7.0",
"praw>=7.2.0",
"pyyaml>=5.4.1",
"requests>=2.25.1",
"yt-dlp>=2022.11.11",
"requests>=2.28.2",
"yt-dlp>=2023.2.17",
]
dynamic = ["version"]
[tool.setuptools]
dynamic = {"version" = {attr = 'bdfr.__version__'}}
dynamic = {"version" = {attr = "bdfr.__version__"}}
packages = ["bdfr", "bdfr.archive_entry", "bdfr.site_downloaders", "bdfr.site_downloaders.fallback_downloaders",]
data-files = {"config" = ["bdfr/default_config.cfg",]}
[project.optional-dependencies]
dev = [
"black>=22.12.0",
"black>=23.1.0",
"Flake8-pyproject>=1.2.2",
"isort>=5.11.4",
"pre-commit>=2.20.0",
"pytest>=7.1.0",
"isort>=5.12.0",
"pre-commit>=3.0.4",
"pytest>=7.2.1",
"tox>=3.27.1",
]
@ -64,7 +65,7 @@ bdfr-download = "bdfr.__main__:cli_download"
line-length = 120
[tool.flake8]
exclude = ["scripts"]
exclude = ["scripts/tests"]
max-line-length = 120
show-source = true
statistics = true

View File

@ -2,11 +2,11 @@
Due to the verboseness of the logs, a great deal of information can be gathered quite easily from the BDFR's logfiles. In this folder, there is a selection of scripts that parse these logs, scraping useful bits of information. Since the logfiles are recurring patterns of strings, it is a fairly simple matter to write scripts that utilise tools included on most Linux systems.
- [Script to extract all successfully downloaded IDs](#extract-all-successfully-downloaded-ids)
- [Script to extract all failed download IDs](#extract-all-failed-ids)
- [Timestamp conversion](#converting-bdfrv1-timestamps-to-bdfrv2-timestamps)
- [Printing summary statistics for a run](#printing-summary-statistics)
- [Unsaving posts from your account after downloading](#unsave-posts-after-downloading)
- [Script to extract all successfully downloaded IDs](#extract-all-successfully-downloaded-ids)
- [Script to extract all failed download IDs](#extract-all-failed-ids)
- [Timestamp conversion](#converting-bdfrv1-timestamps-to-bdfrv2-timestamps)
- [Printing summary statistics for a run](#printing-summary-statistics)
- [Unsaving posts from your account after downloading](#unsave-posts-after-downloading)
## Extract all Successfully Downloaded IDs
@ -15,7 +15,7 @@ This script is contained [here](extract_successful_ids.sh) and will result in a
The script can be used with the following signature:
```bash
./extract_successful_ids.sh LOGFILE_LOCATION <OUTPUT_FILE>
./extract_successful_ids.sh LOGFILE_LOCATION >> <OUTPUT_FILE>
```
By default, if the second argument is not supplied, the script will write the results to `successful.txt`.
@ -32,7 +32,7 @@ An example of the script being run on a Linux machine is the following:
The script can be used with the following signature:
```bash
./extract_failed_ids.sh LOGFILE_LOCATION <OUTPUT_FILE>
./extract_failed_ids.sh LOGFILE_LOCATION >> <OUTPUT_FILE>
```
By default, if the second argument is not supplied, the script will write the results to `failed.txt`.
@ -72,19 +72,20 @@ Submissions from excluded subreddits: 0
## Unsave Posts After Downloading
[This script](unsaveposts.py) takes a list of submission IDs from a file named `successfulids` created with the `extract_successful_ids.sh` script and unsaves them from your account. To make it work you will need to make a user script in your reddit profile like this:
- Fill in the username and password fields in the script. Make sure you keep the quotes around the fields.
- Go to https://old.reddit.com/prefs/apps/
- Click on `Develop an app` at the bottom.
- Make sure you select a `script` not a `web app`.
- Name it `Unsave Posts`.
- Fill in the `Redirect URI` field with `127.0.0.0`.
- Save it.
- Fill in the `client_id` and `client_secret` fields on the script. The client ID is the 14 character string under the name you gave your script. .It'll look like a bunch of random characters like this: pspYLwDoci9z_A. The client secret is the longer string next to "secret". Again keep the quotes around the fields.
- Fill in the username and password fields in the script. Make sure you keep the quotes around the fields.
- Go to <https://old.reddit.com/prefs/apps/>
- Click on `Develop an app` at the bottom.
- Make sure you select a `script` not a `web app`.
- Name it `Unsave Posts`.
- Fill in the `Redirect URI` field with `127.0.0.0`.
- Save it.
- Fill in the `client_id` and `client_secret` fields on the script. The client ID is the 14-character string under the name you gave your script. It'll look like a bunch of random characters, like this: pspYLwDoci9z_A. The client secret is the longer string next to "secret". Again, keep the quotes around the fields.
Now the script is ready to run. Just execute it like this:
```bash
python3.9 -m bdfr download DOWNLOAD_DIR --authenticate --user me --saved --log LOGFILE_LOCATION
bdfr download DOWNLOAD_DIR --authenticate --user me --saved --log LOGFILE_LOCATION
./extract_successful_ids.sh LOGFILE_LOCATION > successfulids
./unsaveposts.py
```

View File

@ -1,21 +1,13 @@
if (Test-Path -Path $args[0] -PathType Leaf) {
$file=$args[0]
}
else {
Write-Host "CANNOT FIND LOG FILE"
if (($args[0] -eq $null) -or -Not (Test-Path -Path $args[0] -PathType Leaf)) {
Write-Output "CANNOT FIND LOG FILE"
Exit 1
}
if ($null -ne $args[1]) {
$output=$args[1]
Write-Host "Outputting IDs to $output"
}
else {
$output="./failed.txt"
elseif (Test-Path -Path $args[0] -PathType Leaf) {
$file=$args[0]
}
Select-String -Path $file -Pattern "Could not download submission" | ForEach-Object { -split $_.Line | Select-Object -Skip 11 | Select-Object -First 1 } | ForEach-Object { $_.substring(0,$_.Length-1) } >> $output
Select-String -Path $file -Pattern "Failed to download resource" | ForEach-Object { -split $_.Line | Select-Object -Skip 14 | Select-Object -First 1 } >> $output
Select-String -Path $file -Pattern "failed to download submission" | ForEach-Object { -split $_.Line | Select-Object -Skip 13 | Select-Object -First 1 } | ForEach-Object { $_.substring(0,$_.Length-1) } >> $output
Select-String -Path $file -Pattern "Failed to write file" | ForEach-Object { -split $_.Line | Select-Object -Skip 13 | Select-Object -First 1 } >> $output
Select-String -Path $file -Pattern "skipped due to disabled module" | ForEach-Object { -split $_.Line | Select-Object -Skip 8 | Select-Object -First 1 } >> $output
Select-String -Path $file -Pattern "Could not download submission" | ForEach-Object { -split $_.Line | Select-Object -Skip 11 | Select-Object -First 1 } | ForEach-Object { $_.substring(0,$_.Length-1) }
Select-String -Path $file -Pattern "Failed to download resource" | ForEach-Object { -split $_.Line | Select-Object -Skip 14 | Select-Object -First 1 }
Select-String -Path $file -Pattern "failed to download submission" | ForEach-Object { -split $_.Line | Select-Object -Skip 13 | Select-Object -First 1 } | ForEach-Object { $_.substring(0,$_.Length-1) }
Select-String -Path $file -Pattern "Failed to write file" | ForEach-Object { -split $_.Line | Select-Object -Skip 13 | Select-Object -First 1 }
Select-String -Path $file -Pattern "skipped due to disabled module" | ForEach-Object { -split $_.Line | Select-Object -Skip 8 | Select-Object -First 1 }

View File

@ -1,16 +1,16 @@
#!/bin/bash
if [ -e "$1" ]; then
if [ -e "$1" ] && [ -f "$1" ]; then
file="$1"
else
echo 'CANNOT FIND LOG FILE'
echo "CANNOT FIND LOG FILE"
exit 1
fi
{
grep 'Could not download submission' "$file" | awk '{ print $12 }' | rev | cut -c 2- | rev ;
grep 'Failed to download resource' "$file" | awk '{ print $15 }' ;
grep 'failed to download submission' "$file" | awk '{ print $14 }' | rev | cut -c 2- | rev ;
grep 'Failed to write file' "$file" | awk '{ print $14 }' ;
grep 'skipped due to disabled module' "$file" | awk '{ print $9 }' ;
grep "Could not download submission" "$file" | awk '{ print $12 }' | rev | cut -c 2- | rev ;
grep "Failed to download resource" "$file" | awk '{ print $15 }' ;
grep "failed to download submission" "$file" | awk '{ print $14 }' | rev | cut -c 2- | rev ;
grep "Failed to write file" "$file" | awk '{ print $14 }' ;
grep "skipped due to disabled module" "$file" | awk '{ print $9 }' ;
}

View File

@ -1,21 +1,14 @@
if (Test-Path -Path $args[0] -PathType Leaf) {
$file=$args[0]
}
else {
Write-Host "CANNOT FIND LOG FILE"
if (($args[0] -eq $null) -or -Not (Test-Path -Path $args[0] -PathType Leaf)) {
Write-Output "CANNOT FIND LOG FILE"
Exit 1
}
if ($null -ne $args[1]) {
$output=$args[1]
Write-Host "Outputting IDs to $output"
}
else {
$output="./successful.txt"
elseif (Test-Path -Path $args[0] -PathType Leaf) {
$file=$args[0]
}
Select-String -Path $file -Pattern "Downloaded submission" | ForEach-Object { -split $_.Line | Select-Object -Last 3 | Select-Object -SkipLast 2 } >> $output
Select-String -Path $file -Pattern "Resource hash" | ForEach-Object { -split $_.Line | Select-Object -Last 3 | Select-Object -SkipLast 2 } >> $output
Select-String -Path $file -Pattern "Download filter" | ForEach-Object { -split $_.Line | Select-Object -Last 4 | Select-Object -SkipLast 3 } >> $output
Select-String -Path $file -Pattern "already exists, continuing" | ForEach-Object { -split $_.Line | Select-Object -Last 4 | Select-Object -SkipLast 3 } >> $output
Select-String -Path $file -Pattern "Hard link made" | ForEach-Object { -split $_.Line | Select-Object -Last 1 } >> $output
Select-String -Path $file -Pattern "Downloaded submission" | ForEach-Object { -split $_.Line | Select-Object -Last 3 | Select-Object -SkipLast 2 }
Select-String -Path $file -Pattern "Resource hash" | ForEach-Object { -split $_.Line | Select-Object -Last 3 | Select-Object -SkipLast 2 }
Select-String -Path $file -Pattern "Download filter" | ForEach-Object { -split $_.Line | Select-Object -Last 4 | Select-Object -SkipLast 3 }
Select-String -Path $file -Pattern "already exists, continuing" | ForEach-Object { -split $_.Line | Select-Object -Last 4 | Select-Object -SkipLast 3 }
Select-String -Path $file -Pattern "Hard link made" | ForEach-Object { -split $_.Line | Select-Object -Last 1 }
Select-String -Path $file -Pattern "filtered due to score" | ForEach-Object { -split $_.Line | Select-Object -Index 8 }

View File

@ -1,17 +1,17 @@
#!/bin/bash
if [ -e "$1" ]; then
if [ -e "$1" ] && [ -f "$1" ]; then
file="$1"
else
echo 'CANNOT FIND LOG FILE'
echo "CANNOT FIND LOG FILE"
exit 1
fi
{
grep 'Downloaded submission' "$file" | awk '{ print $(NF-2) }' ;
grep 'Resource hash' "$file" | awk '{ print $(NF-2) }' ;
grep 'Download filter' "$file" | awk '{ print $(NF-3) }' ;
grep 'already exists, continuing' "$file" | awk '{ print $(NF-3) }' ;
grep 'Hard link made' "$file" | awk '{ print $(NF) }' ;
grep 'filtered due to score' "$file" | awk '{ print $9 }'
grep "Downloaded submission" "$file" | awk '{ print $(NF-2) }' ;
grep "Resource hash" "$file" | awk '{ print $(NF-2) }' ;
grep "Download filter" "$file" | awk '{ print $(NF-3) }' ;
grep "already exists, continuing" "$file" | awk '{ print $(NF-3) }' ;
grep "Hard link made" "$file" | awk '{ print $(NF) }' ;
grep "filtered due to score" "$file" | awk '{ print $9 }' ;
}

View File

@ -1,17 +1,9 @@
if (Test-Path -Path $args[0] -PathType Leaf) {
$file=$args[0]
}
else {
if (($args[0] -eq $null) -or -Not (Test-Path -Path $args[0] -PathType Leaf)) {
Write-Host "CANNOT FIND LOG FILE"
Exit 1
}
if ($null -ne $args[1]) {
$output=$args[1]
Write-Host "Outputting IDs to $output"
}
else {
$output="./successful.txt"
elseif (Test-Path -Path $args[0] -PathType Leaf) {
$file=$args[0]
}
Write-Host -NoNewline "Downloaded submissions: "

View File

@ -1,9 +1,9 @@
#!/bin/bash
if [ -e "$1" ]; then
if [ -e "$1" ] && [ -f "$1" ]; then
file="$1"
else
echo 'CANNOT FIND LOG FILE'
echo "CANNOT FIND LOG FILE"
exit 1
fi

View File

@ -1,2 +1,2 @@
[2022-07-23 14:04:14,095 - bdfr.downloader - DEBUG] - Submission ljyy27 filtered due to score 15 < [50]
[2022-07-23 14:04:14,104 - bdfr.downloader - DEBUG] - Submission ljyy27 filtered due to score 16 > [1]
[2022-07-23 14:04:14,104 - bdfr.downloader - DEBUG] - Submission ljyz27 filtered due to score 16 > [1]

View File

@ -0,0 +1,39 @@
Describe "extract_failed_ids" {
It "fail run no args" {
(..\extract_failed_ids.ps1) | Should -Be "CANNOT FIND LOG FILE"
}
It "fail run no logfile" {
(..\extract_failed_ids.ps1 missing.txt) | Should -Be "CANNOT FIND LOG FILE"
}
It "fail no downloader module" {
$down_error = (..\extract_failed_ids.ps1 example_logfiles\failed_no_downloader.txt)
$down_error | Should -HaveCount 3
$down_error | Should -Contain "nxv3ea"
}
It "fail resource error" {
$res_error = (..\extract_failed_ids.ps1 example_logfiles\failed_resource_error.txt)
$res_error | Should -HaveCount 1
$res_error | Should -Contain "nxv3dt"
}
It "fail site downloader error" {
$site_error = (..\extract_failed_ids.ps1 example_logfiles\failed_sitedownloader_error.txt)
$site_error | Should -HaveCount 2
$site_error | Should -Contain "nxpn0h"
}
It "fail failed file write" {
$write_error = (..\extract_failed_ids.ps1 example_logfiles\failed_write_error.txt)
$write_error | Should -HaveCount 1
$write_error | Should -Contain "nnboza"
}
It "fail disabled module" {
$disabled = (..\extract_failed_ids.ps1 example_logfiles\failed_disabled_module.txt)
$disabled | Should -HaveCount 1
$disabled | Should -Contain "m2601g"
}
}

View File

@ -0,0 +1,45 @@
Describe "extract_successful_ids" {
It "fail run no args" {
(..\extract_successful_ids.ps1) | Should -Be "CANNOT FIND LOG FILE"
}
It "fail run no logfile" {
(..\extract_successful_ids.ps1 missing.txt) | Should -Be "CANNOT FIND LOG FILE"
}
It "success downloaded submission" {
$down_success = (..\extract_successful_ids.ps1 example_logfiles\succeed_downloaded_submission.txt)
$down_success | Should -HaveCount 7
$down_success | Should -Contain "nn9cor"
}
It "success resource hash" {
$hash_success = (..\extract_successful_ids.ps1 example_logfiles\succeed_resource_hash.txt)
$hash_success | Should -HaveCount 1
$hash_success | Should -Contain "n86jk8"
}
It "success download filter" {
$filt_success = (..\extract_successful_ids.ps1 example_logfiles\succeed_download_filter.txt)
$filt_success | Should -HaveCount 3
$filt_success | Should -Contain "nxuxjy"
}
It "success already exists" {
$exist_success = (..\extract_successful_ids.ps1 example_logfiles\succeed_already_exists.txt)
$exist_success | Should -HaveCount 3
$exist_success | Should -Contain "nxrq9g"
}
It "success hard link" {
$link_success = (..\extract_successful_ids.ps1 example_logfiles\succeed_hard_link.txt)
$link_success | Should -HaveCount 1
$link_success | Should -Contain "nwnp2n"
}
It "success score filter" {
$score_success = (..\extract_successful_ids.ps1 example_logfiles\succeed_score_filter.txt)
$score_success | Should -HaveCount 2
$score_success | Should -Contain "ljyz27"
}
}

View File

@ -7,11 +7,16 @@ teardown() {
rm -f failed.txt
}
@test "fail run no logfile" {
@test "fail run no args" {
run ../extract_failed_ids.sh
assert_failure
}
@test "fail run no logfile" {
run ../extract_failed_ids.sh ./missing.txt
assert_failure
}
@test "fail no downloader module" {
run ../extract_failed_ids.sh ./example_logfiles/failed_no_downloader.txt
echo "$output" > failed.txt

View File

@ -7,6 +7,16 @@ teardown() {
rm -f successful.txt
}
@test "fail run no args" {
run ../extract_successful_ids.sh
assert_failure
}
@test "fail run no logfile" {
run ../extract_successful_ids.sh ./missing.txt
assert_failure
}
@test "success downloaded submission" {
run ../extract_successful_ids.sh ./example_logfiles/succeed_downloaded_submission.txt
echo "$output" > successful.txt

View File

@ -1,6 +1,6 @@
#! /usr/bin/env python3.9
'''
This script takes a list of submission IDs from a file named "successfulids" created with the
#!/usr/bin/env python3
"""
This script takes a list of submission IDs from a file named "successfulids" created with the
"extract_successful_ids.sh" script and unsaves them from your account. To make it work you must
fill in the username and password fields below. Make sure you keep the quotes around the fields.
You'll need to make a "user script" in your reddit profile to run this.
@ -14,12 +14,18 @@ The client ID is the 14 character string under the name you gave your script.
It'll look like a bunch of random characters like this: pspYLwDoci9z_A
The client secret is the longer string next to "secret".
Replace those two fields below. Again keep the quotes around the fields.
'''
"""
import praw
from pathlib import Path
try:
r= praw.Reddit(
import praw
import prawcore.exceptions
except ImportError:
print("Please install PRAW")
try:
reddit = praw.Reddit(
client_id="CLIENTID",
client_secret="CLIENTSECRET",
password="USERPASSWORD",
@ -27,14 +33,15 @@ try:
username="USERNAME",
)
with open("successfulids", "r") as f:
for item in f:
r.submission(id = item.strip()).unsave()
with Path("successfulids").open() as id_file:
for item in id_file:
reddit.submission(id=item.strip()).unsave()
except:
print("Something went wrong. Did you install PRAW? Did you change the user login fields?")
except FileNotFoundError:
print("ID file not found")
except prawcore.exceptions.ResponseException:
print("Something went wrong. Did you change the user login fields?")
else:
print("Done! Thanks for playing!")

View File

@ -1,2 +1 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

View File

@ -1,2 +1 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import praw
import pytest

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import praw
import pytest

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import configparser
import socket

View File

@ -1,2 +1 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re
import shutil

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import shutil
from pathlib import Path

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import shutil
from pathlib import Path
@ -186,7 +185,7 @@ def test_cli_download_user_data_bad_me_unauthenticated(test_args: list[str], tmp
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert 'To use "me" as a user, an authenticated Reddit instance must be used' in result.output
assert "To use 'me' as a user, an authenticated Reddit instance must be used" in result.output
@pytest.mark.online
@ -218,7 +217,7 @@ def test_cli_download_download_filters(test_args: list[str], tmp_path: Path):
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert any((string in result.output for string in ("Download filter removed ", "filtered due to URL")))
assert any(string in result.output for string in ("Download filter removed ", "filtered due to URL"))
@pytest.mark.online
@ -440,3 +439,17 @@ def test_cli_download_explicit_filename_restriction_scheme(test_args: list[str],
assert result.exit_code == 0
assert "Downloaded submission" in result.output
assert "Forcing Windows-compatible filenames" in result.output
@pytest.mark.online
@pytest.mark.reddit
@pytest.mark.skipif(not does_test_config_exist, reason="A test config file is required for integration tests")
@pytest.mark.parametrize("test_args", (["--link", "ehqt2g", "--link", "ehtuv8", "--no-dupes"],))
def test_cli_download_no_empty_dirs(test_args: list[str], tmp_path: Path):
runner = CliRunner()
test_args = create_basic_args_for_download_runner(test_args, tmp_path)
result = runner.invoke(cli, test_args)
assert result.exit_code == 0
assert "downloaded elsewhere" in result.output
assert Path(tmp_path, "EmpireDidNothingWrong").exists()
assert not Path(tmp_path, "StarWarsEU").exists()
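
The new test_cli_download_no_empty_dirs case pairs --no-dupes with two submissions that resolve to the same file and expects the duplicate's subreddit folder never to appear. A minimal sketch of the ordering that behaviour implies, with hypothetical names (not the project's downloader code):

import hashlib
from pathlib import Path

def write_unless_duplicate(destination: Path, content: bytes, seen_hashes: set[str]) -> bool:
    digest = hashlib.md5(content).hexdigest()
    if digest in seen_hashes:
        return False  # reported as "downloaded elsewhere"; no directory is created
    # The parent directory is only made once we know the file will be written.
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(content)
    seen_hashes.add(digest)
    return True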

View File

@ -1,2 +1 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

View File

@ -1,2 +1 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import MagicMock

View File

@ -0,0 +1,59 @@
from unittest.mock import Mock
import pytest
from bdfr.resource import Resource
from bdfr.site_downloaders.catbox import Catbox
@pytest.mark.online
@pytest.mark.parametrize(
("test_url", "expected"),
(
(
"https://catbox.moe/c/vel5eg",
{
"https://files.catbox.moe/h2dx9k.gif",
"https://files.catbox.moe/bc83lg.png",
"https://files.catbox.moe/aq3m2a.jpeg",
"https://files.catbox.moe/yfk8r7.jpeg",
"https://files.catbox.moe/34ofbz.png",
"https://files.catbox.moe/xx4lcw.mp4",
"https://files.catbox.moe/xocd6t.mp3",
},
),
),
)
def test_get_links(test_url: str, expected: set[str]):
results = Catbox.get_links(test_url)
assert results == expected
@pytest.mark.online
@pytest.mark.slow
@pytest.mark.parametrize(
("test_url", "expected_hashes"),
(
(
"https://catbox.moe/c/vel5eg",
{
"014762b38e280ef3c0d000cc5f2aa386",
"85799edf12e20876f37286784460ad1b",
"c71b88c4230aa3aaad52a644fb709737",
"f40cffededd1929726d9cd265cc42c67",
"bda1f646c49607183c2450441f2ea6e8",
"21b48729bf9be7884999442b73887eed",
"0ec327259733a8276c207cc6e1b001ad",
},
),
),
)
def test_download_resources(test_url: str, expected_hashes: set[str]):
mock_download = Mock()
mock_download.url = test_url
downloader = Catbox(mock_download)
results = downloader.find_resources()
assert all(isinstance(res, Resource) for res in results)
[res.download() for res in results]
hashes = {res.hash.hexdigest() for res in results}
assert hashes == set(expected_hashes)
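
These tests drive the new Catbox collection support through two layers: get_links turns a catbox.moe collection URL into the set of direct files.catbox.moe URLs, and find_resources wraps each of those as a downloadable Resource. A rough sketch of the first step, under the assumption that the collection page embeds the direct file links in its HTML; this is illustrative only, not the project's implementation:

import re
import requests

def get_catbox_links(collection_url: str) -> set[str]:
    page = requests.get(collection_url, timeout=10)
    page.raise_for_status()
    # Direct files follow the files.catbox.moe/<name>.<ext> pattern.
    return set(re.findall(r"https://files\.catbox\.moe/\w+\.\w+", page.text))

# get_catbox_links("https://catbox.moe/c/vel5eg") would be expected to return
# the URL set asserted in test_get_links above.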

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import Mock

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import Mock
@ -14,10 +13,7 @@ from bdfr.site_downloaders.direct import Direct
("test_url", "expected_hash"),
(
("https://i.redd.it/q6ebualjxzea1.jpg", "6ec154859c777cb401132bb991cb3635"),
(
"https://file-examples.com/wp-content/uploads/2017/11/file_example_MP3_700KB.mp3",
"3caa342e241ddb7d76fd24a834094101",
),
("https://filesamples.com/samples/audio/mp3/sample3.mp3", "d30a2308f188cbb11d74cf20c357891c"),
),
)
def test_download_resource(test_url: str, expected_hash: str):

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import praw
import pytest

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re
from unittest.mock import MagicMock
@ -40,6 +39,7 @@ def test_get_link(test_url: str, expected_urls: tuple[str]):
(
("https://www.erome.com/a/vqtPuLXh", 1),
("https://www.erome.com/a/4tP3KI6F", 1),
("https://www.erome.com/a/WNyK674a", 41),
),
)
def test_download_resource(test_url: str, expected_hashes_len: int):

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import praw
import pytest

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import Mock
@ -9,18 +8,28 @@ from bdfr.resource import Resource
from bdfr.site_downloaders.gfycat import Gfycat
@pytest.mark.online
def test_auth_cache():
auth1 = Gfycat._get_auth_token()
auth2 = Gfycat._get_auth_token()
assert auth1 == auth2
@pytest.mark.online
@pytest.mark.parametrize(
("test_url", "expected_url"),
(
("https://gfycat.com/definitivecaninecrayfish", "https://giant.gfycat.com/DefinitiveCanineCrayfish.mp4"),
("https://gfycat.com/dazzlingsilkyiguana", "https://giant.gfycat.com/DazzlingSilkyIguana.mp4"),
("https://gfycat.com/WearyComposedHairstreak", "https://thumbs4.redgifs.com/WearyComposedHairstreak.mp4"),
("https://gfycat.com/WearyComposedHairstreak", "https://thumbs44.redgifs.com/WearyComposedHairstreak.mp4"),
(
"https://thumbs.gfycat.com/ComposedWholeBullfrog-size_restricted.gif",
"https://thumbs4.redgifs.com/ComposedWholeBullfrog.mp4",
"https://thumbs44.redgifs.com/ComposedWholeBullfrog.mp4",
),
(
"https://giant.gfycat.com/ComposedWholeBullfrog.mp4",
"https://thumbs44.redgifs.com/ComposedWholeBullfrog.mp4",
),
("https://giant.gfycat.com/ComposedWholeBullfrog.mp4", "https://thumbs4.redgifs.com/ComposedWholeBullfrog.mp4"),
),
)
def test_get_link(test_url: str, expected_url: str):

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import Mock
@ -50,6 +49,7 @@ from bdfr.site_downloaders.imgur import Imgur
("https://imgur.com/a/1qzfWtY/mp4", ("65fbc7ba5c3ed0e3af47c4feef4d3735",)),
("https://imgur.com/a/1qzfWtY/spqr", ("65fbc7ba5c3ed0e3af47c4feef4d3735",)),
("https://i.imgur.com/expO7Rc.gifv", ("e309f98158fc98072eb2ae68f947f421",)),
("https://i.imgur.com/a/aqpiMuL.gif", ("5b2a9a5218bf43dc26ba41389410c981",)),
),
)
def test_find_resources(test_url: str, expected_hashes: list[str]):

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import MagicMock

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re
from unittest.mock import Mock
@ -10,6 +9,13 @@ from bdfr.resource import Resource
from bdfr.site_downloaders.redgifs import Redgifs
@pytest.mark.online
def test_auth_cache():
auth1 = Redgifs._get_auth_token()
auth2 = Redgifs._get_auth_token()
assert auth1 == auth2
@pytest.mark.parametrize(
("test_url", "expected"),
(
@ -19,6 +25,7 @@ from bdfr.site_downloaders.redgifs import Redgifs
("https://thumbs4.redgifs.com/DismalIgnorantDrongo.mp4", "dismalignorantdrongo"),
("https://thumbs4.redgifs.com/DismalIgnorantDrongo-mobile.mp4", "dismalignorantdrongo"),
("https://v3.redgifs.com/watch/newilliteratemeerkat#rel=user%3Atastynova", "newilliteratemeerkat"),
("https://thumbs46.redgifs.com/BabyishCharmingAidi-medium.jpg", "babyishcharmingaidi"),
),
)
def test_get_id(test_url: str, expected: str):
@ -75,6 +82,7 @@ def test_get_link(test_url: str, expected: set[str]):
"44fb28f72ec9a5cca63fa4369ab4f672",
},
),
("https://thumbs46.redgifs.com/BabyishCharmingAidi-medium.jpg", {"bf14b9f3d5b630cb5fd271661226f1af"}),
),
)
def test_download_resource(test_url: str, expected_hashes: set[str]):
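
Both the Gfycat and Redgifs diffs above add a test_auth_cache case asserting that two calls to _get_auth_token return the identical token, i.e. the anonymous API token is fetched once and reused. A minimal sketch of that caching pattern; the endpoint URL and function name are assumptions for illustration only:

from functools import lru_cache

import requests

@lru_cache(maxsize=1)
def get_redgifs_auth_token() -> str:
    response = requests.get("https://api.redgifs.com/v2/auth/temporary", timeout=10)
    response.raise_for_status()
    return response.json()["token"]

# The second call never touches the network and returns the same token,
# which is what the new tests assert.
assert get_redgifs_auth_token() == get_redgifs_auth_token()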

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import praw
import pytest

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import Mock

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import MagicMock

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import MagicMock
@ -16,7 +15,7 @@ from bdfr.site_downloaders.youtube import Youtube
("test_url", "expected_hash"),
(
("https://www.youtube.com/watch?v=uSm2VDgRIUs", "2d60b54582df5b95ec72bb00b580d2ff"),
("https://www.youtube.com/watch?v=GcI7nxQj7HA", "5db0fc92a0a7fb9ac91e63505eea9cf0"),
("https://www.youtube.com/watch?v=NcA_j23HuDU", "26e6ca4849267e600ff474f4260c3b5b"),
),
)
def test_find_resources_good(test_url: str, expected_hash: str):

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from pathlib import Path
from unittest.mock import MagicMock

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
from pathlib import Path

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import MagicMock

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from collections.abc import Iterator
from datetime import datetime, timedelta

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import MagicMock

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
import re
from pathlib import Path

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import platform
import sys
@ -519,3 +518,19 @@ def test_name_submission(
results = test_formatter.format_resource_paths(test_resources, Path())
results = set([r[0].name for r in results])
assert results == expected_names
@pytest.mark.parametrize(
("test_filename", "test_ending", "expected_end"),
(
("A" * 300 + ".", "_1.mp4", "A_1.mp4"),
("A" * 300 + ".", ".mp4", "A.mp4"),
("A" * 300 + ".", "mp4", "A.mp4"),
),
)
def test_shortened_file_name_ending(
test_filename: str, test_ending: str, expected_end: str, test_formatter: FileNameFormatter
):
result = test_formatter.limit_file_name_length(test_filename, test_ending, Path("."))
assert result.name.endswith(expected_end)
assert len(str(result)) <= FileNameFormatter.find_max_path_length()
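
The added test_shortened_file_name_ending cases pin down how limit_file_name_length treats an over-long stem: the stem is cut back so that stem plus ending fits the length limit, trailing dots are dropped, and a bare ending such as "mp4" still comes out with a proper ".mp4" extension. A simplified sketch of that behaviour; the constant and helper are illustrative, not the project's implementation:

from pathlib import Path

MAX_NAME_LENGTH = 255  # assumed per-name limit for illustration

def limit_name(stem: str, ending: str) -> Path:
    if ending and not ending.startswith((".", "_")):
        ending = "." + ending
    stem = stem[: MAX_NAME_LENGTH - len(ending)].rstrip(".")
    return Path(stem + ending)

print(limit_name("A" * 300 + ".", "_1.mp4"))  # ...AAA_1.mp4
print(limit_name("A" * 300 + ".", "mp4"))     # ...AAA.mp4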

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import configparser
from pathlib import Path
@ -34,7 +33,7 @@ def example_config() -> configparser.ConfigParser:
),
)
def test_check_scopes(test_scopes: set[str]):
OAuth2Authenticator._check_scopes(test_scopes)
OAuth2Authenticator._check_scopes(test_scopes, "fetch-scopes test")
@pytest.mark.parametrize(
@ -68,7 +67,7 @@ def test_split_scopes(test_scopes: str, expected: set[str]):
)
def test_check_scopes_bad(test_scopes: set[str]):
with pytest.raises(BulkDownloaderException):
OAuth2Authenticator._check_scopes(test_scopes)
OAuth2Authenticator._check_scopes(test_scopes, "fetch-scopes test")
def test_token_manager_read(example_config: configparser.ConfigParser):
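
The _check_scopes calls above now take a second argument (here the string "fetch-scopes test"), presumably so the scope lookup identifies itself when it queries Reddit for the list of valid scopes. A rough sketch of such a check against Reddit's public scopes endpoint; the body is an assumption for illustration, not the project's code:

import requests

def check_scopes(requested: set[str], user_agent: str) -> None:
    response = requests.get(
        "https://www.reddit.com/api/v1/scopes.json",
        headers={"User-Agent": user_agent},
        timeout=10,
    )
    response.raise_for_status()
    known = set(response.json().keys()) | {"*"}
    unknown = requested - known
    if unknown:
        raise ValueError(f"Unknown scopes requested: {unknown}")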

View File

@ -1,5 +1,4 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unittest.mock import MagicMock