-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Handle substantional Instagram requirements #2612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle substantional Instagram requirements #2612
Conversation
78610e5 to
4c61303
Compare
Provide handling for Instagram feedback_required/checkpoint_required/challenge_required responses.
4c61303 to
ad4f478
Compare
aandergr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. However the implementation itself would introduce some issues which I mentioned in inline comments here.
Given that there already is a _response_error() to build an error message, what would you think about the following way to handle the 400 status code instead?
if resp.status_code == 400:
with suppress(json.decoder.JSONDecodeError):
if resp.json().get("message") in [
"feedback_required",
"checkpoint_required",
"challenge_required",
]:
# Raise AbortDownloadException in case of substantial Instagram
# requirements to stop producing more requests
raise AbortDownloadException(self._response_error(resp))
raise QueryReturnedBadRequestException(self._response_error(resp))
instaloader/instaloadercontext.py
Outdated
| try: | ||
| resp_json = resp.json() | ||
| except ValueError: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this try-except is written will just leave resp_json uninitialized when resp.json() fails, continuing with accesses to an uninitialized variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, you are right, there should have been else
instaloader/instaloadercontext.py
Outdated
| if 'application/json' in content_type or not content_type: | ||
| try: | ||
| resp_json = resp.json() | ||
| except ValueError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resp.json() would actually raise json.decoder.JSONDecodeError. Though it is a ValueError, catching JSONDecodeError would make that part of the code more explanatory regarding what exactly is the error we're looking out for.
instaloader/instaloadercontext.py
Outdated
| content_type = resp.headers.get('Content-Type') | ||
| if 'application/json' in content_type or not content_type: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no added value in explicitly checking the content type. I suggest to directly call resp.json() and handle the case that it fails with a JSONDecodeError.
instaloader/instaloadercontext.py
Outdated
| resp_json = resp.json() | ||
| except ValueError: | ||
| pass | ||
| if 'feedback_required' in resp_json.get('message'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message is "feedback_required"/"checkpoint_required"/"challenge_required", not a superstring of it, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm actually not sure about that, so I would better stick to a superstring version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just searched the issues to check on that (the message string is printed to the log), we have
- feedback_required: NOT ABLE TO DOWNLOAD BULK (6000+) REELS/PHOTOS AT ONCE #2616
- checkpoint_required: Profile.get_followers() doesn't seem to care about RateControl #2512
- challenge_required: Profile.get_followers() doesn't seem to care about RateControl #2512
So actually no superstrings, just the plain "feedback_required"/"checkpoint_required"/"challenge_required" strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, then your version should be totally fine. Sorry for misleading!
|
@aandergr Thank you for reviewing! I like your version, except for the assumption that "feedback_required"/"checkpoint_required"/"challenge_required" is the whole message content (I'm not sure about that, and even if it's true now, it may change in the future) |
|
Would you like me to change the PR accordingly, or would you prefer committing by yourself? |
Thanks! I created a commit with the proposed change. |
When Instagram returns "feedback_required", "checkpoint_required" or "challenge_required", any further requests will be declined until the required action is performed manually. Those messages mean Instagram suspects us of being a bot, so we must stop producing more requests immediately in order not to increase the suspicions.
This PR makes get_json (the status_code 400 block) raise AbortDownloadException with a detailed explanation for Instagram "feedback_required"/"checkpoint_required"/"challenge_required" responses.
I've updated get_json's docstring, but the documentation may need to be rerendered.
Generally, the PR's (almost?) ready to be merged.
I would appreciate a review and some help with docs (if there's something else to be done). You should be able to make edits directly to my branch.