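A requests.get pitfall log

This post records a pitfall I ran into while using the requests module. The URL parameters passed to requests.get contained nested dicts, and on top of that the plus signs and spaces in the URL had encoding problems, so the request never returned the correct result.

Debugging process

First, look at the full URL, parameters included, that the browser sends. After encoding it looks like this: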

https://acs.youku.com/h5/mtop.youku.live.com.liveplaycontrol/3.0/?jsv=2.5.0&appKey=23536927&t=1588412724458&sign=714708c873e7227c0137626b1f2266d4&api=mtop.youku.live.com.liveplaycontrol&v=3.0&H5Request=true&AntiFlood=true&type=jsonp&dataType=jsonp&callback=mtopjsonp4&data=%7B%22encryptRClient%22%3A%22%22%2C%22liveId%22%3A8020520%2C%22sceneId%22%3A0%2C%22reqQuality%22%3A0%2C%22ad%22%3A%22%7B%5C%22site%5C%22%3A%5C%22youku%5C%22%2C%5C%22aw%5C%22%3A%5C%22w%5C%22%2C%5C%22p%5C%22%3A1%2C%5C%22vs%5C%22%3A%5C%221.0%5C%22%2C%5C%22vc%5C%22%3A0%2C%5C%22bt%5C%22%3A%5C%22pc%5C%22%2C%5C%22rst%5C%22%3A%5C%22mp4%5C%22%2C%5C%22dq%5C%22%3A0%2C%5C%22isvert%5C%22%3A0%2C%5C%22wintype%5C%22%3A%5C%22h5%5C%22%2C%5C%22bf%5C%22%3A0%2C%5C%22utdid%5C%22%3A%5C%22sBMzF7Ec6DgCAXWYW%2FkNTg23%5C%22%2C%5C%22fu%5C%22%3A0%2C%5C%22os%5C%22%3A%5C%22win%5C%22%2C%5C%22dvh%5C%22%3A1058%2C%5C%22dvw%5C%22%3A426%2C%5C%22ccode%5C%22%3A%5C%22live05010101%5C%22%2C%5C%22lid%5C%22%3A8020520%7D%22%2C%22cna%22%3A%22sBMzF7Ec6DgCAXWYW%2FkNTg23%22%2C%22playAbilities%22%3A%22%7B%5C%22decode_resolution_FPS%5C%22%3A%5C%221080p_50%5C%22%2C%5C%22abrPlay%5C%22%3A1%2C%5C%22vrPlay%5C%22%3A1%7D%22%2C%22keyIndex%22%3A%22web01%22%2C%22ccode%22%3A%22live05010101%22%2C%22app%22%3A%22Pc%22%2C%22refer%22%3A%22%22%2C%22ckey%22%3A%22123%23Ln6DbneQVYqb96bxlDKS8ldEzQDDO1io8aHFMKh0ul%2FHzaHKDdFPdTmaL2xadyAf53JtJEy9iiS3PFRL7DyybdveChmwWZrIeVBO9pzW4j2ao4qMbHG80gjhAWMAO0DXbB1KwAh%2F82QU2HedcVxTG%2F6AWubc9oKTCQO%2BYvilBP6uyyOhy5O9s%2ByObHsXTG4xTuEGE3e3ufqPbGVK5TdqWC%2Fue%2BhcXk7etb8YjmAqDH5cMME7%2BmknkkHL7DWX9DlHGErQco4mAMOwwbpmpgpz6G4ITyATSyXJdMPJh37AmUwUX1Uyu6veiJLAKm0I90LAlLWy0wcualLwqioFLq%2FwudfpUAFj3cLbCXHBnUYjNeMWgPhRGBW1unyzko7tJgVNYhKmdaIxN%2BsTML8EOPZ22mEOTZcINkYXcvJo40%2FA9lKvfRk8dLFoLmu4tpM2zMAl9XPmxqb6ArATBklulYk0iW12c%2F5YTqk92YCgnqZjcb89%2FHrhM%2BahFPKXpx08IBiOLqRlIvu72v8pEpK6Xc4S96NzioYELSVsrkOpI5z0IWk9cY%2FTim4nivrTNNpsdj2yW1I69%2BxXjbr2guOEDQTnIrbgmx0QX6XV4sb4O3GV1Z%2F0YO8MgaPMOxTEAJvY2PAJaZi%2BHJPYT32geeuH8DhYX006PnUJ80jNW4%2FxeXHxsESvwLM17V8QSbCDrQ%3D%3D%22%7D

Before encoding, the URL looks like this:

https://acs.youku.com/h5/mtop.youku.live.com.liveplaycontrol/3.0/?jsv=2.5.0&appKey=23536927&t=1588412724458&sign=714708c873e7227c0137626b1f2266d4&api=mtop.youku.live.com.liveplaycontrol&v=3.0&H5Request=true&AntiFlood=true&type=jsonp&dataType=jsonp&callback=mtopjsonp4&data={"encryptRClient":"","liveId":8020520,"sceneId":0,"reqQuality":0,"ad":"{\"site\":\"youku\",\"aw\":\"w\",\"p\":1,\"vs\":\"1.0\",\"vc\":0,\"bt\":\"pc\",\"rst\":\"mp4\",\"dq\":0,\"isvert\":0,\"wintype\":\"h5\",\"bf\":0,\"utdid\":\"sBMzF7Ec6DgCAXWYW/kNTg23\",\"fu\":0,\"os\":\"win\",\"dvh\":1058,\"dvw\":426,\"ccode\":\"live05010101\",\"lid\":8020520}","cna":"sBMzF7Ec6DgCAXWYW/kNTg23","playAbilities":"{\"decode_resolution_FPS\":\"1080p_50\",\"abrPlay\":1,\"vrPlay\":1}","keyIndex":"web01","ccode":"live05010101","app":"Pc","refer":"","ckey":"123#Ln6DbneQVYqb96bxlDKS8ldEzQDDO1io8aHFMKh0ul/HzaHKDdFPdTmaL2xadyAf53JtJEy9iiS3PFRL7DyybdveChmwWZrIeVBO9pzW4j2ao4qMbHG80gjhAWMAO0DXbB1KwAh/82QU2HedcVxTG/6AWubc9oKTCQO+YvilBP6uyyOhy5O9s+yObHsXTG4xTuEGE3e3ufqPbGVK5TdqWC/ue+hcXk7etb8YjmAqDH5cMME7+mknkkHL7DWX9DlHGErQco4mAMOwwbpmpgpz6G4ITyATSyXJdMPJh37AmUwUX1Uyu6veiJLAKm0I90LAlLWy0wcualLwqioFLq/wudfpUAFj3cLbCXHBnUYjNeMWgPhRGBW1unyzko7tJgVNYhKmdaIxN+sTML8EOPZ22mEOTZcINkYXcvJo40/A9lKvfRk8dLFoLmu4tpM2zMAl9XPmxqb6ArATBklulYk0iW12c/5YTqk92YCgnqZjcb89/HrhM+ahFPKXpx08IBiOLqRlIvu72v8pEpK6Xc4S96NzioYELSVsrkOpI5z0IWk9cY/Tim4nivrTNNpsdj2yW1I69+xXjbr2guOEDQTnIrbgmx0QX6XV4sb4O3GV1Z/0YO8MgaPMOxTEAJvY2PAJaZi+HJPYT32geeuH8DhYX006PnUJ80jNW4/xeXHxsESvwLM17V8QSbCDrQ=="}
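To see how the encoded and decoded forms relate, the query string can be taken apart with the standard library. The sketch below uses a shortened, hypothetical URL (example.com and the trimmed data value are assumptions for illustration, not the real request). It also shows that the ad field inside data is itself a JSON string, not a nested object.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical, shortened URL with the same shape as the real one.
encoded_url = (
    "https://example.com/api/?v=3.0"
    "&data=%7B%22liveId%22%3A8020520%2C%22ad%22%3A%22%7B%5C%22site"
    "%5C%22%3A%5C%22youku%5C%22%7D%22%7D"
)

# parse_qs percent-decodes every value and returns a dict of lists.
query = parse_qs(urlsplit(encoded_url).query)

data = json.loads(query["data"][0])  # outer JSON object
ad = json.loads(data["ad"])          # "ad" is a JSON string inside the JSON

print(data)
print(ad)
```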

It carries quite a few parameters:

[Figure: the params argument of requests.get]

Splitting this URL apart:

The request address:

url = 'https://acs.youku.com/h5/mtop.youku.live.com.liveplaycontrol/3.0/'

The parameters: I wrote them directly as a Python dict. Note that the data parameter is a nested dict, which in turn contains two more dicts, ad and playAbilities:

params = {
    'jsv': '2.5.0',
    'appKey': 23536927,
    't': 1588412724458,
    'sign': '714708c873e7227c0137626b1f2266d4',
    'api': 'mtop.youku.live.com.liveplaycontrol',
    'v': '3.0',
    'H5Request': 'true',
    'AntiFlood': 'true',
    'type': 'jsonp',
    'dataType': 'jsonp',
    'callback': 'mtopjsonp4',
    'data': {
        "encryptRClient":"",
        "liveId":8020520,
        "sceneId":0,
        "reqQuality":0,
        "ad": {
            "site": "youku",
            "aw": "w",
            "p": 1,
            "vs": "1.0",
            "vc": 0,
            "bt": "pc",
            "rst": "mp4",
            "dq": 0,
            "isvert": 0,
            "wintype": "h5",
            "bf": 0,
            "utdid": "sBMzF7Ec6DgCAXWYW/kNTg23",
            "fu": 0,
            "os": "win",
            "dvh": 1058,
            "dvw": 426,
            "ccode": "live05010101",
            "lid": 8020520
        },
        "cna": "sBMzF7Ec6DgCAXWYW/kNTg23",
        'playAbilities': {
            'decode_resolution_FPS':'1080p_50',
            'abrPlay':1,
            'vrPlay':1
            },
        "keyIndex": "web01",
        "ccode": "live05010101",
        "app": "Pc",
        "refer": "",
        "ckey": "123#Ln6DbneQVYqb96bxlDKS8ldEzQDDO1io8aHFMKh0ul/HzaHKDdFPdTmaL2xadyAf53JtJEy9iiS3PFRL7DyybdveChmwWZrIeVBO9pzW4j2ao4qMbHG80gjhAWMAO0DXbB1KwAh/82QU2HedcVxTG/6AWubc9oKTCQO YvilBP6uyyOhy5O9s yObHsXTG4xTuEGE3e3ufqPbGVK5TdqWC/ue hcXk7etb8YjmAqDH5cMME7 mknkkHL7DWX9DlHGErQco4mAMOwwbpmpgpz6G4ITyATSyXJdMPJh37AmUwUX1Uyu6veiJLAKm0I90LAlLWy0wcualLwqioFLq/wudfpUAFj3cLbCXHBnUYjNeMWgPhRGBW1unyzko7tJgVNYhKmdaIxN sTML8EOPZ22mEOTZcINkYXcvJo40/A9lKvfRk8dLFoLmu4tpM2zMAl9XPmxqb6ArATBklulYk0iW12c/5YTqk92YCgnqZjcb89/HrhM ahFPKXpx08IBiOLqRlIvu72v8pEpK6Xc4S96NzioYELSVsrkOpI5z0IWk9cY/Tim4nivrTNNpsdj2yW1I69 xXjbr2guOEDQTnIrbgmx0QX6XV4sb4O3GV1Z/0YO8MgaPMOxTEAJvY2PAJaZi HJPYT32geeuH8DhYX006PnUJ80jNW4/xeXHxsESvwLM17V8QSbCDrQ=="
    }
}

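Now the pitfalls begin.

Pitfall 1: nested dicts

Calling requests.get(url=url, params=params) directly at this point does not work. params is a multi-level dict, so the nested data value is flattened to its keys only, and the request goes to a wrong URL like this: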

https://acs.youku.com/h5/mtop.youku.live.com.liveplaycontrol/3.0/?jsv=2.5.0&appKey=23536927&t=1588412724458&sign=714708c873e7227c0137626b1f2266d4&api=mtop.youku.live.com.liveplaycontrol&v=3.0&H5Request=true&AntiFlood=true&type=jsonp&dataType=jsonp&callback=mtopjsonp4&data=encryptRClient&data=liveId&data=sceneId&data=reqQuality&data=ad&data=cna&data=playAbilities&data=keyIndex&data=ccode&data=app&data=refer&data=ckey
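The same flattening can be reproduced offline with the standard library. This is a minimal sketch, assuming that requests treats a dict value the way urlencode(..., doseq=True) treats any non-string iterable; the wrong URL above is consistent with that, but the sketch does not call requests itself. The params dict here is a shortened, hypothetical stand-in.

```python
from urllib.parse import urlencode

# Shortened stand-in for the real params dict.
params = {"v": "3.0", "data": {"liveId": 8020520, "ad": {"site": "youku"}}}

# doseq=True treats a non-string iterable value as a sequence of values.
# Iterating over a dict yields its keys, so the nested values are lost.
flattened = urlencode(params, doseq=True)
print(flattened)  # v=3.0&data=liveId&data=ad
```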

Let's first try serializing every nested dict, data included, with json.dumps(), like this:

'playAbilities': json.dumps({
    'decode_resolution_FPS': '1080p_50',
    'abrPlay': 1,
    'vrPlay': 1
})

The URL requested after encoding with json.dumps():

https://acs.youku.com/h5/mtop.youku.live.com.liveplaycontrol/3.0/?jsv=2.5.0&appKey=23536927&t=1588412724458&sign=714708c873e7227c0137626b1f2266d4&api=mtop.youku.live.com.liveplaycontrol&v=3.0&H5Request=true&AntiFlood=true&type=jsonp&dataType=jsonp&callback=mtopjsonp4&data={"encryptRClient":+"",+"liveId":+8020520,+"sceneId":+0,+"reqQuality":+0,+"ad":+"{\"site\":+\"youku\",+\"aw\":+\"w\",+\"p\":+1,+\"vs\":+\"1.0\",+\"vc\":+0,+\"bt\":+\"pc\",+\"rst\":+\"mp4\",+\"dq\":+0,+\"isvert\":+0,+\"wintype\":+\"h5\",+\"bf\":+0,+\"utdid\":+\"sBMzF7Ec6DgCAXWYW/kNTg23\",+\"fu\":+0,+\"os\":+\"win\",+\"dvh\":+1058,+\"dvw\":+426,+\"ccode\":+\"live05010101\",+\"lid\":+8020520}",+"cna":+"sBMzF7Ec6DgCAXWYW/kNTg23",+"playAbilities":+"{\"decode_resolution_FPS\":+\"1080p_50\",+\"abrPlay\":+1,+\"vrPlay\":+1}",+"keyIndex":+"web01",+"ccode":+"live05010101",+"app":+"Pc",+"refer":+"",+"ckey":+"123#Ln6DbneQVYqb96bxlDKS8ldEzQDDO1io8aHFMKh0ul/HzaHKDdFPdTmaL2xadyAf53JtJEy9iiS3PFRL7DyybdveChmwWZrIeVBO9pzW4j2ao4qMbHG80gjhAWMAO0DXbB1KwAh/82QU2HedcVxTG/6AWubc9oKTCQO+YvilBP6uyyOhy5O9s+yObHsXTG4xTuEGE3e3ufqPbGVK5TdqWC/ue+hcXk7etb8YjmAqDH5cMME7+mknkkHL7DWX9DlHGErQco4mAMOwwbpmpgpz6G4ITyATSyXJdMPJh37AmUwUX1Uyu6veiJLAKm0I90LAlLWy0wcualLwqioFLq/wudfpUAFj3cLbCXHBnUYjNeMWgPhRGBW1unyzko7tJgVNYhKmdaIxN+sTML8EOPZ22mEOTZcINkYXcvJo40/A9lKvfRk8dLFoLmu4tpM2zMAl9XPmxqb6ArATBklulYk0iW12c/5YTqk92YCgnqZjcb89/HrhM+ahFPKXpx08IBiOLqRlIvu72v8pEpK6Xc4S96NzioYELSVsrkOpI5z0IWk9cY/Tim4nivrTNNpsdj2yW1I69+xXjbr2guOEDQTnIrbgmx0QX6XV4sb4O3GV1Z/0YO8MgaPMOxTEAJvY2PAJaZi+HJPYT32geeuH8DhYX006PnUJ80jNW4/xeXHxsESvwLM17V8QSbCDrQ=="}

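Pitfall 2: spaces from json.dumps()

Compared with the URL the browser originally requested, this one is still different. Judging from the differences in the figure, json.dumps() by default puts a space after the colon in "key: value" and after each comma, and those spaces get encoded as plus signs "+". Passing separators=(',', ':') to json.dumps() removes the spaces.

[Figure: spaces left by json.dumps()]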
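A small standard-library sketch of the difference, using a shortened, hypothetical payload. quote_plus stands in here for the form-style encoding applied to query values.

```python
import json
from urllib.parse import quote_plus

payload = {"liveId": 8020520, "sceneId": 0}

# Default separators are (', ', ': '), so spaces end up in the output.
default_json = json.dumps(payload)
# Compact separators produce the same JSON without any spaces.
compact_json = json.dumps(payload, separators=(",", ":"))

encoded_default = quote_plus(default_json)
encoded_compact = quote_plus(compact_json)

print(encoded_default)  # contains '+' where the spaces were
print(encoded_compact)  # %7B%22liveId%22%3A8020520%2C%22sceneId%22%3A0%7D
```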

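Pitfall 3: encoding of spaces

With the spaces after the JSON colons and commas removed, they no longer affect the encoding. But the ckey value I had copied still contains spaces. When simulating the request with Python's requests.get, each space in ckey is encoded as a plus sign "+", whereas the browser's URL has "%2B" at those positions, which is the encoding of a literal "+". The unencoded URL above also shows "+" there, so the spaces are most likely plus signs that were decoded into spaces when the parameters were copied out. Either way, the requested URL is still wrong. The fix is to use replace to turn the spaces in ckey back into plus signs "+".

[Figure: space encoding in a URL]

Finally, comparing the URLs with the complete Python code shows that they are identical at last: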
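The space/plus asymmetry can be checked in isolation with urllib.parse. This is only an illustration of the encoding rule on short sample strings, not of the real ckey value.

```python
from urllib.parse import quote_plus, unquote_plus

# Encoding: a space becomes '+', a literal '+' becomes '%2B'.
print(quote_plus("a b"))  # a+b
print(quote_plus("a+b"))  # a%2Bb

# Decoding: '%2B' gives back '+', but a bare '+' is read as a space.
print(unquote_plus("a%2Bb"))  # a+b
print(unquote_plus("a+b"))    # a b

# Restoring the plus signs before encoding yields the browser's form.
restored = "a b".replace(" ", "+")
print(quote_plus(restored))   # a%2Bb
```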

import requests
import json


def compare_urls():
    url_from_browser = "https://acs.youku.com/h5/mtop.youku.live.com.liveplaycontrol/3.0/?jsv=2.5.0&appKey=23536927&t=1588412724458&sign=714708c873e7227c0137626b1f2266d4&api=mtop.youku.live.com.liveplaycontrol&v=3.0&H5Request=true&AntiFlood=true&type=jsonp&dataType=jsonp&callback=mtopjsonp4&data=%7B%22encryptRClient%22%3A%22%22%2C%22liveId%22%3A8020520%2C%22sceneId%22%3A0%2C%22reqQuality%22%3A0%2C%22ad%22%3A%22%7B%5C%22site%5C%22%3A%5C%22youku%5C%22%2C%5C%22aw%5C%22%3A%5C%22w%5C%22%2C%5C%22p%5C%22%3A1%2C%5C%22vs%5C%22%3A%5C%221.0%5C%22%2C%5C%22vc%5C%22%3A0%2C%5C%22bt%5C%22%3A%5C%22pc%5C%22%2C%5C%22rst%5C%22%3A%5C%22mp4%5C%22%2C%5C%22dq%5C%22%3A0%2C%5C%22isvert%5C%22%3A0%2C%5C%22wintype%5C%22%3A%5C%22h5%5C%22%2C%5C%22bf%5C%22%3A0%2C%5C%22utdid%5C%22%3A%5C%22sBMzF7Ec6DgCAXWYW%2FkNTg23%5C%22%2C%5C%22fu%5C%22%3A0%2C%5C%22os%5C%22%3A%5C%22win%5C%22%2C%5C%22dvh%5C%22%3A1058%2C%5C%22dvw%5C%22%3A426%2C%5C%22ccode%5C%22%3A%5C%22live05010101%5C%22%2C%5C%22lid%5C%22%3A8020520%7D%22%2C%22cna%22%3A%22sBMzF7Ec6DgCAXWYW%2FkNTg23%22%2C%22playAbilities%22%3A%22%7B%5C%22decode_resolution_FPS%5C%22%3A%5C%221080p_50%5C%22%2C%5C%22abrPlay%5C%22%3A1%2C%5C%22vrPlay%5C%22%3A1%7D%22%2C%22keyIndex%22%3A%22web01%22%2C%22ccode%22%3A%22live05010101%22%2C%22app%22%3A%22Pc%22%2C%22refer%22%3A%22%22%2C%22ckey%22%3A%22123%23Ln6DbneQVYqb96bxlDKS8ldEzQDDO1io8aHFMKh0ul%2FHzaHKDdFPdTmaL2xadyAf53JtJEy9iiS3PFRL7DyybdveChmwWZrIeVBO9pzW4j2ao4qMbHG80gjhAWMAO0DXbB1KwAh%2F82QU2HedcVxTG%2F6AWubc9oKTCQO%2BYvilBP6uyyOhy5O9s%2ByObHsXTG4xTuEGE3e3ufqPbGVK5TdqWC%2Fue%2BhcXk7etb8YjmAqDH5cMME7%2BmknkkHL7DWX9DlHGErQco4mAMOwwbpmpgpz6G4ITyATSyXJdMPJh37AmUwUX1Uyu6veiJLAKm0I90LAlLWy0wcualLwqioFLq%2FwudfpUAFj3cLbCXHBnUYjNeMWgPhRGBW1unyzko7tJgVNYhKmdaIxN%2BsTML8EOPZ22mEOTZcINkYXcvJo40%2FA9lKvfRk8dLFoLmu4tpM2zMAl9XPmxqb6ArATBklulYk0iW12c%2F5YTqk92YCgnqZjcb89%2FHrhM%2BahFPKXpx08IBiOLqRlIvu72v8pEpK6Xc4S96NzioYELSVsrkOpI5z0IWk9cY%2FTim4nivrTNNpsdj2yW1I69%2BxXjbr2guOEDQTnIrbgmx0QX6XV4sb4O3GV1Z%2F0YO8MgaPMOxTEAJvY2PAJaZi%2BHJPYT32geeuH8DhYX006PnUJ80jNW4%2FxeXHxsESvwLM17V8QSbCDrQ%3D%3D%22%7D"

    url_from_requests = "https://acs.youku.com/h5/mtop.youku.live.com.liveplaycontrol/3.0/"      

    params = {
        'jsv': '2.5.0',
        'appKey': '23536927',
        't': '1588412724458',
        'sign': '714708c873e7227c0137626b1f2266d4',
        'api': 'mtop.youku.live.com.liveplaycontrol',
        'v': '3.0',
        'H5Request': 'true',
        'AntiFlood': 'true',
        'type': 'jsonp',
        'dataType': 'jsonp',
        'callback': 'mtopjsonp4',
        'data': json.dumps({
            "encryptRClient":"",
            "liveId":8020520,
            "sceneId":0,
            "reqQuality":0,
            "ad": json.dumps({
                "site": "youku",
                "aw": "w",
                "p": 1,
                "vs": "1.0",
                "vc": 0,
                "bt": "pc",
                "rst": "mp4",
                "dq": 0,
                "isvert": 0,
                "wintype": "h5",
                "bf": 0,
                "utdid": "sBMzF7Ec6DgCAXWYW/kNTg23",
                "fu": 0,
                "os": "win",
                "dvh": 1058,
                "dvw": 426,
                "ccode": "live05010101",
                "lid": 8020520
            }, separators=(',', ':')),
            "cna": "sBMzF7Ec6DgCAXWYW/kNTg23",
            'playAbilities': json.dumps({
                'decode_resolution_FPS':'1080p_50',
                'abrPlay':1,
                'vrPlay':1
                }, separators=(',', ':')),
            "keyIndex": "web01",
            "ccode": "live05010101",
            "app": "Pc",
            "refer": "",
            "ckey": '123#Ln6DbneQVYqb96bxlDKS8ldEzQDDO1io8aHFMKh0ul/HzaHKDdFPdTmaL2xadyAf53JtJEy9iiS3PFRL7DyybdveChmwWZrIeVBO9pzW4j2ao4qMbHG80gjhAWMAO0DXbB1KwAh/82QU2HedcVxTG/6AWubc9oKTCQO YvilBP6uyyOhy5O9s yObHsXTG4xTuEGE3e3ufqPbGVK5TdqWC/ue hcXk7etb8YjmAqDH5cMME7 mknkkHL7DWX9DlHGErQco4mAMOwwbpmpgpz6G4ITyATSyXJdMPJh37AmUwUX1Uyu6veiJLAKm0I90LAlLWy0wcualLwqioFLq/wudfpUAFj3cLbCXHBnUYjNeMWgPhRGBW1unyzko7tJgVNYhKmdaIxN sTML8EOPZ22mEOTZcINkYXcvJo40/A9lKvfRk8dLFoLmu4tpM2zMAl9XPmxqb6ArATBklulYk0iW12c/5YTqk92YCgnqZjcb89/HrhM ahFPKXpx08IBiOLqRlIvu72v8pEpK6Xc4S96NzioYELSVsrkOpI5z0IWk9cY/Tim4nivrTNNpsdj2yW1I69 xXjbr2guOEDQTnIrbgmx0QX6XV4sb4O3GV1Z/0YO8MgaPMOxTEAJvY2PAJaZi HJPYT32geeuH8DhYX006PnUJ80jNW4/xeXHxsESvwLM17V8QSbCDrQ=='.replace(' ', '+')
        }, separators=(',', ':'))
    }
    
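    # Request the URL copied from the browser as-is.
    response1 = requests.get(url=url_from_browser)

    # Let requests build the query string from params.
    response2 = requests.get(url=url_from_requests, params=params)

    # response.url is the final URL of each response.
    if response1.url == response2.url:
        print('Well Done!')
    else:
        print('Nooooo!')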
    

compare_urls()
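The three fixes can also be combined and checked without sending any request. The sketch below is a standard-library approximation, not the code above: build_url and compact are hypothetical helpers, example.com and the short ckey are placeholders, and it assumes that urlencode's default quote_plus encoding matches what the browser produced for this query.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def compact(obj):
    """Serialize to JSON with no spaces after ',' and ':'."""
    return json.dumps(obj, separators=(",", ":"))


def build_url(base, params, data):
    """Hypothetical helper: append params plus a compact-JSON 'data' field."""
    query = dict(params, data=compact(data))
    # urlencode uses quote_plus by default: ' ' -> '+', '+' -> '%2B'.
    return base + "?" + urlencode(query)


data = {
    "liveId": 8020520,
    "ad": compact({"site": "youku"}),   # inner dict sent as a JSON string
    "ckey": "ab cd".replace(" ", "+"),  # restore the '+' that showed up as a space
}
url = build_url("https://example.com/api/", {"v": "3.0"}, data)
print(url)

# Round trip: decoding the query gives back the intended values.
decoded = json.loads(parse_qs(urlsplit(url).query)["data"][0])
print(decoded["ckey"])  # ab+cd
```

Building the URL as a string first makes it possible to diff it against the browser's URL before any request goes out.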


» Link: https://wbt5.com/requests-get.html » Please credit the source when reposting.
