Sean Wang's territory

All about Testing, Python gadgets and personal notes

A Python Script for Automatically Logging In to a Website

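Finding a free VPN is not easy. This one drops the connection after a while, but it is more than enough for connecting to the US servers to update Diablo 3. The catch is that the VPN requires a login on its website once every 3 days, which is a hassle.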

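Approach:

  • A long time ago I read some articles on scraping websites with Python (the ones by Observer, the author of simplecd.org, are excellent), and they covered automatic login. So I simply borrowed that technique.
  • These days a login page without a captcha would be unthinkable, so that is another problem to solve. Fortunately, this site's captcha is very plain: no distortion, no warping, no added noise, and nothing like the brutal captchas Google uses. It is just ordinary English letters and digits.
    • So the obvious choice is Google's open-source OCR tool tesseract-ocr, which I came across earlier while looking into Sikuli.
    • As for hooking it up to Python, a quick Google search turned up pytesser, a simple wrapper around tesseract that does what I need.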
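
Because the captcha is plain letters and digits, the raw OCR text can be tidied and sanity-checked before it is submitted. Below is a minimal sketch of that idea, written in Python 3 (the script in this post is Python 2). The helper names and the default length of 4 are assumptions made for illustration; the post does not say how long the site's codes are.

```python
import re


def clean_captcha_text(raw):
    """Lowercase the OCR output and remove all whitespace, newlines included."""
    return ''.join(raw.lower().split())


def looks_like_captcha(code, length=4):
    """Return True if `code` is exactly `length` ASCII letters or digits.

    `length` is a hypothetical value; adjust it to the real captcha.
    """
    return re.fullmatch(r'[a-z0-9]{%d}' % length, code) is not None
```

If `looks_like_captcha` returns False, the OCR result is almost certainly wrong, so it may be cheaper to fetch a fresh captcha image than to submit a code that is bound to fail.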

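Getting started:

  • Install the latest Windows build of tesseract-ocr (on other operating systems, install the corresponding package).
  • Download pytesser, extract pytesser.py, util.py, and errors.py, and put them directly in the script's folder.
  • Then write the script itself, as follows: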
#!/usr/bin/env python
#-*- coding: UTF-8 -*-
# filename: AutoLogin.py

from __future__ import unicode_literals
import urllib2
import cookielib
import urllib
import Image
from cStringIO import StringIO
import re
from pytesser import image_to_string

LOGIN_URL = 'http://*.*.*.*/lr.sm' # Site address hidden: if they notice, they will probably toughen the captcha and this gets much harder -_-||
IMAGE_URL = 'http://*.*.*.*/image'
USER = 'yourusername'
PWD = 'yourpassword'

### Install a cookie-aware opener first, so that the captcha request and the
### login POST share one session (in case the site ties the captcha to it) ###
cookie_support = urllib2.HTTPCookieProcessor(cookielib.CookieJar())
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)

### OCR using pytesser ###
img_file = urllib2.urlopen(IMAGE_URL)
img = StringIO(img_file.read())
checkImg = Image.open(img)
ocr_str = image_to_string(checkImg).lower()
# Drop all whitespace (including the trailing newline) from the OCR output
CODE = ''.join(ocr_str.split())

postdata = urllib.urlencode({
    'user.nick': USER,
    'password': PWD,
    'validationCode': CODE,
})

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1',
    'Referer': LOGIN_URL
}

req = urllib2.Request(url=LOGIN_URL, data=postdata, headers=headers)

result = urllib2.urlopen(req).read()
decoded_result = result.decode('utf-8')
# '**' stands in for the site name, which is hidden here; '欢迎您' means "welcome"
if re.search('{} **欢迎您'.format(USER), decoded_result):
    print 'Logged in successfully!'
else:
    # Save the raw response bytes so the failure can be inspected
    with open('result.html', 'wb') as f:
        f.write(result)
    print 'Login failed, check result.html for details'

Logging in should be all that is needed, so I did nothing further with the cookies. I will look into cookielib when I have time.
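
As a starting point for that cookielib follow-up, here is a minimal sketch of keeping the cookie jar in hand so the session cookies can be inspected after login. It is written in Python 3, where `cookielib` is named `http.cookiejar` and `urllib2` became `urllib.request`. The function names `build_session` and `build_login_request` are hypothetical; the form field names are the ones the script above posts.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_session():
    """Return an opener plus the cookie jar it stores cookies in."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, user, pwd, code):
    """Build the login POST with the same form fields as the script above."""
    postdata = urllib.parse.urlencode({
        'user.nick': user,
        'password': pwd,
        'validationCode': code,
    }).encode('ascii')  # urlencode output is percent-encoded, so ASCII is safe
    # Supplying `data` makes urllib send the request as a POST
    return urllib.request.Request(
        login_url, data=postdata, headers={'Referer': login_url})
```

Used against a real site, every request would go through `opener.open(...)` rather than a module-level `urlopen`, for example `opener.open(build_login_request(LOGIN_URL, USER, PWD, CODE))`. Afterwards, iterating over `jar` yields whatever cookies the server set, each with `name` and `value` attributes.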

Posted by seganw in Coding, Python on July 13, 2012
