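Generating bilingual subtitle files with Python

Once September is over, the US TV season kicks into high gear, and waiting for RMVB releases with subtitles (I admit my standards for picture quality have slipped...) can no longer keep up with the schedule. So I'm back to my weekly routine of downloading AVIs from eztv and then searching for subtitles on Shooter (射手).

Subtitles are something you can't force. Some fansub groups release a single file with Chinese and English merged; others ship Chinese and English as two separate files. For someone like me who likes bilingual subtitles, that means one extra step. I used to do it by hand; now a script does it for me.

Goal: merge two subtitle files in two languages (for example a .cn.srt file and a .en.srt file) into a single bilingual subtitle file (for example .combined.srt).

Approach: as you probably know, if you append the contents of either the Chinese or the English subtitle file to the other one and save it, the merged srt file is a bilingual subtitle file.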

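About the script: it is very simple. It reads one file and appends it to the other. I couldn't be bothered with anything fancier, so the code below works under these conditions:

Save the code as a .py file and put it in the directory containing the srt files you want to merge
That directory contains exactly two .srt files
The file whose name ends with .combined.srt is the one we want

Script source: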

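# coding=utf-8
from __future__ import unicode_literals
import os

# collect the .srt files in the current working directory
srtFiles = [i for i in os.listdir(os.getcwd()) if i.endswith('.srt')]
srtFiles.sort()
assert len(srtFiles) == 2, 'expected exactly 2 .srt files'
# append the contents of the first file to the end of the second one
with open(srtFiles[0], 'r') as f1:
    with open(srtFiles[1], 'a') as f2:
        # blank line, in case the second file lacks a trailing newline
        f2.write('\n')
        f2.writelines(f1.readlines())
# x.en.srt becomes x.en.combined.srt
os.rename(srtFiles[1], os.path.splitext(srtFiles[1])[0] + '.combined.srt')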
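
For reference, here is the same idea as a reusable function. It is only a sketch, written for Python 3 rather than the Python 2 of the script above; the name combine_srt, the explicit output path and the utf-8 default are my assumptions and not part of the original script. Unlike the script, it leaves both input files untouched.

```python
def combine_srt(first, second, out_path, encoding='utf-8'):
    """Write `second` followed by `first` to out_path (hypothetical helper).

    Mirrors the script above, which appends the first file to the second.
    """
    with open(second, encoding=encoding) as f:
        base = f.read()
    with open(first, encoding=encoding) as f:
        extra = f.read()
    # make sure the two cue lists end up separated by a blank line
    if not base.endswith('\n'):
        base += '\n'
    with open(out_path, 'w', encoding=encoding) as f:
        f.write(base + '\n' + extra)
    return out_path
```

Calling combine_srt('show.cn.srt', 'show.en.srt', 'show.combined.srt') would write the English cues first and the Chinese cues after them (the file names here are made up).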

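Checking a parcel's status with Python+BeautifulSoup

I bought a pair of shoes through a Taobao purchasing agent. They ship from the US, so it takes a while. Visiting the courier's website every day is a chore, and I've long wanted to try simulating a browser with Python, so this was a good first victim. Since the request URL contains the tracking number, the basic approach writes itself: send a request to the URL with the tracking number in it, and the returned page contains the status information; then parse the returned HTML and extract the bits I want.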
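
The "URL with the tracking number in it" step could look roughly like this in Python 3. Treat it as a sketch: the helper names tracking_url and fetch_status_page are made up for illustration, and the address is simply the one used in the script further down, so there is no guarantee the site still answers at it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# address as it appears in the script in this post
BASE_URL = 'http://www.xlobo.com/Public/PubB.aspx'


def tracking_url(code):
    """Return the status-page URL for one tracking number (hypothetical helper)."""
    return BASE_URL + '?' + urlencode({'code': code})


def fetch_status_page(code, timeout=10):
    """Download the raw HTML (bytes) of the status page; needs network access."""
    with urlopen(tracking_url(code), timeout=timeout) as resp:
        return resp.read()
```

Only tracking_url is pure; fetch_status_page actually goes out to the network, so it is best called once and the result parsed offline.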

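I used urllib2 to send the HTTP request (httplib or anything similar would work too). For parsing the page I looked at several options, such as HTMLParser.HTMLParser and sgmllib.SGMLParser. In the end I went with BeautifulSoup. The "beautiful soup" really is tasty, haha.

To use Beautiful Soup, first download the latest version, 3.2.0 (for Python 2.x only), unpack it, go into the BeautifulSoup-3.2.0 directory and run the following command, and the soup is served~

$ python setup.py install

Here's the code; the explanations are in the comments: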

# coding=utf-8
# filename: CheckXLOBOExpressStatus.py
from __future__ import unicode_literals
from os import linesep
from sys import getfilesystemencoding
import re
import urllib2

from BeautifulSoup import BeautifulSoup

# DBxxxUS is the tracking number to be queried
url = r'http://www.xlobo.com/Public/PubB.aspx?code=DB111111111US'
req = urllib2.Request(url)
resp = urllib2.urlopen(req)
page = resp.read()
page = page.strip(linesep)
soup = BeautifulSoup(page)
result = []
# two nested 'for' loops: the outer one walks the rows (div.ff_row),
# the inner one collects all cells of one row into a single list
for block in soup.findAll('div', {'class': 'ff_row'}):
    temp = []
    for tag in block.findAll('span', {'class': re.compile(r'ff_\d{3} in_bk')}):
        temp.append(tag.text)
    result.append(temp)

encoding = getfilesystemencoding()
for i, v in enumerate(result):
    if i == 0:
        # the first row is printed with three columns and wider spacing
        print '{0[0]:<s}\t\t\t{0[1]:<s}\t\t\t\t{0[2]:<s}'.format(v).encode(encoding)
    else:
        # the remaining rows are printed with four columns
        print '{0[0]:<s}\t{0[1]:<s}\t{0[2]:<s}\t{0[3]:s}'.format(v).encode(encoding)
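
If you don't want to install anything, the same row/cell extraction can be sketched with the standard library's html.parser in Python 3. This is a swapped-in technique, not what the script above does. The class names ff_row and ff_NNN in_bk come from the script; the sample HTML and its contents are made up, and the parser assumes the row divs and cell spans are not nested inside each other.

```python
import re
from html.parser import HTMLParser

CELL_CLASS = re.compile(r'ff_\d{3} in_bk')


class RowParser(HTMLParser):
    """Collect the text of span cells inside each div.ff_row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, None outside a row
        self._cell = None   # text pieces of the cell being read, None outside

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get('class') or ''
        if tag == 'div' and cls == 'ff_row':
            self._row = []
        elif tag == 'span' and self._row is not None and CELL_CLASS.search(cls):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'span' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'div' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_rows(html):
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# made-up sample, only shaped like what the script above expects
SAMPLE = '''
<div class="ff_row"><span class="ff_090 in_bk">Time</span>
<span class="ff_200 in_bk">Location</span><span class="ff_300 in_bk">Status</span></div>
<div class="ff_row"><span class="ff_090 in_bk">day 1</span>
<span class="ff_200 in_bk">US</span><span class="ff_300 in_bk">picked up</span>
<span class="ff_100 in_bk">ok</span></div>
'''
rows = parse_rows(SAMPLE)
```

Here rows ends up as a list of lists, one inner list per row, which is the same shape as result in the script.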

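World of Warcraft CTM upgrade script (pywin32)

I haven't played WLK since the beginning of the year, but when I heard the Chinese WoW servers were about to upgrade to CTM (called "大地的裂变" on the Chinese servers), I still downloaded the 6.25 GB upgrade patch well in advance. Along the way I saw lots of people on the NGA forum installing the leaked "part two" patch to set up the client early. I didn't pay much attention at the time; I figured the upgrade wouldn't be a problem, having been through so many of them.

Who would have thought that for this expansion Blizzard restructured the entire set of game files, for example merging many of the old files into one. The file layout is certainly much cleaner, but it also makes the whole upgrade process a lot heavier. The really nasty part is that the upgrade has to go online to verify against the servers and download some things. Given network conditions in China, plus upgrade servers that weren't up to the job (reportedly the files on server2 were themselves wrong), a great many updates failed!

I was one of them. I couldn't get the update through on launch night, and then the next day I finally learned from a forum post how Blizzard Updater works (below is what I posted on Zhihu):

One infuriating thing about this update is that it can't resume. If it fails, the files that had already been updated successfully (they all end in .temp) are all deleted and it starts over. So there's a sleight-of-hand trick: as soon as the updater shows that a file has finished installing, copy it out. Then on the next attempt, wait for the updater to create those .temp files and overwrite them before progress reaches 8%.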
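
The copy-out and overwrite-back trick can be sketched in a few lines of Python 3 with shutil. This is not the script from this post (that one shells out to copy, see below); the function names and the idea of passing in a list of finished file names are assumptions made for illustration.

```python
import os
import shutil


def backup_temp_files(src_dir, backup_dir, finished):
    """Copy the finished .temp files from src_dir into backup_dir.

    `finished` is a list of bare file names that the updater has reported
    as done. Returns the names that were actually copied.
    """
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for name in finished:
        src = os.path.join(src_dir, name)
        dst = os.path.join(backup_dir, name)
        if not os.path.isfile(src):
            continue  # not there (yet), nothing to save
        if os.path.isfile(dst) and os.path.getsize(dst) == os.path.getsize(src):
            continue  # same size: treat it as already backed up
        shutil.copy2(src, dst)
        copied.append(name)
    return copied


def restore_temp_files(backup_dir, dst_dir):
    """Overwrite the updater's freshly created .temp files with the backups."""
    restored = []
    for name in sorted(os.listdir(backup_dir)):
        dst = os.path.join(dst_dir, name)
        # only overwrite files the updater has already created again
        if name.endswith('.temp') and os.path.isfile(dst):
            shutil.copy2(os.path.join(backup_dir, name), dst)
            restored.append(name)
    return restored
```

The size check is a weak test of "already backed up", which is the same caveat the script's own comments raise.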

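I've wanted to post this script for quite a while. But when I actually got my own update done, the script only did the simplest step: print the progress and the "now installing" text shown in Blizzard Updater, so I could tell which files needed backing up and which were still in progress, and then back them up by hand (I was desperate to get it done, so I stuck to "good enough").

While waiting for the CTM update to install I roughed out the automatic backup feature, but it's quite immature. I originally wanted to write a fully foolproof version (find the WoW path automatically and launch Blizzard Updater, back up automatically, restart the update automatically on failure while overwriting with the backup files, and loop until it succeeds) and post it on NGA for everyone, but that would have taken too much time, so I gave up...
Also, the backup feature posted here has never actually been tested; it's just the result of reading through the code a few times (because my files had all been updated by then...).
I'd guess very few people still haven't finished updating (you can tell from the player count: when I logged in late at night on the second day the server was nearly empty, and from the third day on it kept filling up), so the odds of anyone using this script aren't great, heh.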

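Before you use the script:

For study and research only; use at your own risk, especially the backup feature. (In fact only one place deletes files: the cleanCp method, which operates on files in the backup directory. You can delete the line that starts with os.remove.)
Change wowPath (the WoW 3.3.5 game directory) and backupPath (the backup directory; pick somewhere unlikely to cause trouble) at the top to your own directories
If you only want the progress display, delete everything from #back up files to the end
Backed-up files all land in backupPath and have to be copied into the game directory (wowPath) by hand: files with zhCN in the name go under Data\zhCN, likewise for enCN; everything else goes in Data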
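
The last rule is easy to get wrong by hand, so here it is as a small Python 3 function. It is a sketch that follows the rule exactly as stated above; the function name and the example file names are made up. ntpath is used so the Windows-style paths come out the same on any platform.

```python
import ntpath  # Windows path rules, usable on any platform


def restore_target(wow_path, backup_name):
    """Return the folder a backed-up file should be copied into."""
    for locale in ('zhCN', 'enCN'):
        if locale in backup_name:
            return ntpath.join(wow_path, 'Data', locale)
    return ntpath.join(wow_path, 'Data')
```

For example, a file with zhCN in its name maps to the Data\zhCN folder under the game directory, and a file with neither locale in its name maps to Data.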

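Basic idea of the script:

Use the pywin32 module to get the window's title bar text and the text inside the window; the extra feature (backup) builds on those. That's all there is to it. I'm only a Python beginner, so go easy on me.

The Python script: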

# encoding: utf-8
# author: Sean Wang : weibo.com/fclef
from __future__ import unicode_literals
import win32gui
import pywintypes
import time
import os
import subprocess

from sys import getfilesystemencoding

# change below two paths to your own
wowPath = r'E:\World Of Warcraft' # WoW install path
backupPath = r'E:\Games setup\CTMbackup' # path to back up update files CAUTION:use a safe place to store!
ENCODING=getfilesystemencoding()

def getBUWin():
    """return [hwnd, title] of the Blizzard Updater window, or [] if not found"""
    def callback(hwnd,allWin):
        winText=win32gui.GetWindowText(hwnd).decode(ENCODING)
        # search for Blizzard Updater, get handle and title text
        # (>=0 so that a title starting with 'Blizzard' also matches)
        if winText.find('Blizzard')>=0:
            allWin.append(hwnd)
            allWin.append(winText)
        return True
    BlizWin = []
    try:
        win32gui.EnumWindows(callback,BlizWin)
    except pywintypes.error, wte:
        print wte
    return BlizWin

def getAbsSrcPath(mpqFName):
    """helper method. get absolute path to the mpq file name"""
    assert mpqFName.upper().endswith("MPQ"), "%s is not a valid mpq filename"%mpqFName
    if mpqFName.find('zhCN') >=0:
        return os.path.join(wowPath,'Data','zhCN.temp',mpqFName+'.temp')
    elif mpqFName.find('enCN')>=0:
        return os.path.join(wowPath,'Data','enCN.temp',mpqFName+'.temp')
    else:
        return os.path.join(wowPath,'Data',mpqFName+'.temp')

def backUp(srcAbsFPath):
    """helper method. return Popen instance of copy command, or None if already backed up"""
    assert os.path.isfile(srcAbsFPath), "source mpq file %r does not exist!"%srcAbsFPath
    srcSize=os.stat(srcAbsFPath).st_size
    destFile=os.path.join(backupPath,os.path.basename(srcAbsFPath))
    # compare the two files.
    # Using the file size actually seems pointless on Windows, since even if
    # the copy failed the size would still be the same
    if os.path.exists(destFile) and os.stat(destFile).st_size==srcSize:
        print '%s already exists and size is the same.Ignored.'%srcAbsFPath
        return None
    else:
        print 'Start backing up...'
        # quote both paths, they may contain spaces
        cpCmd='copy /y "%s" "%s"'%(srcAbsFPath,backupPath)
        return subprocess.Popen(cpCmd,shell=True,
                stdout=open(os.devnull,'w'),stderr=subprocess.PIPE)

def cleanCp(cpStat):
    """kill copy processes that are still running and report the failed ones"""
    if cpStat:
        for p in cpStat:
            if p is None:
                # backUp returns None when nothing had to be copied
                continue
            if p.poll() is None:
                p.kill()
                #below part is the only place where this script would delete
                #your files, use with caution!!! backupPath should be a safe
                #place to do deletion task
                os.remove(os.path.join(backupPath,cpStat[p]))
            elif p.poll() != 0:
                print '%s failed to backup'%cpStat[p]
                print p.stderr.read()

if __name__ == '__main__':
    # separate lists; a chained assignment would make them one shared list
    contents=[]
    progress=[]
    toBackup=[]
    cpStat={}
    lastBak=''
    while True:
        # get progress status on window title
        buWin=getBUWin()
        if buWin:
            window,title=buWin[:2]
        else:
            cleanCp(cpStat)
            print "Error occurred. Could not find Blizzard Update window. Check if you got failed update or you forgot to launch it"
            break
        if title not in progress:
            progress.append(title)
            print "%s : %s"%(time.strftime('%H:%M:%S'),progress[-1])
        # get file process status displayed on programme
        try:
            control= win32gui.FindWindowEx(window,0,"static",None)
        except pywintypes.error:
            if progress[-1].find('100%') >=0:
                print 'Update Done.'
            else:
                cleanCp(cpStat)
                print "Error occurred. Could not find Blizzard Update window. Check if you got failed update or you forgot to launch it"
            break
        content=win32gui.GetWindowText(control).decode(ENCODING)
        if content not in contents:
            contents.append(content)
            print "%s : %s"%(time.strftime('%H:%M:%S'),contents[-1])
        #back up files
        #do not backup temppatch-2.MPQ as we need time to copy those backup files after BU updater started
        toBackup=[ i for i in contents if i !=contents[-1] and i.find('temppatch-2.MPQ')<0]
        # NOTE: part of this condition was lost in the original post;
        # it is reconstructed here from the lines that follow
        if toBackup and toBackup[-1]!=lastBak and toBackup[-1].find('"')>=0:
            lastBak=toBackup[-1]
            mpqFile=toBackup[-1].split('"')[1]
            if mpqFile.endswith('.MPQ'):
                mpqSrcPath= getAbsSrcPath(mpqFile)
                proc=backUp(mpqSrcPath)
                if proc:
                    cpStat[proc]=mpqFile+'.temp'
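
The script only looks for the text 100% in the window title. If you want the number itself (say, to know whether you are still under the 8% mark mentioned earlier), a small regex does it. This Python 3 sketch assumes the title contains the progress as digits followed by a percent sign; I never wrote down the exact title format, so the sample titles are invented.

```python
import re

PERCENT = re.compile(r'(\d{1,3})\s*%')


def parse_percent(title):
    """Return the first percentage found in a window title, or None."""
    match = PERCENT.search(title)
    return int(match.group(1)) if match else None
```

With that, a check like parse_percent(title) == 100 could stand in for the string search on '100%'.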

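Two screenshots I took at the time (without the backup feature):

A screenshot of the update in progress. This was already the second or third attempt after failures, but with the backup files the retry goes much faster (compare the second screenshot, of the finished update; it's obvious once I added timestamps)

This one is after the update succeeded. Afterwards it told me I still had 2 GB to download; at least I wasn't one of the unlucky 10 GB crowd... Actually you can enter the game as soon as the Launcher shows part two as complete. I didn't know that then; with about 500 MB left I finally couldn't resist trying, and sure enough I got in.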

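Image merging script, Python version

Oct/13/2011 update script: remove ugly lambda, use os.path.splitext()

I previously wrote about merging multiple images with Ruby+ImageMagick. Since I've switched to Python, I wrote a Python version too.

Of course, this script also requires ImageMagick to be installed, with its bin directory added to PATH. In other words, typing montage at the command line shouldn't give a "program not found" error. Oh, and you also need Python installed...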
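
A quick way to check the PATH requirement from Python itself is shutil.which. This is a Python 3 sketch and not part of the script below; the wrapper name find_tool is made up.

```python
import shutil


def find_tool(name='montage'):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)
```

If find_tool() returns None, ImageMagick's bin directory is not on PATH yet and the script below will fail when it tries to call montage.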

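This Python version is fairly simple and has no usage message, so here it is:

Drop the script file into the directory with the images you want to merge, then run it from there. All images in that directory are combined into a single file called result.jpg.

When run, it asks which style to use. The default is plain stitching. The photo style one looks like a photo album; try it yourself to see.

The code: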

import os
imgTypes=('.jpg','.jpeg','.png','.gif')
# old version, replaced by os.path.splitext() below:
#isImg= lambda x: x.find('.') >0 and x.split('.')[1] in imgTypes
#imgFiles=[f for f in os.listdir(os.getcwd()) if isImg(f) and f.find('result') < 0]
# every image in the current directory, except earlier result files
imgFiles=[f for f in os.listdir(os.getcwd()) if os.path.splitext(f)[1].lower() in imgTypes and f.find('result') < 0]
# quote each name so that file names with spaces survive the shell
imgStrings=' '.join('"%s"'%f for f in imgFiles)

photoStyle='montage %s -auto-orient -bordercolor Lavender -background white +polaroid -tile 1x -gravity center -background SkyBlue  -geometry "1x1<" result.jpg'%imgStrings
plainStyle='montage %s -tile 1x -geometry "1x1<" result.jpg'%imgStrings
choice=raw_input("1. default: plain\t2. photo style\nplz choose image merge type:")
if choice == '' or choice == '1':
    os.system(plainStyle)
else:
    os.system(photoStyle)
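
An alternative to building one big shell string is to build an argument list and hand it to subprocess, which sidesteps quoting problems entirely. The Python 3 sketch below only uses the montage options that already appear in the script above; the function names are invented, and sorting the file names is my own addition (the script takes them in whatever order os.listdir returns).

```python
import os

IMG_TYPES = ('.jpg', '.jpeg', '.png', '.gif')


def pick_images(names):
    """Keep image file names, skipping earlier result files."""
    return sorted(n for n in names
                  if os.path.splitext(n)[1].lower() in IMG_TYPES
                  and 'result' not in n)


def montage_args(files, photo_style=False, out='result.jpg'):
    """Build the montage command as an argument list (no shell quoting needed)."""
    args = ['montage'] + list(files)
    if photo_style:
        args += ['-auto-orient', '-bordercolor', 'Lavender',
                 '-background', 'white', '+polaroid']
    args += ['-tile', '1x']
    if photo_style:
        args += ['-gravity', 'center', '-background', 'SkyBlue']
    args += ['-geometry', '1x1<', out]
    return args
```

To actually run it you would pass the list to subprocess.call, for example subprocess.call(montage_args(pick_images(os.listdir('.')))).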

Sample images:

PhotoStyle: PlainStyle:
