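How to get the final URL that a link redirects to in Python
Problem description
In Python, how can I get the final URL that a short link resolves to? I have a lot of Taobao short links and need the final URL each one lands on. Is there a good way to do this? Some of them use a 302 redirect, and some redirect with JavaScript inside the page. How can I resolve both kinds?
Answer
Answer 1: Use selenium + PhantomJS...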
http://stackoverflow.com/ques...
#!/usr/bin/python2.7
from twisted.internet import reactor
from twisted.internet.defer import Deferred, DeferredList, DeferredLock
from twisted.internet.defer import inlineCallbacks
from twisted.web.client import Agent, HTTPConnectionPool
from twisted.web.http_headers import Headers
from pprint import pprint
from collections import defaultdict
from urlparse import urlparse
from random import randrange
import fileinput

pool = HTTPConnectionPool(reactor)
pool.maxPersistentPerHost = 16
agent = Agent(reactor, pool)
locks = defaultdict(DeferredLock)
locations = {}

def getLock(url, simultaneous=1):
    # One lock per (host, slot) pair; a random slot spreads requests across locks
    return locks[urlparse(url).netloc, randrange(simultaneous)]

@inlineCallbacks
def getMapping(url):
    # Limit ourselves to 4 simultaneous connections per host
    # Tweak this as desired, but make sure that it is no larger than
    # pool.maxPersistentPerHost
    lock = getLock(url, 4)
    yield lock.acquire()
    try:
        # A HEAD request is enough: only the Location header is needed
        resp = yield agent.request('HEAD', url)
        locations[url] = resp.headers.getRawHeaders('location', [None])[0]
    except Exception as e:
        locations[url] = str(e)
    finally:
        lock.release()
You can also try this pip package:
https://pypi.python.org/pypi/...
from urlunshort import resolve
resolve('http://bit.ly/qlKaI')
Result: 'http://bitbucket.org/runeh/urlunshort/'
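If pulling in a browser or an extra package is not an option, the two cases from the question (a 302 redirect and an in-page redirect) can be roughly approximated with the standard library alone. The following is only a sketch for Python 3, not what any of the tools above do internally. The function names `find_client_redirect` and `resolve_final_url`, the regular expressions, and the `max_hops` limit are all assumptions made for illustration. The patterns catch only a meta refresh and simple `location = "..."` or `location.replace("...")` statements; a page that builds its target URL dynamically still needs a real browser such as the selenium approach.

```python
import re
import urllib.request
from urllib.parse import urljoin

# Heuristic patterns (assumptions): they only cover the simplest redirect styles.
_META_REFRESH = re.compile(
    r'''<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["'][^"']*?url=([^"'>\s]+)''',
    re.IGNORECASE)
_JS_ASSIGN = re.compile(
    r'''(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']''',
    re.IGNORECASE)
_JS_CALL = re.compile(
    r'''location\.(?:replace|assign)\(\s*["']([^"']+)["']''',
    re.IGNORECASE)


def find_client_redirect(html, base_url):
    """Return the absolute target of a meta-refresh or simple JS redirect, or None."""
    for pattern in (_META_REFRESH, _JS_ASSIGN, _JS_CALL):
        match = pattern.search(html)
        if match:
            # Redirect targets may be relative, so resolve them against the page URL.
            return urljoin(base_url, match.group(1))
    return None


def resolve_final_url(url, max_hops=5, timeout=10):
    """Follow HTTP redirects, then simple in-page redirects, up to max_hops pages."""
    for _ in range(max_hops):
        # urlopen follows 3xx redirects itself; geturl() is where it ended up.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            landed = resp.geturl()
            html = resp.read(65536).decode("utf-8", errors="replace")
        target = find_client_redirect(html, landed)
        if target is None or target == landed:
            return landed
        url = target
    return url
```

Calling `resolve_final_url` on a short link returns the last URL reached. Some sites reject requests that lack a browser-like User-Agent header, in which case a `urllib.request.Request` with custom headers would be needed in place of the bare URL.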
