
Web crawler - A crawler under Python 3.6 keeps re-crawling the content of the first page

Views: 153 | Date: 2022-06-30 17:08:03

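Problem description

The problem is as stated in the title. I tried switching the loop to a while loop and tried many other things, but nothing made any difference. Any help would be appreciated.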

# coding:utf-8
# from lxml import etree
import requests,lxml.html,os

class MyError(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)

def get_lawyers_info(url):
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    # phones = html.xpath("//span[@class='law-tel']")
    phones = html.xpath("//span[@class='phone pull-right']")
    # names = html.xpath("//p[@class='fl']/p/a")
    names = html.xpath("//h4[@class='text-center']")
    if(len(phones) == len(names)):
        list(zip(names,phones))
        phone_infos = [(names[i].text, phones[i].text_content()) for i in range(len(names))]
    else:
        error = 'Lawyers amount are not equal to the amount of phone_nums: '+url
        raise MyError(error)
    phone_infos_list = []
    for phone_info in phone_infos:
        if(phone_info[0] == ''):
            info = 'No name given'+': '+phone_info[1]+'\r\n'
        else:
            info = phone_info[0]+': '+phone_info[1]+'\r\n'
        print (info)
        phone_infos_list.append(info)
    return phone_infos_list

dir_path = os.path.abspath(os.path.dirname(__file__))
print (dir_path)
file_path = os.path.join(dir_path,'lawyers_info.txt')
print (file_path)
if os.path.exists(file_path):
    os.remove(file_path)

with open('lawyers_info.txt','ab') as file:
    for i in range(1000):
        url = 'http://www.xxxx.com/cooperative_merchants?searchText=&industry=100&provinceId=19&cityId=0&areaId=0&page='+str(i+1)
        # r = requests.get(url)
        # html = lxml.html.fromstring(r.content)
        # phones = html.xpath("//span[@class='phone pull-right']")
        # names = html.xpath("//h4[@class='text-center']")
        # if phones or names:
        info = get_lawyers_info(url)
        for each in info:
            file.write(each.encode('gbk'))

Answers

Answer 1:

# coding: utf-8
import requests
from pyquery import PyQuery as Q

url = 'http://www.51myd.com/cooperative_merchants?industry=100&provinceId=19&cityId=0&areaId=0&page='

with open('lawyers_info.txt', 'ab') as f:
    for i in range(1, 5):
        r = requests.get('{}{}'.format(url, i))
        usernames = Q(r.text).find('.username').text().split()
        phones = Q(r.text).find('.phone').text().split()
        # Python 3: print is a function and zip() is lazy, so wrap it in list()
        print(list(zip(usernames, phones)))
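To check whether the server really returns the same listing for every page number, it can help to fingerprint each response body and stop once a body repeats. The following is only a minimal sketch, not the answerer's code: `build_page_url` and `crawl_until_repeat` are hypothetical helper names, the query parameters are copied from the answer's URL, and `fetch` is an assumed callable that takes a URL and returns the response body as bytes, so the loop can be tried without touching the network.

```python
import hashlib
from urllib.parse import urlencode

# Base URL taken from the answer above; everything else here is illustrative.
BASE_URL = "http://www.51myd.com/cooperative_merchants"


def build_page_url(page, base_url=BASE_URL):
    """Build the listing URL for one page (parameter values copied from the answer)."""
    params = {
        "industry": 100,
        "provinceId": 19,
        "cityId": 0,
        "areaId": 0,
        "page": page,
    }
    return base_url + "?" + urlencode(params)


def crawl_until_repeat(fetch, max_pages=1000):
    """Fetch pages 1..max_pages, stopping as soon as a response body repeats.

    `fetch` is any callable that takes a URL and returns the body as bytes.
    Returns a list of (page_number, body) tuples for the distinct pages.
    """
    seen = set()
    pages = []
    for page in range(1, max_pages + 1):
        body = fetch(build_page_url(page))
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen:
            # Same bytes as an earlier page: the page parameter had no effect here.
            print("page {} repeats an earlier page; stopping".format(page))
            break
        seen.add(digest)
        pages.append((page, body))
    return pages
```

With requests installed, a real run could look like `crawl_until_repeat(lambda url: requests.get(url).content)`. If that stops at page 2, the server is ignoring the page parameter for this request, and the request itself (query string, headers, cookies) is the thing to investigate rather than the loop. A page that embeds a timestamp or token would hash differently on every request, so in that case comparing the extracted names and phone numbers would be a more reliable check than hashing the whole body.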
