
Scraping Lagou (拉勾網) job listings in Python with the requests library

Date: 2022-07-04 16:57:42

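Press F12 to open the browser's developer tools and capture the network traffic. This lets you locate the endpoint that returns the job listings.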

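From the captured request you can read off the endpoint URL and its form data. In the form, pn is the page number being requested and kd is the keyword for the job being searched.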
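To make the form data concrete, here is a minimal sketch of what such a body looks like when encoded. It uses the standard library's urlencode; build_form is a hypothetical helper, not part of the original code, and the assumption is that requests form-encodes a dict passed as data= in this application/x-www-form-urlencoded style.

```python
from urllib.parse import urlencode


def build_form(page, keyword):
    """Hypothetical helper: the form fields for one page of search results."""
    return {
        'first': 'true',   # sent as 'true' in the captured request
        'pn': str(page),   # page number being requested
        'kd': keyword,     # job keyword being searched
    }


# Encode the fields the way a form POST body is encoded
print(urlencode(build_form(1, 'python')))            # first=true&pn=1&kd=python
print(urlencode(build_form(2, 'machine learning')))  # first=true&pn=2&kd=machine+learning
```

Spaces become + and non-ASCII keywords are percent-encoded as UTF-8, so Chinese job keywords can be passed as ordinary strings.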


Building the POST request in Python:

```python
import requests

# Form data: pn is the page number, kd is the job keyword
data = {
    'first': 'true',
    'pn': '1',
    'kd': 'python'
}
headers = {
    'referer': 'https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}
res = requests.post('https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false',
                    data=data, headers=headers)
print(res.text)
```

It turns out that no job data comes back from the endpoint.
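A quick way to check this in code is to look for the path the full scraper later reads, content → positionResult → result. This is a minimal sketch: has_job_data is a hypothetical helper, and since the exact shape of the error response is not shown here, it only tests for the success path rather than matching a specific error message.

```python
def has_job_data(payload):
    """Return True if the parsed JSON contains the job list
    (content -> positionResult -> result), which an error reply lacks."""
    try:
        result = payload['content']['positionResult']['result']
    except (KeyError, TypeError):
        # A missing key or a None along the path means no job data
        return False
    return isinstance(result, list)


# Example payloads (made up for illustration, not real API responses)
print(has_job_data({'content': {'positionResult': {'result': []}}}))
print(has_job_data({'msg': 'requests too frequent'}))
```

In the script above, this would be called as has_job_data(res.json()) before any fields are extracted.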


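Switching to a different network did not help: the endpoint still returned an error saying the requests were too frequent. A closer look showed that the endpoint requires a dynamic cookie; without it, it keeps returning that error.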

```python
import requests

data = {
    'first': 'true',
    'pn': '1',
    'kd': 'python'
}
# The headers must include user-agent and referer, otherwise no cookies are returned
headers = {
    'referer': 'https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}
# Visit the listing page first to obtain the cookies
r1 = requests.get('https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
                  headers=headers)
# Then send those cookies with the POST request
r2 = requests.post('https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false',
                   data=data, headers=headers, cookies=r1.cookies)
print(r2.text)
```
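A variant worth knowing swaps the manual cookie passing for a shared session: a requests.Session stores cookies set by one response and sends them with later requests automatically. The sketch below is an alternative technique, not the approach used in this article; fetch_page is a hypothetical helper, and it takes the session as a parameter so any object with requests.Session-style get/post methods will work.

```python
LIST_URL = ('https://www.lagou.com/jobs/list_python/p-city_0'
            '?&cl=false&fromSearch=true&labelWords=&suginput=')
API_URL = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'


def fetch_page(session, page, keyword, headers):
    """GET the listing page, then POST to the API on the same session.

    With a requests.Session, cookies set by the GET response are kept in
    the session and sent with the POST without passing cookies= by hand.
    """
    session.get(LIST_URL, headers=headers)
    res = session.post(API_URL,
                       data={'first': 'true', 'pn': page, 'kd': keyword},
                       headers=headers)
    return res.json()
```

Usage would look like: with requests.Session() as s: data = fetch_page(s, 1, 'python', headers), reusing the headers dict shown above.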

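Note: the cookies are refreshed after every ten requests to the endpoint, so they have to be fetched again periodically. The complete scraper code is below.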
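Before the full listing, the refresh rule on its own can be pulled into a small helper. This is a hypothetical sketch, not code from the original: CookieRefresher assumes cookie fetching is a zero-argument callable (like getCookie in the script) and reproduces the script's i % 10 == 0 rule, fetching new cookies on the 10th, 20th, 30th request and so on.

```python
class CookieRefresher:
    """Hypothetical helper: fetch cookies once up front, then fetch a
    fresh set on every `every`-th request."""

    def __init__(self, fetch, every=10):
        self.fetch = fetch    # callable returning cookies, e.g. getCookie
        self.every = every
        self.count = 0
        self.cookies = fetch()

    def get(self):
        """Return the cookies to use for the next request, refreshing when due."""
        self.count += 1
        if self.count % self.every == 0:
            self.cookies = self.fetch()
        return self.cookies
```

In the loop of the full script this would replace the manual check: refresher = CookieRefresher(getCookie), then getPage(i, refresher.get(), 'python') on each iteration.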

```python
import json
import requests


# Fetch fresh cookies by visiting the job listing page
def getCookie():
    res = requests.get('https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
                       headers=headers)
    return res.cookies


# Request page i of the results for keyword kw and return the parsed JSON
def getPage(i, cookies, kw):
    data = {
        'first': 'true',
        'pn': i,
        'kd': kw
    }
    res = requests.post('https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false',
                        data=data, headers=headers, cookies=cookies)
    return json.loads(res.text)


# Join a list of strings into one space-separated string
def reduceList(l):
    return ' '.join(l).strip()


# Extract the fields of each job and write them to the file
def saveInCsv(f, data):
    js = data['content']['positionResult']['result']
    for node in js:
        # Handle a missing district
        district = node['district']
        if district is not None:
            district = '-' + district
        else:
            district = ''
        f.write(
            node['positionName'] + '·' + node['city'] + district + '·' + node['salary'] + '·'
            + node['workYear'] + '·' + node['education'] + '·' + reduceList(node['skillLables']) + '·'
            + node['companyShortName'] + '·' + node['companySize'] + '·' + node['positionAdvantage'] + '\n')


if __name__ == '__main__':
    # Request headers (module-level, so the functions above can see them)
    headers = {
        'referer': 'https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput=',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
    }
    # Initial cookies
    cookies = getCookie()
    with open('file.csv', 'w', encoding='utf-8') as f:
        for i in range(1, 31):
            # Fetch new cookies every ten requests
            if i % 10 == 0:
                cookies = getCookie()
            # Parse the fields and save them
            data = getPage(i, cookies, 'python')
            saveInCsv(f, data)
```
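The script names its output file.csv but separates fields with '·', so spreadsheet tools will not split the columns. If real CSV is wanted, the standard csv module handles quoting of commas inside fields. This is a swapped-in technique, not the author's code: write_jobs is a hypothetical helper, the field names are the ones saveInCsv reads, and unlike the original it keeps district in its own column instead of appending it to city.

```python
import csv
import io

# Field names taken from the original script's saveInCsv
FIELDS = ['positionName', 'city', 'district', 'salary', 'workYear',
          'education', 'skillLables', 'companyShortName', 'companySize',
          'positionAdvantage']


def write_jobs(f, jobs):
    """Write a header row, then one CSV row per job dict, to file object f."""
    writer = csv.writer(f)
    writer.writerow(FIELDS)
    for job in jobs:
        row = []
        for name in FIELDS:
            value = job.get(name)
            if isinstance(value, list):   # skillLables is a list of tags
                value = ' '.join(value)
            row.append('' if value is None else value)
        writer.writerow(row)


# Demo with one made-up job record, written to an in-memory buffer
buf = io.StringIO()
write_jobs(buf, [{'positionName': 'Python Developer', 'city': 'Beijing',
                  'district': None, 'salary': '15k-25k',
                  'skillLables': ['Python', 'Django'],
                  'positionAdvantage': 'flexible hours, snacks'}])
print(buf.getvalue())
```

For the real run, open the file with newline='' as the csv documentation recommends, e.g. open('file.csv', 'w', newline='', encoding='utf-8'), and pass data['content']['positionResult']['result'] as jobs.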
