csv - How do I save scraped web data into multiple columns with Python?

Views: 109  Date: 2022-08-30 10:07:16

Problem description

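I want to save the data my scraper collects into separate columns of a TSV file, but after trying many approaches I still cannot get it saved as two columns. I want the numbered entries I scraped in the first column and the category shown beneath each one in the second column.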

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re
    import csv

    html = urlopen('http://www.app12345.com/?area=tw&store=Apple%20Store')
    bs0bj = BeautifulSoup(html)

    def GPname():
        GPnameList = bs0bj.find_all('dd', {'class': re.compile('ddappname')})
        str = ''
        for name in GPnameList:
            str += name.get_text()
            str += '\n'
            print(name.get_text())
        return str

    def GPcompany():
        GPcompanyname = bs0bj.find_all('dd', {'style': re.compile('color')})
        str = ''
        for cpa in GPcompanyname:
            str += cpa.get_text()
            str += '\n'
            print(cpa.get_text())
        return str

    with open('0217.tsv', 'w', newline='', encoding='utf-8') as f:
        f.write(GPname())
        f.write(GPcompany())
    f.close()

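I may not be familiar enough with zip: after saving, every single character ended up in its own cell. I also found this reference, but no matter what I tried I could not get it to work: https://segmentfault.com/q/10...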

Answers

Answer 1:

Writing a CSV file is simple. Your data needs to be structured like this: [['1. 東森新聞雲', '新聞'], ['2. 創世黎明(Dawn of world)', '遊戲']]
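As a small sketch of how two parallel lists turn into that row structure, the snippet below uses zip. The list names `names` and `categories` and their values are hypothetical stand-ins for the scraped text, not output from the real page.

```python
# Hypothetical sample values standing in for the scraped text.
names = ['1. 東森新聞雲', '2. 創世黎明(Dawn of world)']
categories = ['新聞', '遊戲']

# zip pairs the i-th name with the i-th category; each pair becomes one row.
rows = [list(pair) for pair in zip(names, categories)]
print(rows)
```

Note that zip stops at the end of the shorter list, so if the two selectors match a different number of elements, the extra items are silently dropped.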

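    from urllib.request import urlopen  # Python 3; on Python 2 use `from urllib import urlopen`
    from bs4 import BeautifulSoup
    import re
    import csv

    html = urlopen('http://www.app12345.com/?area=tw&store=Apple%20Store')
    bs0bj = BeautifulSoup(html)

    # One list of app names and one list of categories, both in page order.
    GPnameList = [name.get_text() for name in bs0bj.find_all('dd', {'class': re.compile('ddappname')})]
    GPcompanyname = [cpa.get_text() for cpa in bs0bj.find_all('dd', {'style': re.compile('color')})]

    # zip pairs each name with its category; cells are joined with ',' and rows with '\n'.
    data = '\n'.join([','.join(d) for d in zip(GPnameList, GPcompanyname)])

    # The text is encoded to bytes by hand, so the file is opened in binary mode.
    with open('C:/Users/sa/Desktop/0217.csv', 'wb') as f:
        f.write(data.encode('utf-8'))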
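Since the question asks for a TSV file, here is an alternative sketch that uses the standard csv module with a tab delimiter instead of joining strings by hand. The helper name `write_tsv` and the sample lists are assumptions made for illustration; in practice the two lists would come from the scraping code above.

```python
import csv

def write_tsv(path, names, categories):
    """Write one (name, category) pair per line, separated by a tab."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f, delimiter='\t')
        # writerows expects an iterable of rows; each row is a sequence of cells.
        writer.writerows(zip(names, categories))

# Hypothetical sample data in place of the scraped lists.
names = ['1. 東森新聞雲', '2. 創世黎明(Dawn of world)']
categories = ['新聞', '遊戲']
write_tsv('0217.tsv', names, categories)
```

A likely cause of the "one character per cell" result is passing a plain string to writerow: a string is itself a sequence of characters, so each character is written as a separate cell. Passing a list or tuple such as (name, category) avoids this. Letting csv.writer build the lines also means that a name containing the delimiter or a quote character is quoted automatically.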
