Smart Living Technology Professional Community: Scraping Images with Python to Find Photos of NKUT's Beautiful Campus

Thursday, November 15, 2018

Scraping Images with Python to Find Photos of NKUT's Beautiful Campus

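In the previous few posts, we saw how Python can scrape text from the web and generate Word documents. Today we look at how to scrape images. The official website of NKUT (Nan Kai University of Technology) has many photos of the campus scenery.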

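Next, let's see how to download the images on the campus photo album page at http://www.nkut.edu.tw/page2/photo.php?CID=1&Album_ID=2&ano=.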

First, let's test the following program:

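import requests

url='http://www.nkut.edu.tw/page2/photo.php?CID=1&Album_ID=2&ano='  # campus photo album page
r=requests.get(url)  # send an HTTP GET request for the page
print(r)             # the Response object's repr shows the HTTP status code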

The output is <Response [200]>. Status code 200 (OK) means the site was accessed successfully.

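print(r) shows only the numeric status code. The sketch below uses the standard library's http.HTTPStatus to turn that code into a readable phrase. describe_status is a hypothetical helper written for illustration and is not part of requests. With requests itself, you would normally call r.raise_for_status(), which raises an exception for 4xx and 5xx responses.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Hypothetical helper: label an HTTP status code as success or failure.

    Only 2xx codes count as success here. This is stricter than
    requests' Response.ok, which is True for any code below 400.
    """
    phrase = HTTPStatus(code).phrase  # e.g. 200 -> "OK"
    outcome = "success" if 200 <= code < 300 else "failure"
    return f"{code} {phrase} ({outcome})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (failure)
```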
import requests
url='http://www.nkut.edu.tw/page2/photo.php?CID=1&Album_ID=2&ano='
r=requests.get(url)
print(r.text)  # r.text is the response body decoded as a string (the page's HTML)

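Adding .text to the r object gives us the page's HTML as a string. However, some of the text in the output is garbled. To fix this, we only need to add one line to the program: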

import requests
url='http://www.nkut.edu.tw/page2/photo.php?CID=1&Album_ID=2&ano='
r=requests.get(url)
r.encoding=r.apparent_encoding  # decode with the encoding guessed from the page content
print(r.text)

With this line added, the Chinese text in the output is decoded correctly.
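Garbled text like this appears when bytes are decoded with the wrong character set. The sketch below reproduces the problem under two assumptions: the page is UTF-8, and the server's Content-Type header omits a charset. In that case requests falls back to ISO-8859-1 for text/* responses. The string below is a hypothetical stand-in for the page text. apparent_encoding avoids the problem by guessing the charset from the bytes themselves.

```python
# Assumption: a UTF-8 page; this string is a hypothetical stand-in for its text.
original = "南開科技大學校園美景"
raw = original.encode("utf-8")  # the bytes that arrive over the network

wrong = raw.decode("iso-8859-1")  # decoding with the fallback charset -> mojibake
right = raw.decode("utf-8")       # decoding with the real charset -> readable text

# A Latin-1 misdecode is reversible: re-encode, then decode with the right charset
repaired = wrong.encode("iso-8859-1").decode("utf-8")

print(len(original), len(wrong))  # 10 30: each Chinese character became 3 garbage characters
print(wrong == original)          # False
print(repaired == original)       # True
```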

Next, we use BeautifulSoup's find_all method to find the image tags (<img>) in the page. The program is as follows:
import requests
from bs4 import BeautifulSoup

url='http://www.nkut.edu.tw/page2/photo.php?CID=1&Album_ID=2&ano='
r=requests.get(url)
r.encoding=r.apparent_encoding
soup = BeautifulSoup(r.text, 'html.parser')  # parse the HTML
all_img = soup.find_all('img')               # every <img> tag in the page
print(all_img)

The output is a list of all the <img> tags found in the page.
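The sketch below shows what find_all('img') collects without contacting the site. It swaps in the standard library's html.parser.HTMLParser in place of BeautifulSoup and runs on a small made-up HTML snippet. The snippet and file names are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, skipping tags without one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of (name, value) pairs
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Hypothetical snippet standing in for the album page
html = """
<div class="album">
  <img src="/files/photo/a.jpg" alt="campus">
  <img alt="no source here">
  <IMG SRC="files/photo/b.jpg">
</div>
"""

collector = ImgSrcCollector()
collector.feed(html)
print(collector.sources)  # ['/files/photo/a.jpg', 'files/photo/b.jpg']
```

The same guard applies to the BeautifulSoup loop. img.get('src') returns None instead of raising KeyError when an <img> tag has no src.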

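Finally, we use the os module to create the save folder and check whether each file already exists, then write each image to disk. The complete program is as follows: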

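import os

import requests
from bs4 import BeautifulSoup

url = 'http://www.nkut.edu.tw/page2/photo.php?CID=1&Album_ID=2&ano='
root = 'C:/nkut_pic/'  # local folder for the downloaded images

# Fetch the album page and decode it with the detected encoding
r = requests.get(url)
r.encoding = r.apparent_encoding
soup = BeautifulSoup(r.text, 'html.parser')
all_img = soup.find_all('img')

# Create the folder once, before the loop
os.makedirs(root, exist_ok=True)

for img in all_img:
    src = img.get('src')  # None if this <img> tag has no src attribute
    if not src:
        continue
    img_url = 'http://www.nkut.edu.tw/' + src
    print(img_url)
    path = root + img_url.split('/')[-1]  # the last URL segment becomes the file name
    try:
        if not os.path.exists(path):
            resp = requests.get(img_url)
            resp.raise_for_status()  # do not save an HTML error page as an image
            with open(path, 'wb') as f:  # 'wb': image data is binary
                f.write(resp.content)  # the with block closes the file
            print("File saved successfully")
        else:
            print("File already exists")
    except Exception as e:
        print("Download failed:", e)

Output:

http://www.nkut.edu.tw//files/adbannerplay/4_5643b90c.jpg
File saved successfully
http://www.nkut.edu.tw//files/adbannerplay/5_29d7fb68.jpg
File saved successfully
http://www.nkut.edu.tw//files/adbannerplay/6_28f3edec.jpg
File saved successfully
http://www.nkut.edu.tw//files/adbannerplay/7_8655f891.jpg
File saved successfully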
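The URLs in the output contain a double slash (//files/...). This happens because the program joins 'http://www.nkut.edu.tw/' with a src that already starts with /. The downloads still succeeded here, but plain string concatenation breaks for relative or absolute src values. The sketch below uses urllib.parse.urljoin instead. It also takes the file name from the URL path, so a query string does not end up in the name. image_target and save_if_missing are hypothetical helper names, and a temporary folder stands in for C:/nkut_pic/.

```python
import os
import posixpath
import tempfile
from urllib.parse import urljoin, urlparse

PAGE_URL = "http://www.nkut.edu.tw/page2/photo.php?CID=1&Album_ID=2&ano="


def image_target(page_url, src, root):
    """Hypothetical helper: resolve src against the page URL and choose a local path."""
    img_url = urljoin(page_url, src)  # works for "/x.jpg", "x.jpg" and "http://..." alike
    name = posixpath.basename(urlparse(img_url).path)  # query string and fragment dropped
    return img_url, os.path.join(root, name)


def save_if_missing(path, data):
    """Hypothetical helper: write bytes to path unless the file exists; True if written."""
    if os.path.exists(path):
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True


root = tempfile.mkdtemp()  # stand-in for C:/nkut_pic/
img_url, path = image_target(PAGE_URL, "/files/adbannerplay/4_5643b90c.jpg", root)
print(img_url)  # http://www.nkut.edu.tw/files/adbannerplay/4_5643b90c.jpg
# In the real program the bytes would be requests.get(img_url).content
print(save_if_missing(path, b"placeholder bytes"))  # True
print(save_if_missing(path, b"placeholder bytes"))  # False, the file already exists
```

Resolving every src against the page URL, rather than the site root, also handles relative paths correctly. For example, "files/photo/b.jpg" would resolve under /page2/.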
