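Here it is: the youth manifesto film that Bilibili recently dedicated to the new generation.
He Bing, a national first-class actor, takes the stage and delivers the youth manifesto 《后浪》, recognizing, praising, and encouraging the younger generation. In the montage of clips from Bilibili uploaders (UP主), the brilliance of young people shines through. "You are fortunate to meet such an era, but the era is even more fortunate to meet you."
Let's scrape the 《后浪》 bullet comments (弹幕) with Python and see what the "后浪" (the rising generation) are actually saying.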
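I. Find the comment link
Open the Bilibili playback page for 《后浪》, press F12, and then refresh the page.
The comment link belongs to the request marked in red; the Request URL above the blue line is the comment link: https://api.bilibili.com/x/v1/dm/list.so?oid=188273397. Note that the filter next to Hide data URLs must be set to All.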
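The Request URL is just the endpoint plus an oid query parameter, so it can be assembled or taken apart with the standard library. The sketch below is only an illustration: build_danmaku_url and extract_oid are hypothetical helper names, not part of the original script.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Endpoint as it appears in the Request URL above
DANMAKU_ENDPOINT = "https://api.bilibili.com/x/v1/dm/list.so"


def build_danmaku_url(oid):
    """Return the comment URL for a given oid (hypothetical helper)."""
    return f"{DANMAKU_ENDPOINT}?{urlencode({'oid': oid})}"


def extract_oid(url):
    """Read the oid query parameter back out of a copied Request URL."""
    values = parse_qs(urlparse(url).query).get("oid")
    return values[0] if values else None


url = build_danmaku_url(188273397)
print(url)
print(extract_oid(url))
```

To scrape a different video, only the oid copied from that video's Request URL should need to change.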
II. Write the code
1. Request the page with the requests library
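def getHtmlText(url):
    try:
        response = requests.get(url, timeout=10)  # the timeout keeps a stalled request from hanging forever
        response.raise_for_status()
        # Decode the raw bytes as UTF-8; otherwise the text comes out garbled
        data = response.content.decode('utf-8')
        return data
    except (requests.RequestException, UnicodeDecodeError):
        # Return an empty string on a network, HTTP, or decoding error
        return ''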
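Why does the explicit UTF-8 decode matter? One likely cause of the garbling, assuming the server sends no charset in its Content-Type header, is that requests falls back to ISO-8859-1 for text responses. The sketch below reproduces the effect without any network access; the sample string is only a stand-in for a real comment.

```python
original = "后浪"
raw_bytes = original.encode("utf-8")  # the bytes as they travel over the wire

# Wrong guess: every single byte is turned into its own character
garbled = raw_bytes.decode("iso-8859-1")
# Explicit decode, the same call getHtmlText makes on response.content
correct = raw_bytes.decode("utf-8")

print(repr(garbled))
print(correct)
```

Each of the two Chinese characters occupies three bytes in UTF-8, so the wrong decoding yields six unrelated characters instead of two.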
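2. Find all the comments and store them in a two-dimensional list
Each comment is the text of a <d> tag, and all the <d> tags sit inside the <i> tag. Use BeautifulSoup from the bs4 library to parse the page obtained in step 1, then collect the text of every <d> tag into a list. (A regular expression would also work here.)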
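def fillList(html, danmu_list):
    soup = BeautifulSoup(html, 'html.parser')
    itotal = soup.find('i')  # the <i> tag that wraps every comment
    if itotal is None:  # empty or failed response: nothing to collect
        return
    for dtotal in itotal.find_all('d'):
        danmu_list.append([dtotal.text])  # one single-item row per comment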
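The regular-expression alternative mentioned above might look roughly like this. It is a sketch, not the original author's code: fill_list_regex is a hypothetical name, and sample_xml is a made-up snippet that only imitates the <i>/<d> layout, with invented p attribute values.

```python
import html
import re

# Made-up sample in the same shape as the response: <d> tags inside one <i> tag
sample_xml = (
    '<?xml version="1.0" encoding="UTF-8"?><i>'
    '<d p="1.0,1,25,16777215">第一条弹幕</d>'
    '<d p="2.5,1,25,16777215">A &amp; B</d>'
    '</i>'
)


def fill_list_regex(xml_text, danmu_list):
    # Non-greedy match of whatever sits between <d ...> and </d>
    for text in re.findall(r'<d\b[^>]*>(.*?)</d>', xml_text, flags=re.S):
        # Unlike BeautifulSoup, a regex leaves entities such as &amp; untouched,
        # so unescape them by hand
        danmu_list.append([html.unescape(text)])


rows = []
fill_list_regex(sample_xml, rows)
print(rows)
```

The regex route avoids the bs4 dependency, but it is more brittle: it assumes <d> tags are never nested and that the markup stays as regular as in the sample.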
3. Write all the data in the two-dimensional list to a text file
def down_danmu(danmu_list):
    # Open the file once in append mode and write one comment per line
    with open('后浪弹幕.txt', 'a', encoding='utf-8-sig') as f:
        for row in danmu_list:
            # Write the text directly; stripping [ ] ' , from str(row)
            # would also delete those characters from the comment itself
            f.write(row[0] + '\n')
    print("File written successfully!")
4. Generate a custom word cloud
def showPic():
    # Load the custom mask image with PIL and convert it to a NumPy array
    img = np.array(image.open('bili.jpg'))
    # utf-8-sig drops the BOM that the comment file was written with
    with open('后浪弹幕.txt', 'r', encoding='utf-8-sig') as f:
        text = f.read()
    w = wordcloud.WordCloud(font_path="msyh.ttc", mask=img, scale=15, stopwords={' '},
                            width=1000, height=700, background_color='white')
    w.generate(text)
    w.to_file("Houlang.png")
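A word cloud sizes each term by how often it occurs, so a quick frequency count is a handy sanity check on what should dominate the picture. The sketch below uses a few made-up lines in place of the real comment file. It counts whole lines, whereas WordCloud does its own tokenizing, so the numbers will not match the image exactly.

```python
from collections import Counter

# Made-up lines standing in for the contents of the comment file
sample_lines = ["奔涌吧后浪", "致敬", "奔涌吧后浪", "泪目", "致敬", "奔涌吧后浪"]

# Count identical comments, ignoring blank lines and surrounding whitespace
counts = Counter(line.strip() for line in sample_lines if line.strip())

for text, n in counts.most_common(2):
    print(text, n)
```

With real data, sample_lines could be replaced by the lines read from the comment file.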
Complete code
import requests
from bs4 import BeautifulSoup
import wordcloud
import PIL.Image as image
import numpy as np


def getHtmlText(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        # Decode the raw bytes as UTF-8 to avoid garbled text
        return response.content.decode('utf-8')
    except (requests.RequestException, UnicodeDecodeError):
        return ''


def fillList(html, danmu_list):
    soup = BeautifulSoup(html, 'html.parser')
    itotal = soup.find('i')  # the <i> tag that wraps every comment
    if itotal is None:  # empty or failed response: nothing to collect
        return
    for dtotal in itotal.find_all('d'):
        danmu_list.append([dtotal.text])


def down_danmu(danmu_list):
    # Open the file once in append mode and write one comment per line
    with open('后浪弹幕.txt', 'a', encoding='utf-8-sig') as f:
        for row in danmu_list:
            f.write(row[0] + '\n')
    print("File written successfully!")


def showPic():
    # Load the custom mask image with PIL and convert it to a NumPy array
    img = np.array(image.open('bili.jpg'))
    # utf-8-sig drops the BOM written by down_danmu
    with open('后浪弹幕.txt', 'r', encoding='utf-8-sig') as f:
        text = f.read()
    w = wordcloud.WordCloud(font_path="msyh.ttc", mask=img, scale=15, stopwords={' '},
                            width=1000, height=700, background_color='white')
    w.generate(text)
    w.to_file("Houlang.png")


danmu_list = []
url = 'https://api.bilibili.com/x/v1/dm/list.so?oid=186803402'
html = getHtmlText(url)
fillList(html, danmu_list)
down_danmu(danmu_list)
showPic()
Result image