type: Post
status: Published
date: Apr 3, 2023
tags: Thoughts, Python
category: Tech sharing

🤔 Tinkering log

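  1. When writing crawlers I often need to pull data out of HTML.
  2. Faced with HTML's tangled structure, I usually had no idea where to start.
  3. I had heard of Beautiful Soup long ago but never used it, so with some free time today I wrote a small demo.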
 

📝 HTML source

<!DOCTYPE html><html><head><title>2023-4-2 17:00</title><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta name="viewport" content="width=device-width,initial-scale=1"/><meta name="format-detection" content="telephone=no"/><link href="https://wodemo.com/statics/build/css/cb4a99cbe25c7fc4e15fa44ed3c12d97.css" rel="stylesheet" type="text/css" /><script type="text/javascript" src="https://s.wodemo.com/js/locale.js?lang=en_US&t=1680494664&login=0"></script><script type="text/javascript" src="https://wodemo.com/statics/build/js/6973b75b053df45c1097b1beb493a1c9.js"></script><link rel="alternate" href="https://mu228.wodemo.com/feed" type="application/rss+xml" title="RSS"></head><body><div id="whole_body" class="wo-main-body wo-mode-visitor"> <h1 class="wo-entry-title">2023-4-2 17:00</h1><div class="wo-entry-time" timestamp="1680425692">2023-04-02 16:54:52 +0800</div><div class="wo-entry-section wo-text-markdown"><p>2025-4-2 18:00</p></div><div class="wo-entry-prev-next">&#171;Newer&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="/entry/540437" title="学习歌单轻音乐">Older&#187;</a></div><form method="POST" action="https://s.wodemo.com/"><input type="hidden" name="act" value="file_talk" /><input type="hidden" name="fid" value="540447" />Comment:<br /><textarea name="content" cols="15" rows="3" class="wo-reply-textarea"></textarea><div>Name:<input type="input" name="name" value="" /></div><input type="submit" value="Submit" /></form><hr /><a href="https://mu228.wodemo.com/">Back to home</a><br /><br /> <span class="wo-site-feed-link"><a href="https://mu228.wodemo.com/subscribe">Subscribe</a> | </span> <a href="https://wodemo.com/reg?return_to=https%3A%2F%2Fmu228.wodemo.com%2Fentry%2F540447">Register</a> |<a href="https://wodemo.com/login?return_to=https%3A%2F%2Fmu228.wodemo.com%2Fentry%2F540447">Login</a> <span class="wo-n-count-block wo-hidden">| <a href="https://s.wodemo.com/notification">N<sup class="wo-n-count-num"></sup></a></span> </div><!--content 
body--> <script type="text/javascript">WoUtil.init('https://mu228.wodemo.com/entry/540447');</script></body></html>
 

Now use Beautiful Soup 4 (the beautifulsoup4 package, imported as bs4) to extract the data:
 
  • Extract 2023-4-2 17:00 from the <title> tag
  • Then extract 2025-4-2 18:00 from the <p> tag

Preparation:

  • pip install beautifulsoup4 requests
 
import time

import requests
from bs4 import BeautifulSoup

url = 'https://mu228.wodemo.com/entry/540447'
response = requests.get(url)

# Time only the parsing and extraction, not the network request
start_time = time.time()
soup = BeautifulSoup(response.text, 'html.parser')

# Text of the <title> tag
text = soup.find('title').text
print(text)

# Text of the first <p> tag in the document
text = soup.find('p').text
print(text)

end_time = time.time()
print(f"Elapsed time: {end_time - start_time:.4f} s")
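The same lookups can be tried offline against a trimmed copy of the page source shown earlier, which makes it easier to experiment without hitting the site. The sketch below is only an illustration: the HTML string is a shortened stand-in for the real page, and the CSS selector and attribute lookups are additions that the original demo does not use.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the page source shown earlier, so no network request is needed.
HTML = """
<html><head><title>2023-4-2 17:00</title></head>
<body><div id="whole_body" class="wo-main-body wo-mode-visitor">
<h1 class="wo-entry-title">2023-4-2 17:00</h1>
<div class="wo-entry-time" timestamp="1680425692">2023-04-02 16:54:52 +0800</div>
<div class="wo-entry-section wo-text-markdown"><p>2025-4-2 18:00</p></div>
<div class="wo-entry-prev-next"><a href="/entry/540437">Older</a></div>
</div></body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Lookup by tag name, as in the demo
title_text = soup.find("title").text

# CSS selector: the <p> inside the entry body, not just the first <p> anywhere
body_text = soup.select_one("div.wo-entry-section p").text

# Attribute access: a tag's attributes are read like dictionary keys
timestamp = int(soup.find("div", class_="wo-entry-time")["timestamp"])

# find_all returns every match; collect the href of each link
links = [a["href"] for a in soup.find_all("a", href=True)]

print(title_text, body_text, timestamp, links)
```

Scoping the lookup with a selector such as div.wo-entry-section p is a little more robust than find('p'), which simply returns the first <p> in the document and would silently change if the page gained another paragraph earlier on.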
 
 

Here is another snippet, this one extracting the text of a QQ Shuoshuo (Qzone status) post:
import time

import requests
from bs4 import BeautifulSoup

url = 'https://h5.qzone.qq.com/ugc/share?ticket&srctype=62&sharetag=77E995FF84EC7444B67D29989E084A59&bp7&bp2&bp1&_wv=1&g_f=5758&no_topbar=1&res_uin=464599171&appid=311#detail?appid=311&cellid=undefined&subid=&res_uin=464599171&lloc=&batch=#formbox=true'
response = requests.get(url)

# Parse the HTML with BeautifulSoup
start_time = time.time()
soup = BeautifulSoup(response.content, 'html.parser')

# Text of the first <p class="txt"> tag
text = soup.find('p', class_='txt').text
print(text)

end_time = time.time()
print(f"Elapsed time: {end_time - start_time:.4f} s")
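One thing to watch with lookups like soup.find('p', class_='txt').text: find returns None when nothing matches, so the .text access raises AttributeError if the page layout changes or the content is not present in the static HTML. A minimal sketch of a guard follows. The helper name first_text and the sample markup are hypothetical, made up for illustration; the real Qzone response may look different.

```python
from typing import Optional

from bs4 import BeautifulSoup


def first_text(soup: BeautifulSoup, name: str, **attrs) -> Optional[str]:
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = soup.find(name, **attrs)
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical stand-in markup, not the actual Qzone page
html = '<div><p class="txt"> hello from a post </p></div>'
soup = BeautifulSoup(html, "html.parser")

found = first_text(soup, "p", class_="txt")             # tag exists
missing = first_text(soup, "p", class_="no-such-class")  # tag does not exist

print(found)
print(missing)
```

Returning None lets the caller decide whether a missing element is an error or just an empty result, instead of crashing in the middle of a crawl.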

Result

[Image: screenshot of the run output]

🤗 Summary

It works perfectly.
 
 
 