
[Fun Python Check-in] Python: Calling the Baidu Face Recognition API to Score Looks

2019-03-04 22:09


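Today I'm following along with Luo Luopan (罗罗攀; WeChat official account: luoluopan1) and his post "Fun with Python | Finding the Prettiest Girls on Zhihu".

I joined Luo Luopan's Python check-in challenge and it is so much fun that I have to recommend it. Original post: https://mp.weixin.qq.com/s/M64NBbAFglxscPOvuz0r-w
This post is for learning purposes only~~

Page to crawl: https://www.zhihu.com/question/295119062

Goal: download the photos posted in the answers on that page, then call the Baidu face recognition API to score how good-looking each one is.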

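Analyzing the page

The page loads its content asynchronously: keep scrolling and more and more answers appear. The first job is to find the request that loads more answers as you scroll.

Take a look at that URL.

I usually pick several of these URLs and paste them into a txt file so I can compare them.

Side by side, it is easy to see that only offset changes, going up in steps of 5, while everything else stays fixed. So the URL is easy to construct: just loop over offset. Super simple, right?
Let's pick any one of these URLs and open it.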
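
As a rough sketch of the "loop over offset" idea, the URLs can be generated instead of written out by hand. The helper name `build_urls` and the shortened `include` value are assumptions made for illustration; whether the endpoint accepts a trimmed `include` list was not checked here, so in practice you may need the full query string copied from the browser.

```python
from urllib.parse import urlencode

# Hypothetical helper: the endpoint shape follows the URLs observed above,
# but the short "include" value is an assumption for readability.
BASE = "https://www.zhihu.com/api/v4/questions/{qid}/answers"

def build_urls(question_id, total, limit=5):
    """Yield one URL per page, stepping offset by `limit`."""
    for offset in range(0, total, limit):
        query = urlencode({
            "include": "data[*].content",
            "limit": limit,
            "offset": offset,
            "sort_by": "default",
        })
        yield BASE.format(qid=question_id) + "?" + query

# Question id taken from the page URL given earlier in this post.
urls = list(build_urls("295119062", total=20))
for url in urls:
    print(url)
```

Only `offset` differs between the generated URLs, which mirrors what the side-by-side comparison showed.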

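The response is JSON. Here's a handy little tool I'd recommend: https://www.json.cn (thanks to CYH for the tip, hahahaha). Paste the JSON into that page and it becomes much easier to read.

Looking at the document, the main body of each answer sits in the content field of each item in data. We also want a few other things, such as the Zhihu username (in data, under author, then name). Most important of all are the pictures, of course, and their links are all inside content.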
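
To make that structure concrete, here is a minimal sketch that walks a hand-written stand-in for one page of the response. The sample payload, the example.com links and the function name `extract_images` are made up for illustration; real responses carry many more fields.

```python
import json
import re

# Hand-written stand-in for one API page (not real Zhihu data).
sample = json.dumps({
    "data": [
        {"author": {"name": "user_a"},
         "content": '<p>hi</p><img src="https://example.com/1.jpg">'
                    '<img src="https://example.com/meme.gif">'},
        {"author": {"name": "user_b"}, "content": "<p>text only</p>"},
    ]
})

# Same pattern the crawler in this post uses to pull image links out of the HTML.
IMG_SRC = re.compile(r'img src="(.*?)"', re.S)

def extract_images(payload):
    """Return (author name, image URL) pairs for every .jpg link in the answers."""
    pairs = []
    for answer in json.loads(payload)["data"]:
        name = answer["author"]["name"]
        for src in IMG_SRC.findall(answer["content"]):
            if src.lower().endswith(".jpg"):
                pairs.append((name, src))
    return pairs

print(extract_images(sample))
```

Answers without pictures simply contribute nothing, and the .gif meme is skipped by the extension check.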

Now we know what these URLs look like and where the data we want sits in the response, so let's start crawling!

Crawling the page

import requests
import json
import time
import re

headers = {
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Mobile Safari/537.36',
    'cookie': 'fill in your own',
}

def get_img(url):
    """Download every .jpg linked from the answers on one page of the API."""
    res = requests.get(url, headers=headers)
    i = 1
    json_data = json.loads(res.text)
    datas = json_data['data']
    for data in datas:
        Id = data['author']['name']
        content = data['content']
        imgs = re.findall('img src="(.*?)"', content, re.S)
        # Some answers have no photos; then imgs is empty and the loop is skipped
        for img in imgs:
            if 'jpg' in img:
                res_img = requests.get(img, headers=headers)
                # The with-block closes the file even if the write fails
                with open('path/to/save/dir/' + Id + '+' + str(i) + '.jpg', 'wb') as fp:
                    fp.write(res_img.content)
                i = i + 1    # some posters upload lots of photos, emmm, and memes
                print(Id, img)

if __name__ == "__main__":
    urls = ['https://www.zhihu.com/api/v4/questions/29024583/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit=5&offset={}&platform=desktop&sort_by=default'.format(
        str(i)) for i in range(0, 25000, 5)]
    for url in urls:
        get_img(url)
        time.sleep(2)    # pause between pages to go easy on the server

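Face recognition API

We now have a pile of lovely photos (I also sneaked in a few pictures of my good friends, hahahaha, to score their looks too). But the pile also contains some men (not wanted) and all kinds of memes! With this many photos there is no way to pick through them one by one, so we can use the Baidu face recognition API to filter and score them (the scoring is what I'm most looking forward to)~
Now it is time to call the API.
Baidu face recognition: http://ai.baidu.com/tech/face
Face recognition manual: https://ai.baidu.com/docs#/Face-Detect-V3/top
Just follow the documentation step by step.
Step 1: create an application

Once the application is created you get an API Key and a Secret Key. They work like a pass: you can only call the API with them.

Step 2: per the face recognition documentation, the first thing to do is to obtain a token using the API Key and Secret Key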

import requests

ak = 'your own API Key'
sk = 'your own Secret Key'
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id={}&client_secret={}'.format(ak,sk)
res = requests.post(host)
print(res.text)
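
The printed response is a JSON document. As a hedged sketch, the token can be pulled out of it as below; the field names `access_token` and `error_description` are what I understand the service to return, but treat them as assumptions and check them against the documentation. The sample string and the helper name `extract_token` are made up for illustration.

```python
import json

def extract_token(response_text):
    """Return the access token, or raise ValueError with the service's message."""
    payload = json.loads(response_text)
    if "access_token" not in payload:
        # Assumed error shape: {"error": ..., "error_description": ...}
        raise ValueError(payload.get("error_description", "no access_token in response"))
    return payload["access_token"]

# Made-up sample; a real token is a long opaque string.
sample = '{"access_token": "24.example-token", "expires_in": 2592000}'
token = extract_token(sample)
print(token)
```

In the real flow you would pass `res.text` from the request above instead of the sample string.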

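With this token you can send requests to the API.
Straight to the code (I recommend writing it with the developer documentation open next to you)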

import base64
import json
import requests

token = 'the token you just obtained'

def get_img_base(file):
    # Read the image file and return its contents Base64-encoded
    with open(file, 'rb') as fp:
        content = base64.b64encode(fp.read())
        return content

requests_url = "https://aip.baidubce.com/rest/2.0/face/v3/detect"
requests_url = requests_url + '?access_token=' + token

params = {
    'image': get_img_base(r'C:\Users\xxj\Desktop\test.jpg'),
    'image_type': 'BASE64',
    'face_field': 'age,beauty,gender'
}

res = requests.post(requests_url, data=params)
result = res.text
json_result = json.loads(result)

# Read the status code, then gender and beauty score of the first detected face
code = json_result['error_code']
gender = json_result['result']['face_list'][0]['gender']['type']
beauty = json_result['result']['face_list'][0]['beauty']
print(code, gender, beauty)
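
The indexing above raises an exception when the picture contains no face. A small defensive sketch, using the same fields the script reads (`error_code`, `result`, `face_list`, `gender`, `beauty`), is shown below. Both sample dictionaries are hand-written stand-ins rather than real API output, and the helper name `first_face` is hypothetical.

```python
def first_face(result_json):
    """Return (gender, beauty) for the first detected face, or None if there is none."""
    if result_json.get("error_code") != 0:
        return None
    faces = (result_json.get("result") or {}).get("face_list") or []
    if not faces:
        return None
    face = faces[0]
    return face["gender"]["type"], face["beauty"]

# Hand-written samples shaped like the fields used in this post.
ok = {"error_code": 0, "result": {"face_list": [
    {"gender": {"type": "female"}, "beauty": 73.41, "age": 30}]}}
no_face = {"error_code": 222202, "result": None}

print(first_face(ok))
print(first_face(no_face))
```

Returning None for "no usable face" lets the caller skip the picture without wrapping every lookup in try/except.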

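Let's take my Teacher Ju (居居老师) as the example, hahahahahaha, and see what score those godlike looks get

Rubbish, how can it be only 73.41? I have my doubts, it should be 100!!!!!
Let's try my Liying (丽颖) next

Truly godlike looks!!! Hahahahahaha

Putting it all together

Finally, we call the API to filter out photos that are not of girls or not of people at all, score the remaining photos, and move them into different folders according to their score band.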

import requests
import os
import base64
import json
import time

def get_img_base(file):
    # Read the image file and return its contents Base64-encoded
    with open(file, 'rb') as fp:
        content = base64.b64encode(fp.read())
        return content

token = 'fill in your token'
requests_url = "https://aip.baidubce.com/rest/2.0/face/v3/detect"
requests_url = requests_url + '?access_token=' + token

# The destination folders below must already exist, or os.rename will fail
file_path = 'C:/Users/xxj/Desktop/test'
list_paths = os.listdir(file_path)
for list_path in list_paths:
    img_path = file_path + '/' + list_path

    params = {
        'image': get_img_base(img_path),
        'image_type': 'BASE64',
        'face_field': 'age,beauty,gender'
    }

    res = requests.post(requests_url, data=params)
    result = res.text
    json_result = json.loads(result)

    code = json_result['error_code']
    if code == 222202:    # the API found no face in the picture
        continue
    try:
        gender = json_result['result']['face_list'][0]['gender']['type']
        if gender == 'male':
            continue
        beauty = json_result['result']['face_list'][0]['beauty']
        new_beauty = round(beauty/10, 1)    # rescale the score to a 10-point scale
        print(img_path, new_beauty)
        # Folder names: '8分' means "8 points", '哎' is a sigh
        if new_beauty >= 8:
            os.rename(os.path.join(file_path, list_path), os.path.join('C:/Users/xxj/Desktop/8分', str(new_beauty) + '+' + list_path))
        elif new_beauty >= 7:
            os.rename(os.path.join(file_path, list_path), os.path.join('C:/Users/xxj/Desktop/7分', str(new_beauty) + '+' + list_path))
        elif new_beauty >= 6:
            os.rename(os.path.join(file_path, list_path), os.path.join('C:/Users/xxj/Desktop/6分', str(new_beauty) + '+' + list_path))
        elif new_beauty >= 5:
            os.rename(os.path.join(file_path, list_path), os.path.join('C:/Users/xxj/Desktop/5分', str(new_beauty) + '+' + list_path))
        else:
            os.rename(os.path.join(file_path, list_path), os.path.join('C:/Users/xxj/Desktop/哎', str(new_beauty) + '+' + list_path))
        time.sleep(1)
    except KeyError:    # the response lacks the expected fields
        pass
    except TypeError:   # 'result' is null, e.g. on other error codes
        pass
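
The score-to-folder rule in the script can also be written as one small pure function, which makes it easy to check without touching any files. This is only a sketch of the same thresholds; the function name `folder_for` is made up for illustration.

```python
def folder_for(beauty):
    """Map a 0-100 beauty score to (score on a 10-point scale, folder name)."""
    score = round(beauty / 10, 1)
    for threshold in (8, 7, 6, 5):
        if score >= threshold:
            return score, '{}分'.format(threshold)
    return score, '哎'    # everything below 5 goes to the "sigh" folder

print(folder_for(73.41))
print(folder_for(85.0))
print(folder_for(42.0))
```

With such a helper, the five os.rename branches could collapse into a single call that joins the returned folder name onto the desktop path.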

Mission accomplished!
Perfect!
