Multithreaded crawling of 都挺好 (All Is Well) download links and saving them to MongoDB

Date: 2021-07-01 10:21:17  Helped: 28 readers

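  • A fairly simple example: in Python 3, fetch the 都挺好 (All Is Well) page with the requests library, extract the download links with a regular expression, and save them to MongoDB. Despite the "multithreaded" title, the script parallelizes the saves with multiprocessing.Pool, which uses worker processes rather than threads.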
  • #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    """
    @author:Aiker Zhao
    @file:doutinghao.py
    @time:8:18 PM
    """
    import requests
    import re
    import pymongo
    from multiprocessing import Pool
    
    MONGO_URL = 'localhost:27017'
    MONGO_DB = 'doutinghao'
    MONGO_TABLE = 'doutinghao'
    # connect=False defers connecting until the first operation, which avoids
    # opening a connection before the worker processes are created
    client = pymongo.MongoClient(MONGO_URL, connect=False)
    db = client[MONGO_DB]
    
    def get_result(url):
        """Fetch the page and yield one {name, url} dict per ed2k link."""
        response = requests.get(url).text
        # Group 1: the ed2k link; group 2: the title up to and including ".mp4"
        pattern = re.compile(r'<a href="(ed2k.*?)"\srel.*?title="(.*?\.mp4).*?".*?>', re.S)
        for link, name in re.findall(pattern, response):
            yield {
                'name': name,
                'url': link
            }
    
    def save_to_mongo(result):
        """Insert one document and report success."""
        # insert_one replaces Collection.insert, which was removed in PyMongo 4
        if db[MONGO_TABLE].insert_one(result):
            print('Saved to MongoDB:', result)
            return True
        return False
    
    def main(result):
        save_to_mongo(result)
    
    if __name__ == '__main__':
        # Pool() starts worker processes (not threads)
        pool = Pool()
        url = "https://www.xl720.com/thunder/34283.html"
        items = list(get_result(url))
        pool.map(main, items)
        pool.close()
        pool.join()
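
    The extraction step can be tried without any network access or database. The sketch below applies the same kind of pattern to a small hand-written HTML snippet. The names `parse_links`, `LINK_PATTERN`, and `SAMPLE_HTML` are hypothetical, and the sample markup is only an assumption about what the page's links look like; the real page may differ.

```python
import re

# Same idea as the script's pattern: capture the ed2k link and the .mp4 title.
LINK_PATTERN = re.compile(
    r'<a href="(ed2k.*?)"\srel.*?title="(.*?\.mp4).*?".*?>', re.S
)


def parse_links(html):
    """Return a list of {"name", "url"} dicts for every ed2k link in the text."""
    return [
        {"name": name, "url": link}
        for link, name in LINK_PATTERN.findall(html)
    ]


# Hypothetical sample markup (an assumption, not copied from the real page).
SAMPLE_HTML = (
    '<a href="ed2k://|file|ep01.mp4|123|ABC|/" rel="nofollow" '
    'title="ep01.mp4 download">ep01</a>\n'
    '<a href="ed2k://|file|ep02.mp4|456|DEF|/" rel="nofollow" '
    'title="ep02.mp4 download">ep02</a>\n'
    '<a href="https://example.com/other" rel="nofollow" '
    'title="not a video">other</a>'
)

if __name__ == "__main__":
    for doc in parse_links(SAMPLE_HTML):
        print(doc)
```

    Keeping the parsing in a function that takes a string makes it easy to check the pattern against saved pages before pointing the crawler at the live site.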
    
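
    Since the title mentions multithreading while the script uses a process pool, here is a minimal sketch of a thread-based variant using the standard library's `concurrent.futures.ThreadPoolExecutor`, which is a common choice for I/O-bound work such as HTTP requests. This is not the author's method. The names `crawl_pages`, `fake_fetch`, and `fake_parse` are hypothetical, and the fetch and parse steps are passed in as plain functions so the example runs without a network or a database.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_pages(urls, fetch, parse, max_workers=4):
    """Fetch and parse each URL in a worker thread; return all records."""

    def worker(url):
        return parse(fetch(url))

    records = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as `urls`.
        for page_records in executor.map(worker, urls):
            records.extend(page_records)
    return records


# Stand-ins for demonstration only (assumptions, not real site access).
def fake_fetch(url):
    return "page body for " + url


def fake_parse(html):
    return [{"source": html}]


if __name__ == "__main__":
    for record in crawl_pages(["a", "b", "c"], fake_fetch, fake_parse):
        print(record)
```

    In real use, `fetch` could wrap `requests.get(url, timeout=10).text` and `parse` could be the regex extraction from the script; the collected records could then be written in one call with PyMongo's `insert_many`.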
