The short script below makes Scrapy's crawl structure easy to see at a glance: it reads a Scrapy log and prints each crawled URL indented under the page (referer) it was discovered from. It is also very simple to run.
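For reference, the regular expression in the script targets Scrapy's standard DEBUG lines for crawled pages, which look roughly like the following (the URL and the timestamp/logger prefix are only illustrative and depend on your log settings):

2021-07-01 10:21:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com/page> (referer: http://example.com/)

Top-level requests made directly from start_urls are logged with "(referer: None)", which is why the script starts printing from the key 'None'.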
#!/usr/bin/env python
import fileinput
import re
from collections import defaultdict

def print_urls(allurls, referer, indent=0):
    # Print every URL crawled from this referer, recursing into
    # URLs that are themselves referers of further requests.
    urls = allurls[referer]
    for url in urls:
        print(' ' * indent + url)
        if url in allurls:
            print_urls(allurls, url, indent + 2)

def main():
    # Matches Scrapy's "Crawled" DEBUG lines:
    #   DEBUG: Crawled (200) <GET http://...> (referer: http://...)
    log_re = re.compile(r'<GET (.*?)> \(referer: (.*?)\)')
    allurls = defaultdict(list)
    for line in fileinput.input():
        m = log_re.search(line)
        if m:
            url, ref = m.groups()
            allurls[ref].append(url)
    # Requests without a referer are logged as "(referer: None)".
    print_urls(allurls, 'None')

if __name__ == '__main__':
    main()
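A brief usage sketch (the file names crawl_tree.py and spider.log, and the spider name myspider, are just placeholders): save the spider's log to a file, then pass it to the script either as an argument or on stdin, since fileinput reads files named on the command line and falls back to standard input.

scrapy crawl myspider --logfile=spider.log
python crawl_tree.py spider.log
cat spider.log | python crawl_tree.py

The output is an indented tree in which each URL appears two spaces deeper than the page it was discovered from, so nested levels correspond to how far the crawl went from the start URLs.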
I hope this article is helpful to readers working on Python programming.