A homemade PHP crawler based on simple_html_dom, v1.0

Time: 2021-07-01 10:21:17 (read by 15 people)

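My enthusiasm for web page parsing and building crawlers has never faded. Today I used the open-source simple_html_dom.php parsing library to build a crawler: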

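<?php
// NOTE: the opening lines of this script and the line-break tag in the echo
// statements were lost when the page was extracted. The include, the start URL
// and '<br>' are reconstructed from the rest of the code and are assumptions.
include 'simple_html_dom.php';

$html = file_get_html('http://www.baidu.com');
if (!$html) exit('Could not load the start page');
$tmp = array();

foreach ($html->find('a') as $e)
{
	$f = $e->href;
	if (!is_string($f) || $f === '') continue; // Skip anchors without a usable href
	if ($f[0] == '/') $f = 'http://www.baidu.com' . $f; // Complete relative URLs
	if (stripos($f, 'https://') === 0) continue; // Skip https:// URLs (simple_html_dom might not be able to parse them)
	if (stripos($f, 'baidu') === false) continue; // Skip URLs outside this website
	echo $f . '<br>';
	$tmp[] = $f; // Save the URL into the array
}

foreach ($tmp as $r) // Dig into the URLs saved in $tmp
{
	$html2 = file_get_html($r); // Repeat the same step
	if (!$html2) continue; // file_get_html() returned false: nothing to parse
	foreach ($html2->find('a') as $a)
	{
		$u = $a->href;
		if (!is_string($u) || $u === '') continue;
		if ($u[0] == '/') $u = 'http://www.baidu.com' . $u;
		if (stripos($u, 'https://') === 0) continue;
		if (stripos($u, 'baidu') === false) continue;
		echo $u . '<br>';
	}
	$html2->clear(); // Release the parsed DOM before the next page
	$html2 = null;
}
?>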
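
The three link checks in the loops (complete a relative URL, skip https://, keep only URLs that mention the site) could be pulled into one function so both loops share them. The following is a minimal sketch under that assumption; filter_link() and its parameters are hypothetical names, not part of simple_html_dom. It compares the result of stripos() with === false, because a match at position 0 is also "falsy" under a loose == FALSE comparison and would be dropped by mistake.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns the cleaned-up
// URL, or null when the link should be skipped.
function filter_link($href, $base = 'http://www.baidu.com', $keyword = 'baidu')
{
    // Anchors without an href do not yield a usable string.
    if (!is_string($href) || $href === '') {
        return null;
    }
    // Complete a site-relative URL such as "/more".
    if ($href[0] === '/') {
        $href = $base . $href;
    }
    // Skip https:// links, as the crawler above does.
    if (stripos($href, 'https://') === 0) {
        return null;
    }
    // Keep only links that mention the keyword; strict comparison so that a
    // match at position 0 is not mistaken for "not found".
    if (stripos($href, $keyword) === false) {
        return null;
    }
    return $href;
}
```

Inside either loop the body would then shrink to something like: $f = filter_link($e->href); if ($f === null) continue;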

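Without a check on what file_get_html() returns, the script always ends with: Fatal error: Call to a member function find() on a non-object in D:\xampp\htdocs\html\index.php on line 21. After talking it over with a senior classmate I corrected many small mistakes, but this one was still unsolved. The message means find() was called on something that is not an object, most likely because file_get_html() returned false for a page it could not load. I hope someone more experienced can offer further pointers.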
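
The "non-object" error is consistent with file_get_html() returning false instead of a DOM object, which the library does when the downloaded content is empty or larger than its MAX_FILE_SIZE limit. A minimal sketch of a guard follows; safe_find_links() is a hypothetical helper, and it takes the loader as a parameter only so that the example runs without the library.

```php
<?php
// Hypothetical helper: $loader is any callable that returns an object with a
// find() method, or a non-object (such as false) on failure.
// 'file_get_html' from simple_html_dom is assumed to fit this shape.
function safe_find_links($loader, $url)
{
    $doc = call_user_func($loader, $url);
    if (!is_object($doc)) {
        return array(); // Load failed: report no links instead of crashing
    }
    $hrefs = array();
    foreach ($doc->find('a') as $a) {
        $hrefs[] = $a->href;
    }
    return $hrefs;
}
```

With simple_html_dom included, the inner loop could iterate over safe_find_links('file_get_html', $r), so a page that fails to load is skipped rather than stopping the crawl.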

---------------------divider---------------------

Download simple_html_dom:

https://github.com/Ph0enixxx/simple_html_dom

= = git4win does not work on my home computer
