
PHP curl scraping problem

Date: 2021-07-01 10:21:17 · Helped 14 readers

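I am scraping web page data with curl. Grades and the class timetable are scraped the same way and work fine, but scraping this page fails with a 500 error.

This is the request captured with the HttpWatch packet-capture tool; the cookie is obtained correctly.

This is the page I want to scrape; the data is submitted via POST.

PHP code

Error page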


Replies (solutions)

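First: try setting CURLOPT_REFERER; the server may be checking which page the request came from.
Second: does the login page contain hidden parameters? If it does, you need to request the login page first, read the hidden values, and submit them with the form.
Third: I don't see where you submit the login account and password, that is, the account and password POSTed from your side.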
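
The second suggestion can be sketched roughly as follows. This is a minimal illustration, not the site's actual form: the helper `extract_hidden_fields`, the sample HTML, and the field name `token` are all hypothetical, and it assumes PHP's DOM extension is enabled. For the first suggestion, the Referer is usually just the URL of the page that contains the form, set with `curl_setopt($ch, CURLOPT_REFERER, $loginPageUrl)`.

```php
<?php
// Collect every <input type="hidden"> on a page as name => value.
function extract_hidden_fields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's parse warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        if (strtolower($input->getAttribute('type')) === 'hidden'
            && $input->getAttribute('name') !== '') {
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Hypothetical login form; the real page's hidden field names will differ.
$sampleHtml = '<form action="loginAction.do" method="post">'
    . '<input type="hidden" name="token" value="abc123">'
    . '<input type="text" name="zjh">'
    . '<input type="password" name="mm">'
    . '</form>';

// Merge the hidden values with the credentials before POSTing them.
$post = http_build_query(
    extract_hidden_fields($sampleHtml) + ['zjh' => 'ACCOUNT', 'mm' => 'PASSWORD']
);
echo $post, "\n";
```

In a real run, `$sampleHtml` would be the body returned by a GET of the login page, and `$post` would be passed as `CURLOPT_POSTFIELDS`.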

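$url = "http://202.117.64.25/loginAction.do";
$fields = "dllx=dldl&zjh=201224080126&mm=201224080126";
This is where the account and password are submitted!
What should I set CURLOPT_REFERER to?

I found two hidden parameters, but they don't seem to do anything.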

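I tested it for you. Login is fine and succeeds; the main problem is the parameters of your second request. Check them: capture every parameter that your second page sends. As for the Java error, I don't really understand it.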
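
One way to check the parameters is to compare the POST body captured from the browser with the body the script sends. The sketch below is only an illustration: `diff_post_bodies` is a hypothetical helper, and the two sample bodies are made up (the field names are borrowed from this thread). Note that `parse_str` rewrites dots and spaces in field names to underscores, so names containing those characters need extra care.

```php
<?php
// Report which fields are missing, extra, or different in the script's POST body.
function diff_post_bodies(string $captured, string $sent): array
{
    parse_str($captured, $want); // what the browser sent
    parse_str($sent, $have);     // what the script sends

    $changed = [];
    foreach (array_intersect_key($want, $have) as $name => $value) {
        if ($have[$name] !== $value) {
            $changed[$name] = ['captured' => $value, 'sent' => $have[$name]];
        }
    }
    return [
        'missing' => array_keys(array_diff_key($want, $have)),
        'extra'   => array_keys(array_diff_key($have, $want)),
        'changed' => $changed,
    ];
}

// Hypothetical bodies for illustration only.
$captured = 'zxxnxq=2014-2015-1-1&zxZc=11&page=1&pageNo=';
$sent     = 'zxxnxq=2014-2015-1-1&zxZc=12';
print_r(diff_post_bodies($captured, $sent));
```

Any name listed under `missing` is a field the browser sent and the script did not, which is the first thing to fix when the server answers with a 500.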

Did you manage to scrape it successfully?

Could everyone take a look? What is going on?

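Both your flow and your code have problems! The correct flow is:
1. Visit http://202.117.64.25/ to get the cookie, because the session ID is issued by this page.
2. Visit http://202.117.64.25/loginAction.do and send the POST form data.
3. Step 2 returns a frameset page, so you have to request whichever frame you need. For example, visiting http://202.117.64.25/menu/s_top.jsp returns the logged-in message: 欢迎光临 黄小龙 ("Welcome, 黄小龙")
Test code: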

```php
<?php
include 'curl/curl_get.php';

// Step 1: visit the home page so the server issues the session cookie.
$url = 'http://202.117.64.25/';
curl_get($url);

// Step 2: log in by POSTing the form data.
$url = "http://202.117.64.25/loginAction.do";
$d = 'dllx=dldl&zjh=201224080126&mm=201224080126';
curl_get($url, $d);

// Step 3: request the individual frames.
echo curl_get('http://202.117.64.25/menu/s_top.jsp');
echo curl_get('http://202.117.64.25/menu/mainFrame.jsp');
echo curl_get('http://202.117.64.25/xsxxviewAction.do');
```

curl_get.php

```php
<?php
function curl_get($durl, $data = array()) {
  // realpath() returns false while cookie.txt does not exist yet,
  // so fall back to a path in the current working directory.
  $cookiejar = realpath('cookie.txt') ?: getcwd() . '/cookie.txt';
  $t = parse_url($durl);
  // Pass the caller's User-Agent through; use a generic one when run from the CLI.
  $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'Mozilla/5.0';

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $durl);
  curl_setopt($ch, CURLOPT_TIMEOUT, 5);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0); // skips certificate checks (HTTPS only)
  curl_setopt($ch, CURLOPT_USERAGENT, $ua);
  curl_setopt($ch, CURLOPT_REFERER, "http://$t[host]/");
  curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiejar); // send the stored cookies
  curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiejar);  // save the cookies received
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_ENCODING, ''); // accept and decode gzip and other supported encodings
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
  if ($data) {
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
  }
  $r = curl_exec($ch);
  curl_close($ch);
  return $r;
}
```

Thank you very much, it works now! But I still don't understand why.
This is my final code:

```php
<?php
include './curl/curl_get.php';

// Log in first; curl_get keeps the session cookie in cookie.txt.
$url = "http://202.117.64.25/loginAction.do";
$d = 'dllx=dldl&zjh=201224080126&mm=201224080126';
curl_get($url, $d);

// Then send the query with the full set of form parameters.
echo curl_get('http://202.117.64.25/xszxcxAction.do?oper=tjcx', 'zxxnxq=2014-2015-1-1&zxXaq=0&zxJxl=0011&zxZc=11&zxJc=2&zxxq=2&pageSize=20&page=1&currentPage=1&pageNo=');
?>
```
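
As for why it works: a likely explanation, going by the answer above, is that `curl_get` stores the server's session cookie in cookie.txt (`CURLOPT_COOKIEJAR`) and sends it back on every later request (`CURLOPT_COOKIEFILE`), so the query after login runs inside the logged-in session. The sketch below reads a cookie jar in the Netscape format that libcurl writes, to confirm that a session cookie was actually saved. `read_cookie_jar` is a hypothetical helper, the sample contents are made up, and `JSESSIONID` is only an assumed cookie name for a Java server.

```php
<?php
// Parse a Netscape-format cookie jar into name => value.
function read_cookie_jar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        // libcurl marks HttpOnly cookies with this prefix; it is not a comment.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or genuine comment
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value
        $parts = explode("\t", $line);
        if (count($parts) === 7) {
            $cookies[$parts[5]] = $parts[6];
        }
    }
    return $cookies;
}

// Hypothetical jar contents for illustration only.
$sample = "# Netscape HTTP Cookie File\n"
    . "#HttpOnly_202.117.64.25\tFALSE\t/\tFALSE\t0\tJSESSIONID\tabc123\n";

$jar = read_cookie_jar($sample);
echo isset($jar['JSESSIONID']) ? "session cookie present\n" : "no session cookie\n";
```

Against a real run, the call would be `read_cookie_jar(file_get_contents('cookie.txt'))`, made after the request has finished, since libcurl writes the jar when the handle is closed.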

                  

	 	