当前位置：Gxlcms > mysql > Hive导入ApacheNginx等日志与分析

Hive导入ApacheNginx等日志与分析

时间：2021-07-01 10:21:17 帮助过：23人阅读

将nginx日志导入到hive中的两种方法 1 在hive中建表 CREATE TABLE apachelog (ipaddress STRING, identd STRING, user STRING

将nginx日志导入到hive中的两种方法

1 在hive中建表

导入后日志格式为

203.208.60.91 - - 05/May/2011:01:18:47 +0800 GET /robots.txt HTTP/1.1 404 1238 Mozilla/5.0

此方法支持hive中函数parse_url(referer,"HOST")

第二种方法导入

注意：这个方法在建表后，使用查询语句等前要先执行

hive> add jar /home/hjl/hive/lib/hive_contrib.jar;

或者设置hive/conf/hive-default.conf 添加

hive.aux.jars.path
file:///usr/local/hadoop/hive/lib/hive-contrib-0.7.0-cdh3u0.jar

保存配置

203.208.60.91 - - [05/May/2011:01:18:47 +0800] "GET /robots.txt HTTP/1.1" 404 1238 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +)"

此方法中的字段类型stringfrom deserializer 经测试不支持parse_url(referer,"HOST")获取域名

可以用select split(referer,"/")[2] from apilog 获取域名

如果文件数据是纯文本，可以使用 STORED AS TEXTFILE。如果数据需要压缩，，使用 STORED AS SEQUENCE 。

导入日志命令

hive>load data local inpath '/home/log/map.gz' overwrite into table log;

导入日志支持.gz等格式

导入日志后进行分析例句

统计行数
select count(*) from nginxlog;

统计IP数
select count(DISTINCT ip) from nginxlog;

排行
select t2.ip,t2.xx from (SELECT ip, COUNT(*) AS xx FROM nginxlog GROUP by ip) t2 sort by t2.xx desc

hive>SELECT * from apachelog WHERE ipaddress = '216.211.123.184';

hive> SELECT ipaddress, COUNT(1) AS numrequest FROM apachelog GROUP BY ipaddress SORT BY numrequest DESC LIMIT 1;

hive> set mapred.reduce.tasks=2;
hive> SELECT ipaddress, COUNT(1) AS numrequest FROM apachelog GROUP BY ipaddress SORT BY numrequest DESC LIMIT 1;

hive>CREATE TABLE ipsummary (ipaddress STRING, numrequest INT);
hive>INSERT OVERWRITE TABLE ipsummary SELECT ipaddress, COUNT(1) FROM apachelog GROUP BY ipaddress;

hive>SELECT ipsummary.ipaddress, ipsummary.numrequest FROM (SELECT MAX(numrequest) AS themax FROM ipsummary) ipsummarymax JOIN ipsummary ON ipsummarymax.themax = ipsummary.numrequest;

hive查询结果导出为csv的方法（未测试）

hive> set hive.io.output.fileformat=CSVTextFile;
hive> insert overwrite local directory '/tmp/CSVrepos/' select * from S where ... ;

Hive导入ApacheNginx等日志与分析

人气教程排行