用SQL玩转数据挖掘之MADlib(一)——安装
时间:2021-07-01 10:21:17
帮助过:41人阅读
二、安装
1. 确定安装平台
MADlib最新发布版本是1.11,可以安装在PostgreSQL、Greenplum和HAWQ中,在不同的数据库中安装过程也不尽相同。我是安装在HAWQ2.1.1.0中。
2. 下载MADlib二进制安装压缩包
下载地址为:https://network.pivotal.io/products/pivotal-hdb。2.1.1.0版本的HAWQ提供了四个安装文件,如图2所示。经过测试,只有MADlib 1.10.0版本的文件可以正常安装。
图2
3. 安装MADlib
以下命令需要使用gpadmin用户,在HAWQ的master主机上执行。
(1)解压缩
[plain] view plain
copy
- tar -zxvf madlib-ossv1.10.0_pv1.9.7_hawq2.1-rhel5-x86_64.tar.gz
(2)安装MADlib的gppkg文件
[plain] view plain
copy
- gppkg -i madlib-ossv1.10.0_pv1.9.7_hawq2.1-rhel5-x86_64.gppkg
该命令在HAWQ集群的所有节点(master和segment)上创建MADlib的安装目录和文件,缺省目录为/usr/local/hawq_2_1_1_0/madlib。
(3)在指定数据库中部署MADlib
[plain] view plain
copy
- $GPHOME/madlib/bin/madpack install -c /dm -s madlib -p hawq
该命令在HAWQ的dm数据库中建立madlib schema,-p参数指定平台为HAWQ。命令执行后可以查看在madlib schema中创建的数据库对象。
[plain] view plain
copy
- dm=# set search_path=madlib;
- SET
- dm=# \dt
- List of relations
- Schema | Name | Type | Owner | Storage
- --------+------------------+-------+---------+-------------
- madlib | migrationhistory | table | gpadmin | append only
- (1 row)
-
- dm=# \ds
- List of relations
- Schema | Name | Type | Owner | Storage
- --------+-------------------------+----------+---------+---------
- madlib | migrationhistory_id_seq | sequence | gpadmin | heap
- (1 row)
-
- dm=# select type,count(*)
- dm-# from (select p.proname as name,
- dm(# case when p.proisagg then ‘agg‘
- dm(# when p.prorettype = ‘pg_catalog.trigger‘::pg_catalog.regtype then ‘trigger‘
- dm(# else ‘normal‘
- dm(# end as type
- dm(# from pg_catalog.pg_proc p, pg_catalog.pg_namespace n
- dm(# where n.oid = p.pronamespace and n.nspname=‘madlib‘) t
- dm-# group by rollup (type);
- type | count
- --------+-------
- agg | 135
- normal | 1324
- | 1459
- (3 rows)
可以看到,MADlib部署应用程序madpack首先创建数据库模式madlib,然后在该模式中创建数据库对象,包括
一个表,一个序列,1324个普通函数,135个聚合函数。所有的机器学习和数据挖掘模型、算法、操作和功能都是通过调用这些函数实际执行的。
(4)验证安装
[plain] view plain
copy
- $GPHOME/madlib/bin/madpack install-check -c /dm -s madlib -p hawq
该命令通过执行29个模型的77个案例,验证所有模型都能正常工作。命令输出如下:
[plain] view plain
copy
- [gpadmin@hdp3 Madlib]$ $GPHOME/madlib/bin/madpack install-check -c /dm -s madlib -p hawq
- madpack.py : INFO : Detected HAWQ version 2.1.
- TEST CASE RESULT|Module: array_ops|array_ops.sql_in|PASS|Time: 1851 milliseconds
- TEST CASE RESULT|Module: bayes|gaussian_naive_bayes.sql_in|PASS|Time: 24222 milliseconds
- TEST CASE RESULT|Module: bayes|bayes.sql_in|PASS|Time: 70634 milliseconds
- TEST CASE RESULT|Module: crf|crf_train_small.sql_in|PASS|Time: 27186 milliseconds
- TEST CASE RESULT|Module: crf|crf_train_large.sql_in|PASS|Time: 32602 milliseconds
- TEST CASE RESULT|Module: crf|crf_test_small.sql_in|PASS|Time: 22410 milliseconds
- TEST CASE RESULT|Module: crf|crf_test_large.sql_in|PASS|Time: 21711 milliseconds
- TEST CASE RESULT|Module: elastic_net|elastic_net_install_check.sql_in|PASS|Time: 931563 milliseconds
- TEST CASE RESULT|Module: graph|sssp.sql_in|PASS|Time: 18174 milliseconds
- TEST CASE RESULT|Module: linalg|svd.sql_in|PASS|Time: 72105 milliseconds
- TEST CASE RESULT|Module: linalg|matrix_ops.sql_in|PASS|Time: 58312 milliseconds
- TEST CASE RESULT|Module: linalg|linalg.sql_in|PASS|Time: 2836 milliseconds
- TEST CASE RESULT|Module: pmml|table_to_pmml.sql_in|PASS|Time: 34508 milliseconds
- TEST CASE RESULT|Module: pmml|pmml_rf.sql_in|PASS|Time: 35993 milliseconds
- TEST CASE RESULT|Module: pmml|pmml_ordinal.sql_in|PASS|Time: 15540 milliseconds
- TEST CASE RESULT|Module: pmml|pmml_multinom.sql_in|PASS|Time: 12546 milliseconds
- TEST CASE RESULT|Module: pmml|pmml_glm_poisson.sql_in|PASS|Time: 7321 milliseconds
- TEST CASE RESULT|Module: pmml|pmml_glm_normal.sql_in|PASS|Time: 8597 milliseconds
- TEST CASE RESULT|Module: pmml|pmml_glm_ig.sql_in|PASS|Time: 8861 milliseconds
- TEST CASE RESULT|Module: pmml|pmml_glm_gamma.sql_in|PASS|Time: 26212 milliseconds
- TEST CASE RESULT|Module: pmml|pmml_glm_binomial.sql_in|PASS|Time: 12977 milliseconds
- TEST CASE RESULT|Module: pmml|pmml_dt.sql_in|PASS|Time: 9401 milliseconds
- TEST CASE RESULT|Module: prob|prob.sql_in|PASS|Time: 1917 milliseconds
- TEST CASE RESULT|Module: sketch|support.sql_in|PASS|Time: 143 milliseconds
- TEST CASE RESULT|Module: sketch|mfv.sql_in|PASS|Time: 720 milliseconds
- TEST CASE RESULT|Module: sketch|fm.sql_in|PASS|Time: 7301 milliseconds
- TEST CASE RESULT|Module: sketch|cm.sql_in|PASS|Time: 19777 milliseconds
- TEST CASE RESULT|Module: svm|svm.sql_in|PASS|Time: 205677 milliseconds
- TEST CASE RESULT|Module: tsa|arima_train.sql_in|PASS|Time: 75680 milliseconds
- TEST CASE RESULT|Module: tsa|arima.sql_in|PASS|Time: 76236 milliseconds
- TEST CASE RESULT|Module: conjugate_gradient|conj_grad.sql_in|PASS|Time: 6757 milliseconds
- TEST CASE RESULT|Module: knn|knn.sql_in|PASS|Time: 9835 milliseconds
- TEST CASE RESULT|Module: lda|lda.sql_in|PASS|Time: 20510 milliseconds
- TEST CASE RESULT|Module: stats|wsr_test.sql_in|PASS|Time: 2766 milliseconds
- TEST CASE RESULT|Module: stats|t_test.sql_in|PASS|Time: 3686 milliseconds
- TEST CASE RESULT|Module: stats|robust_and_clustered_variance_coxph.sql_in|PASS|Time: 17499 milliseconds
- TEST CASE RESULT|Module: stats|pred_metrics.sql_in|PASS|Time: 14032 milliseconds
- TEST CASE RESULT|Module: stats|mw_test.sql_in|PASS|Time: 1852 milliseconds
- TEST CASE RESULT|Module: stats|ks_test.sql_in|PASS|Time: 2465 milliseconds
- TEST CASE RESULT|Module: stats|f_test.sql_in|PASS|Time: 2358 milliseconds
- TEST CASE RESULT|Module: stats|cox_prop_hazards.sql_in|PASS|Time: 39932 milliseconds
- TEST CASE RESULT|Module: stats|correlation.sql_in|PASS|Time: 10520 milliseconds
- TEST CASE RESULT|Module: stats|chi2_test.sql_in|PASS|Time: 3581 milliseconds
- TEST CASE RESULT|Module: stats|anova_test.sql_in|PASS|Time: 1801 milliseconds
- TEST CASE RESULT|Module: svec_util|svec_test.sql_in|PASS|Time: 14043 milliseconds
- TEST CASE RESULT|Module: svec_util|gp_sfv_sort_order.sql_in|PASS|Time: 3399 milliseconds
- TEST CASE RESULT|Module: utilities|text_utilities.sql_in|PASS|Time: 6579 milliseconds
- TEST CASE RESULT|Module: utilities|sessionize.sql_in|PASS|Time: 3901 milliseconds
- TEST CASE RESULT|Module: utilities|pivot.sql_in|PASS|Time: 15634 milliseconds
- TEST CASE RESULT|Module: utilities|path.sql_in|PASS|Time: 9321 milliseconds
- TEST CASE RESULT|Module: utilities|encode_categorical.sql_in|PASS|Time: 7665 milliseconds
- TEST CASE RESULT|Module: utilities|drop_madlib_temp.sql_in|PASS|Time: 153 milliseconds
- TEST CASE RESULT|Module: assoc_rules|assoc_rules.sql_in|PASS|Time: 31975 milliseconds
- TEST CASE RESULT|Module: convex|lmf.sql_in|PASS|Time: 66775 milliseconds
- TEST CASE RESULT|Module: glm|poisson.sql_in|PASS|Time: 19117 milliseconds
- TEST CASE RESULT|Module: glm|ordinal.sql_in|PASS|Time: 23446 milliseconds
- TEST CASE RESULT|Module: glm|multinom.sql_in|PASS|Time: 18780 milliseconds
- TEST CASE RESULT|Module: glm|inverse_gaussian.sql_in|PASS|Time: 20931 milliseconds
- TEST CASE RESULT|Module: glm|gaussian.sql_in|PASS|Time: 23795 milliseconds
- TEST CASE RESULT|Module: glm|gamma.sql_in|PASS|Time: 43365 milliseconds
- TEST CASE RESULT|Module: glm|binomial.sql_in|PASS|Time: 39437 milliseconds
- TEST CASE RESULT|Module: linear_systems|sparse_linear_sytems.sql_in|PASS|Time: 5405 milliseconds
- TEST CASE RESULT|Module: linear_systems|dense_linear_sytems.sql_in|PASS|Time: 3331 milliseconds
- TEST CASE RESULT|Module: recursive_partitioning|random_forest.sql_in|PASS|Time: 294832 milliseconds
- TEST CASE RESULT|Module: recursive_partitioning|decision_tree.sql_in|PASS|Time: 91311 milliseconds
- TEST CASE RESULT|Module: regress|robust.sql_in|PASS|Time: 55325 milliseconds
- TEST CASE RESULT|Module: regress|multilogistic.sql_in|PASS|Time: 25330 milliseconds
- TEST CASE RESULT|Module: regress|marginal.sql_in|PASS|Time: 73750 milliseconds
- TEST CASE RESULT|Module: regress|logistic.sql_in|PASS|Time: 76501 milliseconds
- TEST CASE RESULT|Module: regress|linear.sql_in|PASS|Time: 7517 milliseconds
- TEST CASE RESULT|Module: regress|clustered.sql_in|PASS|Time: 40661 milliseconds
- TEST CASE RESULT|Module: sample|sample.sql_in|PASS|Time: 890 milliseconds
- TEST CASE RESULT|Module: summary|summary.sql_in|PASS|Time: 14644 milliseconds
- TEST CASE RESULT|Module: kmeans|kmeans.sql_in|PASS|Time: 52173 milliseconds
- TEST CASE RESULT|Module: pca|pca_project.sql_in|PASS|Time: 229016 milliseconds
- TEST CASE RESULT|Module: pca|pca.sql_in|PASS|Time: 523230 milliseconds
- TEST CASE RESULT|Module: validation|cross_validation.sql_in|PASS|Time: 33685 milliseconds
- [gpadmin@hdp3 Madlib]$
可以看到,所有案例都已经正常执行,说明MADlib安装成功。
三、卸载
卸载过程基本上是安装的逆过程。
1. 删除madlib模式
方法1,使用madpack部署应用程序。
[plain] view plain
copy
- $GPHOME/madlib/bin/madpack uninstall -c /dm -s madlib -p hawq
方法2,使用SQL命令手工删除模式。
[sql] view plain
copy
- drop schema madlib cascade;
2. 删除其它遗留数据库对象