时间:2021-07-01 10:21:17 帮助过:3人阅读
the conresponding hive sql script shows as follows:
abc_incremental.sql--handles the inserted data. -- points_core.tb_acc_rdm_rel is append only, so no update and delete is related!!!! INSERT OVERWRITE TABLE ods.abc_incremental PARTITION(pt_log_d = ‘${hivevar:p_partition_d}‘) SELECT id, last_update_timestamp FROM staging.staging_abc AS a WHERE pt_log_d = ‘${hivevar:p_partition_d}‘; --------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------- -- delete hive table partition, the delete file job would be done in shell script,since this table is external table. ALTER TABLE staging.staging_abc DROP IF EXISTS PARTITION(pt_log_d=‘${hivevar:p_partition_to_delete}‘); --------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------
in the hql file ,we reference the passing parameter using ${hivevar:variable_name}.
Note: we can also using hiveconf instead of hivevar, but for each parameter, we should use it as the style the parameter passed in. the difference between hivevar and hiveconf is:
common requierment like:
I have implemented a common shell script, by calling such shell script, we can pass in parameter values just as the style we used in hive.
the implementation code shows as follows:
globle_config.sh
#!/bin/bash OP_HOME_BIN=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/bin HIVE_HOME_BIN=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/bin MYSQL_HOME_BIN=/usr/local/mysql/bin DORIS_HOST=192.168.1.101 DORIS_PORT=9030 DORIS_USER_NAME=dev_readonly DORIS_PASSWORD=dev_readonly# # this function provide functionality to copy the provided file # input parameter: # $1: the working directory # $2: the absolute path of the file to be copyed. # result: the absolute path the copyed file, the copyed file was located at the working folder. function func_copy_file() { if [ $# != 2 ]; then echo "missing parameter, type in like: repaceContent /opt/ /opt/a.sql" exit -1 fi working_dir=$1 source_file=$2 # check(s) if the file to be copyed exists. if [ ! -f $source_file ]; then echo "file : " $source_file " to be copied does not exist" exit -1 fi # check(s) if the working dir exists. if [ ! -d "$working_dir" ]; then echo "the working directory : " $source_file " does not exist" exit -1 fi # checks if the file already exists, $result holds the copied file name(absolute path) result=${working_dir}/$(generate_datatime_random) while [ -f $result ]; do result=${working_dir}/$(generate_datatime_random) done # copy file cp ${source_file} ${result} echo ${result} } # this function provide functionality to generate a ramdom string based on current system timestamp. # input parameter: # N/A # result: ramdom string based on current system timestamp. function generate_datatime_random() { # date=$(date -d -0days +%Y%m%d) # #随机数以时间戳纳秒用于防止目录冲突 # randnum=$(date +%s%N) echo $(date -d -0days +%Y%m%d)_$(date +%s%N) } #replace the specifed string to the target string in the provided file # $1: the absolute path of the file to be replaced. # $2: the source_string for the replacement. # $3: the target string for the replacement. # result: none function func_repace_content() { if [ $# != 3 ]; then echo "missing parameter, type in like: repaceContent /opt/a.sql @name ‘lenmom‘" exit -1 fi echo "begin replacement" file_path=$1 #be careful of regex expression. source_content=$2 replace_content=$3 if [ ! -f $file_path ]; then echo "file : " $file_path " to be replaced does not exist" exit -1 fi echo "repalce all ["${source_content} "] in file: "${file_path} " to [" ${replace_content}"]" sed -i "s/${source_content}/${replace_content}/g" $file_path } # this function provide(s) functionality to execute doris sql script file # Input parameters: # $1: the absolute path of the .sql file to be executed. # other paramer(s) are optional, of provided, it‘s the parameters paire to pass in the script file before execution. # result: 0, if execute success; otherwise, -1. function func_execute_doris_sql_script() { echo "imput parameters: "$@ parameter_number=$# if [ $parameter_number -lt 1 ]; then echo "missing parameter, must contain the script file to be executed. other parameters are optional,such as" echo "func_execute_doris_sql_script /opt/a.sql @name ‘lenmom‘" exit -1 fi # copy the file to be executed and wait for parameter replacement. working_dir="$( cd $(dirname $0) pwd )" file_to_execute=$(func_copy_file "${working_dir}" "$1") if [ $? != 0 ]; then exit -1 fi if [ $parameter_number -gt 1 ]; then for ((i = 2; i <= $parameter_number; i += 2)); do case $i in 2) func_repace_content "$file_to_execute" "$2" "$3" ;; 4) func_repace_content "$file_to_execute" "$4" "$5" ;; 6) func_repace_content "$file_to_execute" "$6" "$7" ;; 8) func_repace_content "$file_to_execute" "$8" "$9" ;; esac done fi if [ $? != 0 ]; then exit -1 fi echo "begin to execute script in doris, the content is:" cat $file_to_execute echo MYSQL_HOME="$MYSQL_HOME_BIN/mysql" if [ ! -f $MYSQL_HOME ]; then # `which is {app_name}` return code is 1, so we should ignore it. MYSQL_HOME=$(which is mysql) # print mysql location in order to override the globle shell return code to 0 ($?) echo "mysql location is: "$MYSQL_HOME fi $MYSQL_HOME -h $DORIS_HOST -P $DORIS_PORT -u$DORIS_USER_NAME -p$DORIS_PASSWORD <"$file_to_execute" if [ $? != 0 ]; then rm -f $file_to_execute echo execute failed exit -1 else rm -f $file_to_execute echo execute success exit 0 fi } # this function provide(s) functionality to load data into doris by execute the specified load sql script file. # Input parameters: # $1: the absolute path of the .sql file to be executed.ll # $2: the label holder to be replaced. # result: 0, if execute success; otherwise, -1. function doris_load_data() { if [ $# -lt 2 ]; then echo "missing parameter, type in like: doris_load_data /opt/a.sql label_place_holder" exit -1 fi if [ ! -f $1 ]; then echo "file : " $1 " to execute does not exist" exit -1 fi func_execute_doris_sql_script $@ $(generate_datatime_random) }
sql script file wich name load_user_label_from_hdfs.sql
LOAD LABEL user_label.fct_usr_label_label_place_holder ( DATA INFILE("hdfs://nameservice1/user/hive/warehouse/usr_label.db/usr_label/*") INTO TABLE fct_usr_label COLUMNS TERMINATED BY "\\x01" FORMAT AS "parquet" (member_id ,mobile ,corp ,province ,channel_name ,new_usr_type ,gender ,age_type ,last_login_type) ) WITH BROKER ‘doris-hadoop‘ ( "dfs.nameservices"="nameservice1", "dfs.ha.namenodes.nameservice1"="namenodexxx,namenodexxx1", "dfs.namenode.rpc-address.nameservice1.namenodexxx"="hadoop-datanode06:8020", "dfs.namenode.rpc-address.nameservice1.namenodexxx1"="hadoop-namenode01:8020", "dfs.client.failover.proxy.provider"="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" ) PROPERTIES ( "timeout"="3600", "max_filter_ratio"="0");
in this file, the load label has a place holder named label_place_holder of which the value should be passed in by the invoking shell file.
shell file:
user_label_load.sh
#!/bin/bash CURRENT_DIR=$(cd `dirname $0`; pwd) echo "CURRENT_DIR:"${CURRENT_DIR} APPLICATION_ROOT_DIR=$(cd ${CURRENT_DIR}/..;pwd) echo "APPLICATION_ROOT_DIR:"${APPLICATION_ROOT_DIR} source ${APPLICATION_ROOT_DIR}/globle_config.sh
#load doris data by calling common shell function doris_load_data $CURRENT_DIR/load_user_label_from_hdfs.sql "label_place_holder"
or we can also call using function like:
func_execute_doris_sql_script "label_place_holder" $(generate_datatime_random)
if you have mutiple parameter to pass in , just use
func_execute_doris_sql_script {full_path_of_sql_file} "{parameter0_name}" "{parameter0_value}" "{parameter1_name}" "{parameter1_value}" "{parameter2_name}" "{parameter2_value}" ......
in the shell terminal, just execute the shell file would be fine.
sh user_label_load.sh
the shell file include the passed in parameters for invoking the sql script in doris.
query the load result in doris:
doris: shell invoke .sql script for doris and passing values for parameters in sql script.
标签:this load sts nta lin led variables diff fine