时间:2021-07-01 10:21:17 帮助过:5人阅读
Now we have to decide what kind of function we want. There are essentially two choices to be made:
is the function an aggregate function or not? (we will learn more about aggregate functions later)
which type of return value should the function return? Here we have 4 options:
Type |
Description |
---|---|
STRING |
A string literal. Translates to char* in C |
INTEGER |
A normal integer. Translates to a 64 bit integer in C |
REAL |
A floating point number. Translates to double in C |
DECIAML |
This one isn‘t really finished at this time. MySQL treats it as STRING |
Let‘s talk about non-aggregate functions first. Now we have to declare and implement some functions the MySQL server needs to use our UDF. But first some structs we‘ll need for that:
UDF_INIT:
Type |
Name |
Description |
---|---|---|
my_bool |
maybe_null |
1 if function can return NULL |
unsigned int |
decimals |
for REAL functions |
unsigned long |
max_length |
For string functions |
char * |
ptr |
free pointer for function data |
my_bool |
const_item |
0 if result is independent of argument |
UDF_ARGS:
Type |
Name |
Description |
---|---|---|
unsigned int |
arg_count |
Number of argument |
enum Item_result * |
arg_type |
Array containing the types of the arguments |
char ** |
args |
Array of pointer to the arguments |
unsigned long * |
lengths |
Array of the argument‘s lengths (only needed for strings) |
char * |
maybe_null |
Array of "maybe_null" flags (1 if argument maybe null) |
char ** |
attributes |
Array of pointers to the arguments‘ attributes (see chapter x for details) |
unsigned long * |
attribute_lengths |
Array of attributes lengths |
now let‘s take a look at the functions:
De-/Initialization:
extern "C" my_bool MyTest_init(UDF_INIT *initid, UDF_ARGS *args, char *message) { // The most important thing to do here is setting up the memory // you need... // Lets say we need a lonlong type variable to keep a checksum // Although we do not need one in this case longlong* i = new longlong; // create the variable *i = 0; // set it to a value // store it as a char pointer in the pointer variable // Make sure that you don`t run in typecasting troubles later!! initid->ptr = (char*)i; // check the arguments format if (args->arg_count != 1) { strcpy(message,"MyTest() requires one arguments"); return 1; } if (args->arg_type[0] != INT_RESULT) { strcpy(message,"MyTest() requires an integer"); return 1; } return 0; } extern "C" void MyTest_deinit(UDF_INIT *initid) { // Here you have to free the memory you allocated in the // initialization function delete (longlong*)initid->ptr; }
The actual function:
extern "C" longlong MyTest(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) { /* So finally this is the part were we do the real work. This function is called for every record and the current value(s) or better pointers to the current values are stroed in the UDF_ARGS variable. We have to get the values, do our calculation and return the result. NOTE: You can access the memory allocated in MyTest_init through the UDF_INIT variable. In this example we will just add 5 to every value...*/ return *((longlong*)args->args[0])+5; }
All done! Now we have to compile the library and copy it to a directory where our OS can find it. On Windows that would be anywhere the PATH System variable says. Personally I use the MySQL servers bin directory. You have to make sure that the library is in one of those directories otherwise MySQL can‘t use it! And also make sure to export all the functions MySQL needs!
And at last we have to tell MySQL about it. This is really straightforward: Just execute the following SQL command:
CREATE [AGGREGATE] FUNCTION MyTest RETURNS [INTEGER|STRING|REAL|DECIMAL] SONAME the_libraries_exact_nameNow you can use it like any other function.
Now some words to aggregate functions. When your UDF is an aggregate function you have to add some more functions and some functions are used in a different way. The calling sequence is:
MyTest_init
to allocate memory (just like a normal UDF)MyTest_clear
for the first row in each groupMyTest_add
for each row that belongs to the same groupMyTest
to get the result when the group changes or the last row has been processedMyTest_deinit
to free any used memoryNow let‘s look at the new functions needed for the aggregate function. In this example we‘ll simply add up all the values. (like the native SUM
function)
void MyTest_clear(UDF_INIT *initid, char *is_null, char *error) { /* The clear function resets the sum to 0 for each new group Of course you have to allocate a longlong variable in the init function and assign it to the pointer as seen above */ *((longlong*)initid->ptr) = 0; } void MyTest_add(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) { // For each row the current value is added to the sum *((longlong*)initid->ptr) = *((longlong*)initid->ptr) + *((longlong*)args->args[0]); } longlong MyTest(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) { // And in the end the sum is returned return *((longlong*)initid->ptr); }
Here are some things you should know when you write more complex UDFs:
A string function should return a pointer to the result and set *result
and *length
to the contents and length of the return value. For example:
memcpy(result, "result string", 13); *length = 13;
The result buffer that is passed to the MyTest
function is 255 bytes long. If your result fits in this, you don‘t have to worry about memory allocation for results.
If your string function needs to return a string longer than 255 bytes, you must allocate the space for it with malloc
or new in your MyTest_init
function or your MyTest
function and free it in yourMyTest_deinit
function. You can store the allocated memory in the ptr slot in the UDF_INIT
structure for reuse by future MyTest
calls.
To indicate an error return in the main function, set *error to 1: If MyTest() sets *error to 1 for any row, the function value is NULL for the current row and for any subsequent rows processed by the statement in which MyTest() was invoked.
For more information see the MySQL Online Manual
Here are some guidelines that you should follow if you want to make sure your UDF runs smoothly
Do not call any other applications or processes inside an UDF!
Do not store any information locally! (This goes for shared libraries in general)
Do not allocate any global or static variables!
Always check the type of your arguments. As you can see MySQL converts everything to char pointers. This can lead to major troubles if you convert a string literal to a integer pointer and so on.
Be extra careful with the memory allocation! If you have memory leaks you can bring down the sever in no time.
Debugging an UDF can be pretty nerve wracking because every time your UDF crashes it takes down the whole MySQL server along with it. So I wrote a little command line tool to work around that problem. Just execute it after compilation and it does the rest. Meaning, it emulates a call to the function by calling an "SELECT
" command and then passing the results to the library and printing out the result on the command line . So when the UDF produces some serious errors only the little helper goes down and not the whole server. It is still in "beta" so don‘t expect to much....
MySQL Online Manual
摘自:研究如何编写和使用MySQL UDF 2008/06/30
MySQL的UDF(User Defined Function)类似于一种API, 用户根据一定的规范用C/C++(或采用C调用规范的语言)编写一组函数(UDF),然后编译成动态链接库,通过CREATE FUNCTION和DROP FUNCTION语句来加载和卸载UDF。UDF被加载后可以像调用MySQL的内置函数一样来调用它,并且服务器在启动时会自动加载原来存在的UDF。
CREATE [AGGREGATE] FUNCTION function_name RETURNS {STRING|INTEGER|REAL|DECIMAL} SONAME shared_library_name
function_name 为函数名 ,RETURNS子句表明UDF的返回值类型,目前DECIMAL和STRING类型是等价的 。
shared_library_name是UDF所在的动态链接库的名称(不含路径),该文件需放在可被系统搜索到的目录中。
在MySQL 5.1中,动态链接库需要放在由plugin_dir变量所指定的插件目录中。
DROP FUNCTION function_name
对于每个UDF,需要编写若干相关C/C++函数,下面用XXX()代表在SQL语句中调用UDF,xxx()代表相应的C/C++函数调用,实际使用时,两者应相同。
SQL类型 | C/C++类型 |
---|---|
STRING/DECIMAL | char * |
INTEGER | long long |
REAL | double |
对于聚合UDF, 还需要另外两个函数。
所以函数必须是线程安全的。可以参考《MySQL参考手册》第30章第3节以获得更详细的说明。此外,在MySQL源代码包中有编写UDF的示例,位置为sql/udf_example.c.
当SQL语句调用XXX()时,服务器首先调用xxx_init()进行参数检查或内存分配,若xxx_init()出错,则SQL语句被终止,主函数和清理函数将不会被调用。然后,服务器对每一行记录调用一次主函数,最后清理函数xxx_deinit()被调用。
用类似的命令来编译UDF:
shell> gcc -shared -o udf_example.so udf_example.c
将经过编译得到的动态链接库(.so文件)放到系统可以找到并加载该文件的目录中。然后通过CREATE FUNCTION语句让服务器加载动态链接库中的函数。
udf_test.c
#include <mysql.h> #include <time.h> #include <string.h> my_bool datetime_now_init( UDF_INIT *initid, UDF_ARGS *args, char *message ) { if ( args->arg_count ) { strcpy( message, "datetime_now() does not need any argument." ); return(1); } return(0); } char *datetime_now( UDF_INIT *initid __attribute__( (unused) ), UDF_ARGS *args __attribute__( (unused) ), char *result, unsigned long *length, char *is_null, char *error __attribute__( (unused) ) ) { const time_t now_time = time( NULL ); strftime( result, 255, "%Y/%m/%d %I:%M:%S %p %Z %z", localtime( &now_time ) ); *length = strlen( result ); return(result); }
编译
shell> gcc -fPIC -Wall -O3 -I/usr/local/mysql/include -shared -o udf_test.so udf_test.c
安装
mysql> CREATE FUNCTION datetime_now RETURNS STRING SONAME ‘udf_test.so‘; Query OK, 0 rows affected (0.00 sec)
使用
mysql> SELECT datetime_now(); +----------------------------------+ | datetime_now() | +----------------------------------+ | 2008/06/27 05:27:03 PM CST +0800 | +----------------------------------+ 1 row in set (0.00 sec)
存储过程中的函数是一组SQL语句的集合,其能力受限于SQL的表达能力,它还要受限于MySQL自身的一些限制。它主要用于较高层次的数据处理。
UDF的限制相对来说要小很多,与编写服务器代码时的限制相同,只要符合相应的约定即可。它的定位偏向于底层,主要用于提供一些基础的数据处理功能。另外要注意的是,UDF一旦出错,会导致服务器崩溃!
红黑联盟上文章
UDF顾名思义,就是User defined Function,用户定义函数。我们知道,MySQL本身支持很多内建的函数,此外还可以通过创建存储方法来定义函数。UDF为用户提供了一种更高效的方式来创建函数。
UDF与普通函数类似,有参数,也有输出。分为两种类型:单次调用型和聚集函数。前者能够针对每一行数据进行处理,后者则用于处理Group By这样的情况。
既然MySQL本身提供了大量的函数,并且也支持定义函数,为什么我们还需要UDF呢?这主要基于以下几点:
当然UDF也是有缺点的,这是因为UDF也处于mysqld的内存空间中,不谨慎的内存使用很容易导致mysqld crash掉。
UDF的API包括
为了便于理解,这里给出两种UDF类型的API调用图:
下面将就上述几个API进行详细的讲解:
1). name_init
原型:my_boolname_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
UDF_INIT结构体如下:
字段 |
类型 |
描述 |
maybe_null |
my_bool |
如果为1表示该UDF可以返回NULL |
decimals |
unsigned int |
返回值为实数时,表示精度,范围0~30 |
max_length |
unsigned long |
对于返回值为INTEGER类型值为21,对于REAL类型值为17,对于字符串类型,存储函数最长参数的长度 |
ptr |
char* |
额外的指针,我们可以在这里分配内存。通过initd传递给其他API |
const_item |
my_bool |
为1表示函数总是返回相同的值 |
extension |
void* |
用于扩展? |
UDF_ARGS结构体如下:
字段 |
类型 |
描述 |
arg_count |
unsigned int |
参数个数 |
arg_type |
enum Item_result* |
参数类型数组,记录每一个参数的类型,可以是STRING_RESULT、REAL_RESULT、INT_RESULT以及DECIMAL_RESULT |
args |
char ** |
同样是一个数组,用于存储实际数据。 STRING_RESULT与DECIMAL_RESULT类型为char*,INT_RESULT类型为long long*,REAL_RESULT类型为double*,或者一个NULL指针 |
lengths |
unsigned long* |
数组,用于存储每一个参数的长度 |
maybe_null |
char * |
该数组用于表明每个参数是否可以为NULL,例如 |
attributes |
char ** |
每个参数的名字 |
attribute_lengths |
unsigned long* |
每个参数名字的长度 |
extension |
void* |
用于扩展? |
Message:用于打印错误信息,该指针本身提供长度为MYSQL_ERRMSG_SIZE,来存储信息;
2).name_deinit
原型:void name_deinit(UDF_INIT*initid)
该函数会进行一些内存释放和清理的工作,在之前我们提到initid->ptr,我们可以在该区域·进行内存的动态分配,这里就可以直接进行内存释放。
3).name()
原型:针对不同的返回值类型,有不同的函数原型:
返回值类型 |
函数原型 |
STRING or DECIMAL |
char *name(UDF_INIT *initid, UDF_ARGS *args, char *result, unsigned long *length, char *is_null, char *error) |
INTEGER |
long long name(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) |
REAL |
double name(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error); |
当返回值为STRING类型时,参数result开辟一个buffer来存储返回值,但不超过766字节,在length参数中存储了字符串的长度。
每个函数原型还包括了is_null和error参数,当*is_null被设置为1时,返回值为NULL,设置*error为1,表明发生了错误。
4).name_add()和name_clear()
原型:
void name_add(UDF_INIT *initid, UDF_ARGS *args, char *is_null,char *error) void name_clear(UDF_INIT *initid, char *is_null, char *error)
对于聚合类型的UDF,name_addd和name_clear会被反复调用。
下面将举两个简单的例子,一个单次调用型函数,一个聚集类型函数,来描述写一个UDF的过程。
1)接受一个参数,并返回该参数的值
/* 初始化 */ my_booludf_int_init( UDF_INIT * initid, UDF_ARGS * args, char *message ) { if ( args->arg_count != 1 ) /* 检查参数个数 */ { strcpy( message, "udf_intexample() can onlyaccept one argument" ); return(1); } if ( args->arg_type[0] != INT_RESULT ) /* 检查参数类型 */ { strcpy( message, "udf_intexample() argumenthas to be an integer" ); return(1); } return(0); } /* 清理操作 */ voidudf_int_deinit( UDF_INIT * initid ) { } /* 主函数 */ long long udf_int( UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error ) { long long num = (*(long long *) args->args[0]); /* 获取第一个参数值 */ return(num); }
2)接受一个浮点数类型的参数,并对每个分组进行求和
/* 初始化 */ my_booludf_floatsum_init( UDF_INIT * initid, UDF_ARGS * args, char *message ) { double *total = (double *) malloc( sizeof(double) ); if ( total == NULL ) /* 内存分配失败 */ { strcpy( message, "udf_floatsum:alloc mem failed!" ); return(1); } *total = 0; initid->ptr = (char *) total; if ( args->arg_count != 1 ) /* 检查参数个数 */ { strcpy( message, "too moreargs,only one!" ); return(1); } if ( args->arg_type[0] != REAL_RESULT ) /* 检查参数类型 */ { strcpy( message, "wrongtype" ); return(1); } initid->decimals = 3; /* 设置返回值精度 */ return(0); } /* 清理、释放在init函数中分配的内存 */ voidudf_floatsum_deinit( UDF_INIT * initid ) { free( initid->ptr ); } /* 每一行都会调用到该函数 */ voidudf_floatsum_add( UDF_INIT * initid, UDF_ARGS * args, char *is_null, char *error ) { double* float_total; float_total = (double *) initid->ptr; if ( args->args[0] ) *float_total += *(double *) args->args[0]; } /* 每个分组完成后,返回结果 */ doubleudf_floatsum( UDF_INIT * initid, UDF_ARGS * args, char *is_null, char *error ) { double* float_total; float_total = (double *) initid->ptr; return(*float_total); } /* 在进行下一个分组前调用,设置initid->ptr指向的值为0,以便下一次分组统计 */ voidudf_floatsum_clear( UDF_INIT * initid, char *is_null, char *error ) { double *float_total; float_total = (double *) initid->ptr; *float_total = 0; }
3) Mysql-udf-http是一个开源的UDF,可以利用HTTP协议进行REST操作。什么是REST操作呢?REST是一种web service架构风格,其实现基于HTTP协议的四种方法:POST、GET、PUT以及DELETE操作,在mysql-udf-http里分别对应的函数是http_post、http_get()、http_put()、http_delete()。
源码下载:http://curl.haxx.se/download/curl-7.21.1.tar.gz
./configure–prefix={mysql安装目录} –with-mysql=/usr/local/webserver/mysql/bin/mysql_config
Make&& make install
该UDF的实现原理比较简单,主要使用libcurl库函数来实现http协议通信,总共三百多行代码。这里有使用和介绍http://blog.s135.com/mysql-udf-http/
有些比较有趣的功能:
例如,我们可以通过GET方法获取微博中的个人信息,其中1821798401为用户ID
selecthttp_get(‘http://api.t.sina.com.cn/statuses/user_timeline/1821798401.json?count=1&source=1561596835‘)
UDF具有非常高的自由度,你可以编写你任何想要实现的功能函数,甚至可以引用MySQL内核的代码和变量。
当然,UDF也有着局限性,如下:
a) 在mysql库下必须有func表,并且在??skip?grant?tables开启的情况下,UDF会被禁止;
b) 当UDF挂掉时,有可能会导致mysqld crash掉;
c) 所有的UDF的函数必须是线程安全的,如果非要用全局变量,需要加互斥,尽量在name_init中分配资源,并在name_deinit中释放
d) 需要有insert权限
其他:
java mysql自定义函数UDF之调用c函数
标签: