当前位置：Gxlcms > mysql > Mysql源码学习――没那么简单的Hash_MySQL

Mysql源码学习――没那么简单的Hash_MySQL

时间：2021-07-01 10:21:17 帮助过：17人阅读

bitsCN.com

Hash链表的应用比较常见，其目的就是为了将不同的值映射到不同的位置，查找的时候直接找到相应的位置，而不需要传统的顺序遍历或是二分查找，从而达到减少查询时间的目的。常规的hash是预定义一定的桶(bucket)，规定一个hash函数，然后进行散列。然而Mysql中的hash没有固定的bucket，hash函数也是动态变化的，本文就进行非深入介绍。

基本结构体

Hash的结构体定义以及相关的函数接口定义在include/hash.h和mysys/hash.c两个文件中。下面是HASH结构体的定义。

typedef struct st_hash {

size_t key_offset,key_length; /* Length of key if const length */

size_t blength;

ulong records;

uint flags;

DYNAMIC_ARRAY array; /* Place for hash_keys */

my_hash_get_key get_key;

void (*free)(void *);

CHARSET_INFO *charset;

} HASH;

成员名说明 key_offset hash时key的offset，在不指定hash函数的情况下有意义 key_length key的长度，用于计算key值 blength 非常重要的辅助结构，初始为1，动态变化，用于hash函数计算，这里理解为bucket length（其实不是真实的bucket数） records 实际的记录数 flags 是否允许存在相同的元素，取值为HASH_UNIQUE(1)或者0 array 存储元素的数组 get_key 用户定义的hash函数，可以为NULL free 析构函数，可以为NULL charset 字符集

HASH结构体里面包含了一个动态数组结构体DYNAMIC_ARRAY，这里就一并介绍了。其定义在include/my_sys.h中。

typedef struct st_dynamic_array

{

uchar *buffer;

uint elements,max_element;

uint alloc_increment;

uint size_of_element;

} DYNAMIC_ARRAY;

成员名说明 buffer 一块连续的地址空间，用于存储数据，可以看成一个数组空间 elements 元素个数 max_element 元素个数上限 alloc_increment 当元素达到上限时，即buffer满时，按照alloc_increment进行扩展 size_of_element 每个元素的长度

初始化函数

Hash初始化函数对外提供两个，my_hash_init和my_hash_init2，其区别即是否定义了growth_size（用于设置DYNAMIC_ARRAY的alloc_increment）。代码在mysys/hash.c中。

#define my_hash_init(A,B,C,D,E,F,G,H) /

_my_hash_init(A,0,B,C,D,E,F,G,H)

#define my_hash_init2(A,B,C,D,E,F,G,H,I) /

_my_hash_init(A,B,C,D,E,F,G,H,I)

/**

@brief Initialize the hash

@details

Initialize the hash, by defining and giving valid values for

its elements. The failure to allocate memory for the

hash->array element will not result in a fatal failure. The

dynamic array that is part of the hash will allocate memory

as required during insertion.

@param[in,out] hash The hash that is initialized

@param[in] charset The charater set information

@param[in] size The hash size

@param[in] key_offest The key offset for the hash

@param[in] key_length The length of the key used in

the hash

@param[in] get_key get the key for the hash

@param[in] free_element pointer to the function that

does cleanup

@return inidicates success or failure of initialization

@retval 0 success

@retval 1 failure

my_bool

_my_hash_init(HASH *hash, uint growth_size, CHARSET_INFO *charset,

ulong size, size_t key_offset, size_t key_length,

my_hash_get_key get_key,

void (*free_element)(void*), uint flags)

{

DBUG_ENTER("my_hash_init");

DBUG_PRINT("enter",("hash: 0x%lx size: %u", (long) hash, (uint) size));

hash->records=0;

hash->key_offset=key_offset;

hash->key_length=key_length;

hash->blength=1;

hash->get_key=get_key;

hash->free=free_element;

hash->flags=flags;

hash->charset=charset;

DBUG_RETURN(my_init_dynamic_array_ci(&hash->array,

sizeof(HASH_LINK), size, growth_size));

}

可以看到，_my_hash_init函数主要是初始化HASH结构体和hash->array（DYNAMIC_ARRAY结构体）。

动态HASH函数

我们首先来看下hash函数的定义：

static inline char*

my_hash_key(const HASH *hash, const uchar *record, size_t *length,

my_bool first)

{

if (hash->get_key)

return (char*) (*hash->get_key)(record,length,first);

*length=hash->key_length;

return (char*) record+hash->key_offset;

}

static uint my_hash_mask(my_hash_value_type hashnr, size_t buffmax,

size_t maxlength)

{

if ((hashnr & (buffmax-1)) < maxlength) return (hashnr & (buffmax-1));

return (hashnr & ((buffmax >> 1) -1));

}

my_hash_key参数说明 hash HASH链表结构 record 带插入的元素的值 length 带插入元素的值长度 first 辅助参数

"3">

			 
															my_hash_mask参数									说明															hashnr									my_hash_key的计算结果															buffmax									hash结构体中的blength															maxlength									实际桶的个数							    你可能要问我怎么有两个？其实这和我们平时使用的差不多，第一个函数my_hash_key是根据我们的值进行Hash Key计算，一般我们在计算后，会对hash key进行一次模运算，以便计算结果在我们的bucket中。即my_hash_key的结果作为my_hash_mask的第一个输入参数。其实到这里都是非常好理解的，唯一让我蛋疼的是my_hash_mask的实现，其计算结果是和第二和第三个参数有关，即Hash结构体中的blength和records有关。动态变化的，我去..
	    
	 
	 
	看到这里我迷惑了，我上网经过各种百度，谷歌，终于让我找到了一封Mysql Expert的回信：
	 
	Hi!
	"Yan" == Yan Yu  writes:
	 
	Yan> Dear MySQL experts:
	Yan>      Thank you so much for your reply to my previous Qs, they are very 
	Yan> helpful!
	Yan>      Could someone please help me understand function my_hash_insert() 
	Yan> in mysys/hash.cc?
	Yan> what are lines 352 -429 trying to achieve?  Are they just some 
	Yan> optimization to shuffle existing
	Yan> hash entries in the table (since the existing hash entries may be in 
	Yan> the wrong slot due to chaining
	Yan> in the case of hash collision)?
	 
	The hash algorithm is based on dynamic hashing without empty slots.
	This means that when you insert a new key, in some cases a small set
	of old keys needs to be moved to other buckets.  This is what the code
	does.
	 
	Regards,
	Monty
	      红色注释的地方是重点，dynamic hash，原来如此，动态hash，第一次听说，在网上下了个《Dynamic Hash Tables》的论文，下面图解下基本原理。
	 
	
	
	 
	 
	     动态Hash的本质是Hash函数的设计，图中给出的动态hash函数只是论文中提到的一个例子。下面就具体解读下Mysql中的hash插入——my_hash_insert
	 
	my_hash_insert非深入解析
	 
	    首先给出my_hash_insert的源代码,代码在mysys/hash.c中。
	 
	my_bool my_hash_insert(HASH *info, const uchar *record)
	{
	    int flag;
	    size_t idx,halfbuff,first_index;
	    my_hash_value_type hash_nr;
	    uchar *UNINIT_VAR(ptr_to_rec),*UNINIT_VAR(ptr_to_rec2);
	    HASH_LINK *data,*empty,*UNINIT_VAR(gpos),*UNINIT_VAR(gpos2),*pos;
	 
	    if (HASH_UNIQUE & info->flags)
	    {
	        uchar *key= (uchar*) my_hash_key(info, record, &idx, 1);
	        if (my_hash_search(info, key, idx))
	            return(TRUE);               /* Duplicate entry */
	    }
	 
	    flag=0;
	    if (!(empty=(HASH_LINK*) alloc_dynamic(&info->array)))
	        return(TRUE);               /* No more memory */
	 
	    data=dynamic_element(&info->array,0,HASH_LINK*);
	    halfbuff= info->blength >> 1;
	 
	    idx=first_index=info->records-halfbuff;
	    if (idx != info->records)                /* If some records */
	    {
	        do
	        {
	            pos=data+idx;
	            hash_nr=rec_hashnr(info,pos->data);
	            if (flag == 0)              /* First loop; Check if ok */
	                if (my_hash_mask(hash_nr, info->blength, info->records) != first_index)
	                    break;
	            if (!(hash_nr & halfbuff))
	            {                       /* Key will not move */
	                if (!(flag & LOWFIND))
	                {
	                    if (flag & HIGHFIND)
	                    {
	                        flag=LOWFIND | HIGHFIND;
	                        /* key shall be moved to the current empty position */
	                        gpos=empty;
	                        ptr_to_rec=pos->data;
	                        empty=pos;              /* This place is now free */
	                    }
	                    else
	                    {
	                        flag=LOWFIND | LOWUSED;     /* key isn't changed */
	                        gpos=pos;
	                        ptr_to_rec=pos->data;
	                    }
	                }
	                else
	                {
	                    if (!(flag & LOWUSED))
	                    {
	                        /* Change link of previous LOW-key */
	                        gpos->data=ptr_to_rec;
	                        gpos->next= (uint) (pos-data);
	                        flag= (flag & HIGHFIND) | (LOWFIND | LOWUSED);
	                    }
	                    gpos=pos;
	                    ptr_to_rec=pos->data;
	                }
	            }
	            else
	            {                       /* key will be moved */
	                if (!(flag & HIGHFIND))
	                {
	                    flag= (flag & LOWFIND) | HIGHFIND;
	                    /* key shall be moved to the last (empty) position */
	                    gpos2 = empty; empty=pos;
	                    ptr_to_rec2=pos->data;
	                }
	                else
	                {
	                    if (!(flag & HIGHUSED))
	                    {
	                        /* Change link of previous hash-key and save */
	                        gpos2->data=ptr_to_rec2;
	                        gpos2->next=(uint) (pos-data);
	                        flag= (flag & LOWFIND) | (HIGHFIND | HIGHUSED);
	                    }
	                    gpos2=pos;
	                    ptr_to_rec2=pos->data;
	                }
	            }
	        }
	        while ((idx=pos->next) != NO_RECORD);
	 
	        if ((flag & (LOWFIND | LOWUSED)) == LOWFIND)
	        {
	            gpos->data=ptr_to_rec;
	            gpos->next=NO_RECORD;
	        }
	        if ((flag & (HIGHFIND | HIGHUSED)) == HIGHFIND)
	        {
	            gpos2->data=ptr_to_rec2;
	            gpos2->next=NO_RECORD;
	        }
	    }
	  /* Check if we are at the empty position */
	 
	  idx= my_hash_mask(rec_hashnr(info, record), info->blength, info->records + 1);
	  pos=data+idx;
	  if (pos == empty)
	  {
	    pos->data=(uchar*) record;
	    pos->next=NO_RECORD;
	  }
	  else
	  {
	    /* Check if more records in same hash-nr family */
	    empty[0]=pos[0];
	    gpos= data + my_hash_rec_mask(info, pos, info->blength, info->records + 1);
	    if (pos == gpos)
	    {
	      pos->data=(uchar*) record;
	      pos->next=(uint) (empty - data);
	    }
	    else
	    {
	      pos->data=(uchar*) record;
	      pos->next=NO_RECORD;
	      movelink(data,(uint) (pos-data),(uint) (gpos-data),(uint) (empty-data));
	    }
	  }
	  if (++info->records == info->blength)
	    info->blength+= info->blength;
	  return(0);
	}
	    同时给出动态hash函数如下：
	 
	static uint my_hash_mask(my_hash_value_type hashnr, size_t buffmax,
	                         size_t maxlength)
	{
	  if ((hashnr & (buffmax-1)) < maxlength) return (hashnr & (buffmax-1));
	  return (hashnr & ((buffmax >> 1) -1));
	}
	  
	   可以看出，hash函数是hash key与buffmax的模运算，buffmax即HASH结构中的blength，由my_hash_insert中最后几行代码可知：info->blength+= info->blength; 其初始值为1，即blength = 2^n，而且blengh始终是大于records。这个动态hash函数的基本意思是key%（2^n）。依然用图解这个动态hash函数。
	 
	
	    hash函数基本清楚了，但是mysql的具体实现还是比较值得探讨的。那封回信中也提到了without empty slots，是的，它这种实现方式是根据实际的数据量进行桶数的分配。我这里大概说下代码的流程（有兴趣的还需要大家自己仔细琢磨）。
			根据flag去判断是否是否唯一Hash，如果是唯一Hash，去查找Hash表中是否存在重复值(dupliacate entry)，存在则报错。
			进行桶分裂，对应代码中的if (idx != info->records)分支。这个分支有些费解，稍微提示下：gpos和ptr_to_rec指示了低位需要移动的数据，gpos2和ptr_to_rec2只是了高位需要移动的数据。LOWFIND表示低位存在值，LOWUSED表示低位是否进行了调整。HIGH的宏意义基本相同。if (!(hash_nr & halfbuff)) 用于判断hash值存在高位还是低位。
			计算新值对应的bucket号，插入。如果此位置上存在元素(一般情况都存在，除非是empty，概率比较小)，调整原始元素的位置
	摘自 心中无码
						bitsCN.com



                  

	 	
                    
                        
                            
                                < 上一篇
Mysql源码学习――打造专属语法_MySQL
                            
                            
                                 下一篇 >
mysql中varchar与integer的互转_MySQL


                
                  

                    
                        人气教程排行
                    
                    
			
                        
						
 
                                
                                    355次
                                    1
                                   对数据库模式进行规范化处理，是在数据库设计的什么阶段？                                
                            
							  								  														  
                                
                                    355次
                                    2
                                   mysql如何删除多个表格数据库数据                                
                            
								  														  								  
                                
                                    355次
                                    3
                                   Oracle购买价格和服务费计算方式                                
                            
														  								  							
                                
                                    355次
                                    4
                                   MYSQL查看和新增表分区                                
                            
							  								  							
                                
                                    354次
                                    5
                                   TRACE32调试技巧                                
                            
							  								  							
                                
                                    353次
                                    6
                                   Oracle数据库教程：ORA-01031:权限不足                                
                            
							  								  							
                                
                                    353次
                                    7
                                   Oracle物化视图失效的几种情况及测试                                
                            
							  								  							
                                
                                    352次
                                    8
                                   mysql-sql语句查询多个字段不等于零怎么写？                                
                            
							  								  							
                                
                                    351次
                                    9
                                   如何提高mysql大批量数据更新（update）的效率？                                
                            
							  								  							
                                
                                    350次
                                    10
                                   mysql如何从ibd文件恢复数据                                
                            
							  								  							
                                
                                    350次
                                    11
                                   ORACLEROLLUP和CUBE函数                                
                            
							  								  							
                                
                                    350次
                                    12
                                   IBMDB2赋权[SQL0551N]                                
                            
							  								  							
                                
                                    350次
                                    13
                                   mysql不等于符号写法                                
                            
							  								  							
                                
                                    349次
                                    14
                                   redhat8mysql安装具体过程_MySQL                                
                            
							  								  							
                                
                                    349次
                                    15
                                   SQLServer出现Error:1326错误（管理器无法连接远程数据库）问题解决方案                                
                            
							  								  							
                                
                                    348次
                                    16
                                   centos下配置mysql数据库自动备份                                
                            
							  								  							
                                
                                    348次
                                    17
                                   MySQL中使用SQL语句对字段进行重命名                                
                            
							  								  							
                                
                                    347次
                                    18
                                   数据库中的表以行和列来组织数据，每一行称为每一列称为                                
                            
							  								  							
                                
                                    347次
                                    19
                                   windowsservice-mysql安装出错，远程调用失败                                
                            
							  								  							
                                
                                    346次
                                    20
                                   SQL连接查询语法及使用