Date: 2021-07-01 10:21:17
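The vocabulary file describes the tree produced by clustering. Every point in the tree is a node, and only the leaf nodes are words. For each incoming image, features are extracted, and the descriptor distance is used to walk each feature down the tree until it lands on a leaf (a word). The words of all the features together make up the word vector of that image.
Description of the retrieval entries: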
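The descent from root to leaf can be sketched as follows. This is a minimal illustration, not DBoW3's own code: `ToyNode`, `featureToWord`, `imageToWordCounts` and `makeToyTree` are hypothetical names, the descriptors are assumed to be float vectors compared with squared Euclidean distance, and the word vector is left as raw counts without weighting.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

// Hypothetical toy node; the real vocabulary nodes carry more fields.
struct ToyNode {
    std::vector<float> descriptor;  // cluster centre of this node
    std::vector<int> children;      // indices of child nodes; empty for a leaf
    int word_id;                    // meaningful only for leaves, -1 otherwise
};

// Squared Euclidean distance between two descriptors of equal length.
inline float squaredDistance(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    float d = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        d += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return d;
}

// Start at the root (node 0) and keep moving to the closest child
// until a leaf is reached; the leaf's word id is the result.
inline int featureToWord(const std::vector<ToyNode>& nodes,
                         const std::vector<float>& feature) {
    assert(!nodes.empty());
    std::size_t current = 0;
    while (!nodes[current].children.empty()) {
        int best = nodes[current].children.front();
        float best_d = std::numeric_limits<float>::max();
        for (int child : nodes[current].children) {
            const float d = squaredDistance(nodes[child].descriptor, feature);
            if (d < best_d) {
                best_d = d;
                best = child;
            }
        }
        current = static_cast<std::size_t>(best);
    }
    return nodes[current].word_id;
}

// Map every feature of an image to a word and count the occurrences.
inline std::map<int, int> imageToWordCounts(
    const std::vector<ToyNode>& nodes,
    const std::vector<std::vector<float>>& features) {
    std::map<int, int> counts;
    for (const auto& f : features) {
        ++counts[featureToWord(nodes, f)];
    }
    return counts;
}

// A five-node example tree: root -> {inner node, leaf}, inner node -> {leaf, leaf}.
inline std::vector<ToyNode> makeToyTree() {
    return {
        ToyNode{{}, {1, 2}, -1},          // 0: root (its descriptor is never compared)
        ToyNode{{0.f, 0.f}, {3, 4}, -1},  // 1: inner node
        ToyNode{{10.f, 10.f}, {}, 2},     // 2: leaf, word 2
        ToyNode{{-1.f, 0.f}, {}, 0},      // 3: leaf, word 0
        ToyNode{{1.f, 0.f}, {}, 1},       // 4: leaf, word 1
    };
}
```

With this toy tree, a feature near (1, 0) goes through node 1 and ends at word 1, while a feature near (10, 10) stops at the leaf directly under the root.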
database:
   nEntries: 4
   usingDI: 0
   diLevels: 0
   invertedIndex:
      -  //(wordId:0)
         - { imageId:1, weight:1.6807896319101980e-03 }
         - { imageId:2, weight:3.2497152852064880e-03 }
         - { imageId:3, weight:3.6665308718065778e-03 }
      -  //(wordId:1)
         - { imageId:1, weight:4.0497295661974788e-03 }
      -  //(wordId:2)
         []
      -
         []
      -
         - { imageId:2, weight:3.9149658655580431e-03 }
      -
         - { imageId:3, weight:4.4171079458813099e-03 }
      -
         - { imageId:1, weight:2.0248647830987394e-03 }
         - { imageId:3, weight:4.4171079458813099e-03 }
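Retrieval entries:
According to the vocabulary file, word id 2 corresponds to node id 21, and node id 21 has a weight of 0. In other words, word 2 is too common: it appears in all 4 images used to build the visual vocabulary (compare frequent function words in Chinese text such as “的”, “在” and “和”), so it carries no discriminative information. It therefore has no entry ids at all (its row in the inverted index is the empty list `[]`), which is reasonable.
The open-source code does not cast a +1 vote for every entry that shares a word with the query. Instead, for each query word it directly accumulates a score for every EntryId listed under that word, then sorts the results and keeps the top n. The score can be computed in several ways, including L1, L2 and KL.
queryL1 (I am not fluent in C++ and it took me a while to read; it relies on std::map). The source with annotations: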
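A zero weight for a word that occurs in every training image is what an IDF-style weight produces. The sketch below assumes the weight is computed as log(N / n_i), where N is the number of training images and n_i the number of images containing word i; `idfWeight` is a hypothetical helper name, not a DBoW3 function.

```cpp
#include <cassert>
#include <cmath>

// IDF weight of a word under the assumed formula log(N / n_i).
// total_images:     N, number of images used to build the vocabulary
// images_with_word: n_i, number of those images in which the word occurs
inline double idfWeight(int total_images, int images_with_word) {
    assert(total_images > 0);
    assert(images_with_word > 0 && images_with_word <= total_images);
    return std::log(static_cast<double>(total_images) / images_with_word);
}
```

With 4 training images, `idfWeight(4, 4)` is log(1) = 0, matching the zero weight of word 2, while a word seen in a single image gets the largest weight, log(4).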
void Database::queryL1(const BowVector &vec, QueryResults &ret,
  int max_results, int max_id) const
{
  BowVector::const_iterator vit;

  std::map<EntryId, double> pairs;
  std::map<EntryId, double>::iterator pit;

  for(vit = vec.begin(); vit != vec.end(); ++vit)
  {
    const WordId word_id = vit->first;
    const WordValue& qvalue = vit->second;

    const IFRow& row = m_ifile[word_id];

    // IFRows are sorted in ascending entry_id order
    for(auto rit = row.begin(); rit != row.end(); ++rit)
    {
      const EntryId entry_id = rit->entry_id;
      const WordValue& dvalue = rit->word_weight;

      if((int)entry_id < max_id || max_id == -1)
      {
        double value = fabs(qvalue - dvalue) - fabs(qvalue) - fabs(dvalue);

        pit = pairs.lower_bound(entry_id);
        if(pit != pairs.end() && !(pairs.key_comp()(entry_id, pit->first)))
        {
          pit->second += value; // entry_id already present: accumulate the sum
        }
        else
        {
          // entry_id not present yet: insert it, using pit as the position hint
          pairs.insert(pit, std::map<EntryId, double>::value_type(entry_id, value));
        }
      }
    } // for each inverted row
  } // for each query word

  // move to vector
  ret.reserve(pairs.size());
  for(pit = pairs.begin(); pit != pairs.end(); ++pit)
  {
    ret.push_back(Result(pit->first, pit->second));
  }

  // resulting "scores" are now in [-2 best .. 0 worst]
  // sort vector in ascending order of score
  std::sort(ret.begin(), ret.end());
  // (ret is inverted now --the lower the better--)

  // cut vector
  if(max_results > 0 && (int)ret.size() > max_results)
    ret.resize(max_results);

  // complete and scale score to [0 worst .. 1 best]
  // ||v - w||_{L1} = 2 + Sum(|v_i - w_i| - |v_i| - |w_i|)
  //   for all i | v_i != 0 and w_i != 0
  // scaled_||v - w||_{L1} = 1 - 0.5 * ||v - w||_{L1}
  QueryResults::iterator qit;
  for(qit = ret.begin(); qit != ret.end(); qit++)
    qit->Score = -qit->Score/2.0;
}
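To check the arithmetic of queryL1 by hand, the same accumulation can be reproduced on a tiny inverted index. This is a simplified sketch with hypothetical types (`ToyPosting`, `ToyInvertedIndex`, `ToyBowVector`, `ToyResult`) and helper names, not the DBoW3 classes. It assumes both the query and the stored vectors are L1-normalized, and it negates the scores before sorting in descending order rather than sorting first and negating afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

struct ToyPosting { int entry_id; double weight; };             // one row element
using ToyInvertedIndex = std::vector<std::vector<ToyPosting>>;  // indexed by word id
using ToyBowVector = std::map<int, double>;                     // word id -> weight
struct ToyResult { int entry_id; double score; };

// Accumulate |q - d| - |q| - |d| per entry over the words shared with the
// query, then rescale to [0 worst .. 1 best] and sort best first.
inline std::vector<ToyResult> toyQueryL1(const ToyInvertedIndex& index,
                                         const ToyBowVector& query,
                                         int max_results) {
    std::map<int, double> acc;  // entry id -> partial sum, in [-2 .. 0]
    for (const auto& kv : query) {
        assert(kv.first >= 0);
        if (static_cast<std::size_t>(kv.first) >= index.size()) continue;
        for (const ToyPosting& p : index[kv.first]) {
            acc[p.entry_id] += std::fabs(kv.second - p.weight)
                             - std::fabs(kv.second) - std::fabs(p.weight);
        }
    }

    std::vector<ToyResult> results;
    results.reserve(acc.size());
    for (const auto& kv : acc) {
        results.push_back({kv.first, -kv.second / 2.0});
    }
    // Highest score first; ties are broken by entry id for determinism.
    std::sort(results.begin(), results.end(),
              [](const ToyResult& a, const ToyResult& b) {
                  if (a.score != b.score) return a.score > b.score;
                  return a.entry_id < b.entry_id;
              });
    if (max_results > 0 && static_cast<int>(results.size()) > max_results) {
        results.resize(static_cast<std::size_t>(max_results));
    }
    return results;
}

// Entry 0 = {word0: 0.5, word1: 0.5}, entry 1 = {word1: 0.25, word2: 0.75}.
inline ToyInvertedIndex makeToyIndex() {
    return {
        {{0, 0.5}},             // word 0
        {{0, 0.5}, {1, 0.25}},  // word 1
        {{1, 0.75}},            // word 2
    };
}

// A query identical to entry 0.
inline ToyBowVector makeToyQuery() {
    return {{0, 0.5}, {1, 0.5}};
}
```

For this query, entry 0 accumulates (0 - 0.5 - 0.5) twice, giving -2 and a final score of 1. Entry 1 shares only word 1 and accumulates 0.25 - 0.5 - 0.25 = -0.5, giving a score of 0.25, which agrees with 1 - 0.5 * ||v - w||_{L1}, since the full L1 distance is 0.5 + 0.25 + 0.75 = 1.5.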
The open-source bag-of-words model DBoW3: principles and source code