update test case.

shibing624 2022-03-12 15:19:14 +08:00
parent 8680b75b2a
commit c15ce6e124
2 changed files with 36 additions and 6 deletions


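@@ -26,11 +26,39 @@ similarities: a toolkit for similarity calculation and semantic matching search.
# Feature
### Text similarity comparison methods
### Text similarity calculation
- Cosine Similarity: the cosine of the angle between two vectors
- Dot Product: the inner product of two vectors after normalization
- [RankBM25](similarities/literalsim.py): a BM25 variant that scores the similarity between a query and each document to produce a ranking of the docs
- Hamming Distance, Levenshtein (edit) Distance, Euclidean Distance, Manhattan Distance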
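
The measures listed above can be sketched in plain Python as follows. This is a minimal illustration only, not the code in `similarities/literalsim.py`; every function name here is hypothetical and chosen for this example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u] if norm else list(u)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))


def levenshtein_distance(s, t):
    """Edit distance via row-by-row dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]
```

Because the dot product is taken after normalization, `dot(normalize(u), normalize(v))` gives the same value as `cosine_similarity(u, v)` up to floating-point error.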
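#### Semantic models
- BERT: text vector representation
- SentenceBERT: text matching model
- CoSENT: text matching model
#### Literal models
- Word2Vec: shallow semantic text representation
- Tongyici Cilin (同义词词林) synonym thesaurus
- HowNet (知网) sememe matching
- BM25, RankBM25
- TFIDF
- SimHash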
### Image similarity calculation
#### Semantic models
- [CLIP(Contrastive Language-Image Pre-Training)](similarities/imagesim.py)
- VGG (in progress)
- ResNet (in progress)
#### Feature extraction
- pHash, dHash, wHash, aHash
- SIFT (Scale-Invariant Feature Transform)
- SURF (Speeded Up Robust Features) (in progress)
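
As a rough illustration of the hash-based features above, here is a minimal average-hash (aHash) sketch. It assumes the image has already been converted to a small grayscale grid (a list of rows of pixel values) and skips the resizing step a real implementation would perform. The function names are hypothetical and this is not the code used by the toolkit or by the ImageHash library.

```python
def average_hash(pixels):
    """Pack one bit per pixel into an int: 1 where the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hash_distance(h1, h2):
    """Hamming distance between two hashes: the number of differing bits."""
    return bin(h1 ^ h2).count("1")


# Hypothetical 2x2 grayscale "images" used only for illustration.
img = [[0, 255], [255, 0]]
brighter = [[10, 255], [255, 10]]   # same pattern, slightly brighter darks
inverted = [[255, 0], [0, 255]]     # opposite pattern

same = hash_distance(average_hash(img), average_hash(brighter))
different = hash_distance(average_hash(img), average_hash(inverted))
```

A small distance suggests visually similar images; here `same` is 0 because the brightness shift does not change which pixels lie above the mean, while `different` is 4 because every bit flips.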
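### Image-text similarity calculation
- [CLIP(Contrastive Language-Image Pre-Training)](similarities/imagesim.py)
### Matching search
- [SemanticSearch](https://github.com/shibing624/similarities/blob/main/similarities/similarity.py#L99): vector similarity retrieval that uses Cosine Similarity + top-k for efficient computation, an order of magnitude faster than one-to-one brute-force comparison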
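
The idea behind cosine similarity plus top-k retrieval can be sketched as below. This is a plain-Python illustration under stated assumptions, not the implementation in `similarity.py`: the name `top_k_search` is hypothetical, and the sketch does not reproduce the batched vector computation that the speed-up claim presumably relies on.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm else list(vec)


def top_k_search(query, corpus, k=5):
    """Return the k corpus entries most similar to the query.

    Vectors are normalized once, so each cosine similarity reduces to a
    dot product; heapq.nlargest then keeps only the k best scores instead
    of sorting the whole corpus.
    """
    q = normalize(query)
    unit_corpus = [normalize(doc) for doc in corpus]
    scored = (
        (sum(a * b for a, b in zip(q, doc)), idx)
        for idx, doc in enumerate(unit_corpus)
    )
    return [(idx, score) for score, idx in heapq.nlargest(k, scored)]


# Hypothetical 2-dimensional embeddings used only for illustration.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k_search([1.0, 0.1], corpus, k=2)
```

`hits` is a list of `(corpus_index, score)` pairs ordered from most to least similar. In practice the corpus vectors would be normalized once when the corpus is added rather than on every query.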
@@ -364,3 +392,4 @@ version = {1.0.1}
- [shibing624/text2vec](https://github.com/shibing624/text2vec)
- [qwertyforce/image_search](https://github.com/qwertyforce/image_search)
- [ImageHash - Official Github repository](https://github.com/JohannesBuchner/imagehash)
- [openai/CLIP](https://github.com/openai/CLIP)


@@ -144,17 +144,18 @@ class QPSSimTestCase(unittest.TestCase):
        b = sents2[:100]
        r = m.similarity(a, b)
        for i in range(len(a)):
            print(r[i][i], labels[i])
            print(r[i], labels[i])
        spend_time = time() - t1
        print('[sim] spend time:', spend_time, ' seconds, count:', len(a), ', qps:', len(a) / spend_time)
        m.add_corpus(sents2)
        t1 = time()
        size = 100
        size = 20
        r = m.most_similar(sents1[:size], topn=5)
        # print(r)
        spend_time = time() - t1
        print('[search] spend time:', spend_time, ' seconds, count:', size, ', qps:', size / spend_time)
        self.assertTrue(len(r) > 0)
if __name__ == '__main__':
    unittest.main()