image_text_search_cn

Cross-modal image and text similarity search SDK [Chinese]

Background

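OpenAI has released two new neural networks: CLIP and DALL·E. They combine NLP (natural language processing) with image recognition, giving a better understanding of the images and language found in everyday life.
Previously, search was limited to text-to-text or image-to-image. With the CLIP model, text can be used to search for images, and images to search for text. The idea is to map images and text into the same vector space, which makes cross-modal similarity comparison and retrieval between images and text possible.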
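
Once images and text live in the same vector space, retrieval comes down to comparing vectors. The following is a minimal sketch of that idea, not the SDK's own code: it assumes cosine similarity as the comparison measure and uses made-up 3-dimensional toy vectors (the names `cosine_similarity`, `rank_texts`, and all values are hypothetical).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_texts(image_vec, text_vecs):
    """Return (text, similarity) pairs sorted from most to least similar."""
    scored = [
        (text, cosine_similarity(image_vec, vec))
        for text, vec in text_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings invented for illustration only; real embeddings
# would come from the image and text encoders.
image_vec = [0.9, 0.1, 0.2]
text_vecs = {
    "two dogs in the snow": [0.8, 0.2, 0.1],
    "a cat on the table": [0.1, 0.9, 0.3],
    "London at night": [0.2, 0.1, 0.9],
}

for text, score in rank_texts(image_vec, text_vecs):
    print(f"{text}: {score:.3f}")
```

The same ranking works in the other direction: fix a text vector and rank a collection of image vectors against it.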

Feature vector space (composed of images and text)

SDK features:

Image and text feature vector extraction
Similarity computation
Softmax confidence computation

Running the example

After a successful run, the command line should show the following output:


...
# Test texts:
[INFO ] - texts: [在雪地里有两条狗, 一只猫在桌子上, 夜晚的伦敦]


# Test image:


# Vector dimension:
[INFO ] - Vector dimension: 512

# Generate the image embedding:
[INFO ] - image embeddings: [0.22221693, 0.16178696, ..., -0.06122274, 0.13340257]

# Chinese word segmentation & translation (top 5 candidates):
[INFO ] - Tokens : [在, 雪地, 里, 有, 两条, 狗]
[INFO ] - 在雪地里有两条狗: 
[ There are two dogs in the snow,  In the snow there are two dogs,  There were two dogs in the snow,  There are two dogs in the snow.,  There are two dogs in the snow@@ .@@ ()]

[INFO ] - Tokens : [一只, 猫, 在, 桌子, 上]
[INFO ] - 一只猫在桌子上: 
[ A cat is on the table,  A cat is on the desk,  A cat is on the desk.,  A cat is on the table@@ .@@ 3,  A cat is on the table@@ .@@ 7@@ 16]

[INFO ] - Tokens : [夜晚, 的, 伦敦]
[INFO ] - 夜晚的伦敦: 
[ Night in London,  London at night,  Even@@ ing London,  Late at night in London,  Late in London]


# Generate text embeddings (from the first translation candidate) & compute similarity:
[INFO ] - text [在雪地里有两条狗] embeddings: [0.111746386, 0.08818339, ..., -0.15732905, -0.54234475]
[INFO ] - Similarity: 28.510675%

[INFO ] - text [一只猫在桌子上] embeddings: [0.08841644, 0.043696217, ..., -0.16612083, -0.11383227]
[INFO ] - Similarity: 12.206457%

[INFO ] - text [夜晚的伦敦] embeddings: [-0.038869947, 0.003223464, ..., -0.177596, 0.114676386]
[INFO ] - Similarity: 14.038936%


# Softmax confidence computation:
[INFO ] - texts: [在雪地里有两条狗, 一只猫在桌子上, 夜晚的伦敦]
[INFO ] - probs: [0.9956493, 0.0019198752, 0.0024309014]
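
The softmax step turns a list of similarity scores into probabilities that sum to 1. The sketch below illustrates the general technique only. The scaling the SDK applies to its similarity scores before softmax is not stated here, so the input values are hypothetical and the output is not meant to reproduce the logged `probs`.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scaled similarity scores for three candidate texts.
logits = [6.0, 0.0, 0.5]
probs = softmax(logits)
print(probs)
```

Because softmax exponentiates the scores, a moderate gap between the best score and the rest yields a probability close to 1 for the best candidate.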

SDK source code download:

GitHub link

Gitee link
