Multimodal image search
In this tutorial, we'll walk through how to use Lexy to create a multimodal search application. We'll use the CLIP model from OpenAI to create embeddings for images, and then use those embeddings to find matching images for a given text query, or vice versa.
Create collection
Let's first create a collection to store our images. We'll use the images_tutorial collection for this tutorial.
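As a rough sketch of what creating the collection might look like (the create_collection method and its parameters are assumptions here, mirroring the create_index call used below, and may differ in your Lexy client version):
# sketch only: `create_collection` and its parameters are assumed, mirroring `create_index` below
collection = lx.create_collection(
    collection_name='images_tutorial',
    description='Collection for images tutorial'
)
collection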
Create index and binding
Define index
First we'll define our index to store our embedded images. We use *.embeddings.clip as the transformer model name to indicate that we want to use the CLIP embeddings model, and that the embedding field can store output from any model matching this pattern, including image.embeddings.clip and text.embeddings.clip.
# define index fields
index_fields = {
    "embedding": {"type": "embedding", "extras": {"dims": 512, "model": "*.embeddings.clip"}},
}

# create index
idx = lx.create_index(
    index_id='image_tutorial_index',
    description='Index for images tutorial',
    index_fields=index_fields
)
idx
We'll use the CLIP image embeddings transformer, which uses OpenAI's CLIP model (openai/clip-vit-base-patch32, available on HuggingFace) to create embeddings for images.
The CLIP model was trained on a large dataset of image-text pairs. It learns to map images and text into a shared embedding space, where the embeddings of matching images and text are close together. This is what lets us find matching images for a given text query, or vice versa.
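We can list the transformers registered with the Lexy server to confirm that the CLIP image and text transformers are available. The call below is a sketch; the method name list_transformers is an assumption and may differ in your client version.
# sketch only: the method name `list_transformers` is an assumption
lx.list_transformers()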
[<Transformer('image.embeddings.clip', description='Embed images using 'openai/clip-vit-base-patch32'.')>,
<Transformer('text.embeddings.clip', description='Embed text using 'openai/clip-vit-base-patch32'.')>,
<Transformer('text.embeddings.minilm', description='Text embeddings using "sentence-transformers/all-MiniLM-L6-v2"')>,
<Transformer('text.embeddings.openai-3-large', description='Text embeddings using OpenAI's "text-embedding-3-large" model')>,
<Transformer('text.embeddings.openai-3-small', description='Text embeddings using OpenAI's "text-embedding-3-small" model')>,
<Transformer('text.embeddings.openai-ada-002', description='OpenAI text embeddings using model text-embedding-ada-002')>]
Create binding
We'll create a binding that will process images added to our images_tutorial collection using the CLIP image embeddings transformer, and store the results in image_tutorial_index.
binding = lx.create_binding(
    collection_name='images_tutorial',
    transformer_id='image.embeddings.clip',
    index_id='image_tutorial_index'
)
binding
<Binding(id=3, status=ON, collection='images_tutorial', transformer='image.embeddings.clip', index='image_tutorial_index')>
Upload images to the collection
Let's upload some images from the image-text-demo dataset to the collection. This dataset is from HuggingFace datasets and requires the datasets package to be installed.
# import test data from HuggingFace datasets - requires `pip install datasets`
from datasets import load_dataset

data = load_dataset("shabani1/image-text-demo", split="train")

# add documents to the collection
for i, row in enumerate(data, start=1):
    print(i, row['text'])
    lx.upload_documents(files=row['image'],
                        filenames=row['text'] + '.jpg',
                        collection_name='images_tutorial')
1 aerial shot of futuristic city with large motorway
2 aerial shot of modern city at sunrise
3 butterfly landing on the nose of a cat
4 cute kitten walking through long grass
5 fluffy dog sticking out tongue with yellow background
6 futuristic city with led lit tower blocks
7 futuristic wet city street after rain with red and blue lights
8 ginger striped cat with long whiskers laid on wooden table
9 happy dog walking through park area holding ball
10 happy ginger dog sticking out its tongue sat in front of dirt path
11 happy small fluffy white dog running across grass
12 kitten raising paw to sky with cyan background
13 modern city skyline at sunrise with pink to blue sky
14 modern neon lit city alleyway
15 new york city street view with yellow cabs
16 puppy with big ears sat with orange background
17 suburban area with city skyline in distance
18 three young dogs on dirt road
19 top down shot of black and white cat with yellow background
20 two dogs playing in the snow
21 two dogs running on dirt path
[<Document("<Image(aerial shot of futuristic city with large motorway.jpg)>")>,
<Document("<Image(aerial shot of modern city at sunrise.jpg)>")>,
<Document("<Image(butterfly landing on the nose of a cat.jpg)>")>,
<Document("<Image(cute kitten walking through long grass.jpg)>")>,
<Document("<Image(fluffy dog sticking out tongue with yellow background.jpg)>")>,
<Document("<Image(futuristic city with led lit tower blocks.jpg)>")>,
<Document("<Image(futuristic wet city street after rain with red and blue lights.jpg)>")>,
<Document("<Image(ginger striped cat with long whiskers laid on wooden table.jpg)>")>,
<Document("<Image(happy dog walking through park area holding ball.jpg)>")>,
<Document("<Image(happy ginger dog sticking out its tongue sat in front of dirt path.jpg)>")>,
<Document("<Image(happy small fluffy white dog running across grass.jpg)>")>,
<Document("<Image(kitten raising paw to sky with cyan background.jpg)>")>,
<Document("<Image(modern city skyline at sunrise with pink to blue sky.jpg)>")>,
<Document("<Image(modern neon lit city alleyway.jpg)>")>,
<Document("<Image(new york city street view with yellow cabs.jpg)>")>,
<Document("<Image(puppy with big ears sat with orange background.jpg)>")>,
<Document("<Image(suburban area with city skyline in distance.jpg)>")>,
<Document("<Image(three young dogs on dirt road.jpg)>")>,
<Document("<Image(top down shot of black and white cat with yellow background.jpg)>")>,
<Document("<Image(two dogs playing in the snow.jpg)>")>,
<Document("<Image(two dogs running on dirt path.jpg)>")>]
Query index
Let's first define some helper functions to display our image results.
import io
import httpx
from IPython.display import display, HTML
from PIL import Image

def image_from_url(url):
    # fetch the image bytes and open them as a PIL Image
    response = httpx.get(url)
    response.raise_for_status()
    return Image.open(io.BytesIO(response.content))
def display_results_html(records):
    html_content = ""
    for r in records:
        d = r['document']
        thumbnail_url = d.thumbnail_url
        fname = d.meta.get('filename')
        score = f"score: {r['distance']:.4f}"
        # create a row for each result with the image on the left and text on the right
        html_content += f"""
        <div style='display: flex; align-items: center; margin-bottom: 20px; margin-top: 20px;'>
            <img src='{thumbnail_url}' style='width: auto; height: auto; margin-right: 20px;'/>
            <div>
                <p>{fname}</p>
                <p>{score}</p>
            </div>
        </div>
        """
    # display all results as HTML
    display(HTML(html_content))
Query by text
We can query our index by text to find matching images. Results are returned in order of increasing distance, so lower scores indicate closer matches.
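As a sketch of what such a query might look like (the query_index method, its query_text and k parameters, and the example query string are illustrative assumptions, not taken from this tutorial):
# sketch only: `query_index`, `query_text`, and `k` are assumed names
results = lx.query_index(query_text='dogs playing in the snow',
                         index_id='image_tutorial_index',
                         k=5)
display_results_html(results)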
two dogs playing in the snow.jpg
score: 13.4786
three young dogs on dirt road.jpg
score: 13.8796
two dogs running on dirt path.jpg
score: 13.9199
happy ginger dog sticking out its tongue sat in front of dirt path.jpg
score: 14.1915
ginger striped cat with long whiskers laid on wooden table.jpg
score: 14.2613
aerial shot of modern city at sunrise.jpg
score: 12.9919
suburban area with city skyline in distance.jpg
score: 13.0167
modern city skyline at sunrise with pink to blue sky.jpg
score: 13.0856
aerial shot of futuristic city with large motorway.jpg
score: 13.2318
futuristic wet city street after rain with red and blue lights.jpg
score: 13.2840
Query by image
We can also query our index by image to find matching images.
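The same pattern applies when querying with an image, including the Wikipedia images loaded further below. In this sketch, the query_image parameter name and the example URL are assumptions for illustration:
# sketch only: `query_image` is an assumed parameter name; the URL is a placeholder
query_img = image_from_url('https://example.com/query-image.jpg')
results = lx.query_index(query_image=query_img,
                         index_id='image_tutorial_index',
                         k=5)
display_results_html(results)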
butterfly landing on the nose of a cat.jpg
score: 8.9913
puppy with big ears sat with orange background.jpg
score: 9.2752
fluffy dog sticking out tongue with yellow background.jpg
score: 9.3786
two dogs running on dirt path.jpg
score: 9.5351
cute kitten walking through long grass.jpg
score: 9.6472
img = image_from_url('https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Night_in_the_Greater_Tokyo_Area_ISS054.jpg/2560px-Night_in_the_Greater_Tokyo_Area_ISS054.jpg')
img
suburban area with city skyline in distance.jpg
score: 7.7127
futuristic city with led lit tower blocks.jpg
score: 8.0037
aerial shot of futuristic city with large motorway.jpg
score: 8.3442
modern city skyline at sunrise with pink to blue sky.jpg
score: 8.4371
aerial shot of modern city at sunrise.jpg
score: 8.9889
img = image_from_url('https://upload.wikimedia.org/wikipedia/commons/e/ed/Shanghai_skyline_2018%28cropped%29.jpg')
img
aerial shot of futuristic city with large motorway.jpg
score: 6.2110
futuristic city with led lit tower blocks.jpg
score: 6.7713
aerial shot of modern city at sunrise.jpg
score: 7.0736
modern city skyline at sunrise with pink to blue sky.jpg
score: 7.4314
new york city street view with yellow cabs.jpg
score: 7.8167